Mimic Python's strip() function in C
D

5

7

I recently started a little toy project in C and have been scratching my head over the best way to mimic the strip() method of Python's string objects.

From what I have read, fscanf and sscanf process the string only up to the first whitespace character encountered.

fgets doesn't help either, as I still have newlines sticking around. I did try using strchr() to search for whitespace and setting the returned pointer to '\0' explicitly, but that doesn't seem to work.

Distributary answered 28/9, 2009 at 17:41 Comment(0)
M
12

There is no standard C implementation for a strip() or trim() function. That said, here's the one included in the Linux kernel:

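#include <ctype.h>   /* isspace(); needed when building outside the kernel */
#include <string.h>  /* strlen() */

char *strstrip(char *s)
{
        size_t size;
        char *end;

        size = strlen(s);

        if (!size)
                return s;

        /* trim the tail by moving the terminator left */
        /* (unsigned char) casts added here for user-space isspace() */
        end = s + size - 1;
        while (end >= s && isspace((unsigned char)*end))
                end--;
        *(end + 1) = '\0';

        /* skip the head; the returned pointer may differ from the argument */
        while (*s && isspace((unsigned char)*s))
                s++;

        return s;
}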
Merchant answered 28/9, 2009 at 17:51 Comment(2)
Of course, to use the code presented here, the project must be GPLv2 and no later, since that's what the Linux kernel uses.Porch
great.. this seems to be perfect.. Thanks :)Distributary
F
14

Python strings' strip method removes both trailing and leading whitespace. The two halves of the problem are very different when working on a C "string" (array of char, \0 terminated).

For trailing whitespace: set a pointer (or, equivalently, an index) to the existing trailing \0. Keep decrementing the pointer until it reaches the start of the string or any non-whitespace character; then write the \0 right after the point where the backward scan stopped.

For leading whitespace: set a pointer (or, equivalently, an index) to the start of the string; keep incrementing it until it reaches a non-whitespace character (possibly the trailing \0); then memmove the rest of the string so that the first non-whitespace character lands at the start of the string, with everything after it following.
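The two steps above can be sketched roughly as follows. This is only one possible reading of the description, not the answerer's own code; the name strip_inplace is made up for illustration, and the buffer is assumed to be a writable, \0-terminated array.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (the name is an assumption): strips leading and
 * trailing whitespace in place and returns s. The buffer must be writable,
 * so a string literal must not be passed. */
char *strip_inplace(char *s)
{
    assert(s != NULL);

    /* Trailing side: walk back from the terminator to the last non-space. */
    size_t len = strlen(s);
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    s[len] = '\0';

    /* Leading side: find the first non-space character. */
    size_t lead = 0;
    while (isspace((unsigned char)s[lead]))
        lead++;

    /* Shift the remainder, terminator included, to the front. Source and
     * destination overlap, which is why memmove is used, not memcpy. */
    if (lead > 0)
        memmove(s, s + lead, len - lead + 1);

    return s;
}
```

Because the result starts at the same address as the input, a buffer obtained from malloc() can still be passed to free() afterwards, at the cost of one extra pass to shift the characters.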

Firebug answered 28/9, 2009 at 17:48 Comment(3)
Seems reasonable. You use memmove() instead of strncpy() because Python strings are buffers and may contain '\0' characters?Fixity
@Matt exactly: a trailing \0 is guaranteed, but there might be others "inside" the Python byte strings.Firebug
"memmove the rest-of-string so that the first non-white goes to the start of string (and similarly for everything following)." Be careful: that is not guaranteed to work if a C strip() function is passed a string literal. In fact, it will probably fail with a memory access violation, the exact kind depending on the OS.Diffractive
H
1

If you want to remove, in place, the final newline on a line, you can use this snippet:

size_t s = strlen(buf);
if (s && (buf[s-1] == '\n')) buf[--s] = 0;

To faithfully mimic Python's str.strip([chars]) method (as I understand how it works), you need to allocate space for a new string, fill it in, and return it. When you no longer need the stripped string, you must free its memory to avoid a leak.
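A minimal sketch of that allocating approach might look like this. The function name strip_copy and its signature are assumptions made for illustration; unlike Python, the set of characters to strip must always be passed explicitly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name and signature are assumptions): returns a newly
 * allocated copy of s with every leading and trailing character found in
 * chars removed, or NULL if malloc fails. The caller must free() the result. */
char *strip_copy(const char *s, const char *chars)
{
    assert(s != NULL && chars != NULL);

    /* strspn counts the leading run of characters that belong to chars. */
    const char *start = s + strspn(s, chars);

    /* Walk back over the trailing run; strchr tests membership in chars. */
    size_t len = strlen(start);
    while (len > 0 && strchr(chars, start[len - 1]) != NULL)
        len--;

    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, start, len);
    out[len] = '\0';
    return out;
}
```

Passing " \t\n\r\f\v" as chars approximates Python's default of stripping whitespace. The input string is left untouched, which is the closest match to Python's behaviour.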

Or you can use C pointers and modify the initial string and achieve a similar result.
Suppose your initial string is "____forty two____\n" and you want to strip all underscores and the '\n'.

____forty two____\n
^ ptr

If you change ptr to the 'f' and replace the first '_' after two with a '\0' the result is the same as Python's "____forty two____\n".strip("_\n");

____forty two\0___\n
    ^ptr

Again, this is not the same as Python. The string is modified in place, there's no 2nd string and you cannot revert the changes (the original string is lost).
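The pointer manipulation shown in the diagrams could be written along these lines. It is a sketch under the same assumptions as the example string above; strip_view is a hypothetical name, and buf must be writable.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (the name is an assumption): trims the characters in
 * chars from both ends of buf without copying. It writes a single '\0' into
 * buf and returns a pointer into buf, which may differ from buf itself. */
char *strip_view(char *buf, const char *chars)
{
    assert(buf != NULL && chars != NULL);

    /* Advance past the leading run; this is "ptr" in the diagrams. */
    char *ptr = buf + strspn(buf, chars);

    /* Find the end of the string, then back up over the trailing run. */
    char *end = ptr + strlen(ptr);
    while (end > ptr && strchr(chars, end[-1]) != NULL)
        end--;

    *end = '\0';  /* replaces the first trailing character to be stripped */
    return ptr;
}
```

If buf came from malloc(), keep the original pointer around: it is buf, not the returned pointer, that must eventually be passed to free().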

Hegemony answered 28/9, 2009 at 18:7 Comment(0)
M
1

I wrote C code to implement this function. I also wrote a few trivial tests to make sure my function does sensible things.

This function writes to a buffer you provide, and should never write past the end of the buffer, so it should not be prone to buffer overflow security issues.

Note: only Test() uses stdio.h, so if you just need the function, you only need to include ctype.h (for isspace()) and string.h (for strlen()).

// strstrip.c -- implement white space stripping for a string in C
//
// This code is released into the public domain.
//
// You may use it for any purpose whatsoever, and you don't need to advertise
// where you got it, but you aren't allowed to sue me for giving you free
// code; all the risk of using this is yours.



#include <ctype.h>
#include <stdio.h>
#include <string.h>



// strstrip() -- strip leading and trailing white space from a string
//
// Copies from sIn to sOut, writing at most lenOut characters.
//
// Returns number of characters in returned string, or -1 on an error.
// If you get -1 back, then nothing was written to sOut at all.

int
strstrip(char *sOut, unsigned int lenOut, char const *sIn)
{
    char const *pStart, *pEnd;
    unsigned int len;
    char *pOut;

    // if there is no room for any output, or a null pointer, return error!
    if (0 == lenOut || !sIn || !sOut)
        return -1;

    pStart = sIn;
    pEnd = sIn + strlen(sIn) - 1;

    // skip any leading whitespace
    // (isspace() requires a value representable as unsigned char)
    while (*pStart && isspace((unsigned char)*pStart))
        ++pStart;

    // skip any trailing whitespace
    while (pEnd >= sIn && isspace((unsigned char)*pEnd))
        --pEnd;

    pOut = sOut;
    len = 0;

    // copy into output buffer
    while (pStart <= pEnd && len < lenOut - 1)
    {
        *pOut++ = *pStart++;
        ++len;
    }


    // ensure output buffer is properly terminated
    *pOut = '\0';
    return len;
}


void
Test(const char *s)
{
    int len;
    char buf[1024];

    len = strstrip(buf, sizeof(buf), s);

    if (!s)
        s = "**null**";  // don't ask printf to print a null string
    if (-1 == len)
        *buf = '\0';  // don't ask printf to print garbage from buf

    printf("Input: \"%s\"  Result: \"%s\" (%d chars)\n", s, buf, len);
}


int
main(void)
{
    Test(NULL);
    Test("");
    Test(" ");
    Test("    ");
    Test("x");
    Test("  x");
    Test("  x   ");
    Test("  x y z   ");
    Test("x y z");
    return 0;
}
Mustache answered 28/9, 2009 at 21:13 Comment(0)
T
0

This potential 'solution' is by no means as complete or thorough as the others presented here. It is for my own toy project in C, a text-based adventure game that I'm working on with my 14-year-old son. If you're using fgets(), then strcspn() may work for you as well. The sample code below is the beginning of an interactive console-based loop.

#include <stdio.h>
#include <string.h> // for strcspn()

int main(void)
{
    char input[64];
    puts("Press <q> to exit..");
    do {
        printf("> ");
        // fgets() keeps the trailing '\n' and returns NULL on EOF or error
        if (fgets(input, sizeof input, stdin) == NULL) break;
        input[strcspn(input, "\n")] = 0; // replaces '\n' with 0
        if (input[0] == '\0') continue;
        printf("You entered '%s'\n", input);
    } while (strcmp(input,"q")!= 0); // strcmp() returns 0 (false) when input = "q"

    puts("Goodbye!");
    return 0;
}
Tightfisted answered 2/1, 2023 at 19:4 Comment(0)
