How do I split a string without strtok?

I'm a beginner at C, so please don't bash me. I have this function that gets the mask from an "ip/mask" type of string:

char *getmask(char n[]) {
    char x[255] = { 0 };
    strcpy(x, n);
    char *mask;
    mask = strtok(x, "/");
    mask = strtok(NULL, "/");
    return mask;
}

The problem is that I need to do this for multiple "ip/mask" pairs that are all in one string, so when I do this:

net = strtok(x4, " ");
net = strtok(NULL, " ");
while (net != NULL) {
    net = strtok(NULL, " ");
    strcpy(masca, "\n");
    strcpy(masca, getmask(net));
    //some other code
}

The problem is that strtok() misbehaves: I call it first in the while loop, but then it gets called again inside getmask(net), which overwrites the parsing state of the outer loop.

Is there any way to work around this? And if not, how else can I split a string?

Vaudois answered 31/10, 2016 at 1:39 Comment(5)
Are you starting with net=strtok(NULL," ")? The problem description is confusing. Show a minimal program with input and output. - Holmun
Sorry, the first strtok is before the while. I will edit. - Vaudois
strtok operates as a state machine, so each call carries state. In order to split a series of space-delimited tokens and a subset of /-delimited tokens, you will need to prevent the states from colliding. This can be done by iterating twice, or by using strtok_r() as mentioned by @Ludonope. - Actium
@Actium so I'm guessing I could just skip the getmask function and instead first call strtok(NULL," ") and strtok(NULL,"/") to get the ip, and then strtok(NULL,"/") again, which would return the mask? - Vaudois
Basically, once you call strtok, it will continue on the last string you gave it until it reaches the end of that string. So until you finish the first string, you cannot 'feed' it new strings. It's unsafe to start a new string halfway through: strtok keeps only one saved position, and it writes '\0' characters into the original string. There are better options; strtok_r is the most obvious, but you can also use functions like strpbrk to achieve the same behaviour (with a bit more work). - Actium

Use strtok_r(). It behaves the same as strtok(), but allows you to work with multiple strings "simultaneously".

char *strtok_r(char *str, const char *delim, char **saveptr);

The strtok_r() function is a reentrant version of strtok(). The saveptr argument is a pointer to a char * variable that is used internally by strtok_r() in order to maintain context between successive calls that parse the same string.

On the first call to strtok_r(), str should point to the string to be parsed, and the value of saveptr is ignored. In subsequent calls, str should be NULL, and saveptr should be unchanged since the previous call.

Different strings may be parsed concurrently using sequences of calls to strtok_r() that specify different saveptr arguments.

Source: Linux man strtok_r
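Applied to the question, a minimal sketch might look like the following. The helper name get_masks() and its signature are hypothetical, chosen for illustration. It uses two separate save pointers, one for the space-separated list and one for the current "ip/mask" token, so the inner split on / no longer clobbers the outer split on spaces. It assumes a POSIX system where strtok_r() is declared in <string.h>.

```c
#define _POSIX_C_SOURCE 200809L  /* request the POSIX declaration of strtok_r() */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: collect the mask of every "ip/mask" pair in a
 * space-separated list. Two strtok_r() contexts are used, so the inner
 * split on '/' does not disturb the outer split on ' '.
 * list is modified in place; masks[i] points into list, or is NULL when
 * the pair has no second '/'-separated field. Returns the number of pairs. */
size_t get_masks(char *list, char *masks[], size_t max) {
    char *outer_ctx;  /* state for the space-separated list */
    char *inner_ctx;  /* state for the current "ip/mask" token */
    size_t n = 0;

    for (char *pair = strtok_r(list, " ", &outer_ctx);
         pair != NULL;
         pair = strtok_r(NULL, " ", &outer_ctx)) {
        strtok_r(pair, "/", &inner_ctx);               /* ip part, ignored here */
        char *mask = strtok_r(NULL, "/", &inner_ctx);  /* mask part or NULL */
        if (n < max)
            masks[n] = mask;
        n++;
    }
    return n;
}
```

Because the returned pointers point into the caller's buffer, they stay valid as long as that buffer does, which avoids the dangling-pointer problem of returning a pointer into a local array.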

Aleppo answered 31/10, 2016 at 1:44 Comment(2)
I'm afraid using strtok_r(), which might not be available on the OP's system, will not fix the blatant flaw in getmask(): returning a pointer into the local array x leads to undefined behavior. - Caitlin
Try to code a similar function; it requires some string manipulation, but that's not so hard :) - Aleppo

Your function getmask() invokes undefined behavior:

  • you copy the string argument into a local array x;
  • you parse it with strtok(), which returns a pointer into the same local array x;
  • you return this pointer mask to the caller, but it becomes invalid as soon as the function returns, because x no longer exists.

You should instead return a pointer to an allocated copy of this array so it remains valid after you return to the caller.

Furthermore, you should indeed avoid using strtok() because it is not reentrant: as you noticed, you cannot implement nested parsers with this function.

Other functions are available for parsing strings:

  • strchr() locates a character in a string;
  • strstr() locates a substring in a string;
  • strspn() matches a set of characters at the beginning of a string;
  • strcspn() matches the complement of a set of characters at the beginning of a string;
  • you can also parse the string by hand, testing characters in a loop.
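As an illustration of strspn() and strcspn() together, here is a small sketch (the function name print_fields() is hypothetical) that visits each field of a separator-delimited string without modifying it, which strtok() cannot do. strspn() skips the separators and strcspn() measures the field, and the field is printed with "%.*s" so no copy is needed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: visit each field of s separated by any character
 * in seps, without modifying s. Returns the number of fields found. */
size_t print_fields(const char *s, const char *seps) {
    size_t count = 0;
    for (;;) {
        s += strspn(s, seps);            /* skip leading separators */
        if (*s == '\0')
            break;                       /* no more fields */
        size_t len = strcspn(s, seps);   /* length of this field */
        printf("field: %.*s\n", (int)len, s);
        s += len;                        /* move past the field */
        count++;
    }
    return count;
}
```

Since nothing is written into the string, this kind of loop can be nested freely, for example splitting on " " in an outer loop and on "/" inside each field.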

Here is an example:

#include <stdlib.h>
#include <string.h>

char *getmask(const char *s) {
    /* allocate a copy of the mask part:
     *  the mask starts after the first '/'
     *  and stops at the first whitespace or another '/'
     */
    size_t len = 0;
    s = strchr(s, '/');
    if (s != NULL) {
        s++;
        len = strcspn(s, "/ \t\r\n");
    }
    /* allocate space for the mask string */
    char *mask = malloc(len + 1);
    if (mask != NULL) {
        /* copy the mask string (empty if there was no '/') */
        if (len > 0)
            memcpy(mask, s, len);
        mask[len] = '\0';
    }
    return mask;
}

The function is cumbersome but very precise. It behaves almost the same as what you intended with strtok(); the only difference is the handling of multiple consecutive / bytes, which strtok() would skip and strchr() does not.

Here is an alternative with sscanf():

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *getmask(const char *s) {
    char mask[256];
    /* skip characters different from /, then skip slashes, then
     * copy characters until another / or whitespace
     */
    if (sscanf(s, "%*[^/]%*[/]%255[^/ \t\n]", mask) != 1) {
        *mask = '\0';
    }
    return strdup(mask);  /* POSIX function to allocate a copy of a string */
}

It is much simpler, but it fails (returning an empty string) if the string starts with /, because %*[^/] must match at least one character.

strdup() is a very useful function to allocate a copy of a string. It is available on POSIX-compliant systems. If you do not have it, it can be easily implemented as:

#include <stdlib.h>
#include <string.h>

char *strdup(const char *s) {
    size_t size = strlen(s) + 1;  /* include room for the terminating '\0' */
    char *p = malloc(size);
    if (p != NULL) {
        memcpy(p, s, size);
    }
    return p;
}

Strings allocated by getmask() should be freed by free() when no longer needed.

You can use similar methods to parse your input string into ip/mask address pairs before calling getmask().
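For example, a sketch of such a driver might look like this. The function collect_masks() and the 256-byte field limit are hypothetical choices, and getmask() is repeated here so the block compiles on its own. Each whitespace-separated field is copied into a local buffer before calling getmask(), so a field without / cannot pick up the mask of the next field.

```c
#include <stdlib.h>
#include <string.h>

/* Same approach as the strchr()/strcspn() getmask(): allocate a copy of
 * the text after the first '/' up to the next '/' or whitespace; the
 * result is "" if there is no '/'. Returns NULL if malloc() fails. */
static char *getmask(const char *s) {
    size_t len = 0;
    s = strchr(s, '/');
    if (s != NULL) {
        s++;
        len = strcspn(s, "/ \t\r\n");
    }
    char *mask = malloc(len + 1);
    if (mask != NULL) {
        if (len > 0)
            memcpy(mask, s, len);
        mask[len] = '\0';
    }
    return mask;
}

/* Hypothetical driver: split list on whitespace, copy each "ip/mask"
 * field into a bounded local buffer and extract its mask. Stores up to
 * max results in masks[] (free each one) and returns the count. */
size_t collect_masks(const char *list, char *masks[], size_t max) {
    const char *seps = " \t\r\n";
    size_t n = 0;
    while (n < max) {
        list += strspn(list, seps);              /* skip whitespace */
        if (*list == '\0')
            break;                               /* end of input */
        size_t len = strcspn(list, seps);        /* length of this field */
        char field[256];
        size_t copy = len < sizeof field ? len : sizeof field - 1;  /* truncate overlong fields */
        memcpy(field, list, copy);
        field[copy] = '\0';
        masks[n++] = getmask(field);             /* NULL if malloc() failed */
        list += len;                             /* move past the whole field */
    }
    return n;
}
```

The input string is never modified, so it can be a string literal or a const buffer.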

You could also provide a destination buffer to getmask() to avoid the complexities of memory management:

#include <stdio.h>
#include <string.h>

char *getmask(char *dest, size_t size, const char *s) {
    if (dest != NULL && size > 0) {
        char mask[256];
        /* skip characters different from /, then skip slashes, then
         * copy characters until another / or whitespace.
         * dest cannot be used directly because size cannot easily be
         * passed to sscanf
         */
        *dest = '\0';
        if (sscanf(s, "%*[^/]%*[/]%255[^/ \t\n]", mask) == 1) {
            /* copy at most size - 1 bytes; dest stays null-terminated */
            strncat(dest, mask, size - 1);
        }
    }
    return dest;
}

Parsing is tricky because you must be careful to handle all cases. The specifications are usually not precise enough, so implementors must make choices for special cases. The tools provided by the C library for parsing are old and clunky, especially strtok() and sscanf(). Be careful when using them: even experienced programmers get bitten by their side effects and shortcomings.

Caitlin answered 31/10, 2016 at 11:27 Comment(0)

strtok() cannot be used on comma-separated strings that contain empty fields, e.g. "one,two,,four,five,,,eight".

Using strtok(token, ",") will return "one", "two", "four"..., where you might have expected an empty string "" to be returned after "two". This caught me out when trying to parse the output from a GPS receiver.
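A quick way to see this is to count the tokens strtok() produces for that line. In the sketch below (the function name is hypothetical), consecutive commas act as a single separator, so only 5 tokens come back for 8 fields.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the tokens strtok() returns for a
 * comma-separated line. buf is modified in place. Runs of commas are
 * treated as one separator, so empty fields are never returned. */
size_t count_strtok_fields(char *buf) {
    size_t n = 0;
    for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ","))
        n++;
    return n;
}
```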

Instead, use strsep(), which is reentrant like strtok_r() but returns zero-length strings, i.e. "one", "two", "", "four"... in the example above.

strsep() is not part of ISO C, so it might be missing from some runtime libraries, but you can find implementations by searching online.
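If you need a portable fallback, a minimal sketch with the same contract as strsep() could look like this. It is named my_strsep() (a hypothetical name) to avoid clashing with a library strsep(). It finds the first delimiter with strcspn(), cuts the field there, and advances the caller's pointer, so empty fields are returned as "" instead of being skipped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical portable fallback with strsep()-like behaviour: return the
 * field starting at *stringp, terminate it with '\0', and advance *stringp
 * past the delimiter (or set it to NULL after the last field). */
char *my_strsep(char **stringp, const char *delim) {
    char *start = *stringp;
    if (start == NULL)
        return NULL;                            /* nothing left to parse */
    char *end = start + strcspn(start, delim);  /* first delimiter or '\0' */
    if (*end == '\0') {
        *stringp = NULL;                        /* this was the last field */
    } else {
        *end = '\0';                            /* cut the field */
        *stringp = end + 1;                     /* resume after the delimiter */
    }
    return start;
}
```

Typical use is `char *p = buf; char *tok; while ((tok = my_strsep(&p, ",")) != NULL) { ... }`, where buf must be a modifiable array, not a string literal.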

Cnidus answered 19/4, 2019 at 17:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.