How does the strtok function in C work? [duplicate]

I found this sample program which explains the strtok function:

#include <stdio.h>
#include <string.h>

int main ()
{
    char str[] ="- This, a sample string.";
    char * pch;
    printf ("Splitting string \"%s\" into tokens:\n",str);
    pch = strtok (str," ,.-");
    while (pch != NULL)
    {
        printf ("%s\n",pch);
        pch = strtok (NULL, " ,.-");
    }
    return 0;
}

However, I don't see how this can possibly work.

How is it possible that pch = strtok (NULL, " ,.-"); returns a new token? We are calling strtok with NULL. This doesn't make a lot of sense to me.

Berty answered 13/1, 2014 at 17:11 Comment(4)
en.cppreference.com/w/c/string/byte/strtok - Tiepolo
"I found this sample program which explains the strtok function": it's not the example that explains it, but the documentation, so you might like to read this: man7.org/linux/man-pages/man3/strtok.3.html - Dimarco
And it doesn't make sense to anyone... so strtok_r() was created. - Harlanharland
OT: It's int main (void), by the way. - Dimarco

Two things to know about strtok. As was mentioned, it "maintains internal state". Also, it modifies the string you feed it: it writes a '\0' over the delimiter that ends each token and returns a pointer to the start of that token. Internally it remembers where the last token ended, and the next time you call it (with NULL) it resumes from there.

The important corollary is that you cannot use strtok on a string literal such as const char *s = "hello world";. String literals may live in read-only memory, so modifying one is undefined behavior and typically crashes with an access violation.

The "good" thing about strtok is that it doesn't actually copy strings - so you don't need to manage additional memory allocation etc. But unless you understand the above, you will have trouble using it correctly.

Example - if you have "this,is,a,string", successive calls to strtok with "," as the delimiter will generate pointers as follows (the ^ marks the returned value). Note that a '\0' is written where each delimiter is found; this means the source string is modified:

t  h  i  s  ,  i  s  ,  a  ,  s  t  r  i  n  g \0         this,is,a,string

t  h  i  s  \0 i  s  ,  a  ,  s  t  r  i  n  g \0         this
^
t  h  i  s  \0 i  s  \0 a  ,  s  t  r  i  n  g \0         is
               ^
t  h  i  s  \0 i  s  \0 a  \0 s  t  r  i  n  g \0         a
                        ^
t  h  i  s  \0 i  s  \0 a  \0 s  t  r  i  n  g \0         string
                              ^

Hope it makes sense.

Superscribe answered 13/1, 2014 at 17:17 Comment(1)
Very illustrative, thanks. This is exactly what I am struggling with. - Caiaphas

strtok maintains internal state. When you call it with a non-NULL pointer, it re-initializes itself to use the string you supply. When you call it with NULL, it continues from that saved string and position to return the next token.

Because of the way strtok works, if you're writing a multithreaded application you need to make sure you link with a multithreaded version of the C runtime. This ensures that each thread gets its own internal state for strtok.

Volcanic answered 13/1, 2014 at 17:13 Comment(0)

The strtok() function stores data between calls. It uses that data when you call it with a NULL pointer.

From http://www.cplusplus.com/reference/cstring/strtok/ :

The point where the last token was found is kept internally by the function to be used on the next call (particular library implementations are not required to avoid data races).

Violative answered 13/1, 2014 at 17:13 Comment(1)
Most modern runtimes store the state in thread-local storage, which means that it is thread-safe but not safe when used re-entrantly. - Pearcy

The strtok function stores data in an internal static variable which is shared among all threads.

For thread safety you should use strtok_r.

From http://www.opensource.apple.com/source/Libc/Libc-167/string.subproj/strtok.c

Take a look at the static char *last; declaration:

char *
strtok(s, delim)
    register char *s;
    register const char *delim;
{
    register char *spanp;
    register int c, sc;
    char *tok;
    static char *last;


    if (s == NULL && (s = last) == NULL)
        return (NULL);

    /*
     * Skip (span) leading delimiters (s += strspn(s, delim), sort of).
     */
cont:
    c = *s++;
    for (spanp = (char *)delim; (sc = *spanp++) != 0;) {
        if (c == sc)
            goto cont;
    }

    if (c == 0) {       /* no non-delimiter characters */
        last = NULL;
        return (NULL);
    }
    tok = s - 1;

    /*
     * Scan token (scan for delimiters: s += strcspn(s, delim), sort of).
     * Note that delim must have one NUL; we stop if we see that, too.
     */
    for (;;) {
        c = *s++;
        spanp = (char *)delim;
        do {
            if ((sc = *spanp++) == c) {
                if (c == 0)
                    s = NULL;
                else
                    s[-1] = 0;
                last = s;
                return (tok);
            }
        } while (sc != 0);
    }
    /* NOTREACHED */
}
Selfconfessed answered 13/1, 2014 at 17:16 Comment(1)
strtok_s on Windows. - Pearcy

© 2022 - 2024 — McMap. All rights reserved.