Tokenizing a string twice in C with strtok()

3

9

I'm using strtok() in C to parse a CSV string. First I tokenize it just to find out how many tokens there are, so I can allocate an array of the correct size. Then I tokenize it again, using the same variable I used the first time. On that second pass, though, strtok(NULL, ",") returns NULL even though there are still more tokens to parse. Can somebody tell me what I'm doing wrong?

char* tok;
int count = 0;
tok = strtok(buffer, ",");
while(tok != NULL) {
    count++;
    tok = strtok(NULL, ",");
}

//allocate array

tok = strtok(buffer, ",");
while(tok != NULL) {
    //do other stuff
    tok = strtok(NULL, ",");
}

So the second while loop always ends after the first token is found, even though there are more tokens in the string.

Blessed answered 28/12, 2010 at 6:8 Comment(1)
How is it that apparently everyone knows what strtok() is these days, but nobody has read the documentation? Nobody told me about it when I was learning C, but as soon as I had to know about it, I read up on it. - Fond
19

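strtok() modifies the string it operates on: it overwrites the delimiter that ends each token with a null character ('\0'). After your first loop, buffer therefore looks like a string that holds only the first token, which is why the second pass stops after one token. If you want to tokenize the same string more than once, you'll have to work on a copy.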
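
As a rough sketch of the copy approach (not code from the question): the counting pass can run on a temporary duplicate, so the original buffer keeps its commas for the real pass. The helper name count_tokens is made up for illustration, and the sketch assumes the input is an ordinary null-terminated string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in src without modifying it,
 * by running strtok() on a temporary heap copy.
 * Returns -1 if the copy cannot be allocated. */
int count_tokens(const char *src, const char *delims)
{
    assert(src != NULL && delims != NULL);

    size_t len = strlen(src) + 1;      /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, src, len);

    int count = 0;
    for (char *tok = strtok(copy, delims); tok != NULL;
         tok = strtok(NULL, delims))
        count++;                       /* only the copy gets overwritten */

    free(copy);
    return count;
}
```

After count_tokens(buffer, ","), a later loop that starts with strtok(buffer, ",") still sees every token, because buffer itself was never touched.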

Epithalamium answered 28/12, 2010 at 6:13 Comment(0)
2

There's not necessarily a need to make a copy - strtok() does modify the string it's tokenizing, but in most cases that simply means the string is already tokenized if you want to deal with the tokens again.

Here's your program modified a bit to process the tokens after your first pass:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    int i;
    char buffer[] = "some, string with  ,  tokens";

    char* tok;
    int count = 0;
    tok = strtok(buffer, ",");
    while(tok != NULL) {
        count++;
        tok = strtok(NULL, ",");
    }


    // walk through the tokenized buffer again
    tok = buffer;

    for (i = 0; i < count; ++i) {
        tok += strspn(tok, ","); // skip any delimiters strtok() left in place before this token
        printf( "token %d: \"%s\"\n", i+1, tok);
        tok += strlen(tok) + 1;  // step past the '\0' that ends this token
    }

    return 0;
}

Note that this is unfortunately trickier than what I first posted: strspn() has to be called after stepping past the '\0' placed by strtok(), because strtok() skips any leading delimiter characters before the token it returns without overwriting them, so those delimiters are still sitting in the buffer.

Panslavism answered 28/12, 2010 at 7:17 Comment(1)
Another way would be to store the token pointers from the first pass in an array. Of course, that would mean having a maximum number of tokens or a dynamic array. But that could work too. - Epithalamium
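
The pointer-array idea from the comment above might look roughly like this. It is only a sketch: split_tokens is a hypothetical name, the starting capacity of 8 is arbitrary, and the returned pointers point into the caller's buffer, which strtok() has modified in place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: tokenizes buf in place and returns a malloc'd array
 * of pointers into buf, one per token. The token count goes to *n_out.
 * Returns NULL if there are no tokens or an allocation fails.
 * The caller frees the array itself, not the individual tokens. */
char **split_tokens(char *buf, const char *delims, size_t *n_out)
{
    assert(buf != NULL && delims != NULL && n_out != NULL);

    char **toks = NULL;
    size_t n = 0, cap = 0;

    for (char *tok = strtok(buf, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        if (n == cap) {                        /* grow the array when full */
            size_t new_cap = cap ? cap * 2 : 8;
            char **grown = realloc(toks, new_cap * sizeof *grown);
            if (grown == NULL) {
                free(toks);
                *n_out = 0;
                return NULL;
            }
            toks = grown;
            cap = new_cap;
        }
        toks[n++] = tok;                       /* remember where this token starts */
    }

    *n_out = n;
    return toks;
}
```

With this, a single strtok() pass yields both the count and the tokens, so there is nothing left to tokenize a second time.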
2

Use strsep() - it actually updates your pointer. With strtok() you have to keep passing NULL; with strsep() you pass the address of your string pointer on every call. The only catch: if the string was allocated on the heap, keep a separate pointer to its beginning so you can free it later, because strsep() moves the pointer you hand it.

char *strsep(char **stringp, const char *delim);

char *string = buffer;   /* strsep() will advance this pointer */
char *token;
token = strsep(&string, ",");

strtok() is what you get taught in a typical intro to C course - use strsep(), it's much better. :-) No more getting confused by "oh no, I still have to pass in NULL because strtok() lost my position."
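
A small sketch of the strsep() loop with the "keep a pointer to the beginning" advice applied. Some caveats that are not in the answer above: strsep() is a BSD/glibc extension rather than standard C, it also overwrites delimiters with '\0' (so it works on a copy here), and unlike strtok() it returns an empty field for each pair of adjacent delimiters. The name count_fields is made up for illustration.

```c
#define _DEFAULT_SOURCE   /* strsep() is a BSD/glibc extension, not ISO C */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the fields of src, empty ones included,
 * by running strsep() over a heap copy.
 * Returns -1 if the copy cannot be allocated. */
int count_fields(const char *src, const char *delims)
{
    assert(src != NULL && delims != NULL);

    size_t len = strlen(src) + 1;
    char *start = malloc(len);     /* keep this pointer for free() */
    if (start == NULL)
        return -1;
    memcpy(start, src, len);

    char *cursor = start;          /* strsep() advances this one */
    char *field;
    int count = 0;
    while ((field = strsep(&cursor, delims)) != NULL)
        count++;                   /* field may be "" between adjacent delimiters */

    free(start);                   /* cursor is NULL by now, so free the saved start */
    return count;
}
```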

Birthroot answered 1/1, 2011 at 21:35 Comment(0)
