C - Determining which delimiter used - strtok()

Let's say I'm using strtok() like this:

char *token = strtok(input, ";-/");

Is there a way to figure out which delimiter actually gets used? For instance, if the input was something like:

Hello there; How are you? / I'm good - End

Can I figure out which delimiter was used for each token? I need to be able to output a specific message, depending on the delimiter that followed the token.

Monteiro answered 17/9, 2012 at 13:28
For info, man strtok says: "This interface is obsoleted by strsep(3)." – Gravitt
No, but if you ask a question requesting a version that does it, someone might write one: char *strtok_new(char *string, const char *delimiters, char *matched); – Pyles
Note that strsep isn't in C or POSIX (although strtok_r is in POSIX and strtok_s is in C11), but it is in LSB and BSD. So strtok is obsoleted by strsep on some systems and not others. The thing about man pages is, they only apply to the system you're on when you run man. – Herat
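Pyles's comment proposes a strtok variant that also reports the matched delimiter. A minimal sketch of such a function, assuming the signature from that comment; strtok_new is hypothetical, not a library function. Like strtok, it keeps hidden static state, so it is not re-entrant, and it collapses runs of delimiters, reporting only the first one after each token.

```c
#include <string.h>

/* Hypothetical strtok variant (signature assumed from the comment):
   behaves like strtok, but also stores in *matched the delimiter that
   terminated the returned token, or '\0' if the token ran to the end. */
char *strtok_new(char *string, const char *delimiters, char *matched)
{
    static char *next;                 /* hidden state, like strtok's */
    if (string != NULL)
        next = string;
    if (next == NULL)
        return NULL;

    next += strspn(next, delimiters);  /* skip leading delimiters */
    if (*next == '\0') {               /* only delimiters were left */
        next = NULL;
        return NULL;
    }

    char *token = next;
    char *end = token + strcspn(token, delimiters); /* first delimiter or '\0' */
    *matched = *end;                   /* remember it before it is overwritten */
    if (*end == '\0') {
        next = NULL;                   /* last token: the next call returns NULL */
    } else {
        *end = '\0';                   /* terminate the token, as strtok does */
        next = end + 1;
    }
    return token;
}
```

With the question's input, successive calls return "Hello there" with ';', " How are you? " with '/', " I'm good " with '-', and " End" with '\0'.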

Important: strtok is not re-entrant; you should use strtok_r instead.

You can do it by saving a copy of the original string and, for each token, looking at the character in that copy at the offset just past the end of the token:

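#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char str[] = "Hello there; How are you? / I'm good - End";
    char *copy = strdup(str);             /* untouched copy (strdup is POSIX) */
    const char *delim = ";-/";
    char *res = strtok(str, delim);
    while (res) {
        char d = copy[res - str + strlen(res)];  /* delimiter that followed the token */
        if (d != '\0')                    /* the last token ends the string */
            printf("%c\n", d);
        res = strtok(NULL, delim);
    }
    free(copy);
    return 0;
}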

This prints

;
/
-

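Since strtok_r is recommended over strtok, here is a sketch of the same copy-and-offset idea using strtok_r. It inverts the roles: it tokenizes a duplicate and reads each delimiter from the untouched original. collect_delims is a hypothetical helper name; strtok_r and strdup are POSIX rather than ISO C, hence the feature-test macro (assumed to be needed on strict compilers).

```c
#define _POSIX_C_SOURCE 200809L   /* for strtok_r and strdup */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: stores in out[] the delimiter that followed each
   token of str ('\0' for a token that ends the string) and returns the
   number of tokens seen, at most max. str itself is never modified. */
size_t collect_delims(const char *str, const char *delim, char *out, size_t max)
{
    char *work = strdup(str);     /* strtok_r writes '\0's into this buffer */
    if (work == NULL)
        return 0;
    char *save = NULL;            /* strtok_r's explicit state */
    size_t n = 0;
    for (char *tok = strtok_r(work, delim, &save);
         tok != NULL && n < max;
         tok = strtok_r(NULL, delim, &save)) {
        size_t end = (size_t)(tok - work) + strlen(tok); /* offset just past the token */
        out[n++] = str[end];      /* the original still holds the delimiter there */
    }
    free(work);
    return n;
}
```

Because the state lives in save rather than in a static variable, two such loops can run interleaved or in different threads.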

EDIT: Handling consecutive delimiters

If several delimiters can appear in a row, determining the length of the current run of delimiters becomes slightly harder: now you need to find the next token before you can tell how long the run is. The math is not complicated, as long as you remember that the NULL returned after the last token requires special treatment:

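#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char str[] = "(20*(5+(7*2)))+((2+8)*(3+6*9))";
    char *copy = strdup(str);
    const char *delim = "*+()";
    char *res = strtok(str, delim);
    while (res) {
        /* the run of delimiters starts right after the current token... */
        size_t from = (size_t)(res - str) + strlen(res);
        res = strtok(NULL, delim);
        /* ...and ends where the next token starts, or at the end (NULL case) */
        size_t to = res != NULL ? (size_t)(res - str) : strlen(copy);
        printf("%.*s\n", (int)(to - from), copy + from);
    }
    free(copy);
    return 0;
}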


Catalyst answered 17/9, 2012 at 13:40
Thank you, this is the kind of behavior I was hoping to achieve. – Monteiro
@dasblinkenlight, this won't work if you have multiple separators in succession. For example, consider tokenizing the arithmetic expression 20*5+(7*2), where you define your delim as "+-/*()" and you are interested in both the operators and the operands: +( will not be tokenized correctly. – Mustard
@Mustard You are right, the code assumes there is always exactly one delimiter. However, there is an easy fix: get the next token to decide how long the current run of delimiters is (see the edit and the demo). – Catalyst
@dasblinkenlight Thanks, it works, but it still misses the leading delimiters. If you want to evaluate the expression, you need all the delimiters (including the leading one). Also, you have to split the delimiters again, since it prints multiple delimiters together. I found that writing a custom tokenizer using find_first_of() or find_first_not_of() is much cleaner. Boost's tokens() class also achieves this nicely. – Mustard
Something like #9823763 – Mustard
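The custom-tokenizer idea from the last comments can be sketched in C with strcspn, the C-string counterpart of find_first_of(). Every delimiter, including leading and consecutive ones, comes back as its own one-character token, and the input is never modified. next_token is a hypothetical helper; it does not handle whitespace or unary minus.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds the token starting at *p, stores its start in
   *start, advances *p past it and returns its length (0 at end of string).
   A delimiter is returned alone as a 1-char token; the characters between
   delimiters form one operand token. Tokens are not nul-terminated. */
size_t next_token(const char **p, const char *delims, const char **start)
{
    const char *s = *p;
    if (*s == '\0')
        return 0;
    size_t len = strcspn(s, delims);  /* operand length before the next delimiter */
    if (len == 0)
        len = 1;                      /* s points at a delimiter: emit it alone */
    *start = s;
    *p = s + len;
    return len;
}
```

For the expression 20*5+(7*2) with delims "+-/*()", repeated calls yield 20, *, 5, +, (, 7, *, 2 and ), so the +( pair from the comment comes out as two separate tokens.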

You can't. strtok overwrites the next separator character with a nul character (in order to terminate the token that it's returning this time), and it doesn't store the previous value that it overwrites. The first time you call strtok on your example string, the ; is gone forever.
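A minimal sketch of that overwrite, using a shortened version of the question's string: after the first call, the byte where ';' used to be holds '\0'. byte_after_first_token is a hypothetical name used only for illustration.

```c
#include <string.h>

/* Hypothetical demo: returns the byte that now sits where the ';' was. */
char byte_after_first_token(void)
{
    char str[] = "Hello there; How are you?";
    char *tok = strtok(str, ";");   /* tok == str, now reads "Hello there" */
    return str[strlen(tok)];        /* index 11: was ';', now '\0' */
}
```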

You could do something if you keep an unmodified copy of the string you're modifying with strtok - given the index of the nul terminator for your current token (relative to the start of the string), you can look at the same index in the copy and see what was there.

That might be worse than just writing your own code to separate the string, of course. You can use strpbrk or strcspn, if you can live with the resulting token not being nul-terminated for you.
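A sketch of the strcspn route, which also covers the question's goal of printing a message that depends on the delimiter. The input is left intact, so tokens are printed with "%.*s" instead of being nul-terminated. Unlike strtok, adjacent delimiters yield empty tokens. print_tokens and its messages are illustrative assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints each token of s together with a message
   chosen by the delimiter that followed it; returns the token count. */
int print_tokens(const char *s, const char *delims)
{
    int count = 0;
    for (;;) {
        size_t len = strcspn(s, delims);   /* chars before the next delimiter */
        char d = s[len];                   /* the delimiter, or '\0' at the end */
        printf("token \"%.*s\" ", (int)len, s);
        switch (d) {
        case ';':  puts("ended by a semicolon"); break;
        case '/':  puts("ended by a slash"); break;
        case '-':  puts("ended by a dash"); break;
        case '\0': puts("ended the string"); break;
        default:   printf("ended by '%c'\n", d); break;
        }
        count++;
        if (d == '\0')
            return count;
        s += len + 1;                      /* step over the delimiter */
    }
}
```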

Herat answered 17/9, 2012 at 13:33

man 3 strtok

The strtok() and strtok_r() functions return a pointer to the beginning of each subsequent token in the string, after replacing the token itself with a NUL character. When no more tokens remain, a null pointer is returned.

But with a copy of the string and a little pointer arithmetic, you can do something like:

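#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char string[] = "Hello,World!";   /* must be writable: strtok modifies it */
    char *dup = strdup(string);
    char *token = strtok(string, ",");
    char delim_used = dup[token - string + strlen(token)]; /* offset just past the token */
    printf("%c\n", delim_used);
    free(dup);
    return 0;
}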
Gravitt answered 17/9, 2012 at 13:39
