Why is strtok() Considered Unsafe?
F

4

37

What feature(s) of strtok is unsafe (in terms of buffer overflow) that I need to watch out for?

What's a little weird to me is that strtok_s (which is "safe") in Visual C++ has an extra "context" parameter, but it looks like it's the same in other ways... is it the same, or is it actually different?

Flowery answered 14/5, 2011 at 2:18 Comment(8)
Maybe because both arguments of strtok are pointers to char, so strtok may never reach a terminator character and overrun a local buffer? - Saphena
@0verbose: Hm... but isn't that true of all C strings? - Flowery
You can look at the implementation: openbsd.org/cgi-bin/cvsweb/src/lib/libc/string/… . I'll have a look too. - Saphena
On a side note, why still use this when there is strsep? - Counterproposal
I think the context parameter in strtok_s is there to make it thread-safe, like strtok_r. - Analysis
@Mel: Because I'm using Windows. :P @Splat: Ah, that makes sense, seems like it's the reason. - Flowery
@mehrdad So drop it in your libutil/libcompat. Look at how small it is. - Counterproposal
See detailed info on why strtok is bad at this question. - Clarendon
S
32

According to the strtok_s section of this document:

6.7.3.1 The strtok_s function
The strtok_s function fixes two problems in the strtok function:

  1. A new parameter, s1max, prevents strtok_s from storing outside of the string being tokenized. (The string being divided into tokens is both an input and output of the function since strtok_s stores null characters into the string.)
  2. A new parameter, ptr, eliminates the static internal state that prevents strtok from being re-entrant (Subclause 1.1.12). (The ISO/IEC 9899 function wcstok and the ISO/IEC 9945 (POSIX) function strtok_r fix this problem identically.)
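
To make point 2 concrete, here is a minimal sketch of a tokenizer that keeps its position in a caller-supplied pointer instead of a hidden static variable. The name tok_r is made up for illustration; it is not the Annex K strtok_s (there is no s1max bounds parameter here) and not any library's actual implementation, just the same idea as strtok_r written out with strspn and strcspn.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a standard function): a strtok_r-style tokenizer
 * whose only state lives in the caller-supplied *save pointer. */
char *tok_r(char *str, const char *delims, char **save)
{
    assert(delims != NULL && save != NULL);

    if (str == NULL)
        str = *save;                /* continue where the last call stopped */
    if (str == NULL)
        return NULL;                /* nothing left to scan */

    str += strspn(str, delims);     /* skip leading delimiters */
    if (*str == '\0') {
        *save = NULL;
        return NULL;
    }

    char *end = str + strcspn(str, delims);  /* first delimiter after the token */
    if (*end != '\0') {
        *end = '\0';                /* overwrite the delimiter in place */
        *save = end + 1;
    } else {
        *save = NULL;               /* token ran to the end of the string */
    }
    return str;
}
```

Because each string being tokenized has its own save pointer, calls on two different strings can be interleaved freely, which is exactly what the static state in strtok rules out.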
Saphena answered 14/5, 2011 at 2:33 Comment(3)
+1 I hadn't seen that document, seems to explain the answer precisely. :) - Flowery
Note that this specification of strtok_s() is from (optional) Annex K of ISO/IEC 9899:2011, and its definition is different from the Microsoft specification of strtok_s(). - Jingo
strtok_s is not supported by glibc. See also: "Finding Tokens in a String # strsep", "Do you use the TR 24731 'safe' functions?", "strtok_s is undefined on os x", and "C-string alternatives to strtok_r". - Cowl
O
13

There is nothing unsafe about it. You just need to understand how it works and how to use it. After you write your code and unit tests, it only takes a couple of extra minutes to re-run the unit tests under valgrind to make sure you are operating within memory bounds. The man page says it all:

BUGS

Be cautious when using these functions. If you do use them, note that:

  • These functions modify their first argument.
  • These functions cannot be used on constant strings.
  • The identity of the delimiting character is lost.
  • The strtok() function uses a static buffer while parsing, so it's not thread safe. Use strtok_r() if this matters to you.
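
A rough sketch of how the first three bullets play out in practice: since strtok writes '\0' over each delimiter, one common workaround is to tokenize a private writable copy and leave the caller's string alone. The helper name split_copy and its parameters are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenizes a copy of `input` held in the caller's
 * buffer `buf`, so `input` itself (possibly a string literal) is never
 * written to. Returns the number of tokens stored in out[], or 0 if
 * `input` does not fit in `buf`. */
size_t split_copy(const char *input, const char *delims,
                  char *buf, size_t bufsize,
                  char **out, size_t max_out)
{
    assert(input && delims && buf && out);

    size_t len = strlen(input);
    if (len + 1 > bufsize)
        return 0;                     /* refuse to truncate or overflow */
    memcpy(buf, input, len + 1);      /* writable copy, including the '\0' */

    size_t n = 0;
    for (char *tok = strtok(buf, delims);
         tok != NULL && n < max_out;
         tok = strtok(NULL, delims))
        out[n++] = tok;               /* points into buf, not into input */
    return n;
}
```

After the call, buf holds the tokens separated by '\0' bytes, so there is no way to tell from buf alone which delimiter character originally stood between two tokens.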
Odont answered 14/5, 2011 at 2:22 Comment(5)
I made a minor edit regarding strtok_s... do you happen to know how it's different? - Flowery
I am not familiar with that one, but it sounds like strtok_r. If you are tokenizing two or more strings at the same time, then you need one of these. If you are only dealing with one, then there is no need to use it. - Odont
To clarify more, strtok keeps a hidden static reference into your string. If you call it with more than one string, the reference will be incorrect. strtok_r/s forces you to provide the reference to the function. - Odont
It may not be you who's running strtok in another thread. It could be your plugin. Or maybe your code gets used in a library that runs in another thread from the main app that also uses strtok. Or maybe you just decide to make your code multithreaded some day after you've forgotten that there's a strtok call in there. - Whitening
"There is nothing unsafe about it." Remove that sentence. You literally quoted the section in the man page that says it's not thread safe. - Involucrum
F
8

strtok is thread-safe in Visual C++ (but nowhere else), as it uses thread-local storage to save its state between calls. Everywhere else, a global variable is used to save the strtok() state.

However, even in VC++, where strtok is thread-safe, it is still a bit weird: you cannot run strtok() on different strings in the same thread at the same time. For example, this would not work well:

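     /* Intentionally broken: both loops share strtok's single hidden state. */
     token = strtok( string, seps );
     while(token)
     {
        printf("token=%s\n", token);
        /* Starting on string2 overwrites the saved position in string. */
        token2 = strtok(string2, seps);
        while(token2)
        {
            printf("token2=%s\n", token2);
            token2 = strtok( NULL, seps );
        }
        /* Resumes from the exhausted string2 state, so the outer loop ends early. */
        token = strtok( NULL, seps );
     }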

The reason it would not work well: each thread can save only a single state in thread-local storage, and here one would need two states, one for the first string and one for the second. So while strtok is thread-safe with VC++, it is not reentrant.

What strtok_s (or strtok_r everywhere else) provides is an explicit state, and with that, tokenizing becomes reentrant.
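
For comparison, a sketch of nested tokenizing done with explicit state, assuming a POSIX system where strtok_r is available (Microsoft's strtok_s takes the same three arguments, so it should be usable the same way there). The function name count_fields and the record/field split are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L   /* expose strtok_r on POSIX systems */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `text` into records on rec_seps, then each
 * record into fields on field_seps, and returns the total field count.
 * Modifies `text` in place, as strtok_r does. */
size_t count_fields(char *text, const char *rec_seps, const char *field_seps)
{
    assert(text && rec_seps && field_seps);

    size_t fields = 0;
    char *outer_state;   /* position in the record-level scan */
    char *inner_state;   /* position in the field-level scan  */

    for (char *rec = strtok_r(text, rec_seps, &outer_state);
         rec != NULL;
         rec = strtok_r(NULL, rec_seps, &outer_state)) {
        /* The inner scan has its own state, so it cannot disturb the outer one. */
        for (char *f = strtok_r(rec, field_seps, &inner_state);
             f != NULL;
             f = strtok_r(NULL, field_seps, &inner_state))
            fields++;
    }
    return fields;
}
```

The only structural change from the broken loop is the two separate state variables, one per string being scanned.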

Flagpole answered 14/5, 2011 at 2:52 Comment(0)
A
0

If your string is not properly null-terminated, you will end up with a buffer overflow. Also note (this is something I learned the hard way) that strtok does NOT care about quoted strings inside the input. That is, splitting "hello"/"world" on / gives the tokens "hello" and "world", whereas splitting "hello/world" gives "hello and world". Notice that it splits on the / and ignores the fact that it is inside quotation marks.
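
A small sketch of both points, assuming you know the size of the buffer you are about to tokenize. The helper name tokenize_checked is invented here; it uses memchr to confirm a terminator exists inside the buffer before strtok is allowed to scan it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the number of tokens stored in out[], or -1
 * if buf has no '\0' within its first `size` bytes. */
int tokenize_checked(char *buf, size_t size, const char *delims,
                     char **out, int max_out)
{
    assert(buf && delims && out);

    if (memchr(buf, '\0', size) == NULL)
        return -1;            /* unterminated: strtok would read past the end */

    int n = 0;
    for (char *tok = strtok(buf, delims);
         tok != NULL && n < max_out;
         tok = strtok(NULL, delims))
        out[n++] = tok;       /* quote characters are treated like any other byte */
    return n;
}
```

Feeding it a buffer containing "hello/world" (with the quote characters as part of the data) and "/" as the delimiter yields two tokens, each keeping one of the quote characters. Quote-aware splitting would need a different parser.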

Aminoplast answered 14/5, 2011 at 2:36 Comment(0)
