strtok segmentation fault
S

8

19

I am trying to understand why the following snippet of code is giving a segmentation fault:

void tokenize(char* line)
{
   char* cmd = strtok(line," ");

   while (cmd != NULL)
   {
        printf ("%s\n",cmd);
        cmd = strtok(NULL, " ");
   } 
}

int main(void)
{
   tokenize("this is a test");
}

I know that strtok() cannot tokenize string literals, but in this case, line points directly to the string "this is a test", which is internally an array of char. Is there any way of tokenizing line without copying it into an array?

Sun answered 22/1, 2012 at 0:0 Comment(2)
Dude - "this is a test" is a STRING LITERAL. Meaning it's a READ ONLY "array of char". You might even get away with trying to modify it without crashing on certain platforms. But it's definitely a no-no on ANY platform :)Gath
By the way, this error is not always a SEGFAULT - in my case it showed up as "process] terminated by signal SIGBUS (Misaligned address error)"Rhynchocephalian
E
33

The problem is that you're attempting to modify a string literal. Doing so causes your program's behavior to be undefined.

Saying that you're not allowed to modify a string literal is an oversimplification. Saying that string literals are const is incorrect; they're not.

WARNING : Digression follows.

The string literal "this is a test" is an expression of type char[15] (14 for the length, plus 1 for the terminating '\0'). In most contexts, including this one, such an expression is implicitly converted to a pointer to the first element of the array, of type char*.

The behavior of attempting to modify the array referred to by a string literal is undefined -- not because it's const (it isn't), but because the C standard specifically says that it's undefined.

Some compilers might permit you to get away with this. Your code might actually modify the static array corresponding to the literal (which could cause great confusion later on).

Most modern compilers, though, will store the array in read-only memory -- not physical ROM, but in a region of memory that's protected from modification by the virtual memory system. The result of attempting to modify such memory is typically a segmentation fault and a program crash.

So why aren't string literals const? Since you really shouldn't try to modify them, it would certainly make sense -- and C++ does make string literals const. The reason is historical. The const keyword didn't exist before it was introduced by the 1989 ANSI C standard (though it was probably implemented by some compilers before that). So a pre-ANSI program might look like this:

#include <stdio.h>

print_string(s)
char *s;
{
    printf("%s\n", s);
}

main()
{
    print_string("Hello, world");
}

There was no way to enforce the fact that print_string isn't allowed to modify the string pointed to by s. Making string literals const in ANSI C would have broken existing code, which the ANSI C committee tried very hard to avoid doing. There hasn't been a good opportunity since then to make such a change to the language. (The designers of C++, mostly Bjarne Stroustrup, weren't as concerned about backward compatibility with C.)

Electroencephalograph answered 22/1, 2012 at 0:22 Comment(1)
@KoushikShomChoudhury: Or somebody saw a problem in my answer, decided to downvote it, and most likely didn't see my comment in the intervening three years. Perhaps they thought my digression was excessive, which would be a valid criticism. Votes are anonymous by design. The idea of requiring, or at least encouraging, a comment along with a downvote has been considered and rejected. Personally I think encouraging a comment would be a good idea, and perhaps having a way for a comment to notify a downvoter (though that's probably difficult to do).Electroencephalograph
M
5

As you said, you can't modify a string literal, which is what strtok does. You have to do

char str[] = "this is a test";
tokenize(str);

This creates the array str and initialises it with this is a test\0, and passes a pointer to it to tokenize.

Mcbryde answered 22/1, 2012 at 0:2 Comment(0)
L
5

There's a very good reason that trying to tokenize a compile-time constant string will cause a segmentation fault: the constant string is in read-only memory.

The C compiler bakes compile-time constant strings into the executable, and the operating system loads them into read-only memory (.rodata in a *nix ELF file). Since this memory is marked as read-only, and since strtok writes into the string that you pass into it, you get a segmentation fault for writing into read-only memory.

Limb answered 22/1, 2012 at 0:4 Comment(0)
P
4

strtok modifies its first argument in order to tokenize it. Hence you can't pass it a string literal: in C its type is char[N] rather than const char *, but modifying it is still undefined behaviour. You have to copy the string literal into a char array that can be modified.

Peseta answered 22/1, 2012 at 0:4 Comment(0)
O
2

What point are you trying to make by your "...is internally an array of char" remark?

The fact that "this is a test" is internally an array of char does not change anything at all. It is still a string literal (all string literals are non-modifiable arrays of char). Your strtok still tries to tokenize a string literal. This is why it crashes.

Oriente answered 22/1, 2012 at 0:6 Comment(0)
K
2

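I also had a lot of trouble with this error, and I found a simple solution.

Include <string.h>. Without it, strtok has no declaration in scope, so older compilers assume it returns int, and the returned pointer can be truncated on 64-bit platforms. That can also produce a segmentation fault.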

Keeling answered 10/12, 2021 at 9:47 Comment(1)
Same error in case of a variable char ARRAY (not a STRING LITERAL), same solution: include <string.h>Faucal
G
1

I'm sure you'll get beaten up about this... but "strtok()" is inherently unsafe and prone to things like access violations.

Here, the cause is almost certainly passing a string constant to strtok.

Try this instead:

#include <stdio.h>
#include <string.h>

void tokenize(char* line)
{
   char* cmd = strtok(line," ");

   while (cmd != NULL)
   {
        printf ("%s\n",cmd);
        cmd = strtok(NULL, " ");
   } 
}

int main(void)
{
  char buff[80];
  strcpy (buff, "this is a test");  /* writable copy of the literal */
  tokenize(buff);
}
Gath answered 22/1, 2012 at 0:2 Comment(1)
If you're going to bring up the unsafe nature of strtok, we might as well remember that strncpy is much safer than strcpy. Although strcpy is perfectly safe for a compile-time constant string, a later refactoring could turn the strcpy call into a buffer overflow vulnerability.Limb
P
0

I just hit the Segmentation Fault error from trying to use printf to print the token (cmd in your case) after it became NULL.

Paradise answered 27/6, 2016 at 20:54 Comment(0)
