C++ tokenize std::string [duplicate]

Possible Duplicate:
How do I tokenize a string in C++?

Hello, I was wondering how I would tokenize a std::string with strtok:

string line = "hello, world, bye";    
char * pch = strtok(line.c_str(),",");

I get the following error

error: invalid conversion from ‘const char*’ to ‘char*’
error: initializing argument 1 of ‘char* strtok(char*, const char*)’

I'm looking for a quick and easy approach, since I don't think this should take much time.

Cida asked 27/9, 2012 at 17:49
Comment: I have seen this kind of question before. Possible duplicate. (Durra)

I always use std::getline for tasks like this:

std::istringstream is(line);   // needs <sstream>
std::string part;
while (std::getline(is, part, ','))   // read up to the next ','
  std::cout << part << std::endl;
Shawana answered 27/9, 2012 at 17:54

std::string::size_type pos = line.find_first_of(',');
std::string token = line.substr(0, pos);

To find the next token, call find_first_of again, but start the search at pos + 1.

Nematode answered 27/9, 2012 at 17:52
Comment: With this, there would have to be another variable to keep track of pos1 and pos2. Otherwise, you would be taking the substring from 0 to the new pos instead of from pos1 to pos2. (Gardal)

You can use strtok by passing &*line.begin(), which gives a non-const pointer to the string's character buffer. In C++, though, I usually prefer boost::algorithm::split.

Notepaper answered 27/9, 2012 at 17:54
Comment: I think that by discarding the const on the string's internal pointer, you allow strtok to modify the string's internals; very dirty. (Isacco)
Comment: This is a terrible idea. It will put the std::string into an undefined state. You are not supposed to modify a std::string using C string functions. (Batman)
Comment: @Batman How can it possibly go wrong? There's nothing wrong with modifying the characters in a string through its iterators; C++ strings are always contiguous in practice, and they are guaranteed to be contiguous and null-terminated in C++11. (Notepaper)
Comment: @spencercw: There is no guarantee that the string's internal representation is zero-terminated, and it might use copy-on-write semantics, in which case subverting const could change other copies of the string. It might (or might not) be possible to demonstrate that what you're doing is well-defined for any conforming implementation, but even if you can, I wouldn't like to test the edge cases of conformance like that. (Spirochete)
Comment: @MikeSeymour The internal buffer is guaranteed to be null-terminated in C++11 (see this answer). You raise an interesting point about copy-on-write, though. I would guess that in such an implementation, dereferencing the iterator would trigger the copy, or some sort of memory guard would trigger it anyway when strtok writes to the buffer. Are there any implementations that actually use CoW? (Notepaper)
Comment: @spencercw: No, C++11 doesn't guarantee that the buffer is zero-terminated; only that the characters of the string are stored contiguously, that s[s.size()] == 0, and that s.data() and s.c_str() return const pointers to zero-terminated arrays. In practice, any sane implementation will use a zero-terminated contiguous buffer, but it's not guaranteed. GCC uses CoW; they've probably got all the awkward details of access via pointer-to-dereferenced-iterator right, but personally I'd rather not rely on that. (Spirochete)

strtok is a rather quirky, evil function that modifies its argument. This means that you can't use it directly on the contents of a std::string, since there's no way to get a pointer to a mutable, zero-terminated character array from that class (at least before C++17, which added a non-const data() overload).

You could work on a copy of the string's data:

// copy the characters and the terminating '\0' into a mutable buffer
std::vector<char> buffer(line.c_str(), line.c_str()+line.size()+1);
char * pch = strtok(&buffer[0], ",");

or, for more of a C++ idiom, you could use a string-stream:

std::stringstream ss(line);
std::string token;
std::getline(ss, token, ',');

or find the comma more directly:

std::string token(line, 0, line.find(','));
Spirochete answered 27/9, 2012 at 17:54
Comment: There is no std::readline(). Did you mean std::getline()? (Enthusiasm)
