How to read a file and get words in C++
I am curious how I would go about reading input from a text file with no set structure (such as notes or a small report) word by word. For example, the text might look like this:

"06/05/1992
Today is a good day;
The worm has turned and the battle was won."

I was thinking of getting each line with getline and then splitting it into words on whitespace. Then I thought strtok might work! However, I don't think that will handle the punctuation.
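The getline-then-split idea does work, and std::istringstream makes the whitespace splitting easier than strtok. Here is a minimal sketch, assuming the text arrives through any std::istream (an opened std::ifstream in practice); the function name read_words_by_line is hypothetical. As suspected, punctuation such as the trailing ';' stays attached to the word.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the stream one line at a time with std::getline,
// then splits each line on whitespace with a std::istringstream.
std::vector<std::string> read_words_by_line(std::istream& in) {
    std::vector<std::string> words;
    std::string line;
    while (std::getline(in, line)) {          // one line per iteration
        std::istringstream line_stream(line); // treat the line as its own stream
        std::string word;
        while (line_stream >> word) {         // >> stops at any whitespace
            words.push_back(word);
        }
    }
    return words;
}
```

The per-line loop only pays off if you need line information (for example, line numbers); otherwise reading with >> directly from the file stream yields the same words.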

Another method I considered was reading everything char by char and omitting the undesired characters, but that seems unlikely to be the best way.
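The character-by-character approach is less clumsy than it sounds if "undesired" is defined with the <cctype> classifiers. A minimal sketch, assuming a word is a maximal run of letters and digits (so "06/05/1992" splits into three numbers and "day;" loses its semicolon); the function name read_alnum_words is hypothetical.

```cpp
#include <cctype>
#include <istream>
#include <sstream>   // std::istringstream, handy for feeding test text instead of a file
#include <string>
#include <vector>

// Hypothetical helper: treats any character that is not a letter or digit
// as a separator, so punctuation never ends up inside a word.
std::vector<std::string> read_alnum_words(std::istream& in) {
    std::vector<std::string> words;
    std::string current;
    char ch;
    while (in.get(ch)) {                 // get() does not skip whitespace
        // Cast to unsigned char: passing a negative char to isalnum is undefined.
        if (std::isalnum(static_cast<unsigned char>(ch))) {
            current += ch;               // still inside a word
        } else if (!current.empty()) {
            words.push_back(current);    // a separator ends the current word
            current.clear();
        }
    }
    if (!current.empty()) {
        words.push_back(current);        // last word when the input has no trailing separator
    }
    return words;
}
```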

So, to make a long story short: is there an easy way to read input from a file and split it into words?

Jaborandi asked 12/9, 2010 at 2:5 Comment(1)
Thanks for the solutions, guys. Gonna try them all! - Jaborandi
Since it's easier to write than to find the duplicate question,

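#include <cstddef>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

int main() {
    std::ifstream my_file_stream( "filename" );
    std::istream_iterator<std::string> word_iter( my_file_stream ), word_iter_end;

    std::size_t wordcnt = 0; // must be initialized before it is printed
    for ( ; word_iter != word_iter_end; ++ word_iter, ++ wordcnt ) {
        std::cout << "word " << wordcnt << ": " << * word_iter << '\n';
    }
}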

The std::string template argument to istream_iterator tells it to return a string when you dereference it (*word_iter). Each time the iterator is incremented, it reads the next whitespace-separated word from its stream.

If you have multiple iterators on the same stream at the same time, you can choose between data types to extract. However, in that case it may be easier just to use >> directly. The advantage of an iterator is that it can plug into the generic functions in <algorithm>.
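As an illustration of plugging these iterators into <algorithm>, here is a minimal sketch that copies every word of a stream into a vector with std::copy and std::back_inserter; the function name collect_words is hypothetical. Like >>, it splits on whitespace only, so punctuation stays attached to the words.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <sstream>   // std::istringstream, handy for feeding test text instead of a file
#include <string>
#include <vector>

// Hypothetical helper: copies every whitespace-separated word into a vector.
std::vector<std::string> collect_words(std::istream& in) {
    std::vector<std::string> words;
    std::copy(std::istream_iterator<std::string>(in),   // reads the first word
              std::istream_iterator<std::string>(),     // end-of-stream sentinel
              std::back_inserter(words));               // appends each word read
    return words;
}
```

std::copy has no place for the running counter that the numbering loop prints, so the explicit loop remains the simpler choice when you want indexed output.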

Brubaker answered 12/9, 2010 at 2:11 Comment(5)
+1 for the istream_iterator solution -- though I do note this might not be the best for a beginner :) - Rilke
@Billy: Eh, I dunno. I think iterators are more fundamental than containers, and many beginners gloss over them and don't learn them until they already have a body of code that passes vector everywhere. - Brubaker
@wilhelmtell: How exactly would you replicate the above using std::copy? There seems to be a bit more complicated logic going on inside the for than just copying... - Rilke
Oops, I should have read it through. Pardon the fastest gun in the West who can't aim. - Castellany
@Trygle: No, I should have explained. So now I did. - Brubaker
Yes. You're looking for operator>> (the std::string overload, std::operator>>(std::istream&, std::string&)) :) Note that it treats any run of whitespace as a single separator, but I doubt that's a problem here.

e.g.

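#include <fstream>
#include <string>
#include <vector>
int main() {
    std::ifstream file("filename");
    std::vector<std::string> words;
    std::string currentWord;
    while(file >> currentWord) // loop ends at EOF or on a read error
        words.push_back(currentWord);
}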
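Because >> leaves punctuation attached ("day;", "won."), a common follow-up is to trim non-alphanumeric characters from both ends of each word. A minimal sketch of that, assuming punctuation only matters at word boundaries (so "06/05/1992" stays intact); the function name read_trimmed_words is hypothetical.

```cpp
#include <cctype>
#include <istream>
#include <sstream>   // std::istringstream, handy for feeding test text instead of a file
#include <string>
#include <vector>

// Hypothetical helper: reads whitespace-separated words with >>, then strips
// leading and trailing characters that are not letters or digits.
std::vector<std::string> read_trimmed_words(std::istream& in) {
    auto is_word_char = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0;
    };
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        std::string::size_type first = 0, last = word.size();
        while (first < last && !is_word_char(word[first])) ++first;   // trim front
        while (last > first && !is_word_char(word[last - 1])) --last; // trim back
        if (first < last) {  // drop tokens that were pure punctuation, like "--"
            words.push_back(word.substr(first, last - first));
        }
    }
    return words;
}
```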
Rilke answered 12/9, 2010 at 2:10 Comment(2)
To be honest, this way of doing it didn't even cross my mind. +1 - Brubaker
I like this one. I need to break out of my vector safety zone, so PotatoSwatter's solution gave me the best learning experience. Pretty hard to choose a solution when all of these work just fine for my problem. - Jaborandi
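You can use the istream::getline member with a space character as the delimiter, e.g. file.getline(buffer, 1000, ' '); where file is your input stream and buffer is a char array with room for at least 1000 characters.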
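With std::string, the free function std::getline(stream, str, delim) also accepts a delimiter and avoids the fixed-size buffer. A minimal sketch, assuming the text comes from any std::istream; the function name read_space_delimited is hypothetical. It also shows the caveat of this approach: only ' ' is a separator, so a newline stays inside a token and two adjacent spaces produce an empty token.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for feeding test text instead of a file
#include <string>
#include <vector>

// Hypothetical helper: splits the stream on ' ' only, using the std::string
// overload of std::getline instead of a fixed-size char buffer.
std::vector<std::string> read_space_delimited(std::istream& in) {
    std::vector<std::string> tokens;
    std::string token;
    while (std::getline(in, token, ' ')) {  // reads up to the next ' ' (or EOF)
        tokens.push_back(token);            // may be empty, or contain '\n'
    }
    return tokens;
}
```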

Alternatively, you can use this function to extract one part of a string split on a given delimiter:

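#include <string>
using namespace std;

// Returns the i-th (0-based) part of s, where parts are separated by sep.
// Returns an empty string if s has fewer than i+1 parts.
string StrPart(string s, char sep, int i) {
  string out="";
  int n=0, c=0;               // n: separators seen so far; c: current index
  for (c=0;c<(int)s.length();c++) {
    if (s[c]==sep) {
      n+=1;                   // moved on to the next part
    } else {
      if (n==i) out+=s[c];    // inside the requested part: keep the char
    }
  }
  return out;
}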

Note: this function assumes that you have declared using namespace std;.

s is the string to be split, sep is the delimiter, and i is the index of the part to return (0-based).
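StrPart scans the whole string on every call, so pulling out all k parts one at a time costs k full passes. If you need every part, here is a one-pass sketch using std::string::find and substr; the function name split_all is hypothetical. Like StrPart, it keeps empty parts, so parts[i] equals StrPart(s, sep, i) for every valid i.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns every sep-delimited part of s in one pass.
// Empty parts are kept, so indices line up with StrPart's i parameter.
std::vector<std::string> split_all(const std::string& s, char sep) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type pos = s.find(sep, start);  // next separator, or npos
        if (pos == std::string::npos) {
            parts.push_back(s.substr(start));             // last part runs to the end
            return parts;
        }
        parts.push_back(s.substr(start, pos - start));
        start = pos + 1;                                   // skip past the separator
    }
}
```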

Shira answered 12/9, 2010 at 2:11 Comment(3)
Why not use std::getline(std::istream&, std::string&) here? Oh, and n and c are horrendous variable names. - Rilke
For a very short function, those variable names will do.Shira
-1 for the terrible formatting. Spaces aid readability, as do parseable variable names. (The broader a variable's scope, the "better" its name should be. Function-scope variables should be at least a word.) - Cacodemon
You can use the scanner technique to grab words, numbers, dates, etc. It is very simple and flexible. A scanner normally returns tokens (words, numbers, reals, keywords, etc.) to a parser.

If you later intend to interpret the words, I would recommend this approach.
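A minimal sketch of the scanner idea, assuming only three token kinds (words, numbers, and single punctuation characters); the TokenKind enum, Token struct, and scan function are hypothetical names for illustration, not taken from any book or library. A real scanner would add more kinds (reals, keywords, dates) and hand the token list to a parser.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token categories for this sketch.
enum class TokenKind { Word, Number, Punct };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits text into word, number, and punctuation tokens; whitespace is skipped.
std::vector<Token> scan(const std::string& text) {
    std::vector<Token> tokens;
    std::string::size_type i = 0;
    auto at = [&](std::string::size_type k) {
        return static_cast<unsigned char>(text[k]);  // safe argument for <cctype>
    };
    while (i < text.size()) {
        if (std::isspace(at(i))) {
            ++i;                                     // whitespace only separates tokens
        } else if (std::isalpha(at(i))) {
            std::string::size_type start = i;
            while (i < text.size() && std::isalpha(at(i))) ++i;
            tokens.push_back({TokenKind::Word, text.substr(start, i - start)});
        } else if (std::isdigit(at(i))) {
            std::string::size_type start = i;
            while (i < text.size() && std::isdigit(at(i))) ++i;
            tokens.push_back({TokenKind::Number, text.substr(start, i - start)});
        } else {
            tokens.push_back({TokenKind::Punct, std::string(1, text[i])});
            ++i;                                     // any other character is one token
        }
    }
    return tokens;
}
```

With this sketch, a date such as 06/05/1992 comes out as Number, Punct, Number, Punct, Number; recognizing it as a single date token is the kind of rule you would add to the scanner or leave to the parser.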

I can warmly recommend the book "Writing Compilers and Interpreters" by Ronald Mak (Wiley Computer Publishing)

Treat answered 12/9, 2010 at 2:56 Comment(0)
