Reading formatted data with C++'s stream operator >> when data has spaces

5

13

I have data in the following format:

4:How do you do?
10:Happy birthday
1:Purple monkey dishwasher
200:The Ancestral Territorial Imperatives of the Trumpeter Swan

The number can be anywhere from 1 to 999, and the string is at most 255 characters long. I'm new to C++ and it seems a few sources recommend extracting formatted data with a stream's >> operator, but when I want to extract a string it stops at the first whitespace character. Is there a way to configure a stream to stop parsing a string only at a newline or end-of-file? I saw that there was a getline method to extract an entire line, but then I still have to split it up manually [with find_first_of], don't I?

Is there an easy way to parse data in this format using only STL?

Flameproof answered 26/2, 2010 at 0:57 Comment(2)
Streams in C++ are one of the things I hate about C++.Anathema
As I'm new to C++, I was hoping that streams were one of those things that eventually lead to an epiphany of "oooooh that's clever" but after your comment I'm starting to think that'll never happen. :(Flameproof
J
10

You can read the number before you use std::getline, which reads from a stream and stores into a std::string object. Something like this:

int num;
string str;

while(cin>>num){
    getline(cin,str);
    // str now holds the rest of the line, including the leading ':'
}
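
As written, the loop above leaves the ':' at the front of str, because >> stops right after the digits. Here is a minimal sketch of one way to complete it, assuming the input arrives as a generic std::istream. The Record struct and the read_records name are made up for illustration and are not part of the original answer:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type, for illustration only.
struct Record {
    int num;
    std::string text;
};

// Reads "number:text" lines from any input stream.
std::vector<Record> read_records(std::istream& in) {
    std::vector<Record> records;
    int num;
    char colon;
    std::string text;
    // operator>> skips leading whitespace (including the previous newline),
    // reads the number, then reads the single separator character.
    // getline then takes everything up to the end of the line.
    while (in >> num >> colon && colon == ':' && std::getline(in, text)) {
        records.push_back(Record{num, text});
    }
    return records;
}

// Example use (any std::istream works, e.g. std::cin or a std::istringstream):
//   std::istringstream in("4:How do you do?\n10:Happy birthday\n");
//   std::vector<Record> r = read_records(in);
```

The loop stops at the first line that does not match the number:text shape, so malformed input ends parsing early instead of producing garbage records.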
Jd answered 26/2, 2010 at 1:16 Comment(4)
That looks clean; I assume it will be safe to replace cin with an istream that I am given?Flameproof
If you are reading from a file, you can replace cin with a valid ifstream object.Jd
I'm just given a stream and my code is expected to parse the data, manipulate it and write it to another stream. I don't create either stream. I assume my filter wouldn't be invoked if the istream or ostream was invalid, but at the same time I don't think it's any of my concern. Garbage in garbage out :) . . . or maybe garbage in segfault out.Flameproof
I had an extra char variable and did while (cin >> num >> dummy) to get rid of the colon character.Flameproof
F
14

The C++ String Toolkit Library (StrTk) has the following solution to your problem:

#include <string>
#include <deque>
#include "strtk.hpp"

int main()
{
   struct line_type
   {
      unsigned int id;
      std::string str;
   };

   std::deque<line_type> line_list;

   const std::string file_name = "data.txt";

   strtk::for_each_line(file_name,
                        [&line_list](const std::string& line)
                        {
                           line_type temp_line;
                           const bool result = strtk::parse(line,
                                                            ":",
                                                            temp_line.id,
                                                            temp_line.str);
                           if (!result) return;
                           line_list.push_back(temp_line);
                        });

   return 0;
}

More examples can be found Here

Fizgig answered 30/12, 2010 at 20:12 Comment(0)
E
10

You've already been told about std::getline, but nobody mentioned one detail that you'll probably find useful: when you call getline, you can also pass a parameter telling it which character to treat as the delimiter that ends the read. To read your number, you can use:

std::string number;
std::string name;

std::getline(infile, number, ':');
std::getline(infile, name);   

This will put the data up to the ':' into number, discard the ':', and read the rest of the line into name.
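
Note that number is still a std::string at this point. Here is a rough sketch of how the two getline calls might be wrapped in a loop and the number converted, assuming C++11 for std::stoi. The read_pairs name is hypothetical:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reads "number:name" lines using getline with a ':' delimiter.
std::vector<std::pair<int, std::string>> read_pairs(std::istream& infile) {
    std::vector<std::pair<int, std::string>> result;
    std::string number;
    std::string name;
    // The first getline reads up to and discards the ':';
    // the second reads the rest of the line and discards the newline.
    while (std::getline(infile, number, ':') && std::getline(infile, name)) {
        // std::stoi throws std::invalid_argument if number does not start with digits.
        result.emplace_back(std::stoi(number), name);
    }
    return result;
}

// Example use with an in-memory stream:
//   std::istringstream in("4:How do you do?\n1:Purple monkey dishwasher\n");
//   auto pairs = read_pairs(in);
```

Because std::stoi throws on non-numeric text, a caller that expects dirty input would want to catch the exception or validate number first.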

If you want to use >> to read the data, you can do that too, but it's a bit more difficult, and delves into an area of the standard library that most people never touch. A stream has an associated locale that's used for things like formatting numbers and (importantly) determining what constitutes "white space". You can define your own locale to define the ":" as white space, and the space (" ") as not white space. Tell the stream to use that locale, and it'll let you read your data directly.

#include <locale>
#include <vector>

struct colonsep: std::ctype<char> {
    colonsep(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask> 
            rc(std::ctype<char>::table_size,std::ctype_base::mask());

        rc[':'] = std::ctype_base::space;
        rc['\n'] = std::ctype_base::space;
        return &rc[0];
    }
};

Now to use it, we "imbue" the stream with a locale:

#include <fstream>
#include <iterator>
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<int, std::string> data;

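// Note: the standard does not permit adding these overloads to namespace std;
// it is done here only so that istream_iterator can find them via
// argument-dependent lookup.
namespace std { 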
    std::istream &operator>>(std::istream &is, data &d) { 
       return is >> d.first >> d.second;
    }
    std::ostream &operator<<(std::ostream &os, data const &d) { 
        return os << d.first << ":" << d.second;
    }
}

int main() {
    std::ifstream infile("testfile.txt");
    infile.imbue(std::locale(std::locale(), new colonsep));

    std::vector<data> d;

    std::copy(std::istream_iterator<data>(infile), 
              std::istream_iterator<data>(),
              std::back_inserter(d));

    // just for fun, sort the data to show we can manipulate it:
    std::sort(d.begin(), d.end());

    std::copy(d.begin(), d.end(), std::ostream_iterator<data>(std::cout, "\n"));
    return 0;
}

Now you know why that part of the library is so neglected. In theory, getting the standard library to do your work for you is great -- but in fact, most of the time it's easier to do this kind of job on your own instead.

Effie answered 26/2, 2010 at 16:27 Comment(0)
N
2
int i;
char *string = (char*)malloc(256*sizeof(char)); //since max is 255 chars, and +1 for '\0'
if (string != NULL && scanf("%d:%255[^\n]", &i, string) == 2) //%255[^\n] reads at most 255 chars up to the newline; scanf returns the number of fields converted
    printf("%s\n", string);
free(string);

It's C, and it will work in C++ too. scanf provides more control, but its only error reporting is the return value (the number of fields converted). So use with caution :).

Noel answered 26/2, 2010 at 1:9 Comment(2)
It looks like the m flag is not standardised, so I can't use it. But, again, won't this still only read to the first whitespace character instead of to the end of the line?Flameproof
It is still only reading the first word of the line, not the whole line, and you have an error in your code: you are providing i but scanf needs a pointer to i (&i).Flameproof
B
2

Just read the data line by line (whole line) using getline and parse it.
To parse use find_first_of()
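
A minimal sketch of that split, assuming C++11 for std::stoi. The split_line helper is a made-up name for illustration:

```cpp
#include <exception>
#include <string>

// Hypothetical helper: splits one "number:text" line at the first ':'.
// Returns false if the line has no ':' or the part before it
// does not start with a number that fits in an int.
bool split_line(const std::string& line, int& num, std::string& text) {
    const std::string::size_type pos = line.find_first_of(':');
    if (pos == std::string::npos) return false;
    try {
        // std::stoi throws std::invalid_argument or std::out_of_range on bad input.
        num = std::stoi(line.substr(0, pos));
    } catch (const std::exception&) {
        return false;
    }
    // Everything after the first ':' is the text, including any later colons.
    text = line.substr(pos + 1);
    return true;
}
```

To process a whole stream, call std::getline(in, line) in a loop and pass each line to split_line, skipping or reporting the lines for which it returns false.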

Brach answered 26/2, 2010 at 1:18 Comment(0)
