Detecting end of input using std::getline

I have code with the following snippet:

std::string input;
while(std::getline(std::cin, input))
{   
    //some read only processing with input
}

When I run the program, I redirect stdin from the file in.txt (which was created using gedit), and it contains:

ABCD
DEFG
HIJK

Each of the above lines ends with one newline in the file in.txt.

The problem I am facing is that, after the while loop runs 3 times (once for each line), program control does not move forward and gets stuck. Why is this happening, and what can I do to resolve the problem?

Some clarification:

I want to be able to run the program from the command line as such:

$ g++ program.cc -o out
$ ./out < in.txt

Additional Information:

I did some debugging and found that the while loop actually runs 4 times (the fourth time with input as an empty string). This causes the program to stall, because the //some read only processing with input step is unable to do its work.

So my refined question:

1) Why is the 4th loop running at all?

The rationale behind having std::getline() in the while loop's condition must be that, when getline() cannot read any more input, the stream it returns evaluates to false and hence the while loop ends.

Contrary to that, the while loop instead continues with an empty string! Why then have getline in the while loop condition at all? Isn't that bad design?

2) How do I ensure that the while loop doesn't run for the 4th time without using break statements?

For now I have used a break statement and string stream as follows:

std::string input;
char temp;
while(std::getline(std::cin, input))
{       
    std::istringstream iss(input);
    if (!(iss >>temp))
    {    
        break;
    } 
    //some read only processing with input
}

But clearly there has to be a more elegant way.

Hardset answered 30/10, 2013 at 3:32 Comment(15)
See https://mcmap.net/q/53854/-read-file-line-by-line-using-ifstream-in-c. - Chiffon
It really shouldn't get stuck. What compiler are you using? - Contumacious
I am using gcc version 4.6.3. - Hardset
@zalenix I'm pretty sure your problems come up with //some processing with input ... - Invitatory
@zalenix 'the while loop runs for 3 times for each line' sounds pretty strange (and shouldn't be a compiler/lib problem BTW; I'm pretty sure we would know about it if it ever existed!) - Invitatory
Works fine here with gcc 4.7.2. - Indore
There's nothing wrong with the code you show above; if it gets stuck, then something inside the loop must be causing it. - Bylaw
@g-makulik I have clarified my question. Please have a look. - Hardset
@TonyD Please have a look now. I have added additional information. - Hardset
@ZhiWang Thanks for the resource. Please see the additional information I have added. - Hardset
@zalenix: glad you got to understand something about your mistake, though I don't think you're quite ready to second-guess the library behaviours a la "I feel the need to mention...": the choice/use of getline and/or >> (the "and" case being use of '>>' on a std::istringstream created from the line), or even regexps in C++11 or spirit in boost, is all sane once understood; they work quite well together. Your suggestion to use while(std::cin>>a>>b) is only good if you don't need to verify the number of arguments per line (e.g. to report errors in input data). - Bylaw
"Basically, we should not have to check for string specification (empty string etc.) if input specifications are known beforehand." -- You need to decide how to deal with incorrect input. Ignoring the possibility is rarely a good idea. - Oklahoma
@RichardTingle I have made my solution into an answer now. - Hardset
@KeithThompson You are right. I have removed that sentence and have addressed the need to deal with incorrect input in my answer. - Hardset
@zalenix To go off topic: I once had a specification for an end-of-week report writer. Did they run it at times other than the end of the week? Yes. Did it produce strange bugs under those conditions? Yes. Was this my fault? Yes. - Skillern

Contrary to DeadMG's answer, I believe the problem is with the contents of your input file, not with your expectation about the behavior of the newline character.


UPDATE: Now that I've had a chance to play with gedit, I think I see what caused the problem. gedit apparently is designed to make it difficult to create a file without a newline on the last line (which is sensible behavior). If you open gedit and type three lines of input, typing Enter at the end of each line, then save the file, it will actually create a 4-line file, with the 4th line empty. The complete contents of the file, using your example, would then be "ABCD\nDEFG\nHIJK\n\n". To avoid creating that extra empty line, just don't type Enter at the end of the last line; gedit will provide the required newline character for you.

(As a special case, if you don't enter anything at all, gedit will create an empty file.)

Note this important distinction: In gedit, typing Enter creates a new line. In a text file stored on disk, a newline character (LF, '\n') denotes the end of the current line.
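
A minimal sketch of that distinction, assuming the file contents are held in a string (count_lines is a hypothetical helper introduced here for illustration, not part of the program in the question). It counts how many times std::getline succeeds, which is how many times the loop body would run:

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: counts how many times std::getline succeeds on
// the given text, i.e. how many lines a getline loop would process.
std::size_t count_lines(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    std::size_t count = 0;
    while (std::getline(in, line)) {
        ++count;
    }
    return count;
}
```

With this helper, "ABCD\nDEFG\nHIJK\n" counts as 3 lines, while "ABCD\nDEFG\nHIJK\n\n" counts as 4, because the second trailing newline terminates an empty fourth line.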


Text file representations vary from system to system. The most common representations for an end-of-line marker are a single ASCII LF (newline) character (Unix, Linux, and similar systems), and a sequence of two characters, CR and LF (MS Windows). I'll assume the Unix-like representation here. (UPDATE: In a comment, you said you're using Ubuntu 12.04 and gcc 4.6.3, so text files should definitely be in the Unix-style format.)
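
As an aside, if a Windows-format file is read on a Unix-like system, std::getline splits on LF only, so each line keeps a trailing CR. A minimal sketch of one way to cope with that, assuming C++11 or later (strip_trailing_cr is a hypothetical helper, not something the question's program uses):

```cpp
#include <string>

// Hypothetical helper: drops one trailing carriage return, if present,
// so a CR-LF terminated line compares equal to its LF-only counterpart.
std::string strip_trailing_cr(std::string line) {
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();
    }
    return line;
}
```

This does not apply to the Unix-style file discussed here; it only matters when the input may come from a system that uses CR-LF.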

I just wrote the following program based on the code in your question:

#include <iostream>
#include <string>
int main() {
    std::string input;
    int line_number = 0;
    while(std::getline(std::cin, input))
    {   
        line_number ++;
        std::cout << "line " << line_number
                  << ", input = \"" << input << "\"\n";
    }
}

and I created a 3-line text file in.txt:

ABCD
EFGH
IJHL

In the file in.txt each line is terminated by a single newline character.

Here's the output I get:

$ cat in.txt
ABCD
EFGH
IJHL
$ g++ c.cpp -o c
$ ./c < in.txt
line 1, input = "ABCD"
line 2, input = "EFGH"
line 3, input = "IJHL"
$

The final newline at the very end of the file does not start a new line; it merely marks the end of the current line. (A text file that doesn't end with a newline character might not even be valid, depending on the system.)

I can get the behavior you describe if I add a second newline character to the end of in.txt:

$ echo '' >> in.txt
$ cat in.txt
ABCD
EFGH
IJHL

$ ./c < in.txt
line 1, input = "ABCD"
line 2, input = "EFGH"
line 3, input = "IJHL"
line 4, input = ""
$

The program sees an empty line at the end of the input file because there's an empty line at the end of the input file.

If you examine the contents of in.txt, you'll find two newline (LF) characters at the very end, one to mark the end of the third line, and one to mark the end of the (empty) fourth line. (Or if it's a Windows-format text file, you'll find a CR-LF-CR-LF sequence at the very end of the file.)

If your code doesn't deal properly with empty lines, then you should either ensure that it doesn't receive any empty lines on its input, or, better, modify it so it handles empty lines correctly. How should it handle empty lines? That depends on what the program is required to do, and it's probably entirely up to you. You can silently skip empty lines:

if (input != "") {
    // process line
}

or you can treat an empty line as an error:

if (input == "") {
    // error handling code
}

or you can treat empty lines as valid data.

In any case, you should decide exactly how you want to handle empty lines.

Oklahoma answered 13/11, 2013 at 16:26 Comment(8)
That's exactly what I thought: that an empty line should be composed of 2 consecutive '\n' characters. That's the behavior that you are getting too. But on my machine an empty line requires only 1 '\n' character. Hence, the confusion. Thank you for your answer. - Hardset
@zalenix: What system are you using (operating system, compiler)? There has to be a way to represent a text file whose last line is not empty. - Oklahoma
Ubuntu 12.04 and gcc 4.6.3. - Hardset
@zalenix: Then you've misunderstood what your input file actually contains. I'm using a very similar system myself. If you're seeing an empty line at the end of your input file, you must have a double newline. Try od -c in.txt; you should see something like ... \n I J H L \n \n - Oklahoma
That's amazing! How is it possible though? I pressed the Enter key only thrice. - Hardset
@zalenix: I have no idea how you created your in.txt input file. Whatever you did, you ended up creating a 4-line file. (Which your program should be able to handle without choking anyway.) - Oklahoma
@zalenix: I just played around with gedit; see my updated answer, just after the first paragraph. - Oklahoma
That is the perfect answer! Accepting it :) - Hardset

Why is the 4th loop running at all?

Because the text input contains four lines.

The newline character means just that: "Start a new line". It does not mean "The preceding line is complete", and in this test, the difference between those two semantics is revealed. So we have

1. ABCD
2. DEFG
3. HIJK
4.

The newline character at the end of the third line begins a new line, just like it should do and exactly like its name says it will. The fact that this line is empty is why you get back an empty string. If you want to avoid it, trim the newline at the end of the third line, or simply special-case it with if (input == "") break;.

The problem has nothing to do with your code, and lies in your faulty expectation of the behaviour of the newline character.

Congreve answered 30/10, 2013 at 21:31 Comment(5)
"The new line character means just that: "Start a new line". It does not mean "The preceding line is complete"" -- Huh? The newline character means that the preceding line is complete. The OP's input file probably has a double newline at the end, i.e., it ends in a blank line. See my answer. - Oklahoma
I suppose it's possible that the OP is using a system with a bizarre text file representation, but I've never seen such a system. I've asked the OP just what system he's using. - Oklahoma
Nope, the OP is using Ubuntu and gcc. Unless I'm missing something obvious (which is possible but, I think, unlikely) it appears that you've misunderstood the semantics of the newline character (which is surprising). - Oklahoma
If your answer was about C, it would be completely wrong: a newline character '\n' returned by input APIs does indicate the end of a line. This is also the case for file formats under Unix: a text file is by definition a sequence of lines each of which ends with a newline character. Does C++ do things differently? - Dyke
@Gilles: "Does C++ do things differently?" -- No. - Oklahoma

Finale:

Edit: Please read the accepted answer for the correct explanation of the problem and the solution as well.


As a note to people using std::getline() in their while loop condition, remember to check if it's an empty string inside the loop and break accordingly, like this:

string input;
while(std::getline(std::cin, input))
{
    if(input = "")
        break;
    //some read only processing with input 
}

My suggestion: Don't have std::getline() in the while loop condition at all. Rather use std::cin like this:

while(std::cin>>a>>b)
{
    //loop body
}

This way, no extra check for an empty string is required, and the code design is better.

The latter method removes the need to check explicitly for an empty string. (However, it is always better to do as much explicit checking as possible on the format of the input.)
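
When the per-line format does matter, std::getline and operator>> can be combined instead of choosing one over the other: read a whole line, then parse it from a std::istringstream. A minimal sketch, assuming each line should hold exactly two integers (parse_two_ints is a hypothetical helper, and int is an assumed type, since the snippet above does not say what a and b are):

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: parses a line expected to hold exactly two
// integers. Returns false for empty lines, missing fields,
// non-numeric fields, or extra tokens after the second value.
bool parse_two_ints(const std::string& line, int& a, int& b) {
    std::istringstream iss(line);
    if (!(iss >> a >> b)) {
        return false;
    }
    std::string extra;
    return !(iss >> extra);  // reject a third token on the same line
}
```

Inside a while (std::getline(std::cin, input)) loop, a false result from parse_two_ints(input, a, b) identifies a malformed or empty line, which the program can then skip or report as it sees fit.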

Hardset answered 13/11, 2013 at 16:41 Comment(2)
This will make the input empty, no matter what it was previously. - Hardshell
operator>> does not read line-by-line, it reads word-by-word (until the first whitespace). Although it might suffice for your situation, it certainly isn't a replacement for std::getline. - Galiot
