Why is the beginning of my string disappearing?
E

4

6

In the following C++ code, I realised that gcount() was returning a larger number than I wanted, because getline() extracts the final newline character from the input stream but doesn't store it in the buffer.

What I still don't understand is the program's output, though. For input "Test\n", why do I get " est\n"? How come my mistake affects the first character of the string rather than adding unwanted rubbish onto the end? And how come the program's output is at odds with the way the string looks in the debugger ("Test\n", as I'd expect)?

#include <fstream>
#include <vector>
#include <string>
#include <iostream>

using namespace std;

int main()
{
    const int bufferSize = 1024;
    ifstream input( "test.txt", ios::in | ios::binary );

    vector<char> vecBuffer( bufferSize );
    input.getline( &vecBuffer[0], bufferSize );
    string strResult( vecBuffer.begin(), vecBuffer.begin() + input.gcount() );
    cout << strResult << "\n";

    return 0;
}
Eyrir answered 24/6, 2009 at 16:59 Comment(10)
Works for me using g++ 3.4.3 on RHEL4. What's your platform? – Woozy
And also for me: g++ 3.4.5 on MinGW/MSYS (Windows XP). – Paramour
I'm using Visual Studio 2005 on Windows XP. Could it be something to do with the fact that I'm creating my test file in a Windows environment, and the newline is therefore made out of two characters? – Eyrir
Try determining what "vecBuffer.begin()" is returning. I know it's stupid, but it looks like an off-by-one error, and that is where your "one" is being run from. – Terpsichore
Both calls to vecBuffer.begin() are returning 84 ('T'). – Eyrir
Sorry - they return iterators that point to 'T', not chars as I implied. – Eyrir
Then I would suggest mix'n'match the following. New test file (written in Notepad, just in case). New VS project with copy/pasted code. Try vecBuffer.begin() + 1 for your first parameter, or vecBuffer.begin() + input.gcount() - 1 for the last parameter. Just some tweaks to see if you can 'control' and 'make predictable' the unexpected behaviour. – Terpsichore
vecBuffer.begin() + input.gcount() - 1 makes the weirdness stop, I think because gcount includes the dropped newline character in its total. – Eyrir
This question turned out to be awesomely interesting even though I've never done any C++. – Terpsichore
The problem isn't at the beginning of the string, but at the end of the string. strResult has an embedded 0 at the end. You should be using vecBuffer.begin() + input.gcount() - 1. – Lightening
J
12

I've also duplicated this result, Windows Vista, Visual Studio 2005 SP2.

When I figure out what the heck is happening, I'll update this post.

edit: Okay, there we go. The problem (and the different results people are getting) comes from the \r. What happens is you call input.getline and put the result in vecBuffer. The getline function strips off the \n, but leaves the \r in place.

You then transfer the vecBuffer to a string variable, but use the gcount function from input, meaning you will get one char too many: gcount counts the \n that was extracted from the stream, but vecBuffer holds a terminating null character in its place.

The resulting strResult is:

-       strResult   "Test"
        [0] 84 'T'  char
        [1] 101 'e' char
        [2] 115 's' char
        [3] 116 't' char
        [4] 13 '␍'  char
        [5] 0   char

So then "Test" is printed, followed by a carriage return (puts the cursor back at the start of the line), a null character (overwriting the T), and finally the \n, which correctly puts the cursor on the new line.

So you either have to strip out the \r, or write a function that gets the string length directly from vecBuffer, checking for null characters.
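
A minimal sketch of that last suggestion, assuming a C++11 compiler. The names readLineStripped and stripsCrLf are made up for illustration and are not from the original code. The length is taken from the null terminator that getline writes into the buffer rather than from gcount(), and a trailing \r is dropped if present:

```cpp
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): reads one line with
// istream::getline and returns it without any part of the line terminator.
std::string readLineStripped(std::istream& input)
{
    const std::size_t bufferSize = 1024;  // same size as in the question
    std::vector<char> buffer(bufferSize);
    input.getline(buffer.data(), static_cast<std::streamsize>(buffer.size()));

    // getline null-terminates what it stored, so the length can be taken
    // from the buffer itself instead of from gcount().
    std::string line(buffer.data(), std::strlen(buffer.data()));

    // Drop the '\r' left over from a CR+LF line ending read in binary mode.
    if (!line.empty() && line.back() == '\r')
        line.pop_back();

    return line;
}

// Small in-memory check. std::istringstream performs no newline translation,
// so it behaves like a file opened with ios::binary.
bool stripsCrLf()
{
    std::istringstream input("Test\r\n");
    return readLineStripped(input) == "Test";
}
```

Note that measuring with strlen would cut the line short if the file itself contained a null character, so this only suits plain text input.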

Jackelinejackelyn answered 24/6, 2009 at 18:29 Comment(5)
Excellent. Thanks for the explanation. It's what I'd started to suspect. The only remaining question is why Naaff's result differs from mine and T.E.D.'s. – Eyrir
Not sure, maybe he missed his Enter key and ended up with a file without a line break? My first guess would've been that he had used an editor that uses Linux-style line breaks, but he says he used Notepad, so unless XP SP3 changed the behaviour of Notepad... – Jackelinejackelyn
The bit about only stripping one terminator is the rub I think. I checked the text file in emacs hexl-mode, and it ends with a CRLF combo. The LF on the end is not read in, but clearly the CR is. – Applicator
As noted on T.E.D.'s answer, I wasn't testing in a console project, so I suspect that might be the reason I didn't see the problem. – Dyandyana
The problem is that the 0 is getting output, and the console is overwriting the T with a blank character, while some other shells are just doing nothing. strResult.length() == 6. – Lightening
A
6

I've duplicated Tommy's problem on a Windows XP Pro Service Pack 2 system with the code compiled using Visual Studio 2005 SP2 (actually, it says "Version 8.0.50727.879"), built as a console project.

If my test.txt file contains just "Test" and a CR, the program spits out " est" (note the leading space) when run.

If I had to take a wild guess, I'd say that this version of the implementation has a bug where it is treating the Windows newline character like it should be treated in Unix (as a "go to the front of the same line" character), and then it wipes out the first character to hold part of the next prompt or something.


Update: After playing with it a bit, I'm positive that is what is going on. If you look at strResult in the debugger, you will see that it copied over a decimal 13 value near the end. That's CR ('\r'), the first half of the Windows CR+LF line ending; printed on its own, it means "return to the beginning of the line". If I instead change your constructor to read:

string strResult( vecBuffer.begin(), vecBuffer.begin() + input.gcount() - 1 );

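...(so that the trailing '\0' isn't copied) then it prints out "Test" like you'd expect. As noted in the comments, the CR itself is still copied, but it is immediately followed by the newline, so it does no visible harm on the console.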
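
To see what each version of the constructor call actually copies, here is a small in-memory sketch. GetlineResult and inspectGetline are illustrative names, not part of the original code, and a std::istringstream stands in for the binary-mode file because it also performs no newline translation:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical demo type (not from the original post): what getline reported
// and what the two constructor variants would copy out of the buffer.
struct GetlineResult
{
    std::streamsize count;          // value of gcount() after getline
    std::string withCount;          // built from gcount() characters
    std::string withCountMinusOne;  // built from gcount() - 1 characters
};

GetlineResult inspectGetline(const std::string& fileContents)
{
    std::istringstream input(fileContents);
    std::vector<char> buffer(1024);
    input.getline(&buffer[0], static_cast<std::streamsize>(buffer.size()));

    GetlineResult result;
    result.count = input.gcount();

    // Guard for the case where nothing was extracted (gcount() == 0).
    const std::streamsize kept = result.count > 0 ? result.count - 1 : 0;

    result.withCount.assign(buffer.begin(), buffer.begin() + result.count);
    result.withCountMinusOne.assign(buffer.begin(), buffer.begin() + kept);
    return result;
}
```

For the input "Test\r\n", count is 6, withCount is the six characters T, e, s, t, '\r', '\0', and withCountMinusOne is "Test\r". For "Test\n", count is 5 and withCountMinusOne is plain "Test".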

Applicator answered 24/6, 2009 at 18:20 Comment(5)
That sounds promising. The newline character shouldn't be getting into the string at all, but maybe Notepad's CR+LF newlines are interpreted as a weird instruction followed by a new line. The weird instruction must be to go to the front and then print a space - see the comments to Naaff's answer. Wonder why he couldn't reproduce? – Eyrir
I think I wasn't able to reproduce the problem because I hadn't put your code into a console project -- I just tacked it into some code that I was working on, not thinking that it would matter. – Dyandyana
I've had this happen on Windows more than once. And each time I spend an unusually long time staring at my monitor before I remember what's up. – Spancake
I used emacs to create the file, so you can't blame Notepad. – Applicator
input.gcount() - 1 still copies the \r, but skips the \0 (see my answer). – Lightening
L
2

I am pretty sure that the T is actually getting written and then overwritten. Running the same program in an rxvt window (cygwin) produces the expected output. You can do a couple of things. If you get rid of the ios::binary in your open call, the stream will autoconvert \r\n to \n on Windows and things will work like you expect.

You can also open up your text file in the binary editor by clicking on the little down arrow on the open file dialog's open button and selecting open with...->Binary Editor. This will let you look at your file and confirm that it does indeed have \r\n and not just \n.

Edit: I redirected the output to a file and it is writing out:

Test\r\0\r\n

The reason you are getting the \0 is that gcount returns 6 (6 characters were removed from the stream), but the final delimiter is not copied to the buffer; a '\0' is stored instead. When you construct the string, you are actually telling it to include the '\0'. std::string has no problem with the embedded 0 and outputs it as asked. Some shells apparently output a blank character, overwriting the T, while others don't do anything and the output looks okay, but it is still probably wrong because it has the embedded '\0'.

cout << strResult.c_str() << "\n";

Changing the last line to this will stop on the \0 and also get the output expected.
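
The difference between streaming the std::string and streaming its c_str() can be checked without a console. This is a rough sketch with hypothetical helper names, writing into a std::ostringstream and counting what arrives:

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: number of characters written when streaming a
// std::string. Embedded '\0' characters are written like any other.
std::size_t bytesWrittenAsString(const std::string& s)
{
    std::ostringstream out;
    out << s;
    return out.str().size();
}

// Number of characters written when streaming s.c_str(): the C-string
// overload of operator<< stops at the first '\0'.
std::size_t bytesWrittenAsCString(const std::string& s)
{
    std::ostringstream out;
    out << s.c_str();
    return out.str().size();
}

// The six characters found in strResult in this thread: "Test", '\r', '\0'.
std::string makeStrResult()
{
    return std::string("Test\r\0", 6);
}
```

With the string from makeStrResult(), the first helper reports 6 characters and the second reports 5, so c_str() hides the '\0' but the '\r' is still written.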

Lightening answered 24/6, 2009 at 19:14 Comment(1)
Thanks, Dolphin. A lot of us had been assuming that gcount() returned 5. I don't know why - it seems obvious that it would be 6 now that you've pointed it out. The difference between subtracting 1 from gcount's result and subtracting 2 can't be seen using my posted code, but Naaff's suggested change to the cout line shows it up. – Eyrir
D
1

I tested your code using Visual Studio 2005 SP2 on Windows XP Pro SP3 (32-bit), and everything works fine.

Dyandyana answered 24/6, 2009 at 18:4 Comment(8)
Interesting. How did you create the test file? I used Notepad++. – Eyrir
Notepad. Typed "Test" and then pressed return and saved "test.txt". – Dyandyana
What happens when you change your cout line to this?: cout << "strResult = \"" << strResult << "\"\n"; – Dyandyana
Weird: I just did the same and it spat out " est". The "About" box for my installation of Visual Studio has "(SP.050727-7600)" after the version number. So I guess I don't have SP2 installed. Do you think that could be it? – Eyrir
Oh, hang on, I'll try what you suggested - I hadn't updated before commenting. – Eyrir
' "rResult = "Test' - which appears to support T.E.D.'s theory. – Eyrir
Looks likely. You could try building your string like this to crop off the last character (newline): string strResult( vecBuffer.begin(), vecBuffer.begin() + input.gcount() - 1 ); – Dyandyana
Yes, that makes everything fine. Mysterious though. – Eyrir
