C++ reading from file puts three weird characters
B

3

6

When I read from a file string by string, the >> operation reads the first string, but it starts with "ï»¿". For example, if the first string is "street", it is read as "ï»¿street".

The other strings are fine. I tried different txt files and the result is the same: the first string starts with "ï»¿". What is the problem?

Here is my code :

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;

int cube(int x){ return (x*x*x);}

int main(){

int maxChar;
int lineLength=0;
int cost=0;

cout<<"Enter the max char per line... : ";
cin>>maxChar;
cout<<endl<<"Max char per line is : "<<maxChar<<endl;

fstream inFile("bla.txt",ios::in);

if (!inFile) {
    cerr << "Unable to open file bla.txt";
    exit(1);   // call system to stop
}

while(!inFile.eof()) {
    string word;

    inFile >> word;
    cout<<word<<endl;
    cout<<word.length()<<endl;
    if(word.length()+lineLength<=maxChar){
        lineLength +=(word.length()+1);
    }
    else {
        cost+=cube(maxChar-(lineLength-1));
        lineLength=(word.length()+1);
    }   
}

}
Brilliance answered 2/5, 2012 at 16:11 Comment(1)
Aside: Never use .eof() as a loop condition. It almost always produces buggy code, as it does in your case. Prefer doing the input operation in the loop condition: string word; while(inFile >> word) { … }.Poona
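To make the comment above concrete, here is a small sketch that counts loop iterations for both styles over an in-memory stream. The helper names countWithEof and countWithExtraction are made up for illustration, and std::istringstream stands in for the file.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: mirrors the loop shape used in the question.
int countWithEof(const std::string& text) {
    std::istringstream in(text);
    int iterations = 0;
    while (!in.eof()) {
        std::string word;
        in >> word;      // the final attempt can fail and leave word empty
        ++iterations;    // but the loop body has already been entered
    }
    return iterations;
}

// Hypothetical helper: the extraction itself is the loop condition.
int countWithExtraction(const std::string& text) {
    std::istringstream in(text);
    int iterations = 0;
    std::string word;
    while (in >> word)
        ++iterations;    // runs only when a word was actually read
    return iterations;
}
```

For the input "a b c\n" the first version runs its body one extra time with an empty word, because the stream only reports end-of-file after a read has hit it. With no trailing whitespace the two versions happen to agree, which is why the bug is easy to miss.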
P
10

You're seeing a UTF-8 Byte Order Mark (BOM). It was added by the application that created the file.

To detect and ignore the marker you could try this (untested) function:

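#include <cstring>
#include <istream>

// Returns true if a UTF-8 BOM was found and skipped; otherwise rewinds the stream.
bool SkipBOM(std::istream & in)
{
    char test[4] = {0};
    in.read(test, 3);
    if (std::strcmp(test, "\xEF\xBB\xBF") == 0)
        return true;
    in.clear();   // a file shorter than 3 bytes leaves failbit set
    in.seekg(0);
    return false;
}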
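One way to try a function like this without touching the filesystem is to feed it an in-memory stream. The sketch below repeats the function so the snippet compiles on its own, and adds a clear() call before the seek as an assumption about how short inputs should be handled. The helper firstWord is a made-up name for illustration.

```cpp
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Same idea as the function above, repeated so this snippet stands alone.
bool SkipBOM(std::istream& in) {
    char test[4] = {0};
    in.read(test, 3);
    if (std::strcmp(test, "\xEF\xBB\xBF") == 0)
        return true;
    in.clear();   // inputs shorter than 3 bytes leave failbit set
    in.seekg(0);  // no BOM: rewind so nothing is lost
    return false;
}

// Hypothetical helper: returns the first whitespace-delimited word
// after an optional BOM has been skipped.
std::string firstWord(const std::string& bytes) {
    std::istringstream in(bytes);
    SkipBOM(in);
    std::string word;
    in >> word;
    return word;
}
```

Calling firstWord on the bytes EF BB BF followed by "street road" should give "street", and the same call on plain "street road" should give the same result, since the stream is rewound when no BOM is present.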
Playreader answered 2/5, 2012 at 16:13 Comment(4)
Also you might want to mention that he's reading the file wrong; it should be while (inFile >> word) not while (!inFile.eof())Naseberry
@vk7x #8881048Rollicking
The code above won't work unless you add a cast of "(unsigned char)" before into the if statement, e.g. if ((unsigned char)test[0] == 0xEF && (unsigned char)test[1] == 0xBB && (unsigned char)test[2] == 0xBF). Either that, or compare to -17, -69 and -65. See my answer below.Sister
@Contango, I don't know why it took me so long to see your comment but thanks. I came up with a completely different way to solve the problem, see my latest edit.Playreader
S
3

With reference to the excellent answer by Mark Ransom above, adding this code skips the BOM (Byte Order Mark) on an existing stream. Call it after opening a file.

#include <fstream>

// Skips the Byte Order Mark (BOM) that defines UTF-8 in some text files.
void SkipBOM(std::ifstream &in)
{
    char test[3] = {0};
    in.read(test, 3);
    if ((unsigned char)test[0] == 0xEF && 
        (unsigned char)test[1] == 0xBB && 
        (unsigned char)test[2] == 0xBF)
    {
        return;
    }
    in.clear();   // a file shorter than 3 bytes leaves failbit set
    in.seekg(0);  // no BOM found: rewind to the start
}

To use:

ifstream in(path);
SkipBOM(in);
string line;
while (getline(in, line))
{
    // Process lines of input here.
}
Sister answered 20/6, 2013 at 16:58 Comment(0)
H
0

Here are two more ideas.

  1. If you are the one who creates the files, save their length along with them, and when reading, cut off the prefix using this simple calculation: trueFileLength - savedFileLength = numOfBytesToCut
  2. Create your own prefix when saving the files, and when reading, search for it and delete everything that comes before it.
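
The second idea might look roughly like the sketch below. The marker value and the function names addMarker and stripToMarker are assumptions made up for illustration; the marker only needs to be a byte sequence that will not occur in the real data.

```cpp
#include <string>

// Hypothetical marker written at the start of every saved file.
const std::string kMarker = "##BEGIN##";

// Prepend the marker when saving.
std::string addMarker(const std::string& content) {
    return kMarker + content;
}

// Drop everything up to and including the first marker.
// If no marker is found, the input is returned unchanged.
std::string stripToMarker(const std::string& raw) {
    const std::string::size_type pos = raw.find(kMarker);
    if (pos == std::string::npos)
        return raw;
    return raw.substr(pos + kMarker.size());
}
```

This only helps for files you write yourself; a file produced by another application would still need a BOM check like the ones in the other answers.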
Hypoblast answered 2/5, 2012 at 16:59 Comment(0)
