Reading the first line of a file gives me a "\357\273\277" prefix in the first row [duplicate]
C

1

8

When I call the function readTheNRow with row=0 (to read the first row), I find that the first three chars are \357, \273 and \277. I found that this prefix is somehow related to UTF-8 files, but some files have it and some don't. How do I ignore all prefixes of this kind in the files I want to read?

int readTheNRow(char buff[], int row) {

int file = open("my_file.txt", O_RDONLY);
if (file < 0) {
    write(2, "opening the file was unsuccessful\n", 34);
    exit(-1);
}

// function's variables
int i = 0;
char ch; // a temp variable to read with it
int check; // helping variable for checking the read function

// read till we reach the needed row
while (i != row) {

    // read one char
    check = read(file, &ch, 1);
    if (check < 0) {
        // write a error message to the user
        write(2, "error occurred in reading\n", 26);
        exit(-1);
    }

    if (check < 0) {
        // if means that we reached the end of file
        return -1; // couldn't read the N row (N is bigger than X)
    }
    printf("%c",ch);
    // check that the char is a \n
    if (ch == '\n') {
        i++;
    }
}

// read the number to the received buffer
i = 0;

do {
    // read one char
    check = read(file, buff + i, 1);
    if (check < 0) {
        // write a error message to the user
        write(2, "error occurred in reading\n", 26);
        exit(-1);
    }

    // if we reached the end of file
    if (check == 0) {
        break;
    }
    i++;

} while (buff[i - 1] != '\n');

// put the \0 in the end of the string (replacing the '\n' if there is one)
if (i > 0 && buff[i - 1] == '\n') {
    i--;
}
buff[i] = '\0';

// try to close the file before returning
if (close(file) < 0) {
    write(2, "closing the file was unsuccessful\n", 34);
    exit(-1);
}
return 1; // return that reading was successful
}
Catechize answered 7/6, 2014 at 11:53 Comment(5)
Read the first 3 characters right after opening and check whether they are "\xEF\xBB\xBF". If not, rewind your input (with a file descriptor, that is lseek(file, 0, SEEK_SET)). Then continue "as usual". Szymanski
BTW: your second if (check < 0) needs to be if (check == 0). Szymanski
The solution depends on your files. If you have full knowledge about what the files may contain, and you know that the valid content won't begin with '\357', just ignore any line that begins with that char. You don't need to do anything else. Shavon
See also en.wikipedia.org/wiki/Byte_order_mark Celina
@user3564091 That's wrong, the line is valid, it's just that the first three bytes are not part of the data. Celina
N
9

You seem to be trying to read a file carrying a so-called BOM (byte order mark).

Test for such prefixes and, if one is present, use the information it provides; then go on and read the file, interpreting it as the BOM indicates.
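
A minimal sketch of such a test, assuming the first few bytes of the file have already been read into a buffer. The names bom_kind and detect_bom are made up for illustration and are not part of any library. Note that FF FE 00 00 is reported as UTF-32 LE here, although it could in principle also be UTF-16 LE text starting with a NUL character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; not from any library. */
enum bom_kind {
    BOM_NONE, BOM_UTF8, BOM_UTF16_BE, BOM_UTF16_LE, BOM_UTF32_BE, BOM_UTF32_LE
};

/*
 * Inspect the first `len` bytes of a file and report which BOM, if any,
 * they start with. On return, *bom_len holds the number of bytes to skip.
 */
enum bom_kind detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    static const unsigned char utf32_be[] = { 0x00, 0x00, 0xFE, 0xFF };
    static const unsigned char utf32_le[] = { 0xFF, 0xFE, 0x00, 0x00 };
    static const unsigned char utf8[]     = { 0xEF, 0xBB, 0xBF };
    static const unsigned char utf16_be[] = { 0xFE, 0xFF };
    static const unsigned char utf16_le[] = { 0xFF, 0xFE };

    assert(buf != NULL && bom_len != NULL);

    /* The 4-byte UTF-32 marks are tested first because the UTF-32 LE
       mark begins with the same two bytes as the UTF-16 LE mark. */
    if (len >= 4 && memcmp(buf, utf32_be, 4) == 0) { *bom_len = 4; return BOM_UTF32_BE; }
    if (len >= 4 && memcmp(buf, utf32_le, 4) == 0) { *bom_len = 4; return BOM_UTF32_LE; }
    if (len >= 3 && memcmp(buf, utf8, 3) == 0)     { *bom_len = 3; return BOM_UTF8; }
    if (len >= 2 && memcmp(buf, utf16_be, 2) == 0) { *bom_len = 2; return BOM_UTF16_BE; }
    if (len >= 2 && memcmp(buf, utf16_le, 2) == 0) { *bom_len = 2; return BOM_UTF16_LE; }

    *bom_len = 0;
    return BOM_NONE;
}
```

The caller would then skip bom_len bytes and pick a decoding based on the returned kind.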

The sequence \357 \273 \277 (octal for the bytes EF BB BF) indicates that UTF-8 follows. UTF-8 does not need to take byte ordering into account, as the byte is the unit for such files.
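
For the UTF-8 case with open()/read() as in the question, one possible sketch is below. The helpers skip_utf8_bom and open_skipping_bom are hypothetical names, not from the original post, and the approach assumes a seekable regular file (lseek does not work on a pipe or FIFO).

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: call right after open(). If the file starts with
 * EF BB BF, the offset is left just past the mark; otherwise the offset
 * is moved back to the start of the file.
 * Returns 1 if a BOM was skipped, 0 if none was found, -1 on I/O error.
 */
int skip_utf8_bom(int fd)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    unsigned char head[3];
    size_t got = 0;

    assert(fd >= 0);

    /* read() may return fewer bytes than requested, so loop until
       3 bytes have arrived or end of file is reached. */
    while (got < sizeof head) {
        ssize_t n = read(fd, head + got, sizeof head - got);
        if (n < 0)
            return -1;
        if (n == 0)
            break;
        got += (size_t)n;
    }

    if (got == sizeof bom && memcmp(head, bom, sizeof bom) == 0)
        return 1;

    /* No BOM: the bytes just read are real data, so go back to the start. */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;
    return 0;
}

/* Hypothetical wrapper: open a file for reading, positioned after any
   UTF-8 BOM. Returns the descriptor, or -1 on error. */
int open_skipping_bom(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (skip_utf8_bom(fd) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

In readTheNRow, replacing the open() call with something like open_skipping_bom("my_file.txt") would leave the rest of the reading loop unchanged, whether or not the file carries the prefix.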

More on the various existing BOMs here: http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding

Nunez answered 7/6, 2014 at 12:4 Comment(2)
Is there any way of reading from a txt file in C without taking all the BOM prefixes into account? Catechize
@user3717551: Sure, if it's just for reading and not for interpreting the content read, just ignore them. This might make sense if all that needs to be done is copying. If, however, the text content is to be interpreted, for example by a human reader looking at what the program read, there is no way around handling the BOM. Nunez
