C fgets(): how to tell if a line is longer than the specified size

I am using fgets() to read lines from popen("ps -ev", "r"), and I cannot work out how to tell whether fgets() has read a line partially or fully, and, if partially, how to read and throw away the excess.

For each line from popen(), I read into a 1024-byte buffer and extract the information I need from that, which works perfectly fine. The problem arises when a line is longer than the buffer: the next fgets() call returns the continuation of the previous line, which is not in the format I need (the value of each column at the beginning of the line). If I can tell that I only read part of a line (that is, the line did not fit in the buffer), I want to read and throw away the rest, 1024 characters at a time, until I reach the end of it. After that, the next fgets() call will read from the beginning of the next line rather than the continuation of the previous one.

I know that fgets() reads until it either finds a newline or reaches the provided limit, and that the next call continues with the remaining part of the line. I have tried checking that the last character is '\0' and that the second-to-last character is not '\n', but that does not work. I will post that code below in case it helps.

If you run the code, you will see LINE: num S num:num.num ... (where num is a number), which is how each line should begin. Some lines instead look like LINE: AAAAAAQAAABMAAAAQAAAAAAAAAAMAAAAFAAAAEAAAAAAAAAADAAAACwAAABA.... These are the excess from the previous line, and they cause the problem because they are not in the expected format.

Any and all help is highly appreciated.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>

#define NEWLINE() printf("\n");
#define DIVIDER() printf("============================================================================\n");
#define PL(l) printf("LINE: %s\n", l);

int const MAX_PROCESSES = 20;
int const BUFFER_SIZE = 1024;

int exhaustedLine(char* line) {
    if (line[sizeof line - 1] == '\0' && line[sizeof line - 2] != '\n') {
        printf("n:%c 0:%c\n", line[sizeof line - 2], line[sizeof line - 1]);
        NEWLINE();
        return -1;
    }
    return 0;   
}

int main(int argc, char const *argv[]) {
    FILE* fp = popen("ps -ev", "r");
    char buf[BUFFER_SIZE];
    char* line = (char*)1;

    while (line) {
        DIVIDER();
        line = fgets(buf, BUFFER_SIZE, fp);
        PL(line);
        if (exhaustedLine(line) != 0) {
            printf("END OF LINE\n");
        }
    }

    return 0;
}
Butanol asked 24/4, 2019 at 6:3. Comments (7):
If you read just about any fgets documentation or reference (for example this one), it will tell you that if the full line was read, the last character would be a newline. – Leuko
@Someprogrammerdude Why don't you consider that an answer? – Massa
In exhaustedLine, sizeof line is the size of the char pointer, so it's not what you want. If the line was not terminated by a newline, line[n - 1] (n being the buffer size) will hold the null terminator and line[n - 2] will hold a non-null character other than a newline. – Kinghood
@MOehm Same to you. – Massa
@Yunnosch: Yes, I know. Too late now. – Kinghood
@Someprogrammerdude If EOF is reached, the last line will be fully read but will not be terminated by a newline. Some care is needed if that happens and if the last line happens to be 1 byte shorter than the buffer size. – Quadrangular
@Quadrangular Of course that requires that the last line isn't terminated by a newline. Such text files should be banished! ;) – Leuko

You have the right idea: If a complete line was read, the buffer contains a newline. Otherwise the line is either longer than the buffer size or we are at the end of the file and the last line was unterminated.

The main problem with your implementation is char* line ... sizeof line. sizeof yields the size of the type of its operand expression, so sizeof line means sizeof (char *), which is the size of a pointer, not the size of the array line is pointing into.

Also, if a shorter line was read, then line[SIZE - 1] would access uninitialized memory.

Easiest solution:

#include <string.h>

int is_full_line(const char *line) {
    return strchr(line, '\n') != NULL;
}

Just use strchr to search the string for '\n'.

To throw away the rest of an overlong line, you have several options:

  • You could call fgets again in a loop.
  • You could call fgetc in a loop: int c; while ((c = fgetc(fp)) != EOF && c != '\n') {}
  • You could use fscanf: fscanf(fp, "%*[^\n]"); fscanf(fp, "%*1[\n]");

Regarding

int const BUFFER_SIZE = 1024;

Note that const does not declare constants in C; it declares read-only variables. Consequently, char buf[BUFFER_SIZE] is a variable-length array, because BUFFER_SIZE is not an integer constant expression.

To get a true integer constant in C, you can use an enum instead:

enum { BUFFER_SIZE = 1024 };
Russelrussell answered 24/4, 2019 at 6:15. Comments (5):
#define BUFFER_SIZE 1024 will also work, and is a more idiomatic way of defining constants. But the enum approach plays better with symbolic debuggers, since the BUFFER_SIZE symbol exists at compile time, whereas the #define approach substitutes BUFFER_SIZE with the literal number during the preprocessing phase, so the debugger will not recognize the symbol. – Flabellum
@Flabellum I'd say that using a #define is more common, but I wouldn't say it's idiomatic (or at least, I wouldn't say that it's a good idiom). – Quadrangular
is_full_line won't work for the last line if there's no terminating newline. (I suppose one might argue whether that's a "full line".) – Quadrangular
@Quadrangular As I wrote ("... or we are at the end of the file and the last line was unterminated"), that is indeed not a full line. – Russelrussell
Pedantically, the various approaches here of finding if "the buffer contains a newline" are broken. A line of 8 characters "abc\0xyz\n" will incorrectly be assessed as not containing a '\n'. – Dishpan

Your problem is this bit:

line[sizeof line - 1]

line in this case is a char*, so sizeof line evaluates to the size of the pointer, not the size of the string. You need to do something like this:

size_t len = strlen(line);
if (len && '\n' == line[len - 1]) ...

You don't need to test that line[len] == '\0'; that is true for all strings. (Not for all character arrays, mind you, but any standard library function that returns a string will return a null-terminated array.)

Flabellum answered 24/4, 2019 at 6:11. Comments (4):
strlen(line) - 1 is a route into disaster. – Okechuku
@Okechuku In general, yes; in this case line cannot be empty (after a successful call to fgets), but it shouldn't be used in a general exhaustedLine function. – Russelrussell
@Russelrussell Agree that a line cannot be empty after a successful call to fgets(), yet a successful read can begin with a null character, and so strlen(line) - 1 is a route to disaster, as that evaluates to SIZE_MAX. – Dishpan
@Dishpan , alk : Quite right. fgets returns NULL if it fails to read any characters, but if the input stream actually contains a '\0', fgets will place \0\0 in the buffer and return non-NULL, and strlen(line) will be 0. I've edited the answer accordingly. – Flabellum
