Going through a text file line by line in C
Asked Answered
K

4

72

I have been working on a small exercise for my CIS class and am very confused by the methods C uses to read from a file. All that I really need to do is read through a file line by line and use the information gathered from each line to do a few manipulations. I tried using the getline method and others with no luck. My code is currently as follows:

int main(char *argc, char* argv[]){
      const char *filename = argv[0];
      FILE *file = fopen(filename, "r");
      char *line = NULL;

      while(!feof(file)){
        sscanf(line, filename, "%s");
        printf("%s\n", line);
      }
    return 1;
}

Right now I am getting a seg fault with the sscanf method and I am not sure why. I am a total C noob and just wondering if there was some big picture thing that I was missing. Thanks

Knotgrass answered 9/2, 2012 at 5:57 Comment(3)
This code should not even compile. sscanf(line, filename, "%s"); should be sscanf(line, file, "%s");Tranquillize
Note that while (!feof(file)) is always wrong.Carbolize
Possible duplicate of C read file line by lineDieppe
F
157

So many problems in so few lines. I probably forget some:

  • argv[0] is the program name, not the first argument;
  • if you want to read in a variable, you have to allocate its memory
  • one never loops on feof, one loops on an IO function until it fails, feof then serves to determinate the reason of failure,
  • sscanf is there to parse a line, if you want to parse a file, use fscanf,
  • "%s" will stop at the first space as a format for the ?scanf family
  • to read a line, the standard function is fgets,
  • returning 1 from main means failure

So

#include <stdio.h>

int main(int argc, char* argv[])
{
    char const* const fileName = argv[1]; /* should check that argc > 1 */
    FILE* file = fopen(fileName, "r"); /* should check the result */
    char line[256];

    while (fgets(line, sizeof(line), file)) {
        /* note that fgets don't strip the terminating \n, checking its
           presence would allow to handle lines longer that sizeof(line) */
        printf("%s", line); 
    }
    /* may check feof here to make a difference between eof and io failure -- network
       timeout for instance */

    fclose(file);

    return 0;
}
Fullmouthed answered 9/2, 2012 at 6:23 Comment(7)
don't forget fclose(file) before return.Murguia
the fclose(file) is actually not necessary, since it's happening in main and it'll automatically close all opened file buffers.Electrodynamics
@Electrodynamics it's always better to be safe, than sorry!Robrobaina
Still good to have for beginners, because sometimes it is necessary even at the end of main. FILE* objects are buffered in C, so if data was being written to a file and fclose wasn't called some of the data might not get flushed.Papoose
Hi, @alecRN: Are you sure? AFAIK, buffered output on a stream is flushed automatically when the program terminates by calling exit (see: gnu.org/software/libc/manual/html_node/Flushing-Buffers.html ), and the OS will decides when to flush(can call fsync). There's an implicit call to exit_group at the end of the execution, you can see it using strace and nm. I suppose it's not added by gcc because there's no such symbol, probably it's added by the runtime. Even _exit closes the open file descriptors. Anyway I agree with you that it's a good habit to close open file explicitly /ÁngelGoodwill
Since when and why "not loop on feof()"? I never saw that "best practice" before...Internalize
@BitTickler, I was confused about feof() as well. Until I accidentially spelled a file name wrong and the program ended up in an infinite loop.Windmill
S
9

To read a line from a file, you should use the fgets function: It reads a string from the specified file up to either a newline character or EOF.

The use of sscanf in your code would not work at all, as you use filename as your format string for reading from line into a constant string literal %s.

The reason for SEGV is that you write into the non-allocated memory pointed to by line.

Select answered 9/2, 2012 at 6:6 Comment(0)
G
6

In addition to the other answers, on a recent C library (Posix 2008 compliant), you could use getline. See this answer (to a related question).

Goofball answered 9/2, 2012 at 6:12 Comment(0)
G
6

Say you're dealing with some other delimiter, such as a \t tab, instead of a \n newline.

A more general approach to delimiters is the use of getc(), which grabs one character at a time.

Note that getc() returns an int, so that we can test for equality with EOF.

Secondly, we define an array line[BUFFER_MAX_LENGTH] of type char, in order to store up to BUFFER_MAX_LENGTH-1 characters on the stack (we have to save that last character for a \0 terminator character).

Use of an array avoids the need to use malloc and free to create a character pointer of the right length on the heap.

#define BUFFER_MAX_LENGTH 1024

int main(int argc, char* argv[])
{
    FILE *file = NULL;
    char line[BUFFER_MAX_LENGTH];
    int tempChar;
    unsigned int tempCharIdx = 0U;

    if (argc == 2)
         file = fopen(argv[1], "r");
    else {
         fprintf(stderr, "error: wrong number of arguments\n"
                         "usage: %s textfile\n", argv[0]);
         return EXIT_FAILURE;
    }

    if (!file) {
         fprintf(stderr, "error: could not open textfile: %s\n", argv[1]);
         return EXIT_FAILURE;
    }

    /* get a character from the file pointer */
    while(tempChar = fgetc(file))
    {
        /* avoid buffer overflow error */
        if (tempCharIdx == BUFFER_MAX_LENGTH) {
            fprintf(stderr, "error: line is too long. increase BUFFER_MAX_LENGTH.\n");
            return EXIT_FAILURE;
        }

        /* test character value */
        if (tempChar == EOF) {
            line[tempCharIdx] = '\0';
            fprintf(stdout, "%s\n", line);
            break;
        }
        else if (tempChar == '\n') {
            line[tempCharIdx] = '\0';
            tempCharIdx = 0U;
            fprintf(stdout, "%s\n", line);
            continue;
        }
        else
            line[tempCharIdx++] = (char)tempChar;
    }

    return EXIT_SUCCESS;
}

If you must use a char *, then you can still use this code, but you strdup() the line[] array, once it is filled up with a line's worth of input. You must free this duplicated string once you're done with it, or you'll get a memory leak:

#define BUFFER_MAX_LENGTH 1024

int main(int argc, char* argv[])
{
    FILE *file = NULL;
    char line[BUFFER_MAX_LENGTH];
    int tempChar;
    unsigned int tempCharIdx = 0U;
    char *dynamicLine = NULL;

    if (argc == 2)
         file = fopen(argv[1], "r");
    else {
         fprintf(stderr, "error: wrong number of arguments\n"
                         "usage: %s textfile\n", argv[0]);
         return EXIT_FAILURE;
    }

    if (!file) {
         fprintf(stderr, "error: could not open textfile: %s\n", argv[1]);
         return EXIT_FAILURE;
    }

    while(tempChar = fgetc(file))
    {
        /* avoid buffer overflow error */
        if (tempCharIdx == BUFFER_MAX_LENGTH) {
            fprintf(stderr, "error: line is too long. increase BUFFER_MAX_LENGTH.\n");
            return EXIT_FAILURE;
        }

        /* test character value */
        if (tempChar == EOF) {
            line[tempCharIdx] = '\0';
            dynamicLine = strdup(line);
            fprintf(stdout, "%s\n", dynamicLine);
            free(dynamicLine);
            dynamicLine = NULL;
            break;
        }
        else if (tempChar == '\n') {
            line[tempCharIdx] = '\0';
            tempCharIdx = 0U;
            dynamicLine = strdup(line);
            fprintf(stdout, "%s\n", dynamicLine);
            free(dynamicLine);
            dynamicLine = NULL;
            continue;
        }
        else
            line[tempCharIdx++] = (char)tempChar;
    }

    return EXIT_SUCCESS;
}
Gentlemanly answered 9/2, 2012 at 6:38 Comment(4)
I'll down-vote any while(!feof(file)) even the case occurring once it a blue moon where it isn't damageable (Note that here it will probably never been true, there is a break to leave the loop in that case, while (true) would work as well.) There are too many who think it is the correct idiom.Fullmouthed
I had no idea that was an issue. I'd honestly like to learn more about this. What are the problems with that usage?Gentlemanly
There are at lot of questions where this came up, #5432441 for instance.Fullmouthed
Okay, I fixed the loops. Thanks for the pointer. I learn something new every day.Gentlemanly

© 2022 - 2024 — McMap. All rights reserved.