Reading the file with comma delimiter using fgets() and strtok()
Asked Answered
S

3

1

I have a text file with three fields separated by comma. Example of the content of my text file: 12345, true programming newbie, BS ME To load the file into the program, i used the below code.... my problem is that sometimes the code works and sometimes it doesn't (no error message appears the program just closes itself and doesn't continue). i also observed the text file is blank (nothing is written) it automatically closes itself and doesn't continue. Your help would be highly appreciated. Thanks!

int read(){
     FILE *stream = NULL;
     int ctr;
     char linebuffer[45];
     char delims[]=", ";
     char *number[3];
     char *token = NULL;
     stream = fopen("student.txt", "rt");
     if (stream == NULL) stream = fopen("student.txt", "wt");
     else {
          printf("\nReading the student list directory. Wait a moment please...");
          while(!feof(stream)){            
                ctr=0;
                fgets(linebuffer, 46, stream);
                token = strtok(linebuffer, delims);
                while(token != NULL){
                  number[ctr] = linebuffer;
                  token = strtok(NULL, delims); 
                  ctr++;
                }
          recordCtr++;                      
          }                     
     recordCtr--;
     }
     fclose(stream);
}    
Shumway answered 2/12, 2010 at 13:44 Comment(1)
Why do you use so many dots in your question? It looks horrible.Epicanthus
E
2

You never copy the token once you've found it. You can't copy linebuffer, as the data in there will be overwritten as the next line is loaded.

This line:

number[ctr] = linebuffer;

should reference token to save the most recently found token, but it doesn't. It should probably read something like1:

strcpy(number[ctr], token);

but then you'd have to change the declaration to make sure there's space:

char number[3][32];

Obviously, this introduces a buffer overrun risk, if there's a very long token it won't fit. How to best handle that is left as an exercise. :)

1 Why the temporary vector is called "number" when it is used to store two numbers and one string (the name) is beyond me.

Epicanthus answered 2/12, 2010 at 13:48 Comment(6)
@Epicanthus sorry. I don't understand. Can you give me more explanation please. ThanksShumway
He means that you get the "token" returned by "strtok()" but then throw it away. Instead you assign "linebuffer" to the field.Longoria
@Epicanthus Nice answer, but I think I found more bugs and potential bugs than you ;) Am I too competitive?Longoria
@alastairG and @unwind. I think both of you are very good programmers. I wish I could be like you in the future... So i could also answer questions on this website. :)Shumway
@Alastair: no, I guess you're more thorough. I can't un-accept, though. :|Epicanthus
@Shumway There's nothing wrong with being a newbie. We all start somewhere. There have been times I have looked at code I wrote only a year or so before and shuddered. I still make mistakes even today, quite silly sometimes.Longoria
L
1

Your fgets() call needs to specify 45 as the size or you oveflow the buffer when fgets writes the NULL terminator. That will set the "delims" string to be an empty string.

Also you are not returning any value even though the function declaration states it returns an int.

I don't know what the definition of your "struct student" is, but you may be overflowing buffers when using strcpy(). Also you decrement "recordCtr". Why? Why do you open the file for writing if you can't open it for writing? Why? If that fails too, you call fclose on a NULL pointer. I doubt that helps much.

I just noticed that you do not initialise "number". If you don't get three numbers on the first line, you then strcpy() from an uninitialised pointer. It probably has a NULL value so the program will segfault.

Also you have an array of size 3, but if the line you read has more than 3 comma separated fields you will overflow the array.

There are probably many other errors as well.

A lot of programmers just can't be bothered to do all the good coding practices like checking return values, initialising variables, and so on. They often end up with code like this. If you want to be a really good programmer, try and get in the habit of doing all these things, or at least always thinking about whether you need to or not.

There are so many potential bugs in this code. What happens if a line is more than 45 characters long? You don't strip newlines. You don't convert the strings to numbers (although number[1] appears to be a string data so why store it in an array called "numbers"?) or check that fgets actually returned any data or check how many pieces of data you get.

Longoria answered 2/12, 2010 at 13:55 Comment(5)
i made it 46 long because i allow only 5 char (for the ID), 30 char (for the student name) and 5 char (for the course). Then create an extra of 5. Is my logic wrong? ThanksShumway
my function declaration is int because when i changed it to void.. there's a compiler errorShumway
Possibly because read() is a system function. Try renaming it to something different. At work we have a coding standard that all public functions MUST begin with the name of the module/component they are part of. E.g. you could call it StudentDB_Read().Longoria
For the length of the read, it should be the same as the buffer size. The man page for fgets() states that the function reads one fewer characters than the size. That one less is to leave space for the NULL terminator. Other functions are less helpful and you need to enter a size value one less than the buffer size. By specifying 46 for a buffer that is 45 characters big, fgets will read up to 45 characters of text from the line and then add a null terminator which will overwrite the memory after the array.Longoria
That memory coincides with "delims" thus making it a zero length string in effect. I believe that that means that strtok() will then return each individual string as a token, but the behaviour is undefined. I just read the man page for strtok(), and realised that strtok() will split the string on commas and spaces in your program, so that the string "1, John Smith, Computer Science" will be parsed as "1", "John", "Smith", "Computer", "Science" for a total of 5 tokens in all. This will overflow your "number" array.Longoria
B
1

Let the buyer beware.

strtok may have some edge cases to worry about.

"one,two,three" will yield 3 tokens.

"one,,three" will yield 2 tokens.

Badmouth answered 2/12, 2010 at 14:16 Comment(1)
It means what it says. Do you understand what a "token" is? If not, you should not be using strtok; what did you think the 'tok' in 'strtok' is short for?Highwrought

© 2022 - 2024 — McMap. All rights reserved.