c - strcmp not returning 0 for equal strings
Asked Answered
I

2

9

So I've tried searching for a solution to this extensively but can only really find posts where the new line or null byte is missing from one of the strings. I'm fairly sure that's not the case here.

I am using the following function to compare a word to a file containing a list of words with one word on each line (dictionary in the function). Here is the code:

int isWord(char * word,char * dictionary){
  FILE *fp;
  fp = fopen(dictionary,"r");
  if(fp == NULL){
    printf("error: dictionary cannot be opened\n");
    return 0;
  }
  if(strlen(word)>17){
    printf("error: word cannot be >16 characters\n");
    return 0;
  }
  char longWord[18];
  strcpy(longWord,word);
  strcat(longWord,"\n");
  char readValue[50] = "a\n";
  while (fgets(readValue,50,fp) != NULL && strcmp(readValue,longWord) != 0){
    printf("r:%sw:%s%d\n",readValue,longWord,strcmp(longWord,readValue));//this line is in for debugging
  }
  if(strcmp(readValue,longWord) == 0){
    return 1;
  }
  else{
    return 0;
  }
}

The code compiles with no errors and the function reads the dictionary file fine and will print the list of words as they appear in there. The issue I am having is that even when the two strings are identical, strcmp is not returning 0 and so the function will return false for any input.

eg I get:

r:zymoscope
w:zymoscope
-3

Any ideas? I feel like I must be missing something obvious but have been unable to find anything in my searches.

Interphone answered 27/7, 2015 at 18:29 Comment(18)
I'd guess you're on a Windows machine, that you've read the file with CRLF line endings, and that you've not stripped the endings accurately. There's a difference of 3 between the value of '\r' and '\n' in many (most) codesets. It's curious you have one printf() printing all that data, but no \n in the format string. You're relying on the newlines in the data, which seems a little suspect. (Write a function to print the bytes in a string in hex; call it on each string; spot the difference.)Bellis
Your length check also assumes a single \n, you may be off by one if \r\n is usedCollation
Also, your file is never closed. Do all checks possible before opening the file. Close it when done (or on errors).Lem
@Jonathan I'm running Linux (openSUSE) however it is more than possible that the list of words I downloaded was created in windows.Interphone
Print out the length of both strings. If it's not 9, then you have extra characters that might not match. You clearly have a line ending after the r: line.Arrowworm
Suggest readValue[strcspn(readValue, "\r\n")] = 0; right after using fgets(readValue,50,fp) to eliminate line ending characters.Kinematograph
OK; in some ways, that scenario (Linux reading a file possibly — probably — created on Windows) makes more sense. Provide yourself with the string dump function I suggested and use it. You might use: static void dump_string(const char *tag, const char *string) { size_t len = strlen(string); printf("%s (%zu):", tag, len); size_t i; for (i = 0; i < len; i++) { printf(" %.2X", (unsigned char)string[i]); if (i % 16 == 15) putchar('\n'); } if (i % 16 != 0) putchar('\n'); } and call it: dump_string("r", readValue); dump_string("w", longWord); or similar.Bellis
For future 1) When printing troublesome strings, use something like printf("'%s'\n", bad_string); to help identify leading and trailing white-space, new-lines, etc. 2) Rather than thinking strcmp() is wrong, ask why the strings are not equal.Kinematograph
What did your debugger tell you was in the two strings being compared?Unfruitful
@JonathanLeffler Yes you're correct the file uses \r\n, the functions is now working correctly.Interphone
@FelixPalmen Thanks for point that out, the close line has been lost at some point during debugging. Am I understanding you correctly that I should be looking to do as little as possible while the file is open in general then? If so, why?Interphone
You need to close the file before your function returns (if it was succesful opening it), as otherwise you leak file streams — the one opened in the first call cannot be closed, and there is an upper bound on the number of file streams you can have open, even if they're all pointing at the same file.Bellis
@Interphone out of laziness (or, better, avoiding duplicate code). If the file is already opened, you need an additional fclose() in your check for the length of the word for cleanup.Lem
@Interphone adding to that, iff your function gets lengthy and there are error checks you can only perform after the file was opened, this is one of the rare cases a goto is useful and good practice, e.g. if (some error cond) { ret = -1; goto cleanup; }Lem
1) always close the open file before returning 2) this line: 'strcat(longWord,"\n");' when strlen(word) == 17, then this line: 'strcat(longWord,"\n");' is writing past the end of the longWord buffer. This results in undefined behaviour and can lead to a seg fault eventAntioch
@FelixPalmen, goto .. very bad idea. rather write a 'cleanup()' function and pass it, as parameters, what ever it needs to cleanup, like open files, malloc'd memory pointers, etc.Antioch
@Antioch -- a comment like this shows you have no C knowledge whatsoever.Lem
the 'while' loop walks through the 'dictionary' file, and will output a message for every word, until the desired word is (if ever) found. In a dictionary with thousands of words, that is thousands of lines of output that do not help the program perform its' function.Antioch
T
5

I see you are appending a newline to your test strings to try to deal with the problem of fgets() retaining the line endings. Much better to fix this at source. You can strip all trailing stuff like this, immediately after reading from file.

readValue [ strcspn(readValue, "\r\n") ] = '\0';   // remove trailing newline etc
Tisiphone answered 27/7, 2015 at 18:39 Comment(1)
I would add that this is completely safe for a '\0' terminated string. If there are no kind of line-endings, strcspn() returns the string length, thus harmlessly overwriting the '\0' which was already present.Tisiphone
T
4

The string you are reading contains trailing character(s), and hence is not the same as the string you are comparing it against.

Remove the trailing newline (and CR if that is there); then you do not need to add any newline or carriage return to the string being compared:

int isWord(char *word, char *dictionary){
  FILE *fp;
  fp = fopen(dictionary, "r");
  if (fp == NULL){
    fprintf(stderr, "error: dictionary cannot be opened\n");
    return 0;
  }
  if (strlen(word) > 16){
    fprintf(stderr, "error: word cannot be >16 characters\n");
    return 0;
  }
  char readValue[50];
  while (fgets(readValue, 50, fp) != NULL){
    char *ep = &readValue[strlen(readValue)-1];

    while (*ep == '\n' || *ep == '\r'){
      *ep-- = '\0';
    }
    if (strcmp(readValue, word) == 0){
      return 1;
    }
  }
  return 0;
}
Traweek answered 27/7, 2015 at 18:51 Comment(4)
Note: readValue[strlen(readValue)-1] can result in undefined behavior/wrong in the rare situation where the first character read is the null character. Checking that the string length is greater than 0 is one way to fix that.Kinematograph
while (*ep == '\n' || *ep == '\r'){ *ep-- = '\0'; } *ep-- = '\0'; is a problem when readValue is "\n" or "\r\n".Kinematograph
And, though he was too polite to say it, @chux's answer does avoid both those problems (though it might be argued to stop prematurely on an input line with a CR in the middle, separate from a CRLF or LF line ending). And the need to do a separate strlen() operation is a reason to prefer the POSIX getline() function over fgets(); it returns the length of the data it read, which (amongst other things) allows you to read past leading null bytes if you consider that desirable.Bellis
Yes I can see how both of those would be problematic, I originally chose this answer as it seemed more general in that it dealt with different possible line endings. @chux, now it seems like it would be best to simply use your solution with a check for each possible line ending.Interphone

© 2022 - 2024 — McMap. All rights reserved.