C++ - strcmp() does not work correctly?
Asked Answered
N

12

5

There's something really weird going on: strcmp() returns -1 though both strings are exactly the same. Here is a snippet from the output of the debugger (gdb):

(gdb) print s[i][0] == grammar->symbols_from_int[107][0]
$36 = true
(gdb) print s[i][1] == grammar->symbols_from_int[107][1]
$37 = true
(gdb) print s[i][2] == grammar->symbols_from_int[107][2]
$38 = true
(gdb) print s[i][3] == grammar->symbols_from_int[107][3]
$39 = true
(gdb) print s[i][4] == grammar->symbols_from_int[107][4]
$40 = true
(gdb) print s[i][5] == grammar->symbols_from_int[107][5]
$41 = false
(gdb) print grammar->symbols_from_int[107][4]
$42 = 0 '\0'
(gdb) print s[i]
$43 = (char * const&) @0x202dc50: 0x202d730 "Does"
(gdb) print grammar->symbols_from_int[107]
$44 = (char * const&) @0x1c9fb08: 0x1c9a062 "Does"
(gdb) print strcmp(s[i],grammar->symbols_from_int[107])
$45 = -1

Any idea what's going on?

Thanks in advance,

Onur

Edit 1: Here are some snippets of my code:

# include <unordered_map>       // Used as hash table
# include <stdlib.h>
# include <string.h>
# include <stdio.h>
# include <vector>

using namespace std;
using std::unordered_map;
using std::hash;

struct eqstr
{
  bool operator()(const char* s1, const char* s2) const
  {
    return strcmp(s1, s2) == 0;
  }
};

...
<some other code>
...

class BPCFG {

  public:

        char *symbols;  // Character array holding all grammar symbols, with NULL seperating them.
        char *rules;    // Character array holding all rules, with NULL seperating them.

        unordered_map<char *, int , hash<char *> , eqstr> int_from_symbols; // Hash table holding the grammar symbols and their integer indices as key/value pairs.
...
<some other code>
...

vector<char *> symbols_from_int;        // Hash table holding the integer indices and their corresponding grammar symbols as key/value pairs.
void load_symbols_from_file(const char *symbols_file);
}

void BPCFG::load_symbols_from_file(const char *symbols_file) {
        char buffer[200];
        FILE *input = fopen(symbols_file, "r");
        int symbol_index = 0;
        while(fscanf(input, "%s", buffer) > 0) {
                if(buffer[0] == '/')
                        strcpy(symbols + symbol_index, buffer+1);
                else
                        strcpy(symbols + symbol_index, buffer);
                symbols_from_int.push_back(symbols + symbol_index);
                int_from_symbols[symbols+symbol_index] = symbols_from_int.size()-1;
                probs.push_back(vector<double>());
                hyperprobs.push_back(vector<double>());
                rules_from_IntPair.push_back(vector<char *>());
                symbol_index += strlen(symbols+symbol_index) + 1;
        }


        fclose(input);
}

This last function (BPCFG::load_symbols_from_file) seems to be the only function I modify symbols_from_int in my whole code. Please tell me if you need some more code. I'm not putting everything because it's hundreds of lines.

Edit 2: OK, I think I should add one more thing from my code. This is the constructor of BPCFG class:

BPCFG(int symbols_length, int rules_length, int symbol_count, int rule_count):
   int_from_symbols(1.5*symbol_count),
   IntPair_from_rules(1.5*rule_count),
   symbol_after_dot(10*rule_count)
{
    symbols = (char *)malloc(symbols_length*sizeof(char));
    rules = (char *)malloc(rules_length*sizeof(char));
}

Edit 3: Here is the code on the path to the point of error. It's not compilable, but it shows where the code stepped through (I checked with next and step commands in the debugger that the code indeed follows this route):

BPCFG my_grammar(2000, 5500, 194, 187);
my_grammar.load_symbols_from_file("random_50_1_words_symbols.txt");
<some irrelevant code>
my_grammar.load_rules_from_file("random_50_1_words_grammar.txt", true);
<some irrelevant code>
my_grammar.load_symbols_after_dots();

BPCFGParser my_parser(&my_grammar);
BPCFGParser::Sentence s;

// (Sentence is defined in the BPCFGParser class with
// typedef vector<char *> Sentence;)

Edge e;
try {
        my_parser.parse(s, e);
}
catch(char *e) {fprintf(stderr, "%s", e);}

void BPCFGParser::parse(const Sentence & s, Edge & goal_edge) {

        /* Initializing the chart */

        chart::active_sets.clear();
        chart::passive_sets.clear();
        chart::active_sets.resize(s.size());
        chart::passive_sets.resize(s.size());

        // initialize(sentence, goal);

        try {
                initialize(s, goal_edge);
        }
        catch (char *e) {
                if(strcmp(e, UNKNOWN_WORD) == 0)
                        throw e;
        }
<Does something more, but the execution does not come to this point>
}

void BPCFGParser::initialize(const Sentence & s, Edge & goal_edge) {
        // create a new chart and new agendas
        /* For now, we plan to do this during constructing the BPCFGParser object */

        // for each word w:[start,end] in the sentence
        //   discoverEdge(w:[start,end])

        Edge temp_edge;

        for(int i = 0;i < s.size();i++) {
                temp_edge.span.start = i;
                temp_edge.span.end = i+1;
                temp_edge.isActive = false;
                /* Checking whether the given word is ever seen in the training corpus */
                unordered_map<char *, int , hash<char *> , eqstr>::const_iterator it = grammar->int_from_symbols.find(s[i]);
                if(it == grammar->int_from_symbols.end())
                        throw UNKNOWN_WORD;
                <Does something more, but execution does not come to this point>
        }
}

Where I run the print commands in the debugger is the last

throw UNKNOWN_WORD;

command. I mean, I was stepping with next on GDB and after seeing this line, I ran all these print commands.

Thank you for your interest,
Onur


OK, I think I should add one more thing from my code. This is the constructor of BPCFG class:

BPCFG(int symbols_length, int rules_length, int symbol_count, int rule_count):
   int_from_symbols(1.5*symbol_count),
   IntPair_from_rules(1.5*rule_count),
   symbol_after_dot(10*rule_count)
{
    symbols = (char *)malloc(symbols_length*sizeof(char));
    rules = (char *)malloc(rules_length*sizeof(char));
}
Nerland answered 24/12, 2009 at 20:0 Comment(10)
please print numerical values of both strings. looks strange to me.Araucaria
my guess is that both strings are in fact not the same. I always solve problems in programming by realizing that it is always something I did not the system.Seine
Can you share a minimal compilable example that exhibits the problem?Cashbook
Post the code, not the gdb output.Marginalia
You call to strcmp might be using the wrong pointers, post the code.Electrobiology
Interesting problem. Post more info so we can see what is going onTrypsin
@kk6yb: it doesn't matter.. He's using the right pointers in GDB and it still fails (he should post the code anyway though)Tricia
@Pavel Radzivilovsky: What do you mean with "numerical values of strings"? I couldn't get it? About the code: It's rather big, but I will post some snippets that I think to be relevant.Nerland
Please update the original question rather than posting 'answers', stackoverflow doesn't work like a forum and it becomes really rather hard to follow. That said, since you seem to be running under linux (or a linux like platform) try running it using valgrind, as it isolates many problems related to memory corruption.Tortosa
Don't forget to mark an answer as "accepted" if one of the answers below solves your problem.Tunnel
J
10

This sounds like s is a pointer to an array that was on the stack which is overwritten as soon as a new function is called, ie strcmp()

What does the debugger say they are after the strcmp() call?

Judi answered 24/12, 2009 at 20:30 Comment(4)
+1. Yep. A million times more likely to be data corruption than a bug in strcmp! The strings are clearly the same when you examine them with the debugger, but that doesn't mean the memory locations still have those values in them by the time strcmp executes. (@Onur: Posting the actual code that is failing would make everything much clearer).Satiny
s is of type vector<char *> and its reference is passed on through function calls with const keyword (so compiler checked whether it is being tried to be modified) and according to the debugger, it does not change. When I type "print s" before and after the call to strcmp(), I get exactly the same output.Nerland
The best move is to post the code - if it isn't doing what you think it should do, then describing what you think it should do won't help us diagnose it.Satiny
it's not an answer because custom strcmp realization works at same point of code. There are many clarifications on situation in responses and comments, that not placed in question, look for comment here: #1960072Orosco
P
3

In recent Linux distributions strcmp is a symbol of type STT_GNU_IFUNC. This is not supported in the last release of GDB (7.2 at the time of writing). That may be the cause of your problem, although in your case the return value looks genuine.

Purvis answered 22/3, 2011 at 15:25 Comment(0)
T
1

Given that GDB output the only possible cause I can see is that strcmp() is bugged.

You basically did in GDB what strcmp does: compare character per character, until both are zeros (at 4).

Can you try print strcmp("Does", "Does"); ?


EDIT: also try:

print stricmp(s[i], grammar->symbols_from_int[107], 1);
print stricmp(s[i], grammar->symbols_from_int[107], 2);
print stricmp(s[i], grammar->symbols_from_int[107], 3);
print stricmp(s[i], grammar->symbols_from_int[107], 4);
print stricmp(s[i], grammar->symbols_from_int[107], 5);
Tricia answered 24/12, 2009 at 20:5 Comment(6)
strcmp has been out in the wild for a long time. It's far more likely that the questioner has done something that's not matching his/her expectations.Culminant
A bug in a very commonly used library, especially doing something this trivial, is pretty damn unlikely. Unless he's getting strcmp from somewhere other than the standard library.Deroo
@duffymo: what I was thinking is that he may be using a strcmp() he wrote himself, or anyway not the one in the standard library. If you have an alternative explanation you're welcome to offer one, but given the GDB output it appears strcmp() is not behaving like it should.Tricia
I'm also considering stuff like stack smashing and similar, but I don't see a scenario where they could result in thisTricia
print strcmp("Does", "Does") returns 0. I included stdlib.h and string.h and I compile with the option -std=gnu++0x. My only expectation from strcmp() is that it returns 0 whenever the given strings are equal and returns something else otherwise.Nerland
To clarify, I did not write my strcmp(). I'm using the one in string.hNerland
S
1

The only way you'll ever figure this out is to step INTO strcmp with the debugger.

Sandfly answered 24/12, 2009 at 21:15 Comment(0)
R
1

I would strongly suggest you zero out the memory before you start using it. I realize that the GDB output makes no sense because you do verify it's a null terminated strings, but I've had a lot of string.h bizarre problems go away with a memset, bzero, calloc or whatever you want to use.

Specifically, zero out the memory in the constructor and the buffer you use when reading from the file.

Rorqual answered 24/12, 2009 at 21:45 Comment(0)
D
1

As others have pointed out, it's near to impossible that there would be a problem with strcmp.

It would be best to strip down the problematic code to the absolute minimum required to reproduce the problem (also include instructions on how to compile - which compiler, flags, runtime library are you using?). Most likely you will find a bug in the process.

If not, you will receive a lot of credit for finding a bug in one of the most intensively used C functions ;-)

Dynatron answered 24/12, 2009 at 22:40 Comment(1)
As I wrote in one of my self-replies: The string comparison function I wrote works correctly (finds that the strings are actually the same) at the very same point where strcmp() fails (credit it or not :) ). I compile with: g++ -std=gnu++0x -o experiment experiments.cpp -g (the last one for debugger) I'm working on Linux Ubuntu 9.04, g++ 4.3.3., what other kind of information would you like? I already stripped down the portions of code which I thought to be relevant.Nerland
T
0

Does strlen(s[i]) and strlen(grammar->symbols_from_int[107]) return the same?

Also, I can't imagine this is the problem but could you use a constant value rather than i just to make sure that something weird is not going?

Trypsin answered 24/12, 2009 at 20:9 Comment(1)
The value of i is 0 at the time I check strcmp(). "print s[i]" and "print s[0]" returns exactly the same output by the debugger.Nerland
G
0

May be size of the two string are not same.

Giselagiselbert answered 24/12, 2009 at 20:10 Comment(1)
No, he's shown that both strings are null terminated at element 4.Flagler
V
0

I would strongly recommend trying to solve the problem with regular STL strings first - you'll get cleaner code and automated memory management so you can concentrate on the parser logic. Only after the thing is working and profiling proves that string manipulation is the performance bottleneck I would look at all the algorithms used in greater detail, then at specialized string allocators, and - as the last resort - at manual character array manipulation, in that order.

Vegetarian answered 24/12, 2009 at 21:27 Comment(2)
-1, There's always a guy plugging his favorite library/coding style while ignoring the question at hand. Wish I could give more than -1..Scarletscarlett
@Blindy: the STL is not merely a favorite library. Nikolai is advocating an approach that obviates the problem with strcmp, which is an answer to the problem. Sometime the best solution is to take a different route.Angeliaangelic
N
0

Thanks to all spared time to answer. I wrote my own string comparison function and it worked correctly at the same point, so obviously it's a problem with strcmp(). Nevertheless, my code still does not work as I expect. But I'll request help for that only after I analyze it thoroughly. Thanks.

Nerland answered 24/12, 2009 at 22:15 Comment(0)
O
0

Mark you own strcmp realization as inline and look what happens ...

From GCC 4.3 Release Series Changes, New Features, and Fixes for GCC 4.3.4 :

"During feedback directed optimizations, the expected block size the memcpy, memset and bzero functions operate on is discovered and for cases of commonly used small sizes, specialized inline code is generated."

May be some other related bugs exists ...
Try to switch compiler optimization or/and function inlining off.

Orosco answered 25/12, 2009 at 8:10 Comment(0)
P
-1

Maybe there is a non-printable character in one of the strings?

Perjury answered 24/12, 2009 at 20:13 Comment(2)
No, please examine the GDB output carefully.Tricia
Yes I see the gdb output. But that doesn't mean that the failure isn't a result of corrupted data.Perjury

© 2022 - 2024 — McMap. All rights reserved.