Segmentation fault when using regexec/strtok_r in C

I'm having trouble figuring out where and why I'm getting a segmentation fault.

I'm writing a C program that prompts the user for a regular expression, compiles it, and then reads a string containing multiple sentences:

int main(void){

  char RegExp[50];
  regex_t CompiledRegExp;
  char *para;
  char delim[] = ".!?,";
  char *sentence;
  char *ptr1;

  printf("Enter regular expression: ");
  fgets(RegExp, 50, stdin);

  if (regcomp(&CompiledRegExp, RegExp, REG_EXTENDED|REG_NOSUB) != 0) {
    printf("ERROR: Something wrong in the regular expression\n");
    exit(EXIT_FAILURE);
  }

  printf("\nEnter string: ");

strtok_r splits the string on any of the delimiters .,?! and each resulting token (sentence) is then passed as the string argument to regexec, which checks whether the token matches the previously compiled regular expression:

if( fgets(para, 1000, stdin)){

    char *ptr = para;
    sentence = strtok_r(ptr, delim, &ptr1);

    while(sentence != NULL){

      printf("\n%s", sentence);

      if (regexec(&CompiledRegExp,sentence,(size_t)0,NULL,0) == 0) {
        printf("\nYes");
      } else {
        printf("\nNo");
      }
      ptr = ptr1;
      sentence = strtok_r(ptr, delim, &ptr1);

    }
  }
  regfree(&CompiledRegExp);
}

It's probably a silly mistake on my part, but any help in locating the cause of the segfault would be greatly appreciated!

EDIT: Moved regfree to a more suitable location. However, the segfault still occurs. I'm pretty sure it has something to do with either how the regular expression is being read in or how it is being compared in regexec. I'm clueless, though.

Anthemion answered 22/4, 2016 at 22:24 Comment(7)
What about the debugger? - Montanez
Compile the program for debugging and run it under a debugger. The debugger will tell you exactly what happened. - Strunk
The gdb debugger doesn't give me any specifics; it just reports that a segfault occurred. - Anthemion
I'm afraid you are using the debugger wrong. When GDB halts, bt will list the stack trace leading up to the halt, and print nameOfVariable will print the current state of nameOfVariable. When stopped for a segfault, you can look at what led up to it and start reading the variables to see which may have contributed to the problem. - Dewhurst
Your regex isn't working because you didn't cut the newline from fgets off it. - Portable
@Portable Adding len = strlen(para); para[len-1] = '\0'; still causes regex to fail - Anthemion
@Portable Forgot I had two fgets. Eliminating the newline character from both made it work. Thank you! - Anthemion
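
The newline fix discussed in the comments above could look roughly like the sketch below. The helper name strip_newline is hypothetical and does not appear in the original post; it uses strcspn rather than the strlen approach from the comment, so that an empty string or a line without a trailing newline is left intact.

```c
#include <string.h>

/* Hypothetical helper (not from the original post): cut the string at
   the first '\n' that fgets may have stored. If there is no newline,
   strcspn returns the string length and the terminator is rewritten
   in place, so nothing is lost. */
static void strip_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}
```

In the question's program, this would be called once on RegExp and once on para, right after each successful fgets. By contrast, para[len-1] = '\0' removes the last character even when it is not a newline, and indexes out of bounds when len is 0.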

Instead of this:

char *para;
fgets(para, 1000, stdin);

Write this:

char para[1000];
fgets(para, 1000, stdin);

In the first variant, para is an uninitialized pointer: it points to some arbitrary location in memory, and fgets writes the user-entered string there. Most likely that address is invalid, which crashes your program immediately.
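
If a fixed-size array is not convenient, the same idea (give the pointer real storage before calling fgets) can be sketched with heap allocation. This is only an illustration under assumptions: the helper read_paragraph and the macro PARA_SIZE are hypothetical names, and the size 1000 is taken from the question's fgets call.

```c
#include <stdio.h>
#include <stdlib.h>

#define PARA_SIZE 1000  /* assumed: same capacity the question passes to fgets */

/* Hypothetical helper: returns a heap buffer holding one input line,
   or NULL on allocation failure or end of input. The caller must
   free() the result. */
static char *read_paragraph(FILE *in)
{
    char *para = malloc(PARA_SIZE);  /* para now points at valid storage */
    if (para == NULL)
        return NULL;
    if (fgets(para, PARA_SIZE, in) == NULL) {
        free(para);                  /* nothing was read, so release the buffer */
        return NULL;
    }
    return para;
}
```

A caller would write char *para = read_paragraph(stdin); and check the result for NULL before tokenizing it.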

Arela answered 22/4, 2016 at 22:46 Comment(1)
Fixed segmentation fault - thank you! Now my regular expressions aren't being correctly analysed. Back to the drawing board. - Anthemion

You called regfree inside the loop. The second time around the loop, you call regexec on freed memory, which is undefined behavior.
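
The ordering this answer asks for (compile once, match as often as needed, free once at the end) might be sketched as follows. The function count_matches and its parameters are hypothetical and only serve the illustration; the flags are the ones used in the question.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: counts how many of the n strings match pattern.
   Returns -1 if the pattern fails to compile. */
static int count_matches(const char *pattern,
                         const char *const *sentences, size_t n)
{
    regex_t re;
    int hits = 0;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        /* re stays valid for every iteration of the loop */
        if (regexec(&re, sentences[i], 0, NULL, 0) == 0)
            hits++;
    }

    regfree(&re);  /* release only after the last regexec call */
    return hits;
}
```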

Portable answered 22/4, 2016 at 22:28 Comment(0)

You are using strtok_r() incorrectly.

To parse a string with strtok_r(), pass a pointer to the string you want parsed as the first argument in the first call. Subsequent calls to strtok_r() that continue parsing the same string should pass NULL as the first argument. What you're doing:

ptr = ptr1;  
sentence = strtok_r(ptr, delim, &ptr1); 

makes no sense.
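
For comparison, the calling pattern described in this answer (string on the first call, NULL afterwards) could be sketched like this. The helper split_sentences and its out and max parameters are hypothetical names introduced for the example; the delimiter set in the usage note is the one from the question.

```c
#define _POSIX_C_SOURCE 200809L  /* strtok_r is POSIX, not ISO C */
#include <string.h>
#include <stddef.h>

/* Hypothetical helper: splits text in place on any character in delim
   and stores up to max token pointers in out. Returns the number of
   tokens stored. */
static size_t split_sentences(char *text, const char *delim,
                              char **out, size_t max)
{
    char *save = NULL;  /* strtok_r keeps its position here between calls */
    size_t n = 0;

    for (char *tok = strtok_r(text, delim, &save);  /* first call: the string */
         tok != NULL && n < max;
         tok = strtok_r(NULL, delim, &save)) {      /* later calls: NULL */
        out[n++] = tok;
    }
    return n;
}
```

Called with the delimiters ".!?," on a modifiable buffer, each entry of out is one sentence. Tokens keep any leading space that followed the previous delimiter.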

Michaeu answered 22/4, 2016 at 22:54 Comment(2)
My understanding was that the pointer within strtok_r was pointing to the split string after the delimiter was found and so it could recursively cut through the string. It works for me. - Anthemion
Makes sense to me. I use strtok_r like that a lot. - Portable
