How can I remove 'garbage input' from a C string?
Asked Answered
R

4

6

I am attempting to write a function that will remove all characters from an array, except for '+', '-', '*', '/', and numbers. This is the code I came up with:

void eliminateJunk(char string[MAX]){
    int i,j;
    char stringOut[MAX];
    int length = strlen(string) - 1;

    for(i=0; i <= length; i++){
        if(string[i] != '+'
        && string[i] != '-'
        && string[i] != '*'
        && string[i] != '/'
        && !(isdigit(string[i]))){
            for(j=i; j < length; j++){
                string[j] = string[j+1];
            }
        }
    }
}

However, the function does not always remove all garbage characters from the c string - it gets most of them, but occasionally leaves some behind.

Example input:

123 123

Example output of array, after it has been modified:

123123

However, in some inputs, it leaves spaces...

Example input:

123   123

Example output:

123 123

What can I do to fix this? I feel like the solution is right under my nose, but I can't seem to find it.

Rosinweed answered 8/4, 2015 at 10:0 Comment(0)
A
10

This is the classic problem with removing as you go: after the for loop you need to decrement i, otherwise you skip the next character.

However, the nested for loop is unnecessary: you can do the entire thing with a single loop by maintaining separate read and write indexes. When you see a valid character, move it from the read location to the write location, and increment both indexes. When you see an invalid character, increment the read index without moving the character. At the end of the loop terminate the string - and you are done!

int r, w; // r for "read", w for "write"
for(r=0, w=0; r != length ; r++) {
    // Your condition is inverted
    if(string[r] == '+'
    || string[r] == '-'
    || string[r] == '*'
    || string[r] == '/'
    || (isdigit(string[r]))) {
        // No loop inside
        string[w++] = string[r];
    }
}
// Add null terminator at the end
string[w] = '\0';
According answered 8/4, 2015 at 10:6 Comment(4)
Instead of using a for loop, I would rather use a while (string[r]). This would avoid to call strlen.Unstained
@Maxime That is definitely true. Moreover, I would use pointers instead of indexes. However, I wanted to keep as much of the original code structure as possible to make the explanation easier to understand.According
I fully understand, my comment only intends to complete your answer.Unstained
Another way of writing the test would be if ( strchr("+-*/0123456789", string[r]) )Slow
A
3

The problem is that you do not decrement the value of i after you remove a character.

Let's explain that better. If you find a character that is not one of the characters you want and you remove it, then all the other characters will move one index back. And then , you increment i, due to which you will not check if the first character you moved back in that iteration is correct, thereby, that character is skipped.

Lets look at your string. After your input, it becomes

1 2 3       1 2 3
^ ^ ^ ^ ^ ^ ^ ^ ^
0 1 2 3 4 5 6 7 8       // these are the indeces

Now, let us skip a few iterations and go to the part where the space at index 3 is removed. After removing it, your string will look like

1 2 3     1 2 3 3
^ ^ ^ ^ ^ ^ ^ ^ ^
0 1 2 3 4 5 6 7 8

But then , you go on to the next index of i, that is to i = 4 and the space at i = 3 is left as it is. And after this iteration, it becomes

1 2 3   1 2 3 3 3
^ ^ ^ ^ ^ ^ ^ ^ ^
0 1 2 3 4 5 6 7 8

As you can see, the space is left there. That is what is causing the problem.

So, you need to decrement i and length so that all the characters are checked.

You should do

void eliminateJunk(char string[MAX])
{
    int i,j;
    char stringOut[MAX];
    int length = strlen(string) - 1;

    for(i=0; i <= length; i++)
    {
        if( string[i] != '+' 
            && string[i] != '-' 
            && string[i] != '*' 
            && string[i] != '/' 
            && !( isdigit(string[i]) ) )
        {
               for(j=i; j < length; j++)
               {
                   string[j] = string[j+1];
               }
               i--;
               length--;
        }
    }
    string[i]='\0';
    printf("%s",string);
}

I also added

    string[i]='\0';

So that you can end the array at the right length.

Amelina answered 8/4, 2015 at 10:11 Comment(5)
That also, the format of the if statement in the question is much more readable, however even bigger problem is the indentation of curly braces. No need to add a new indent for curly braces (i.e. for if (x) {, curly brace should be under i.Basin
@Basin , well, that's how I have been taught ( but I agree with the if part ). But, since everyone intends just like in your suggestion, I will edit . Thanks for leaving a comment and notifying.Amelina
I don't understand @Basin comment. There are some weird spaces but the && must be aligned with the first character after the ( as it is done here. My advise is to use indent -kr file.c.Unstained
@Maxime , what is indent -kr file.c? I tried Googling, but no good resultAmelina
indent is a command you can find in almost all Linux. This command properly indents your code.Unstained
K
2

You when you eliminate a character from the array you reduce the length so you need to reduce the variable length by one, as well as the index by one as well.

Also remember that strings in c are null terminated so that you will also need to copy the trailing null character as well otherwise you'll skip the next character.

If you make these changes your function will look like:

void eliminateJunk(char string[MAX]){
    int i,j;
    char stringOut[MAX];
    int length = strlen(string) - 1;

    for(i=0; i <= length; i++){
        if(string[i] != '+'
        && string[i] != '-'
        && string[i] != '*'
        && string[i] != '/'
        && !(isdigit(string[i]))){
            for(j=i; j < length + 1; j++){
                string[j] = string[j+1];
            }
            length--;
            i--;
        }
    }
}
Kimbro answered 8/4, 2015 at 10:5 Comment(0)
S
0
there are a few problems with the code.  
1) the string actually gets shorter when ever a character is removed. 
2) when a character is removed, 
   i needs to point to the new current character, 
   not the next character.  The 'for' statement will increment 'i'
   the code needs to decrement 'i'.
3) the new string array is unused.
   suggest either copying the characters to keep to 
   successive positions in the new string array or 
   remove the new string array from the code. 
   suggest compiling with all warnings enabled
   so the compiler can tell you about problems in the code

   As it is, that unused new string array is 
   causing the compiler to raise a warning.
   for several reasons, the warning about the unused variable
   needs to be fixed.

 4) suggest having the for loop check for current char != '\0'
    so no need to call strlen(),
    so no need to check for length,
    so no need to continually adjust the length
Suchta answered 9/4, 2015 at 0:54 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.