How to check for repeated characters within a string in c
Asked Answered
V

13

5

I'm trying to create a program that checks for repeated characters within the command line argument's string. The string is suppose to contain only 26 characters, and all characters have to be alphabetical. However, there cannot be any repeated characters within the string, each alphabetical character must appear only once. I figured out the first two sections of the program but I cant figure out how to check for repeated characters. I could really use some help as well as explanations.

#include <stdio.h>
#include <cs50.h>
#include <string.h>
#include <ctype.h>

int main (int argc, string argv[])
{
    if (argc != 2)
    {
        printf("Usage: ./substitution key\n");
        return 1;
    }
    else
    {
        int len = strlen(argv[1]);
        if (len != 26)
        {
            printf("Key must contain 26 characters.\n");
            return 1;
        }
        else
        {
            for (int i = 0; i < len; i++)
            {
                if (!isalpha(argv[1][i]))
                {
                    printf("Usage: ./substitution key\n");
                    return 1;
                }
            }
        }
    }
}
Vaseline answered 3/7, 2020 at 23:13 Comment(6)
You need 26 counters, one for each letter of the alphabet, in order to count how often each letter occured. After performing this counting, you must verify that all 26 counters have the value 1 (which means that every letter occured exactly once). In order to allocate these 26 counters, you can either declare 26 individual local counter variables or one single array of 26 elements.Crystallo
Keep a Map and initialize it with zero. For every character check if it is previously found or not. If not found increase the value to 1, otherwise this char is repeated.Rabbitry
@AsifMujtaba: If you are referring to std::map, please note that this question is tagged C, not C++. Of course, it is possible to implement a map in C too, but that seems like an excessive amount of work for this simple problem.Crystallo
It can be done by an array actually. My logic is to use hashMap. Thanks for the correction.Rabbitry
@Patrick Dankyi Are upper case and lower case letters as for example 'a' and 'A' different letters?Voigt
@VladfromMoscow no they would be the sameVaseline
T
6

Here are the basics of a solution:

When starting, initialize a table of flags: char Seen[UCHAR_MAX+1] = {0};. In this array, Seen[c] will be true (non-zero) if and only if character c has been seen already. (To get UCHAR_MAX, #include <limits.h>.)

When processing each character, copy it to an unsigned char: unsigned char c = argv[1][i];. Then convert it to uppercase: c = toupper(c);. Then test whether it has been seen already:

if (Seen[c])
    Report error...

If it is new, remember that it has been seen: Seen[c] = 1;.

That is all that is necessary.

Notes:

  • If it is known that A to Z are contiguous in the character set, as they are in ASCII, then char Seen[UCHAR_MAX+1] can be reduced to char Seen['Z'-'A'+1] and indexed using Seen[c-'A']. (Actually, a weaker condition suffices: A is the lowest uppercase character in value and Z is the greatest.)
  • Do not be tempted to use unsigned char c = toupper(argv[1][i]), because toupper is defined for unsigned char values, and a char value may be out of bounds (negative).
  • In an esoteric C implementation where a char is as wide as an int, none of which are known to exist, UCHAR_MAX+1 would evaluate to zero. However, the compiler should warn if this happens.
Twowheeler answered 4/7, 2020 at 1:9 Comment(1)
I just coded this up to see. Very elegant. Reminds me of python dictionaries. Thumbsup! Thanks for sharing!Lymphatic
T
6

I'm assuming this was from cs50, I'm on this problem right now... What I did was I used a simple boolean function. I used two loops, the first loop was to represent the first character of reference and inside the nested loop I looped through all the other chars and returned false if i == k.

bool no_repeat(string arg_letters)
{
   for (int i = 0, j = strlen(arg_letters); i < j; i++)
   {
      for (int k = i+1; k < j; k++)
      {
         if (arg_letters[i] == arg_letters[k])
         {
            return false;
         }

      }
   }
   return true;
}
Tercel answered 24/1, 2021 at 20:18 Comment(0)
S
1

From the question, the expected input is 26 characters with each appears only 1 time.

According to Andreas Wenzel's idea, i add counters for each character. in case any character missing or duplicate will be spotted. Here's my solution:

int main(int argc, char *argv[])
{
    const int OFFSET = 'A';

    if (argc != 2)
    {
        return 1;
    }
    if (strlen(argv[1]) != 26)
    {
        printf("Key must contain 26 characters.\n");
        return 1;
    }

    unsigned char *key = argv[1];
    int count[26] = {0}; // array of all zero

    // Check for any invalid character 
    for (int i = 0, n = strlen(key); i < n; i++)
    {
        if (!isalpha(key[i]))
        {
            printf("Key contains invalid characters.\n");
            return 1;
        }
        count[toupper(key[i]) - OFFSET]++;
    }
    // check for duplicate character of 
    for (int i = 0; i < 26; i++)
    {
        if (count[i] != 1) 
        {
            printf("Key shouldn't contain duplicate characters\n");
            return 1;
        }
    }
)
Slipover answered 13/1, 2021 at 10:38 Comment(4)
Note that for non-default locales, the function isalpha may return nonzero for characters that are not in the range 'a' to 'z' or 'A' to Z'. For example, when the locale is set to some European languages, isalpha will return nonzero for the character code for 'ü', which has the value 252 in the character set I am using. The function toupper will probably change this character to a 'Ü', which has the value 220 in the character set I am using. With this value, you will be accessing count out of bounds.Crystallo
It is also worth noting that the ISO C standard does not guarantee that the letters 'A' to 'Z' are stored contiguously. It only makes such a guarantee for the numbers '0' to '9'. Therefore, your program will work with character sets such as ASCII, which stores the letters contiguously, but it won't work with for example EBCDIC, which does not do this.Crystallo
@AndreasWenzel Thanks a lot for your comments!! I have edited the answer according to your suggestions.Slipover
I have now deleted my comments that you have addressed, because they are no longer relevant (because you have fixed the issues). I did not delete two comments, because they are still relevant. However, as I already stated in my deleted comments, these remaining issues are probably not worth fixing. It is probably safe to assume that your program will only be used in ASCII-compatible character encodings, so you can assume that 'A' to 'Z' is stored contiguously. The issue with the locale is probably also not worth fixing, if you are sure that you are using the default locale.Crystallo
A
1

You can use strchr() to detect if a given character is duplicated in the rest of the string:

#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    if (argc != 2) {
        printf("Usage: ./substitution key\n");
        return 1;
    } else {
        char *key = argv[1];
        int len = strlen(key]);
        if (len != 26) {
            printf("Key must contain 26 characters.\n");
            return 1;
        } else {
            for (int i = 0; i < len; i++) {
                unsigned char c = key[i];
                if (!isalpha(c)) {
                    printf("The key must contain only letters.\n");
                    return 1;
                }
                if (strchr(key + i + 1, c) {
                    printf("The key must not contain duplicate letters.\n");
                    return 1;
                }
            }
        }
    }
    return 0;
}

Alternately, you can use an array of indicators to perform the check in a single pass. This method is preferred if you want to ignore the character case.

#include <ctype.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    if (argc != 2) {
        printf("Usage: ./substitution key\n");
        return 1;
    } else {
        char *key = argv[1];
        unsigned char seen['Z' + 1 - 'A'] = { 0 };
        int i;
        for (i = 0; key[i] != '\0'; i++) {
            unsigned char c = key[i];
            if (!isalpha(c)) {
                printf("The key must contain only letters.\n");
                return 1;
            }
            if (seen[toupper(c) - 'A']++ != 0) {
                printf("The key must not contain duplicate letters.\n");
                return 1;
            }
        }
        if (i != 26) {
            printf("Key must contain all 26 letters.\n");
            return 1;
        }
    }
    return 0;
}
Armadillo answered 20/1, 2021 at 12:27 Comment(5)
This solution has a time complexity of O(n^2) whereas it can be solved in O(n), as this solution demonstrates.Crystallo
@AndreasWenzel: The time complexity is not really quadratic: the total number of instructions is bounded as the input string has exactly 26 characters: more elaborate solutions that may use fewer instructions are possible, but not necessarily worth the effort... Note that the question is somewhat ambiguous regarding case and the solution you are pointing to is not portable to systems where letters are not consecutive in the execution character set. On a system that uses EBCDIC this code has undefined behavior.Armadillo
Yes, you are right that the solution I pointed to does not work with EBCDIC (actually I had already pointed this out myself in a comment to that answer). However, this other solution would work with EBCDIC and also has a time complexity of O(n), but it requires 256 bytes of data that must be initialized to 0. Therefore, you are probably right that using a O(n) algorithm is probably not worth it, when n is limited to 26.Crystallo
@AndreasWenzel: added an alternate portable method with a single pass. Definitely better :)Armadillo
@AndreasWenzel: I use a small array, still making assumptions about the target character set, but I'm afraid the C Standard does not mandate that A be the smallest uppercase letter and Z the largest... the code works for ASCII and EBCDIC, but a full 256 byte array seems required for exotic character sets, which would not fit the OP's requirements anyway.Armadillo
A
1

Here is my case-insensitive approach using a nested for loop and a counter (just replace key with argv[1]):

#include <ctype.h>

    int duplicates = 0;
    for (int i = 0; i < length; i++)
    {
        if (duplicates > 0)
            break;

        int count = 0;
        char letter = toupper(key[i]);

        for (int j = 0; j < length; j++)
        {
            if (letter == toupper(key[j]))
            {
                count++;
                if (count > 1)
                    duplicates++;
            }
        }
    }
Appropriate answered 16/5 at 13:7 Comment(0)
P
0

Your can first sort the characters in the string so that any repeated letters will be right next to each other. So after sorting, scan through your sorted string with a variable holding your currently scanned character. If the next character you scan is the same as the last one that you scanned, you have a repeat character

Panjandrum answered 3/7, 2020 at 23:18 Comment(5)
"Characters have to be in alphabetical order." - What are you referring to? The first command line argument? Or the ASCII table?Crystallo
Sorting takes O(n log n) time but the problem in the question can be solved in O(n) time.Twowheeler
eejakobowski - Where does the OP say they have to be sorted in alphabetical order? He simply states that it the input must be alphabetical and nonrepeating.Lymphatic
Eric - Yes I saw your answer above, tried it myself, and am pleasantly surprised.Lymphatic
@Lymphatic I misread "all characters have to be alphabetical" as "output must be in alphabetical order." Regardless, this is still a valid solution and would require the least amount of workPanjandrum
B
0

You can consider to use strchr(). I get an example from https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_72/rtref/strchr.htm

#include <stdio.h>
#include <string.h>

#define SIZE 40

int main(void) {
  char buffer1[SIZE] = "computer program";
  char * ptr;
  int    ch = 'p';

  ptr = strchr(buffer1, ch);
  printf("The first occurrence of %c in '%s' is '%s'\n",
          ch, buffer1, ptr );

}

/*****************  Output should be similar to:  *****************
The first occurrence of p in 'computer program' is 'puter program'
*/
Burning answered 3/7, 2020 at 23:28 Comment(1)
Using strchr is wasteful; it takes O(n^2) time, but the problem in the question can be solved in O(n) time.Twowheeler
C
0

There is many ways to do it, but the one I like is to first sort the string and then go through it and delete or copy the string to another string/array and u can do that by

s = "bedezdbe"

after u sort it

s = "bbddeeez"

And to check it

if(s[i-1]!=s[i])
Crucial answered 3/7, 2020 at 23:30 Comment(2)
Sorting takes O(n log n) time but the problem in the question can be solved in O(n) time.Twowheeler
As I said "There is many ways to do it, but the one I like is"Crucial
L
0

I'm assuming you are not yet in Datastructures and Algorithms. If so I would do a binary insertion sort and when you find a repeat you just printf your error message and return a 1 like you have done for the rest of the errors.

That said, that might be over your head right now. So in that case, just go with a bruteforce method without regard for optimization.

  1. Start with the first character, loop through the rest of the string to see if it matches.

  2. If it matches, error message

  3. if there is no match, then repeat for the next character.

The code below checks if there are any repeating characters by this brute force method mentioned. You'll need to adapt it to your code of course.

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

bool itrepeats(char c, char *restofstring, int sublen){
    for(int j=0; j<sublen; j++){
        if(c==*restofstring) return true;
        restofstring++;
    }
    return false;
}

int main (int argc, char *argv[])
{
    int len = strlen(argv[1]);
    char *substr = argv[1];
    
    //loop through all characters
    for(int i=0;i<len;i++){
        //get a character
        char c = argv[1][i];
        //start of substring after c
        substr++;
        //length of that substring
        int substrlen = len-(i+1);
        //check for c in rest of string
        if(itrepeats(c, substr, substrlen)){
            printf("%c repeats, not allowed!\n",c);
            return 1;
        }
        else printf("all good!\n");
    }
}
Lymphatic answered 4/7, 2020 at 3:33 Comment(0)
C
0

I've Done that before so here is my solution to this problem: Every time we found the current argv[1]'s character in the string we gonna add 1 to the counter

in normal cases where the argv[1] is normal, counter should be 26 at the end.

but if there is a repeated character, counter should be more than 26 in the end.

int counter = 0;
for (int i = 0; i < len; i++)
        {
            char argu = argv[1][i];
            // make sure that the key doesnt accept duplicated characters in key.
            for (int k = 0; k < len; k++)
            {
                if (argu == argv[1][k])
                {
                    counter++;
                }
                if (counter > 26)
                {
                    printf("Opps! no duplicated characters!\n");
                    return 1;
                }
            }
        }
Cockrell answered 4/5, 2021 at 5:0 Comment(2)
The question states that no duplicated characters is the desired outcome. Therefore, it does not seem appropriate that to print "Oops" in that case, as that implies than an error occurred. Is "Opps" supposed to mean "Oops"?Crystallo
This solution is rather inefficient, as it requires a total of 676 (26^2) comparisons (time complexity of O(n^2)), whereas this solution only requires 26 comparisons (time complexity of O(n)).Crystallo
G
0
#include<stdio.h>
int main(){
char  string[3] = {'c','j','j'};
for(int i = 0;i<3;i++){
    for(int j= i+1;j<3;j++){
        if(string[i] == string[j]){
            printf("position %d char %c\n",i,string[i]);
        }
    }
}

}
Guinea answered 22/5, 2022 at 2:3 Comment(1)
Your answer would be of higher quality if you added some text to your answer, in order to explain it.Crystallo
M
0

I saw the answer by JulianErnest and felt that while I'm sure it's more efficient than my own take on it, we haven't yet learnt how to work with bool type functions so I gave it a shot using just what we had learnt so far in CS50. Please do let me know if I'm missing any edge cases and such:

int letters[26];
for (int i = 0; i < length; i++)
{
    char c = toupper(argv[1][i]);
    if (isalpha(c))
    {
        letters[c - 'A']++;
    }
}
int checker = 0;
for (int j = 0; j < 26; j++)
{
    if (letters[j] == 1)
    {
        checker++;
    }
}
if (checker != 26)
{
    printf("Key must contain each letter exactly once\n");
    return 1;
}

}

As per my understanding, I'm creating an array of 26 spaces and inserting a one in the appropriate array slot if that character is there in the key (if 'A' appears, letters[0] = 1; if 'B' appears, letters1 = 1 and so on...), then in the second loop, checking if there are 26 ones. If there are not, there are repeated letters so the Key is not in the correct format, and you exit out of main() with error 1.

I haven't added the part of the code where I am checking that there are exactly 26 characters within the key (using strlen), but that comes before this and that code combined with this should make sure that each letter appears exactly once. It's probably not the most efficient, but it's just using only what we know up to and including lecture 2.

Moonshiner answered 26/2 at 7:25 Comment(0)
B
0

Related with my cs50x Substitution problem, I used below method for detecting letters duplication.

int hasDuplicateLetters(string str)
{
    int letters[26] = {0}; // Initialize an array to store counts of each letter
    int length = strlen(str);
    for (int i = 0; i < length; i++)
    {
        if (isalpha(str[i]))
        {
            // Convert character to lowercase
            char lowercase_char = tolower(str[i]);
            // Map the character to an index in the array (a=0, b=1, ..., z=25)
            int index = lowercase_char - 'a';
            // If the letter has already been encountered, return true
            if (letters[index])
            {
                return 1;
            }
            // Otherwise, mark the letter as encountered
            letters[index] = 1;
        }
    }

    // No duplicates found
    return 0;
}
Brabazon answered 19/4 at 6:4 Comment(1)
If code has already checked that length of str is 26, no-more-no-less, AND that it is made up of only characters that pass isalpha(), there's no need to measure its length a second time OR to check that each value is a letter a second time. (All that's needed is to look for duplications within str...)Sedition

© 2022 - 2024 — McMap. All rights reserved.