Finding the most frequent character in a string
B

19

12

I found this programming problem while looking at a job posting on SO. I thought it was pretty interesting, and as a beginner Python programmer I attempted to tackle it. However, I feel my solution is quite messy. Can anyone suggest ways to optimize it or make it cleaner? I know it's pretty trivial, but I had fun writing it. Note: Python 2.6

The problem:

Write pseudo-code (or actual code) for a function that takes in a string and returns the letter that appears the most in that string.

My attempt:

import string

def find_max_letter_count(word):

    alphabet = string.ascii_lowercase
    dictionary = {}

    for letters in alphabet:
        dictionary[letters] = 0

    for letters in word:
        dictionary[letters] += 1

    dictionary = sorted(dictionary.items(), 
                        reverse=True, 
                        key=lambda x: x[1])

    for position in range(0, 26):
        print dictionary[position]
        if position != len(dictionary) - 1:
            if dictionary[position + 1][1] < dictionary[position][1]:
                break

find_max_letter_count("helloworld")

Output:

>>> 
('l', 3)

Updated example:

find_max_letter_count("balloon") 
>>>
('l', 2)
('o', 2)
Blockus answered 9/11, 2010 at 6:43 Comment(3)
Incidental note: you should read PEP 8, which documents the recommended Python coding style. Methods should be in snake_case rather than mixedCase.Diedra
possible duplicate of How to find most common elements of a list?Bondmaid
possible duplicate of Python most common element in a listRocaille
R
37

There are many ways to do this shorter. For example, you can use the Counter class (in Python 2.7 or later):

import collections
s = "helloworld"
print(collections.Counter(s).most_common(1)[0])

If you don't have that, you can do the tally manually (2.5 or later has defaultdict):

d = collections.defaultdict(int)
for c in s:
    d[c] += 1
print(sorted(d.items(), key=lambda x: x[1], reverse=True)[0])

Having said that, there's nothing too terribly wrong with your implementation.

Rhodarhodamine answered 9/11, 2010 at 6:54 Comment(2)
.most_common()....Bondmaid
Thanks for your answer (you too Chris Morgan), but I guess I forgot to mention that if multiple characters are the most frequent, they should all be output. (ex. 'abcdefg' outputs a = 1, b = 1, etc.) I thought this was the trickiest part, hence the mess at the end. I've edited the question.Blockus
G
5

If you are using Python 2.7, you can do this quickly with the collections module, which provides high-performance container data types. Read more at http://docs.python.org/library/collections.html#counter-objects

>>> from collections import Counter
>>> x = Counter("balloon")
>>> x
Counter({'o': 2, 'a': 1, 'b': 1, 'l': 2, 'n': 1})
>>> x['o']
2
Ghiselin answered 9/11, 2010 at 8:5 Comment(0)
A
2

Here is a way to find the most common character using a dictionary:

message = "hello world"
d = {}
letters = set(message)
for l in letters:
    d[message.count(l)] = l

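# Dict key order is not guaranteed to be sorted, so pick the highest count explicitly.
highest = max(d)
print("%s %d" % (d[highest], highest))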
Athirst answered 16/11, 2013 at 23:40 Comment(0)
R
2

Here's a way using a for loop and count():

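w = input()
r = 0   # highest count seen so far (starting at 0 so single occurrences count too)
s = ''  # character with that count; stays empty for empty input
for i in w:
    p = w.count(i)
    if p > r:
        r = p
        s = i
print(s)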
Reconnoiter answered 6/11, 2020 at 15:26 Comment(0)
C
2

The way I did it does the counting with only for-loops and if-statements, without helpers such as count() or Counter.

def most_common_letter():
    string = str(input())
    letters = set(string)
    if " " in letters:         # If you want to count spaces too, ignore this if-statement
        letters.remove(" ")
    max_count = 0
    freq_letter = []
    for letter in letters:
        count = 0
        for char in string:
            if char == letter:
                count += 1
        if count == max_count:
            max_count = count
            freq_letter.append(letter)
        if count > max_count:
            max_count = count
            freq_letter.clear()
            freq_letter.append(letter)
    return freq_letter, max_count

This ensures you get every letter/character that gets used the most, and not just one. It also returns how often it occurs. Hope this helps :)

Consuetudinary answered 25/1, 2021 at 11:53 Comment(0)
C
1

If you want to have all the characters with the maximum number of counts, then you can do a variation on one of the two ideas proposed so far:

import heapq  # Helps finding the n largest counts
import collections

def find_max_counts(sequence):
    """
    Returns an iterator that produces the (element, count)s with the
    highest number of occurrences in the given sequence.

    In addition, the elements are sorted.
    """

    if len(sequence) == 0:
        return  # Ends the generator; "raise StopIteration" fails on Python 3.7+ (PEP 479)

    counter = collections.defaultdict(int)
    for elmt in sequence:
        counter[elmt] += 1

    counts_heap = [
        (-count, elmt)  # The largest elmt counts are the smallest elmts
        for (elmt, count) in counter.items()]  # items() works on Python 2 and 3

    heapq.heapify(counts_heap)

    highest_count = counts_heap[0][0]

    while counts_heap:

        (opp_count, elmt) = heapq.heappop(counts_heap)

        if opp_count != highest_count:
            return

        yield (elmt, -opp_count)

for (letter, count) in find_max_counts('balloon'):
    print (letter, count)

for (word, count) in find_max_counts(['he', 'lkj', 'he', 'll', 'll']):
    print (word, count)

This yields, for instance:

lebigot@weinberg /tmp % python count.py
('l', 2)
('o', 2)
('he', 2)
('ll', 2)

This works with any sequence: words, but also ['hello', 'hello', 'bonjour'], for instance.

The heapq structure is very efficient at finding the smallest elements of a sequence without sorting it completely. On the other hand, since there are not that many letters in the alphabet, you can probably also run through the sorted list of counts until the maximum count is no longer found, without incurring any serious speed loss.
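
The alternative just described (walking the sorted counts until the count drops below the maximum) could be sketched as follows. This is only an illustration: the name max_counts_sorted is hypothetical, and itertools.takewhile is one convenient way to stop at the first lower count, not something the approach requires.

```python
import collections
import itertools

def max_counts_sorted(sequence):
    """Return the (element, count) pairs that share the highest count."""
    counter = collections.defaultdict(int)
    for elmt in sequence:
        counter[elmt] += 1

    # Highest counts first; ties are ordered by element.
    pairs = sorted(counter.items(), key=lambda pair: (-pair[1], pair[0]))
    if not pairs:
        return []

    highest_count = pairs[0][1]
    # Stop at the first pair whose count is below the maximum.
    return list(itertools.takewhile(lambda pair: pair[1] == highest_count, pairs))

print(max_counts_sorted('balloon'))  # [('l', 2), ('o', 2)]
```

Unlike the heap version, this sorts all the counts, which should be fine when the number of distinct elements is small.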

Chippewa answered 9/11, 2010 at 8:8 Comment(0)
T
1
def most_frequent(text):
    frequencies = [(c, text.count(c)) for c in set(text)]
    return max(frequencies, key=lambda x: x[1])[0]

s = 'ABBCCCDDDD'
print(most_frequent(s))

frequencies is a list of (character, count) tuples. We apply max to the tuples, comparing them by count, and return the winning tuple's character. In the event of a tie, this solution will pick only one.
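
If every tied character is wanted, one possible variation keeps the same (character, count) list and returns all characters whose count equals the maximum. This is a sketch; the name most_frequent_all is hypothetical.

```python
def most_frequent_all(text):
    # Hypothetical variant of most_frequent that keeps ties.
    frequencies = [(c, text.count(c)) for c in set(text)]
    if not frequencies:
        return []
    highest = max(count for _, count in frequencies)
    # sorted() only makes the result order predictable; set order is arbitrary.
    return sorted(c for c, count in frequencies if count == highest)

print(most_frequent_all('ABBCCCDDDD'))  # ['D']
print(most_frequent_all('balloon'))     # ['l', 'o']
```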

Tsosie answered 18/11, 2013 at 21:11 Comment(0)
T
1

Question: find the most frequent (maximum occurring) character in an input string.

Method 1 :

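a = "GiniGinaProtijayi"

d = {}
for ch in a:
    d[ch] = d.get(ch, 0) + 1

# Sort the (character, count) pairs by count, highest first, and take the first pair.
# max_count avoids shadowing the built-in max().
chh, max_count = sorted(d.items(), reverse=True, key=lambda kv: kv[1])[0]

print(chh)
print(max_count)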

Method 2 :

a = "GiniGinaProtijayi"

max = 0 
chh = ''
count = [0] * 256 
for ch in a : count[ord(ch)] += 1
for ch in a :
    if(count[ord(ch)] > max):
        max = count[ord(ch)] 
        chh = ch
        
print(chh)        

Method 3 :

import collections

line ='North Calcutta Shyambazaar Soudipta Tabu  Roopa Roopi Gina Gini Protijayi  Sovabazaar Paikpara  Baghbazaar  Roopa'

bb = collections.Counter(line).most_common(1)[0][0]
print(bb)

Method 4 :

line =' North Calcutta Shyambazaar Soudipta Tabu  Roopa Roopi Gina Gini Protijayi  Sovabazaar Paikpara  Baghbazaar  Roopa'


def mostcommonletter(sentence):
    letters = list(sentence)
    return (max(set(letters),key = letters.count))


print(mostcommonletter(line))    

    
Timecard answered 16/6, 2018 at 8:13 Comment(0)
N
1

I noticed that most of the answers return only one item even when several characters tie for most common. Take "iii 444 yyy 999", for example: it has an equal number of spaces, i's, 4's, y's, and 9's. The solution should return all of them, not just the letter i:

from collections import Counter

sentence = "iii 444 yyy 999"

# The count of the first item in Counter().most_common(),
# i.e. the largest count
largest_count: int = Counter(sentence).most_common()[0][1]

# Keep every (character, count) pair whose count equals the largest count
most_common_list: list = [(x, y)
                         for x, y in Counter(sentence).items() if y == largest_count]

print(most_common_list)

# RETURNS
[('i', 3), (' ', 3), ('4', 3), ('y', 3), ('9', 3)]
Noelyn answered 23/12, 2018 at 3:45 Comment(0)
D
0

Here are a few things I'd do:

  • Use collections.defaultdict instead of the dict you initialise manually.
  • Use built-in functions such as sorted and max instead of working it out yourself; it's easier.

Here's my final result:

from collections import defaultdict

def find_max_letter_count(word):
    matches = defaultdict(int)  # makes the default value 0

    for char in word:
        matches[char] += 1

    return max(matches.iteritems(), key=lambda x: x[1])

find_max_letter_count('helloworld') == ('l', 3)
Diedra answered 9/11, 2010 at 6:54 Comment(2)
Nitpicking: letters would be more correct as letter, since it's a variable that contain exactly one letter.Chippewa
@EOL: true; I didn't rename that variable from what he had - I'd put it as char myself, I think, as it's not just a letter...Diedra
E
0

If you could not use collections for any reason, I would suggest the following implementation:

s = input()
d = {}

# Iterate through the string; if a character is already in the dict,
# increment its counter, otherwise start its counter at 1.
for ch in s:
    if ch in d:
        d[ch] += 1
    else:
        d[ch] = 1

# If an empty string was given, max() falls back to the default,
# so a message saying so is printed instead.
print(max(d, key=d.get, default='Empty string was given.'))
Eury answered 17/6, 2021 at 15:42 Comment(0)
C
0
sentence = "This is a great question made me wanna watch matrix again!"

char_frequency = {}

for char in sentence:
    if char == " ": #to skip spaces
        continue
    elif char in char_frequency:
        char_frequency[char] += 1 
    else:
        char_frequency[char] = 1


char_frequency_sorted = sorted(
    char_frequency.items(), key=lambda ky: ky[1], reverse=True
)
print(char_frequency_sorted[0]) #output -->('a', 9)
Cooe answered 14/9, 2021 at 1:21 Comment(0)
R
0
# return the letter with the max frequency.

def maxletter(word:str) -> tuple:
    ''' return the letter with the max occurrence '''
    v = 1
    dic = {}
    for letter in word:
        if letter in dic:
            dic[letter] += 1
        else:
            dic[letter] = v

    for k in dic:
        if dic[k] == max(dic.values()):
            return k, dic[k]

l, n = maxletter("Hello World")
print(l, n)

output: l 3

Romantic answered 11/7, 2022 at 16:2 Comment(0)
P
0

You may also try something like the following:

from pprint import pprint
sentence = "this is a common interview question"

char_frequency = {}
for char in sentence:
    if char in char_frequency:
        char_frequency[char] += 1
    else:
        char_frequency[char] = 1
pprint(char_frequency, width = 1)
out = sorted(char_frequency.items(),
             key = lambda kv : kv[1], reverse = True)
print(out)
print(out[0])
Panther answered 13/8, 2022 at 15:9 Comment(0)
H
0

statistics.mode(data) Return the single most common data point from discrete or nominal data. The mode (when it exists) is the most typical value and serves as a measure of central location.

If there are multiple modes with the same frequency, returns the first one encountered in the data. If the smallest or largest of those is desired instead, use min(multimode(data)) or max(multimode(data)). If the input data is empty, StatisticsError is raised.

import statistics as stat

test = 'This is a test of the fantastic mode super special function ssssssssssssss'
test2 = ['block', 'cheese', 'block']
val = stat.mode(test)
val2 = stat.mode(test2)
print(val, val2)

mode assumes discrete data and returns a single value. This is the standard treatment of the mode as commonly taught in schools:

mode([1, 1, 2, 3, 3, 3, 3, 4])
3

The mode is unique in that it is the only statistic in this package that also applies to nominal (non-numeric) data:

mode(["red", "blue", "blue", "red", "green", "red", "red"])
'red'
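
For ties, the multimode function mentioned above (in the statistics module from Python 3.8 onward) returns every mode rather than only the first. A small sketch, using 'balloon' as a made-up sample input:

```python
import statistics as stat

# multimode returns a list of all most common values,
# in the order they were first encountered in the data.
modes = stat.multimode('balloon')
print(modes)       # ['l', 'o']
print(min(modes))  # l

# Unlike mode, multimode returns an empty list for empty input.
print(stat.multimode(''))  # []
```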
Halting answered 3/12, 2022 at 17:20 Comment(0)
K
0

Here is how I solved it, considering the possibility of multiple most frequent chars:

sentence = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, \
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut \
enim."

joint_sentence = sentence.replace(" ", "")
frequencies = {}
for letter in joint_sentence:
    frequencies[letter] = frequencies.get(letter, 0) +1

biggest_frequency = frequencies[max(frequencies, key=frequencies.get)]
most_frequent_letters = {key: value for key, value in frequencies.items() if value == biggest_frequency}
print(most_frequent_letters)

Output:

{'e': 12, 'i': 12}
Kerek answered 18/1, 2023 at 10:9 Comment(0)
A
0

# from Coding with Mosh
from pprint import pprint
sentence = "Hello World"

char_frequency = {}
for char in sentence:
    if char in char_frequency:
        char_frequency[char] += 1
    else:
        char_frequency[char] = 1

char_frequency_sorted = sorted(
    char_frequency.items(),
    key=lambda kv: kv[1],
    reverse=True)
print(char_frequency_sorted[0])
Aristocratic answered 4/10, 2023 at 11:30 Comment(1)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Gilliam
G
-1
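from collections import Counter

# file: path of the file to read
# quant: number of most frequent characters you want

def frequent_letters(file, quant):
    # Read the whole file, closing it automatically afterwards
    with open(file) as f:
        text = f.read()
    return Counter(text).most_common(quant)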
Gombosi answered 12/10, 2017 at 10:12 Comment(3)
Thank you for this code snippet, which might provide some limited, immediate help. A proper explanation would greatly improve its long-term value by showing why this is a good solution to the problem, and would make it more useful to future readers with other, similar questions. Please edit your answer to add some explanation, including the assumptions you've made. Specifically, where did Counter come from?Dunkle
Counter has to be imported it is by using the command 'from collections import Counter'Gombosi
Please edit your answer to show the additional information, rather than writing it as a comment. Comments can disappear without trace, so it really needs to be part of your answer. Thank you.Dunkle
S
-1
# This code prints all characters in a string which have the highest frequency

def find(text):

    # Each unique character is stored with its count as [count, character].
    # Sorting orders the pairs by count (ascending), then by character.
    # Eg: for "pradeep", y = [[1,'a'],[1,'d'],[1,'r'],[2,'e'],[2,'p']]
    y = sorted([[text.count(i), i] for i in set(text)])

    # The last pair holds the highest count, ie, most_freq = 2
    most_freq = y[-1][0]

    # Collect every pair whose count equals the highest count.
    # Eg: for "pradeep", x = [[2,'e'],[2,'p']]
    x = []
    for pair in y:
        if pair[0] == most_freq:
            x.append(pair)
    return x

print(find("pradeep"))
Sortition answered 17/8, 2021 at 17:18 Comment(1)
Can you please provide some explanation to this code, and explain how is it better/worse than the other solutions?Bastinado
