Deleting consonants from a string in Python

Asked 2/5, 2015 at 3:25 Answered 2/5, 2015 at 4:1

Solved python string list python-3.x python-idle

Here is my code. I'm not exactly sure if I need a counter for this to work. The answer should be 'iiii'.

def eliminate_consonants(x):
        vowels= ['a','e','i','o','u']
        vowels_found = 0
        for char in x:
            if char == vowels:
                print(char)

eliminate_consonants('mississippi')

Freezer answered 2/5, 2015 at 3:25 Comment(0)

Correcting your code

The line if char == vowels: is wrong, because it compares a single character with the whole list, and that comparison is never true. It has to be if char in vowels:, which checks whether that particular character is present in the list of vowels. Apart from that, you need print(char,end = '') (in Python 3) to print the output iiii all on one line.

The final program looks like this:

def eliminate_consonants(x):
        vowels= ['a','e','i','o','u']
        for char in x:
            if char in vowels:
                print(char,end = "")

eliminate_consonants('mississippi')

And the output will be

iiii
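
To see why the original comparison never matched, here is a minimal sketch that runs both tests on a single character. It also shows a variant that returns the result instead of printing it. The name keep_vowels is hypothetical and not part of the original code.

```python
vowels = ['a', 'e', 'i', 'o', 'u']

# == compares the one-character string with the whole list: always False.
print('i' == vowels)   # False
# in checks whether the character is a member of the list.
print('i' in vowels)   # True


def keep_vowels(x):
    """Hypothetical variant: return the vowels instead of printing them."""
    result = ''
    for char in x:
        if char in vowels:
            result += char
    return result


print(keep_vowels('mississippi'))  # iiii
```

Returning the string makes the function easier to reuse and test than printing inside the loop.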

Other ways include

Using in with a string
```
def eliminate_consonants(x):
    for char in x:
        if char in 'aeiou':
            print(char,end = "")
```
As simple as it looks, the statement if char in 'aeiou' checks if char is present in the string aeiou.
A list comprehension
```
''.join([c for c in x if c in 'aeiou'])
```
This list comprehension builds a list containing only the characters of x that are in aeiou, and ''.join then concatenates them into a single string.
A generator expression
```
''.join(c for c in x if c in 'aeiou')
```
This generator expression returns a generator that yields only the characters that are in aeiou.
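
As a sketch, the two one-liners above can be wrapped in functions so they can be called on any string. The function names vowels_listcomp and vowels_genexp are hypothetical.

```python
def vowels_listcomp(x):
    # Builds the complete list first, then joins it.
    return ''.join([c for c in x if c in 'aeiou'])


def vowels_genexp(x):
    # Passes a generator to join; no list is written out explicitly.
    return ''.join(c for c in x if c in 'aeiou')


print(vowels_listcomp('mississippi'))  # iiii
print(vowels_genexp('mississippi'))    # iiii
```

Both give the same result. Which one is faster depends on the interpreter, so it is worth measuring on your own setup.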
Regular Expressions

You can use re.findall to extract only the vowels from your string (remember to import re first). The code
```
re.findall(r'[aeiou]',"mississippi")
```
returns a list of the vowels found in the string, i.e. ['i', 'i', 'i', 'i']. Joining that list with str.join gives the final string:
```
''.join(re.findall(r'[aeiou]',"mississippi"))
```
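
A self-contained sketch of the regular expression approach, including the import. The names VOWELS and vowels_regex are hypothetical, and the re.sub line is an alternative not mentioned above that deletes everything that is not a vowel.

```python
import re

# Compiled once so the pattern is reused on every call.
VOWELS = re.compile(r'[aeiou]')


def vowels_regex(x):
    # findall returns the matches as a list; join glues them together.
    return ''.join(VOWELS.findall(x))


print(vowels_regex('mississippi'))             # iiii
# Alternative: replace every non-vowel with the empty string.
print(re.sub(r'[^aeiou]', '', 'mississippi'))  # iiii
```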
str.translate and maketrans

For this technique you need a map that matches each of the non-vowels to None. For this you can use string.ascii_lowercase (import string first). The code to make the map is
```
str.maketrans({i:None for i in string.ascii_lowercase if i not in "aeiou"})
```
This returns the mapping. Store it in a variable (here m, for map) and pass it to str.translate:
```
"mississippi".translate(m)
```
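This removes every lowercase ASCII letter other than aeiou from the string. Characters that are not in the map (uppercase letters, digits, punctuation) are left untouched.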
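
Putting the pieces together, a minimal runnable sketch of the str.maketrans approach with the import it needs. The second call is an added illustration of which characters pass through.

```python
import string

# Map each lowercase ASCII consonant to None so translate() deletes it.
m = str.maketrans({c: None for c in string.ascii_lowercase if c not in 'aeiou'})

print('mississippi'.translate(m))   # iiii
# Characters outside string.ascii_lowercase are not in the map and are kept.
print('Mississippi!'.translate(m))  # Miiii!
```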
Using dict.fromkeys

You can use dict.fromkeys along with sys.maxunicode to map every code point except the vowels to None. But remember to import sys first!
```
m = dict.fromkeys(i for i in range(sys.maxunicode+1) if chr(i) not in 'aeiou')
```
and now use str.translate.
```
'mississippi'.translate(m)
```
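
A runnable sketch of the dict.fromkeys variant. Unlike the ascii_lowercase map, this one deletes every character that is not a lowercase vowel, as the second call illustrates.

```python
import sys

# Every code point except the lowercase vowels maps to None (deleted).
# This dict has over a million entries, so build it once and reuse it.
m = dict.fromkeys(i for i in range(sys.maxunicode + 1) if chr(i) not in 'aeiou')

print('mississippi'.translate(m))   # iiii
print('Mississippi!'.translate(m))  # iiii
```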
Using bytearray

As mentioned by J.F.Sebastian in the comments below, you can create a bytearray of every byte value except the lowercase vowels by using
```
non_vowels = bytearray(set(range(0x100)) - set(b'aeiou'))
```
Using this we can translate the word:
```
'mississippi'.encode('ascii', 'ignore').translate(None, non_vowels)
```
which will return b'iiii'. This can easily be converted to str by using decode i.e. b'iiii'.decode("ascii").
Using bytes

bytes returns a bytes object, which is the immutable version of bytearray. (This usage is Python 3 specific.)
```
non_vowels = bytes(set(range(0x100)) - set(b'aeiou'))
```
Using this we can translate the word:
```
'mississippi'.encode('ascii', 'ignore').translate(None, non_vowels)
```
which will return b'iiii'. This can easily be converted to str by using decode i.e. b'iiii'.decode("ascii").
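
The encode, translate and decode steps can be sketched as one helper for Python 3. The function name vowels_bytes is hypothetical.

```python
# All 256 byte values except the lowercase vowels; these are the bytes to delete.
non_vowels = bytes(set(range(0x100)) - set(b'aeiou'))


def vowels_bytes(text):
    # errors='ignore' silently drops any non-ASCII characters while encoding.
    data = text.encode('ascii', 'ignore')
    # table=None means no remapping; the second argument lists bytes to delete.
    return data.translate(None, non_vowels).decode('ascii')


print(vowels_bytes('mississippi'))  # iiii
```

Swapping bytes for bytearray in the first line gives the bytearray variant with the same result.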

Timing comparison

Python 3

python3 -m timeit -s "text = 'mississippi'*100; non_vowels = bytes(set(range(0x100)) - set(b'aeiou'))" "text.encode('ascii', 'ignore').translate(None, non_vowels).decode('ascii')"
100000 loops, best of 3: 2.88 usec per loop
python3 -m timeit -s "text = 'mississippi'*100; non_vowels = bytearray(set(range(0x100)) - set(b'aeiou'))" "text.encode('ascii', 'ignore').translate(None, non_vowels).decode('ascii')"
100000 loops, best of 3: 3.06 usec per loop
python3 -m timeit -s "text = 'mississippi'*100;d=dict.fromkeys(i for i in range(127) if chr(i) not in 'aeiou')" "text.translate(d)"
10000 loops, best of 3: 71.3 usec per loop
python3 -m timeit -s "import string; import sys; text='mississippi'*100; m = dict.fromkeys(i for i in range(sys.maxunicode+1) if chr(i) not in 'aeiou')" "text.translate(m)"
10000 loops, best of 3: 71.6 usec per loop
python3 -m timeit -s "text = 'mississippi'*100" "''.join(c for c in text if c in 'aeiou')"
10000 loops, best of 3: 60.1 usec per loop
python3 -m timeit -s "text = 'mississippi'*100" "''.join([c for c in text if c in 'aeiou'])"
10000 loops, best of 3: 53.2 usec per loop
python3 -m timeit -s "import re;text = 'mississippi'*100; p=re.compile(r'[aeiou]')" "''.join(p.findall(text))"
10000 loops, best of 3: 57 usec per loop

The timings in sorted order (usec per loop)

translate (bytes)    |  2.88
translate (bytearray)|  3.06
List Comprehension   | 53.2
Regular expressions  | 57.0
Generator exp        | 60.1
dict.fromkeys        | 71.3
translate (unicode)  | 71.6

As you can see, the final method using bytes is the fastest in this Python 3 run.

Python 3.5

python3.5 -m timeit -s "text = 'mississippi'*100; non_vowels = bytes(set(range(0x100)) - set(b'aeiou'))" "text.encode('ascii', 'ignore').translate(None, non_vowels).decode('ascii')"
100000 loops, best of 3: 4.17 usec per loop
python3.5 -m timeit -s "text = 'mississippi'*100; non_vowels = bytearray(set(range(0x100)) - set(b'aeiou'))" "text.encode('ascii', 'ignore').translate(None, non_vowels).decode('ascii')"
100000 loops, best of 3: 4.21 usec per loop
python3.5 -m timeit -s "text = 'mississippi'*100;d=dict.fromkeys(i for i in range(127) if chr(i) not in 'aeiou')" "text.translate(d)"
100000 loops, best of 3: 2.39 usec per loop
python3.5 -m timeit -s "import string; import sys; text='mississippi'*100; m = dict.fromkeys(i for i in range(sys.maxunicode+1) if chr(i) not in 'aeiou')" "text.translate(m)"
100000 loops, best of 3: 2.33 usec per loop
python3.5 -m timeit -s "text = 'mississippi'*100" "''.join(c for c in text if c in 'aeiou')"
10000 loops, best of 3: 97.1 usec per loop
python3.5 -m timeit -s "text = 'mississippi'*100" "''.join([c for c in text if c in 'aeiou'])"
10000 loops, best of 3: 86.6 usec per loop
python3.5 -m timeit -s "import re;text = 'mississippi'*100; p=re.compile(r'[aeiou]')" "''.join(p.findall(text))"
10000 loops, best of 3: 74.3 usec per loop

The timings in sorted order (usec per loop)

translate (unicode)  |  2.33
dict.fromkeys        |  2.39
translate (bytes)    |  4.17
translate (bytearray)|  4.21
Regular expressions  | 74.3
List Comprehension   | 86.6
Generator exp        | 97.1
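
The measurements above were taken from the command line. As a rough sketch, a similar comparison can be run from inside Python with the timeit module. The number and repeat values here are arbitrary assumptions, and the results will differ by machine and Python version.

```python
import timeit

setup = "text = 'mississippi' * 100"
statements = {
    'list comprehension': "''.join([c for c in text if c in 'aeiou'])",
    'generator expression': "''.join(c for c in text if c in 'aeiou')",
}

number = 1000  # executions per timing run (arbitrary choice)
for label, stmt in statements.items():
    # repeat() returns the total seconds for `number` executions, once per repeat.
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=3))
    print(f'{label}: {best / number * 1e6:.1f} usec per loop')
```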

Lebbie answered 2/5, 2015 at 3:26 Comment(14)

Thank you! :) Appreciate the fast response. :) :) – Freezer 2/5, 2015 at 3:28

Can you please let me know how to print the output all on one line in Python 2.7, as print(char,end = "") only seems to work in Python 3? Thanks. – Cami 2/5, 2015 at 3:53

@JoeR print char, in py2 (Note the trailing comma) – Lebbie 2/5, 2015 at 3:54

What does vowels_found = 0 mean ? – Esprit 2/5, 2015 at 4:25

If you need speed; work with bytes and call bytestring.translate(None, non_vowels) – Pussyfoot 15/7, 2015 at 16:10

.translate() for Unicode string is very slow (compared to bytes version). See Best way to strip punctuation from a string in Python – Pussyfoot 15/7, 2015 at 17:15

@BhargavRao: bytes-version is 10x faster than genexpr: 70.3us vs. 4.75us: (b'mississippi'*100).translate(None, non_vowels) where non_vowels = bytearray(set(range(0x100)) - set(b'aeiou')). Or if input is Unicode: text.encode('ascii', 'ignore').translate(None, non_vowels) (it is also 10x faster than ''.join([c for c in text if c in 'aeiou'])) where text = 'mississippi'*100. – Pussyfoot 15/7, 2015 at 18:34

1. str.maketrans and dict.fromkeys could be moved to the setup too. 2. To support arbitrary input, you should use range(sys.maxunicode+1) instead of range(127) there 3. On CPython: ''.join([...]) (listcomp) is faster than ''.join(...) (genexpr). – Pussyfoot 15/7, 2015 at 19:22

@BhargavRao: What is your OS, Python version? I don't remember seeing that ''.join(genexpr) being faster than ''.join(listcomp). ideone confirms it. – Pussyfoot 15/7, 2015 at 20:15

note: I've used bytearray() rather than bytes() to make the code being able to run on both Python 2 and 3 (from the same source) – Pussyfoot 15/7, 2015 at 21:52

(it is not the end ;)) Note: Python 3.5 greatly improves dict.fromkeys unicode .translate() case (same performance as bytes case for the input in the answer). – Pussyfoot 16/7, 2015 at 12:6

Yep, certainly. Let's improve the answer to contain 3.5 specs too. But for that I have to read WNIP3.5 before :/. Actually @J.F. I'm waiting for my current project to get over and completely remove 3.4. and move over to 3.5. – Lebbie 16/7, 2015 at 12:9

@J.F. Added the Python3.5 results. The table has just overturned. – Lebbie 27/11, 2015 at 23:38

As the time complexity of list search is O(n) while set search is O(1), among the "simple" solutions I would rather search within a set: vowels_s = set(['a', 'e', 'i', 'o', 'u']) followed by if char in vowels_s etc. – Deery 19/11, 2021 at 6:47
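
The set-based lookup suggested in the last comment could look like the sketch below. The names VOWELS and vowels_set are hypothetical. With only five vowels the gain over a list or string is likely small, so treat this as something to measure rather than a guaranteed speedup.

```python
# frozenset gives average O(1) membership tests and cannot be modified by accident.
VOWELS = frozenset('aeiou')


def vowels_set(x):
    return ''.join([c for c in x if c in VOWELS])


print(vowels_set('mississippi'))  # iiii
```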

You can try a Pythonic way like this:

In [1]: s = 'mississippi'
In [3]: [char for char in s if char in 'aeiou']
Out[3]: ['i', 'i', 'i', 'i']

As a function:

In [4]: def eliminate_consonants(x):
   ...:     return ''.join(char for char in x if char in 'aeiou')
   ...: 

In [5]: print(eliminate_consonants('mississippi'))
iiii

Hardi answered 2/5, 2015 at 4:1 Comment(4)

Errr! I rather prefer return ''.join([char for char in x if char in 'aeiou']). Direct and easy to understand :) – Lebbie 2/5, 2015 at 4:8

@BhargavRao Building your intermediate list is both unnecessary and costly. You should remove those []. Another alternative is ''.join(filter('aeiou'.__contains__, s)). I tested a 10 million lowercase letters string, Bhargav's way took 2.16 seconds, mine took 1.47 seconds, Bhargav's without creating that intermediate list took 1.25 seconds. – Gastrin 2/5, 2015 at 5:20

@StefanPochmann Thanks Buddy for that info. I will update my answer. – Lebbie 2/5, 2015 at 5:23

@BhargavRao The update there was good as well, but what I meant was your comment here. Just use ''.join(char ... 'aeiou') without those []. – Gastrin 2/5, 2015 at 5:31
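
The filter alternative mentioned in the comments above can be sketched as follows. The function name vowels_filter is hypothetical.

```python
def vowels_filter(s):
    # 'aeiou'.__contains__(c) is the bound-method form of the test: c in 'aeiou'
    return ''.join(filter('aeiou'.__contains__, s))


print(vowels_filter('mississippi'))  # iiii
```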

== tests for equality, so char == vowels compares a single character with the entire list and is never true. What you want is to check whether each character of the string is in your list vowels. To do that, you can simply use in, as below.

Additionally, I see you have a vowels_found variable but are not using it. Below is one example of how you can solve this:

def eliminate_consonants(x):
    vowels= ['a','e','i','o','u']
    vowels_found = 0
    for char in x:
        if char in vowels:
            print(char)
            vowels_found += 1

    print("There are", vowels_found, "vowels in", x)

eliminate_consonants('mississippi')

Your output would then be:

i
i
i
i
There are 4 vowels in mississippi
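
If both the vowels and their count are needed elsewhere in a program, one possible variation returns them instead of printing inside the loop. This is a sketch, and the name vowels_and_count is hypothetical.

```python
def vowels_and_count(x):
    """Hypothetical helper: return the vowels of x and how many there are."""
    found = ''.join(c for c in x if c in 'aeiou')
    return found, len(found)


found, count = vowels_and_count('mississippi')
print(found)  # iiii
print("There are", count, "vowels in", 'mississippi')
```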

Septarium answered 2/5, 2015 at 3:31 Comment(1)

At best, this is a comment, not an answer. – Fortin 2/5, 2015 at 3:31
