Generating all possible combinations of characters in a string
S

6

11

Say I have a string list:

li = ['a', 'b', 'c']

I would like to construct a new list such that each entry of the new list is a concatenation of a selection of 3 entries in the original list. Note that each entry can be chosen repeatedly:

new_li=['abc', 'acb', 'bac', 'bca', 'cab', 'cba', 'aab', 'aac',....'aaa', 'bbb', 'ccc']

The brute-force way is to construct a 3-fold nested for loop and insert each 3-combination into the new list. Is there a more Pythonic way to do this? Thanks.

Update: Later I will convert the new list into a set, so the order does not matter anyway.

Salema answered 31/8, 2017 at 21:38 Comment(3)
Does it have to be random? How long should the list be?Threadgill
@Threadgill Please see the updates. The length of the list should be 3^n, where n is the number of entries in the original listSalema
the length should be n^n not 3^nVelar
D
18

This looks like a job for itertools.product.

import itertools

def foo(l):
     yield from itertools.product(*([l] * 3)) 

for x in foo('abc'):
     print(''.join(x))

aaa
aab
aac
aba
abb
abc
aca
acb
acc
baa
bab
bac
bba
bbb
bbc
bca
bcb
bcc
caa
cab
cac
cba
cbb
cbc
cca
ccb
ccc

yield from is available from Python 3.3 onward. For older versions, yield within a loop:

def foo(l):
     for i in itertools.product(*([l] * 3)) :
         yield i
Dorladorlisa answered 31/8, 2017 at 21:41 Comment(2)
Thanks. What if each entry in the original list is not a single character, like li=['a1','b2','c3']?Salema
@James It works exactly the same way. You could just try it.Dorladorlisa
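To illustrate the reply above, here is a minimal sketch using the multi-character list from the comment. The variable name combos is only for illustration; the product call is the same one used in the answer.

```python
from itertools import product

# Input taken from the comment: entries longer than one character.
li = ['a1', 'b2', 'c3']

# product() treats each list entry as one item, so join() simply
# concatenates the multi-character entries.
combos = [''.join(t) for t in product(*([li] * 3))]

print(len(combos))  # 27
print(combos[:3])   # ['a1a1a1', 'a1a1b2', 'a1a1c3']
```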
C
14

The best way to get all combinations (strictly speaking, the Cartesian product of the list with itself) is to use itertools.product, passing the len of your iterable as the repeat argument (that's where it differs from the other answer):

from itertools import product
li = ['a', 'b', 'c']
for comb in product(li, repeat=len(li)):
    print(''.join(comb))

or if you want the result as list:

>>> combs = [''.join(comb) for comb in product(li, repeat=len(li))]
>>> combs
['aaa', 'aab', 'aac', 'aba', 'abb', 'abc', 'aca', 'acb', 'acc', 'baa', 
 'bab', 'bac', 'bba', 'bbb', 'bbc', 'bca', 'bcb', 'bcc', 'caa', 'cab', 
 'cac', 'cba', 'cbb', 'cbc', 'cca', 'ccb', 'ccc']

It's a bit cleaner to use the repeat argument than to multiply and unpack the list you have manually.
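As a quick check of that claim, this sketch compares the two spellings and then builds the set mentioned in the question's update. The names via_repeat, via_unpack, and unique are illustrative only.

```python
from itertools import product

li = ['a', 'b', 'c']

via_repeat = list(product(li, repeat=3))   # repeat argument
via_unpack = list(product(*([li] * 3)))    # multiply and unpack by hand

print(via_repeat == via_unpack)  # True

# The question's update converts the result to a set, so order is irrelevant.
unique = {''.join(t) for t in via_repeat}
print(len(unique))  # 27
```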

Coriss answered 1/9, 2017 at 12:11 Comment(0)
F
4

An alternate approach using list comprehension:

li = ['a', 'b', 'c']

new_li = [a+b+c for a in li for b in li for c in li]
Frobisher answered 3/2, 2022 at 17:47 Comment(2)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Brade
Clever, but it works only for three elements. What if I have 10 or 100 elements? Can you improve it? :)Incinerator
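One possible way to address the last comment, sketched under the assumption that the output length should be a parameter: apply the same comprehension step once per position instead of writing one for clause per position. The function name concat_all is hypothetical and is not part of the original answer.

```python
def concat_all(items, length):
    """Return every concatenation of `length` entries from `items`, with repetition."""
    results = ['']  # start from a single empty prefix
    for _ in range(length):
        # Extend every prefix by every item, like adding one more `for` clause.
        results = [prefix + item for prefix in results for item in items]
    return results


li = ['a', 'b', 'c']
print(concat_all(li, 3)[:4])   # ['aaa', 'aab', 'aac', 'aba']
print(len(concat_all(li, 5)))  # 243
```

The result grows as len(items) ** length, so for 10 or 100 positions the full list quickly becomes too large to hold in memory, whichever approach is used.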
F
0
import itertools

repeat = int(input("Enter length: "))  # length of each generated string

def password():
    def foo(l):
        # Cartesian product of l with itself, `repeat` times
        yield from itertools.product(*([l] * repeat))

    for x in foo('abcdefghijklmnopqrstuvwxyz'):
        # you could also use string.ascii_lowercase or ["a","b","c"]
        print(''.join(x))

password()
Frontality answered 10/9, 2021 at 3:11 Comment(1)
Please provide additional details in your answer. As it's currently written, it's hard to understand your solution.Brade
U
0

I'll show you a way to do this without any libraries so that you can understand the logic behind how to achieve it.

First, we need to understand how to achieve all combinations mathematically.

Let's take a look at the pattern of every possible combination of characters ranging from a-b with a length of '1'.

a
b

Not much to see but from what we can see, there is one set of each character in the list. Let's increase our string length to '2' and see what pattern emerges.

aa
ab
ba
bb

Looking at this pattern, we see that a new column has been added. The far-right column is the same as in the first example, with only 1 of each character per set, but the set now loops. The far-left column has 2 of each character per set. Could it be that every new column adds one more of each character? Let's find out by increasing the string length to '3'.

aaa
aab
aba
abb
baa
bab
bba
bbb

We can see that the two columns on the right have stayed the same, and the new column on the left has 4 of each character! Not what we were expecting. So the number of characters doesn't increase by 1 for each column. Instead, if you look at the pattern, it increases by powers of 2.

The first column with only '1' set of characters : 2 ^ 0 = 1

The second column with '2' sets of characters : 2 ^ 1 = 2

The third column with '4' sets of characters : 2 ^ 2 = 4

So with each new column added, the number of each character in the column is determined by a power based on its position, with the first column on the right being x ^ 0, then x ^ 1, then x ^ 2, and so on.

But what is x? In the example I gave x = 2. But is it always 2? Let's take a look.

I will now give an example of each possible combination of characters from range a-c

aa
ab
ac
ba
bb
bc
ca
cb
cc

If we count the characters in the first column on the right, there is still only one of each character per loop. This is because the first column on the right always corresponds to x ^ 0, and anything to the power of 0 is 1. But in the second column we see 3 of each character per loop. So if x ^ 1 describes the second column, then x = 3. Comparing the first example, with a range of a-b (2 characters), to the second, with a range of a-c (3 characters), it seems that x is always the number of characters used in your combinations.
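The pattern can be checked mechanically. The sketch below is only a verification aid: it uses itertools, which this answer otherwise avoids, and the helper name run_lengths is hypothetical. It measures how many times the first character repeats in each column, left to right.

```python
from itertools import groupby, product


def run_lengths(chars, length):
    """For each column, left to right, return the length of its first run of equal characters."""
    rows = [''.join(t) for t in product(chars, repeat=length)]
    lengths = []
    for col in range(length):
        column = [row[col] for row in rows]
        # groupby yields consecutive runs; the first run is enough here.
        first_run = next(len(list(group)) for _, group in groupby(column))
        lengths.append(first_run)
    return lengths


print(run_lengths('ab', 3))   # [4, 2, 1]
print(run_lengths('abc', 2))  # [3, 1]
```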

With this first pattern recognised, we can start building a function that works out what each column should contain. If we want to build every combination of characters from range a-b with a string length of 3, then we need a function that knows the number of each character per set in each column is as follows: [4, 2, 1].

Now create a function that finds how many of each character should be in each set by returning a list of numbers, one per column, based on each column's position. We do this using powers.

Remember, if we use a range of characters from a-b (2), then each column should have x ^ y of each character per set, where x is the number of characters being used and y is the column position, with the first column on the right being column number 0.

Example:

A combination of characters ranging from ['a', 'b'] with a string length of 3 will have a total of 4 a's and b's in the far left column for each set, a total of 2 a's and b's in the next for each set and a total of 1 a's and b's in the last for each set.

To return a list with this total number of characters respective to their columns as so [4, 2, 1] we can do this

def getCharPower(stringLength, charRange):
    # Number of each character per set in each column, leftmost column first
    charpowers = []
    for x in range(0, stringLength):
        charpowers.append(len(charRange)**(stringLength - x - 1))
    return charpowers

With the above function - if we want to create every possible combination of characters that range from a-b (2) and have a string length of 4, like so

aaaa
aaab
aaba
aabb
abaa
abab
abba
abbb
baaa
baab
baba
babb
bbaa
bbab
bbba
bbbb

which have a total set of (8) a's and b's, (4) a's and b's, (2) a's and b's, and (1) a's and b's, then we want to return a list of [8, 4, 2, 1]. The stringLength is 4 and our charRange is ['a', 'b'] and the result from our function is [8, 4, 2, 1].

So now all we have to do is print out each character x number of times, depending on the value for its column in our returned list.

To do this, though, we need to find out how many times each set is printed in its column. Take a look at the first column on the right of the previous combination example. Although a and b are each printed only once per set, the set loops and is printed 7 more times (8 in total). If the string were only 3 characters long, it would loop a total of 4 times.

The reason is that the length of our strings determines how many combinations there are in total. The formula is x ^ y = a, where x is the number of characters in our range, y is the length of the string, and a is the total number of possible combinations.
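A short sketch that checks the formula by counting, again using itertools purely as an independent reference. The helper name count_combinations is hypothetical.

```python
from itertools import product


def count_combinations(chars, length):
    """Count the strings of `length` over `chars` by enumerating them."""
    return sum(1 for _ in product(chars, repeat=length))


for chars, length in [('ab', 3), ('ab', 4), ('abc', 2), ('abc', 3)]:
    total = count_combinations(chars, length)
    # x ^ y = a, with x = len(chars) and y = length
    print(chars, length, total, total == len(chars) ** length)
```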

So to finalise this problem, our solution is to figure out

  1. How many characters in each set go into each column
  2. How many times to repeat each set in each column

The first point has already been solved by our previously created function. The second can be solved by working out how many combinations there are in total, which is len(charRange) ^ stringLength. Then, in a loop, we keep adding sets of characters to the column until a (the total number of possible combinations) has been reached. Run that for each column and you have your result.

Here is the function that solves this

def Generator(stringLength, charRange):
    workbench = []
    results = []
    charpowers = getCharPower(stringLength, charRange)
    for x in range(0, stringLength):
        # Fill column x with sets of characters until it has one entry per combination
        while len(workbench) < len(charRange)**stringLength:
            for char in charRange:
                for z in range(0, charpowers[x]):
                    workbench.append(char)
        results.append(workbench)
        workbench = []
    # Read across the columns to assemble each string
    results = ["".join(result) for result in zip(*results)]
    return results

That function returns every possible combination of the characters you provide, at the string length you provide.
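The same two steps can also be written one column at a time using string repetition. This is a compact sketch of the idea rather than a replacement for the functions above; the names build_column and all_strings are hypothetical.

```python
def build_column(chars, position, length):
    """Build the column at `position` (0 = rightmost) for all strings of `length`."""
    run = len(chars) ** position      # step 1: how many of each character per set
    total = len(chars) ** length      # every column has one entry per combination
    one_set = ''.join(ch * run for ch in chars)
    return one_set * (total // len(one_set))  # step 2: repeat the set to fill the column


def all_strings(chars, length):
    # Leftmost column first, so zip() reads each row left to right.
    columns = [build_column(chars, p, length) for p in reversed(range(length))]
    return [''.join(row) for row in zip(*columns)]


print(all_strings('ab', 2))  # ['aa', 'ab', 'ba', 'bb']
```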

A much simpler way of approaching this problem is to just nest one for loop per character of the total length.

So to create every possible combination of characters ranging from a-b with a length of 2

characters = ['a', 'b']
for charone in characters:
    for chartwo in characters:
        print(charone+chartwo)

Although this is a lot simpler, it is limited. This code only prints every combination with a length of 2. To go beyond that, we would have to manually add another for loop each time we wanted to change the length. The functions I provided before this code, however, work for whatever string length you give them, which makes them fully adaptable if you want to solve this yourself without any libraries.
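The column arithmetic described earlier also allows a single combination to be computed directly from its index, without building the whole list. This is an illustrative sketch, still without libraries; the function name nth_string is hypothetical.

```python
def nth_string(chars, length, index):
    """Return the combination at `index` in the same order as the listings above."""
    x = len(chars)
    out = []
    for position in reversed(range(length)):  # leftmost column first
        # Each character repeats x ** position times in this column,
        # and the characters cycle with period x.
        out.append(chars[(index // x ** position) % x])
    return ''.join(out)


print(nth_string('ab', 3, 5))  # bab
print([nth_string('ab', 2, i) for i in range(4)])  # ['aa', 'ab', 'ba', 'bb']
```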

Uda answered 28/8, 2022 at 18:54 Comment(2)
powers() could be just replaced with the ** operator, for one...Irvinirvine
The more you know!Uda
B
0

Thanks to the amazing insightful reply from @cap1hunna, I came up with a faster alternative to your problem, in case speed is important for you.

Instead of manually building up every combination, I decided to go ahead and first build the columns, and then combine them:

from timeit import default_timer


def create_column(
    *,
    char_list: str,
    column_position: int,
    string_total_length: int,
    ) -> str:
    repeated_chars = ''
    for character in char_list:
        repeated_chars += character * (len(char_list)**column_position)
    column_length = len(char_list)**string_total_length
    return repeated_chars * (column_length // len(repeated_chars))


def combine_columns(
    *,
    columns: list[str],
    ) -> list[str]:
    result = []
    columns.reverse()
    columns_count = len(columns)
    strings_length = len(columns[0])
    for iterator in range(0, strings_length):
        substring = ''
        for subiterator in range(0, columns_count):
            substring += columns[subiterator][iterator]
        result.append(substring)
    return result


def main():
    lowcase_characters = 'abcdefghijklmnopqrstuvwxyz'
    string_total_length = 3
    columns = []
    for column_position in range(0, string_total_length):
        temp_column = create_column(
            char_list=lowcase_characters,
            column_position=column_position,
            string_total_length=string_total_length
            )
        columns.append(temp_column)
    combinations = combine_columns(columns=columns)
    print(combinations)    


if __name__ == '__main__':
    start_time = default_timer()
    main()
    end_time = default_timer()
    print(f'[INFO] Execution lasted: {end_time - start_time} seconds.')

Without the print statements, the accepted answer and the rest return all length-3 combinations of these lowercase characters in 0.005 seconds. This code returns the same results in 0.004 seconds.
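Single-run timings taken with default_timer can vary noticeably from run to run, so averaging over repeated calls may give a steadier comparison. Below is a minimal sketch using the standard library's timeit; with_product is a hypothetical helper standing in for the itertools-based answers, and no particular timing is claimed.

```python
import string
from itertools import product
from timeit import timeit


def with_product(chars, length):
    """Reference implementation based on itertools.product."""
    return [''.join(t) for t in product(chars, repeat=length)]


chars = string.ascii_lowercase
runs = 20
# Average over several calls instead of timing one call.
elapsed = timeit(lambda: with_product(chars, 3), number=runs) / runs
print(f'itertools.product: {elapsed:.4f} s per call')
```

The column-based code can be timed the same way by wrapping its main logic, without the print call, in a function and passing that to timeit.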

Bremser answered 11/8, 2023 at 23:2 Comment(0)
