Determine the common prefix of multiple strings
Asked Answered
A

14

97

I have a set of strings, e.g.

my_prefix_what_ever
my_prefix_what_so_ever
my_prefix_doesnt_matter

I simply want to find the longest common portion of these strings, here the prefix. In the above the result should be

my_prefix_

The strings

my_prefix_what_ever
my_prefix_what_so_ever
my_doesnt_matter

should result in the prefix

my_

Is there a relatively painless way in Python to determine the prefix (without having to iterate over each character manually)?

PS: I'm using Python 2.6.3.

Arachne answered 16/7, 2011 at 15:3 Comment(3)
So you are in effect asking for the longest common subsequence?Archbishopric
@KonradRudolph It's asking for the longest common prefix, not for a subsequence.Imperial
He! I'm not an it ☝️🙄Arachne
G
176

Never rewrite what is provided to you: os.path.commonprefix does exactly this:

Return the longest path prefix (taken character-by-character) that is a prefix of all paths in list. If list is empty, return the empty string (''). Note that this may return invalid paths because it works a character at a time.

For comparison to the other answers, here's the code:

# Return the longest prefix of all list elements.
def commonprefix(m):
    "Given a list of pathnames, returns the longest common leading component"
    if not m: return ''
    s1 = min(m)
    s2 = max(m)
    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]
    return s1
Gastrocnemius answered 16/7, 2011 at 15:45 Comment(6)
I think this can only handle two strings in m, isn't it? The comment though says "all list elements, kinda indicating any number of elements"Leeke
@Leeke Not exactly! min() and max() on string is lexicographical minimum and mnaximum like in dictionary. So when minimum and maximum have same first letter, then all other words between their have to have same letter too, and so on.Microgroove
Do the arguments need to be valid path names? What happens if they are not? The documentation says nothing, so I'm not so sure this can be used for arbitrary strings.Laryngitis
@Laryngitis No. This code is just looking at string, not at paths. If they happen to be all paths watch out for this prefix commonprefix({"/aaA/b", "/aaB/b"}) == "/aa" which is probably not a path you want to use.Intwine
@hochi If you do need valid paths, check out the sister function os.path.commonpath. From the documentation: "Unlike commonprefix(), this returns a valid path."Viscacha
Tell that to the LC interviewers ("never rewrite what's provided to you"). As a serious and perhaps constructive comment, consider not actually providing the source code for that function as on first glance I missed the statement that it's already part of os.path.commonprefixFlashbulb
S
22

Ned Batchelder is probably right. But for the fun of it, here's a more efficient version of phimuemue's answer using itertools.

import itertools

strings = ['my_prefix_what_ever', 
           'my_prefix_what_so_ever', 
           'my_prefix_doesnt_matter']

def all_same(x):
    return all(x[0] == y for y in x)

char_tuples = itertools.izip(*strings)
prefix_tuples = itertools.takewhile(all_same, char_tuples)
''.join(x[0] for x in prefix_tuples)

As an affront to readability, here's a one-line version :)

>>> from itertools import takewhile, izip
>>> ''.join(c[0] for c in takewhile(lambda x: all(x[0] == y for y in x), izip(*strings)))
'my_prefix_'
Silvestro answered 16/7, 2011 at 18:12 Comment(1)
For Python3 please replace itertools.izip(*strings) with zip(*strings).Cardon
A
6

Here's my solution:

a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]

prefix_len = len(a[0])
for x in a[1 : ]:
    prefix_len = min(prefix_len, len(x))
    while not x.startswith(a[0][ : prefix_len]):
        prefix_len -= 1

prefix = a[0][ : prefix_len]
Argeliaargent answered 16/7, 2011 at 15:35 Comment(0)
T
4

The following is an working, but probably quite inefficient solution.

a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
b = zip(*a)
c = [x[0] for x in b if x==(x[0],)*len(x)]
result = "".join(c)

For small sets of strings, the above is no problem at all. But for larger sets, I personally would code another, manual solution that checks each character one after another and stops when there are differences.

Algorithmically, this yields the same procedure, however, one might be able to avoid constructing the list c.

Terat answered 16/7, 2011 at 15:15 Comment(1)
This fails for e.g. a = ["abc", "axc"], which gives 'ac', instead of just 'a'Latitudinarian
V
2

Here's a simple clean solution. The idea is to use zip() function to line up all the characters by putting them in a list of 1st characters, list of 2nd characters,...list of nth characters. Then iterate each list to check if they contain only 1 value.

a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]

list = [all(x[i] == x[i+1] for i in range(len(x)-1)) for x in zip(*a)]

print a[0][:list.index(0) if list.count(0) > 0 else len(list)]

output: my_prefix_

Veljkov answered 24/11, 2016 at 15:51 Comment(2)
how is this clean?Decagon
how is it not clean? Other solutions have codes in blocks. Logic is simple enough to do it in a single assignment.Veljkov
H
1

Just out of curiosity I figured out yet another way to do this:

def common_prefix(strings):

    if len(strings) == 1:#rule out trivial case
        return strings[0]

    prefix = strings[0]

    for string in strings[1:]:
        while string[:len(prefix)] != prefix and prefix:
            prefix = prefix[:len(prefix)-1]
        if not prefix:
            break

    return prefix

strings = ["my_prefix_what_ever","my_prefix_what_so_ever","my_prefix_doesnt_matter"]

print common_prefix(strings)
#Prints "my_prefix_"

As Ned pointed out it's probably better to use os.path.commonprefix, which is a pretty elegant function.

Hydrosol answered 30/10, 2013 at 15:57 Comment(0)
O
1

The second line of this employs the reduce function on each character in the input strings. It returns a list of N+1 elements where N is length of the shortest input string.

Each element in lot is either (a) the input character, if all input strings match at that position, or (b) None. lot.index(None) is the position of the first None in lot: the length of the common prefix. out is that common prefix.

val = ["axc", "abc", "abc"]
lot = [reduce(lambda a, b: a if a == b else None, x) for x in zip(*val)] + [None]
out = val[0][:lot.index(None)]
Otisotitis answered 2/11, 2015 at 15:15 Comment(0)
F
0

Here is another way of doing this using OrderedDict with minimal code.

import collections
import itertools

def commonprefix(instrings):
    """ Common prefix of a list of input strings using OrderedDict """

    d = collections.OrderedDict()

    for instring in instrings:
        for idx,char in enumerate(instring):
            # Make sure index is added into key
            d[(char, idx)] = d.get((char,idx), 0) + 1

    # Return prefix of keys while value == length(instrings)
    return ''.join([k[0] for k in itertools.takewhile(lambda x: d[x] == len(instrings), d)])
Frodin answered 14/1, 2015 at 7:10 Comment(0)
J
0

I had a slight variation of the problem which I think will be useful to document:

I have a list like:

  • my_prefix_what_ever
  • my_prefix_what_so_ever
  • my_prefix_doesnt_matter
  • some_noise
  • some_other_noise

So I would expect my_prefix to be returned. That can be done with:

from collections import Counter

def get_longest_common_prefix(values, min_length):
    substrings = [value[0: i-1] for value in values for i in range(min_length, len(value))]
    counter = Counter(substrings)
    # remove count of 1
    counter -= Counter(set(substrings))
    return max(counter, key=len)
Jotunheim answered 22/4, 2020 at 7:16 Comment(0)
A
0

In one line without using itertools, for no particular reason, although it does iterate through each character:

''.join([z[0] for z in zip(*(list(s) for s in strings)) if all(x==z[0] for x in z)])
Aeschines answered 5/5, 2021 at 15:0 Comment(0)
R
0

Find the common prefix in all words from the given input string, if there is no common prefix print -1

stringList = ['my_prefix_what_ever', 'my_prefix_what_so_ever', 'my_prefix_doesnt_matter']
len2 = len( stringList )
if len2 != 0:
    # let shortest word is prefix
    prefix = min( stringList )
    for i in range( len2 ):
        word = stringList[ i ]
        len1 = len( prefix )
        # slicing each word as lenght of prefix
        word = word[ 0:len1 ]
        for j in range( len1 ):
            # comparing each letter of word and prefix
            if word[ j ] != prefix[ j ]:
                # if letter does not match slice the prefix
                prefix = prefix[ :j ]
                break # after getting comman prefix move to next word
    if len( prefix ) != 0:
        print("common prefix: ",prefix)
    else:
        print("-1")
else:
     print("string List is empty") 
Rupe answered 10/2, 2022 at 4:36 Comment(0)
I
0

numpy matrix approach:

import numpy as np


def common_prefix(strings: list[str]) -> str:
    # common prefix cannot be larger than the smallest str
    min_length = min(len(string) for string in strings)
    strings = [string[:min_length] for string in strings]

    array = np.array([list(x) for x in strings])  # covert to numpy matrix for column-wise operations
    for i in range(min_length):
        # for every column check if all char values are the same (same as first)
        if not all(array[:, i] == array[0][i]):
            # if not return the substring before the first char difference
            return strings[0][:i]

    # the common prefix is the full (shortest) str
    return strings[0]


assert common_prefix(["str1", "str2", "str3"]) == "str"
assert common_prefix(["s", "st", "str"]) == "s"
assert common_prefix(["1str", "str", "str"]) == ""

Insular answered 10/5, 2023 at 6:44 Comment(0)
H
0
import os
os.path.commonprefix(list_name) ## this will do the work

OS module link

Hedva answered 6/11, 2023 at 10:28 Comment(1)
What does this add to the accepted answer?Arachne
H
0

Posting yet another solution as its more concise than most of the solutions here

def common_prefix(strings: list[str]) -> str:
    shortest = min(strings, key=len)
    for i, c in enumerate(shortest):
        if any(x[i] != c for x in strings):
            return shortest[:i]
    return shortest

assert common_prefix(["Abc01", "Abc02", "Abc03"]) == "Abc0"
assert common_prefix(["Abc01", "Abc01"]) == "Abc01"
assert common_prefix(["", "Abc01"]) == ""
assert common_prefix(["Xyz01", "Abc01"]) == ""

It will throw on an empty list but trivial to add a check for this if needed

Howzell answered 19/4 at 10:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.