Finding whether a string starts with one of a list's variable-length prefixes
M

11

24

I need to find out whether a name starts with any of a list's prefixes and then remove it, like:

if name[:2] in ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"]:
    name = name[2:]

The above only works for list prefixes with a length of two. I need the same functionality for variable-length prefixes.

How is it done efficiently (little code and good performance)?

A for loop iterating over each prefix and then checking name.startswith(prefix) to finally slice the name according to the length of the prefix works, but it's a lot of code, probably inefficient, and "non-Pythonic".

Does anybody have a nice solution?

Marchant answered 24/9, 2011 at 15:29 Comment(6)
The solution you describe is pretty decent.Melonie
It isn't a lot of code to do, just a lot of code to make clear.Hoey
@Melonie the issue was that the prefixes could be multiple characters, so it wouldn't be sufficient to check name[:2]Cetane
@FooBah No, the second solution of using startswith etc.Melonie
"A for loop iterating over each prefix and then checking name.startswith(prefix) to finally slice the name according to the length of the prefix works." That sounds pretty Pythonic to me. That shouldn't be more than 5 or 10 lines of code. "Pythonic" doesn't mean it has to be done in 1 line.Approximation
I know this is a really old question but what would you want to have happen if the name starts with multiple prefixes in the list, where each of the prefixes were different lengths? ex. name = "amazing", list = ['am', 'ama', 'amaz']. Should it remove 2, 3, or 4 characters?Impercipient
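The last comment points at a real ambiguity: with overlapping prefixes, a plain loop removes whichever matching prefix comes first in the list. One way to make the result deterministic is sketched below, assuming the longest matching prefix should win. The helper name `strip_longest_prefix` is hypothetical and not from the thread.

```python
def strip_longest_prefix(name, prefixes):
    """Remove the longest prefix in `prefixes` that `name` starts with."""
    # Try longer prefixes first so that 'amaz' wins over 'am' and 'ama'.
    for prefix in sorted(prefixes, key=len, reverse=True):
        if name.startswith(prefix):
            return name[len(prefix):]
    return name  # no prefix matched; return the name unchanged


print(strip_longest_prefix("amazing", ["am", "ama", "amaz"]))  # ing
print(strip_longest_prefix("other", ["am", "ama", "amaz"]))    # other
```

If the shortest prefix should win instead, dropping `reverse=True` gives that behaviour.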
S
15

A bit hard to read, but this works:

name = name[len(next(filter(name.startswith, prefixes + ['']))):]
Serpentiform answered 24/9, 2011 at 16:1 Comment(2)
Very nice, this even ignores unprefixed names. Perfect.Marchant
For those more used to list comprehensions, this is equivalent to: name=name[len([prefix for prefix in prefixes+[''] if name.startswith(prefix)][0]):]Bargeboard
T
50

str.startswith(prefix[, start[, end]])

Return True if string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for. With optional start, test string beginning at that position. With optional end, stop comparing string at that position.

$ ipython
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: prefixes = ("i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_")

In [2]: 'test'.startswith(prefixes)
Out[2]: False

In [3]: 'i_'.startswith(prefixes)
Out[3]: True

In [4]: 'd_a'.startswith(prefixes)
Out[4]: True
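Since the question also asks for the prefix to be removed, the tuple form of `str.startswith` can serve as a quick check before slicing. This is a minimal sketch, assuming the first matching prefix in tuple order is the one to remove; `strip_prefix` is a hypothetical helper name and `"longer_"` is added only to show a variable-length prefix.

```python
prefixes = ("i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_", "longer_")


def strip_prefix(name, prefixes=prefixes):
    # A single startswith call tests every prefix in the tuple.
    if name.startswith(prefixes):
        # At least one prefix matches, so next() always finds a value here.
        matched = next(p for p in prefixes if name.startswith(p))
        return name[len(matched):]
    return name


print(strip_prefix("d_a"))          # a
print(strip_prefix("longer_name"))  # name
print(strip_prefix("test"))         # test
```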
Twiddle answered 24/9, 2011 at 16:5 Comment(2)
I also need to remove the found prefix from the name in case it starts with one of the prefixes. Maybe the question was a little inaccurate, however I still like the fact that str.startswith also accepts a tuple. (unchecked)Marchant
yes, because it accepts tuples it might be the cleanest implementation.Twiddle
S
5
for prefix in prefixes:
    if name.startswith(prefix):
        name=name[len(prefix):]
        break
Selfhypnosis answered 24/9, 2011 at 15:41 Comment(3)
Except genexes don't leak the iterator name.Hoey
@unutbu: The list is about 10 prefixes long. ThanksMarchant
The first solution won't work, since only the last value of the iterator name is leaked.Hoey
B
3

Regexes will likely give you the best speed:

import re

prefixes = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_", "also_longer_"]
re_prefixes = "|".join(re.escape(p) for p in prefixes)

# re.match only matches at the start of the string, so m.start() is always 0
m = re.match(re_prefixes, my_string)
if m:
    my_string = my_string[m.end():]
Bekah answered 24/9, 2011 at 16:49 Comment(1)
@JohnMachin Couldn't he just have done `re_prefixes = '^' + "|^".join(re.escape(p) for p in prefixes)`? Thanks.Githens
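On the question in the comment: an explicit `^` should not be needed with `re.match`, because `match` only attempts the pattern at the start of the string. The short sketch below contrasts it with `re.search`; the sample string `"x_i_name"` is made up for illustration.

```python
import re

prefixes = ["i_", "c_", "also_longer_"]
pattern = re.compile("|".join(re.escape(p) for p in prefixes))

# match() only tries position 0, so a prefix in the middle is ignored.
print(pattern.match("x_i_name"))           # None

# search() scans the whole string and finds the inner "i_".
print(pattern.search("x_i_name").group())  # i_

# At the start of the string, match() finds the prefix without any '^'.
print(pattern.match("i_name").end())       # 2
```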
C
2

If you define prefix to be the characters before an underscore, then you can check for

if name.partition("_")[0] in ["i", "c", "m", "l", "d", "t", "e", "b", "foo"] and name.partition("_")[1] == "_":
    name = name.partition("_")[2]
Cetane answered 24/9, 2011 at 15:34 Comment(1)
I'd use "_" in name as your second clause to avoid partitioning the string twice, and in fact I'd put that clause first to avoid partitioning the string at all if there's no underscore in it. But good thinking.Killie
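The suggestion in the comment can be sketched as follows: test for the underscore first and partition only once. The names `KNOWN` and `strip_known_prefix` are hypothetical, and a set is used here only to make the membership test cheap.

```python
KNOWN = {"i", "c", "m", "l", "d", "t", "e", "b", "foo"}


def strip_known_prefix(name, known=KNOWN):
    # Skip partitioning entirely when there is no underscore.
    if "_" in name:
        head, _, tail = name.partition("_")
        if head in known:
            return tail
    return name


print(strip_known_prefix("foo_bar"))  # bar
print(strip_known_prefix("x_bar"))    # x_bar
print(strip_known_prefix("plain"))    # plain
```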
B
2

What about using filter?

prefs = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"]
name = list(filter(lambda item: not any(item.startswith(prefix) for prefix in prefs), name))

Note that the comparison of each list item against the prefixes halts on the first match. This behaviour is guaranteed by the any function, which returns as soon as it finds a True value, e.g.:

def gen():
    print("yielding False")
    yield False
    print("yielding True")
    yield True
    print("yielding False again")
    yield False

>>> any(gen()) # last two lines of gen() are not performed
yielding False
yielding True
True

Or, using re.match instead of startswith:

import re
patt = '|'.join(["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"])
name = list(filter(lambda item: not re.match(patt, item), name))
Backfill answered 24/9, 2011 at 16:46 Comment(0)
S
2

Regex, tested:

import re

def make_multi_prefix_matcher(prefixes):
    # Escape each prefix so characters like "|" and "." are matched literally.
    regex_text = "|".join(re.escape(p) for p in prefixes)
    print(repr(regex_text))
    return re.compile(regex_text).match

pfxs = "x ya foobar foo a|b z.".split()
names = "xenon yadda yeti food foob foobarre foo a|b a b z.yx zebra".split()

matcher = make_multi_prefix_matcher(pfxs)
for name in names:
    m = matcher(name)
    if not m:
        print(repr(name), "no match")
        continue
    n = m.end()
    print(repr(name), n, repr(name[n:]))

Output:

'x|ya|foobar|foo|a\\|b|z\\.'
'xenon' 1 'enon'
'yadda' 2 'dda'
'yeti' no match
'food' 3 'd'
'foob' 3 'b'
'foobarre' 6 're'
'foo' 3 ''
'a|b' 3 ''
'a' no match
'b' no match
'z.yx' 2 'yx'
'zebra' no match
Staging answered 24/9, 2011 at 22:40 Comment(1)
Nice complete solution and I appreciate the escaping and testing! I'm sure this regex based approach would run faster than list comprehensions etc for any sizeable amount of data, with a fairly long list of prefixes.Skylar
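Whether the regex approach is actually faster depends on the data, so it is worth measuring. The sketch below is one way to compare the two approaches with `timeit`; the sample names, the repetition count, and the helper names `strip_regex` and `strip_loop` are assumptions made for illustration, and no timing result is claimed here.

```python
import re
import timeit

prefixes = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_", "also_longer_"]
names = ["i_alpha", "also_longer_beta", "gamma", "t_delta"] * 1000

matcher = re.compile("|".join(re.escape(p) for p in prefixes)).match


def strip_regex(name):
    m = matcher(name)
    return name[m.end():] if m else name


def strip_loop(name):
    for p in prefixes:
        if name.startswith(p):
            return name[len(p):]
    return name


# Both approaches should agree before their speed is compared.
assert [strip_regex(n) for n in names] == [strip_loop(n) for n in names]

for fn in (strip_regex, strip_loop):
    seconds = timeit.timeit(lambda: [fn(n) for n in names], number=20)
    print(fn.__name__, round(seconds, 4))
```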
J
1

When it comes to search and efficiency, always think of indexing techniques to improve your algorithms. If you have a long list of prefixes, you can build an in-memory index by simply grouping the prefixes by their first character in a dict.

This solution is only worthwhile if you have a long list of prefixes and performance becomes an issue.

pref = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"]

# Index the prefixes by first character in a dict. Do this only once.
d = dict()
for x in pref:
    if x[0] not in d:
        d[x[0]] = list()
    d[x[0]].append(x)


name = "c_abcdf"

# Look up in d so that only prefixes with the same first character are checked.
# name[:1] avoids an IndexError when name is empty.
result = [x for x in d.get(name[:1], []) if name.startswith(x)]
print(result)
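The same first-character index can also be wrapped into functions that remove the matched prefix, which is what the question asks for. This is a minimal sketch using `collections.defaultdict`; the function names `build_index` and `strip_indexed` are hypothetical, and the first matching prefix in each bucket is the one removed.

```python
from collections import defaultdict


def build_index(prefixes):
    """Group prefixes by their first character. Build this once."""
    index = defaultdict(list)
    for p in prefixes:
        if p:  # an empty prefix has no first character to index on
            index[p[0]].append(p)
    return index


def strip_indexed(name, index):
    # name[:1] is '' for an empty name, which simply finds no bucket.
    for p in index.get(name[:1], ()):
        if name.startswith(p):
            return name[len(p):]
    return name


index = build_index(["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"])
print(strip_indexed("c_abcdf", index))  # abcdf
print(strip_indexed("xyz", index))      # xyz
```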
Janessa answered 24/9, 2011 at 15:56 Comment(0)
O
0

This edits the list on the fly, removing prefixes. The break skips the rest of the prefixes once one is found for a particular item.

items = ['this', 'that', 'i_blah', 'joe_cool', 'what_this']
prefixes = ['i_', 'c_', 'a_', 'joe_', 'mark_']

for i,item in enumerate(items):
    for p in prefixes:
        if item.startswith(p):
            items[i] = item[len(p):]
            break

print(items)

Output

['this', 'that', 'blah', 'cool', 'what_this']
Olmstead answered 24/9, 2011 at 17:23 Comment(0)
L
0

Could use a simple regex.

import re
prefixes = ("i_", "c_", "longer_")
name = re.sub(r'^(?:%s)' % '|'.join(map(re.escape, prefixes)), '', name)

Or if anything preceding an underscore is a valid prefix:

name.split('_', 1)[-1]

This removes everything up to and including the first underscore, and leaves names without an underscore unchanged.
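One caveat with the split approach is that it also strips names whose first segment was never meant to be a prefix. The sketch below shows the behaviour on a few made-up names; `strip_before_underscore` is a hypothetical wrapper around the expression above.

```python
def strip_before_underscore(name):
    # maxsplit=1 splits only at the first underscore; [-1] takes what follows it.
    return name.split("_", 1)[-1]


for sample in ["i_count", "plain", "max_value", "_private"]:
    print(repr(sample), "->", repr(strip_before_underscore(sample)))
# 'i_count' -> 'count'
# 'plain' -> 'plain'
# 'max_value' -> 'value'
# '_private' -> 'private'
```

If names such as `max_value` must survive intact, one of the explicit prefix-list approaches is the safer choice.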

Lythraceous answered 4/12, 2018 at 14:7 Comment(0)
R
-1
import re

def make_multi_prefix_replacer(prefixes):
    if isinstance(prefixes, str):
        prefixes = prefixes.split()
    # Longest prefixes first, so 'foobar' is tried before 'foo'.
    prefixes = sorted(prefixes, key=len, reverse=True)
    # \b anchors each prefix at a word boundary, not only at the string start.
    pat = r'\b(%s)' % "|".join(map(re.escape, prefixes))
    print('regex pattern :', repr(pat), '\n')
    def suber(x, reg=re.compile(pat)):
        return reg.sub('', x)
    return suber


pfxs = "x ya foobar yaku foo a|b z."
replacer = make_multi_prefix_replacer(pfxs)

names = "xenon yadda yeti yakute food foob foobarre foo a|b a b z.yx zebra".split()
for name in names:
    print(repr(name), repr(replacer(name)), sep='\n', end='\n\n')

ss = 'the yakute xenon is a|bcdf in the barfoobaratu foobarii'
print()
print(repr(ss), repr(replacer(ss)), sep='\n', end='\n\n')
Rescind answered 25/9, 2011 at 1:50 Comment(0)
