Python: Rename duplicates in list with progressive numbers without sorting list

Asked 4/6, 2015 at 17:33 Answered 25/8, 2021 at 3:4

Given a list like this:

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]

I would like to rename the duplicates by appending a number to get the following result:

mylist = ["name1", "state", "name2", "city", "name3", "zip1", "zip2"]

I do not want to change the order of the original list. The solutions suggested for this related Stack Overflow question sorts the list, which I do not want to do.

Underdrawers answered 4/6, 2015 at 17:33 Comment(0)

This is how I would do it. EDIT: I wrote this into a more generalized utility function since people seem to like this answer.

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
check = ["name1", "state", "name2", "city", "name3", "zip1", "zip2"]
copy = mylist[:]  # so we will only mutate the copy in case of failure

from collections import Counter # Counter counts the number of occurrences of each item
from itertools import tee, count

def uniquify(seq, suffs = count(1)):
    """Make all the items unique by adding a suffix (1, 2, etc).

    `seq` is mutable sequence of strings.
    `suffs` is an optional alternative suffix iterable.
    """
    not_unique = [k for k,v in Counter(seq).items() if v>1] # so we have: ['name', 'zip']
    # suffix generator dict - e.g., {'name': <my_gen>, 'zip': <my_gen>}
    suff_gens = dict(zip(not_unique, tee(suffs, len(not_unique))))  
    for idx,s in enumerate(seq):
        try:
            suffix = str(next(suff_gens[s]))
        except KeyError:
            # s was unique
            continue
        else:
            seq[idx] += suffix

uniquify(copy)
assert copy==check  # raise an error if we failed
mylist = copy  # success

If you wanted to append an underscore before each count, you could do something like this:

>>> mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
>>> uniquify(mylist, (f'_{x!s}' for x in range(1, 100)))
>>> mylist
['name_1', 'state', 'name_2', 'city', 'name_3', 'zip_1', 'zip_2']

...or if you wanted to use letters instead:

>>> mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
>>> import string
>>> uniquify(mylist, (f'_{x!s}' for x in string.ascii_lowercase))
>>> mylist
['name_a', 'state', 'name_b', 'city', 'name_c', 'zip_a', 'zip_b']

NOTE: this is not the fastest possible algorithm; for that, refer to the answer by ronakg. The advantage of the function above is it is easy to understand and read, and you're not going to see much of a performance difference unless you have an extremely large list.

EDIT: Here is my original answer in a one-liner, however the order is not preserved and it uses the .index method, which is extremely suboptimal (as explained in the answer by DTing). See the answer by queezz for a nice 'two-liner' that preserves order.

[s + str(suffix) if num>1 else s for s,num in Counter(mylist).items() for suffix in range(1, num+1)]
# Produces: ['zip1', 'zip2', 'city', 'state', 'name1', 'name2', 'name3']

Corney answered 4/6, 2015 at 18:59 Comment(1)

This is my favorite suggestion, as it is compact and easy to read. THANKS! – Underdrawers 4/6, 2015 at 19:18

My solution with map and lambda:

print map(lambda x: x[1] + str(mylist[:x[0]].count(x[1]) + 1) if mylist.count(x[1]) > 1 else x[1], enumerate(mylist))

More traditional form

newlist = []
for i, v in enumerate(mylist):
    totalcount = mylist.count(v)
    count = mylist[:i].count(v)
    newlist.append(v + str(count + 1) if totalcount > 1 else v)

And last one

[v + str(mylist[:i].count(v) + 1) if mylist.count(v) > 1 else v for i, v in enumerate(mylist)]

Irinairis answered 4/6, 2015 at 17:54 Comment(4)

The lambda 1 liner is a bit complicated, but your second suggestion is nice. Thanks! – Underdrawers 4/6, 2015 at 19:49

Added one liner similar to Rick Teachey solution but with order preserving – Irinairis 5/6, 2015 at 5:50

this solution actually is pricey since it takes O(n^2) – Nakisha 22/6, 2015 at 15:31

thanks! I really admire how smart it is to make the list item unique based on the count of the list slice up to [:i] (in the last list comprehension solution), my sincere compliments!!! – Annulation 29/10, 2017 at 15:46

This is how I would do it. EDIT: I wrote this into a more generalized utility function since people seem to like this answer.

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
check = ["name1", "state", "name2", "city", "name3", "zip1", "zip2"]
copy = mylist[:]  # so we will only mutate the copy in case of failure

from collections import Counter # Counter counts the number of occurrences of each item
from itertools import tee, count

def uniquify(seq, suffs = count(1)):
    """Make all the items unique by adding a suffix (1, 2, etc).

    `seq` is mutable sequence of strings.
    `suffs` is an optional alternative suffix iterable.
    """
    not_unique = [k for k,v in Counter(seq).items() if v>1] # so we have: ['name', 'zip']
    # suffix generator dict - e.g., {'name': <my_gen>, 'zip': <my_gen>}
    suff_gens = dict(zip(not_unique, tee(suffs, len(not_unique))))  
    for idx,s in enumerate(seq):
        try:
            suffix = str(next(suff_gens[s]))
        except KeyError:
            # s was unique
            continue
        else:
            seq[idx] += suffix

uniquify(copy)
assert copy==check  # raise an error if we failed
mylist = copy  # success

If you wanted to append an underscore before each count, you could do something like this:

>>> mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
>>> uniquify(mylist, (f'_{x!s}' for x in range(1, 100)))
>>> mylist
['name_1', 'state', 'name_2', 'city', 'name_3', 'zip_1', 'zip_2']

...or if you wanted to use letters instead:

>>> mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
>>> import string
>>> uniquify(mylist, (f'_{x!s}' for x in string.ascii_lowercase))
>>> mylist
['name_a', 'state', 'name_b', 'city', 'name_c', 'zip_a', 'zip_b']

[s + str(suffix) if num>1 else s for s,num in Counter(mylist).items() for suffix in range(1, num+1)]
# Produces: ['zip1', 'zip2', 'city', 'state', 'name1', 'name2', 'name3']

Corney answered 4/6, 2015 at 18:59 Comment(1)

This is my favorite suggestion, as it is compact and easy to read. THANKS! – Underdrawers 4/6, 2015 at 19:18

Any method where count is called on each element is going to result in O(n^2) since count is O(n). You can do something like this:

# not modifying original list
from collections import Counter

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
counts = {k:v for k,v in Counter(mylist).items() if v > 1}
newlist = mylist[:]

for i in reversed(range(len(mylist))):
    item = mylist[i]
    if item in counts and counts[item]:
        newlist[i] += str(counts[item])
        counts[item]-=1
print(newlist)

# ['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']

# modifying original list
from collections import Counter

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
counts = {k:v for k,v in Counter(mylist).items() if v > 1}      

for i in reversed(range(len(mylist))):
    item = mylist[i]
    if item in counts and counts[item]:
        mylist[i] += str(counts[item])
        counts[item]-=1
print(mylist)

# ['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']

This should be O(n).

Other provided answers:

mylist.index(s) per element causes O(n^2)

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]

from collections import Counter
counts = Counter(mylist)
for s,num in counts.items():
    if num > 1:
        for suffix in range(1, num + 1):
            mylist[mylist.index(s)] = s + str(suffix)

count(x[1]) per element causes O(n^2)
It is also used multiple times per element along with list slicing.

print map(lambda x: x[1] + str(mylist[:x[0]].count(x[1]) + 1) if mylist.count(x[1]) > 1 else x[1], enumerate(mylist))

Benchmarks:

http://nbviewer.ipython.org/gist/dting/c28fb161de7b6287491b

Kingship answered 4/6, 2015 at 18:3 Comment(1)

I changed my answer to do away with .index per your suggestions. Much better. – Corney 5/6, 2019 at 14:34

Here's a very simple O(n) solution. Simply walk the list storing the index of element in the list. If we've seen this element before, use the stored data earlier to append the occurrence value.

This approach solves the problem with just creating one more dictionary for look-back. Avoids doing look-ahead so that we don't create temporary list slices.

mylist = ["name", "state", "name", "city", "city", "name", "zip", "zip", "name"]

dups = {}

for i, val in enumerate(mylist):
    if val not in dups:
        # Store index of first occurrence and occurrence value
        dups[val] = [i, 1]
    else:
        # Special case for first occurrence
        if dups[val][1] == 1:
            mylist[dups[val][0]] += str(dups[val][1])

        # Increment occurrence value, index value doesn't matter anymore
        dups[val][1] += 1

        # Use stored occurrence value
        mylist[i] += str(dups[val][1])

print mylist

# ['name1', 'state', 'name2', 'city1', 'city2', 'name3', 'zip1', 'zip2', 'name4']

Pamilapammi answered 4/6, 2015 at 18:51 Comment(2)

This is nice but have you ever seen collections.Counter? Use that and you don't have to implement your own counting algorithm. – Corney 4/6, 2015 at 19:51

Yes I know about collections.Counter :). I just wanted to post a solution that's more efficient. This solution does just one pass of the list. – Pamilapammi 4/6, 2015 at 19:57

A list comprehension version of the Rick Teachey answer, "two-liner":

from collections import Counter

m = ["name", "state", "name", "city", "name", "zip", "zip"]

d = {a:list(range(1, b+1)) if b>1 else '' for a,b in Counter(m).items()}
[i+str(d[i].pop(0)) if len(d[i]) else i for i in m]
#['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']

Prana answered 5/6, 2019 at 4:51 Comment(2)

very nice answer, but you needed to do range(1,b+1) to get the desired suffixes. – Corney 5/6, 2019 at 14:52

Sure, range(1,b+1). Must've missed the indexing while using some old variable in the notebook. Thanks. – Prana 6/6, 2019 at 15:9

You can use hashtable to solve this problem. Define a dictionary d. key is the string and value is (first_time_index_in_the_list, times_of_appearance). Everytime when you see a word, just check the dictionary, and if the value is 2, use first_time_index_in_the_list to append '1' to the first element, and append times_of_appearance to current element. If greater than 2, just append times_of_appearance to current element.

Hallett answered 4/6, 2015 at 17:43 Comment(1)

Would be nice with an example code, it's faster for the eye to realize the logic. – Waksman 19/3, 2019 at 11:2

Less fancy stuff.

from collections import defaultdict
mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
finalList = []
dictCount = defaultdict(int)
anotherDict = defaultdict(int)
for t in mylist:
   anotherDict[t] += 1
for m in mylist:
   dictCount[m] += 1
   if anotherDict[m] > 1:
       finalList.append(str(m)+str(dictCount[m]))
   else:
       finalList.append(m)
print finalList

Combo answered 4/6, 2015 at 18:36 Comment(0)

Beware of updated values that already exist in the original list

If the starting list already includes an item "name2" ...

mylist = ["name", "state", "name", "city", "name", "zip", "zip", "name2"]

...then mylist[2] shouldn't be updated to "name2" when the function runs, otherwise a new duplicate will be created; instead, the function should jump to the next available item name "name3" .

mylist_updated = ['name1', 'state', 'name3', 'city', 'name4', 'zip1', 'zip2', 'name2']

Here's an alternate solution (can probably be shortened and optimized) which includes a recursive function that checks for these existing items in the original list.

mylist = ["name", "state", "name", "city", "name", "zip", "zip", "name2"]

def fix_dups(mylist, sep='', start=1, update_first=True):
    mylist_dups = {}
    #build dictionary containing val: [occurrences, suffix]
    for val in mylist:
        if val not in mylist_dups:
            mylist_dups[val] = [1, start - 1]
        else:
            mylist_dups[val][0] += 1
            
    #define function to update duplicate values with suffix, check if updated value already exists
    def update_val(val, num):
        temp_val = sep.join([str(x) for x in [val, num]])
        if temp_val not in mylist_dups:
            return temp_val, num
        else:
            num += 1
            return update_val(val, num)        
    
    #update list
    for i, val in enumerate(mylist):
        if mylist_dups[val][0] > 1:
            mylist_dups[val][1] += 1  
            if update_first or mylist_dups[val][1] > start:
                new_val, mylist_dups[val][1] = update_val(val, mylist_dups[val][1])
                mylist[i] = new_val

    return mylist
                
mylist_updated = fix_dups(mylist, sep='', start=1, update_first=True)
print(mylist_updated)
#['name1', 'state', 'name3', 'city', 'name4', 'zip1', 'zip2', 'name2']

In case you don't want to change the first occurrence.

mylist = ["name", "state", "name", "city", "name", "zip", "zip", "name_2"]
             
mylist_updated = fix_dups(mylist, sep='_', start=0, update_first=False)
print(mylist_updated)
#['name', 'state', 'name_1', 'city', 'name_3', 'zip', 'zip_1', 'name_2']

Glockenspiel answered 25/8, 2021 at 3:4 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Other provided answers:

Benchmarks:

Recommended topics

Hot tags