list match in python: get indices of a sub-list in a larger list

Asked 30/4, 2012 at 14:49 Answered 30/4, 2012 at 14:56

For two lists,

a = [1, 2, 9, 3, 8, ...]   (no duplicate values in a, but a is very big)
b = [1, 9, 1,...]          (set(b) is a subset of set(a), 1<<len(b)<<len(a)) 

indices = get_indices_of_a(a, b)

how to let get_indices_of_a return indices = [0, 2, 0,...] with array(a)[indices] = b? Is there a faster method than using a.index, which is taking too long?

Making b a set is a fast method of matching lists and returning indices (see compare two lists in python and return indices of matched values ), but it will lose the index of the second 1 as well as the sequence of the indices in this case.

Uredium answered 30/4, 2012 at 14:49 Comment(0)

A fast method (when a is a large list) would be using a dict to map values in a to indices:

>>> index_dict = dict((value, idx) for idx,value in enumerate(a))
>>> [index_dict[x] for x in b]
[0, 2, 0]

This will take linear time in the average case, compared to using a.index which would take quadratic time.

Spearhead answered 30/4, 2012 at 14:56 Comment(1)

+1. This is a good answer for large lists where it will drastically reduce the time needed - naturally on small lists the creation of the dict will take more time than it will save. Given the asker's comment on my answer, it seems big lists are involved, so this is the wanted answer. – Runkle 30/4, 2012 at 15:2

Presuming we are working with smaller lists, this is as easy as:

>>> a = [1, 2, 9, 3, 8] 
>>> b = [1, 9, 1] 
>>> [a.index(item) for item in b]
[0, 2, 0]

On larger lists, this will become quite expensive.

(If there are duplicates, the first occurrence will always be the one referenced in the resulting list, if not set(b) <= set(a), you will get a ValueError).

Runkle answered 30/4, 2012 at 14:50 Comment(4)

Thanks a lot! There are no duplicates, but a is very big, and b is not small either although len(b)<<len(a). Using a.index(item) is doing a search in a for each value in b... is there a faster method? – Uredium 30/4, 2012 at 15:0

@Uredium Yup, see interjay's answer. – Runkle 30/4, 2012 at 15:1

you can add this to your solution to remove the ValueError situation : [a.index(item) for item in b if item in a] – Kavita 30/4, 2012 at 15:4

@AshwiniChaudhary Given what the asker has said, I think he'd prefer an error than silent failure. Naturally if you wanted to skip the missing elements, then yes. – Runkle 30/4, 2012 at 15:5

Recommended topics

Hot tags