Update: The more_itertools library has released more_itertools.replace, a tool that solves this particular problem (see Option 3).
First, here are some other options that work on generic iterables (lists, strings, iterators, etc.):
Code
Option 1 - without libraries:
def remove(iterable, subsequence):
    """Yield non-subsequence items; sans libraries."""
    seq = tuple(iterable)
    subsequence = tuple(subsequence)
    n = len(subsequence)
    skip = 0
    for i, x in enumerate(seq):
        slice_ = seq[i:i+n]
        if not skip and (slice_ == subsequence):
            skip = n
        if skip:
            skip -= 1
            continue
        yield x
Option 2 - with more_itertools
import more_itertools as mit


def remove(iterable, subsequence):
    """Yield non-subsequence items."""
    iterable = tuple(iterable)
    subsequence = tuple(subsequence)
    n = len(subsequence)
    indices = set(mit.locate(mit.windowed(iterable, n), pred=lambda x: x == subsequence))
    it_ = enumerate(iterable)
    for i, x in it_:
        if i in indices:
            mit.consume(it_, n-1)
        else:
            yield x
Demo
list(remove(big_list, sub_list))
# [2, 3, 4]
list(remove([1, 2, 1, 2], sub_list))
# []
list(remove([1, "a", int, 3, float, "a", int, 5], ["a", int]))
# [1, 3, float, 5]
list(remove("11111", "111"))
# ['1', '1']
list(remove(iter("11111"), iter("111")))
# ['1', '1']
Option 3 - with more_itertools.replace:
Demo
pred = lambda *args: args == tuple(sub_list)
list(mit.replace(big_list, pred=pred, substitutes=[], window_size=2))
# [2, 3, 4]
pred = lambda *args: args == tuple(sub_list)
list(mit.replace([1, 2, 1, 2], pred=pred, substitutes=[], window_size=2))
# []
pred = lambda *args: args == tuple(["a", int])
list(mit.replace([1, "a", int, 3, float, "a", int, 5], pred=pred, substitutes=[], window_size=2))
# [1, 3, float, 5]
pred = lambda *args: args == tuple("111")
list(mit.replace("11111", pred=pred, substitutes=[], window_size=3))
# ['1', '1']
pred = lambda *args: args == tuple(iter("111"))
list(mit.replace(iter("11111"), pred=pred, substitutes=[], window_size=3))
# ['1', '1']
Details
In all of these examples, we scan the main sequence with a sliding window the size of the sub-sequence. We yield items that are not part of a matching window and skip the items that are.
Option 1 - without libraries
Iterate an enumerated sequence and evaluate slices of size n (the length of the sub-sequence). If the upcoming slice equals the sub-sequence, reset skip to n and iterate past the matched items. Otherwise, yield the item. skip tracks how many times to advance the loop, e.g. sub_list is of size n=2, so it skips twice per match.
Note, you can convert this option to work with sequences alone by removing the first two tuple assignments and replacing the iterable parameter with seq, e.g. def remove(seq, subsequence):.
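As a rough illustration of that note, here is a sketch of the sequence-only variant. The name remove_seq and the sample data are my own, not from the original answer. One caveat under this assumption: without the tuple conversions, both arguments must be of the same sequence type, since a list slice never compares equal to a tuple.

```python
def remove_seq(seq, subsequence):
    """Yield items of seq that are not part of a matching run.

    Sequence-only sketch: no tuple() conversion, so both arguments
    must support len() and slicing and be of the same type.
    """
    n = len(subsequence)
    skip = 0
    for i, x in enumerate(seq):
        # Start skipping only when we are not already inside a match.
        if not skip and seq[i:i + n] == subsequence:
            skip = n
        if skip:
            skip -= 1
            continue
        yield x


# Hypothetical sample data.
print(list(remove_seq([2, 1, 2, 3, 1, 2, 4], [1, 2])))  # [2, 3, 4]
print("".join(remove_seq("11111", "111")))              # 11
```

Passing a list together with a tuple sub-sequence silently matches nothing, which is the main reason the generic version converts both to tuples first.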
Option 2 - with more_itertools
Indices are located for every matching sub-sequence in an iterable. While iterating an enumerated iterator, if an index is found in indices, the remaining sub-sequence is skipped by consuming the next n-1 elements from the iterator. Otherwise, an item is yielded.
Install this library via > pip install more_itertools.
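If installing the library is not an option, the same two steps (locate the matching window indices, then consume past each match) can be sketched with the standard library alone. This is my own stand-in, not the library's code: remove_stdlib is a hypothetical name, slicing plays the role of mit.locate over mit.windowed, and the islice call is the "consume" recipe from the itertools documentation.

```python
from itertools import islice


def remove_stdlib(iterable, subsequence):
    """Yield non-subsequence items; stdlib stand-in for Option 2."""
    items = tuple(iterable)
    subsequence = tuple(subsequence)
    n = len(subsequence)
    if n == 0:
        # Nothing to remove; avoids matching empty windows.
        yield from items
        return

    # Start index of every window equal to the sub-sequence.
    indices = {
        i for i in range(len(items) - n + 1)
        if items[i:i + n] == subsequence
    }

    it = enumerate(items)
    for i, x in it:
        if i in indices:
            # Advance past the remaining n - 1 matched items.
            next(islice(it, n - 1, n - 1), None)
        else:
            yield x


# Hypothetical sample data.
print(list(remove_stdlib([2, 1, 2, 3, 1, 2, 4], [1, 2])))  # [2, 3, 4]
print(list(remove_stdlib(iter("11111"), iter("111"))))     # ['1', '1']
```

Because the loop and the consume call share one iterator, an index that falls inside an already matched run is never visited, so overlapping matches are removed once, left to right.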
Option 3 - with more_itertools.replace:
This tool replaces a sub-sequence of items defined in a predicate with substitute values. To remove items, we substitute an empty container, e.g. substitutes=[]. The length of replaced items is specified by the window_size parameter (this value is equal to the length of the sub-sequence).
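To make the roles of pred, substitutes and window_size more concrete, here is a simplified standard-library model of the behaviour described above. It is only a sketch under my own assumptions: replace_windows is a hypothetical name, it is not how more_itertools implements replace, it copies the input eagerly, and it leaves out any other options the library offers.

```python
def replace_windows(iterable, pred, substitutes, window_size=1):
    """Simplified model: swap each matching window for substitutes."""
    items = tuple(iterable)
    substitutes = tuple(substitutes)
    i = 0
    while i < len(items):
        window = items[i:i + window_size]
        # pred receives the window's items as separate arguments.
        if len(window) == window_size and pred(*window):
            yield from substitutes   # an empty container removes the window
            i += window_size         # jump past the matched items
        else:
            yield items[i]
            i += 1


# Hypothetical sample data.
pred = lambda *args: args == (1, 2)
data = [2, 1, 2, 3, 1, 2, 4]
print(list(replace_windows(data, pred, substitutes=[], window_size=2)))   # [2, 3, 4]
print(list(replace_windows(data, pred, substitutes=[0], window_size=2)))  # [2, 0, 3, 0, 4]
```

The second call shows why removal is just a special case: any substitute container works, and an empty one simply yields nothing in place of the match.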
list(map(int, (''.join(map(str, big_list)).replace(''.join(map(str, sub_list)), '')))). Or do you want to apply this to arbitrary objects? – Sawfish
big_list = [1, 2, 1, 2, 1] and sub_list = [1, 2, 1]: do you want the result to be [2, 1] or [] (i.e. remove per occurrence or remove all items that match the sub_list pattern)? – Sawfish