Splitting a string by list of indices

4

46

I want to split a string by a list of indices, where the split segments begin with one index and end before the next one.

Example:

s = 'long string that I want to split up'
indices = [0,5,12,17]
parts = [s[index:] for index in indices]
for part in parts:
    print(part)

This will return:

long string that I want to split up
string that I want to split up
that I want to split up
I want to split up

I'm trying to get:

long
string
that
I want to split up

Implacental answered 1/6, 2012 at 13:43 Comment(0)
64
s = 'long string that I want to split up'
indices = [0,5,12,17]
parts = [s[i:j] for i,j in zip(indices, indices[1:]+[None])]

returns

['long ', 'string ', 'that ', 'I want to split up']

which you can print using:

print('\n'.join(parts))

Another possibility (without copying indices) would be:

s = 'long string that I want to split up'
indices = [0,5,12,17]
indices.append(None)
parts = [s[indices[i]:indices[i+1]] for i in range(len(indices)-1)]
Tessin answered 1/6, 2012 at 13:45 Comment(10)
Another way is [s[i:j] for i,j in izip_longest(indices,indices[1:])], but I like your way better! - Anticathode
This copies the indices list with indices[1:], and zip then builds another list of pairs on top of it, which costs extra time and memory. - Federico
@ms4py This is fine. Performance is not an issue in this case, and this is a very readable solution. If performance is an issue, my suggestion can be used. - Anticathode
eumiro - thank you, this works great. Can you explain how the +[None] part works? - Implacental
@ms4py - OK, there's an updated version without copying the list and without zip, although your itertools version is probably more performant. - Tessin
@Implacental - indices[1:] + [None] copies the list without its first element and adds a None at the end. So for your indices it looks like [5,12,17,None]. I am using it to be able to access the last part of the string with s[17:None] (the same as s[17:], just using two variables I have anyway). - Tessin
@Implacental [1:None], for example, is the same as [1:]. - Anticathode
@ms4py What do you mean by that? - Anticathode
Not sure it's your forte, but how would one do this in Node.js? - Albertinealbertite
This had been a headache for me for an hour and a half. Thanks @Tessin - Jennings
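
To make the +[None] trick discussed in the comments concrete: a slice bound of None behaves like an omitted bound, so the final pair (17, None) takes everything up to the end of the string. Below is a minimal Python 3 sketch of the same idea. The helper name split_at is hypothetical (it is not from the answer), and the sketch assumes indices is a list sorted in ascending order.

```python
def split_at(s, indices):
    """Split s into segments that start at each index and end before the next."""
    # Pair each index with its successor. The last index is paired with None,
    # and s[i:None] is the same as s[i:].
    ends = indices[1:] + [None]
    return [s[i:j] for i, j in zip(indices, ends)]


s = 'long string that I want to split up'
parts = split_at(s, [0, 5, 12, 17])
print('\n'.join(parts))
```

As in the answer, each segment except the last keeps its trailing space, because the slices run right up to the next index.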
5

Here is a short solution that leans heavily on the itertools module. The tee function is used to iterate pairwise over the indices. See the Recipes section of the itertools documentation for more help.

>>> from itertools import tee, izip_longest
>>> s = 'long string that I want to split up'
>>> indices = [0,5,12,17]
>>> start, end = tee(indices)
>>> next(end)
0
>>> [s[i:j] for i,j in izip_longest(start, end)]
['long ', 'string ', 'that ', 'I want to split up']

Edit: This is a version that does not copy the indices list, so it should be faster.

Federico answered 1/6, 2012 at 13:52 Comment(3)
Thanks for the alt approach. I'll have to check out itertools sometime. - Implacental
Neat approach, learned something new. Is there an easy way to get rid of the extra blank at the end of the first 3 strings inside the expression? I tried s[i:j].strip() but that didn't work at all (not sure why not). - Katalin
If you are going to use this, you may as well use the pairwise function straight from the itertools docs. Also, using next(end) is preferred to end.next() for Python 3 compatibility. - Anticathode
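
For readers on Python 3, where izip_longest is named zip_longest, the tee-based approach might look like the sketch below. The helper name pairwise_longest is hypothetical. It follows the shape of the pairwise recipe mentioned in the comments, but pads the last pair with None so the final segment runs to the end of the string. Calling .strip() on each slice is one way to drop the trailing blank asked about above, assuming surrounding whitespace is not wanted.

```python
from itertools import tee, zip_longest


def pairwise_longest(iterable):
    """Yield (a, b), (b, c), ..., (last, None) without copying the input."""
    start, end = tee(iterable)
    next(end, None)  # advance the second iterator by one; safe on empty input
    return zip_longest(start, end)


s = 'long string that I want to split up'
indices = [0, 5, 12, 17]
parts = [s[i:j].strip() for i, j in pairwise_longest(indices)]
print(parts)  # ['long', 'string', 'that', 'I want to split up']
```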
4

You can write a generator if you don't want to make any modifications to the list of indices:

>>> def split_by_idx(S, list_of_indices):
...     left, right = 0, list_of_indices[0]
...     yield S[left:right]
...     left = right
...     for right in list_of_indices[1:]:
...         yield S[left:right]
...         left = right
...     yield S[left:]
... 
>>> 
>>> 
>>> s = 'long string that I want to split up'
>>> indices = [5,12,17]
>>> [i for i in split_by_idx(s, indices)]
['long ', 'string ', 'that ', 'I want to split up']
Pimply answered 3/8, 2019 at 22:18 Comment(0)
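
A slightly shorter variant of the generator above, offered as a sketch rather than a replacement: it avoids slicing list_of_indices (which makes a copy) and also works when the index list is empty. It assumes the indices are in ascending order and, like the original, treats them as cut points, so a leading 0 would produce an empty first piece.

```python
def split_by_idx(s, indices):
    """Yield the pieces of s cut at each position in indices (assumed ascending)."""
    left = 0
    for right in indices:
        yield s[left:right]
        left = right
    yield s[left:]  # remainder after the last cut


s = 'long string that I want to split up'
print(list(split_by_idx(s, [5, 12, 17])))
```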
1

Another solution (a bit more readable):

parts = []
i2 = len(s)  # i1 and i2 are the start and end index of each slice

for i1 in reversed(indices):
    parts.append(s[i1:i2])
    i2 = i1

parts.reverse()

This iterates over the indices in reverse, so it starts splitting from the last index position up to the end index i2 (which is updated on every iteration).

Of course, the parts then come out in reverse order. That's why I reverse the result list at the end.

I think for beginners this is a bit more readable than the accepted answer.

Curling answered 21/7, 2022 at 21:10 Comment(0)
