Split a string to even sized chunks

How would I take a string like 'aaaaaaaaaaaaaaaaaaaaaaa' and split it into a tuple of length-4 chunks like ('aaaa', 'aaaa', 'aaaa', ...)?

Roast answered 25/1, 2014 at 13:39 Comment(3)

while s: x, s = s[:4], s[4:]; print(x) – Melson 25/1, 2014 at 13:44

related: What is the most “pythonic” way to iterate over a list in chunks? – Careworn 4/4, 2016 at 11:37

Does this answer your question? Split string every nth character? – Carabiniere 16/2, 2020 at 0:18

Use textwrap.wrap:

>>> import textwrap
>>> s = 'aaaaaaaaaaaaaaaaaaaaaaa'
>>> textwrap.wrap(s, 4)
['aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaa']

Cade answered 25/1, 2014 at 13:40 Comment(2)

Won't this fail if the string contains spaces? – Carabiniere 16/2, 2020 at 0:19

textwrap is very powerful and IMO, for a precise task like this, offers far too many options for things like replacing tabs with spaces, fixing sentence punctuation, etc. I would be more comfortable using something much simpler. – Troostite 31/8, 2023 at 3:53
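To make the whitespace caveat concrete, here is a quick comparison of `textwrap.wrap` with plain slicing on a string that contains spaces. `slice_chunks` is a hypothetical helper name used only for this illustration.

```python
import textwrap


def slice_chunks(s, n):
    """Fixed-width chunks via slicing; the last chunk may be shorter."""
    return [s[i:i + n] for i in range(0, len(s), n)]


s = 'I am a hat'
# textwrap breaks on whitespace and drops it, so chunk widths vary
print(textwrap.wrap(s, 4))  # ['I am', 'a', 'hat']
# slicing keeps every character, spaces included
print(slice_chunks(s, 4))   # ['I am', ' a h', 'at']
```

Joining the sliced chunks always reproduces the original string, which is not true of the `textwrap` output.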

Using a list comprehension or a generator expression:

>>> s = 'aaaaaaaaaaaaaaaaaaaaaaa'
>>> [s[i:i+4] for i in range(0, len(s), 4)]
['aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaa']

>>> tuple(s[i:i+4] for i in range(0, len(s), 4))
('aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaa')

>>> s = 'a bcdefghi j'
>>> tuple(s[i:i+4] for i in range(0, len(s), 4))
('a bc', 'defg', 'hi j')
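Wrapped as a reusable helper, the same slicing approach returns a tuple as the question asks and also works for other sequences such as lists. The name `chunked` and the `ValueError` for non-positive sizes are assumptions made for this sketch.

```python
def chunked(seq, n):
    """Return a tuple of consecutive slices of length n (the last may be shorter)."""
    if n < 1:
        # range() rejects a step of 0 and yields nothing for a negative one,
        # so fail early with a clearer message
        raise ValueError("chunk size must be >= 1")
    return tuple(seq[i:i + n] for i in range(0, len(seq), n))


print(chunked('aaaaaaaaaaaaaaaaaaaaaaa', 4))
print(chunked([1, 2, 3, 4, 5], 2))  # ([1, 2], [3, 4], [5])
```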

Paulo answered 25/1, 2014 at 13:39 Comment(0)

Another solution using regex:

>>> s = 'aaaaaaaaaaaaaaaaaaaaaaa'
>>> import re
>>> re.findall('[a-z]{4}', s)
['aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaaa']

Mincing answered 25/1, 2014 at 13:47 Comment(3)

A regular expression is a bit overkill for this. – Adenaadenauer 25/1, 2014 at 13:49

The asker wanted a solution, not necessarily the most optimized one, so I just showed that it can be solved this way too. Never mind, I know it's not the best of its kind. – Mincing 25/1, 2014 at 14:8

Actually, that's a really nice solution (apart from regular expressions being slower when used in bulk) and easier to understand on first sight than the zip() solution. And it can easily be changed to work with arbitrary characters, including newlines: re.findall('.{4}', s, re.DOTALL) - Or even accept incomplete tails: re.findall('.{1,4}', s, re.DOTALL) – Reprobation 5/4, 2017 at 9:44
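The variants suggested in the last comment can be compared side by side on a string that contains a newline. With `re.DOTALL` the `.` also matches newlines, and `.{1,4}` keeps a shorter final chunk instead of dropping it:

```python
import re

s = 'abc\ndefghij'  # 11 characters, including a newline

# Lowercase letters only: the newline breaks the run and the tail is dropped
print(re.findall('[a-z]{4}', s))           # ['defg']
# Any character, full chunks only
print(re.findall('.{4}', s, re.DOTALL))    # ['abc\n', 'defg']
# Any character, keeping a shorter final chunk
print(re.findall('.{1,4}', s, re.DOTALL))  # ['abc\n', 'defg', 'hij']
```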

You could use the grouper recipe, zip(*[iter(s)]*4):

In [113]: s = 'aaaaaaaaaaaaaaaaaaaaaaa'

In [114]: [''.join(item) for item in zip(*[iter(s)]*4)]
Out[114]: ['aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaaa']

Note that textwrap.wrap may not split s into strings of length 4 if the string contains spaces:

In [43]: textwrap.wrap('I am a hat', 4)
Out[43]: ['I am', 'a', 'hat']

The grouper recipe is faster than using textwrap:

In [115]: import textwrap

In [116]: %timeit [''.join(item) for item in zip(*[iter(s)]*4)]
100000 loops, best of 3: 2.41 µs per loop

In [117]: %timeit textwrap.wrap(s, 4)
10000 loops, best of 3: 32.5 µs per loop

And the grouper recipe can work with any iterable, while textwrap only works with strings.
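Note that plain `zip` silently drops the incomplete tail ('aaa' above). If the tail should be kept, the same recipe can use `itertools.zip_longest` with an empty-string fill value; on Python 3.12 and later, `itertools.batched(s, 4)` also yields the groups (as tuples) directly. A sketch, with a hypothetical function name:

```python
from itertools import zip_longest


def grouper_chunks(s, n):
    # n references to the *same* iterator, so each step consumes n items
    args = [iter(s)] * n
    # fillvalue='' pads the last group with empty strings, which join() ignores
    return [''.join(group) for group in zip_longest(*args, fillvalue='')]


print(grouper_chunks('aaaaaaaaaaaaaaaaaaaaaaa', 4))
# ['aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaa']
```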

Rhinarium answered 26/2, 2014 at 22:7 Comment(0)

s = 'abcdefghi'

k - length of each part

k = 3

parts - list to store parts of string

parts = [s[i:i+k] for i in range(0, len(s), k)]

parts --> ['abc', 'def', 'ghi']

Acherman answered 29/1, 2019 at 9:42 Comment(0)

s = 'abcdef'

Split it into parts of length 2:

[s[pos:pos+2] for pos, _ in enumerate(s) if pos % 2 == 0]

Result:

['ab', 'cd', 'ef']

Nathalie answered 6/9, 2016 at 8:18 Comment(0)

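I think this method is simpler. However, it only returns complete chunks: the message length must be a multiple of split_size, otherwise the leftover characters are dropped (below, 'um' is lost). Alternatively, pad the message to a multiple of split_size, e.g. message = "lorem ipsum_", and delete the added character afterwards.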

message = "lorem ipsum"

array = []

temp = ""

split_size = 3

for i in range(1, len(message) + 1):
    temp += message[i - 1]

    if i % split_size == 0:
        array.append(temp)
        temp = ""

print(array)

Output: ['lor', 'em ', 'ips']
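The padding workaround described above can be automated: pad to a multiple of `split_size`, slice, then trim exactly the padding that was added. Counting the padding (rather than stripping `_` characters afterwards) avoids corrupting a message that itself ends in `_`. A minimal sketch; `split_with_padding` and the default `pad_char` are assumptions:

```python
def split_with_padding(message, split_size, pad_char='_'):
    # number of characters needed to reach the next multiple of split_size
    pad_len = -len(message) % split_size
    padded = message + pad_char * pad_len
    chunks = [padded[i:i + split_size] for i in range(0, len(padded), split_size)]
    if pad_len:
        # remove only the padding we added, never original characters
        chunks[-1] = chunks[-1][:-pad_len]
    return chunks


print(split_with_padding("lorem ipsum", 3))  # ['lor', 'em ', 'ips', 'um']
```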

Speedball answered 1/12, 2019 at 17:17 Comment(0)

Here's another possible solution to the given problem:

def split_by_length(text, width):
    width = max(1, width)
    chunk = ""
    for v in text:
        chunk += v
        if len(chunk) == width:
            yield chunk
            chunk = ""

    if chunk:
        yield chunk

if __name__ == '__main__':
    x = "123456789"
    for i in range(20):
        print(i, list(split_by_length(x, i)))

Output:

0 ['1', '2', '3', '4', '5', '6', '7', '8', '9']
1 ['1', '2', '3', '4', '5', '6', '7', '8', '9']
2 ['12', '34', '56', '78', '9']
3 ['123', '456', '789']
4 ['1234', '5678', '9']
5 ['12345', '6789']
6 ['123456', '789']
7 ['1234567', '89']
8 ['12345678', '9']
9 ['123456789']
10 ['123456789']
11 ['123456789']
12 ['123456789']
13 ['123456789']
14 ['123456789']
15 ['123456789']
16 ['123456789']
17 ['123456789']
18 ['123456789']
19 ['123456789']
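Because `split_by_length` is a generator, chunks are produced lazily, so a caller can stop early, for example with `itertools.islice`. A slicing-based variant (hypothetical name `iter_chunks`) keeps the same width clamping while avoiding character-by-character string building:

```python
from itertools import islice


def iter_chunks(text, width):
    width = max(1, width)  # same clamping as above: 0 or less behaves like 1
    for start in range(0, len(text), width):
        yield text[start:start + width]


print(list(iter_chunks('123456789', 4)))             # ['1234', '5678', '9']
print(list(islice(iter_chunks('123456789', 2), 2)))  # ['12', '34']
```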

Allan answered 24/3, 2020 at 17:8 Comment(0)

The kiddy way

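def wrap(string, max_width):
    i = 0          # characters in the current chunk
    strings = []   # finished chunks
    s = ""         # chunk being built
    for x in string:
        i += 1
        s = s + x
        if i == max_width:
            strings.append(s)
            s = ""
            i = 0
    # keep only a non-empty tail; otherwise a string whose length is a
    # multiple of max_width would end with an extra '' chunk
    if s:
        strings.append(s)
    return strings

wrap('ABCDEFGHIJKLIMNOQRSTUVWXYZ',4)
# output: ['ABCD', 'EFGH', 'IJKL', 'IMNO', 'QRST', 'UVWX', 'YZ']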

Graminivorous answered 19/4, 2021 at 5:10 Comment(0)

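This function uses recursion. Each chunk adds a stack frame, so very long strings can exceed Python's default recursion limit (1000).

s = 'dasffvvcsadcadscsdsdcsadssdfsdfsdfdfs'

chunk_size = 5

def recursive_split(data, chunk_size, current_list=None):
    # Use None instead of [] so each top-level call starts with a fresh list
    if current_list is None:
        current_list = []
    if len(data) > chunk_size:
        current_list.append(data[:chunk_size])
        return recursive_split(data[chunk_size:], chunk_size, current_list)
    else:
        current_list.append(data)
        return current_list

print(recursive_split(s, chunk_size))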
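Python evaluates default argument values once, when the function is defined, so a list default such as `current_list = []` is shared by every call that does not pass its own list. A small demonstration with hypothetical functions `append_shared` and `append_fresh`:

```python
def append_shared(item, acc=[]):
    # the same list object is reused across calls
    acc.append(item)
    return acc


def append_fresh(item, acc=None):
    # None sentinel: a new list is created on each call
    if acc is None:
        acc = []
    acc.append(item)
    return acc


first = append_shared('a')
second = append_shared('b')
print(second)           # ['a', 'b']
print(first is second)  # True
print(append_fresh('a'), append_fresh('b'))  # ['a'] ['b']
```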

Stenography answered 7/10, 2023 at 13:23 Comment(0)
