Python - extracting split list correctly

4

2

As a follow-up to this question, I have an expression such as ['(', '44', '+(', '3', '+', 'll', '))'], which was created with re.findall('\w+|\W+', item). However, this list of strings contains two errors: one is the '+(' and the other is the '))'.

Is there a Pythonic way to split just the operators, so that the list becomes ['(', '44', '+', '(', '3', '+', 'll', ')', ')']?

(keep the digits/letters together, separate the symbols)

Thanks

Catto answered 19/2, 2017 at 22:50 Comment(2)
It's not a duplicate at all. It may look like one, but I have edited the question to show what the OP really wants (after answering the prequel question). - Ximenez
Check my answer, it answers both questions now. - Ximenez
1

You want to split each run of grouped non-alphanumeric characters into its individual characters.

I would build a one-item list if the item is fine as it is (alphanumeric), or a list of its characters if the item is a sequence of symbols.

Then I'd flatten the result to get what you asked for:

import itertools

l = ['(', '44', '+(', '3', '+', 'll', '))']
new_l = list(itertools.chain.from_iterable([x] if x.isalnum() else list(x) for x in l))
print(new_l)

result:

['(', '44', '+', '(', '3', '+', 'll', ')', ')']
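One caveat worth knowing: str.isalnum() returns False for a token containing an underscore, while \w in the original regex does match underscores, so a token like 'my_var' would be broken into single characters. Below is a minimal sketch of a variant that tests with re.fullmatch(r'\w+', ...) instead. The function name split_symbol_runs and the sample token 'my_var' are made up for illustration, and this assumes you want underscores kept inside names.

```python
import itertools
import re

def split_symbol_runs(tokens):
    """Keep word tokens (\\w+) whole; break every other token into single characters."""
    return list(itertools.chain.from_iterable(
        # re.fullmatch succeeds only if the whole token is made of word characters
        [tok] if re.fullmatch(r'\w+', tok) else list(tok)
        for tok in tokens
    ))

tokens = ['(', '44', '+(', 'my_var', '))']
print(split_symbol_runs(tokens))
# ['(', '44', '+', '(', 'my_var', ')', ')']
```

If your tokens never contain underscores, the isalnum() version behaves the same way.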

EDIT: actually, you could handle both of your questions in one step (adapting the regex answer to the original question) by not grouping symbols in the regex:

import re
lst = ['z+2-44', '4+55+((z+88))']
print([re.findall(r'\w+|\W', s) for s in lst])

(Note the lack of + after \W.) This directly gives:

[['z', '+', '2', '-', '44'], ['4', '+', '55', '+', '(', '(', 'z', '+', '88', ')', ')']]
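Note that \W also matches whitespace, so with the pattern above an input such as '4 + 5' would produce ' ' tokens. If the expressions may contain spaces, a possible tweak (a sketch only; the helper name tokenize is hypothetical) is to match single characters that are neither word characters nor whitespace:

```python
import re

def tokenize(expr):
    """Split an expression into word tokens and single symbols, skipping whitespace."""
    # \w+      : a run of letters, digits or underscores, kept together
    # [^\w\s]  : exactly one character that is neither a word character nor whitespace
    return re.findall(r'\w+|[^\w\s]', expr)

print(tokenize('4 + (55+z)'))
# ['4', '+', '(', '55', '+', 'z', ')']
```

Whitespace matches neither alternative, so re.findall simply skips over it.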
Dionne answered 19/2, 2017 at 22:59 Comment(0)
1

A short solution using str.join() and re.split():

import re
l = ['(', '44', '+(', '3', '+', 'll', '))']
new_list = [i for i in re.split(r'(\d+|[a-z]+|[^\w])', ''.join(l)) if i.strip()]

print(new_list)

The output:

['(', '44', '+', '(', '3', '+', 'll', ')', ')']
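In case it is unclear why the if i.strip() filter is needed: when the pattern contains a capturing group, re.split() keeps the separators in the result, but it also emits empty strings wherever two matches are adjacent. A small illustration with a simplified pattern (not the exact one used above):

```python
import re

# A capturing group makes re.split keep the separators in the result;
# empty strings appear where two symbols are adjacent or a symbol ends the string.
parts = re.split(r'(\W)', '4+(5)')
print(parts)   # ['4', '+', '', '(', '5', ')', '']

# Dropping the empty (or whitespace-only) pieces leaves just the tokens.
tokens = [p for p in parts if p.strip()]
print(tokens)  # ['4', '+', '(', '5', ')']
```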
Tombolo answered 19/2, 2017 at 23:10 Comment(0)
1

An alternative would be to change the regex so that the non-alphanumeric characters are kept separate:

import re
lst = ['z+2-44', '4+(55+z)+88']
print([re.findall(r'\w+|\W', s) for s in lst])

# [['z', '+', '2', '-', '44'], ['4', '+', '(', '55', '+', 'z', ')', '+', '88']]
Exportation answered 19/2, 2017 at 23:10 Comment(0)
1

Try this:

import re
lst = ['z+2-44', '4+(55+z)+88']
print([re.findall(r'\w+|\W', s) for s in lst])

Maybe it helps others.

Spectroradiometer answered 20/2, 2017 at 8:48 Comment(0)
