This works using itertools.groupby:
import itertools
z = ['z+2-44', '4+55+z+88']
print([["".join(x) for k, x in itertools.groupby(i, str.isalnum)] for i in z])
output:
[['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]
It groups consecutive characters by whether they are alphanumeric or not, then joins each group back into a string inside a list comprehension.
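To see what groupby actually hands back, the sketch below pairs each run with its key (True for an alphanumeric run, False otherwise). The helper name group_runs is hypothetical and only for illustration:

```python
import itertools

def group_runs(s):
    # Pair each run of consecutive characters with its groupby key:
    # True for alphanumeric runs, False for everything else.
    return [(k, "".join(g)) for k, g in itertools.groupby(s, str.isalnum)]

print(group_runs('z+2-44'))
# [(True, 'z'), (False, '+'), (True, '2'), (False, '-'), (True, '44')]
```

Note that a run of several non-alphanumeric characters, such as '+((', comes back as a single group, which matters for the follow-up below.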
EDIT: the general case of a calculator with parentheses has been asked as a follow-up question here. If z is as follows:
z = ['z+2-44', '4+55+((z+88))']
then with the previous grouping we get:
[['z', '+', '2', '-', '44'], ['4', '+', '55', '+((', 'z', '+', '88', '))']]
That is hard to parse as tokens, because '+((' and '))' are lumped together. So the change is to join a group only if it is alphanumeric, keep it as a list of single characters if not, and flatten everything at the end using chain.from_iterable:
print([list(itertools.chain.from_iterable(["".join(x)] if k else x for k,x in itertools.groupby(i,str.isalnum))) for i in z])
which yields:
[['z', '+', '2', '-', '44'], ['4', '+', '55', '+', '(', '(', 'z', '+', '88', ')', ')']]
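The one-liner above can also be written as an explicit loop, which may be easier to read and extend. This is a sketch equivalent in behavior to the chain.from_iterable version; the function name tokenize_expr is hypothetical, and list.extend takes the place of the final flattening step:

```python
import itertools

def tokenize_expr(s):
    """Split s into alphanumeric runs and single non-alphanumeric characters."""
    tokens = []
    for is_alnum, group in itertools.groupby(s, str.isalnum):
        if is_alnum:
            # Keep consecutive letters/digits together, e.g. '44' or 'z'.
            tokens.append("".join(group))
        else:
            # Split operators and parentheses into one token per character.
            tokens.extend(group)
    return tokens

z = ['z+2-44', '4+55+((z+88))']
print([tokenize_expr(s) for s in z])
# [['z', '+', '2', '-', '44'], ['4', '+', '55', '+', '(', '(', 'z', '+', '88', ')', ')']]
```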
(Note that the alternate regex answer can be adapted the same way: [re.findall(r'\w+|\W', s) for s in z]. Note the lack of + after \W.)
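A runnable version of the regex variant, assuming the same input list z. \w+ matches a whole run of word characters, while \W without + matches exactly one non-word character, so '+((' is split into three tokens:

```python
import re

z = ['z+2-44', '4+55+((z+88))']

# \w+ grabs a whole run of word characters; \W (no +) grabs one
# non-word character at a time, so '((' yields two separate tokens.
tokens = [re.findall(r'\w+|\W', s) for s in z]
print(tokens)
# [['z', '+', '2', '-', '44'], ['4', '+', '55', '+', '(', '(', 'z', '+', '88', ')', ')']]
```

One difference from the groupby version: \w also matches the underscore, which str.isalnum() does not, so 'a_b' stays one token with the regex but becomes 'a', '_', 'b' with groupby.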
Also, "".join(list(x)) is slightly faster than "".join(x), but I've left that out to avoid making an already complex expression harder to read.
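If you want to check the speed difference on your own data, a timeit comparison along these lines is one way to do it. The enlarged input and the function names are hypothetical, and no particular result is implied, since timings vary with the Python version and machine:

```python
import itertools
import timeit

# Hypothetical larger input so the timing is measurable.
z = ['z+2-44', '4+55+z+88'] * 1000

def with_join(strings):
    return [["".join(x) for k, x in itertools.groupby(i, str.isalnum)] for i in strings]

def with_join_list(strings):
    return [["".join(list(x)) for k, x in itertools.groupby(i, str.isalnum)] for i in strings]

# Both variants must produce identical tokens; only the speed may differ.
assert with_join(z) == with_join_list(z)

for fn in (with_join, with_join_list):
    # Total seconds for 100 runs over the whole input list.
    print(fn.__name__, timeit.timeit(lambda: fn(z), number=100))
```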