How to join strings between parentheses in a list of strings
S

4

7
poke_list = [... 'Charizard', '(Mega', 'Charizard', 'X)', '78', '130', ...] #1000+ values

Is it possible to merge the strings running from one that starts with '(' to one that ends with ')' and then reinsert the result into the same list or a new list?

My desired output: poke_list = [... 'Charizard (Mega Charizard X)', '78', '130', ...]

Sedgemoor answered 31/5, 2020 at 22:14 Comment(0)
K
2

Another way to do it, slightly shorter than the other solution:

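poke_list = ['Bulbasaur', 'Charizard', '(Mega', 'Charizard', 'X)', '78', 'Pikachu', '(Raichu)', '130']
fixed = []
acc = fixed  # acc is the list currently being filled: fixed itself, or a temporary inner list
for x in poke_list:
    if x.startswith('('):
        # Opening parenthesis: start an inner list seeded with the previous item
        acc = [fixed.pop()]
    acc.append(x)
    if x.endswith(')') and acc is not fixed:
        # Closing parenthesis: join the inner list and go back to filling fixed
        fixed.append(' '.join(acc))
        acc = fixed
if acc is not fixed:
    # Parenthesis never closed: keep what was collected
    fixed.append(' '.join(acc))
print(fixed)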

Note that this solution assumes the broken list does not start with an item that opens a parenthesis (there would be nothing to pop from fixed). It also handles an item that has both the opening and the closing parenthesis, a case the other solution excluded.

The idea is to append values either to the main list (fixed) or, once an opening parenthesis has been detected, to an inner list that is joined later. If the inner list is still open when the loop ends (likely invalid input), it is appended to fixed anyway.
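The accumulator idea can also be wrapped in a reusable function. The following is only a sketch under assumptions: the name `join_parenthesized` is hypothetical, and the handling of a list that begins with a parenthesized item, of a stray closing parenthesis, and of a nested opening parenthesis is a guess at reasonable behaviour, not something the answer specifies.

```python
def join_parenthesized(items):
    """Return a new list where each '(...' ... '...)' run is joined to the item before it."""
    fixed = []
    acc = fixed  # current target: the main list, or a temporary inner list
    for x in items:
        if x.startswith('(') and acc is fixed:
            # Seed the inner list with the preceding item, if there is one
            # (assumption: a leading parenthesis has nothing to attach to).
            acc = [fixed.pop()] if fixed else []
        acc.append(x)
        if x.endswith(')') and acc is not fixed:
            # Group closed: join it and go back to filling the main list.
            fixed.append(' '.join(acc))
            acc = fixed
    if acc is not fixed:
        # Unclosed parenthesis: keep what was collected rather than lose it.
        fixed.append(' '.join(acc))
    return fixed


print(join_parenthesized(['Charizard', '(Mega', 'Charizard', 'X)', '78', '130']))
```

Returning a new list leaves the original `poke_list` untouched, which makes it easier to compare the two when checking the result.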

This approach is very similar to transforming a flat expression containing parentheses into a hierarchy of lists. The code would of course be slightly different and would have to manage more than one level of inner list.
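To illustrate that comparison, here is a rough sketch of turning a flat token list into nested lists with a stack. The function name `nest_tokens` is hypothetical, and the sketch assumes that parentheses are attached to words, as they are in `poke_list`.

```python
def nest_tokens(tokens):
    """Turn flat tokens such as '(a', 'b)' into nested lists, one level per parenthesis."""
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    for tok in tokens:
        # Each leading '(' opens one more level of nesting.
        while tok.startswith('('):
            inner = []
            stack[-1].append(inner)
            stack.append(inner)
            tok = tok[1:]
        # Strip trailing ')' and remember how many levels they close.
        word = tok.rstrip(')')
        closing = len(tok) - len(word)
        if word:
            stack[-1].append(word)
        for _ in range(closing):
            if len(stack) == 1:
                raise ValueError("unbalanced ')' in input")
            stack.pop()
    if len(stack) != 1:
        raise ValueError("unbalanced '(' in input")
    return root


print(nest_tokens(['a', '(b', '(c', 'd)', 'e)', 'f']))
# ['a', ['b', ['c', 'd'], 'e'], 'f']
```

The stack plays the same role as `acc` in the flat version, except that it remembers every enclosing list instead of only one.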

Krick answered 31/5, 2020 at 23:33 Comment(3)
What is the fixed variable when initializing acc? Can you elaborate on the acc variable? - Sedgemoor
@Aarvak: Actually you spotted a bug, which proves again the rule of thumb that untested code is always broken. Thanks. The new version should be correct. - Krick
@Andrea Blengino: thanks, I overcomplicated things; after removing nearly 1/4 of the code it should work better :-). I certainly should write the tests I spoke of in my previous comment (still not done, so feel free to tell me if there are other bugs). - Krick
V
8

You can iterate over the list, checking whether each element starts with ( or ends with ). Once you have found the elements between the brackets, you can join them with the string .join method, like this:

poke_list = ['Charizard', '(Mega', 'Charizard', 'X)', '78', '130']

new_poke_list = []
to_concatenate = []
flag = 0

for item in poke_list:
    if item.startswith('(') and not item.endswith(')'):
        to_concatenate.append(item)
        flag = 1
    elif item.endswith(')') and not item.startswith('('):
        to_concatenate.append(item)
        concatenated = ' '.join(to_concatenate)
        new_poke_list.append(concatenated)
        to_concatenate = []
        flag = 0
    elif item.startswith('(') and item.endswith(')'):
        new_poke_list.append(item)
    else:
        if flag == 0:
            new_poke_list.append(item)
        else:
            to_concatenate.append(item)

print(new_poke_list)

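The flag is set to 1 while the loop is between an opening and a closing bracket and to 0 otherwise, so elements in the middle of a group are collected instead of being appended directly. This lets you handle all cases.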
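With the sample list, the code above yields 'Charizard' and '(Mega Charizard X)' as two separate items. If the group should also be attached to the preceding name, as in the question's desired output, one possible adaptation is sketched below. The function name `merge_groups` and the boolean `inside` (standing in for `flag`) are illustrative choices, not part of the answer.

```python
def merge_groups(items):
    """Join '(...' ... '...)' runs and attach each run to the item before it."""
    result = []
    group = []      # items of the group currently being collected
    inside = False  # plays the role of `flag` in the answer above
    for item in items:
        opens = item.startswith('(')
        closes = item.endswith(')')
        if inside or opens:
            group.append(item)
            inside = not closes
            if closes:
                merged = ' '.join(group)
                group = []
                if result:
                    # Assumption taken from the question's desired output:
                    # the group belongs to the preceding name.
                    result[-1] = result[-1] + ' ' + merged
                else:
                    result.append(merged)
        else:
            result.append(item)
    # Unclosed group: keep the collected items instead of dropping them.
    result.extend(group)
    return result


print(merge_groups(['Charizard', '(Mega', 'Charizard', 'X)', '78', '130']))
# ['Charizard (Mega Charizard X)', '78', '130']
```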

Vulpecula answered 31/5, 2020 at 22:26 Comment(3)
Thank you so much. I was having trouble with the values in between and attempted using str.startswith/str.endswith; however, I didn't set a flag, so I would always skip the middle value. I was able to fix my code. Thank you! - Sedgemoor
This code has a small issue if the follow-up value to append contains both an opening and a closing parenthesis. In that case the closing parenthesis will be missed. - Krick
Corrected the code to handle this case too. - Vulpecula
S
1

Simple plus-signs can concatenate the strings:

new_charizard = 'Charizard '+'(Mega'+' Charizard '+'X)'
Splat answered 31/5, 2020 at 22:17 Comment(1)
Sorry, I wasn't 100% clear. I understand normal concatenation; however, is it possible to do this with a list of thousands of values and check whether values are incorrect? - Sedgemoor
P
1

This is a function that merges strings between parentheses. It assumes that every open parenthesis has a matching closed one and that parentheses aren't nested.

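def merge_parenthesis(l):
    """Merge, in place, the items between an opening and a closing parenthesis."""
    start, length = None, len(l)
    i = 0
    while i < length:
        if start is not None:
            # Inside a group: glue the current item onto the opening one
            l[start] += ' ' + l[i]
            if l[i].endswith(')'):
                # Group closed: drop the items that were merged into l[start]
                del l[start + 1:i + 1]
                length = len(l)
                start, i = None, start  # resume right after the merged item
        elif l[i].startswith('(') and not l[i].endswith(')'):
            start = i
        i += 1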

It works in place. When it finds an element beginning with an open parenthesis, it starts appending the following elements to it until it reaches the one that ends with a closed parenthesis. Then it removes the elements after the opening one, up to and including the closing one, since they have been merged into it.

When the element ending with the closed parenthesis is found, the state is restored by setting start to None, updating the length variable, and resetting i so that the loop continues from the element right after the merged one.
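The same in-place merge can be written more compactly with slice assignment. This is an alternative sketch rather than the function described above: the name `merge_in_place` is hypothetical, and leaving the list untouched after an unclosed parenthesis is an assumed policy.

```python
def merge_in_place(items):
    """Collapse each '(...' ... '...)' run of `items` into one string, in place."""
    i = 0
    while i < len(items):
        if items[i].startswith('(') and not items[i].endswith(')'):
            # Look for the first later item that closes the parenthesis.
            end = next((j for j in range(i + 1, len(items))
                        if items[j].endswith(')')), None)
            if end is None:
                break  # unclosed parenthesis: leave the remainder untouched
            # Slice assignment replaces the whole run with its joined form.
            items[i:end + 1] = [' '.join(items[i:end + 1])]
        i += 1


data = ['Charizard', '(Mega', 'Charizard', 'X)', '78', '130']
merge_in_place(data)
print(data)
# ['Charizard', '(Mega Charizard X)', '78', '130']
```

Like the function above, it returns nothing and modifies the list it is given, so pass a copy (for example `list(poke_list)`) if the original must be preserved.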

Parley answered 16/6, 2020 at 16:23 Comment(0)
