Python Pandas concatenate a Series of strings into one string

3

21

In pandas, I have a Series (a DataFrame column) of str values that I want to combine into one long string:

df = pd.DataFrame({'text' : pd.Series(['Hello', 'world', '!'], index=['a', 'b', 'c'])})

Goal: 'Hello world !'

So far, approaches such as df['text'].apply(lambda x: ' '.join(x)) only return a Series.

What is the best way to get the concatenated string?

Esparza asked 30/12, 2016 at 17:43 Comment(0)
35

You can pass the Series directly to str.join:

In [3]:
' '.join(df['text'])

Out[3]:
'Hello world !'
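
If the column might contain missing values or non-string entries (an assumption; the data in the question has neither), str.join will raise a TypeError. A minimal sketch of one way around that, using a made-up Series, is to drop the missing values and cast the rest to str before joining:

```python
import pandas as pd

# Hypothetical column with a missing value and a non-string entry.
s = pd.Series(['Hello', None, 'world', 42, '!'])

# Drop missing entries, then cast the rest to str so str.join accepts them.
joined = ' '.join(s.dropna().astype(str))
print(joined)  # Hello world 42 !
```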
Froissart answered 30/12, 2016 at 17:47 Comment(2)
I am getting an error while doing this: "TypeError: sequence item 0: expected str instance, list found". This is in Python 3. Could you please guide me? - Gyniatrics
@user1930402 Asking questions in comments is poor form on SO. The error message is clear: you have lists in your DataFrame, not strings, hence the error. As I don't have access to your computer, I can only speculate that for some reason you're storing lists in your df, which is not advisable. I can't help you here; you need to post a new question. You should also ask yourself whether you really need to store lists at all, since storing non-scalar values defeats the point of using pandas. - Froissart
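
For readers who hit the same TypeError, here is a small sketch that reproduces it and shows one possible workaround. The list-valued Series is invented for illustration, and flattening with itertools.chain is just one option, not something the commenter's real data is known to need:

```python
from itertools import chain

import pandas as pd

# Hypothetical column whose cells are lists of strings rather than strings.
s = pd.Series([['Hello', 'big'], ['world'], ['!']])

try:
    ' '.join(s)
except TypeError as exc:
    # str.join rejects the list elements.
    error_message = str(exc)

# Flatten the lists first, then join the individual strings.
joined = ' '.join(chain.from_iterable(s))
print(joined)  # Hello big world !
```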
15

Apart from join, you could also use the pandas string method .str.cat:

In [171]: df.text.str.cat(sep=' ')
Out[171]: 'Hello world !'

However, join() is much faster.
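
No timings are given here, so the following is only a sketch of how you could measure the difference yourself. The test data (100,000 copies of a short string) and the repeat count are arbitrary choices, and the numbers will vary with your machine and pandas version:

```python
import timeit

import pandas as pd

# Hypothetical test data: 100,000 short strings.
s = pd.Series(['word'] * 100_000)

# Both approaches should produce the same string.
same_result = ' '.join(s) == s.str.cat(sep=' ')

# Time 20 runs of each approach.
t_join = timeit.timeit(lambda: ' '.join(s), number=20)
t_cat = timeit.timeit(lambda: s.str.cat(sep=' '), number=20)
print(f"str.join: {t_join:.4f}s, Series.str.cat: {t_cat:.4f}s")
```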

Mccraw answered 31/12, 2016 at 11:36 Comment(1)
Faster to write or faster to run? Can you provide results? - Yellowthroat
3

Your code is "returning the series" because you didn't specify the right axis. Try this:

df.apply(' '.join, axis=0)
text    Hello world !
dtype: object

Specifying axis=0 passes each column to the function in turn, so all the values in a column are combined into a single string. The return type is a Series whose index labels are the column names and whose values are the corresponding joined strings. This is particularly useful if you want to collapse more than one column into a string at a time.

I generally find it hard to tell which axis apply needs, so if the result isn't what you expect, try applying along the other axis too.
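
To make the axis difference concrete, here is a small sketch on a two-column frame. The 'more' column is made up for illustration and is not part of the question's data:

```python
import pandas as pd

# Hypothetical two-column frame; 'more' is not in the original question.
df = pd.DataFrame({'text': ['Hello', 'world', '!'],
                   'more': ['a', 'b', 'c']},
                  index=['a', 'b', 'c'])

# axis=0: the function receives each column, so every column
# collapses to one string, indexed by column name.
per_column = df.apply(' '.join, axis=0)
print(per_column['text'])  # Hello world !

# axis=1: the function receives each row, so the values are
# joined across columns, one string per row.
per_row = df.apply(' '.join, axis=1)
print(per_row['a'])  # Hello a
```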

Romanticism answered 27/6, 2021 at 3:44 Comment(2)
Helpful description, +10, but note that you're using df.apply whereas the OP used df['text'].apply (Series.apply has no axis). - Westbrooks
@Westbrooks That's true. This is because Series.apply generally works on single values at a time, more like DataFrame.applymap. From the Series.apply docs: "Invoke function on values of Series. Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values." - Romanticism
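
A short sketch of what that means for the code in the question: Series.apply hands the function one element at a time, so ' '.join only ever sees a single string and spaces out its characters.

```python
import pandas as pd

s = pd.Series(['Hello', 'world', '!'], index=['a', 'b', 'c'])

# Series.apply calls the function once per element, so ' '.join
# puts spaces between the characters of each individual string.
per_element = s.apply(lambda x: ' '.join(x))
print(per_element['a'])  # H e l l o

# Passing the whole Series to str.join gives the single string instead.
joined = ' '.join(s)
print(joined)  # Hello world !
```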
