Python: Sum string lengths
A

8

19

Is there a more idiomatic way to sum string lengths in Python than by using a loop?

length = 0
for string in strings:
    length += len(string)

I tried sum(), but it only works for integers:

>>> sum('abc', 'de')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]
Audie answered 23/9, 2010 at 16:17 Comment(3)
What do you mean by "quicker"? Less typing or faster execution?Bellamy
@Richard: Sorry, I was thinking "quicker" as in less typing, but what I actually mean is idiomatic.Audie
No worries. I think that's what everybody else figured. I'm just a pedant!Bellamy
B
5

I know this is an old question, but I can't help noting that the Python error message tells you how to do this:

TypeError: sum() can't sum strings [use ''.join(seq) instead]

So:

>>> strings = ['abc', 'de']
>>> print(len(''.join(strings)))
5
Bloch answered 20/9, 2014 at 1:15 Comment(8)
It seems wasteful to concatenate the strings when you don't have to, but +1 for adding another way of solving the problem!Audie
I don't know - I long since stopped wondering whether code was CPU wasteful for non-realtime systems. But since you mentioned "less typing" this looks pretty tight.Bloch
@Audie Wasteful? This is by far the fastest of the three solutions, if the timeit module is to be believed. The answer you accepted, sum(len(s) for s in strings), is over three times as slow, and is also almost twice as slow as sum(map(len, strings)). (Speed of course doesn't matter much in Python -- if you wanted speed you'd be using Pypy, as the saying goes -- but the full generator expression is also IMO a bit of an eyesore compared to the others.)Excellency
The other answers are more generic and useful, as they also answer the question when the element type of the list is not a string.Sherlock
@Sherlock The other answers answer a question that was not asked! The question was about lists of strings, with the answer provided by the error message. Why overcomplicate things?Bloch
The other answers answer the asked question as well as a more general question. They are not any more complicated than this answer. The votes bear this out. I'm at a loss as to why this became the accepted answer 4 years after the initial question and initial great answers. It is an interesting and correct answer, but imho not any simpler than the generic answers.Sherlock
@Sherlock Seriously? You think using map is simpler? Honestly, I don't know how it became the accepted answer either. I assume the OP liked this better, because another answer was originally accepted, but it's accepted because it's simple, canonical, and apparently fast (which wasn't any consideration of mine when I wrote it). And, btw, the top-voted answer will not necessarily work for non-strings (e.g., strings=[1,2,3]). It, too, was only answering the actual question.Bloch
I do think that both the comprehension and the map answers are simpler both from a code and conceptual perspective. My initial comment was just to point out that other answers below also solved the problem when not dealing with strings. Why did I post it? Because I arrived at this question and answer through google, when searching for how to sum the lengths of something I had in a list, and my initial reaction upon arrival was - "nuts, this question is only about strings." But then I scrolled down and found other answers that solved a more general problem and helped me.Sherlock
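The point debated in the comments above can be illustrated with a short sketch. The sample data is hypothetical; it only shows which approaches accept which element types.

```python
# Hypothetical mixed input: a list, a tuple, and a string.
items = [[1, 2], (3,), 'ab']

# Works for any elements that support len().
total = sum(map(len, items))

# ''.join() only accepts strings, so it raises TypeError here.
try:
    len(''.join(items))
    join_ok = True
except TypeError:
    join_ok = False

# Elements without a length (e.g. integers) break the len()-based versions too.
try:
    sum(len(x) for x in [1, 2, 3])
    ints_ok = True
except TypeError:
    ints_ok = False

print(total, join_ok, ints_ok)
```

So the len()-based answers generalize to other sized containers, while none of the approaches handles elements such as plain integers.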
D
49
length = sum(len(s) for s in strings)
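
As a minimal sketch, assuming `strings` is a list like the one in the question, the one-liner can be checked against the original loop:

```python
# Hypothetical sample data; any iterable of strings works.
strings = ['abc', 'de']

# Loop version from the question.
loop_total = 0
for s in strings:
    loop_total += len(s)

# Generator-expression version: no intermediate list is built.
length = sum(len(s) for s in strings)

print(length == loop_total)

# For an empty input, sum() returns its default start value, 0.
empty_total = sum(len(s) for s in [])
print(empty_total)
```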
Destructionist answered 23/9, 2010 at 16:19 Comment(3)
This is definitely a more idiomatic way of expressing it but I don't think it's any more efficient computationally. Still, +1 for elegance and Pythonicness!Bellamy
If you're really worried about computational efficiency, you probably shouldn't use Python, or should write the computation-intense part in C or C++ (or SciPy's weave library if you're brave). I like this style because it's more legible to other Python developers.Alla
Thanks, this is much shorter and easier to understand than my code.Audie
T
18

My first way to do it would be sum(map(len, strings)). Another way is to use a list comprehension or generator expression as the other answers have posted.

Towne answered 23/9, 2010 at 16:22 Comment(2)
Good answer, but I've accepted liori's answer because I found it more idiomatic.Audie
@Josh: Most people will indeed find the genexp more pythonic. I just wanted to add this for completeness.Towne
P
7

The shortest and fastest way is to apply a functional programming style with map() and sum():

>>> data = ['a', 'bc', 'def', 'ghij']
>>> sum(map(len, data))
10

In Python 2, use itertools.imap instead of map for better memory performance:

>>> from itertools import imap
>>> data = ['a', 'bc', 'def', 'ghij']
>>> sum(imap(len, data))
10
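
On Python 3 the imap import is not needed (itertools.imap no longer exists there), because the built-in map is already lazy. A minimal sketch, assuming Python 3:

```python
data = ['a', 'bc', 'def', 'ghij']

# In Python 3, map() returns an iterator rather than a list,
# so the lengths are produced one at a time as sum() consumes them.
lengths = map(len, data)
is_lazy = iter(lengths) is lengths  # iterators return themselves from iter()

total = sum(lengths)
print(is_lazy, total)
```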
Peculiarity answered 17/3, 2017 at 7:35 Comment(0)
E
2
print(sum(len(mystr) for mystr in strings))
Eosin answered 23/9, 2010 at 16:20 Comment(0)
E
2

TLDR

If you care about performance, use

len(''.join(strings))

Otherwise, map will suffice without sacrificing code readability or much performance:

sum(map(len, strings))

Performance metrics

Although I agree with the general consensus that when using Python your first priority should not be writing efficient and scalable code, I think it would be beneficial for this post to have some timings for the proposed answers.

Using the words from the first paragraph of lorem ipsum (list of strings excluded for the sake of brevity)

In [3]: timeit("""
    ...: length = 0
    ...: for s in strings:
    ...:     length += len(s)
    ...: """, globals=globals())
Out[3]: 5.197531974001322

In [4]: timeit("sum(len(s) for s in strings)", globals=globals())
Out[4]: 4.925184353021905

In [5]: timeit("sum(map(len, strings))", globals=globals())
Out[5]: 1.9876644779578783

In [6]: timeit("len(''.join(strings))", globals=globals())
Out[6]: 0.6793132669990882

So for a large list of strings, @Auspex's answer is clearly to be preferred.
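
To reproduce the comparison outside IPython, something like the following sketch could be used. The word list is a hypothetical stand-in (the original list is not shown), and the `candidates` and `namespace` names are made up for illustration.

```python
from timeit import timeit

# Hypothetical stand-in for the word list, which the answer omits.
strings = "lorem ipsum dolor sit amet consectetur adipiscing elit".split() * 10

candidates = {
    "genexp": "sum(len(s) for s in strings)",
    "map": "sum(map(len, strings))",
    "join": "len(''.join(strings))",
}

# Explicit namespace so each statement can see `strings`.
namespace = {"strings": strings}

# All candidates must agree before their timings are worth comparing.
results = {}
for name, code in candidates.items():
    results[name] = eval(code, namespace)

# Time each candidate; `number` is kept small so the script finishes quickly.
timings = {}
for name, code in candidates.items():
    timings[name] = timeit(code, globals=namespace, number=10000)

for name in sorted(timings, key=timings.get):
    print(name, results[name], timings[name])
```

Absolute numbers depend on the machine, the Python version, and the list size. The sketch also uses a much smaller `number` than timeit's default of 1,000,000, so its figures are not directly comparable with the ones above.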

Ellie answered 26/2, 2022 at 19:47 Comment(0)
S
1

Here's another way using operator. Not sure if this is easier to read than the accepted answer.

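from functools import reduce  # built in on Python 2, must be imported on Python 3
import operator

# The initial value 0 keeps reduce() from failing on an empty list.
length = reduce(operator.add, map(len, strings), 0)

print(length)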
Souther answered 10/2, 2014 at 13:46 Comment(0)
C
-1

Just to add upon ...

Adding numbers that are stored as strings in a list:

nos = ['1','14','34']

length = sum(int(s) for s in nos)

Corespondent answered 11/3, 2014 at 14:9 Comment(0)
