Counting positive integer elements in a list with Python list comprehensions
P

8

63

I have a list of integers and I need to count how many of them are > 0.
I'm currently doing it with a list comprehension that looks like this:

sum([1 for x in frequencies if x > 0])

It seems like a decent comprehension but I don't really like the "1"; it seems like a bit of a magic number. Is there a more Pythonish way to do this?

Pinafore answered 24/5, 2010 at 20:26 Comment(2)
counting nonzero elements is not the same as counting elements > 0. The title should be modified accordinglyAnthracene
I updated the title of your question so that it reflects its contents. I hope this is fine with you.Pasahow
B
100

If you want to reduce the amount of memory, you can avoid generating a temporary list by using a generator:

sum(x > 0 for x in frequencies)

This works because bool is a subclass of int:

>>> isinstance(True,int)
True

and True's value is 1:

>>> True==1
True
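As a small illustration of the point above, the following sketch sums a list of comparison results directly. The sample values are made up for the example.

```python
frequencies = [3, 0, -2, 7, 1]  # made-up sample data

# Each comparison yields a bool; True counts as 1 and False as 0 when summed.
flags = [x > 0 for x in frequencies]
positive_count = sum(flags)

print(flags)           # [True, False, False, True, True]
print(positive_count)  # 3
print(True + True)     # 2
```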

However, as Joe Golton points out in the comments, this solution is not very fast. If you have enough memory to use an intermediate temporary list, then sth's solution may be faster. Here are some timings comparing various solutions:

>>> import random
>>> frequencies = [random.randint(0,2) for i in range(10**5)]

>>> %timeit len([x for x in frequencies if x > 0])   # sth
100 loops, best of 3: 3.93 ms per loop

>>> %timeit sum([1 for x in frequencies if x > 0])
100 loops, best of 3: 4.45 ms per loop

>>> %timeit sum(1 for x in frequencies if x > 0)
100 loops, best of 3: 6.17 ms per loop

>>> %timeit sum(x > 0 for x in frequencies)
100 loops, best of 3: 8.57 ms per loop

Beware that timeit results may vary depending on version of Python, OS, or hardware.
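For anyone who wants to rerun the comparison outside IPython, here is a rough sketch using the standard timeit module. The names VARIANTS and benchmark are hypothetical helpers invented for this example, and the numbers it prints will differ from the ones quoted above.

```python
import random
import timeit

# Hypothetical table of the four pure-Python variants compared above.
VARIANTS = {
    "len of list comprehension": lambda xs: len([x for x in xs if x > 0]),
    "sum of list of ones": lambda xs: sum([1 for x in xs if x > 0]),
    "sum of generator of ones": lambda xs: sum(1 for x in xs if x > 0),
    "sum of booleans": lambda xs: sum(x > 0 for x in xs),
}


def benchmark(xs, repeat=3, number=10):
    """Return the best time per call, in seconds, for each variant."""
    results = {}
    for name, func in VARIANTS.items():
        times = timeit.repeat(lambda: func(xs), repeat=repeat, number=number)
        results[name] = min(times) / number
    return results


if __name__ == "__main__":
    random.seed(0)
    frequencies = [random.randint(0, 2) for _ in range(10**5)]
    for name, seconds in benchmark(frequencies).items():
        print(f"{name}: {seconds * 1e3:.2f} ms per call")
```

Taking the minimum over several repeats follows the usual timeit advice, since the slower runs mostly reflect interference from other processes.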

Of course, if you are doing math on a large list of numbers, you should probably be using NumPy:

>>> import numpy as np
>>> frequencies = np.random.randint(3, size=10**5)
>>> %timeit (frequencies > 0).sum()
1000 loops, best of 3: 669 us per loop

The NumPy array requires less memory than the equivalent Python list, and the calculation can be performed much faster than any pure Python solution.

Boyd answered 24/5, 2010 at 20:30 Comment(9)
A variation: [x > 0 for x in frequencies].count(True)Quitt
@Peter: note that your suggestion loops twice over the data: once to build the output list, and a second time to count True values.Mare
Relying on the boolean evaluation to be interpreted as 1 is a) arguably poor practice, and b) much slower.Selfservice
+1 for slightly more readable. However, I found it takes about 52% longer (the function I tested counted the number of factors in large numbers). So only use for comprehensions with few iterations ( < 10,000? ).Hobnail
@JoeGolton: Thanks for the comment. Indeed there are faster solutions, such as sth's, or by using NumPy.Boyd
I'm surprised that list comprehension is faster than generator expression - it didn't even occur to me to try a list comprehension. Why is it so much faster?Hobnail
@JoeGolton: There are so many factors here that have an impact on speed that it is hard to make any general statement about why one is faster than another. len being faster than sum is one such factor. My experience has been that with Python2 list comprehensions are often faster than generator expressions when you have enough memory.Boyd
@Joe Golton: But every version of Python may be different -- In Python3 Guido van Rossum writes that "there is no longer a speed difference between the two". Though for me using Python3.1, the timeit results above remain roughly unchanged. The only surefire way I know to decide what is faster is to benchmark on a case-by-case basis.Boyd
Thanks - it turns out that in my application the difference was minor, as the counts were low (as opposed to your example where the counts were high). So you're right - benchmarking case by case is the way to go.Hobnail
N
35

A slightly more Pythonic way would be to use a generator instead:

sum(1 for x in frequencies if x > 0)

This avoids generating the whole list before calling sum().
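A minimal sketch of the generator form wrapped in a function (count_positive is a hypothetical name used only here). It also shows the case where nothing matches, in which sum() simply returns 0.

```python
def count_positive(values):
    """Count values greater than zero without building an intermediate list."""
    return sum(1 for x in values if x > 0)


print(count_positive([3, 0, -2, 7]))       # 2
print(count_positive([0, -1]))             # 0, nothing matches
print(count_positive(iter(range(-2, 3))))  # 2, works on any iterable
```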

Nanceenancey answered 24/5, 2010 at 20:30 Comment(4)
+1 because this is a commonly overlooked way of doing a comprehension. If you're evaluating a list comprehension from within a function call, you can omit the [].Diastyle
Breaks if none of the elements match the criteria.Agosto
@FogleBird: the sum() of an empty generator returns 0.Nanceenancey
You're right. I got confused and was thinking of min() and max()Agosto
P
10

You could use len() on the filtered list:

len([x for x in frequencies if x > 0])
Pallmall answered 24/5, 2010 at 20:29 Comment(6)
even better, to use a generator (strip [ and ])Adest
You could use filter with this to make it look more clear. len(filter(lambda x: x > 0, frequencies))Prokofiev
@Jonathan: I'd say it's a matter of taste if you prefer filter() or a list comprehension, but usually list comprehensions are preferred to functional programming style. (And the OP asked for a list comprehension.)Pallmall
the OP actually only said (s)he is using a decent list comprehension right now, but didn't specifically ask for one. But your main point still holds, of course.Quitt
@JonathanSternberg: in Python 3, that syntax won't work (you can't do a len() on a filter object).Selfservice
@AdamParkin Not nearly as good, but you can just add "list(filter(...))" and len works again. Not nearly as good looking as a list comprehension though that would work in both languages (and wouldn't copy the list). But you're right, it won't work in Python 3.Prokofiev
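To make the Python 3 behaviour discussed in these comments concrete, here is a short sketch with made-up sample data. It shows len() failing on a filter object, followed by two ways to get the count anyway.

```python
frequencies = [3, 0, -2, 7, 1]  # made-up sample data

# In Python 3, filter() returns a lazy iterator, which has no length.
positives = filter(lambda x: x > 0, frequencies)
try:
    len(positives)
except TypeError as exc:
    print("len() failed:", exc)

# Materialise the iterator first, or count its items one by one.
count_via_list = len(list(filter(lambda x: x > 0, frequencies)))
count_via_sum = sum(1 for _ in filter(lambda x: x > 0, frequencies))
print(count_via_list, count_via_sum)  # 3 3
```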
P
4

This works, but adding bools as ints may be dangerous. Please take this code with a grain of salt (maintainability goes first):

sum(k>0 for k in x)
Pinelli answered 24/5, 2010 at 20:34 Comment(1)
Adding booleans as integers is guaranteed to work in Python 2 and 3: #2764517Pasahow
P
4

If the array only contains elements >= 0 (i.e. all elements are either 0 or a positive integer) then you could just count the zeros and subtract this number from the length of the array:

len(arr) - arr.count(0)
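A quick sketch of why the non-negative assumption matters. The helper name and the sample lists are made up for illustration.

```python
def count_positive_nonnegative(arr):
    """Count positives, assuming every element is >= 0 (hypothetical helper)."""
    return len(arr) - arr.count(0)


arr = [1, 0, 4, 0, 7]
print(count_positive_nonnegative(arr))            # 3

# With a negative value present, the shortcut over-counts.
with_negative = [1, 0, -4, 0, 7]
print(count_positive_nonnegative(with_negative))  # 3, but only 2 are positive
print(sum(1 for x in with_negative if x > 0))     # 2
```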
Piegari answered 21/8, 2011 at 9:27 Comment(0)
Q
2

How about this?

reduce(lambda x, y: x+1 if y > 0 else x, frequencies, 0)

EDIT: With inspiration from the accepted answer from @~unutbu:

reduce(lambda x, y: x + (y > 0), frequencies, 0)
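One detail worth checking with reduce is the starting value. When no initial value is passed, reduce uses the first element of the sequence as the accumulator, so that element is added as a number instead of being tested. The sketch below uses made-up data and imports reduce from functools, as Python 3 requires.

```python
from functools import reduce

frequencies = [5, 0, 2, 0, 1]  # made-up sample data

# Without an initial value, the first element (5) becomes the accumulator.
without_initial = reduce(lambda acc, y: acc + (y > 0), frequencies)

# With an initial value of 0, every element is tested against > 0.
with_initial = reduce(lambda acc, y: acc + (y > 0), frequencies, 0)

print(without_initial)  # 7
print(with_initial)     # 3
```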

Quitt answered 24/5, 2010 at 20:32 Comment(3)
I wish I had got a comment to go with that down vote to learn by my mistakes. Please?Quitt
There seems to be a trend away from lambda functions toward list comprehensions.Pinafore
I wasn't one to downvote you; however I would gather that people tend to frown upon reduce, it being phased out etc (by Guido proclamation). I like reduce, but I too frown upon its use in this case, since the sum(x > 0…) variant seems more straightforward to me.Mare
S
1

I would like to point out that all said applies to lists. If we have a numpy array, there are solutions that will be at least forty times faster...

Summing up all solutions given and testing for efficiency, plus adding some more (had to modify the reduce code to be able to run it in Python 3). Note that the last answer is in micros, not millis: [image: timing results]

code in copy-pastable format:

import random
import functools
frequencies = [random.randint(0,2) for i in range(10**5)]
from collections import Counter
import numpy as np

%timeit len([x for x in frequencies if x > 0])   # sth
%timeit sum([1 for x in frequencies if x > 0])
%timeit sum(1 for x in frequencies if x > 0)
%timeit sum(x > 0 for x in frequencies)
%timeit functools.reduce(lambda x, y: x + (y > 0), frequencies, 0)
%timeit Counter(frequencies)

# ------- Numpy -------
%timeit ((np.array(frequencies))>0).sum()
npf=np.array(frequencies)
# ------- Numpy without conversion -------
%timeit (npf>0).sum()
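The Counter line in the listing only builds the tally of distinct values. One possible way to get the positive count out of it is sketched below, with made-up sample data.

```python
from collections import Counter

frequencies = [0, 2, 1, 0, 2, 2]  # made-up sample data

# Counter maps each distinct value to the number of times it occurs.
counts = Counter(frequencies)

# Add up the occurrence counts of the keys that are greater than zero.
positive_count = sum(n for value, n in counts.items() if value > 0)

print(counts[0], positive_count)  # 2 4
```

This only pays off when the list holds few distinct values, since the final sum runs over distinct values, not over every element.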
Stenographer answered 1/2, 2021 at 12:47 Comment(0)
G
0

You can also use numpy.count_nonzero like this:

import numpy as np
xs = [1,0,4,0,7]
print(np.count_nonzero(xs)) #3
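Note that count_nonzero counts every nonzero entry, so negative values are included. A small sketch with made-up data shows the difference, and how comparing against 0 first restricts the count to positives.

```python
import numpy as np

xs = np.array([1, 0, -4, 0, 7])  # made-up data with one negative value

nonzero_count = np.count_nonzero(xs)       # counts -4 as well
positive_count = np.count_nonzero(xs > 0)  # counts True entries of the mask

print(nonzero_count)   # 3
print(positive_count)  # 2
```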
Gatling answered 10/12, 2022 at 13:37 Comment(0)
