Counting positive integer elements in a list with Python list comprehensions

Asked 24/5, 2010 at 20:26 Answered 10/12, 2022 at 13:37

Solved python list sum integer list-comprehension

I have a list of integers and I need to count how many of them are > 0.
I'm currently doing it with a list comprehension that looks like this:

sum([1 for x in frequencies if x > 0])

It seems like a decent comprehension but I don't really like the "1"; it seems like a bit of a magic number. Is there a more Pythonish way to do this?

Pinafore answered 24/5, 2010 at 20:26 Comment(2)

counting nonzero elements is not the same as counting elements > 0. The title should be modified accordingly – Anthracene 24/5, 2010 at 20:51

I updated the title of your question so that it reflects its contents. I hope this is fine with you. – Pasahow 26/5, 2010 at 7:28

100

If you want to reduce the amount of memory, you can avoid generating a temporary list by using a generator:

sum(x > 0 for x in frequencies)

This works because bool is a subclass of int:

>>> isinstance(True,int)
True

and True's value is 1:

>>> True==1
True
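
To make that concrete, here is a small sketch (the sample list is made up for illustration, it is not from the question) showing that each comparison yields a bool and that sum() adds those bools as 0 or 1:

```python
# Hypothetical sample data, chosen only for illustration.
frequencies = [3, 0, -2, 7, 0, 1]

# Each comparison produces a bool.
flags = [x > 0 for x in frequencies]

# sum() treats True as 1 and False as 0, so this counts the True values.
positive_count = sum(flags)

print(flags)
print(positive_count)
```

Negative values compare as False here, so they are not counted, which is what "elements > 0" asks for.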

However, as Joe Golton points out in the comments, this solution is not very fast. If you have enough memory to use an intermediate temporary list, then sth's solution may be faster. Here are some timings comparing various solutions:

>>> import random
>>> frequencies = [random.randint(0,2) for i in range(10**5)]

>>> %timeit len([x for x in frequencies if x > 0])   # sth
100 loops, best of 3: 3.93 ms per loop

>>> %timeit sum([1 for x in frequencies if x > 0])
100 loops, best of 3: 4.45 ms per loop

>>> %timeit sum(1 for x in frequencies if x > 0)
100 loops, best of 3: 6.17 ms per loop

>>> %timeit sum(x > 0 for x in frequencies)
100 loops, best of 3: 8.57 ms per loop

Beware that timeit results may vary depending on version of Python, OS, or hardware.

Of course, if you are doing math on a large list of numbers, you should probably be using NumPy:

>>> import numpy as np
>>> frequencies = np.random.randint(3, size=10**5)
>>> %timeit (frequencies > 0).sum()
1000 loops, best of 3: 669 us per loop

The NumPy array requires less memory than the equivalent Python list, and the calculation can be performed much faster than any pure Python solution.

Boyd answered 24/5, 2010 at 20:30 Comment(9)

A variation: [x > 0 for x in frequencies].count(True) – Quitt 24/5, 2010 at 20:38

@Peter: note that your suggestion loops twice over the data: once to build the output list, and a second time to count True values. – Mare 24/6, 2010 at 19:16

Relying on the boolean evaluation to be interpreted as 1 is a) arguably poor practice, and b) much slower. – Selfservice 18/7, 2012 at 23:7

+1 for slightly more readable. However, I found it takes about 52% longer (the function I tested counted the number of factors in large numbers). So only use for comprehensions with few iterations ( < 10,000? ). – Hobnail 9/7, 2013 at 15:8

@JoeGolton: Thanks for the comment. Indeed there are faster solutions, such as sth's, or by using NumPy. – Boyd 9/7, 2013 at 15:45

I'm surprised that list comprehension is faster than generator expression - it didn't even occur to me to try a list comprehension. Why is it so much faster? – Hobnail 9/7, 2013 at 18:35

@JoeGolton: There are so many factors here that have an impact on speed that it is hard to make any general statement about why one is faster than another. len being faster than sum is one such factor. My experience has been that with Python2 list comprehensions are often faster than generator expressions when you have enough memory. – Boyd 9/7, 2013 at 19:54

@Joe Golton: But every version of Python may be different -- In Python3 Guido van Rossum writes that "there is no longer a speed difference between the two". Though for me using Python3.1, the timeit results above remain roughly unchanged. The only surefire way I know to decide what is faster is to benchmark on a case-by-case basis. – Boyd 9/7, 2013 at 19:55

Thanks - it turns out that in my application the difference was minor, as the counts were low (as opposed to your example where the counts were high). So you're right - benchmarking case by case is the way to go. – Hobnail 9/7, 2013 at 20:57

A slightly more Pythonic way would be to use a generator instead:

sum(1 for x in frequencies if x > 0)

This avoids generating the whole list before calling sum().
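
As a rough illustration of the memory point, the sketch below compares the size of the temporary list with the size of the generator object. The data is hypothetical, and sys.getsizeof only measures the container object itself, so treat the numbers as indicative rather than a full memory profile:

```python
import sys

# Hypothetical data: every element is positive, so nothing is filtered out.
frequencies = [1] * 10**5

as_list = [1 for x in frequencies if x > 0]   # materializes 100,000 items
as_gen = (1 for x in frequencies if x > 0)    # produces items on demand

list_bytes = sys.getsizeof(as_list)  # grows with the number of items
gen_bytes = sys.getsizeof(as_gen)    # small, independent of input length

# Both forms give the same count once summed.
total_from_list = sum(as_list)
total_from_gen = sum(as_gen)

print(list_bytes, gen_bytes, total_from_list, total_from_gen)
```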

Nanceenancey answered 24/5, 2010 at 20:30 Comment(4)

+1 because this is a commonly overlooked way of doing a comprehension. If you're evaluating a list comprehension from within a function call, you can omit the []. – Diastyle 24/5, 2010 at 20:43

Breaks if none of the elements match the criteria. – Agosto 8/8, 2011 at 20:4

@FogleBird: the sum() of an empty generator returns 0. – Nanceenancey 8/8, 2011 at 20:20

You're right. I got confused and was thinking of min() and max() – Agosto 8/8, 2011 at 20:59
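
The difference discussed in the comments above can be sketched as follows, using made-up data with no positive values. sum() falls back to its start value of 0, while max() raises ValueError on an empty iterable unless a default is given (the default keyword requires Python 3.4 or later):

```python
# Hypothetical data with no positive values.
frequencies = [0, -1, 0]

# sum() over an empty generator returns its start value, 0.
count = sum(1 for x in frequencies if x > 0)

# max() has no implicit default and raises ValueError on an empty iterable.
try:
    largest = max(x for x in frequencies if x > 0)
except ValueError:
    largest = None

# Supplying default= avoids the exception.
largest_with_default = max((x for x in frequencies if x > 0), default=None)

print(count, largest, largest_with_default)
```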

You could use len() on the filtered list:

len([x for x in frequencies if x > 0])

Pallmall answered 24/5, 2010 at 20:29 Comment(6)

even better, to use a generator (strip [ and ]) – Adest 24/5, 2010 at 20:34

You could use filter with this to make it look more clear. len(filter(lambda x: x > 0, frequencies)) – Prokofiev 24/5, 2010 at 20:35

@Jonathan: I'd say it's a matter of taste if you prefer filter() or a list comprehension, but usually list comprehensions are preferred to functional programming style. (And the OP asked for a list comprehension.) – Pallmall 24/5, 2010 at 20:53

the OP actually only said (s)he is using a decent list comprehension right now, but didn't specifically ask for one. But your main point still holds, of course. – Quitt 24/5, 2010 at 21:2

@JonathanSternberg: in Python 3, that syntax won't work (you can't do a len() on a filter object). – Selfservice 18/7, 2012 at 23:1

@AdamParkin Not nearly as good, but you can just add "list(filter(...))" and len works again. Not nearly as good looking as a list comprehension though that would work in both languages (and wouldn't copy the list). But you're right, it won't work in Python 3. – Prokofiev 19/7, 2012 at 3:41
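
For Python 3, the filter() discussion above can be sketched like this (the sample list is hypothetical). filter() returns a lazy iterator there, so len() fails on it directly; you can either materialize it with list() or count the items without building a list:

```python
# Hypothetical sample data.
frequencies = [3, 0, -2, 7, 0, 1]

positives = filter(lambda x: x > 0, frequencies)

# In Python 3 a filter object has no length.
try:
    len(positives)
    has_len = True
except TypeError:
    has_len = False

# Option 1: materialize the filter object, then take its length.
count_via_list = len(list(filter(lambda x: x > 0, frequencies)))

# Option 2: count lazily, without building an intermediate list.
count_via_sum = sum(1 for _ in filter(lambda x: x > 0, frequencies))

print(has_len, count_via_list, count_via_sum)
```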

This works, but adding bools as ints may be dangerous. Please take this code with a grain of salt (maintainability goes first):

sum(k>0 for k in x)

Pinelli answered 24/5, 2010 at 20:34 Comment(1)

Adding booleans as integers is guaranteed to work in Python 2 and 3: #2764517 – Pasahow 26/5, 2010 at 7:30

If the array only contains elements >= 0 (i.e. all elements are either 0 or a positive integer) then you could just count the zeros and subtract this number from the length of the array:

len(arr) - arr.count(0)
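
The non-negative precondition matters. A minimal sketch, with hypothetical helper names and sample lists, shows the two counts agreeing on non-negative data and diverging as soon as a negative value appears:

```python
def count_nonzero(values):
    """Length minus the number of zeros: counts every nonzero element."""
    return len(values) - values.count(0)


def count_positive(values):
    """Counts only elements strictly greater than zero."""
    return sum(1 for v in values if v > 0)


arr = [1, 0, 4, 0, 7]              # non-negative only
with_negatives = [1, 0, -4, 0, 7]  # contains a negative value

print(count_nonzero(arr), count_positive(arr))
print(count_nonzero(with_negatives), count_positive(with_negatives))
```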

Piegari answered 21/8, 2011 at 9:27 Comment(0)

How about this?

reduce(lambda x, y: x+1 if y > 0 else x, frequencies, 0)

EDIT: With inspiration from the accepted answer from @~unutbu:

reduce(lambda x, y: x + (y > 0), frequencies, 0)
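
One caveat with reduce is worth spelling out: if no initial value is passed, the first element of the list is used as the starting accumulator instead of being tested. The sketch below uses a made-up list and, since reduce is no longer a builtin in Python 3, imports it from functools:

```python
from functools import reduce

# Hypothetical data with three positive values.
frequencies = [2, 0, 1, 2]

# Without an initial value the accumulator starts at frequencies[0] == 2,
# so the first element is added by value instead of being counted once.
without_init = reduce(lambda acc, y: acc + (y > 0), frequencies)

# With an initial value of 0 every element is tested and counted.
with_init = reduce(lambda acc, y: acc + (y > 0), frequencies, 0)

print(without_init, with_init)
```

Passing the initial value also makes the call safe on an empty list, where reduce without one raises TypeError.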

Quitt answered 24/5, 2010 at 20:32 Comment(3)

I wish I had got a comment to go with that down vote to learn by my mistakes. Please? – Quitt 24/5, 2010 at 20:39

There seems to be a trend away from lambda functions toward list comprehensions. – Pinafore 28/5, 2010 at 23:31

I wasn't one to downvote you; however I would gather that people tend to frown upon reduce, it being phased out etc (by Guido proclamation). I like reduce, but I too frown upon its use in this case, since the sum(x > 0…) variant seems more straightforward to me. – Mare 24/6, 2010 at 19:20

I would like to point out that everything said above applies to lists. If we have a numpy array, there are solutions that will be at least forty times faster...

Below I time all the solutions given so far, plus a few more (I had to modify the reduce code to be able to run it in Python 3). Note that the last result is in microseconds, not milliseconds:

Code in copy-pastable format:

import functools
import random
from collections import Counter

import numpy as np

frequencies = [random.randint(0, 2) for i in range(10**5)]

%timeit len([x for x in frequencies if x > 0])   # sth
%timeit sum([1 for x in frequencies if x > 0])
%timeit sum(1 for x in frequencies if x > 0)
%timeit sum(x > 0 for x in frequencies)
# Initial value 0 so the first element is counted rather than used as the accumulator
%timeit functools.reduce(lambda x, y: x + (y > 0), frequencies, 0)
%timeit Counter(frequencies)

# ------- NumPy, including list-to-array conversion -------
%timeit (np.array(frequencies) > 0).sum()

# ------- NumPy without conversion -------
npf = np.array(frequencies)
%timeit (npf > 0).sum()
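
The %timeit lines above only work inside IPython or Jupyter. As a rough equivalent for a plain Python script, here is a sketch using the standard timeit module. The list size, seed, and repeat counts are arbitrary choices for illustration, and the absolute timings will differ between machines and Python versions:

```python
import random
import timeit

random.seed(0)  # arbitrary fixed seed so that runs are repeatable
frequencies = [random.randint(0, 2) for _ in range(10**4)]

# The pure-Python candidates from the answers above, wrapped as callables.
candidates = {
    "len(list comp)": lambda: len([x for x in frequencies if x > 0]),
    "sum(list of 1s)": lambda: sum([1 for x in frequencies if x > 0]),
    "sum(gen of 1s)": lambda: sum(1 for x in frequencies if x > 0),
    "sum(gen of bools)": lambda: sum(x > 0 for x in frequencies),
}

# Check that every candidate returns the same count before timing it.
results = {name: fn() for name, fn in candidates.items()}

# Best of 3 runs, each run calling the candidate 20 times.
timings = {
    name: min(timeit.repeat(fn, number=20, repeat=3))
    for name, fn in candidates.items()
}

for name in candidates:
    print(f"{name:20s} count={results[name]}  best of 3: {timings[name]:.4f} s per 20 calls")
```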

Stenographer answered 1/2, 2021 at 12:47 Comment(0)

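You can also use numpy.count_nonzero. Note that it counts every nonzero element, so it matches the number of positive elements only if the data contains no negative values:

import numpy as np
xs = [1, 0, 4, 0, 7]
print(np.count_nonzero(xs))  # 3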

Gatling answered 10/12, 2022 at 13:37 Comment(0)
