Number of elements in Python Set
Asked Answered
N

5

7

I have a list of phone numbers that have been dialed (nums_dialed). I also have a set of phone numbers which are the number in a client's office (client_nums) How do I efficiently figure out how many times I've called a particular client (total)

For example:

>>>nums_dialed=[1,2,2,3,3]
>>>client_nums=set([2,3])
>>>???
total=4

Problem is that I have a large-ish dataset: len(client_nums) ~ 10^5; and len(nums_dialed) ~10^3.

Neoterism answered 27/3, 2010 at 7:23 Comment(0)
I
13

which client has 10^5 numbers in his office? Do you do work for an entire telephone company?

Anyway:

print sum(1 for num in nums_dialed if num in client_nums)

That will give you as fast as possible the number.


If you want to do it for multiple clients, using the same nums_dialed list, then you could cache the data on each number first:

nums_dialed_dict = collections.defaultdict(int)
for num in nums_dialed:
    nums_dialed_dict[num] += 1

Then just sum the ones on each client:

sum(nums_dialed_dict[num] for num in this_client_nums)

That would be a lot quicker than iterating over the entire list of numbers again for each client.

Invalidity answered 27/3, 2010 at 7:29 Comment(6)
or simply: len(filter(lambda x: x in client_nums, nums_dialed))Red
@ony: that's not simpler. And it is slower and uses more memory, because it has to build an entire new list before calculating the len. list comprehensions and generator expressions are preferable to using filter and map for most cases, specially when you need to create and call a new function to use them.Invalidity
@nosklo: sorry, my mistake. yes "filter" and "map" in python2 uses lists, but that will be very strange if "len" in python3 will force list building.Red
even shorter: print sum(num in client_nums for num in nums_dialed)Rightwards
@ony: yes. It would be very strange if len forced list building. Instead, it fails with TypeError on any generators, on both python 2 and 3. Including itertools.imap and itertools.ifilter on python 2, and map and filter on python 3. Those just aren't the right tool for this job.Invalidity
@newacct: Yes, this works, but in this case I don't feel shorter is better - using 1 makes it clear about what's on the sum, while relying on the fact that True is an integer seems hackish.Invalidity
J
2
>>> client_nums = set([2, 3])
>>> nums_dialed = [1, 2, 2, 3, 3]
>>> count = 0
>>> for num in nums_dialed:
...   if num in client_nums:
...     count += 1
... 
>>> count
4
>>> 

Should be quite efficient even for the large numbers you quote.

Jennings answered 27/3, 2010 at 7:24 Comment(0)
R
2

Using collections.Counter from Python 2.7:

dialed_count = collections.Counter(nums_dialed)
count = sum(dialed_count[t] for t in client_nums)
Rigsdaler answered 27/3, 2010 at 8:31 Comment(0)
R
0

Thats very popular way to do some combination of sorted lists in single pass:

nums_dialed = [1, 2, 2, 3, 3]
client_nums = [2,3]

nums_dialed.sort()
client_nums.sort()

c = 0
i = iter(nums_dialed)
j = iter(client_nums)
try:
    a = i.next()
    b = j.next()
    while True:
        if a < b:
            a = i.next()
            continue
        if a > b:
            b = j.next()
            continue
        # a == b
        c += 1
        a = i.next() # next dialed
except StopIteration:
    pass

print c

Because "set" is unordered collection (don't know why it uses hashes, but not binary tree or sorted list) and it's not fair to use it there. You can implement own "set" through "bisect" if you like lists or through something more complicated that will produce ordered iterator.

Red answered 27/3, 2010 at 8:4 Comment(0)
V
0

The method I use is to simply convert the set into a list and then use the len() function to count its values.

set_var = {"abc", "cba"}
print(len(list(set_var)))

Output:

2
Vitus answered 12/8, 2022 at 11:57 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.