What's the algorithm of 'set.intersection()' in python?
Asked Answered
A

1

10

First of all, my purpose is to randomly get only one element in both known sets. So my original method is firstly intersect two sets. And then randomly pick up a element from the intersected set. But this is foolish, because that I only need a elements but a intersected set.

So I need to find the algorithm of set.intersection().

I compare the cost time between the methods of 'set.intersection()' and 'for{for{}}'. Set.intersection() is more faster than other one(100 times). So using 'for{for{}}' to pick up a randomly elements is not a wise idea.

What's the algorithm behind set.intersection() in python?

Alaska answered 20/11, 2013 at 15:28 Comment(2)
The CPython one, the Jython, the IronPython one or the pypy one? :p... As long as a correct result is returned when set.intersection is called, then any implementation is free to do it how it feels. You're free to download/look at the source code for any of the implementations to see how they do it...Comeuppance
what's your real use model? the real question is 'what's the fastest way to get a random element from an intersection of two sets?' and that probably depends on whether your data is originally a set or not.Dole
C
16

The algorithm is as follows: the smaller set is looped over and every element is copied depending whether it's found in the bigger set. So, it's the C equivalent of

def intersect(a, b):
    if len(a) > len(b):
        a, b = b, a

    c = set()
    for x in a:
        if x in b:
            c.add(x)
    return c

(Or: return set(x for x in a if x in b).)

Commonage answered 20/11, 2013 at 15:37 Comment(4)
It's somewhat different where set.intersection is provided with non-set iterables though (and where there's more than one iterable)Comeuppance
@JonClements: in that case it only skips the swapping. The first argument is required to be a set.Commonage
Interesting. Is there any way to ensure that x is coming from a particular set, or is it always the larger one?Doukhobor
@M.JacksonWilkinson: it's always the smaller one, because when you iterate over the larger set, you're guaranteed to be processing some elements that are not in the smaller set.Commonage

© 2022 - 2024 — McMap. All rights reserved.