Python data structure for efficient add, remove, and random.choice
Asked Answered
F

2

34

I'm looking for a built-in Python data structure that can add a new element, remove an existing element, and choose a random element, all in better than O(n) time.

I was hoping that set could do this, but AFAIK, the only way to choose a random element from a Python set is random.choice(list(my_set)), which takes O(n) time.

I would greatly prefer a solution that's built into Python, since I require efficiency and easy deployment. Unfortunately, Python does not seem to have built-in tree data types.

Fanestil answered 13/4, 2013 at 22:6 Comment(1)
This is maybe a the problem of interface design. random selecting in tree/hashmap is not hard, but even C++ STL's map/unordered_map does not support random selecting.Mamie
G
33

Python does not have a built-in data structure which meets all 3 of your requirements.

That said, it's fairly trivial to implement a tree yourself.


Another option would be to combine a dictionary with a list to create what is effectively a set that also maintains a list of its items:

import random

class ListDict(object):
    def __init__(self):
        self.item_to_position = {}
        self.items = []

    def add_item(self, item):
        if item in self.item_to_position:
            return
        self.items.append(item)
        self.item_to_position[item] = len(self.items)-1

    def remove_item(self, item):
        position = self.item_to_position.pop(item)
        last_item = self.items.pop()
        if position != len(self.items):
            self.items[position] = last_item
            self.item_to_position[last_item] = position

    def choose_random_item(self):
        return random.choice(self.items)

Since the only operations done on the list are .pop(), .append(), and index retrieval and assignment, they shouldn't take more than constant time (in most Python implementations, at least).

You can extend the above definition with extra methods to support other useful operations, like len, in, and iteration:

class ListDict(object):
    ... # methods from above

    def __contains__(self, item):
        return item in self.item_to_position

    def __iter__(self):
        return iter(self.items)

    def __len__(self):
        return len(self.items)
Golem answered 13/4, 2013 at 22:13 Comment(6)
Implementing an efficient, self-balancing tree is not trivial.Fanestil
@Fanestil shrug I'd say that's subjective. Anyhow, you don't even need a tree for this - see edited answer.Golem
Trivial note: might as well use dict.pop to replace the first four lines of remove_item.Shishko
Is it possible to also get an O(1) set containment operation? Other then just maintaining a parallel set over all items.Shaitan
Oh, of course. You can just do item in self.item_to_position. Easy.Shaitan
@user2357112 I added an answer that is a modification/expansion of this answer.Roxi
R
0

This is a variation on @Amber and @user2357112 's answer. It's implemented as a subclass of list, and handles random.choice() directly because of that. It still quacks a lot like a set. It does a couple more common operations of both those types, but I'm sure it doesn't emulate either of them completely or perfectly.


class ListSet(list):
    """
    Derived from: https://stackoverflow.com/a/15993515
    
    Initialization: 
        Same as with list (basically same as with set):
        (including sets, lists, tuples, 
        generator expressions and range() expressions.)
        
    Setlike behaviors:
        for x in S():
        if x in S:  This is constant time.
        S.add(x)     "    "    "      "
        S.remove(x)  "    "    "      "
        len(s)       "    "    "      "
        print(S), str(S), repr(S)
        copy(S), list(S), set(S)  copy or cast.
        
    Listlike behaviors use internal list positions:
        S[3], S[-1]  # Returns an item.
        S[10:20]     # Returns a list, not ListSet.
        random.choice(S)  This is constant time.
        S += [4, 5, 6]
        S.append(7), same as S.add(7).
        len(s) in constant time.
        x = S.pop()  # Pop last element from internal list.
        y = S.pop(3) # Pop S[3] from internal list.
    """
    def __init__(self, *args, idx_of=None):
        super(ListSet, self).__init__(*args)
        if idx_of:
            self.idx_of = idx_of.copy()
        else:
            self.idx_of = {item: i
                          for (i, item) in enumerate(self)}

    def add(self, item):
        if item not in self.idx_of:
            super(ListSet, self).append(item)
            self.idx_of[item] = len(self) - 1
        
    def append(self, item):
        """
        append and += always append to the internal list,
        but remove and pop from other than the end 
        can change the internal list's order.
        """
        self.add(item)
        
    def copy(self):
        """ Return a shallow copy of the ListSet. """
        return ListSet(self, idx_of=self.idx_of)
    
    def list(self):
        return super(ListSet, self).copy()
        
    def __iadd__(self, items):
        """ self += items """
        for item in items:
            self.add(item)
        return self

    def remove(self, element):
        """
        Remove an element from a set; it must be a member.

        If the element is not a member, raise a KeyError.
        """
        try:
            position = self.idx_of.pop(element)
        except:
            raise(KeyError(element))
            
        last_item = super(ListSet, self).pop()
        if position != len(self):
            self[position] = last_item
            self.idx_of[last_item] = position
            
    def pop(self, i=-1):
        """ Remove by internal list position. """
        item = self[i]
        self.remove(item)
        return item
    
    def __contains__(self, item):
        return item in self.idx_of
    
    def _str_body(self):
        return ", ".join(repr(item) for item in self)
    
    def __repr__(self):
        return "ListSet([" + self._str_body() + "])"
    
    def __str__(self):
        if self:
            return "{" + self._str_body() + "}"
        else:
            return "ListSet()"

Roxi answered 3/11, 2023 at 6:26 Comment(1)
@user2357112 this is a variation on something you edited.Roxi

© 2022 - 2024 — McMap. All rights reserved.