why Data.Set has no powerset function?
Asked Answered
G

1

8

I was looking at Data.Set, and I found out that it has no powerset function. Why?

I can implement it like this:

import Data.Set (Set, empty, fromList, toList, insert)

powerset :: (Ord a) => Set a -> Set (Set a)
powerset s = fromList $ map (fromList) (powerList $ toList s)

powerList :: [a] -> [[a]]
powerList [] = [[]]
powerList (x:xs) = powerList xs ++ map (x:) (powerList xs)

But this doesn't seem the most efficient way of doing it. OK, I can also write

powerList :: [a] -> [[a]]
powerList = filterM (const [True, False])

but still, I wonder why Data.Set has no powerset function.

Also, what is the best way to write powerset :: (Ord a) => Set a -> Set (Set a)?

Glib answered 21/6, 2011 at 15:53 Comment(3)
It isn't a very commonly used function.Intimidate
I think the interesting thing here is if we can write something taking advantage of the special properties of Set to be more efficient than a naive solution. But I don't think such a thing is very easy. And since Set is spine-strict, and since powerSet generates 2^n elements regardless, even if you make things as efficient as possible, you're not getting a whole bunch of mileage out of those optimizations...Buckjump
@sclv: The Ord instance on Set is lexicographical order on the sorted list of elements. When constructing the power set, first remove the least element x and take the power set of the remainder, to get y. Every element of y will be greater than every element of map (insert x) y.Venettavenezia
C
12

Funny, I actually implemented powerset in Haskell the other day just for fun in a comment at /r/python.

import Data.Set
import Prelude hiding (map)

powerset s
    | s == empty = singleton empty
    | otherwise = map (insert x) pxs `union` pxs
        where (x, xs) = deleteFindMin s
              pxs = powerset xs

It is much as camccann described in his comment above. Set's fast union should give it a speed boost over the list version.

Cornelison answered 21/6, 2011 at 17:16 Comment(10)
Note that this is probably still less efficient than an algorithm with access to the internals of Set. The representation is a balanced binary tree, I believe, and the union here will always be two sets of equal size with all elements larger/smaller than all elements of the other, so it should be sufficient to merely sum the sizes and make a new root with the two sets as branches.Venettavenezia
@camccann this is true. I would be very interested to see such an implementation, and an analysis of its efficiency.Cornelison
Also, this algorithm may do unnecessary work deleting and inserting elements from the inner sets. Since only minimal elements are used, there are probably shortcuts available to avoid some rebalancing at intermediate steps. Anyway, I would be surprised if some analysis of this doesn't already exist, since powersets and balanced binary trees are not exactly obscure concepts.Venettavenezia
Out of curiousity, is this any more efficient than map Data.Set.fromAscList . filterM (const [False,True]) . Data.Set.toList? If you're going to enumerate 2^n subsets anyway, why not let the lists do the heavy work? You lose sharing (in a big way!) but that's the only negative I can think of.Vallo
I bet you can't use powerset if your original set has more than, say, 20 elements.Intimidate
I didn't mean your particular implementation, I meant in general.Intimidate
@augustss: Actually, the cute filterM version seems to work fine for me in GHCi with 20 elements. It started going downhill around 18 though, and I doubt it'd make it to 30.Venettavenezia
@camccann: I was in fact wondering why Data.Set does not provide an efficient implementation of powerset exploiting its internals.Glib
@MarcoS: That sort of question is hard to answer without asking the people who implemented it, but probably because of the point that augustss is indirectly making: The size of the power set grows exponentially, which leaves a very narrow range of inputs expensive enough to care about efficiency, but cheap enough to compute at all.Venettavenezia
The corollary to this is that, if someone really needs to work with the power set of a larger input, they'll probably want to roll their own algorithms anyway instead of using Data.Set, in order to exploit any additional regularity of the problem domain or such.Venettavenezia

© 2022 - 2024 — McMap. All rights reserved.