Breaking Data.Set integrity without GeneralizedNewtypeDeriving
Asked Answered
R

1

18

The code below uses an unsafe GeneralizedNewtypeDeriving extension to break Data.Set by inserting different elements with different Ord instances:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Data.Set
import System.Random

class AlaInt i where
  fromIntSet :: Set Integer -> Set i
  toIntSet :: Set i -> Set Integer
instance AlaInt Integer where
  fromIntSet = id
  toIntSet = id
newtype I = I Integer deriving (Eq, Show, AlaInt)
instance Ord I where compare (I n1) (I n2) = compare n2 n1 -- sic!  

insert' :: Integer -> Set Integer -> Set Integer
insert' n s = toIntSet $ insert (I n) $ fromIntSet s

randomInput = take 5000 $ zip (randomRs (0,9) gen) (randoms gen) where
    gen = mkStdGen 911

createSet = Prelude.foldr f empty where
    f (e,True) = insert e
    f (e,False) = insert' e

main = print $ toAscList $ createSet randomInput

The code prints [1,3,5,7,8,6,9,6,4,2,0,9]. Note that the list is unordered and has 9 twice.

Is it possible to perform this dictionary swapping attack using other extensions, e.g. ConstraintKinds? If yes, can Data.Set be redesigned to be resilient to such attacks?

Rill answered 4/10, 2012 at 20:23 Comment(1)
possible duplicate of Is there a list of GHC extensions that are considered 'safe'?Impolite
B
21

I think that's an important question, so I'll repeat my answer from elsewhere: you can have multiple instances of the same class for the same type in Haskell98 without any extensions at all:

$ cat A.hs
module A where
data U = X | Y deriving (Eq, Show)

$ cat B.hs
module B where
import Data.Set
import A
instance Ord U where
    compare X X = EQ
    compare X Y = LT
    compare Y X = GT
    compare Y Y = EQ
ins :: U -> Set U -> Set U
ins = insert

$ cat C.hs
module C where
import Data.Set
import A
instance Ord U where
    compare X X = EQ
    compare X Y = GT
    compare Y X = LT
    compare Y Y = EQ
ins' :: U -> Set U -> Set U
ins' = insert

$ cat D.hs
module D where
import Data.Set
import A
import B
import C
test = ins' X $ ins X $ ins Y $ empty

$ ghci D.hs
Prelude D> test
fromList [X,Y,X]

And yes, you can prevent this kind of attacks by storing the dictionary internally:

data MSet a where MSet :: Ord a => Set a -> MSet a
Bushwhack answered 5/10, 2012 at 10:44 Comment(5)
test = (ins X . ins' X . ins' Y) empty reproduces the problemRill
I would think that storing the dictionary internally prevents you from implementing efficient union and intersection operations, though.Uriel
Could be. Using just one of the dictionaries supplied is as bad as using an external one.Bushwhack
A slight correction. These modules do not form a valid Haskell 98 program since they define conflicting instances for Ord U. However, GHC accepts them anyways since GHC does not do a global instance uniqueness check: it only does a check when it needs to solve a constraint, which happens in modules B and C, in each of which only one instance is visible. This is a known (and long-standing) deficiency of GHC. (What is not known is how many Haskell programs out there rely on this lack of a global uniqueness check...)Clarabelle
That's true. However, I don't think the problem of verifying global uniqueness can be solved in general — I mean, when you have some extensions enabled, such as OverlappingInstances for example. And GHC supports tons of extensions.Bushwhack

© 2022 - 2024 — McMap. All rights reserved.