Summarize a list of Haskell records
Asked Answered
G

1

8

Let's say I have a list of records, and I want to summarize it by taking the median. More concretely, say I have

data Location = Location { x :: Double, y :: Double }

I have a list of measurements, and I want to summarize it into a median Location, so something like:

Location (median (map x measurements)) (median (map y measurements))

That is fine, but what if I have something more nested, such as:

data CampusLocation = CampusLocation { firstBuilding :: Location
                                      ,secondBuilding :: Location }

I have a list of CampusLocations and I want a summary CampusLocation, where the median is applied recursively to all fields.

What is the cleanest way to do this in Haskell? Lenses? Uniplate?

Edit: Bonus:

What if instead of a record containing fields we want to summarize, we had an implicit list instead? For example:

data ComplexCampus = ComplexCampus { buildings :: [Location] }

How can we summarize a [ComplexCampus] into a ComplexCampus, assuming that each of the buildings is the same length?

Gigantean answered 28/8, 2014 at 3:49 Comment(4)
I am suddenly imagining this kind of thing would fit a "dual" of lens traversals: "applicals" with type forall f. Traversable f => (f a -> b) -> (f s -> t). No idea if anyone thought of those yet.Balcke
@ØrjanJohansen I'm not sure whether this is relevant here, but there is a "cotraverse" in distributive.Pomatum
@AndrásKovács That does look relevant and I should have remembered that. By Kmett's usual naming scheme what I suggested (except with Functor, which he says is enough) would be a "cotraversal" then. To make the types in the question actually distributive, they would need to change Double into a type parameter.Balcke
From a theoretical point of view Lenses operate fairly well on general Profunctor transformers Profunctor p => p a b -> p s t where we recover the Van Laarhoven formulation by demanding that p is a "representable" Profunctor. So, you're asking for a "corepresentable" Profunctor in your lens.Bicapsular
N
5

Here is an implementation of summarize :: [ComplexCampus] -> ComplexCampus that uses Lenses w/ Uniplate (as you mentioned) to summarize a list of ComplexCampus a single ComplexCampus.

{-# Language TemplateHaskell,DeriveDataTypeable #-}
import Control.Lens
import Data.Data.Lens
import Data.Typeable
import Data.Data
import Data.List(transpose,genericLength)
data Location = Location { _x :: Double, _y :: Double } deriving(Show,Typeable,Data)


data CampusLocation =  CampusLocation { _firstBuilding :: Location, _firsecondBuilding :: Location }deriving(Show,Typeable,Data)
data ComplexCampus = ComplexCampus { _buildings :: [Location] } deriving(Show,Typeable,Data)


makeLenses ''Location
makeLenses ''CampusLocation
makeLenses ''ComplexCampus

l1 = Location 1 10
l2 = Location 2 20
l3 = Location 3 30


c1 = CampusLocation l1 l2
c2 = CampusLocation l2 l3
c3 = CampusLocation l1 l3
campusLocs = [c1,c2,c3]


c1' = ComplexCampus [l1, l2]
c2' = ComplexCampus [l2, l3]
c3' = ComplexCampus [l1, l3]
campusLocs' = [c1',c2',c3']


average l = (sum l) / (genericLength l)

-- returns average location for a list of locations
averageLoc locs = Location {
             _x = average $ locs ^.. biplate . x,
             _y = average $ locs ^.. biplate . y
             }


summarize :: [ComplexCampus] -> ComplexCampus
summarize ccs = ComplexCampus $ ccs ^.. biplate . buildings ^.. folding transpose . to averageLoc

Using biplate here is likely overkill, but regardless in averageLoc we use biplate on the list of locations to get all x fields and all y fields. If you wanted to summarize a ComplexCampus into a single Location we could use biplate to extract all x values and all y values from the top level ComplexBuilding.

For example:

campusLocs' ^.. biplate . x gives us all x values andcampusLocs' ^.. biplate . y gives us all y values

Likewise, to get all locations, we could just do:

(campusLocs' ^.. biplate) ::[Location]

Or, if we wanted every Double: (campusLocs' ^.. biplate) ::[Double]

Nur answered 28/8, 2014 at 6:18 Comment(9)
Why not just summarize = ComplexCampus . map averageLoc . transpose . map _buildings? I don't see essential use for lens/biplate here.Pomatum
sum xs / genericLength xs traverses xs twice and takes O(n) space. You want to use a strict fold that accumulates the sum and counts as it goes so you can traverse once in constant space. For more info, check out haskellforall.com/2013/08/composable-streaming-folds.htmlBurnet
@ReinHenrichs Anyway, the original question asks for median, not average, which I think basically rules out making it constant space.Balcke
@ØrjanJohansen Good point, but someone is going to teach someone else an average function, I'd prefer they didn't use sum xs / genericLength xs.Burnet
foldl now supports lenses/traversals, so you can now write: let average = liftA2 (/) sum genericLength in Location <$> pretraverse x average <*> pretraverse y average :: Fold Location Location to traverse the list just once, strictly, and in constant spacePeptidase
@GabrielGonzalez I was confused there until I googled, so I should point out you are talking about your foldl package based on the ideas in the blog post ReinHenrichs linked.Balcke
@GabrielGonzalez +1, foldl should always come with a link that it's a hackage library :)Dismay
@GabrielGonzalez BTW I think your pretraverse is too strictly typed. Without a type signature, it should be able to work with (unfortunate name clash there) Control.Lens.Fold.Fold.Balcke
@ØrjanJohansen Yeah, I think you're right. I will try to just go with the more general type, then.Peptidase

© 2022 - 2024 — McMap. All rights reserved.