Scala: generic weighted average function
Asked Answered
U

2

8

I want to implement a generic weighted average function which relaxes the requirement on the values and the weights being of the same type. ie, I want to support sequences of say: (value:Float,weight:Int) and (value:Int,weight:Float) arguments and not just: (value:Int,weight:Int). [See my earlier question in the run up to this.]

This is what I currently have:

def weightedSum[A: Numeric](weightedValues: GenSeq[(A, A)]): (A, A)

def weightedAverage[A: Numeric](weightedValues: GenSeq[(A, A)]): A = {
    val (weightSum, weightedValueSum) = weightedSum(weightedValues)
    implicitly[Numeric[A]] match {
        case num: Fractional[A] => ...
        case num: Integral[A] => ...
        case _ => sys.error("Undivisable numeric!")
    }
}

This works perfectly if I feed it for example:

val values:Seq[(Float,Float)] = List((1,2f),(1,3f))
val avg= weightedAverage(values)

However if I don't "upcast" the weights from Int to Float:

val values= List((1,2f),(1,3f)) //scalac sees it as Seq[(Int,Float)] 
val avg= weightedAverage(values)

Scala compiler will tell me:

error: could not find implicit value for evidence parameter of type Numeric[AnyVal]
val avg= weightedAverage(values)

Is there a way of getting round this?

I had an attempt at writing a NumericCombine class that I parameterized with A and B which "combines" the types into a "common" type AB (for example, combining Float and Int gives you Float) :

abstract class NumericCombine[A: Numeric, B: Numeric] {
    type AB <: AnyVal

    def fromA(x: A): AB
    def fromB(y: B): AB
    val num: Numeric[AB]

    def plus(x: A, y: B): AB = num.plus(fromA(x), fromB(y))
    def minus(x: A, y: B): AB = num.minus(fromA(x), fromB(y))
    def times(x: A, y: B): AB = num.times(fromA(x), fromB(y))
}

and I managed to write simple times and plus functions based on this with the typeclass pattern, but since NumericCombine introduces a path-dependent type AB, "composing" the types is proving to be more difficult than I expected. look at this question for more information and see here for the full implementation of NumericCombine.

Update

A somewhat satisfactory solution has been obtained as an answer to another question (full working demo here) however there is still room for some design improvement taking into account the points raised in the discussion with @ziggystar.

Underweight answered 13/4, 2017 at 1:52 Comment(5)
Yeah, what you're bumping up against is similar to something I asked about a while back. I couldn't find a satisfactory answer.Tier
#41093583Question
@Tier Thanks. yeah I had come across that question. I don't understand why the added implicits don't do anything though.Underweight
@Question Thanks for the link. yeah I understand how the implicit evidence parameter works. I am trying to get scalac to not opt for the lowest denimonator of Int and Float = AnyVal as clearly there is no Numeric[AnyVal].Underweight
@Tier any way this would work: #43393097Underweight
H
4

Linear combinations

I think the more general task that involves weighing/scaling some elements of type T by a scalar of type S is that of a linear combination. Here are the constraints on the weights for some tasks:

So the most general case according to this classification is the linear combination. According to Wikipedia, it requires the weights, S, to be a field, and T to form a vector space over S.

Edit: The truly most general requirement you can have on the types is that T forms a module (wiki) over the ring S, or T being an S-module.

Spire

You could set up these requirements with typeclasses. There's also spire, which already has typeclasses for Field and VectorSpace. I have never used it myself, so you have to check it out yourself.

Float/Int won't work

What is also apparent from this discussion, and what you have already observed, is the fact that having Float as a weight, and Int as the element type will not work out, as the whole numbers do not form a vector space over the reals. You'd have to promote Int to Float first.

Promotion via typeclass

There are only two major candidates for the scalar type, i.e., Float and Double. And mainly only Int is a candidate for promoting, so you could do the following as a simple and not so general solution:

case class Promotable[R,T](promote: R => T)

object Promotable {
  implicit val intToFloat = Promotable[Int,Float](_.toFloat)
  implicit val floatToDouble = Promotable[Float,Double](_.toDouble)
  implicit val intToDouble = Promotable[Int,Double](_.toDouble)

  implicit def identityInst[A] = Promotable[A,A](identity)
}As a "small" solution you could write a typeclass 

def weightedAverage[S,VS](values: Seq[(S,VS)])(implicit p: Promotable[VS,S]) = ???
Hackler answered 19/4, 2017 at 9:42 Comment(8)
Thanks! The idea of converting both weights and values on the spot was initially considered but thought might be inefficient and not exactly what we are doing mathematically either. This: gist.github.com/ShahOdin/9b3a5326c7ee355c1cbe76e8bfb2a97c is what we arrived at after a discussion on a related question: #43393097 This is working , but you make a valid point about putting restriction on types. How would you best incorporate that into^Underweight
@ShS Your typeclass is more general than mine, and I think this generality is not required for the task. You can express that two types A and B result in a third type AB, while my proposal only requires that the type of the vector space is convertible to the scalar type.Hackler
well you bring in the concept of vector space, but you don't explore it to its full potential. So mathematically, a vector space consists of values set V equipped with a plus/minus and the weight field W equipped with the field's times/divide as well as scaling: scale(w:W,v:V):V so the whole concept of promotion/combination only works in 1D. ie, what's the best way of supporting values of type: (Int,Int) and weights of type say Float? This makes me wonder if doing the parameterization based on Numeric is a good idea as Numeric is not designed to support say tuples.Underweight
This is a manifestation of a bigger dilemma that I have. currently I have 1D,2D and 3D plus functions taking A, (A,B) and (A,B,C). This means that I will have to have 1D, 2D and 3D versions of the any statistical functions I might want to implement, which is not great. My understanding is that I could use higher kindered functions to avoid this. I am not sure what would be the best way of going on about it though. I haven't visited Spire yet and this might be what they do, but I would be interested in the minimal way of generalising operations (Numeric etc) over data-structuresUnderweight
Yes, my proposal works only for 1d. You didn't ask anything about tuples. If you want to abstract over arity, have a look at shapeless.Hackler
Thanks for the Shapeless suggestion. Forgetting about arity for now, I just realised your generalisation of average above with vector spaces, is a bit restrictive. A user might expect to have Boolean weights over say int values. Now Booleans form only a ring, not a field (there is no inverse for the boolean & ). a module is what you'd be looking for. I am however annoyed by this discovery as this means that you can't plus the weights without combining them with the other type.Underweight
also, your canonical combination requirement highlights a bug in my implementation: currently I have: average( List( (1,x),(-1,y) ) ) return 0 which is not what you might expect. In fact, it's not clear what the user would expect for this expression.Underweight
@ShS Booleans can form a binary field. And you can sum over weights alone, as they form a field, and the law (a + b)v = av + bv governs the behavior of this addition with respect to the vector space. If you really want this ultimately general solution, I think you should (1) look at the implementation in spire, and if this doesn't suit you (2) roll your own based on the axioms of fields and vector spaces. But this won't solve your problem of not being able to combine types that don't fit together well. I think the proper solution to the latter would be to let the caller promote types.Hackler
B
-1

First of all your template is wrong. (And sorry if the "template" expression is wrong - I am a novice in scala). Your functions expect tuples where both elements are of the same type ( [A: Numeric] ) , instead of tuples where elements are of different types ( [A: Numeric, B: Numeric] ) ( (Int, Float) vs (Float, Float) )

Anyway the below compiles and hopefully will work well after you fill it with your desired calculus.

import scala.collection._

def weightedSum[A: Numeric, B: Numeric](weightedValues: GenSeq[(A,B)]): (A,B) = {
  weightedValues.foldLeft((implicitly[Numeric[A]].zero, implicitly[Numeric[B]].zero)) { (z, t) =>
    (   implicitly[Numeric[A]].plus(z._1, t._1), 
        implicitly[Numeric[B]].plus(z._2, t._2)
    ) 
  } 
}

def weightedAverage[A: Numeric, B: Numeric](weightedValues: GenSeq[(A,B)]): A = {
  val (weightSum, weightedValueSum) = weightedSum(weightedValues)
  implicitly[Numeric[A]] match {
    case num: Fractional[A] => implicitly[Numeric[A]].zero
    case num: Integral[A] => implicitly[Numeric[A]].zero
    case _ => sys.error("Undivisable numeric!")
  }
}

val values1: Seq[(Float, Float)] = List((1, 2f), (1, 3f))
val values2: Seq[(Int, Float)] = List((1, 2f), (1, 3f))

val wa1 = weightedAverage(values1)
val wa2 = weightedAverage(values2)
Bibliopole answered 21/4, 2017 at 23:52 Comment(2)
The initial attempt isn't "wrong". If you had visited the link, you would have noticed that the reason it was working was because of the value conversion that happened in place. (the same reason that val x=2 val y:Double=x works) Also your weighted sum implementation is not correct ( google weighted sum to see what is) the difficulty is multiplying/summing arguments of "different" types which your implementation conveniently ignores.Underweight
1) The "initial attempt" was completely wrong, didn't either compile - as stated by ShS. 2) The question was about types. 3) The purpose of my answer was to point out erroneous handling of types, and not the implementation of 'weighted sum' (I said "... after you fill it with your desired calculus"). 4) seen that the code compiles with (Int, Double), summing is ok, just add a 'mult' or whatever else you need there (we are not that "at hand").Bibliopole

© 2022 - 2024 — McMap. All rights reserved.