How can I "pimp my library" with Scala in a future-proof way?

I use Scala implicit classes to extend objects I work with frequently. As an example, I have a method similar to this defined on Spark DataFrame:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

implicit class DataFrameExtensions(df: DataFrame) {
  // groupBy(...).count returns a DataFrame, so that is the correct result type
  def deduplicate: DataFrame =
    df.groupBy(df.columns.map(col): _*).count
}

But implicit conversions are not applied when the class already defines a method with the same name. What happens if I later upgrade to a new version of Spark that defines a DataFrame#deduplicate method? Client code will silently switch to the new implementation, which might cause subtle errors (or obvious ones, which are less problematic).
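
To illustrate that shadowing behavior, here is a minimal, Spark-free sketch (Box and BoxOps are made-up names):

class Box {
  def size = 1 // imagine a library upgrade introduced this member
}

object BoxSyntax {
  implicit class BoxOps(val b: Box) extends AnyVal {
    def size = 2 // silently ignored: a real member always wins
  }
}

object Demo extends App {
  import BoxSyntax._
  println(new Box().size) // prints 1; the extension method is never consulted
}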

Using reflection, I can throw a runtime error if DataFrame already defines deduplicate before my implicit defines it. Theoretically, then, if my implicit method conflicts with an existing one, I can detect it and rename my implicit version. However, once I upgrade Spark, run the app, and detect the issue, it's too late to use the IDE to rename the old method, since any references to df.deduplicate now refer to the native Spark version. I would have to revert my Spark version, rename the method through the IDE, and then upgrade again. Not the end of the world, but not a great workflow.

Is there a better way to deal with this scenario? How can I use the "pimp my library" pattern safely?

Dormitory answered 14/5, 2018 at 13:15 Comment(7)
I believe the safest (and simplest) thing to do would be to use a plain function instead of defining an implicit class. def deduplicate(df: DataFrame): DataFrame = df.groupBy(df.columns.map(col): _*).countMcgowan
Not ideal, but you could use names that are unlikely to be introduced, e.g. deduplicate_ or any other convention.Tetanize
This seems like a variation on the fragile base class problem, but implicit-based enrichment doesn't impose any explicit relationship on the classes involved. So the compiler doesn't see a problem.Packard
You should take a look at typeclassesGaskill
Somewhat meta and off-topic, but does anyone else find it strange that we have three different unrelated tags for Scala implicits?Semblable
@AndreyTyukin with Scala there are always too many ways to express yourself.Tumultuous
@Gaskill Could you illustrate how typeclasses would help here? Seems like they would suffer the same problem.Dormitory

You could add a test to the test suite of DataFrameExtensions that ensures certain code snippets do not compile. Maybe something like this:

"(???: DataFrame).deduplicate" shouldNot compile

If it compiles without your implicit conversion, then it means that the method deduplicate has been introduced by the Spark library. In this case, the test fails, and you know that you have to update your implicits.
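
For instance, a minimal ScalaTest sketch (the spec name is made up, and ScalaTest 3.x-style imports are assumed; the important detail is that DataFrameExtensions is deliberately not imported in this file, so the snippet only starts to compile once Spark itself provides deduplicate):

import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

// Note: do NOT import DataFrameExtensions in this file.
class NoNativeDeduplicateSpec extends AnyFlatSpec with Matchers {
  "DataFrame" should "not define deduplicate natively" in {
    "(???: org.apache.spark.sql.DataFrame).deduplicate" shouldNot compile
  }
}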

Semblable answered 21/5, 2018 at 18:4 Comment(5)
Clever. Maybe too much. ;-)Mcgowan
@Mcgowan It's a single line, less than 50 characters long. I don't see in what dimension it is "too much". You're going to test your embedded DSLs somehow, so...Semblable
My point is that for simple things like the example, an embedded DSL built on an external library is probably an abuse of the language's power. You've found a clever and clean way to spot issues easily, but questions like how to resolve those errors remain open. Is renaming a viable option? Can we safely break the API? My point is not against your solution, but rather in favor of avoiding the problem altogether. I was proposing a different perspective on the question itself. By the way, +1.Mcgowan
@Mcgowan As Joe Pallas remarked above, defining eDSLs using the pimp-my-library pattern introduces a version of the fragile base class problem. If you decide to pimp someone else's library, you are relying on a fragile base class that is under someone else's control, so you should at least monitor whether changes to this class break your code. The alternative would be to not use the pimp-my-library pattern, but that's not the question. I'm not sure what you mean by "embedded DSL ... external library ... abuse of power". It's ScalaTest, the de-facto standard testing framework for Scala.Semblable
As I mentioned, I was proposing a different perspective on the question.Mcgowan

If the extension method is enabled by an import, use -Xlint to show when the import is no longer used:

//class C                      // before the upgrade: no x member, so the extension applies
class C { def x = 17 }         // after the upgrade: the member shadows the extension

trait T {
  import Extras._              // -Xlint flags this import as unused once C#x exists
  def f = new C().x            // now resolves to the member, not the extension
}

object Extras {
  implicit class X(val c: C) {
    def x = 42
  }
}

Another view, where implicit evidence must be used; compile with -Xlint -Xfatal-warnings so that a conflict breaks the build:

//class C[A]                   // before the upgrade: no x member
class C[A] { def x = 17 }      // after the upgrade: shadows the extension

trait T {
  import Mine.ev               // unused, hence fatal, once C defines x itself
  val c = new C[Mine]
  def f = c.x
}

trait Mine
object Mine {
  implicit class X[A](val c: C[A]) {
    // @deprecated keeps the compiler from also warning that ev is unused in the body
    def x(implicit @deprecated("unused", "") ev: Mine) = 42
  }
  implicit val ev: Mine = null // marker evidence, consumed only by the extension method
}

object Test {
  def main(args: Array[String]): Unit = println {
    val t = new T {}
    t.f
  }
}
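
To make those warnings break the build rather than just print, the flags can be wired into the project settings; a sketch for sbt:

// build.sbt
scalacOptions ++= Seq(
  "-Xlint",           // includes unused-import warnings
  "-Xfatal-warnings"  // promotes warnings to compile errors
)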
Tumultuous answered 14/5, 2018 at 19:17 Comment(1)
This is cool. I'm a little leery of relying on particular compiler flags to alert me to the problem, since those might easily be changed by unwitting third parties. But seems like a good layer of defense-in-depth.Dormitory

The solution for doing it safely is to ask explicitly for an extended data frame. To minimize the impact, you can use an implicit to get a nice syntax for the conversion (like toJava/toScala, etc.):

implicit class DataFrameExtSyntax(df: DataFrame) {
  def toExtended: DataFrameExtensions = new DataFrameExtensions(df)
}

And then your invocation will look like this:

myDf.toExtended
  .deduplicate
  .someOtherExtensionMethod
  .andMore

That way you're future-proofing your extension methods without runtime checks, linting, or unit-test tricks. (You can even use myDf.ext if myDf.toExtended is too long. :))
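
Putting the pieces together, a sketch of the whole pattern (the deduplicate body and the ext alias are illustrative; the key point is that the wrapper is a plain class rather than an implicit one, so its methods are reachable only through the explicit conversion):

import org.apache.spark.sql.DataFrame

// Plain wrapper: its methods can never be picked up by implicit resolution,
// so a future DataFrame#deduplicate cannot silently replace ours.
class DataFrameExtensions(val df: DataFrame) {
  def deduplicate: DataFrame = df.dropDuplicates()
}

object DataFrameExtSyntax {
  implicit class DataFrameExtOps(private val df: DataFrame) extends AnyVal {
    def toExtended: DataFrameExtensions = new DataFrameExtensions(df)
    def ext: DataFrameExtensions = toExtended // shorter alias
  }
}

The only name that still has to stay conflict-free is the conversion method itself (toExtended/ext), which is the concern raised in the comments below.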

Chet answered 22/5, 2018 at 17:11 Comment(7)
But doesn't toExtended suffer from the same problem? With toJava this problem does not exist (because both the extension method and any conflicting one would have to come from the same library), while toScala could suffer from it, but that's very unlikely because the name is effectively "namespaced". toExtended has no such protection, which makes the pattern seem problematic.Chancellorship
potentially, yes, you can pick even a weirder name... but yeah, for extra safety, I would add the "shouldNot compile" testChet
I would take exception to linting as a trick, but you goaded me to an alternative, where an implicit, if used, indicates that your API was definitively used. Your solution is the classic advice, of course, but @Chancellorship is sharp, pun intended.Tumultuous
I'm not wild about this solution, since it slightly reduces readability and introduces the mental overhead of remembering which of my dataframe methods are "extended" and which are native, but I think this is the most future-proof plan. Other answers offer great ways of detecting a conflict, but no way to fix it once detected besides reverting the Spark version and changing the name. Too bad there's no Scala method_missing... if there were, I could just get in the habit of always using the extended class, and delegate native methods back to the original dataframe.Dormitory
the Dynamic trait (scala-lang.org/api/2.12.4/scala/Dynamic.html) behaves like a "method missing"Chet
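
For reference, a minimal sketch of that trait in action (names are made up; this only shows the method_missing-like dispatch, it does not forward calls to a DataFrame):

import scala.language.dynamics

class MethodMissing extends Dynamic {
  // Invoked for any method name that is not statically defined on this class
  def applyDynamic(name: String)(args: Any*): String =
    s"called $name(${args.mkString(", ")})"
}

// new MethodMissing().foo(1, 2)  returns "called foo(1, 2)"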
@Dormitory By the way, a reminder: awarding the bounty is unrelated to accepting an answer. If you found this answer useful, then please take a second to mark the question as solved.Semblable
Another hiccup here. I don't think the answer is quite right as written. I can call df.ext.xm.nm, but not df.ext.xm.xm (xm = extension method, nm = native method), because the return value of df.ext.xm is a df, not a df.ext. If I wrap all return values in DataFrameExtensions and provide an implicit conversion DataFrameExtensions -> DataFrame, I get further, but not all of the way there (e.g., I can do df.ext.xm.xm.nm, but not df.ext.xm.nm.xm). I don't see a way to freely and transparently chain both native and extension method calls with this pattern.Dormitory
