How can I "pimp my library" with Scala in a future-proof way?

I use Scala implicit classes to extend objects I work with frequently. As an example, I have a method similar to this defined on Spark DataFrame:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

implicit class DataFrameExtensions(df: DataFrame) {
  // groupBy(...).count returns a DataFrame, so that is the correct result type
  def deduplicate: DataFrame =
    df.groupBy(df.columns.map(col): _*).count
}

But implicit conversions are not applied when the class already defines a method with the same name. What happens if I later upgrade to a new version of Spark that defines a DataFrame#deduplicate method? Client code will silently switch to the new implementation, which might cause subtle errors (or obvious ones, which are less problematic).
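
To illustrate that shadowing behavior, here is a minimal, Spark-free sketch (Box and BoxOps are made-up names):

class Box {
  def size = 1 // imagine a library upgrade introduced this member
}

object BoxSyntax {
  implicit class BoxOps(val b: Box) extends AnyVal {
    def size = 2 // silently ignored: a real member always wins
  }
}

object Demo extends App {
  import BoxSyntax._
  println(new Box().size) // prints 1; the extension method is never consulted
}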

Using reflection, I can throw a runtime error if DataFrame already defines deduplicate before my implicit defines it. Theoretically, then, if my implicit method conflicts with an existing one, I can detect it and rename my implicit version. However, once I upgrade Spark, run the app, and detect the issue, it's too late to use the IDE to rename the old method, since any references to df.deduplicate now refer to the native Spark version. I would have to revert my Spark version, rename the method through the IDE, and then upgrade again. Not the end of the world, but not a great workflow.

Is there a better way to deal with this scenario? How can I use the "pimp my library" pattern safely?

Dormitory answered 14/5, 2018 at 13:15 Comment(7)
I believe the safest (and simplest) thing to do would be to use a plain function instead of defining an implicit class. def deduplicate(df: DataFrame): DataFrame = df.groupBy(df.columns.map(col): _*).countMcgowan
Not ideal, but you could use names that are unlikely to be introduced, e.g. deduplicate_ or any other convention.Tetanize
This seems like a variation on the fragile base class problem, but implicit-based enrichment doesn't impose any explicit relationship on the classes involved. So the compiler doesn't see a problem.Packard
You should take a look at typeclassesGaskill
Somewhat meta and off-topic, but does anyone else find it strange that we have three different unrelated tags for Scala implicits?Semblable
@AndreyTyukin with Scala there are always too many ways to express yourself.Tumultuous
@Gaskill Could you illustrate how typeclasses would help here? Seems like they would suffer the same problem.Dormitory

You could add a test to the test suite of DataFrameExtensions that ensures certain code snippets do not compile. Maybe something like this:

"(???: DataFrame).deduplicate" shouldNot compile

If it compiles without your implicit conversion, then it means that the method deduplicate has been introduced by the Spark library. In this case, the test fails, and you know that you have to update your implicits.
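
For instance, a minimal ScalaTest sketch (the spec name is made up, and ScalaTest 3.x-style imports are assumed; the important detail is that DataFrameExtensions is deliberately not imported in this file, so the snippet only starts to compile once Spark itself provides deduplicate):

import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

// Note: do NOT import DataFrameExtensions in this file.
class NoNativeDeduplicateSpec extends AnyFlatSpec with Matchers {
  "DataFrame" should "not define deduplicate natively" in {
    "(???: org.apache.spark.sql.DataFrame).deduplicate" shouldNot compile
  }
}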

Semblable answered 21/5, 2018 at 18:4 Comment(5)
Clever. Maybe too much. ;-)Mcgowan
@Mcgowan It's a single line, less than 50 characters long. I don't see in what dimension it is "too much". You're going to test your embedded DSLs somehow, so...Semblable
My point is that for simple things like the example, an embedded DSL built on an external library is probably an abuse of the language's power. You've found a clever and clean way to spot issues easily, but questions like how to resolve those errors remain open. Is renaming a viable option? Can we safely break the API? My point is not against your solution, but rather in favor of avoiding the problem altogether. I was proposing a different perspective on the question itself. By the way, +1.Mcgowan
@Mcgowan As Joe Pallas remarked above, defining eDSLs using the pimp-my-library pattern introduces a version of the fragile base class problem. If you decide to pimp someone else's library, you are relying on a fragile base class that is under someone else's control, so you should at least monitor whether changes to this class break your code. The alternative would be to not use the pimp-my-library pattern, but that's not the question. I'm not sure what you mean by "embedded DSL ... external library ... abuse of power". It's ScalaTest, the de-facto standard testing framework for Scala.Semblable
As I mentioned, I was proposing a different perspective on the question.Mcgowan

If the extension method is enabled by an import, use -Xlint to show when the import is no longer used:

//class C                      // before the upgrade: no x member, so the extension applies
class C { def x = 17 }         // after the upgrade: the member shadows the extension

trait T {
  import Extras._              // -Xlint flags this import as unused once C#x exists
  def f = new C().x            // now resolves to the member, not the extension
}

object Extras {
  implicit class X(val c: C) {
    def x = 42
  }
}

Another view, where implicit evidence must be used; compile with -Xlint -Xfatal-warnings so that a conflict breaks the build:

//class C[A]                   // before the upgrade: no x member
class C[A] { def x = 17 }      // after the upgrade: shadows the extension

trait T {
  import Mine.ev               // unused, hence fatal, once C defines x itself
  val c = new C[Mine]
  def f = c.x
}

trait Mine
object Mine {
  implicit class X[A](val c: C[A]) {
    // @deprecated keeps the compiler from also warning that ev is unused in the body
    def x(implicit @deprecated("unused", "") ev: Mine) = 42
  }
  implicit val ev: Mine = null // marker evidence, consumed only by the extension method
}

object Test {
  def main(args: Array[String]): Unit = println {
    val t = new T {}
    t.f
  }
}
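
To make those warnings break the build rather than just print, the flags can be wired into the project settings; a sketch for sbt:

// build.sbt
scalacOptions ++= Seq(
  "-Xlint",           // includes unused-import warnings
  "-Xfatal-warnings"  // promotes warnings to compile errors
)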
Tumultuous answered 14/5, 2018 at 19:17 Comment(1)
This is cool. I'm a little leery of relying on particular compiler flags to alert me to the problem, since those might easily be changed by unwitting third parties. But seems like a good layer of defense-in-depth.Dormitory

The solution for doing it safely is to ask explicitly for an extended data frame. To minimize the impact, you can use an implicit to get a nice syntax for the conversion (like toJava/toScala, etc.):

implicit class DataFrameExtSyntax(df: DataFrame) {
  def toExtended: DataFrameExtensions = new DataFrameExtensions(df)
}

And then your invocation will look like this:

myDf.toExtended
  .deduplicate
  .someOtherExtensionMethod
  .andMore

That way you're future-proofing your extension methods without runtime checks, linting, or unit-test tricks. (You can even use myDf.ext if myDf.toExtended is too long. :))
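
Putting the pieces together, a sketch of the whole pattern (the deduplicate body and the ext alias are illustrative; the key point is that the wrapper is a plain class rather than an implicit one, so its methods are reachable only through the explicit conversion):

import org.apache.spark.sql.DataFrame

// Plain wrapper: its methods can never be picked up by implicit resolution,
// so a future DataFrame#deduplicate cannot silently replace ours.
class DataFrameExtensions(val df: DataFrame) {
  def deduplicate: DataFrame = df.dropDuplicates()
}

object DataFrameExtSyntax {
  implicit class DataFrameExtOps(private val df: DataFrame) extends AnyVal {
    def toExtended: DataFrameExtensions = new DataFrameExtensions(df)
    def ext: DataFrameExtensions = toExtended // shorter alias
  }
}

The only name that still has to stay conflict-free is the conversion method itself (toExtended/ext), which is the concern raised in the comments below.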

Chet answered 22/5, 2018 at 17:11 Comment(7)
But doesn't toExtended suffer from the same problem? With toJava this problem does not exist (because both the extension method and any conflicting one would have to come from the same library), while toScala could suffer from it, but that's very unlikely because the name is effectively "namespaced". toExtended has no such protection, which makes the pattern seem problematic.Chancellorship
potentially, yes, you can pick even a weirder name... but yeah, for extra safety, I would add the "shouldNot compile" testChet
I would take exception to linting as a trick, but you goaded me to an alternative, where an implicit, if used, indicates that your API was definitively used. Your solution is the classic advice, of course, but @Chancellorship is sharp, pun intended.Tumultuous
I'm not wild about this solution, since it slightly reduces readability and introduces the mental overhead of remembering which of my dataframe methods are "extended" and which are native, but I think this is the most future-proof plan. Other answers offer great ways of detecting a conflict, but no way to fix it once detected besides reverting the Spark version and changing the name. Too bad there's no Scala method_missing... if there were, I could just get in the habit of always using the extended class, and delegate native methods back to the original dataframe.Dormitory
the Dynamic trait (scala-lang.org/api/2.12.4/scala/Dynamic.html) behaves like a "method missing"Chet
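
For reference, a minimal sketch of that trait in action (names are made up; this only shows the method_missing-like dispatch, it does not forward calls to a DataFrame):

import scala.language.dynamics

class MethodMissing extends Dynamic {
  // Invoked for any method name that is not statically defined on this class
  def applyDynamic(name: String)(args: Any*): String =
    s"called $name(${args.mkString(", ")})"
}

// new MethodMissing().foo(1, 2)  returns "called foo(1, 2)"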
@Dormitory By the way, a reminder: awarding the bounty is unrelated to accepting an answer. If you found this answer useful, then please take a second to mark the question as solved.Semblable
Another hiccup here. I don't think the answer is quite right as written. I can call df.ext.xm.nm, but not df.ext.xm.xm (xm = extension method, nm = native method), because the return value of df.ext.xm is a df, not a df.ext. If I wrap all return values in DataFrameExtensions and provide an implicit conversion DataFrameExtensions -> DataFrame, I get further, but not all of the way there (e.g., I can do df.ext.xm.xm.nm, but not df.ext.xm.nm.xm). I don't see a way to freely and transparently chain both native and extension method calls with this pattern.Dormitory
