I use Scala implicit classes to extend objects I work with frequently. As an example, I have a method similar to this defined on Spark DataFrame:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

implicit class DataFrameExtensions(df: DataFrame) {
  // group by every column and count occurrences of each distinct row
  def deduplicate: DataFrame = df.groupBy(df.columns.map(col): _*).count
}
But implicit methods are not invoked if the class already defines a member with the same name. What happens if I later upgrade to a new version of Spark that defines a DataFrame#deduplicate method? Client code will silently switch to the new implementation, which might cause subtle errors (or obvious ones, which are less problematic).
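To make that resolution rule concrete, here is a minimal, Spark-free sketch of the behavior; the Thing class, the greet method, and the Extensions object are made-up names for illustration only:

class Thing {
  def greet: String = "native greet" // imagine this member appears in a later library version
}

object Extensions {
  implicit class ThingOps(t: Thing) {
    def greet: String = "extension greet"
  }
}

object Demo extends App {
  import Extensions._
  // prints "native greet": the real member wins and the implicit class is never consulted
  println((new Thing).greet)
}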
Using reflection, I can throw a runtime error if DataFrame already defines deduplicate before my implicit defines it. Theoretically, then, if my implicit method conflicts with an existing one, I can detect it and rename my implicit version. However, once I upgrade Spark, run the app, and detect the issue, it's too late to use the IDE to rename the old method, since any references to df.deduplicate now refer to the native Spark version. I would have to revert my Spark version, rename the method through the IDE, and then upgrade again. Not the end of the world, but not a great workflow.
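For reference, a minimal sketch of the kind of startup check described above, using plain Java reflection; ExtensionGuard and assertNotDefined are hypothetical names, not part of Spark:

import org.apache.spark.sql.DataFrame

object ExtensionGuard {
  // Fails fast if DataFrame already exposes a public method with this name,
  // i.e. if the implicit extension would be silently shadowed.
  def assertNotDefined(name: String): Unit =
    if (classOf[DataFrame].getMethods.exists(_.getName == name))
      throw new IllegalStateException(
        s"DataFrame already defines '$name'; rename the implicit extension")
}

// Call once at application startup, before any DataFrame code runs:
// ExtensionGuard.assertNotDefined("deduplicate")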
Is there a better way to deal with this scenario? How can I use the "pimp my library" pattern safely?
Comments:

def deduplicate(df: DataFrame): DataFrame = df.groupBy(df.columns.map(col): _*).count – Mcgowan

Give your extension method a distinctive name, e.g. deduplicate_ or any other convention. – Tetanize
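One way to act on the first suggestion is to wrap the logic in a plain function on an ordinary object instead of an implicit class, so a future DataFrame member can never shadow it; the DataFrameOps name here is just an example:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object DataFrameOps {
  // A plain function is always called explicitly, so it cannot be silently
  // replaced by a method Spark adds to DataFrame in a later version.
  def deduplicate(df: DataFrame): DataFrame =
    df.groupBy(df.columns.map(col): _*).count
}

// usage: DataFrameOps.deduplicate(someDf)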