Scala module requiring a specific version of jackson-databind for Spark

I am having issues trying to get Spark to load, read, and query a Parquet file. The infrastructure seems to be set up correctly (Spark standalone 3.0): the cluster is visible and picks up jobs.

The issue I am having is when this line is called

    Dataset<Row> parquetFileDF = sparkSession.read().parquet(parquePath);

the following error is thrown

    Caused by: com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.0 requires Jackson Databind version >= 2.10.0 and < 2.11.0
        at com.fasterxml.jackson.module.scala.JacksonModule.setupModule(JacksonModule.scala:61)

I looked into JacksonModule.setupModule, and when it reaches context.getMapperVersion, the version being passed in is 2.9.10. It appears that DefaultScalaModule is pulling in some older version of jackson-databind.
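As a side note, printing Jackson's generated PackageVersion constants confirms which versions are actually resolved on the classpath at runtime (a minimal sketch, separate from the Spark code above):

    import com.fasterxml.jackson.core.json.PackageVersion;

    public class JacksonVersionCheck {
        public static void main(String[] args) {
            // Version of jackson-core resolved on the classpath
            System.out.println("jackson-core:     " + PackageVersion.VERSION);
            // Version of jackson-databind resolved on the classpath (fully qualified
            // to avoid clashing with the jackson-core PackageVersion import)
            System.out.println("jackson-databind: "
                    + com.fasterxml.jackson.databind.cfg.PackageVersion.VERSION);
        }
    }

If the older artifact really is winning, this prints 2.9.10 for jackson-databind despite the 2.10.0 declaration below.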

I'm using Gradle to build, with the dependencies set up as follows:

    implementation 'com.fasterxml.jackson.core:jackson-core:2.10.0'
    implementation 'com.fasterxml.jackson.core:jackson-databind:2.10.0'
    implementation 'org.apache.spark:spark-core_2.12:3.0.0'
    implementation 'org.apache.spark:spark-sql_2.12:3.0.0'
    implementation 'org.apache.spark:spark-launcher_2.12:3.0.0'
    implementation 'org.apache.spark:spark-catalyst_2.12:3.0.0'
    implementation 'org.apache.spark:spark-streaming_2.12:3.0.0'

That didn't work, so I tried forcing jackson-databind:

    implementation ('com.fasterxml.jackson.core:jackson-databind') {
        version {
            strictly '2.10.0'
        }
    }
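Another variant I've seen suggested (sketched here, not verified against this build) is to pin the whole Jackson family, including jackson-module-scala, through a resolution strategy:

    configurations.all {
        resolutionStrategy {
            // Force every Jackson artifact Spark pulls in onto the same release line
            force 'com.fasterxml.jackson.core:jackson-core:2.10.0',
                  'com.fasterxml.jackson.core:jackson-databind:2.10.0',
                  'com.fasterxml.jackson.module:jackson-module-scala_2.12:2.10.0'
        }
    }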

I've tried a few different versions and still keep hitting this issue. Maybe I'm missing something super simple, but right now, I can't seem to get past this error.

Any help would be appreciated.

Lurdan answered 27/10, 2020 at 22:00
Comment (Sandblind): You can probably get more help if you add the output of gradle dependencies (or ./gradlew dependencies if you're using the wrapper).

I was able to figure out the issue. I was pulling in a jar file from another project. The functionality in that jar wasn't being used at all, so it wasn't a suspect. Unfortunately, that project hadn't been updated, and some older Spark libraries it bundled were somehow being picked up by my running app. Once I removed that jar, the error went away. What's interesting is that the dependency graph didn't show anything about the libraries the other jar file was using.

I suppose if you run into a similar issue, double-check any jar files being imported.
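A quick way to see where a conflicting class is actually being loaded from is to print its code source (a minimal sketch; works for any class on the classpath):

    public class WhichJar {
        public static void main(String[] args) {
            // Prints the jar (or class directory) that supplied ObjectMapper at runtime
            System.out.println(com.fasterxml.jackson.databind.ObjectMapper.class
                    .getProtectionDomain().getCodeSource().getLocation());
        }
    }

If the location points at the imported jar rather than the declared jackson-databind artifact, that jar is the one shadowing the expected version.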

Lurdan answered 29/10, 2020 at 16:41
