I'm having trouble getting Spark to load, read, and query a Parquet file. The infrastructure (Spark standalone 3.0) appears to be set up correctly: the cluster is visible and picks up jobs.
The issue I am having is when this line is called
Dataset<Row> parquetFileDF = sparkSession.read().parquet(parquePath);
the following error is thrown
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.0 requires Jackson Databind version >= 2.10.0 and < 2.11.0
at com.fasterxml.jackson.module.scala.JacksonModule.setupModule(JacksonModule.scala:61)
I looked into JacksonModule.setupModule, and when it reaches context.getMapperVersion the version being passed is 2.9.10. It appears to me that DefaultScalaModule is picking up an older version of Jackson Databind.
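To double-check where that 2.9.10 comes from, a small probe like the one below could be run inside the same JVM that throws the error. This is only a sketch: ClasspathProbe is a hypothetical helper I made up for illustration (it is not part of Spark or Jackson), and it uses reflection so that it compiles even without Jackson on the compile classpath. It assumes the Jackson PackageVersion class exposes a public static VERSION field.

```java
import java.lang.reflect.Field;
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Spark or Jackson).
// Reports which jar supplied a class and which Jackson version it carries.
final class ClasspathProbe {

    // Returns the jar or directory the class was loaded from,
    // or a short note when that cannot be determined.
    static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null
                    ? "(no code source, e.g. a JDK class)"
                    : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    // Reads the static VERSION field of a Jackson PackageVersion class
    // reflectively; returns "(unknown)" if the class or field is missing.
    static String jacksonVersion(String packageVersionClass) {
        try {
            Field f = Class.forName(packageVersionClass).getField("VERSION");
            return String.valueOf(f.get(null));
        } catch (ReflectiveOperationException | LinkageError e) {
            return "(unknown)";
        }
    }

    public static void main(String[] args) {
        System.out.println("databind jar:     "
                + locate("com.fasterxml.jackson.databind.ObjectMapper"));
        System.out.println("databind version: "
                + jacksonVersion("com.fasterxml.jackson.databind.cfg.PackageVersion"));
        System.out.println("scala module jar: "
                + locate("com.fasterxml.jackson.module.scala.DefaultScalaModule"));
    }
}
```

If the databind jar printed here is not the 2.10.0 artifact declared in the build, the older version is coming from somewhere other than the declared dependency, for example a transitive dependency or the runtime classpath.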
I'm using Gradle to build, with the dependencies declared as follows
implementation 'com.fasterxml.jackson.core:jackson-core:2.10.0'
implementation 'com.fasterxml.jackson.core:jackson-databind:2.10.0'
implementation 'org.apache.spark:spark-core_2.12:3.0.0'
implementation 'org.apache.spark:spark-sql_2.12:3.0.0'
implementation 'org.apache.spark:spark-launcher_2.12:3.0.0'
implementation 'org.apache.spark:spark-catalyst_2.12:3.0.0'
implementation 'org.apache.spark:spark-streaming_2.12:3.0.0'
That didn't work, so I tried forcing databind
implementation('com.fasterxml.jackson.core:jackson-databind') {
    version {
        strictly '2.10.0'
    }
}
I've tried a few different versions and still keep hitting this issue. Maybe I'm missing something super simple, but right now, I can't seem to get past this error.
Any help would be appreciated.
gradle dependencies (or ./gradlew dependencies if you're using the wrapper.) – Sandblind