TypeError: 'JavaPackage' object is not callable (spark._jvm)

I'm setting up GeoSpark Python, and after installing all the prerequisites I'm running the very basic code example to test it.

from pyspark.sql import SparkSession
from geo_pyspark.register import GeoSparkRegistrator


spark = SparkSession.builder.\
        getOrCreate()

GeoSparkRegistrator.registerAll(spark)

df = spark.sql("""SELECT st_GeomFromWKT('POINT(6.0 52.0)') as geom""")

df.show()

I tried running it with both python3 basic.py and spark-submit basic.py; both give me this error:

Traceback (most recent call last):
  File "/home/jessica/Downloads/geo_pyspark/basic.py", line 8, in <module>
    GeoSparkRegistrator.registerAll(spark)
  File "/home/jessica/Downloads/geo_pyspark/geo_pyspark/register/geo_registrator.py", line 22, in registerAll
    cls.register(spark)
  File "/home/jessica/Downloads/geo_pyspark/geo_pyspark/register/geo_registrator.py", line 27, in register
    spark._jvm. \
TypeError: 'JavaPackage' object is not callable

I'm using Java 8, Python 3, and Apache Spark 2.4 on Linux Mint 19. My JAVA_HOME is set correctly, and my SPARK_HOME is also set:

$ printenv SPARK_HOME
/home/jessica/spark/

How can I fix this?

Adiathermancy answered 29/10, 2019 at 13:17

The JARs for GeoSpark are not correctly registered with your Spark session. There are a few ways around this, ranging from a tad inconvenient to pretty seamless. For example, when you call spark-submit you can specify:

--jars jar1.jar,jar2.jar,jar3.jar

then the problem will go away. You can also pass the same flag to pyspark if that's your poison.

If, like me, you don't really want to be doing this every time you boot (and setting this as a .conf() in Jupyter will get tiresome), then you can instead go into $SPARK_HOME/conf/spark-defaults.conf and set:

spark.jars jar1.jar,jar2.jar,jar3.jar

These will then be loaded whenever you create a Spark instance. If you've not used the conf file before, it will only exist as spark-defaults.conf.template; copy it to spark-defaults.conf first.

Of course, when I say jar1.jar..., what I really mean is something along the lines of:

/jars/geo_wrapper_2.11-0.3.0.jar,/jars/geospark-1.2.0.jar,/jars/geospark-sql_2.3-1.2.0.jar,/jars/geospark-viz_2.3-1.2.0.jar

but that's up to you to get the right ones from the geo_pyspark package.
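
If you'd rather set this in code (for example in a Jupyter notebook) than in spark-defaults.conf, the same property can go on the session builder before the session is created. A minimal sketch, assuming the jar paths above (point them at whatever your geo_pyspark install actually ships):

from pyspark.sql import SparkSession

# Hypothetical paths: substitute the jars shipped with your geo_pyspark install.
geospark_jars = ",".join([
    "/jars/geo_wrapper_2.11-0.3.0.jar",
    "/jars/geospark-1.2.0.jar",
    "/jars/geospark-sql_2.3-1.2.0.jar",
    "/jars/geospark-viz_2.3-1.2.0.jar",
])

# spark.jars is only read when the JVM is launched, so this must run before
# the first getOrCreate() in the process (an already-running session ignores it).
spark = SparkSession.builder \
    .config("spark.jars", geospark_jars) \
    .getOrCreate()

This is the same spark.jars property as in the conf file, just scoped to a single script.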

If you are using EMR: you need to set your cluster config JSON to

[
  {
    "classification":"spark-defaults", 
    "properties":{
      "spark.jars": "/jars/geo_wrapper_2.11-0.3.0.jar,/jars/geospark-1.2.0.jar,/jars/geospark-sql_2.3-1.2.0.jar,/jars/geospark-viz_2.3-1.2.0.jar"
      }, 
    "configurations":[]
  }
]

and also upload your JARs as part of your bootstrap action. You could pull them from Maven, but I just threw them in an S3 bucket:

#!/bin/bash
sudo mkdir /jars
sudo aws s3 cp s3://geospark-test-ds/bootstrap/geo_wrapper_2.11-0.3.0.jar /jars/
sudo aws s3 cp s3://geospark-test-ds/bootstrap/geospark-1.2.0.jar /jars/
sudo aws s3 cp s3://geospark-test-ds/bootstrap/geospark-sql_2.3-1.2.0.jar /jars/
sudo aws s3 cp s3://geospark-test-ds/bootstrap/geospark-viz_2.3-1.2.0.jar /jars/

If you are using an EMR notebook: you need a magic cell at the top of your notebook:

%%configure -f
{
"jars": [
        "s3://geospark-test-ds/bootstrap/geo_wrapper_2.11-0.3.0.jar",
        "s3://geospark-test-ds/bootstrap/geospark-1.2.0.jar",
        "s3://geospark-test-ds/bootstrap/geospark-sql_2.3-1.2.0.jar",
        "s3://geospark-test-ds/bootstrap/geospark-viz_2.3-1.2.0.jar"
    ]
}
Chapen answered 3/2, 2020 at 13:22 Comment(5)
Thank you so much! One addition here: if anyone is installing geospark as a package on the cluster, they can also use the location /usr/local/lib/python3.6/site-packages/geospark/jars/2_4/<JAR_FILE> when specifying spark.jars, because that is the location used on EMR for both Master and Core nodes. – Perpetua
Where can I download geo_wrapper.jar? – Jade
It's been a while, but I think we grabbed it from the geo_pyspark repo; just be sure to get the right version: github.com/Imbruced/geo_pyspark/tree/master/geo_pyspark/jars – Chapen
And just in case you see the same problem in a Databricks notebook, you could install the missing JARs via the UI for the cluster configuration. – Felishafelita
By the way, the error indicates that the Python code was installed (thus the Python imports work) but not the JARs that are used by that Python code. – Felishafelita

I was seeing a similar kind of issue with the SparkMeasure JARs on a Windows 10 machine:

self.stagemetrics = self.sc._jvm.ch.cern.sparkmeasure.StageMetrics(self.sparksession._jsparkSession)
TypeError: 'JavaPackage' object is not callable

So what I did was:

  1. Went to SPARK_HOME, started the PySpark shell, and installed the required JAR:

    bin/pyspark --packages ch.cern.sparkmeasure:spark-measure_2.12:0.16

  2. Grabbed that JAR (ch.cern.sparkmeasure_spark-measure_2.12-0.16.jar) and copied it into the jars folder of SPARK_HOME.

  3. Reran the script, and it now worked without the above error.
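
As an alternative to copying the JAR by hand in step 2, you could let Spark resolve it from Maven at session creation via spark.jars.packages. A rough sketch, using the same coordinate as the --packages flag above:

from pyspark.sql import SparkSession

# Sketch: have Spark fetch the sparkmeasure JAR from Maven instead of copying it manually.
# The coordinate matches the --packages flag used in step 1.
spark = SparkSession.builder \
    .config("spark.jars.packages", "ch.cern.sparkmeasure:spark-measure_2.12:0.16") \
    .getOrCreate()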

Schultz answered 26/8, 2020 at 0:8
