Method showString([class java.lang.Integer, class java.lang.Integer, class java.lang.Boolean]) does not exist in PySpark

This is the snippet:

from pyspark import SparkContext
from pyspark.sql.session import SparkSession

sc = SparkContext()
spark = SparkSession(sc)
d = spark.read.format("csv").option("header", True).option("inferSchema", True).load('file.csv')
d.show()

After running this, I get the error:

An error occurred while calling o163.showString. Trace:
py4j.Py4JException: Method showString([class java.lang.Integer, class java.lang.Integer, class java.lang.Boolean]) does not exist

All the other methods work well. I've tried researching a lot, but in vain. Any lead will be highly appreciated.

Immesh answered 24/11, 2018 at 5:38 Comment(0)

This is an indicator of a Spark version mismatch. Before Spark 2.3, the show method took only two arguments:

def show(self, n=20, truncate=True):

since 2.3 it takes three:

def show(self, n=20, truncate=True, vertical=False):

In your case the Python client seems to invoke the latter, while the JVM backend uses the older version.
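That matches the trace: since 2.3, the Python side forwards three values over the Py4J bridge. A sketch paraphrased from the PySpark 2.3 source (treat the exact body as approximate):

def show(self, n=20, truncate=True, vertical=False):
    if isinstance(truncate, bool) and truncate:
        # Three arguments cross the bridge: (Integer, Integer, Boolean),
        # exactly the signature named in the error. A 2.2.x JVM only
        # exposes the two-argument showString(Int, Int), so the lookup fails.
        print(self._jdf.showString(n, 20, vertical))
    else:
        print(self._jdf.showString(n, int(truncate), vertical))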

Since SparkContext initialization underwent significant changes in 2.4 (a mismatch there would already fail in SparkContext.__init__), you're likely using:

  • 2.3.x Python library.
  • 2.2.x JARs.

You can confirm that by checking the versions directly from your session. Python:

sc.version

vs. JVM:

sc._jsc.version()
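
Put together, a quick sanity check could look like this (a minimal sketch; sc._jsc is a private handle, but it works on 2.x sessions):

python_side = sc.version        # version of the pyspark Python library
jvm_side = sc._jsc.version()    # version of the Spark JARs behind the JVM

print("Python:", python_side, "JVM:", jvm_side)
if python_side != jvm_side:
    print("Mismatch: align PYTHONPATH and SPARK_HOME, or reinstall PySpark")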

Problems like this are usually a result of a misconfigured PYTHONPATH (either directly, or by using pip-installed PySpark on top of pre-existing Spark binaries) or SPARK_HOME.
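To see which copies are actually being picked up, you can inspect both from the same interpreter (a small sketch; pyspark.__version__ exists on the pip-installable 2.x releases):

import os
import pyspark

print(pyspark.__file__)               # which Python package gets imported
print(pyspark.__version__)            # its version
print(os.environ.get("SPARK_HOME"))   # which Spark binaries are targeted
print(os.environ.get("PYTHONPATH"))   # any extra Python search paths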

Particularism answered 24/11, 2018 at 10:40 Comment(6)
I rechecked the config settings; it's all fine. The SparkContext version is 2.1.1 and the Spark version is 2.3.0. Method showString() cannot take 3 arguments. What can I do further? The issue seems to be one of incompatibility.Immesh
Check the PYTHONPATH and SPARK_HOME environment variables - do these point to the same installation? Did you install PySpark separately from the Spark binaries?Particularism
I am using the Anaconda Jupyter notebook for Python and its path is C:\Users\user_name\Anaconda3, and SPARK_HOME is set to C:\Spark\spark-2.3.0-bin-hadoop2.7Immesh
What does conda list pyspark return?Particularism
It returns: C:\Users\user_name\Anaconda3. So are you suggesting that my SPARK_HOME is also set to the same?Immesh
It shouldn't return a path alone, but a list of matching packages.Particularism

On the spark-shell console, enter the variable name to see its data type. As an alternative, you can press Tab twice after typing the variable name followed by a dot, and it will show the functions that can be applied. Example for a DataFrame object:

res23: org.apache.spark.sql.DataFrame = [order_id: string, book_name: string ... 1 more field]
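
A Python-side equivalent of this check (a hypothetical sketch using the standard inspect module, not part of the original answer) is to print the signature of DataFrame.show and see whether the local library already expects three arguments:

import inspect
from pyspark.sql import DataFrame

# On a 2.3.x Python library this prints: (self, n=20, truncate=True, vertical=False)
print(inspect.signature(DataFrame.show))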
Cyathus answered 24/11, 2018 at 6:23 Comment(1)
Thanks. It shows me the 'show' function, which could be applied. But the issue is with the arguments passed.Immesh
