Py4JException: Constructor org.apache.spark.sql.SparkSession([class org.apache.spark.SparkContext, class java.util.HashMap]) does not exist

L

3

20

I am trying to run a spark session in the Jupyter Notebook on a EC2 Linux machine via Visual Studio Code. My code looks as following:

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("spark_app").getOrCreate()

the error is:

{
    "name": "Py4JError",
    "message": "An error occurred while calling None.org.apache.spark.sql.SparkSession. Trace:\npy4j.Py4JException: Constructor org.apache.spark.sql.SparkSession([class org.apache.spark.SparkContext, class java.util.HashMap]) does not exist\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)\n\tat py4j.Gateway.invoke(Gateway.java:237)\n\tat py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)\n\tat py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n\n",
    "stack": "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mPy4JError\u001b[0m                                 Traceback (most recent call last)\n\u001b[1;32mc:\\Users\\IrinaKaerkkaenen\\Projekte\\ZugPortal\\test.ipynb Cell 3'\u001b[0m in \u001b[0;36m<cell line: 2>\u001b[0;34m()\u001b[0m\n\u001b[1;32m      <a href='vscode-notebook-cell:/c%3A/Users/IrinaKaerkkaenen/Projekte/ZugPortal/test.ipynb#ch0000002?line=0'>1</a>\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39mpyspark\u001b[39;00m\u001b[39m.\u001b[39;00m\u001b[39msql\u001b[39;00m \u001b[39mimport\u001b[39;00m SparkSession\n\u001b[0;32m----> <a href='vscode-notebook-cell:/c%3A/Users/IrinaKaerkkaenen/Projekte/ZugPortal/test.ipynb#ch0000002?line=1'>2</a>\u001b[0m spark \u001b[39m=\u001b[39m SparkSession\u001b[39m.\u001b[39;49mbuilder\u001b[39m.\u001b[39;49mappName(\u001b[39m\"\u001b[39;49m\u001b[39mspark_app\u001b[39;49m\u001b[39m\"\u001b[39;49m)\u001b[39m.\u001b[39;49mgetOrCreate()\n\nFile \u001b[0;32m~/anaconda3/lib/python3.9/site-packages/pyspark/sql/session.py:272\u001b[0m, in \u001b[0;36mSparkSession.Builder.getOrCreate\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m    269\u001b[0m     sc \u001b[39m=\u001b[39m SparkContext\u001b[39m.\u001b[39mgetOrCreate(sparkConf)\n\u001b[1;32m    270\u001b[0m     \u001b[39m# Do not update `SparkConf` for existing `SparkContext`, as it's shared\u001b[39;00m\n\u001b[1;32m    271\u001b[0m     \u001b[39m# by all sessions.\u001b[39;00m\n\u001b[0;32m--> 272\u001b[0m     session \u001b[39m=\u001b[39m SparkSession(sc, options\u001b[39m=\u001b[39;49m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_options)\n\u001b[1;32m    273\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m    274\u001b[0m     \u001b[39mgetattr\u001b[39m(\n\u001b[1;32m    275\u001b[0m         \u001b[39mgetattr\u001b[39m(session\u001b[39m.\u001b[39m_jvm, \u001b[39m\"\u001b[39m\u001b[39mSparkSession$\u001b[39m\u001b[39m\"\u001b[39m), \u001b[39m\"\u001b[39m\u001b[39mMODULE$\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m    276\u001b[0m     )\u001b[39m.\u001b[39mapplyModifiableSettings(session\u001b[39m.\u001b[39m_jsparkSession, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_options)\n\nFile \u001b[0;32m~/anaconda3/lib/python3.9/site-packages/pyspark/sql/session.py:307\u001b[0m, in \u001b[0;36mSparkSession.__init__\u001b[0;34m(self, sparkContext, jsparkSession, options)\u001b[0m\n\u001b[1;32m    303\u001b[0m         \u001b[39mgetattr\u001b[39m(\u001b[39mgetattr\u001b[39m(\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_jvm, \u001b[39m\"\u001b[39m\u001b[39mSparkSession$\u001b[39m\u001b[39m\"\u001b[39m), \u001b[39m\"\u001b[39m\u001b[39mMODULE$\u001b[39m\u001b[39m\"\u001b[39m)\u001b[39m.\u001b[39mapplyModifiableSettings(\n\u001b[1;32m    304\u001b[0m             jsparkSession, options\n\u001b[1;32m    305\u001b[0m         )\n\u001b[1;32m    306\u001b[0m     \u001b[39melse\u001b[39;00m:\n\u001b[0;32m--> 307\u001b[0m         jsparkSession \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_jvm\u001b[39m.\u001b[39;49mSparkSession(\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_jsc\u001b[39m.\u001b[39;49msc(), options)\n\u001b[1;32m    308\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m    309\u001b[0m     \u001b[39mgetattr\u001b[39m(\u001b[39mgetattr\u001b[39m(\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_jvm, \u001b[39m\"\u001b[39m\u001b[39mSparkSession$\u001b[39m\u001b[39m\"\u001b[39m), \u001b[39m\"\u001b[39m\u001b[39mMODULE$\u001b[39m\u001b[39m\"\u001b[39m)\u001b[39m.\u001b[39mapplyModifiableSettings(\n\u001b[1;32m    310\u001b[0m         jsparkSession, options\n\u001b[1;32m    311\u001b[0m     )\n\nFile \u001b[0;32m~/anaconda3/lib/python3.9/site-packages/py4j/java_gateway.py:1585\u001b[0m, in \u001b[0;36mJavaClass.__call__\u001b[0;34m(self, *args)\u001b[0m\n\u001b[1;32m   1579\u001b[0m command \u001b[39m=\u001b[39m proto\u001b[39m.\u001b[39mCONSTRUCTOR_COMMAND_NAME \u001b[39m+\u001b[39m\\\n\u001b[1;32m   1580\u001b[0m     \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_command_header \u001b[39m+\u001b[39m\\\n\u001b[1;32m   1581\u001b[0m     args_command \u001b[39m+\u001b[39m\\\n\u001b[1;32m   1582\u001b[0m     proto\u001b[39m.\u001b[39mEND_COMMAND_PART\n\u001b[1;32m   1584\u001b[0m answer \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_gateway_client\u001b[39m.\u001b[39msend_command(command)\n\u001b[0;32m-> 1585\u001b[0m return_value \u001b[39m=\u001b[39m get_return_value(\n\u001b[1;32m   1586\u001b[0m     answer, \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_gateway_client, \u001b[39mNone\u001b[39;49;00m, \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_fqn)\n\u001b[1;32m   1588\u001b[0m \u001b[39mfor\u001b[39;00m temp_arg \u001b[39min\u001b[39;00m temp_args:\n\u001b[1;32m   1589\u001b[0m     temp_arg\u001b[39m.\u001b[39m_detach()\n\nFile \u001b[0;32m~/anaconda3/lib/python3.9/site-packages/py4j/protocol.py:330\u001b[0m, in \u001b[0;36mget_return_value\u001b[0;34m(answer, gateway_client, target_id, name)\u001b[0m\n\u001b[1;32m    326\u001b[0m         \u001b[39mraise\u001b[39;00m Py4JJavaError(\n\u001b[1;32m    327\u001b[0m             \u001b[39m\"\u001b[39m\u001b[39mAn error occurred while calling \u001b[39m\u001b[39m{0}\u001b[39;00m\u001b[39m{1}\u001b[39;00m\u001b[39m{2}\u001b[39;00m\u001b[39m.\u001b[39m\u001b[39m\\n\u001b[39;00m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\n\u001b[1;32m    328\u001b[0m             \u001b[39mformat\u001b[39m(target_id, \u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\u001b[39m\"\u001b[39m, name), value)\n\u001b[1;32m    329\u001b[0m     \u001b[39melse\u001b[39;00m:\n\u001b[0;32m--> 330\u001b[0m         \u001b[39mraise\u001b[39;00m Py4JError(\n\u001b[1;32m    331\u001b[0m             \u001b[39m\"\u001b[39m\u001b[39mAn error occurred while calling \u001b[39m\u001b[39m{0}\u001b[39;00m\u001b[39m{1}\u001b[39;00m\u001b[39m{2}\u001b[39;00m\u001b[39m. Trace:\u001b[39m\u001b[39m\\n\u001b[39;00m\u001b[39m{3}\u001b[39;00m\u001b[39m\\n\u001b[39;00m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\n\u001b[1;32m    332\u001b[0m             \u001b[39mformat\u001b[39m(target_id, \u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\u001b[39m\"\u001b[39m, name, value))\n\u001b[1;32m    333\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m    334\u001b[0m     \u001b[39mraise\u001b[39;00m Py4JError(\n\u001b[1;32m    335\u001b[0m         \u001b[39m\"\u001b[39m\u001b[39mAn error occurred while calling \u001b[39m\u001b[39m{0}\u001b[39;00m\u001b[39m{1}\u001b[39;00m\u001b[39m{2}\u001b[39;00m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\n\u001b[1;32m    336\u001b[0m         \u001b[39mformat\u001b[39m(target_id, \u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\u001b[39m\"\u001b[39m, name))\n\n\u001b[0;31mPy4JError\u001b[0m: An error occurred while calling None.org.apache.spark.sql.SparkSession. Trace:\npy4j.Py4JException: Constructor org.apache.spark.sql.SparkSession([class org.apache.spark.SparkContext, class java.util.HashMap]) does not exist\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)\n\tat py4j.Gateway.invoke(Gateway.java:237)\n\tat py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)\n\tat py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n\n"
}

The output of running the cell before I read the full error in text editor is the following

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
/tmp/ipykernel_5260/8684085.py in <module>
      1 from pyspark.sql import SparkSession
----> 2 spark = SparkSession.builder.appName("spark_app").getOrCreate()

~/anaconda3/envs/zupo_env_test1/lib64/python3.7/site-packages/pyspark/sql/session.py in getOrCreate(self)
    270                     # Do not update `SparkConf` for existing `SparkContext`, as it's shared
    271                     # by all sessions.
--> 272                     session = SparkSession(sc, options=self._options)
    273                 else:
    274                     getattr(

~/anaconda3/envs/zupo_env_test1/lib64/python3.7/site-packages/pyspark/sql/session.py in __init__(self, sparkContext, jsparkSession, options)
    305                 )
    306             else:
--> 307                 jsparkSession = self._jvm.SparkSession(self._jsc.sc(), options)
    308         else:
    309             getattr(getattr(self._jvm, "SparkSession$"), "MODULE$").applyModifiableSettings(

~/anaconda3/envs/zupo_env_test1/lib64/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1584         answer = self._gateway_client.send_command(command)
   1585         return_value = get_return_value(
-> 1586             answer, self._gateway_client, None, self._fqn)
   1587 
...
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:829)

I have googled a lot without a success. Does anybody has an idea what is wrong?

I use IPython Kernel with 3.9 Python installed.

The warnings before the error comes:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/ec2-user/spark/spark-3.1.2-bin-hadoop2.7/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/07/05 21:06:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

Libnah answered 5/7, 2022 at 18:25 Comment(6)

The package you're importing seems to be using Java. So, is it possible that you're version of Java is not compatible with this package? – Mcclain 5/7, 2022 at 18:38

I have checked the java version, it is like this:openjdk version "11.0.15" 2022-04-19 LTS OpenJDK Runtime Environment Corretto-11.0.15.9.1 (build 11.0.15+9-LTS) OpenJDK 64-Bit Server VM Corretto-11.0.15.9.1 (build 11.0.15+9-LTS, mixed mode). Looks OK, isn't? I actually not sure which one I would need overwise – Redolent 5/7, 2022 at 18:42

Can you include the steps you took to install pyspark? Can you remember if you encountered any warnings or errors during the installation? – Mcclain 5/7, 2022 at 19:0

Also, how are you initalizing the spark session on Jupyter Notebook? What command are you running? Is there any output? – Mcclain 5/7, 2022 at 19:2

I am pretty sure I had installed it by calling pip3 install pyspark. But I do not have a log of warnings anymore :(. These two lines I am using for initialization of the spark are in the question. This actually worked for me in a local mode before I have switched to EC2. – Redolent 5/7, 2022 at 19:5

Before the error log, it gives me these warnings I have pasted now to the question below. – Redolent 5/7, 2022 at 19:6

C

28

I had the same problem, I've fixed it installing the same version of pyspark from pip and spark. you should check if your installed versions are the same.

Cuff answered 31/7, 2022 at 3:29 Comment(5)

I had similar issue. I was using spark 3.2.1 and pip install pyspark==3.2.1 solved it. – Emeraldemerge 2/8, 2022 at 6:21

pyspark version 3.2.2 also works fine – Hemostat 17/8, 2022 at 2:24

@Luis Hassiel Figueroa, it worked for me. Thank you so much for this, I have been spending days for a solution. – Kroon 19/8, 2022 at 22:4

Same issue, matching the pySpark version to the spark distribution solved it – Randy 19/4, 2023 at 8:11

I had a similar issue

Py4JError: An error occurred while calling None.org.apache.spark.api.python.PythonFunction. Trace: py4j.Py4JException: Constructor org.apache.spark.api.python.PythonFunction([class [B, class java.util.HashMap, class java.util.ArrayList, class java.lang.String, class java.lang.String, class java.util.ArrayList, class org.apache.spark.api.python.PythonAccumulatorV2]) does not exist

installing the correct pyspark dependecy solved it.. THANKS A LOTTT!! – Alexaalexander 11/7, 2023 at 13:7

G

12

It appears that the version of spark installed in your machine does not match the version of pyspark.

Check the version of your spark using the following command:

<path-to-spark-bin>/spark-submit --version
# example output
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.3
      /_/

Now as it appears from the sample output, the version of installed Spark version is 3.1.3, so you need to install the python library of spark (pyspark) with the same version by executing the following command:

pip install pyspark==<the-version-of-your-spark>
# Example
pip install pyspark==3.1.3

Gussie answered 12/10, 2022 at 13:45 Comment(0)

B

0

pip install pyspark==3.2.1

#It worked for me also

Berget answered 2/2 at 12:51 Comment(0)

Recommended topics

Hot tags