ModuleNotFoundError: No module named 'py4j'
I installed Spark and I'm running into problems loading the pyspark module in IPython. I get the following error:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-49d7c4e178f8> in <module>
----> 1 import pyspark

/opt/spark/python/pyspark/__init__.py in <module>
     44 
     45 from pyspark.conf import SparkConf
---> 46 from pyspark.context import SparkContext
     47 from pyspark.rdd import RDD
     48 from pyspark.files import SparkFiles

/opt/spark/python/pyspark/context.py in <module>
     27 from tempfile import NamedTemporaryFile
     28 
---> 29 from py4j.protocol import Py4JError
     30 
     31 from pyspark import accumulators

ModuleNotFoundError: No module named 'py4j'
Discolor answered 28/5, 2019 at 12:47 Comment(1)
The error says the module py4j is missing. Do you have it installed in your environment? – Coda
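
A quick way to check is from a shell; a minimal sketch, assuming python and pip point at the same interpreter that IPython runs on:

python -c "import py4j; print(py4j.__file__)"
pip install py4j    # only needed if the import above fails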
If you can run Spark directly, you may just need to fix the PYTHONPATH environment variable. Check the filename in the directory $SPARK_HOME/python/lib/; for Spark 2.4.3 the file is py4j-0.10.7-src.zip:

export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
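
If you would rather not hard-code the version, which changes between Spark releases (see the comment below about Spark 3.5.0), here is a sketch that picks up whatever py4j zip ships with the install, assuming exactly one match in the directory:

# Locate the bundled py4j zip instead of hard-coding its version
PY4J_ZIP=$(ls "$SPARK_HOME"/python/lib/py4j-*-src.zip)
export PYTHONPATH="$SPARK_HOME/python:$PY4J_ZIP:$PYTHONPATH"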
Kakalina answered 30/7, 2019 at 6:48 Comment(3)
What's $SPARK_HOME? Is it the same as %SPARK_HOME%? – Mellisamellisent
If you used brew to install, then set SPARK_HOME to "/usr/local/Cellar/apache-spark/<whichever-spark-version>/libexec". I presume %SPARK_HOME% is the same path on a Windows-based machine. – Politics
@Kakalina do you mind sharing how you were able to determine the py4j version from the Spark version? I am not able to find one compatible with Spark 3.5.0, since I see errors when py4j is referenced like this: ENV PYTHONPATH="$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH" – Sanitary
On Windows, even though it should work and Python appears to resolve the file correctly when a wildcard is used in PYTHONPATH, it cannot actually open the file.

Instead of

%SPARK_HOME%\python\lib\*.zip

I had to explicitly specify the zip file name to make it work.

I took inspiration from how Spark itself launches the PySpark shell in

spark\bin\pyspark2.cmd

Where they specify the full path:

%SPARK_HOME%\python\lib\py4j-0.10.9.7-src.zip
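
Put together, a working setup might look like this in a cmd session; this is a sketch, assuming py4j-0.10.9.7 is the version actually present in your lib directory (check the filename there first):

rem Use the explicit zip name; a *.zip wildcard in PYTHONPATH cannot be opened as an import source
set PYTHONPATH=%SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-0.10.9.7-src.zip;%PYTHONPATH%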
Vinificator answered 14/9, 2023 at 17:17 Comment(0)
