After installing sparknlp, cannot import sparknlp
The following ran successfully on a Cloudera CDSW cluster gateway.

import pyspark
from pyspark.sql import SparkSession
spark = (SparkSession
            .builder
            .config("spark.jars.packages","JohnSnowLabs:spark-nlp:1.2.3")
            .getOrCreate()
         )

Which produces this output.

Ivy Default Cache set to: /home/cdsw/.ivy2/cache
The jars for the packages stored in: /home/cdsw/.ivy2/jars
:: loading settings :: url = jar:file:/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
JohnSnowLabs#spark-nlp added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found JohnSnowLabs#spark-nlp;1.2.3 in spark-packages
    found com.typesafe#config;1.3.0 in central
    found org.fusesource.leveldbjni#leveldbjni-all;1.8 in central
downloading http://dl.bintray.com/spark-packages/maven/JohnSnowLabs/spark-nlp/1.2.3/spark-nlp-1.2.3.jar ...
    [SUCCESSFUL ] JohnSnowLabs#spark-nlp;1.2.3!spark-nlp.jar (3357ms)
downloading https://repo1.maven.org/maven2/com/typesafe/config/1.3.0/config-1.3.0.jar ...
    [SUCCESSFUL ] com.typesafe#config;1.3.0!config.jar(bundle) (348ms)
downloading https://repo1.maven.org/maven2/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar ...
    [SUCCESSFUL ] org.fusesource.leveldbjni#leveldbjni-all;1.8!leveldbjni-all.jar(bundle) (382ms)
:: resolution report :: resolve 3836ms :: artifacts dl 4095ms
    :: modules in use:
    JohnSnowLabs#spark-nlp;1.2.3 from spark-packages in [default]
    com.typesafe#config;1.3.0 from central in [default]
    org.fusesource.leveldbjni#leveldbjni-all;1.8 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   3   |   3   |   3   |   0   ||   3   |   3   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
    confs: [default]
    3 artifacts copied, 0 already retrieved (5740kB/37ms)
Setting default log level to "ERROR".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

But when I try to import sparknlp as described in the John Snow Labs documentation for PySpark...

import sparknlp
# or 
from sparknlp.annotator import *

I get this:

ImportError: No module named sparknlp
ImportError: No module named sparknlp.annotator 

What do I need to do to use sparknlp? Of course this could be generalized for any Spark package.

Cadenza answered 7/12, 2017 at 22:52 Comment(1)
Running pip install sparknlp gave: Collecting sparknlp, Could not find a version that satisfies the requirement sparknlp (from versions: ), No matching distribution found for sparknlp – Cadenza

I figured it out. The jar files that were loaded contained only the compiled Scala code. I still had to put the Python files containing the wrapper code in a location I could import from. Once I did that, everything worked great.
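
For instance, a minimal sketch of that step (the clone path is an assumption; adjust it to wherever you checked out https://github.com/JohnSnowLabs/spark-nlp, whose python/ subfolder holds the wrapper package):

import sys, os

# Assumed location of a local clone of the spark-nlp repository;
# its python/ subfolder contains the sparknlp wrapper package.
wrapper_dir = os.path.expanduser("~/spark-nlp/python")

# Put the wrapper package ahead of anything else on the import path.
sys.path.insert(0, wrapper_dir)

import sparknlp  # should now resolve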

Cadenza answered 30/12, 2017 at 19:0 Comment(3)
How did you do that? I'm having a similar problem and I'm a complete beginner, so could you explain how you did this? – Mariehamn
Things have changed since I wrote this. Did you try following these instructions? – Cadenza
In any case, the folder that contains the Python wrapper code is here. I just copied that folder into a place where I could import it as a package, and everything worked. – Cadenza

You can load the spark-nlp package in PySpark with the command:

pyspark --packages JohnSnowLabs:spark-nlp:1.3.0

But this doesn't tell Python where to find the bindings. Following the instructions for a similar report here, you can fix that either by adding the downloaded jar to your PYTHONPATH:

export PYTHONPATH="$HOME/.ivy2/jars/JohnSnowLabs_spark-nlp-1.3.0.jar:$PYTHONPATH"

or by

import sys, glob, os

# Add every jar Ivy downloaded to the Python import path.
sys.path.extend(glob.glob(os.path.join(os.path.expanduser("~"), ".ivy2/jars/*.jar")))
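
Either route relies on Python's ability to import from a zip archive placed on sys.path (a .jar is just a zip archive), but it only helps if the jar actually bundles the sparknlp Python modules; if it holds only compiled Scala classes, you still need the wrapper sources, as the other answer notes.
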
Luger answered 6/2, 2018 at 19:37 Comment(1)
Hi, where can I run this export PYTHONPATH command? – Lubricity

Thanks to Clay. Here is how I set the PYTHONPATH:

# Fetch the Python wrapper sources matching the release in use
git clone --branch 3.0.3 https://github.com/JohnSnowLabs/spark-nlp
# Put the wrapper package on the import path
export PYTHONPATH="./spark-nlp/python:$PYTHONPATH"

and then it worked for me, because my ./spark-nlp/python folder now contains the elusive sparknlp module.

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.0.3

>>> import sparknlp
>>> 
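
As a side note, recent releases are also published on PyPI under the name spark-nlp (with a hyphen), which is presumably why the earlier pip install sparknlp attempt found nothing. A minimal sketch, assuming a recent release installed that way:

# Assumes a recent spark-nlp release installed from PyPI:
#   pip install spark-nlp
import sparknlp

# start() is the library helper that builds a SparkSession with the
# matching spark-nlp jar already on the classpath.
spark = sparknlp.start()
print(sparknlp.version())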
Silverpoint answered 20/5, 2021 at 19:6 Comment(0)
