Why does spark-shell fail with a NullPointerException?

I'm trying to run spark-shell on Windows 10, but I keep getting this error every time I start it.

I have tried both the latest version and spark-1.5.0-bin-hadoop2.4.

15/09/22 18:46:24 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/09/22 18:46:24 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/09/22 18:46:27 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
15/09/22 18:46:27 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
15/09/22 18:46:27 WARN : Your hostname, DESKTOP-8JS2RD5 resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:c0a8:103%net1, but we couldn't find any external IP address!
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:171)
    at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:163)
    at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:161)
    at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:168)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
    at $iwC$$iwC.<init>(<console>:9)
    at $iwC.<init>(<console>:18)
    at <init>(<console>:20)
    at .<init>(<console>:24)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:132)
    at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
    at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
    at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
    at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
    at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
    at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
    at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
    at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
  Caused by: java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(Unknown Source)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
    at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
    at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:559)
    at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:534)
    at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:599)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
    ... 56 more

  <console>:10: error: not found: value sqlContext
               import sqlContext.implicits._
                ^
  <console>:10: error: not found: value sqlContext
               import sqlContext.sql
                ^
Watertight answered 22/9, 2015 at 16:5 Comment(3)
I downloaded spark-1.5.0-bin-hadoop2.4 and started spark-shell, but sqlContext loads without a problem for me. I got the same warning messages, except for the hostname warning, so my guess is that you'll need to fix your network configuration.Tootsy
I can't figure out the cause. I was playing with the pre-built package. After I downloaded the source code (git clone git://github.com/apache/spark.git -b branch-1.6), it worked.Hypothetical
@Hypothetical Thanks. I'm playing with the pre-built package too. Which OS are you using, by the way?Multiversity

I used Spark 1.5.2 with Hadoop 2.6 and had similar problems. I solved them with the following steps:

  1. Download winutils.exe from the repository to some local folder, e.g. C:\hadoop\bin.

  2. Set HADOOP_HOME to C:\hadoop.

  3. Create c:\tmp\hive directory (using Windows Explorer or any other tool).

  4. Open command prompt with admin rights.

  5. Run C:\hadoop\bin\winutils.exe chmod 777 /tmp/hive

With that, I still get some warnings, but no errors, and I can run Spark applications just fine.
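For reference, here is a minimal sketch of the steps above in an administrator Command Prompt, assuming the C:\hadoop paths from the list (adjust them to wherever you put winutils.exe):

rem Point HADOOP_HOME at the folder that contains bin\winutils.exe
set HADOOP_HOME=C:\hadoop
rem Create the Hive scratch directory and make it world-writable
mkdir C:\tmp\hive
C:\hadoop\bin\winutils.exe chmod 777 /tmp/hive

Note that set only affects the current Command Prompt session; if you launch spark-shell from a different window, set HADOOP_HOME as a system environment variable (or with setx) instead.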

Stonyhearted answered 9/12, 2015 at 17:11 Comment(8)
Is HADOOP_HOME a user or a system variable? Do you run it on Windows 10? I ask because it isn't working for me...Cabby
Did you set SPARK_LOCAL_HOSTNAME = localhost?Cabby
I am still getting some warnings, but no ERRORs.Stonyhearted
Thanks, but I still have errors... Where is your Spark directory located? Do you run the spark-shell command inside the bin directory, i.e. as bin\spark-shell?Cabby
It worked fine for me. I guess the key here is to download the right version of winutils for your platform. In my case I am running Windows 10 in 32 bits, and the winutils provided above does not work. Here is a link for the 32-bit version: code.google.com/p/rrd-hadoop-win32/source/checkoutBouilli
I have Win 8.1 64-bit, and the distribution from Titus Barik's blog works on my machine.Mylander
It worked for me after performing only step 1.Lucialucian
Worked. Since I didn't install Hadoop, I used my spark directory as the HADOOP_HOME. That is, instead of C:\Hadoop\bin, I used D:\spark-1.6.1-bin-hadoop2.6\bin. A dummy "hadoop" directory might have worked too.Inflationism

I was facing a similar issue and got it resolved by putting winutils.exe inside the bin folder. HADOOP_HOME should be set to C:\Winutils, with winutils.exe placed in C:\Winutils\bin.

Windows 10 64-bit winutils binaries are available at https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.0/bin

Also ensure that the command prompt has administrative access.

Refer to https://wiki.apache.org/hadoop/WindowsProblems
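A minimal sketch of that setup from an administrator Command Prompt, assuming the C:\Winutils layout described above:

rem Persist HADOOP_HOME for future sessions (open a new prompt afterwards)
setx HADOOP_HOME "C:\Winutils"
rem Verify the expected layout: winutils.exe must sit directly under %HADOOP_HOME%\bin
dir C:\Winutils\bin\winutils.exe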

Vincenzovincible answered 8/7, 2016 at 21:41 Comment(0)

My guess is that you're running into https://issues.apache.org/jira/browse/SPARK-10528. I was seeing the same issue running on Windows 7. Initially I was getting the NullPointerException as you did. When I put winutils into the bin directory and set HADOOP_HOME to point to the Spark directory, I got the error described in the JIRA issue.

Augustusaugy answered 9/10, 2015 at 13:14 Comment(1)
So are you saying that pointing HADOOP_HOME to the Spark directory is what caused the error? My HADOOP_HOME is currently set to c:\winutils, which is not my Spark home, and I am currently getting this error.Giantess

Or perhaps the link below is easier to follow:

https://wiki.apache.org/hadoop/WindowsProblems

Basically, download winutils.exe and copy it to your spark\bin folder, then re-run spark-shell.

If you have not yet made /tmp/hive writable, please do so (for example with winutils.exe chmod, as shown in the other answers).

Joyjoya answered 29/11, 2015 at 7:59 Comment(2)
Also, it is best that you fire up spark-shell from your spark\bin folder.Joyjoya
What does this mean? As opposed to firing it up where? Sorry if this question seems odd; I'm just not sure why people would fire it up from anywhere else. I am trying to troubleshoot something myself and want to rule out every possible cause of failure.Giantess

You need to grant permissions on the /tmp/hive directory to resolve this exception.

This assumes you already have winutils.exe and have set the HADOOP_HOME environment variable. Then open a command prompt and run the following command as administrator.

If winutils.exe is present in D:\winutils\bin and \tmp\hive is also on the D drive:

D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive
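To confirm the permissions took effect, you can list the directory with winutils afterwards (a quick check, assuming the same D:\ paths as above); it should report something like drwxrwxrwx for D:\tmp\hive:

rem Check the effective permissions on the Hive scratch directory
D:\winutils\bin\winutils.exe ls D:\tmp\hive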

For more details, you can refer to the following links:

Frequent Issues occurred during Spark Development
How to run Apache Spark on Windows7 in standalone mode

Fortier answered 11/10, 2016 at 14:46 Comment(0)

You can resolve this issue by placing the MySQL connector jar in the spark-1.6.0/libs folder and restarting. It works.

The important thing here is that, instead of running plain spark-shell, you should run:

spark-shell --driver-class-path /home/username/spark-1.6.0-libs-mysqlconnector.jar

Hope it works.

Cameron answered 13/6, 2016 at 2:39 Comment(0)

For Python: create a SparkSession in your Python code (this config is only needed on Windows):

from pyspark.sql import SparkSession
spark = SparkSession.builder.config("spark.sql.warehouse.dir", "C:/temp").appName("SparkSQL").getOrCreate()

Copy winutils.exe into C:\winutils\bin and execute the command below:

C:\Windows\system32>C:\winutils\bin\winutils.exe chmod 777 C:/temp

Run the command prompt in admin mode (Run as Administrator).

Accept answered 20/9, 2016 at 11:47 Comment(0)

My issue was having other .exe files and jars inside the winutils/bin folder. I cleared out all the others and was left with winutils.exe alone. I was using Spark 2.1.1.

Dhole answered 25/5, 2017 at 6:25 Comment(0)

The issue was resolved after installing the correct Java version (in my case Java 8) and setting the environment variables. Make sure you run winutils.exe to set up the temporary directory as below.

c:\winutils\bin\winutils.exe chmod 777 \tmp\hive

The above should not return any error. Use java -version to verify the version of Java you are using before invoking spark-shell.

Sidneysidoma answered 1/12, 2017 at 3:35 Comment(0)

On Windows, you need to clone "winutils":

git clone https://github.com/steveloughran/winutils.git

And

set the variable HADOOP_HOME to DIR_CLONED\hadoop-{version}

Remember to choose the version matching your Hadoop.
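Roughly, in a Command Prompt this looks like the sketch below; the C:\winutils clone location and the hadoop-2.7.1 subfolder are just examples, so substitute your own path and the Hadoop version your Spark build expects:

rem Clone the winutils repository (it contains prebuilt binaries per Hadoop version)
git clone https://github.com/steveloughran/winutils.git C:\winutils
rem Point HADOOP_HOME at the subfolder matching your Hadoop version (example: 2.7.1)
setx HADOOP_HOME "C:\winutils\hadoop-2.7.1"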

Tryout answered 9/8, 2018 at 9:27 Comment(0)

Setting SPARK_LOCAL_HOSTNAME to localhost (on Windows 10) resolved the problem for me.

Thermic answered 20/2, 2022 at 18:16 Comment(0)

Type SET SPARK_LOCAL_HOSTNAME=localhost in your command prompt. Worked for me on Windows 11.

Girl answered 5/3, 2023 at 8:42 Comment(0)
