Unable to launch SparkR in RStudio

After a long and difficult SparkR installation process, I am now running into problems launching SparkR.

My Settings

R 3.2.0    
RStudio 0.98.1103    
Rtools 3.3    
Spark 1.4.0
Java Version 8
SparkR 1.4.0
Windows 7 SP 1  64 Bit

Now I try to run the following code in R:

library(devtools)
library(SparkR)
Sys.setenv(SPARK_MEM="1g")
Sys.setenv(SPARK_HOME="C:/spark-1.4.0")
sc <- sparkR.init(master="local")

I receive the following error:

JVM is not ready after 10 seconds

I have also tried adding some system environment variables, such as the Spark path and the Java path.
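For example, within the R session I tried roughly the following (the Java path here is a placeholder for my actual install location):

# Sketch: set the environment for the current R session only.
# Both paths are examples; substitute your real install locations.
Sys.setenv(JAVA_HOME = "C:/Program Files/Java/jre8")
Sys.setenv(SPARK_HOME = "C:/spark-1.4.0")
Sys.setenv(PATH = paste(file.path(Sys.getenv("JAVA_HOME"), "bin"),
                        Sys.getenv("PATH"), sep = ";"))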

Do you have any advice on how to fix this problem?

After testing on localhost, my next step would be to run tests on my running Hadoop cluster.

Gooseberry answered 29/6, 2015 at 15:5 Comment(2)
Looks like a Windows-specific issue, or at least one not reproducible on Debian GNU/Linux with R 3.2.1, Spark 1.4.0, RStudio 0.98.1103, OpenJDK 7u79. Additional info about your OS configuration could be useful.Memphis
When I use sc <- sparkR.init(master="local") I get Launching java with spark-submit command C:/spark-1.4.0/bin/spark-submit.cmd sparkr-shell. Is there maybe a mistake in my environment variables, or in my Java version, or in how the shell is run?Gooseberry

I think it was a bug that has now been resolved. Try the following:

Sys.setenv(SPARK_HOME = "C:\\spark-1.4.0")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library("SparkR", lib.loc = "C:\\spark-1.4.0\\lib")  # the use of \\ is for the Windows environment
library(SparkR)
sc <- sparkR.init(master = "local")

This should print output like:

Launching java with spark-submit command C:\spark-1.4.0/bin/spark-submit.cmd sparkr-shell
C:\Users\Ashish\AppData\Local\Temp\RtmpWqFsOB\backend_portbdc329477c6

Hope this helps.

Housekeeping answered 14/7, 2015 at 1:48 Comment(3)
Thanks. In the end I may go on with Python, but the .libPaths... command works for me. I think another problem is getting the right Spark version (prebuilt with Hadoop).Gooseberry
This didn't do the trick for me. I'm running Spark 1.4.1 with R 3.1.3 on RStudio 0.98.1103 on 64-bit Windows 7. Do you have any other ideas?Generation
@Julien, what is the error message? The solution I posted earlier worked for me on both 32-bit and 64-bit Windows 7.Housekeeping

I had the same issue, and my spark-submit.cmd file also would not execute from the command line. The following steps worked for me:

Go to your environment variables and, among the system variables, select the variable named PATH. Along with the other values, add c:/Windows/System32/, separated by a semicolon. This made my spark-submit.cmd run from the command line, and eventually from RStudio.

I have realized that we get the above issue only if some of the required path values are missing. Ensure all your path values (R, Rtools) are specified in the environment variables. For instance, my Rtools path was c:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin. A sketch for checking this from within R follows below.
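If it helps, here is a rough sketch of inspecting and extending PATH from within R for the current session only (the directories are examples, not required values):

# Sketch: inspect the PATH that R (and thus spark-submit.cmd) will see.
strsplit(Sys.getenv("PATH"), ";")[[1]]

# Append any missing directories for this session only (example paths).
Sys.setenv(PATH = paste(Sys.getenv("PATH"),
                        "c:/Windows/System32/",
                        "c:\\Rtools\\bin",
                        sep = ";"))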

I hope this helps.

Flowerage answered 29/7, 2015 at 16:16 Comment(1)
This solution worked well for me. You need to have downloaded the Rtools frozen version suitable for your R version. Check the "Edit PATH" option in the Rtools installer wizard so that it adds the two entries to your PATH. Then sc = sparkR.init(master="local") will work fine.Supervision

That didn't work for me. If anyone has the same problem, try giving execute permissions to c:/sparkpath/bin/spark-submit.cmd.
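To check whether the script can execute at all, one can try calling it directly from R; a rough sketch, with c:/sparkpath again standing in for the real Spark directory:

# Sketch: call spark-submit.cmd directly to see whether it executes.
# "c:/sparkpath" is a placeholder for the actual Spark install directory.
system2("c:/sparkpath/bin/spark-submit.cmd", args = "--version")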

Maleate answered 12/8, 2015 at 19:44 Comment(0)

I had the exact same issue: I could start SparkR from the command line, but not from RStudio on Windows. Here is the solution that worked for me.

  1. Clean up all the paths you set while trying to fix this issue. This includes the paths you set in the Windows environment via the Control Panel; also use Sys.unsetenv() to unset SPARK_HOME.

  2. Find out your RStudio default working directory by running getwd() in RStudio, then create a .Rprofile file in that directory. Put the following line in this file: .libPaths("C:/Apache/Spark-1.5.1/R/lib") (see the sketch after these steps).

  3. In Control Panel -> System -> Advanced system settings -> Environment Variables, add ";C:\Apache\Spark-1.5.1\bin" at the end of your existing PATH variable.

  4. Start RStudio; if you type .libPaths(), you can see that the SparkR library path is already in the library path.

  5. Use library(SparkR) to load the SparkR library.

  6. sc=sparkR.init(master="local")

I tried this on both Spark 1.4.1 and 1.5.1, and both work fine. I hope this can help anyone who is still having the issue after all the suggestions above.
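Putting steps 2 to 6 together, the pieces would look roughly like this; a sketch only, assuming the Spark 1.5.1 install location used above:

# --- .Rprofile in the RStudio default working directory ---
# Runs at session start and makes the bundled SparkR package findable.
.libPaths("C:/Apache/Spark-1.5.1/R/lib")

Then, in a fresh RStudio session:

.libPaths()                        # the Spark R/lib path should now be listed
library(SparkR)                    # loads SparkR from that path
sc <- sparkR.init(master = "local")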

Astyanax answered 3/11, 2015 at 16:22 Comment(0)

I had a similar issue. In my case the problem was the hyphen ('-') in the package coordinate. Changing the code from:

sc <- sparkR.init(master = "local[*]",sparkPackages = c("com.databricks:spark-csv_2.11-1.4.0"))

to:

sc <- sparkR.init(master = "local[*]",sparkPackages = c("com.databricks:spark-csv_2.11:1.4.0"))

worked for me. Do you notice the change? Maven package coordinates use the form groupId:artifactId:version, so the version must be separated from the artifact by a colon, not a hyphen.

P.S.: Do copy the jar into your SPARK_HOME\lib folder.

Edit 1: Also, check that you have configured your "HADOOP_HOME"
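For example, this can also be set from R before initializing; a minimal sketch, where the path is only a placeholder for an actual Hadoop install:

# Sketch: point HADOOP_HOME at a Hadoop install (placeholder path).
# On Windows this directory is typically expected to contain bin\winutils.exe.
Sys.setenv(HADOOP_HOME = "C:/hadoop")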


Hope this helps.

Agreed answered 25/4, 2016 at 7:59 Comment(0)

The following solution will work for Mac OS.

After installing Hadoop followed by Spark (here via Homebrew), run:

# Get your Spark path
spark_path <- strsplit(system("brew info apache-spark", intern = T)[4], ' ')[[1]][1]
.libPaths(c(file.path(spark_path, "libexec", "R", "lib"), .libPaths()))
library(SparkR)
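After the library loads, initialization should work as in the other answers, for instance:

sc <- sparkR.init(master = "local")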

Blythebm answered 7/6, 2016 at 17:28 Comment(0)

I also had this error, from a different cause. Under the hood, Spark calls

system2(sparkSubmitBin, combinedArgs, wait = F)

There are many ways this can go wrong. In my case the underlying error (invisible until I called system2 directly as an experiment) was "UNC paths are not supported." I had to change my working directory in RStudio to a directory that was not part of a network share, and then it started working.
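To reproduce that experiment, you can make a similar call directly and watch its output; a sketch with placeholder values (sparkSubmitBin and combinedArgs above are Spark internals, the paths below are illustrative):

# Sketch: run spark-submit directly so any underlying error becomes visible.
# The working directory and SPARK_HOME are placeholders; use a local,
# non-UNC path (not a \\server\share network location).
setwd("C:/Users/me/work")
sparkSubmitBin <- file.path(Sys.getenv("SPARK_HOME"), "bin", "spark-submit.cmd")
system2(sparkSubmitBin, args = "--version", wait = TRUE)  # wait so the output prints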

Credenza answered 13/6, 2016 at 14:45 Comment(0)
