SparkR Questions

3

Solved

With the release of a new version of Spark (1.4), there is now a nice frontend interface to Spark from R, in a package named SparkR. On the documentation page of R for Spark there is a command that e...
Repentance asked 3/7, 2015 at 10:50
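
A minimal sketch of getting started with that interface, assuming a Spark 1.4-era layout; the local master and the built-in faithful dataset are illustrative choices, not part of the question:

    library(SparkR)
    # Start the backend and get a Spark context (Spark 1.4-era API)
    sc <- sparkR.init(master = "local[2]", appName = "sparkr-demo")
    # A SQL context is needed before DataFrames can be created
    sqlContext <- sparkRSQL.init(sc)
    df <- createDataFrame(sqlContext, faithful)  # faithful is a built-in R dataset
    head(df)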

2

I have a 500K-row Spark DataFrame that lives in a parquet file. I'm using Spark 2.0.0 and the SparkR package inside Spark (RStudio and R 3.3.1), all running on a local machine with 4 cores and 8GB ...
Weakly asked 19/9, 2016 at 15:23
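
For the Spark 2.0.0 setup described, opening the parquet file typically looks like this; the path and memory figure are placeholders:

    library(SparkR)
    # Spark 2.0 replaced sparkR.init with a unified session entry point
    sparkR.session(master = "local[4]",
                   sparkConfig = list(spark.driver.memory = "4g"))
    df <- read.parquet("/path/to/file.parquet")  # hypothetical path
    count(df)  # 500K rows should be cheap to count locally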

1

I have the simple SparkR program below, which creates a SparkR DataFrame and retrieves/collects data from it. Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf.cloudera.yarn") Sys.setenv(SPARK_HOME = ...
Hollah asked 25/7, 2016 at 21:47
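
A condensed version of such a program, assuming a Spark 1.x cluster reachable through YARN as in the excerpt; the toy data is illustrative:

    library(SparkR)
    sc <- sparkR.init(master = "yarn-client")
    sqlContext <- sparkRSQL.init(sc)
    # Build a distributed DataFrame from a local data.frame, then pull it back
    df <- createDataFrame(sqlContext, data.frame(x = 1:3, y = c("a", "b", "c")))
    collect(df)  # returns an ordinary R data.frame on the driver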

2

Solved

Using SparkR, how can nested arrays be "exploded along"? I've tried using explode like so: dat <- nested_spark_df %>% mutate(a=explode(metadata)) %>% head() but though the above does...
Anstus asked 27/7, 2016 at 0:58
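
A workaround often suggested for this, sketched under the assumption that generator expressions like explode are rejected inside mutate but accepted inside select; the column name a mirrors the question:

    # Route the generator through select rather than mutate
    exploded <- select(nested_spark_df,
                       alias(explode(nested_spark_df$metadata), "a"))
    head(exploded)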

7

Solved

After a long and difficult installation process for SparkR, I am running into new problems launching SparkR. My settings: R 3.2.0, RStudio 0.98.1103, Rtools 3.3, Spark 1.4.0, Java Version 8, SparkR 1.4...
Gooseberry asked 29/6, 2015 at 15:5
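
With Spark 1.4, SparkR was loaded from the Spark distribution rather than from a normal library install; a minimal launch sketch, with a hypothetical install path:

    Sys.setenv(SPARK_HOME = "C:/spark-1.4.0")  # hypothetical path to the unpacked distribution
    # Put the bundled SparkR package on the library search path
    .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
    library(SparkR)
    sc <- sparkR.init(master = "local")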

4

I have installed the SparkR package from the Spark distribution into the R library. I can call the following command and it seems to work properly: library(SparkR) However, when I try to get the Spark...
Intaglio asked 9/7, 2015 at 15:37
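
If library(SparkR) loads but initialization then fails, pointing sparkR.init at the matching Spark installation explicitly is one sketch worth trying; the path is hypothetical:

    library(SparkR)
    # sparkHome must match the Spark distribution this SparkR package shipped with
    sc <- sparkR.init(master = "local", sparkHome = "/opt/spark-1.4.0")
    sqlContext <- sparkRSQL.init(sc)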

1

Solved

I found from JIRA that the 1.6 release of SparkR implements window functions, including lag and rank, but the over function is not implemented yet. How can I use a window function like lag witho...
Abulia asked 19/1, 2016 at 20:6
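
Without an over function in SparkR 1.6, the usual workaround was to drop to SQL, where the OVER clause is available. A sketch with hypothetical columns k, t, and v; window functions in 1.6 may additionally require a Hive-enabled context (sparkRHive.init):

    registerTempTable(df, "tbl")
    lagged <- sql(sqlContext,
      "SELECT k, t, v,
              lag(v) OVER (PARTITION BY k ORDER BY t) AS v_prev
       FROM tbl")
    head(lagged)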

1

Solved

How do I do map and reduce operations using SparkR? All I can find is stuff about SQL queries. Is there a way to do map and reduce using SQL?
Infielder asked 23/6, 2015 at 20:22
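
At that time the RDD API was private, so map and reduce were only reachable through the ::: accessor; a sketch, with the caveat that private APIs can change between releases:

    rdd <- SparkR:::parallelize(sc, 1:100)
    squared <- SparkR:::map(rdd, function(x) x^2)
    # reduce pulls the combined result back to the driver
    total <- SparkR:::reduce(squared, function(a, b) a + b)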

1

Solved

I am using SparkR:::map and my function returns a large-ish R dataframe for each input row, each of the same shape. I would like to write these dataframes as parquet files without 'collect'ing them...
Petrel asked 27/11, 2015 at 16:4
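
The DataFrame side of that goal is straightforward; what the question leaves open is bridging an RDD of R data.frames back to a Spark DataFrame. The write step alone, sketched with a hypothetical output path:

    # Writes happen on the cluster; nothing is collected to the driver
    write.df(df, path = "/data/out.parquet", source = "parquet", mode = "overwrite")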

0

My collected data size is 1.3g and all the driver-memory configurations are set to 3g. Why is the out-of-memory error still happening? Here is my detailed configuration of SparkR and the OOM excepti...
Eavesdrop asked 2/11, 2015 at 10:15
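
One frequently cited cause: the driver JVM is launched before R-side settings apply, so spark.driver.memory passed to sparkR.init has no effect. A sketch of the known workaround of passing it through the submit arguments instead:

    # Must run before library(SparkR) starts the backend
    Sys.setenv("SPARKR_SUBMIT_ARGS" = "--driver-memory 4g sparkr-shell")
    library(SparkR)
    sc <- sparkR.init(master = "local")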

1

I wanted to be able to package DataFrames in a Scala jar file and access them in R. The end goal is to create a way to access specific and often-used database tables in Python, R, and Scala without...
Aftmost asked 23/10, 2015 at 20:55
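
One sketch of the R side, assuming the Scala job has persisted each DataFrame with saveAsTable into a Hive metastore that both sessions can reach; shared_tbl is a hypothetical table name:

    # A Hive-enabled context is needed to see metastore tables
    sqlContext <- sparkRHive.init(sc)
    df <- table(sqlContext, "shared_tbl")
    head(df)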

1

I checked for the SparkR package in the CRAN package list via the following link. https://cran.r-project.org/web/packages/available_packages_by_date.html This list does not include SparkR, and ...
Priestly asked 16/9, 2015 at 6:32

2

Solved

I have a SparkSQL DataFrame. Some entries in this data are empty, but they don't behave like NULL or NA. How could I remove them? Any ideas? In R I can easily remove them, but in SparkR it says tha...
Fetterlock asked 23/7, 2015 at 21:46
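
Since empty strings are neither NULL nor NA, dropna-style helpers won't touch them; filtering on the empty string directly is the usual fix. A sketch with a hypothetical column name:

    # Keep only rows where the column is non-empty
    cleaned <- filter(df, df$col != "")
    head(cleaned)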

1

Solved

With SparkR, I'm trying, for a PoC, to collect an RDD that I created from text files and that contains around 4M lines. My Spark cluster is running in Google Cloud, is bdutil-deployed, and is composed w...
Mcnew asked 4/6, 2015 at 13:45
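
Collecting 4M lines onto the driver is exactly where such jobs tend to fall over; one sketch for the 1.4-era private RDD API is to inspect a slice with take instead of a full collect (the path is hypothetical):

    lines <- SparkR:::textFile(sc, "gs://my-bucket/input/*.txt")
    sample_rows <- SparkR:::take(lines, 1000)  # fetch only the first 1000 lines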
