SparkR Questions

3

Solved

I have a SparkR DataFrame and I want to get the mode (most frequent) value for each unique name. How can I do this? There doesn't seem to be a built-in mode function. Either a SparkR or PySpark soluti...
Evocative asked 28/6, 2017 at 15:25
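A runnable Spark example isn't possible in a listing like this, but the grouped-mode semantics the question asks about can be sketched in plain Python on toy data (the column names `name` and `value` and all the sample rows are made up for illustration):

```python
from collections import Counter, defaultdict

# Toy rows standing in for the DataFrame: (name, value) pairs.
rows = [
    ("Thomas", 1), ("Thomas", 1), ("Thomas", 2),
    ("William", 5), ("William", 7), ("William", 7),
]

# Group the values by name, then take the most common value per group.
groups = defaultdict(list)
for name, value in rows:
    groups[name].append(value)

mode_per_name = {name: Counter(vals).most_common(1)[0][0]
                 for name, vals in groups.items()}
print(mode_per_name)  # {'Thomas': 1, 'William': 7}
```

In Spark itself the same idea is usually expressed as a group-by count followed by a per-group max on the count.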

6

Solved

How can I sum multiple columns in Spark? For example, in SparkR the following code works to get the sum of one column, but if I try to get the sum of both columns in df, I get an error. # Create ...
Abrogate asked 12/6, 2017 at 14:35
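The intended result (per-column sums combined into one total) can be sketched in plain Python; the column names below are assumptions, not the asker's actual data:

```python
# Toy table standing in for df: two hypothetical numeric columns.
df = {"Petal_Length": [1.4, 1.3, 1.5], "Petal_Width": [0.2, 0.2, 0.4]}

# Sum each column separately, then add the per-column sums together.
col_sums = {col: sum(vals) for col, vals in df.items()}
total = sum(col_sums.values())
print(col_sums, total)
```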

10

Solved

I'd like to process Apache Parquet files (in my case, generated in Spark) in the R programming language. Is an R reader available? Or is work being done on one? If not, what would be the most ex...
Frausto asked 22/5, 2015 at 17:5

2

Solved

I'm working with SparkR 1.6 and I have a DataFrame of millions of rows. One of the df's columns, named "categories", contains strings that have the following pattern: categories 1 cat1,cat2,cat...
Naughty asked 10/3, 2016 at 14:26
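The question is truncated, so the desired output is not fully known; one plausible transform (exploding each comma-separated string into one row per tag) can be sketched in plain Python on made-up data:

```python
# Toy "categories" column: comma-separated tags, one string per row.
categories = ["cat1,cat2,cat3", "cat1", "cat2,cat3"]

# Explode each string into one (row_id, category) pair per tag.
exploded = [(i, tag) for i, s in enumerate(categories, start=1)
            for tag in s.split(",")]
print(exploded)
```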

4

Solved

I have a 10 GB CSV file in a Hadoop cluster with duplicate columns. I am trying to analyse it in SparkR, so I use the spark-csv package to parse it as a DataFrame: df <- read.df( sqlContext, FILE_PATH, sou...
Bleeder asked 19/11, 2015 at 23:45
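One common workaround for duplicate column names is to make the header unique before handing the file to the reader. A plain-Python sketch of that renaming step, on a hypothetical header:

```python
# Hypothetical header with a duplicate column name.
header = ["id", "name", "value", "name"]

# Make the names unique by suffixing repeats with a counter.
seen = {}
unique = []
for col in header:
    n = seen.get(col, 0)
    unique.append(col if n == 0 else f"{col}_{n}")
    seen[col] = n + 1
print(unique)  # ['id', 'name', 'value', 'name_1']
```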

3

Solved

I have a Spark DataFrame as shown below: #Create DataFrame df <- data.frame(name = c("Thomas", "William", "Bill", "John"), dates = c('2017-01-05', '2017-02-23', '2017-03-16', '2017-04-08')) ...
Glisten asked 21/6, 2017 at 21:38

2

I am currently trying to implement some functions using SparkR version 1.5.1. I have seen older (version 1.3) examples where people used the apply function on DataFrames, but it looks like this is...
Fleisig asked 22/10, 2015 at 16:32

2

I am trying to read and write data into files at each time step. To do this, I am using the package h5 to store large datasets but I find that my code using the functions of this package is runni...
Mocambique asked 9/8, 2019 at 23:48

1

As far as I understand, these two packages provide similar but mostly distinct wrapper functions for Apache Spark. sparklyr is newer and its functionality is still growing. I theref...
Quietude asked 13/11, 2016 at 19:2

1

Is there a function similar to melt in the SparkR library? That is, can I transform data with 1 row and 50 columns into 50 rows and 3 columns?
Superabundant asked 12/10, 2018 at 15:19
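The reshape being asked for can be sketched in plain Python: one wide row becomes one (id, variable, value) triple per original column. The column names `x0..x49` and the id are made up:

```python
# One "wide" row: 50 hypothetical columns x0..x49.
wide = {f"x{i}": float(i) for i in range(50)}
row_id = 1

# Melt to long form: one (id, variable, value) triple per column.
long_rows = [(row_id, col, val) for col, val in wide.items()]
print(len(long_rows), long_rows[0])  # 50 (1, 'x0', 0.0)
```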

4

Solved

I have the latest version of R (3.2.1). Now I want to install SparkR for R. After I execute: > install.packages("SparkR") I got back: Installing package into ‘/home/user/R/x86_64-pc-linux-gnu-l...
Sedan asked 2/7, 2015 at 12:38

1

The zeppelin R interpreter documentation states: If you return a data.frame, Zeppelin will attempt to display it using Zeppelin's built-in visualizations. This can be seen in the documentation e...
Suppositive asked 5/8, 2016 at 0:14

2

Solved

I would like to add a column filled with the character N to a DataFrame in SparkR. With non-SparkR code I would do it like this: df$new_column <- "N" But with SparkR, I get the following error...
Ellenaellender asked 19/5, 2016 at 15:22

1

I have an RStudio driver instance which is connected to a Spark Cluster. I wanted to know if there is any way to actually connect to Spark cluster from RStudio using an external configuration file ...
Simonsen asked 12/4, 2018 at 20:14

0

Context: I'm working on an Azure HDInsight R Server cluster with RStudio and the SparkR package. I'm reading a file, modifying it, and then I want to write it with write.df, but the problem is that when I write...
Upshot asked 8/3, 2018 at 13:42

2

I am using a Dockerized image and Jupyter notebook along with SparkR kernel. When I create a SparkR notebook, it uses an install of Microsoft R (3.3.2) instead of vanilla CRAN R install (3.2.3). Th...
Expulsion asked 18/9, 2017 at 18:33

2

Solved

I have implemented machine learning algorithms through SageMaker. I have installed the SDK for .NET and tried executing the code below. Uri sagemakerEndPointURI = new Uri("https://runtime.sagemaker.u...

1

Solved

In my R script, I have a SparkDataFrame of two columns (time, value) containing data for four different months. Because I need to apply my function to each month separately, I f...
Virgenvirgie asked 26/1, 2018 at 15:43
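The per-month split-apply step can be sketched in plain Python on made-up (time, value) rows; this mirrors what a grouped apply (e.g. SparkR's gapply in 2.x) does per key, with the mean standing in for the asker's function:

```python
from itertools import groupby
from datetime import date

# Toy (time, value) rows spanning several months; the data is made up.
rows = [(date(2018, 1, 5), 10.0), (date(2018, 1, 20), 14.0),
        (date(2018, 2, 3), 7.0), (date(2018, 3, 11), 9.0)]

# Sort by (year, month), then apply a function to each month's group.
rows.sort(key=lambda r: (r[0].year, r[0].month))
monthly = {}
for month, grp in groupby(rows, key=lambda r: (r[0].year, r[0].month)):
    vals = [v for _, v in grp]
    monthly[month] = sum(vals) / len(vals)
print(monthly)
```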

1

Solved

Situation: I used to work in RStudio with data.table instead of plyr or sqldf because it's really fast. Now I'm working with SparkR on an Azure cluster, and I'd like to know if I can use data.table on...
Externalization asked 9/11, 2017 at 12:35

1

Solved

With SparkR 1.6.0 I can read from a JDBC source with the following code, jdbc_url <- "jdbc:mysql://localhost:3306/dashboard?user=<username>&password=<password>" df <- sqlCon...
Sofiasofie asked 16/8, 2017 at 14:21

3

I am new to Spark and was trying out a few commands in Spark SQL using Python when I came across these two commands: createOrReplaceTempView() and registerTempTable(). What is the difference betw...
Selfcentered asked 17/7, 2017 at 13:41

1

Solved

Using either PySpark or SparkR (preferably both), how can I get the intersection of two DataFrame columns? For example, in SparkR I have the following DataFrames: newHires <- data.frame(name = ...
Fiat asked 24/5, 2017 at 21:0
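The set semantics of the requested operation can be sketched in plain Python; the two name lists below are hypothetical stand-ins for the DataFrame columns:

```python
# Hypothetical name columns from the two toy DataFrames.
newHires = ["Thomas", "George", "Bill", "John"]
salesTeam = ["Thomas", "Bill", "Sarah"]

# The intersection of the two columns, order-independent.
common = sorted(set(newHires) & set(salesTeam))
print(common)  # ['Bill', 'Thomas']
```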

1

Solved

I have a Spark table: simx x0: num 1.00 2.00 3.00 ... x1: num 2.00 3.00 4.00 ... ... x788: num 2.00 3.00 4.00 ... and a handle named simX_tbl in the R environment that is connected to this simx ...
Joris asked 25/4, 2017 at 14:56

1

Solved

Is it possible to list which Spark packages have been added to the Spark session? The class org.apache.spark.deploy.SparkSubmitArguments has a variable for the packages: var packages: String = null...
Renowned asked 16/2, 2017 at 16:33

3

Solved

I am using RStudio. After creating a session, if I try to create a DataFrame using R data, it gives an error. Sys.setenv(SPARK_HOME = "E:/spark-2.0.0-bin-hadoop2.7/spark-2.0.0-bin-hadoop2.7") Sys.setenv(H...
Dagny asked 10/8, 2016 at 1:47

© 2022 - 2024 — McMap. All rights reserved.