SparkR Questions

3

Solved

I have a SparkR DataFrame and I want to get the mode (most frequent) value for each unique name. How can I do this? There doesn't seem to be a built-in mode function. Either a SparkR or PySpark soluti...
Evocative asked 28/6, 2017 at 15:25
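A runnable Spark example isn't possible in a listing like this, but the grouped-mode semantics the question asks about can be sketched in plain Python on toy data (the column names `name` and `value` and all the sample rows are made up for illustration):

```python
from collections import Counter, defaultdict

# Toy rows standing in for the DataFrame: (name, value) pairs.
rows = [
    ("Thomas", 1), ("Thomas", 1), ("Thomas", 2),
    ("William", 5), ("William", 7), ("William", 7),
]

# Group the values by name, then take the most common value per group.
groups = defaultdict(list)
for name, value in rows:
    groups[name].append(value)

mode_per_name = {name: Counter(vals).most_common(1)[0][0]
                 for name, vals in groups.items()}
print(mode_per_name)  # {'Thomas': 1, 'William': 7}
```

In Spark itself the same idea is usually expressed as a group-by count followed by a per-group max on the count.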

6

Solved

How can I sum multiple columns in Spark? For example, in SparkR the following code works to get the sum of one column, but if I try to get the sum of both columns in df, I get an error. # Create ...
Abrogate asked 12/6, 2017 at 14:35
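The intended result (per-column sums combined into one total) can be sketched in plain Python; the column names below are assumptions, not the asker's actual data:

```python
# Toy table standing in for df: two hypothetical numeric columns.
df = {"Petal_Length": [1.4, 1.3, 1.5], "Petal_Width": [0.2, 0.2, 0.4]}

# Sum each column separately, then add the per-column sums together.
col_sums = {col: sum(vals) for col, vals in df.items()}
total = sum(col_sums.values())
print(col_sums, total)
```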

10

Solved

I'd like to process Apache Parquet files (in my case, generated in Spark) in the R programming language. Is an R reader available? Or is work being done on one? If not, what would be the most ex...
Frausto asked 22/5, 2015 at 17:5

2

Solved

I'm working with SparkR 1.6 and I have a DataFrame of millions of rows. One of the df's columns, named "categories", contains strings that have the following pattern: categories 1 cat1,cat2,cat...
Naughty asked 10/3, 2016 at 14:26
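The question is truncated, so the desired output is not fully known; one plausible transform (exploding each comma-separated string into one row per tag) can be sketched in plain Python on made-up data:

```python
# Toy "categories" column: comma-separated tags, one string per row.
categories = ["cat1,cat2,cat3", "cat1", "cat2,cat3"]

# Explode each string into one (row_id, category) pair per tag.
exploded = [(i, tag) for i, s in enumerate(categories, start=1)
            for tag in s.split(",")]
print(exploded)
```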

4

Solved

I have a 10 GB CSV file in a Hadoop cluster with duplicate columns. I am trying to analyse it in SparkR, so I use the spark-csv package to parse it as a DataFrame: df <- read.df( sqlContext, FILE_PATH, sou...
Bleeder asked 19/11, 2015 at 23:45
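One common workaround for duplicate column names is to make the header unique before handing the file to the reader. A plain-Python sketch of that renaming step, on a hypothetical header:

```python
# Hypothetical header with a duplicate column name.
header = ["id", "name", "value", "name"]

# Make the names unique by suffixing repeats with a counter.
seen = {}
unique = []
for col in header:
    n = seen.get(col, 0)
    unique.append(col if n == 0 else f"{col}_{n}")
    seen[col] = n + 1
print(unique)  # ['id', 'name', 'value', 'name_1']
```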

3

Solved

I have a Spark DataFrame as shown below: #Create DataFrame df <- data.frame(name = c("Thomas", "William", "Bill", "John"), dates = c('2017-01-05', '2017-02-23', '2017-03-16', '2017-04-08')) ...
Glisten asked 21/6, 2017 at 21:38

2

I am currently trying to implement some functions using SparkR version 1.5.1. I have seen older (version 1.3) examples where people used the apply function on DataFrames, but it looks like this is...
Fleisig asked 22/10, 2015 at 16:32

2

I am trying to read and write data into files at each time step. To do this, I am using the package h5 to store large datasets but I find that my code using the functions of this package is runni...
Mocambique asked 9/8, 2019 at 23:48

1

As far as I understand, these two packages provide similar but mostly distinct wrapper functions for Apache Spark. sparklyr is newer and its functionality is still growing. I theref...
Quietude asked 13/11, 2016 at 19:2

1

Is there a function similar to melt in the SparkR library? That is, can I transform data with 1 row and 50 columns into 50 rows and 3 columns?
Superabundant asked 12/10, 2018 at 15:19
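The reshape being asked for can be sketched in plain Python: one wide row becomes one (id, variable, value) triple per original column. The column names `x0..x49` and the id are made up:

```python
# One "wide" row: 50 hypothetical columns x0..x49.
wide = {f"x{i}": float(i) for i in range(50)}
row_id = 1

# Melt to long form: one (id, variable, value) triple per column.
long_rows = [(row_id, col, val) for col, val in wide.items()]
print(len(long_rows), long_rows[0])  # 50 (1, 'x0', 0.0)
```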

4

Solved

I have the latest version of R (3.2.1). Now I want to install SparkR for R. After I execute: > install.packages("SparkR") I got back: Installing package into ‘/home/user/R/x86_64-pc-linux-gnu-l...
Sedan asked 2/7, 2015 at 12:38

1

The zeppelin R interpreter documentation states: If you return a data.frame, Zeppelin will attempt to display it using Zeppelin's built-in visualizations. This can be seen in the documentation e...
Suppositive asked 5/8, 2016 at 0:14

2

Solved

I would like to add a column filled with the character N to a DataFrame in SparkR. With non-SparkR code I would do it like this: df$new_column <- "N" But with SparkR, I get the following error...
Ellenaellender asked 19/5, 2016 at 15:22

1

I have an RStudio driver instance which is connected to a Spark Cluster. I wanted to know if there is any way to actually connect to Spark cluster from RStudio using an external configuration file ...
Simonsen asked 12/4, 2018 at 20:14

0

Context: I'm working on an Azure HDInsight R Server cluster with RStudio and the SparkR package. I'm reading a file, modifying it, and then I want to write it with write.df, but the problem is that when I write...
Upshot asked 8/3, 2018 at 13:42

2

I am using a Dockerized image and Jupyter notebook along with SparkR kernel. When I create a SparkR notebook, it uses an install of Microsoft R (3.3.2) instead of vanilla CRAN R install (3.2.3). Th...
Expulsion asked 18/9, 2017 at 18:33

2

Solved

I have implemented machine learning algorithms through SageMaker. I have installed the SDK for .NET and tried executing the code below. Uri sagemakerEndPointURI = new Uri("https://runtime.sagemaker.u...

1

Solved

In my R script, I have a SparkDataFrame of two columns (time, value) containing data for four different months. Because I need to apply my function to each month separately, I f...
Virgenvirgie asked 26/1, 2018 at 15:43
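The per-month split-apply step can be sketched in plain Python on made-up (time, value) rows; this mirrors what a grouped apply (e.g. SparkR's gapply in 2.x) does per key, with the mean standing in for the asker's function:

```python
from itertools import groupby
from datetime import date

# Toy (time, value) rows spanning several months; the data is made up.
rows = [(date(2018, 1, 5), 10.0), (date(2018, 1, 20), 14.0),
        (date(2018, 2, 3), 7.0), (date(2018, 3, 11), 9.0)]

# Sort by (year, month), then apply a function to each month's group.
rows.sort(key=lambda r: (r[0].year, r[0].month))
monthly = {}
for month, grp in groupby(rows, key=lambda r: (r[0].year, r[0].month)):
    vals = [v for _, v in grp]
    monthly[month] = sum(vals) / len(vals)
print(monthly)
```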

1

Solved

Situation: I used to work in RStudio with data.table instead of plyr or sqldf because it's really fast. Now I'm working with SparkR on an Azure cluster, and I'd like to know if I can use data.table on...
Externalization asked 9/11, 2017 at 12:35

1

Solved

With SparkR 1.6.0 I can read from a JDBC source with the following code, jdbc_url <- "jdbc:mysql://localhost:3306/dashboard?user=<username>&password=<password>" df <- sqlCon...
Sofiasofie asked 16/8, 2017 at 14:21

3

I am new to Spark and was trying out a few commands in Spark SQL using Python when I came across these two commands: createOrReplaceTempView() and registerTempTable(). What is the difference betw...
Selfcentered asked 17/7, 2017 at 13:41

1

Solved

Using either PySpark or SparkR (preferably both), how can I get the intersection of two DataFrame columns? For example, in SparkR I have the following DataFrames: newHires <- data.frame(name = ...
Fiat asked 24/5, 2017 at 21:0
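The set semantics of the requested operation can be sketched in plain Python; the two name lists below are hypothetical stand-ins for the DataFrame columns:

```python
# Hypothetical name columns from the two toy DataFrames.
newHires = ["Thomas", "George", "Bill", "John"]
salesTeam = ["Thomas", "Bill", "Sarah"]

# The intersection of the two columns, order-independent.
common = sorted(set(newHires) & set(salesTeam))
print(common)  # ['Bill', 'Thomas']
```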

1

Solved

I have a Spark table: simx x0: num 1.00 2.00 3.00 ... x1: num 2.00 3.00 4.00 ... ... x788: num 2.00 3.00 4.00 ... and a handle named simX_tbl in the R environment that is connected to this simx ...
Joris asked 25/4, 2017 at 14:56

1

Solved

Is it possible to list which Spark packages have been added to the Spark session? The class org.apache.spark.deploy.SparkSubmitArguments has a variable for the packages: var packages: String = null...
Renowned asked 16/2, 2017 at 16:33

3

Solved

I am using RStudio. After creating a session, if I try to create a DataFrame using R data, it gives an error. Sys.setenv(SPARK_HOME = "E:/spark-2.0.0-bin-hadoop2.7/spark-2.0.0-bin-hadoop2.7") Sys.setenv(H...
Dagny asked 10/8, 2016 at 1:47

© 2022 - 2024 — McMap. All rights reserved.