sparklyr Questions

1

Solved

I have a dataframe in Spark, and would like to calculate the 0.1 quantile after grouping by a specific column. For example: > library(sparklyr) > library(tidyverse) > con = spark_connect...
Fishbein asked 12/2, 2018 at 12:32
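A minimal sketch of one common approach to grouped quantiles: Spark SQL's `percentile_approx()` aggregate is left untranslated by sparklyr, so it runs on the cluster. The connection, dataset, and column names below are placeholders, not taken from the question.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
cars_tbl <- copy_to(sc, mtcars, "mtcars_spark", overwrite = TRUE)

# percentile_approx() is a Spark SQL aggregate that sparklyr passes
# through untranslated; here it estimates the 0.1 quantile of mpg
# within each cyl group.
cars_tbl %>%
  group_by(cyl) %>%
  summarise(q10 = percentile_approx(mpg, 0.1))
```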

3

Solved

Is there a way to replicate the rows of a Spark's dataframe using the functions of sparklyr/dplyr? sc <- spark_connect(master = "spark://####:7077") df_tbl <- copy_to(sc, data.frame(row1 = ...
Turku asked 13/6, 2017 at 20:5
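One way to replicate rows without collecting to R is a cross join against a small index table via a constant key, all expressed in dplyr so it stays on the cluster. The helper names (`reps_tbl`, `rep_id`, `join_key`) are illustrative; `sc` and `df_tbl` are assumed from the question.

```r
# Replicate each row of df_tbl n times by cross-joining with a small
# table of replication indices (constant-key join = cross join).
n <- 3
reps_tbl <- copy_to(sc, data.frame(rep_id = seq_len(n)), "reps", overwrite = TRUE)

df_tbl %>%
  mutate(join_key = 1L) %>%
  inner_join(reps_tbl %>% mutate(join_key = 1L), by = "join_key") %>%
  select(-join_key, -rep_id)
```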

1

I'm trying to convert a Spark dataframe (org.apache.spark.sql.DataFrame) to a sparklyr table (tbl_spark). I tried sdf_register, but it failed with the following error. Here, df is a Spark dataframe. ...
Italy asked 16/1, 2018 at 19:48

1

Solved

Sparklyr fails when using a case_when with external variables. Working Example: test <- copy_to(sc, tibble(column = c(1,2,3,4))) test %>% mutate(group = case_when( column %in% c(1,2) ~ 'g...
Cudlip asked 10/10, 2017 at 23:19

1

Solved

I'm trying to read a ~2 GB .csv (~5 million lines) in sparklyr with: bigcsvspark <- spark_read_csv(sc, "bigtxt", "path", delimiter = "!", infer_schema = FALSE, memory = TRUE, overwrite = TRUE, ...
Geriatric asked 13/10, 2017 at 19:01

2

I am reading a CSV into Spark using sparklyr: schema <- structType(structField("TransTime", "array<timestamp>", TRUE), structField("TransDay", "Date", TRUE)) spark_read_csv(sc, filen...
Flak asked 24/3, 2017 at 15:17
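A sketch of how the same schema is usually declared in sparklyr itself: `spark_read_csv()` takes a named character vector of column types via its `columns` argument (together with `infer_schema = FALSE`) rather than a SparkR-style `structType`. The path and table name here are placeholders.

```r
# Explicit column types in sparklyr: a named vector passed to
# `columns`, with schema inference turned off.
trans_tbl <- spark_read_csv(
  sc,
  name         = "trans",
  path         = "path/to/file.csv",
  infer_schema = FALSE,
  columns      = c(TransTime = "timestamp", TransDay = "date")
)
```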

2

Solved

I'd like to remove a single data table from the Spark context ('sc'). I know a single cached table can be un-cached, but this isn't the same as removing an object from the sc -- as far as I can g...
Kong asked 7/12, 2016 at 18:49
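A hedged sketch of one commonly suggested approach: uncache the table, then drop it from the catalog with dplyr's `db_drop_table()` generic (which dispatches on the sparklyr connection in the releases of that era). The table name "flights" is illustrative.

```r
# Uncache the table, then drop it from the Spark catalog entirely;
# any R-side tbl handles pointing at it should then be discarded.
tbl_uncache(sc, "flights")
dplyr::db_drop_table(sc, "flights")
```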

1

I've brought a table into Hue which has a column of dates, and I'm trying to play with it using sparklyr in RStudio. I'd like to convert a character column into a date column like so: Weather_data =...
Redfaced asked 27/9, 2017 at 17:13
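A minimal sketch of the usual conversion: Hive's `to_date()` passes through sparklyr's SQL translation and expects the character column in "yyyy-MM-dd" form. The column name below is a placeholder.

```r
# to_date() is a Hive/Spark SQL function that sparklyr leaves
# untranslated; it parses "yyyy-MM-dd" strings into a date column.
Weather_data %>%
  mutate(obs_date = to_date(date_chr))
```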

5

I am getting a java.io.IOException: No space left on device error after running a simple query in sparklyr. I am using the latest versions of both Spark (2.1.1) and sparklyr. df_new <- spark_read_pa...
Gnosticize asked 3/7, 2017 at 14:32
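This exception usually means Spark's local scratch directory has filled up with shuffle spill files. A sketch of one fix, pointing `spark.local.dir` at a volume with free space before connecting; the path is a placeholder.

```r
# Redirect Spark's shuffle/spill scratch space to a larger volume.
library(sparklyr)
config <- spark_config()
config[["spark.local.dir"]] <- "/mnt/big_volume/spark-tmp"  # placeholder path
sc <- spark_connect(master = "local", config = config)
```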

1

Solved

I'm new to sparklyr (but familiar with Spark and PySpark), and I've got a really basic question. I'm trying to filter a column based on a partial match. In dplyr, I'd write my operation like so: bus...
Eryneryngo asked 18/9, 2017 at 23:19
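A sketch of partial matching on a Spark table: `like()` and `rlike()` are Spark SQL predicates that sparklyr passes through untranslated. The table and column names below are placeholders.

```r
# SQL wildcard match (% = any characters):
busses %>% filter(like(route_name, "%EXPRESS%"))

# Java regular-expression match:
busses %>% filter(rlike(route_name, "EXPRESS$"))
```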

2

I want to estimate rolling value-at-risk for a dataset of about 22.5 million observations, so I want to use sparklyr for fast computation. Here is what I did (using a sample database): library(P...
Ziwot asked 3/9, 2017 at 14:10

2

I know there are plenty of questions on SO about out of memory errors on Spark but I haven't found a solution to mine. I have a simple workflow: read in ORC files from Amazon S3 filter down to ...
Wales asked 25/8, 2017 at 1:35

1

Solved

By default, spark_read_jdbc() reads an entire database table into Spark. I've used the following syntax to create these connections. library(sparklyr) library(dplyr) config <- spark_config() c...
Mccool asked 31/7, 2017 at 16:26
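A common way to avoid pulling the whole table over JDBC: `dbtable` accepts a parenthesized subquery instead of a table name, so only the filtered rows reach Spark. The URL, driver, credentials, and column names below are placeholders.

```r
# Push the filter down to the source database via a subquery in dbtable.
subset_tbl <- spark_read_jdbc(
  sc,
  name = "subset_tbl",
  options = list(
    url      = "jdbc:mysql://db.host:3306/mydb",
    driver   = "com.mysql.jdbc.Driver",
    user     = "user",
    password = "pass",
    dbtable  = "(SELECT id, amount FROM big_table WHERE amount > 100) AS t"
  )
)
```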

1

Solved

Looking to convert some R code, such as lmtest::coeftest() and sandwich::sandwich(), to sparklyr. Trying to get started with sparklyr extensions, but I'm pretty new to the Spark API and having ...
Sabelle asked 17/6, 2017 at 6:52

1

Solved

I am trying to use the group_by() and mutate() functions in sparklyr to concatenate rows in a group. Here is a simple example that I think should work but doesn't: library(sparklyr) d <- data....
Rivalry asked 6/6, 2017 at 21:15
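A sketch of how row concatenation within groups is usually done on a Spark table: `collect_list()` gathers a group's values into an array and `concat_ws()` joins them, both passing through sparklyr's SQL translation untouched. Table and column names are placeholders.

```r
# Concatenate txt values per group into one comma-separated string,
# entirely on the Spark side.
d_tbl %>%
  group_by(grp) %>%
  summarise(combined = concat_ws(", ", collect_list(txt)))
```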

1

Solved

I am trying to use the sdf_pivot() function in sparklyr to "gather" a long format data frame into a wide format. The values of the variables are strings that I would like to concatenate. Here is a...
Mattins asked 19/5, 2017 at 19:40

1

Solved

If I connect to a Spark cluster, copy some data to it, and disconnect, ... library(dplyr) library(sparklyr) sc <- spark_connect("local") copy_to(sc, iris) src_tbls(sc) ## [1] "iris" spark_disco...
Verniavernice asked 23/2, 2017 at 13:40

1

Solved

I have a Spark table: simx x0: num 1.00 2.00 3.00 ... x1: num 2.00 3.00 4.00 ... ... x788: num 2.00 3.00 4.00 ... and a handle named simX_tbl in the R environment that is connected to this simx ...
Joris asked 25/4, 2017 at 14:56

1

Solved

I am very new to the Big Data technologies I am attempting to work with, but have so far managed to set up sparklyr in RStudio to connect to a standalone Spark cluster. Data is stored in Cassandra,...
Protozoal asked 6/3, 2017 at 12:12

4

Solved

Is the sparklyr R package able to connect to YARN-managed hadoop clusters? This doesn't seem to be documented in the cluster deployment documentation. Using the SparkR package that ships with Spark...
Merous asked 29/6, 2016 at 14:42
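A minimal sketch of connecting sparklyr to a YARN-managed cluster: point it at a local Spark installation and the Hadoop configuration directory, then use a YARN master. Both paths below are placeholders for the cluster's actual locations.

```r
# sparklyr submits to YARN when SPARK_HOME and the Hadoop config
# directory are visible to the R session.
library(sparklyr)
Sys.setenv(
  SPARK_HOME      = "/usr/lib/spark",      # placeholder
  HADOOP_CONF_DIR = "/etc/hadoop/conf"     # placeholder
)
sc <- spark_connect(master = "yarn-client")
```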

1

Solved

I'm trying to create a model matrix in sparklyr. There is a function ml_create_dummy_variables() for creating dummy variables for one categorical variable at a time. As far as I can tell there is n...
Flatfoot asked 9/3, 2017 at 17:23
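A sketch of building dummy variables for several categorical columns by chaining sparklyr's feature transformers: `ft_string_indexer()` maps each string column to numeric indices, and `ft_one_hot_encoder()` expands those into indicator vectors. Column names are placeholders, and the `input_col`/`output_col` argument names follow recent sparklyr releases.

```r
# Index then one-hot encode two hypothetical categorical columns.
df_tbl %>%
  ft_string_indexer(input_col = "colour", output_col = "colour_idx") %>%
  ft_one_hot_encoder(input_col = "colour_idx", output_col = "colour_vec") %>%
  ft_string_indexer(input_col = "size", output_col = "size_idx") %>%
  ft_one_hot_encoder(input_col = "size_idx", output_col = "size_vec")
```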

1

Solved

I've been working with sparklyr to bring large Cassandra tables into Spark, register these with R, and conduct dplyr operations on them. I have been successfully importing Cassandra tables with the...
Alcestis asked 2/3, 2017 at 15:07

0

In R I have a spark connection and a DataFrame as ddf. library(sparklyr) library(tidyverse) sc <- spark_connect(master = "foo", version = "2.0.2") ddf <- spark_read_parquet(sc, name='test', ...
Iives asked 20/2, 2017 at 9:38

1

Solved

I am pretty new to Spark and am currently using it through the R API via the sparklyr package. I created a Spark data frame from a Hive query. The data types are not specified correctly in the source ta...
Hooge asked 21/12, 2016 at 2:22
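A sketch of correcting column types after the fact: the usual R coercion functions inside `mutate()` translate to SQL `CAST`, so the conversion happens on the cluster. Column names below are placeholders.

```r
# Cast mistyped columns in place; as.numeric/as.integer become
# CAST(... AS DOUBLE) / CAST(... AS INT) in the generated SQL.
df_tbl %>%
  mutate(
    amount   = as.numeric(amount),
    quantity = as.integer(quantity)
  )
```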

1

Solved

In base R, it is easy to extract the names of columns (variables) from a data frame: > testdf <- data.frame(a1 = rnorm(1e5), a2 = rnorm(1e5), a3 = rnorm(1e5), a4 = rnorm(1e5), a5 = rnorm(1e5)...
Fugacious asked 11/10, 2016 at 13:56
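The base-R idiom carries over: `colnames()` (and `dplyr::tbl_vars()`) work on a tbl_spark handle without collecting any data, reading the column names from the query metadata. The copy below assumes the `testdf` data frame from the question.

```r
# Column names come from the tbl's metadata; no rows are collected.
testdf_tbl <- copy_to(sc, testdf, "testdf", overwrite = TRUE)
colnames(testdf_tbl)
```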

© 2022 - 2024 — McMap. All rights reserved.