sparklyr Questions

1

Solved

I have a dataframe in Spark, and would like to calculate the 0.1 quantile after grouping by a specific column. For example: > library(sparklyr) > library(tidyverse) > con = spark_connect...
Fishbein asked 12/2, 2018 at 12:32
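A minimal sketch of one common approach to grouped quantiles: Spark SQL's `percentile_approx()` aggregate is left untranslated by sparklyr, so it runs on the cluster. The connection, dataset, and column names below are placeholders, not taken from the question.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
cars_tbl <- copy_to(sc, mtcars, "mtcars_spark", overwrite = TRUE)

# percentile_approx() is a Spark SQL aggregate that sparklyr passes
# through untranslated; here it estimates the 0.1 quantile of mpg
# within each cyl group.
cars_tbl %>%
  group_by(cyl) %>%
  summarise(q10 = percentile_approx(mpg, 0.1))
```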

3

Solved

Is there a way to replicate the rows of a Spark's dataframe using the functions of sparklyr/dplyr? sc <- spark_connect(master = "spark://####:7077") df_tbl <- copy_to(sc, data.frame(row1 = ...
Turku asked 13/6, 2017 at 20:5
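One way to replicate rows without collecting to R is a cross join against a small index table via a constant key, all expressed in dplyr so it stays on the cluster. The helper names (`reps_tbl`, `rep_id`, `join_key`) are illustrative; `sc` and `df_tbl` are assumed from the question.

```r
# Replicate each row of df_tbl n times by cross-joining with a small
# table of replication indices (constant-key join = cross join).
n <- 3
reps_tbl <- copy_to(sc, data.frame(rep_id = seq_len(n)), "reps", overwrite = TRUE)

df_tbl %>%
  mutate(join_key = 1L) %>%
  inner_join(reps_tbl %>% mutate(join_key = 1L), by = "join_key") %>%
  select(-join_key, -rep_id)
```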

1

I'm trying to convert a Spark dataframe (org.apache.spark.sql.DataFrame) to a sparklyr table (tbl_spark). I tried sdf_register, but it failed with the following error. Here, df is a Spark dataframe. ...
Italy asked 16/1, 2018 at 19:48

1

Solved

Sparklyr fails when using a case_when with external variables. Working Example: test <- copy_to(sc, tibble(column = c(1,2,3,4))) test %>% mutate(group = case_when( column %in% c(1,2) ~ 'g...
Cudlip asked 10/10, 2017 at 23:19

1

Solved

I'm trying to read a ~2 GB .csv (~5 million lines) in sparklyr with: bigcsvspark <- spark_read_csv(sc, "bigtxt", "path", delimiter = "!", infer_schema = FALSE, memory = TRUE, overwrite = TRUE, ...
Geriatric asked 13/10, 2017 at 19:01

2

I am reading a CSV into Spark using sparklyr: schema <- structType(structField("TransTime", "array<timestamp>", TRUE), structField("TransDay", "Date", TRUE)) spark_read_csv(sc, filen...
Flak asked 24/3, 2017 at 15:17
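A sketch of how the same schema is usually declared in sparklyr itself: `spark_read_csv()` takes a named character vector of column types via its `columns` argument (together with `infer_schema = FALSE`) rather than a SparkR-style `structType`. The path and table name here are placeholders.

```r
# Explicit column types in sparklyr: a named vector passed to
# `columns`, with schema inference turned off.
trans_tbl <- spark_read_csv(
  sc,
  name         = "trans",
  path         = "path/to/file.csv",
  infer_schema = FALSE,
  columns      = c(TransTime = "timestamp", TransDay = "date")
)
```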

2

Solved

I'd like to remove a single data table from the Spark context ('sc'). I know a single cached table can be un-cached, but this isn't the same as removing an object from the sc -- as far as I can g...
Kong asked 7/12, 2016 at 18:49
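A hedged sketch of one commonly suggested approach: uncache the table, then drop it from the catalog with dplyr's `db_drop_table()` generic (which dispatches on the sparklyr connection in the releases of that era). The table name "flights" is illustrative.

```r
# Uncache the table, then drop it from the Spark catalog entirely;
# any R-side tbl handles pointing at it should then be discarded.
tbl_uncache(sc, "flights")
dplyr::db_drop_table(sc, "flights")
```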

1

I've brought a table into Hue which has a column of dates, and I'm trying to play with it using sparklyr in RStudio. I'd like to convert a character column into a date column like so: Weather_data =...
Redfaced asked 27/9, 2017 at 17:13
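A minimal sketch of the usual conversion: Hive's `to_date()` passes through sparklyr's SQL translation and expects the character column in "yyyy-MM-dd" form. The column name below is a placeholder.

```r
# to_date() is a Hive/Spark SQL function that sparklyr leaves
# untranslated; it parses "yyyy-MM-dd" strings into a date column.
Weather_data %>%
  mutate(obs_date = to_date(date_chr))
```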

5

I am getting a java.io.IOException: No space left on device error after running a simple query in sparklyr. I am using the latest versions of both Spark (2.1.1) and sparklyr. df_new <- spark_read_pa...
Gnosticize asked 3/7, 2017 at 14:32
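This exception usually means Spark's local scratch directory has filled up with shuffle spill files. A sketch of one fix, pointing `spark.local.dir` at a volume with free space before connecting; the path is a placeholder.

```r
# Redirect Spark's shuffle/spill scratch space to a larger volume.
library(sparklyr)
config <- spark_config()
config[["spark.local.dir"]] <- "/mnt/big_volume/spark-tmp"  # placeholder path
sc <- spark_connect(master = "local", config = config)
```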

1

Solved

I'm new to sparklyr (but familiar with Spark and PySpark), and I've got a really basic question. I'm trying to filter a column based on a partial match. In dplyr, I'd write my operation like so: bus...
Eryneryngo asked 18/9, 2017 at 23:19
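A sketch of partial matching on a Spark table: `like()` and `rlike()` are Spark SQL predicates that sparklyr passes through untranslated. The table and column names below are placeholders.

```r
# SQL wildcard match (% = any characters):
busses %>% filter(like(route_name, "%EXPRESS%"))

# Java regular-expression match:
busses %>% filter(rlike(route_name, "EXPRESS$"))
```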

2

I want to estimate rolling value-at-risk for a dataset of about 22.5 million observations, so I want to use sparklyr for fast computation. Here is what I did (using a sample database): library(P...
Ziwot asked 3/9, 2017 at 14:10

2

I know there are plenty of questions on SO about out of memory errors on Spark but I haven't found a solution to mine. I have a simple workflow: read in ORC files from Amazon S3 filter down to ...
Wales asked 25/8, 2017 at 1:35

1

Solved

By default, spark_read_jdbc() reads an entire database table into Spark. I've used the following syntax to create these connections. library(sparklyr) library(dplyr) config <- spark_config() c...
Mccool asked 31/7, 2017 at 16:26
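A common way to avoid pulling the whole table over JDBC: `dbtable` accepts a parenthesized subquery instead of a table name, so only the filtered rows reach Spark. The URL, driver, credentials, and column names below are placeholders.

```r
# Push the filter down to the source database via a subquery in dbtable.
subset_tbl <- spark_read_jdbc(
  sc,
  name = "subset_tbl",
  options = list(
    url      = "jdbc:mysql://db.host:3306/mydb",
    driver   = "com.mysql.jdbc.Driver",
    user     = "user",
    password = "pass",
    dbtable  = "(SELECT id, amount FROM big_table WHERE amount > 100) AS t"
  )
)
```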

1

Solved

Looking to convert some R code, such as lmtest::coeftest() and sandwich::sandwich(), to sparklyr. Trying to get started with sparklyr extensions, but I'm pretty new to the Spark API and having ...
Sabelle asked 17/6, 2017 at 6:52

1

Solved

I am trying to use the group_by() and mutate() functions in sparklyr to concatenate rows in a group. Here is a simple example that I think should work but doesn't: library(sparklyr) d <- data....
Rivalry asked 6/6, 2017 at 21:15
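A sketch of how row concatenation within groups is usually done on a Spark table: `collect_list()` gathers a group's values into an array and `concat_ws()` joins them, both passing through sparklyr's SQL translation untouched. Table and column names are placeholders.

```r
# Concatenate txt values per group into one comma-separated string,
# entirely on the Spark side.
d_tbl %>%
  group_by(grp) %>%
  summarise(combined = concat_ws(", ", collect_list(txt)))
```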

1

Solved

I am trying to use the sdf_pivot() function in sparklyr to "gather" a long format data frame into a wide format. The values of the variables are strings that I would like to concatenate. Here is a...
Mattins asked 19/5, 2017 at 19:40

1

Solved

If I connect to a Spark cluster, copy some data to it, and disconnect, ... library(dplyr) library(sparklyr) sc <- spark_connect("local") copy_to(sc, iris) src_tbls(sc) ## [1] "iris" spark_disco...
Verniavernice asked 23/2, 2017 at 13:40

1

Solved

I have a Spark table: simx x0: num 1.00 2.00 3.00 ... x1: num 2.00 3.00 4.00 ... ... x788: num 2.00 3.00 4.00 ... and a handle named simX_tbl in the R environment that is connected to this simx ...
Joris asked 25/4, 2017 at 14:56

1

Solved

I am very new to the Big Data technologies I am attempting to work with, but have so far managed to set up sparklyr in RStudio to connect to a standalone Spark cluster. Data is stored in Cassandra,...
Protozoal asked 6/3, 2017 at 12:12

4

Solved

Is the sparklyr R package able to connect to YARN-managed hadoop clusters? This doesn't seem to be documented in the cluster deployment documentation. Using the SparkR package that ships with Spark...
Merous asked 29/6, 2016 at 14:42
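A minimal sketch of connecting sparklyr to a YARN-managed cluster: point it at a local Spark installation and the Hadoop configuration directory, then use a YARN master. Both paths below are placeholders for the cluster's actual locations.

```r
# sparklyr submits to YARN when SPARK_HOME and the Hadoop config
# directory are visible to the R session.
library(sparklyr)
Sys.setenv(
  SPARK_HOME      = "/usr/lib/spark",      # placeholder
  HADOOP_CONF_DIR = "/etc/hadoop/conf"     # placeholder
)
sc <- spark_connect(master = "yarn-client")
```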

1

Solved

I'm trying to create a model matrix in sparklyr. There is a function ml_create_dummy_variables() for creating dummy variables for one categorical variable at a time. As far as I can tell there is n...
Flatfoot asked 9/3, 2017 at 17:23
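A sketch of building dummy variables for several categorical columns by chaining sparklyr's feature transformers: `ft_string_indexer()` maps each string column to numeric indices, and `ft_one_hot_encoder()` expands those into indicator vectors. Column names are placeholders, and the `input_col`/`output_col` argument names follow recent sparklyr releases.

```r
# Index then one-hot encode two hypothetical categorical columns.
df_tbl %>%
  ft_string_indexer(input_col = "colour", output_col = "colour_idx") %>%
  ft_one_hot_encoder(input_col = "colour_idx", output_col = "colour_vec") %>%
  ft_string_indexer(input_col = "size", output_col = "size_idx") %>%
  ft_one_hot_encoder(input_col = "size_idx", output_col = "size_vec")
```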

1

Solved

I've been working with sparklyr to bring large Cassandra tables into Spark, register these with R, and conduct dplyr operations on them. I have been successfully importing Cassandra tables with the...
Alcestis asked 2/3, 2017 at 15:07

0

In R I have a spark connection and a DataFrame as ddf. library(sparklyr) library(tidyverse) sc <- spark_connect(master = "foo", version = "2.0.2") ddf <- spark_read_parquet(sc, name='test', ...
Iives asked 20/2, 2017 at 9:38

1

Solved

I am pretty new to Spark and am currently using it through the R API via the sparklyr package. I created a Spark data frame from a Hive query. The data types are not specified correctly in the source ta...
Hooge asked 21/12, 2016 at 2:22
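A sketch of correcting column types after the fact: the usual R coercion functions inside `mutate()` translate to SQL `CAST`, so the conversion happens on the cluster. Column names below are placeholders.

```r
# Cast mistyped columns in place; as.numeric/as.integer become
# CAST(... AS DOUBLE) / CAST(... AS INT) in the generated SQL.
df_tbl %>%
  mutate(
    amount   = as.numeric(amount),
    quantity = as.integer(quantity)
  )
```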

1

Solved

In base R, it is easy to extract the names of columns (variables) from a data frame: > testdf <- data.frame(a1 = rnorm(1e5), a2 = rnorm(1e5), a3 = rnorm(1e5), a4 = rnorm(1e5), a5 = rnorm(1e5)...
Fugacious asked 11/10, 2016 at 13:56
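The base-R idiom carries over: `colnames()` (and `dplyr::tbl_vars()`) work on a tbl_spark handle without collecting any data, reading the column names from the query metadata. The copy below assumes the `testdf` data frame from the question.

```r
# Column names come from the tbl's metadata; no rows are collected.
testdf_tbl <- copy_to(sc, testdf, "testdf", overwrite = TRUE)
colnames(testdf_tbl)
```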

© 2022 - 2024 — McMap. All rights reserved.