sparklyr Questions
1
Solved
I have a dataframe in Spark, and would like to calculate the 0.1 quantile after grouping by a specific column.
For example:
> library(sparklyr)
> library(tidyverse)
> con = spark_connect...
Fishbein asked 12/2, 2018 at 12:32
3
Solved
Is there a way to replicate the rows of a Spark dataframe using the functions of sparklyr/dplyr?
sc <- spark_connect(master = "spark://####:7077")
df_tbl <- copy_to(sc, data.frame(row1 = ...
Turku asked 13/6, 2017 at 20:5
1
I'm trying to convert a Spark DataFrame (org.apache.spark.sql.DataFrame) to a sparklyr table (tbl_spark). I tried sdf_register, but it failed with the following error.
Here, df is the Spark DataFrame.
...
Italy asked 16/1, 2018 at 19:48
1
Solved
sparklyr fails when using case_when with external variables.
Working Example:
test <- copy_to(sc, tibble(column = c(1,2,3,4)))
test %>%
mutate(group = case_when(
column %in% c(1,2) ~ 'g...
1
Solved
I'm trying to read a ~2 GB .csv file (5 million lines) in sparklyr with:
bigcsvspark <- spark_read_csv(sc, "bigtxt", "path",
delimiter = "!",
infer_schema = FALSE,
memory = TRUE,
overwrite = TRUE,
...
2
I am reading a CSV file into Spark using sparklyr:
schema <- structType(structField("TransTime", "array<timestamp>", TRUE),
structField("TransDay", "Date", TRUE))
spark_read_csv(sc, filen...
2
Solved
I would like to remove a single data table from the Spark context ('sc'). I know a single cached table can be un-cached, but this isn't the same as removing an object from sc -- as far as I can g...
Kong asked 7/12, 2016 at 18:49
1
I've brought a table into Hue that has a column of dates, and I'm trying to work with it using sparklyr in RStudio.
I'd like to convert a character column into a date column like so:
Weather_data =...
Redfaced asked 27/9, 2017 at 17:13
5
I am getting java.io.IOException: No space left on device after running a simple query in sparklyr. I use the latest versions of both Spark (2.1.1) and sparklyr.
df_new <-spark_read_pa...
Gnosticize asked 3/7, 2017 at 14:32
1
Solved
I'm new to sparklyr (but familiar with Spark and PySpark), and I've got a really basic question. I'm trying to filter a column based on a partial match. In dplyr, I'd write my operation like so:
bus...
Eryneryngo asked 18/9, 2017 at 23:19
2
I want to estimate rolling value-at-risk for a dataset of about 22.5 million observations, so I want to use sparklyr for fast computation. Here is what I did (using a sample database):
library(P...
Ziwot asked 3/9, 2017 at 14:10
2
I know there are plenty of questions on SO about out of memory errors on Spark but I haven't found a solution to mine.
I have a simple workflow:
read in ORC files from Amazon S3
filter down to ...
Wales asked 25/8, 2017 at 1:35
1
Solved
By default, spark_read_jdbc() reads an entire database table into Spark. I've used the following syntax to create these connections.
library(sparklyr)
library(dplyr)
config <- spark_config()
c...
Mccool asked 31/7, 2017 at 16:26
1
Solved
I'm looking to convert some R code to sparklyr, including functions such as lmtest::coeftest() and sandwich::sandwich(). I'm trying to get started with sparklyr extensions but am pretty new to the Spark API and having ...
Sabelle asked 17/6, 2017 at 6:52
1
Solved
I am trying to use the group_by() and mutate() functions in sparklyr to concatenate rows in a group.
Here is a simple example that I think should work but doesn't:
library(sparklyr)
d <- data....
Rivalry asked 6/6, 2017 at 21:15
1
Solved
I am trying to use the sdf_pivot() function in sparklyr to "gather" a long format data frame into a wide format. The values of the variables are strings that I would like to concatenate.
Here is a...
1
Solved
If I connect to a Spark cluster, copy some data to it, and disconnect, ...
library(dplyr)
library(sparklyr)
sc <- spark_connect("local")
copy_to(sc, iris)
src_tbls(sc)
## [1] "iris"
spark_disco...
1
Solved
I have a Spark table:
simx
x0: num 1.00 2.00 3.00 ...
x1: num 2.00 3.00 4.00 ...
...
x788: num 2.00 3.00 4.00 ...
and a handle named simX_tbl in the R environment that is connected to this simx ...
Joris asked 25/4, 2017 at 14:56
1
Solved
I am very new to the Big Data technologies I am attempting to work with, but have so far managed to set up sparklyr in RStudio to connect to a standalone Spark cluster. Data is stored in Cassandra,...
Protozoal asked 6/3, 2017 at 12:12
4
Solved
Is the sparklyr R package able to connect to YARN-managed Hadoop clusters? This doesn't seem to be documented in the cluster deployment documentation. Using the SparkR package that ships with Spark...
Merous asked 29/6, 2016 at 14:42
1
Solved
I'm trying to create a model matrix in sparklyr. There is a function ml_create_dummy_variables() for creating dummy variables for one categorical variable at a time. As far as I can tell there is n...
Flatfoot asked 9/3, 2017 at 17:23
1
Solved
I've been working with sparklyr to bring large Cassandra tables into Spark, register these with R and conduct dplyr operations on them.
I have been successfully importing Cassandra tables with the...
Alcestis asked 2/3, 2017 at 15:7
0
In R, I have a Spark connection and a DataFrame named ddf.
library(sparklyr)
library(tidyverse)
sc <- spark_connect(master = "foo", version = "2.0.2")
ddf <- spark_read_parquet(sc, name='test', ...
Iives asked 20/2, 2017 at 9:38
1
Solved
I am pretty new to Spark and am currently using it via the R API through the sparklyr package. I created a Spark data frame from a Hive query. The data types are not specified correctly in the source ta...
Hooge asked 21/12, 2016 at 2:22
1
Solved
In base R, it is easy to extract the names of columns (variables) from a data frame:
> testdf <- data.frame(a1 = rnorm(1e5), a2 = rnorm(1e5), a3 = rnorm(1e5), a4 = rnorm(1e5), a5 = rnorm(1e5)...
Fugacious asked 11/10, 2016 at 13:56