sparklyr Questions

2

Solved

I am getting heap space errors on even fairly small datasets. I can be sure that I'm not running out of system memory. For example, consider a dataset containing about 20M rows and 9 columns, and t...
Imaginary asked 29/12, 2016 at 17:18

3

Solved

This is my code. I run it in Databricks. library(sparklyr) library(dplyr) library(arrow) sc <- spark_connect(method = "databricks") tbl_change_db(sc, "prod") trip_ids <- ...
Schreiner asked 30/3, 2023 at 13:19

1

Solved

test <- data.frame('prod_id'= c("shoe", "shoe", "shoe", "shoe", "shoe", "shoe", "boat", "boat","boat","boat","boat","boat"), 'seller_id'= c("a", "b", "c", "d", "e", "f", "a","g", "h", "r", "q"...
Pshaw asked 4/12, 2018 at 6:9

4

Solved

I have two tables that I want to do a full join using dplyr, but I don't want it to drop any of the columns. Per the documentation and my own experience it is only keeping the join column for the l...
Never asked 5/5, 2017 at 15:52
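A hedged sketch of one common workaround for the question above: dplyr's `full_join` keeps both copies of the overlapping non-join columns when they are disambiguated with `suffix`, and the same call is translated by sparklyr. The table contents here are hypothetical, and suffix handling in the Spark translation can vary by dplyr/sparklyr version.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")  # assumes a local Spark installation

# Hypothetical tables sharing the join key "id" and a clashing "value" column
left_tbl  <- copy_to(sc, data.frame(id = 1:3, value = c("a", "b", "c")), "left_tbl")
right_tbl <- copy_to(sc, data.frame(id = 2:4, value = c("x", "y", "z")), "right_tbl")

# suffix = renames the clashing columns so neither side is dropped
joined <- full_join(left_tbl, right_tbl, by = "id", suffix = c("_left", "_right"))
collect(joined)
```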

2

Solved

Is there any way to disable Hive support in sparklyr? Just like in SparkR: sparkR.session(master="local[*]", enableHiveSupport=FALSE)
Springy asked 9/1, 2017 at 16:44
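One approach often suggested for this is to set a `spark_config()` entry before connecting. This is a sketch only: the config key below is an assumption that mirrors SparkR's `enableHiveSupport = FALSE` and should be checked against your sparklyr version.

```r
library(sparklyr)

config <- spark_config()
# Assumed key mirroring SparkR's enableHiveSupport = FALSE; verify for your version
config$sparklyr.connect.enablehivesupport <- FALSE

sc <- spark_connect(master = "local[*]", config = config)
```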

1

I am using the library sparklyr to interact with Spark. There are two functions for putting a data frame into a Spark context: 'dplyr::copy_to' and 'sparklyr::sdf_copy_to'. What is t...
Acetophenetidin asked 15/5, 2019 at 11:57
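As a rough illustration (not an authoritative account of the internals): both calls upload a local R data frame into Spark. `dplyr::copy_to` is the dplyr generic for which sparklyr provides the Spark method, while `sdf_copy_to` is sparklyr's own entry point, so in everyday use they behave very similarly.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")  # assumes a local Spark installation

# Both create a Spark DataFrame from the local data frame `iris`
tbl_a <- dplyr::copy_to(sc, iris, name = "iris_a", overwrite = TRUE)
tbl_b <- sparklyr::sdf_copy_to(sc, iris, name = "iris_b", overwrite = TRUE)
```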

2

Solved

Spark 2.0 with Hive. Let's say I am trying to write a Spark dataframe, irisDf, to ORC and save it to the Hive metastore. In Spark I would do that like this: irisDf.write.format("orc") .mode("overw...
Phlegethon asked 16/8, 2018 at 22:42
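A possible sparklyr counterpart to that Scala snippet, as a sketch only: it assumes the connection was created with Hive support, and the exact options accepted by the write functions differ across sparklyr versions.

```r
library(sparklyr)

sc <- spark_connect(master = "local")  # assumes Hive support is available
irisDf <- sdf_copy_to(sc, iris, "iris_df", overwrite = TRUE)

# Write ORC files to a path (placeholder path)
spark_write_orc(irisDf, path = "/tmp/iris_orc", mode = "overwrite")

# Or register the data as a metastore table; the storage format
# then depends on the session's defaults
spark_write_table(irisDf, name = "iris_tbl", mode = "overwrite")
```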

0

Typically when one wants to use sparklyr on a custom function (i.e. non-translated functions) they place them within spark_apply(). However, I've only encountered examples where a single local da...
Fearnought asked 18/3, 2020 at 16:33

3

Solved

After I managed to connect to our (new) cluster using sparklyr with the yarn-client method, I can now see only the tables from the default schema. How can I connect to schema.table? Using DBI it's ...
Hessney asked 5/5, 2017 at 13:35
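Two commonly used routes for reaching a table outside the default schema, sketched with placeholder schema/table names: qualifying the table via dbplyr's `in_schema`, or issuing plain SQL through DBI.

```r
library(sparklyr)
library(dplyr)
library(DBI)

sc <- spark_connect(master = "yarn-client")  # assumes a reachable cluster

# Via dplyr, qualifying the table with its schema (placeholder names)
t1 <- tbl(sc, dbplyr::in_schema("some_schema", "some_table"))

# Via DBI with plain SQL
t2 <- dbGetQuery(sc, "SELECT * FROM some_schema.some_table LIMIT 10")
```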

1

I am trying to create a R package so I can use the Stanford CoreNLP wrapper for Apache Spark (by databricks) from R. I am using the sparklyr package to connect to my local Spark instance. I created...
Mccorkle asked 15/10, 2016 at 22:18

1

As far as I understood, those two packages provide similar but mostly different wrapper functions for Apache Spark. Sparklyr is newer and still needs to grow in the scope of functionality. I theref...
Quietude asked 13/11, 2016 at 19:2

3

Solved

I would like to connect my local desktop RStudio session to a remote Spark session via sparklyr. When you go to add a new connection in the sparklyr UI tab in RStudio and choose cluster, it says tha...
Cyclopropane asked 30/9, 2016 at 19:28

1

Solved

I have tried the below code & its combinations in order to read all files given in an S3 folder, but nothing seems to be working... Sensitive information/code is removed from the below script. Ther...
Hildie asked 3/12, 2018 at 6:42

1

Solved

Does anyone have any advice about how to convert the tree information from sparklyr's ml_decision_tree_classifier, ml_gbt_classifier, or ml_random_forest_classifier models into a.) a format that ca...
Antisthenes asked 2/11, 2018 at 18:14

3

Consider there are 2 tables or table references in Spark which you want to compare, e.g. to ensure that your backup worked correctly. Is there a possibility to do that remotely in Spark? Because it's...
Turtleback asked 26/7, 2018 at 8:51

4

Solved

In the following example I've loaded a parquet file that contains a nested record of map objects in the meta field. sparklyr seems to do a nice job of dealing with these. However tidyr::unnest does...
Dumbfound asked 1/9, 2016 at 16:52

1

Solved

Introduction: R code is written using the sparklyr package to create a database schema. [Reproducible code and database are given] Existing result: root |-- contributors: string |-- created_at: str...
Felipa asked 6/9, 2018 at 0:32

2

I am trying to change the location Spark writes temporary files to. Everything I've found online says to set this by setting the SPARK_LOCAL_DIRS parameter in the spark-env.sh file, but I am not ha...
Gutenberg asked 29/8, 2018 at 2:41
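When sparklyr launches Spark itself, `spark-env.sh` may not be consulted; a hedged alternative is setting the standard Spark property `spark.local.dir` through `spark_config()` before connecting. The path below is a placeholder.

```r
library(sparklyr)

config <- spark_config()
# spark.local.dir is Spark's standard property for scratch/temp space
config$spark.local.dir <- "/path/with/space"  # placeholder path

sc <- spark_connect(master = "local", config = config)
```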

1

Solved

Say I have 40 continuous (DoubleType) variables that I've bucketed into quartiles using ft_quantile_discretizer. Identifying the quartiles on all of the variables is super fast, as the function sup...
Otisotitis asked 21/8, 2018 at 23:23

3

I tried using sparklyr to write data to HDFS or Hive, but was unable to find a way. Is it even possible to write an R dataframe to HDFS or Hive using sparklyr? Please note, my R and Hadoop are r...
Chitkara asked 27/6, 2017 at 21:58
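A minimal sketch of the usual pattern, assuming the R session can reach the cluster (e.g. via YARN): copy the local data frame into Spark, then persist it either as a Hive table or as files under an HDFS path. Table name and URI are placeholders.

```r
library(sparklyr)

sc <- spark_connect(master = "yarn-client")  # assumes cluster access from R

# Upload the local data frame into Spark
df_spark <- sdf_copy_to(sc, mtcars, "mtcars_tmp", overwrite = TRUE)

# Persist to Hive as a managed table (placeholder name)
spark_write_table(df_spark, name = "mtcars_hive", mode = "overwrite")

# Or write files directly to an HDFS path (placeholder URI)
spark_write_parquet(df_spark, path = "hdfs:///user/me/mtcars_parquet")
```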

1

Solved

I have 500 million rows in a spark dataframe. I'm interested in using sample_n from dplyr because it will allow me to explicitly specify the sample size I want. If I were to use sparklyr::sdf_sampl...
Earley asked 24/7, 2018 at 15:28
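`sdf_sample` takes a fraction rather than a row count; one hedged workaround is deriving the fraction from the table's row count. Note that Spark's sampling is probabilistic, so this returns approximately, not exactly, the requested number of rows.

```r
library(sparklyr)

sc <- spark_connect(master = "local")
df <- sdf_copy_to(sc, data.frame(x = 1:1000), "big_tbl", overwrite = TRUE)

n_wanted <- 100
frac <- n_wanted / sdf_nrow(df)

# Returns *approximately* n_wanted rows, not an exact count
sampled <- sdf_sample(df, fraction = frac, replacement = FALSE, seed = 42)
```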

3

Solved

I am using sparklyr to manipulate some data. Given a, a<-tibble(id = rep(c(1,10), each = 10), attribute1 = rep(c("This", "That", 'These', 'Those', "The", "Other", "Test", "End", "Start", 'Begi...
Classmate asked 22/5, 2018 at 10:26

1

Solved

Consider the following example dtrain <- data_frame(text = c("Chinese Beijing Chinese", "Chinese Chinese Shanghai", "Chinese Macao", "Tokyo Japan Chinese"), doc_id = 1:4, class = c(1, 1, 1...
Avionics asked 25/5, 2018 at 17:26

1

Solved

I'm very new to sparklyr and Spark, so please let me know if this is not the "spark" way to do this. My problem: I have 50+ .txt files at around 300 MB each, all in the same folder, call it x, tha...
Shayna asked 31/3, 2018 at 10:23
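For reading a whole folder of text files at once, a sketch: `spark_read_text` accepts a wildcard path, so every matching file lands in one Spark DataFrame. The path below is a placeholder for the folder "x" from the question.

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# Wildcard path reads every matching .txt file into one Spark DataFrame
txt <- spark_read_text(sc, name = "all_txt", path = "file:///path/to/x/*.txt")
```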

3

Solved

I have some Unix times that I convert to timestamps in sparklyr, and for some reason I also need to convert them into strings. Unfortunately, it seems that during the conversion to string Hive co...
Donnelldonnelly asked 19/3, 2018 at 2:41

© 2022 - 2024 — McMap. All rights reserved.