sparklyr Questions

2

Solved

I am getting heap space errors on even fairly small datasets. I can be sure that I'm not running out of system memory. For example, consider a dataset containing about 20M rows and 9 columns, and t...
Imaginary asked 29/12, 2016 at 17:18

3

Solved

This is my code. I run it in Databricks. library(sparklyr) library(dplyr) library(arrow) sc <- spark_connect(method = "databricks") tbl_change_db(sc, "prod") trip_ids <- ...
Schreiner asked 30/3, 2023 at 13:19

1

Solved

test <- data.frame('prod_id'= c("shoe", "shoe", "shoe", "shoe", "shoe", "shoe", "boat", "boat","boat","boat","boat","boat"), 'seller_id'= c("a", "b", "c", "d", "e", "f", "a","g", "h", "r", "q"...
Pshaw asked 4/12, 2018 at 6:9

4

Solved

I have two tables that I want to do a full join using dplyr, but I don't want it to drop any of the columns. Per the documentation and my own experience it is only keeping the join column for the l...
Never asked 5/5, 2017 at 15:52
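A hedged sketch of one common workaround for the question above: dplyr's `full_join` keeps both copies of the overlapping non-join columns when they are disambiguated with `suffix`, and the same call is translated by sparklyr. The table contents here are hypothetical, and suffix handling in the Spark translation can vary by dplyr/sparklyr version.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")  # assumes a local Spark installation

# Hypothetical tables sharing the join key "id" and a clashing "value" column
left_tbl  <- copy_to(sc, data.frame(id = 1:3, value = c("a", "b", "c")), "left_tbl")
right_tbl <- copy_to(sc, data.frame(id = 2:4, value = c("x", "y", "z")), "right_tbl")

# suffix = renames the clashing columns so neither side is dropped
joined <- full_join(left_tbl, right_tbl, by = "id", suffix = c("_left", "_right"))
collect(joined)
```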

2

Solved

Is there any way to disable Hive support in sparklyr? Just like in SparkR: sparkR.session(master="local[*]", enableHiveSupport=FALSE)
Springy asked 9/1, 2017 at 16:44
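One approach often suggested for this is to set a `spark_config()` entry before connecting. This is a sketch only: the config key below is an assumption that mirrors SparkR's `enableHiveSupport = FALSE` and should be checked against your sparklyr version.

```r
library(sparklyr)

config <- spark_config()
# Assumed key mirroring SparkR's enableHiveSupport = FALSE; verify for your version
config$sparklyr.connect.enablehivesupport <- FALSE

sc <- spark_connect(master = "local[*]", config = config)
```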

1

I am using the library sparklyr to interact with Spark. There are two functions for putting a data frame into a Spark context: 'dplyr::copy_to' and 'sparklyr::sdf_copy_to'. What is t...
Acetophenetidin asked 15/5, 2019 at 11:57
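As a rough illustration (not an authoritative account of the internals): both calls upload a local R data frame into Spark. `dplyr::copy_to` is the dplyr generic for which sparklyr provides the Spark method, while `sdf_copy_to` is sparklyr's own entry point, so in everyday use they behave very similarly.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")  # assumes a local Spark installation

# Both create a Spark DataFrame from the local data frame `iris`
tbl_a <- dplyr::copy_to(sc, iris, name = "iris_a", overwrite = TRUE)
tbl_b <- sparklyr::sdf_copy_to(sc, iris, name = "iris_b", overwrite = TRUE)
```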

2

Solved

Spark 2.0 with Hive. Let's say I am trying to write a Spark dataframe, irisDf, to ORC and save it to the Hive metastore. In Spark I would do that like this: irisDf.write.format("orc") .mode("overw...
Phlegethon asked 16/8, 2018 at 22:42
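A possible sparklyr counterpart to that Scala snippet, as a sketch only: it assumes the connection was created with Hive support, and the exact options accepted by the write functions differ across sparklyr versions.

```r
library(sparklyr)

sc <- spark_connect(master = "local")  # assumes Hive support is available
irisDf <- sdf_copy_to(sc, iris, "iris_df", overwrite = TRUE)

# Write ORC files to a path (placeholder path)
spark_write_orc(irisDf, path = "/tmp/iris_orc", mode = "overwrite")

# Or register the data as a metastore table; the storage format
# then depends on the session's defaults
spark_write_table(irisDf, name = "iris_tbl", mode = "overwrite")
```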

0

Typically when one wants to use sparklyr on a custom function (i.e. non-translated functions) they place them within spark_apply(). However, I've only encountered examples where a single local da...
Fearnought asked 18/3, 2020 at 16:33

3

Solved

After I managed to connect to our (new) cluster using sparklyr with the yarn-client method, I can now see only the tables from the default schema. How can I connect to schema.table? Using DBI it's ...
Hessney asked 5/5, 2017 at 13:35
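Two commonly used routes for reaching a table outside the default schema, sketched with placeholder schema/table names: qualifying the table via dbplyr's `in_schema`, or issuing plain SQL through DBI.

```r
library(sparklyr)
library(dplyr)
library(DBI)

sc <- spark_connect(master = "yarn-client")  # assumes a reachable cluster

# Via dplyr, qualifying the table with its schema (placeholder names)
t1 <- tbl(sc, dbplyr::in_schema("some_schema", "some_table"))

# Via DBI with plain SQL
t2 <- dbGetQuery(sc, "SELECT * FROM some_schema.some_table LIMIT 10")
```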

1

I am trying to create a R package so I can use the Stanford CoreNLP wrapper for Apache Spark (by databricks) from R. I am using the sparklyr package to connect to my local Spark instance. I created...
Mccorkle asked 15/10, 2016 at 22:18

1

As far as I understood, those two packages provide similar but mostly different wrapper functions for Apache Spark. Sparklyr is newer and still needs to grow in the scope of functionality. I theref...
Quietude asked 13/11, 2016 at 19:2

3

Solved

I would like to connect my local desktop RStudio session to a remote Spark session via sparklyr. When you go to add a new connection in the sparklyr UI tab in RStudio and choose cluster, it says tha...
Cyclopropane asked 30/9, 2016 at 19:28

1

Solved

I have tried the below code & its combinations in order to read all files given in an S3 folder, but nothing seems to be working... Sensitive information/code is removed from the below script. Ther...
Hildie asked 3/12, 2018 at 6:42

1

Solved

Does anyone have any advice about how to convert the tree information from sparklyr's ml_decision_tree_classifier, ml_gbt_classifier, or ml_random_forest_classifier models into a.) a format that ca...
Antisthenes asked 2/11, 2018 at 18:14

3

Consider there are 2 tables or table references in Spark which you want to compare, e.g. to ensure that your backup worked correctly. Is there a possibility to do that remotely in Spark? Because it's...
Turtleback asked 26/7, 2018 at 8:51

4

Solved

In the following example I've loaded a parquet file that contains a nested record of map objects in the meta field. sparklyr seems to do a nice job of dealing with these. However tidyr::unnest does...
Dumbfound asked 1/9, 2016 at 16:52

1

Solved

Introduction: R code is written using the sparklyr package to create a database schema. [Reproducible code and database are given] Existing result: root |-- contributors: string |-- created_at: str...
Felipa asked 6/9, 2018 at 0:32

2

I am trying to change the location Spark writes temporary files to. Everything I've found online says to set this by setting the SPARK_LOCAL_DIRS parameter in the spark-env.sh file, but I am not ha...
Gutenberg asked 29/8, 2018 at 2:41
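When sparklyr launches Spark itself, `spark-env.sh` may not be consulted; a hedged alternative is setting the standard Spark property `spark.local.dir` through `spark_config()` before connecting. The path below is a placeholder.

```r
library(sparklyr)

config <- spark_config()
# spark.local.dir is Spark's standard property for scratch/temp space
config$spark.local.dir <- "/path/with/space"  # placeholder path

sc <- spark_connect(master = "local", config = config)
```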

1

Solved

Say I have 40 continuous (DoubleType) variables that I've bucketed into quartiles using ft_quantile_discretizer. Identifying the quartiles on all of the variables is super fast, as the function sup...
Otisotitis asked 21/8, 2018 at 23:23

3

I tried using sparklyr to write data to HDFS or Hive, but was unable to find a way. Is it even possible to write an R dataframe to HDFS or Hive using sparklyr? Please note, my R and Hadoop are r...
Chitkara asked 27/6, 2017 at 21:58
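A minimal sketch of the usual pattern, assuming the R session can reach the cluster (e.g. via YARN): copy the local data frame into Spark, then persist it either as a Hive table or as files under an HDFS path. Table name and URI are placeholders.

```r
library(sparklyr)

sc <- spark_connect(master = "yarn-client")  # assumes cluster access from R

# Upload the local data frame into Spark
df_spark <- sdf_copy_to(sc, mtcars, "mtcars_tmp", overwrite = TRUE)

# Persist to Hive as a managed table (placeholder name)
spark_write_table(df_spark, name = "mtcars_hive", mode = "overwrite")

# Or write files directly to an HDFS path (placeholder URI)
spark_write_parquet(df_spark, path = "hdfs:///user/me/mtcars_parquet")
```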

1

Solved

I have 500 million rows in a spark dataframe. I'm interested in using sample_n from dplyr because it will allow me to explicitly specify the sample size I want. If I were to use sparklyr::sdf_sampl...
Earley asked 24/7, 2018 at 15:28
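`sdf_sample` takes a fraction rather than a row count; one hedged workaround is deriving the fraction from the table's row count. Note that Spark's sampling is probabilistic, so this returns approximately, not exactly, the requested number of rows.

```r
library(sparklyr)

sc <- spark_connect(master = "local")
df <- sdf_copy_to(sc, data.frame(x = 1:1000), "big_tbl", overwrite = TRUE)

n_wanted <- 100
frac <- n_wanted / sdf_nrow(df)

# Returns *approximately* n_wanted rows, not an exact count
sampled <- sdf_sample(df, fraction = frac, replacement = FALSE, seed = 42)
```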

3

Solved

I am using sparklyr to manipulate some data. Given a, a<-tibble(id = rep(c(1,10), each = 10), attribute1 = rep(c("This", "That", 'These', 'Those', "The", "Other", "Test", "End", "Start", 'Begi...
Classmate asked 22/5, 2018 at 10:26

1

Solved

Consider the following example dtrain <- data_frame(text = c("Chinese Beijing Chinese", "Chinese Chinese Shanghai", "Chinese Macao", "Tokyo Japan Chinese"), doc_id = 1:4, class = c(1, 1, 1...
Avionics asked 25/5, 2018 at 17:26

1

Solved

I'm very new to sparklyr and Spark, so please let me know if this is not the "spark" way to do this. My problem: I have 50+ .txt files at around 300 MB each, all in the same folder, call it x, tha...
Shayna asked 31/3, 2018 at 10:23
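For reading a whole folder of text files at once, a sketch: `spark_read_text` accepts a wildcard path, so every matching file lands in one Spark DataFrame. The path below is a placeholder for the folder "x" from the question.

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# Wildcard path reads every matching .txt file into one Spark DataFrame
txt <- spark_read_text(sc, name = "all_txt", path = "file:///path/to/x/*.txt")
```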

3

Solved

I have some Unix times that I convert to timestamps in sparklyr, and for some reason I also need to convert them into strings. Unfortunately, it seems that during the conversion to string Hive co...
Donnelldonnelly asked 19/3, 2018 at 2:41

© 2022 - 2024 — McMap. All rights reserved.