sparklyr Questions
2
Solved
I am getting heap space errors on even fairly small datasets. I can be sure that I'm not running out of system memory. For example, consider a dataset containing about 20M rows and 9 columns, and t...
Imaginary asked 29/12, 2016 at 17:18
3
Solved
This is my code. I run it in databricks.
library(sparklyr)
library(dplyr)
library(arrow)
sc <- spark_connect(method = "databricks")
tbl_change_db(sc, "prod")
trip_ids <- ...
Schreiner asked 30/3, 2023 at 13:19
1
Solved
test <- data.frame('prod_id'= c("shoe", "shoe", "shoe", "shoe", "shoe", "shoe", "boat", "boat","boat","boat","boat","boat"),
'seller_id'= c("a", "b", "c", "d", "e", "f", "a","g", "h", "r", "q"...
4
Solved
I have two tables that I want to do a full join using dplyr, but I don't want it to drop any of the columns. Per the documentation and my own experience it is only keeping the join column for the l...
2
Solved
Is there any way to disable the hive support in sparklyr?
Just like in SparkR:
sparkR.session(master="local[*]", enableHiveSupport=FALSE)
1
I am using the sparklyr library to interact with Spark. There are two functions for putting a data frame into a Spark context: 'dplyr::copy_to' and 'sparklyr::sdf_copy_to'. What is t...
2
Solved
Spark 2.0 with Hive
Let's say I am trying to write a Spark dataframe, irisDf, to ORC and save it to the Hive metastore.
In Spark I would do that like this,
irisDf.write.format("orc")
.mode("overw...
Phlegethon asked 16/8, 2018 at 22:42
0
Typically when one wants to use sparklyr on a custom function (i.e. non-translated functions) they place them within spark_apply(). However, I've only encountered examples where a single local da...
3
Solved
After I managed to connect to our (new) cluster using sparklyr with the yarn-client method, I can only show the tables from the default schema. How can I connect to schema.table?
Using DBI it's ...
Hessney asked 5/5, 2017 at 13:35
1
I am trying to create an R package so I can use the Stanford CoreNLP wrapper for Apache Spark (by Databricks) from R. I am using the sparklyr package to connect to my local Spark instance. I created...
Mccorkle asked 15/10, 2016 at 22:18
1
As far as I understood, those two packages provide similar but mostly different wrapper functions for Apache Spark. Sparklyr is newer and still needs to grow in the scope of functionality. I theref...
Quietude asked 13/11, 2016 at 19:2
3
Solved
I would like to connect my local desktop RStudio session to a remote Spark session via sparklyr. When you go to add a new connection in the sparklyr UI tab in RStudio and choose cluster, it says tha...
Cyclopropane asked 30/9, 2016 at 19:28
1
Solved
I have tried the code below and variations of it to read all files in a given S3 folder, but nothing seems to work. Sensitive information/code has been removed from the script below. Ther...
Hildie asked 3/12, 2018 at 6:42
1
Solved
Does anyone have any advice about how to convert the tree information from sparklyr's ml_decision_tree_classifier, ml_gbt_classifier, or ml_random_forest_classifier models into a.) a format that ca...
Antisthenes asked 2/11, 2018 at 18:14
3
Suppose there are two tables or table references in Spark that you want to compare, e.g. to ensure that your backup worked correctly. Is it possible to do that remotely in Spark? Because it's...
Turtleback asked 26/7, 2018 at 8:51
4
Solved
In the following example I've loaded a parquet file that contains a nested record of map objects in the meta field. sparklyr seems to do a nice job of dealing with these. However tidyr::unnest does...
1
Solved
Introduction
The R code uses the sparklyr package to create a database schema. [Reproducible code and database are given]
Existing Result
root
|-- contributors : string
|-- created_at : str...
Felipa asked 6/9, 2018 at 0:32
2
I am trying to change the location spark writes temporary files to. Everything I've found online says to set this by setting the SPARK_LOCAL_DIRS parameter in the spark-env.sh file, but I am not ha...
Gutenberg asked 29/8, 2018 at 2:41
1
Solved
Say I have 40 continuous (DoubleType) variables that I've bucketed into quartiles using ft_quantile_discretizer. Identifying the quartiles on all of the variables is super fast, as the function sup...
Otisotitis asked 21/8, 2018 at 23:23
3
I tried using sparklyr to write data to HDFS or Hive, but was unable to find a way. Is it even possible to write an R data frame to HDFS or Hive using sparklyr? Please note, my R and Hadoop are r...
Chitkara asked 27/6, 2017 at 21:58
1
Solved
I have 500 million rows in a spark dataframe. I'm interested in using sample_n from dplyr because it will allow me to explicitly specify the sample size I want. If I were to use sparklyr::sdf_sampl...
Earley asked 24/7, 2018 at 15:28
3
Solved
I am using sparklyr to manipulate some data.
Given a,
a<-tibble(id = rep(c(1,10), each = 10),
attribute1 = rep(c("This", "That", 'These', 'Those', "The", "Other", "Test", "End", "Start", 'Begi...
Classmate asked 22/5, 2018 at 10:26
1
Solved
Consider the following example
dtrain <- data_frame(text = c("Chinese Beijing Chinese",
"Chinese Chinese Shanghai",
"Chinese Macao",
"Tokyo Japan Chinese"),
doc_id = 1:4,
class = c(1, 1, 1...
Avionics asked 25/5, 2018 at 17:26
1
Solved
I'm very new to sparklyr and spark, so please let me know if this is not the "spark" way to do this.
My problem
I have 50+ .txt files at around 300 MB each, all in the same folder, call it x, tha...
Shayna asked 31/3, 2018 at 10:23
3
Solved
I have some Unix times that I convert to timestamps in sparklyr, and for some reason I also need to convert them into strings.
Unfortunately, it seems that during the conversion to string hive co...
Donnelldonnelly asked 19/3, 2018 at 2:41