Situation
I used to work in RStudio with data.table instead of plyr or sqldf because it's really fast. Now I'm working with SparkR on an Azure cluster, and I'd like to know if I can use data.table on my Spark DataFrames, and whether it's faster than SQL.
No, it is not possible. SparkDataFrames are Java objects with a thin R interface. While it is possible to run worker-side R in some limited cases (dapply, gapply), there is no use for data.table there.
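For what it's worth, here is a minimal sketch of what worker-side R looks like with dapply. It assumes a local Spark session and that data.table is installed on the worker nodes, and it only illustrates that your function receives a plain R data.frame per partition (where you could call data.table locally); it does not make data.table a query engine for the SparkDataFrame, nor does it imply it is faster than Spark SQL.

```r
library(SparkR)
sparkR.session()                       # local session just for illustration

sdf <- createDataFrame(mtcars)         # SparkDataFrame backed by a JVM object

# dapply runs the function on each partition, passed in as an ordinary
# data.frame; data.table can only operate on that local chunk.
result <- dapply(
  sdf,
  function(part) {
    dt <- data.table::as.data.table(part)          # requires data.table on the workers
    as.data.frame(dt[, .(mpg = mpg * 2, cyl = cyl)])
  },
  structType(structField("mpg", "double"), structField("cyl", "double"))
)

head(collect(result))
```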
You might also look at the sparklyr package by RStudio, which allows you to use a Spark DataFrame with dplyr. – Rydder
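A minimal sketch of that sparklyr route, assuming a local connection for illustration (on an Azure cluster the master URL and config will differ). The dplyr verbs are translated into Spark SQL and executed by Spark, rather than being run row by row in R:

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")  # hypothetical connection; adjust for your cluster

# Copy a local data.frame to Spark and query it with dplyr verbs,
# which sparklyr translates into Spark SQL.
cars_tbl <- copy_to(sc, mtcars, "mtcars_spark", overwrite = TRUE)

cars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()

spark_disconnect(sc)
```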