Is there a function similar to melt
in SparkR library?
Transform data with 1 row and 50 columns to 50 rows and 3 columns?
Is there a function similar to melt
in SparkR library?
Transform data with 1 row and 50 columns to 50 rows and 3 columns?
There is no built-in function that provides a similar functionality in SparkR. You can built your own with explode
library(magrittr)
df <- createDataFrame(data.frame(
A = c('a', 'b', 'c'),
B = c(1, 3, 5),
C = c(2, 4, 6)
))
melt <- function(df, id.vars, measure.vars,
variable.name = "key", value.name = "value") {
measure.vars.exploded <- purrr::map(
measure.vars, function(c) list(lit(c), column(c))) %>%
purrr::flatten() %>%
(function(x) do.call(create_map, x)) %>%
explode()
id.vars <- id.vars %>% purrr::map(column)
do.call(select, c(df, id.vars, measure.vars.exploded)) %>%
withColumnRenamed("key", variable.name) %>%
withColumnRenamed("value", value.name)
}
melt(df, c("A"), c("B", "C")) %>% head()
A key value
1 a B 1
2 a C 2
3 b B 3
4 b C 4
5 c B 5
6 c C 6
or use SQL with Hive's stack
UDF:
stack <- function(df, id.vars, measure.vars,
variable.name = "key", value.name = "value") {
measure.vars.exploded <- glue::glue('"{measure.vars}", `{measure.vars}`') %>%
glue::glue_collapse(" , ") %>%
(function(x) glue::glue(
"stack({length(measure.vars)}, {x}) as ({variable.name}, {value.name})"
)) %>%
as.character()
do.call(selectExpr, c(df, id.vars, measure.vars.exploded))
}
stack(df, c("A"), c("B", "C")) %>% head()
A key value
1 a B 1
2 a C 2
3 b B 3
4 b C 4
5 c B 5
6 c C 6
Related questions:
id.vars <- id.vars %>% purrr::map(column) do.call(select, c(df, id.vars, measure.vars.exploded))
with df %>% select(c(id.vars, measure.vars.exploded))
at least it worked on my version of RSpark (2.4) –
Platinize © 2022 - 2024 — McMap. All rights reserved.
data.frame
are not compatible withSparkDatafarme
. – Corruption