Sparklyr: Use group_by and then concatenate strings from rows in a group

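library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")  # Spark connection used by copy_to()

d <- data.frame(id = c("1", "1", "2", "2", "1", "2"),
                x  = c("200", "200", "200", "201", "201", "201"),
                y  = c("This", "That", "The", "Other", "End", "End"))
d_sdf <- copy_to(sc, d, "d")

# Works on the local data frame d, but fails on the Spark DataFrame d_sdf
d_sdf %>%
  group_by(id, x) %>%
  mutate(y = paste(y, collapse = " "))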

On the local data frame d, the same group_by()/mutate() returns:

Source: local data frame [6 x 3]
Groups: id, x [4]

# A tibble: 6 x 3
      id      x         y
  <fctr> <fctr>     <chr>
1      1    200 This That
2      1    200 This That
3      2    200       The
4      2    201 Other End
5      1    201       End
6      2    201 Other End

Spark SQL does not accept aggregate functions used without aggregating, which is why this works in dplyr with an ordinary data frame but not with a Spark DataFrame: sparklyr translates your dplyr commands into a SQL statement. You can see where it goes wrong in the second part of the error message:

== SQL ==
SELECT `id`, `x`, CONCAT_WS(' ', `y`, ' ' AS "collapse") AS `y`

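paste() gets translated to CONCAT_WS. Like CONCAT, CONCAT_WS joins the values of several columns within the same row; it does not concatenate across the rows of a group. The collapse argument also ends up in the SQL as the meaningless ' ' AS "collapse".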
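In plain R terms (an analogy, not Spark code), CONCAT_WS behaves like paste() called with several vectors and sep, which combines values element by element, whereas the question needs paste(..., collapse = ), which folds one vector into a single string. The short vectors below are made-up illustrations:

```r
# Element-wise, one result per "row": the behaviour CONCAT_WS has over columns
row_wise <- paste(c("This", "The"), c("That", "Other"), sep = " ")
print(row_wise)     # "This That" "The Other"

# Across rows: one vector collapsed into a single string, as the question wants
across_rows <- paste(c("This", "That"), collapse = " ")
print(across_rows)  # "This That"
```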

Closer equivalents in Spark SQL are the aggregate functions collect_list and collect_set (collect_set also drops duplicate values), but they return arrays rather than single strings.
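As a rough base R analogy (again not Spark code), collect_list(y) per group resembles split(): each (id, x) group yields a vector of its y values, which still has to be collapsed into one string. The names groups and collapsed are just illustrative:

```r
d <- data.frame(id = c("1", "1", "2", "2", "1", "2"),
                x  = c("200", "200", "200", "201", "201", "201"),
                y  = c("This", "That", "The", "Other", "End", "End"),
                stringsAsFactors = FALSE)

# Like collect_list(y) per (id, x): a list holding one vector per group
groups <- split(d$y, list(d$id, d$x), drop = TRUE)

# Collapsing each vector gives one string per group
collapsed <- vapply(groups, paste, character(1), collapse = " ")
print(collapsed)
```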

But you can build on that:

If you do not want the same row repeated in your result, you can combine summarise, collect_list and paste:

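res <- d_sdf %>%
      group_by(id, x) %>%
      # collect_list() gathers each group's y values into an array; paste() becomes
      # CONCAT_WS(' ', ...), which joins the array elements with spaces.
      # Spark does not guarantee the order of the collected values.
      summarise(yconcat = paste(collect_list(y)))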

result:

Source:     lazy query [?? x 3]
Database:   spark connection master=local[8] app=sparklyr local=TRUE
Grouped by: id

     id     x   yconcat
  <chr> <chr>     <chr>
1     1   201       End
2     2   201 Other End
3     1   200 This That
4     2   200       The

If you do want the rows replicated, you can join this back onto the original data:

d_sdf %>% left_join(res, by = c("id", "x"))

result:

Source:     lazy query [?? x 4]
Database:   spark connection master=local[8] app=sparklyr local=TRUE

     id     x     y   yconcat
  <chr> <chr> <chr>     <chr>
1     1   200  This This That
2     1   200  That This That
3     2   200   The       The
4     2   201 Other Other End
5     1   201   End       End
6     2   201   End Other End
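For comparison, the same collapse-then-join idea can be sketched in base R on the local data frame d, with no Spark connection: aggregate() stands in for summarise() and merge() for left_join(). This is only a local illustration of the logic, not what sparklyr runs, and the name joined is illustrative:

```r
d <- data.frame(id = c("1", "1", "2", "2", "1", "2"),
                x  = c("200", "200", "200", "201", "201", "201"),
                y  = c("This", "That", "The", "Other", "End", "End"),
                stringsAsFactors = FALSE)

# One row per (id, x) with that group's y values pasted together, like summarise()
res <- aggregate(y ~ id + x, data = d, FUN = paste, collapse = " ")
names(res)[names(res) == "y"] <- "yconcat"

# Attach the collapsed string to every original row, like left_join()
joined <- merge(d, res, by = c("id", "x"), all.x = TRUE)
print(joined)
```

Unlike the Spark version, the local order of values within each group follows the row order of d.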
