Add column to DataFrame in sparkR

2

13

I would like to add a column filled with the character "N" to a DataFrame in SparkR. In plain R (outside SparkR), I would do it like this:

df$new_column <- "N"

But with SparkR, I get the following error:

Error: class(value) == "Column" || is.null(value) is not TRUE

I've tried all sorts of things to get this working. I was able to create a column from another (existing) one with df <- withColumn(df, "new_column", df$existing_column), but I can't manage this simple case.

Any help?

Thanks.

Ellenaellender answered 19/5, 2016 at 15:22 Comment(2)
The only hack I know for this is to use ifelse with the same return value for both conditions. So df$new <- ifelse(condition, 'N', 'N').Faliscan
Worked, thank you very much (put it as an answer if you want me to validate it)Gitlow
15

The straightforward solution is to use the SparkR::lit() function, which wraps a literal value in a Column:

df_new = withColumn(df, "new_column_name", lit("N"))
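For context, here is a minimal end-to-end sketch of the call above. It assumes Spark 2.0 or later (where sparkR.session() is available) running in local mode; the sample data and the names df_new and result are made up for illustration.

```r
library(SparkR)

# Assumption: Spark 2.0+ in local mode; older releases used sparkR.init() instead.
sparkR.session(master = "local[1]")

# Hypothetical sample data, used only for illustration.
df <- createDataFrame(data.frame(id = 1:3))

# lit() wraps the plain R value in a Column, which is what withColumn() expects.
df_new <- withColumn(df, "new_column", lit("N"))

# Bring the rows back into a local R data.frame to inspect them.
result <- collect(df_new)
print(result)

sparkR.session.stop()
```

The original error arises because a bare "N" is a character vector, not a Column; lit() is what bridges the two.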

Edit 7/17/2019

In newer Spark versions, the following also works:

df1$new_column <- "N"
df1[["new_column"]] <- "N"
Hoick answered 19/5, 2016 at 16:19 Comment(4)
Nice! Didn't know about lit(), will delete my answer when the OP accepts yours.Faliscan
How would I add a column full of NAs ?Gitlow
When I attempt the same task with df <- withColumn(df, "col", lit(NA)) and then run str(df), I receive the following error: Error in FUN(X[[i]], ...) : Unsupported data type: null. I can open a new question, but thought that @DmitriySelivanov or @fmalaussena may know the answer after working with the problem.Propinquity
df <- withColumn(df, "col", lit("NA")) should work (with " " around the NA)Gitlow
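Note that lit("NA") fills the column with the literal string "NA" rather than with missing values. One possible alternative, sketched here as an assumption rather than a confirmed fix for the error above, is to cast the null literal to a concrete type so the column no longer has the null data type. The sample data and the names df_na and na_result are hypothetical, and Spark 2.0+ is assumed.

```r
library(SparkR)

# Assumption: Spark 2.0+ in local mode.
sparkR.session(master = "local[1]")

# Hypothetical sample data, used only for illustration.
df <- createDataFrame(data.frame(id = 1:3))

# lit(NA) on its own yields a column of the null type;
# cast() gives it a concrete type (here string) while keeping the values null.
df_na <- withColumn(df, "col", cast(lit(NA), "string"))

# The schema should now report a string column instead of a null-typed one.
printSchema(df_na)

# Collect into a local data.frame to check the values.
na_result <- collect(df_na)

sparkR.session.stop()
```

Swap "string" for another type name such as "double" or "integer" if the column is meant to hold numbers later.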
0

There's an easier way to use SparkR::lit() that more closely mimics the syntax you tried first:

df$new_column <- lit("N")
Firefly answered 29/6, 2018 at 17:58 Comment(0)
