Spark dataframe add a row for every existing row

I have a dataframe with the following columns:

groupid,unit,height
----------------------
1,in,55
2,in,54

I want to create another dataframe that keeps the existing rows and adds, for each of them, a new row where unit=cm and height=height*2.54.

Resulting dataframe:

groupid,unit,height
----------------------
1,in,55
2,in,54
1,cm,139.7
2,cm,137.16

I'm not sure how to use a Spark UDF and explode here. Any help is appreciated. Thanks in advance.

Enabling answered 10/7, 2017 at 3:19

You can create a second dataframe with the required changes using withColumn, and then union the two dataframes:

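import sqlContext.implicits._
import org.apache.spark.sql.functions._

// Sample input: heights in inches
val df = Seq(
  (1, "in", 55),
  (2, "in", 54)
).toDF("groupid", "unit", "height")

// Same rows, with the unit replaced by "cm" and the height converted (becomes a double column)
val df2 = df.withColumn("unit", lit("cm")).withColumn("height", col("height")*2.54)

// Append the converted rows to the original ones
df.union(df2).show(false)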

You should see the following output:

+-------+----+------+
|groupid|unit|height|
+-------+----+------+
|1      |in  |55.0  |
|2      |in  |54.0  |
|1      |cm  |139.7 |
|2      |cm  |137.16|
+-------+----+------+
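
Since the question mentions explode: the same rows can probably also be produced in a single pass by building an array of two structs per row and exploding it. This is only a rough sketch, not something from the answer above. The object name AddCmRows, the helper withCmRows and the alias "v" are made up for illustration, and it assumes Spark 2.x or later with a local SparkSession. The original height is cast to double so that both structs have the same type.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

object AddCmRows {
  // Hypothetical helper: emits every input row twice,
  // once unchanged and once converted to centimetres.
  def withCmRows(df: DataFrame): DataFrame = {
    // Two variants per row; both structs must share the same field types,
    // hence the cast of the original integer height to double.
    val variants = array(
      struct(col("unit").as("unit"), col("height").cast("double").as("height")),
      struct(lit("cm").as("unit"), (col("height") * 2.54).as("height"))
    )
    df.select(col("groupid"), explode(variants).as("v"))
      .select(col("groupid"), col("v.unit").as("unit"), col("v.height").as("height"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for trying this out
    val spark = SparkSession.builder().master("local[*]").appName("add-cm-rows").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "in", 55), (2, "in", 54)).toDF("groupid", "unit", "height")
    withCmRows(df).show(false)

    spark.stop()
  }
}
```

Unlike the union version, each "cm" row here comes right after its "in" row instead of at the end, so add an orderBy if the order matters. No UDF is needed in either approach.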
Alodee answered 10/7, 2017 at 3:33
