How to add an incremental ID column to a table in Spark SQL
I'm working on a Spark MLlib algorithm. The dataset I have is in this form:

"Company":"XXXX","CurrentTitle":"XYZ","Edu_Title":"ABC","Exp_mnth":. (there are more values similar to these)

I'm trying to encode the raw String values as numeric values, so I tried zipWithUniqueId to get a unique value for each string value. For some reason I'm not able to save the modified dataset to disk. Can I do this in some way using Spark SQL, or what would be a better approach?

Hanni answered 14/7, 2016 at 14:36 Comment(3)
Sorry, I figured it out with this thread: #33103227 – Hanni
Can you please delete your question (since it's a duplicate)? Thanks. – Basalt
Possible duplicate of Primary keys with Apache Spark – Peeper

Scala

import org.apache.spark.sql.functions.monotonically_increasing_id

val dataFrame1 = dataFrame0.withColumn("index", monotonically_increasing_id())

Java

import org.apache.spark.sql.functions;

Dataset<Row> dataFrame1 = dataFrame0.withColumn("index", functions.monotonically_increasing_id());
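Note that monotonically_increasing_id guarantees IDs that are unique and monotonically increasing, but not consecutive: Spark packs the partition index into the upper bits and the per-partition record number into the lower 33 bits, so IDs jump at every partition boundary. A minimal pure-Python sketch of that scheme (an illustration of how the IDs are constructed, not Spark code):

```python
def monotonic_id(partition_index: int, row_in_partition: int) -> int:
    # Mirrors Spark's scheme: partition index in the upper bits,
    # per-partition row number in the lower 33 bits.
    return (partition_index << 33) + row_in_partition

# Two partitions with three rows each: IDs are unique and increasing,
# but there is a large gap between partition 0 and partition 1.
ids = [monotonic_id(p, r) for p in range(2) for r in range(3)]
print(ids)  # [0, 1, 2, 8589934592, 8589934593, 8589934594]
```

If you need strictly consecutive IDs, a common alternative is row_number() over a Window, at the cost of shuffling all rows into one partition.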
Bothy answered 8/8, 2017 at 12:59 Comment(0)
