How could I add a column to a DataFrame in Pyspark with incremental values?
I have a DataFrame called 'df' like the following:

+-------+-------+-------+
|  Atr1 |  Atr2 |  Atr3 |
+-------+-------+-------+
|   A   |   A   |   A   |
+-------+-------+-------+
|   B   |   A   |   A   |
+-------+-------+-------+
|   C   |   A   |   A   |
+-------+-------+-------+

I want to add a new column to it with incremental values and get the following updated DataFrame:

+-------+-------+-------+-------+
|  Atr1 |  Atr2 |  Atr3 |  Atr4 |
+-------+-------+-------+-------+
|   A   |   A   |   A   |   1   |
+-------+-------+-------+-------+
|   B   |   A   |   A   |   2   |
+-------+-------+-------+-------+
|   C   |   A   |   A   |   3   |
+-------+-------+-------+-------+

How could I get it?

Before asked 14/9, 2017 at 8:20

If you only need incremental values (like an ID) and there is no constraint that the numbers be consecutive, you can use monotonically_increasing_id(). The only guarantee this function gives is that the values are increasing from row to row; the actual values themselves can differ between executions.

from pyspark.sql.functions import monotonically_increasing_id

# The generated ids are 64-bit integers that are guaranteed to be
# monotonically increasing and unique, but not consecutive.
df = df.withColumn("Atr4", monotonically_increasing_id())
Microclimatology answered 14/9, 2017 at 8:27
Thanks! Nice solution! – Before
Note that this answer does address the question as asked. However, since the question describes a DataFrame "like the following", one might assume the example extends to an arbitrarily large number of consecutive values. monotonically_increasing_id() does not produce consecutive numbers, only monotonically increasing ones, so that assumption breaks down on a larger DataFrame. – Stripe
@Jomonsugi: That is correct. I highlighted that part of the answer to make this constraint more obvious. – Microclimatology