I'm doing an NLP project and have reviews that contain multiple sentences. I'm using the spark-nlp package, which outputs a column containing a list of the sentences in each review. I'm using explode to create a row for each sentence, but I want to add numbering so I know which sentence was 1st, 2nd, etc. I don't know how to use row_number() because I don't really have anything to orderBy.
Here's what my data looks like:
REVIEW_ID  REVIEW_COMMENTS    SENTENCES_LIST
1          Hi. Sent1. Sent2.  [Hi., Sent1., Sent2.]
2          Yeah. Ok.          [Yeah., Ok.]
Here's what I want it to look like:
REVIEW_ID  REVIEW_COMMENTS    SENTENCES_LIST         SENTENCE  SENT_NUMBER
1          Hi. Sent1. Sent2.  [Hi., Sent1., Sent2.]  Hi.       1
1          Hi. Sent1. Sent2.  [Hi., Sent1., Sent2.]  Sent1.    2
1          Hi. Sent1. Sent2.  [Hi., Sent1., Sent2.]  Sent2.    3
2          Yeah. Ok.          [Yeah., Ok.]           Yeah.     1
2          Yeah. Ok.          [Yeah., Ok.]           Ok.       2
I'm using the code below and am not sure what to pass to row_number() as the orderBy, because I don't have a column to order by other than each sentence's position in SENTENCES_LIST.
import pyspark.sql.functions as F
from pyspark.sql import Window

df2 = df.withColumn('SENTENCE', F.explode('SENTENCES_LIST'))
df3 = df2.withColumn('SENT_NUMBER', F.row_number().over(Window.partitionBy('REVIEW_ID').orderBy('????')))
Comments:
pyspark.sql.functions.posexplode? Try: df2 = df.withColumn('SENTENCE', F.posexplode('SENTENCES_LIST').alias('SENT_NUMBER', 'SENTENCE')) – Mccarthy
withColumn can only add a single column; use select instead: df.select('*', F.posexplode('SENTENCES_LIST').alias('SENT_NUMBER', 'SENTENCE')) – Reynalda