PySpark: access a DataFrame column whose name contains a dot ('.')

In a PySpark DataFrame, a column whose name contains a dot (e.g. "id.orig_h") cannot be used in groupBy unless it is first renamed with withColumnRenamed. Is there a workaround? Wrapping the name in backticks ("`a.b`") doesn't seem to solve it.
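For reference, the renaming workaround can be applied to every column at once instead of one withColumnRenamed call per column. The following is a minimal sketch, assuming the goal is simply to replace each '.' with '_'. The helper name `dot_free_names` is hypothetical, not part of PySpark. It only computes the new names and refuses to proceed if two columns would end up with the same name:

```python
from typing import List, Sequence


def dot_free_names(columns: Sequence[str], replacement: str = "_") -> List[str]:
    """Return the column names with every '.' replaced by `replacement`.

    Raises ValueError if the result contains duplicate names, e.g. when
    both "id.orig_h" and "id_orig_h" already exist in the DataFrame.
    """
    new_names = [name.replace(".", replacement) for name in columns]
    if len(set(new_names)) != len(new_names):
        raise ValueError("renaming would produce duplicate column names")
    return new_names
```

Assuming `df` is an existing PySpark DataFrame, the new names can be applied in one step with `df = df.toDF(*dot_free_names(df.columns))`, after which `df.groupBy("id_orig_h")` needs no quoting. `toDF` renames columns by position, so the order of `df.columns` is preserved.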

Mouser asked 16/5, 2016 at 10:23
Can you share the code you are using to group? – Mourn

In my PySpark shell, both of the following snippets work:

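from pyspark.sql.functions import col
# Backticks make Spark treat the dot as part of the column name instead of
# as access to a nested field "orig_h" inside a struct column "id".
myCol = col("`id.orig_h`")
result = df.groupBy(myCol).agg(...)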

and

myCol = df["`id.orig_h`"]
result = df.groupBy(myCol).agg(...)

I hope it helps.
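If column names are built programmatically, the backtick quoting can be wrapped in a small helper. This is a sketch under the assumption that Spark's quoted-identifier rule applies, i.e. a literal backtick inside a quoted name is written as two backticks. `quote_col` is a hypothetical name, not a PySpark function:

```python
def quote_col(name: str) -> str:
    """Wrap a column name in backticks so Spark reads dots literally.

    Any backtick already inside the name is doubled, which is how a literal
    backtick is escaped inside a backtick-quoted identifier.
    """
    return "`" + name.replace("`", "``") + "`"
```

With `from pyspark.sql.functions import col`, it would be used as `df.groupBy(col(quote_col("id.orig_h"))).agg(...)`, which is equivalent to the first snippet above.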

Mourn answered 16/5, 2016 at 23:31
Thanks @Daniel de Paula for your answer. Can you confirm that using groupby("`id.orig_h`") doesn't work? – Mouser
@HananShteingart, for me the following code works: df.groupBy("`id.orig_h`").agg(...) – Mourn
For me it doesn't. Could you try it with more columns whose names start with "id."? I'm using PySpark 1.6. – Mouser
@HananShteingart, how was your DataFrame created? How are you doing your groupBy operation? Can you show the output of df.printSchema()? – Mourn
