How to use regexp_replace in Spark
I am pretty new to Spark and would like to replace every comma (,) in a DataFrame column with a period (.).

Assume there is a dataframe x and column x4

x4
1,3435
1,6566
-0,34435

I want the output to be as

x4
1.3435
1.6566
-0.34435

The code I am using is

import org.apache.spark.sql.Column
def replace = regexp_replace((x.x4,1,6566:String,1.6566:String)x.x4)

But I get the following error

import org.apache.spark.sql.Column
<console>:1: error: ')' expected but '.' found.
       def replace = regexp_replace((train_df.x37,0,160430299:String,0.160430299:String)train_df.x37)

Any help with the syntax or logic, or any other suitable approach, would be much appreciated.

Gilliangilliard answered 17/10, 2016 at 7:24 Comment(0)
R
34

Here's a reproducible example, assuming x4 is a string column.

import org.apache.spark.sql.functions.regexp_replace

val df = spark.createDataFrame(Seq(
  (1, "1,3435"),
  (2, "1,6566"),
  (3, "-0,34435"))).toDF("Id", "x4")

The syntax is regexp_replace(str, pattern, replacement), which translates to:

df.withColumn("x4New", regexp_replace(df("x4"), "\\,", ".")).show
+---+--------+--------+
| Id|      x4|   x4New|
+---+--------+--------+
|  1|  1,3435|  1.3435|
|  2|  1,6566|  1.6566|
|  3|-0,34435|-0.34435|
+---+--------+--------+
Rhoea answered 17/10, 2016 at 7:46 Comment(4)
Can I use multiple characters in place of the comma? For example, I want to replace a comma, a dot, and an exclamation mark with some other character. - Leflore
You want to replace multiple special characters with one character? Yes, that is possible. - Rhoea
I tried, but it didn't work. Could you please tell me how to do that? - Leflore
You could try something like regexp_replace(df("col"), "[\\?,\\.,\\$]", "."). - Rhoea
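
The character-class idea from the comment above can be sketched as follows. This is only a minimal sketch, assuming a spark-shell session where `spark` is already defined; the sample values and the column names raw and clean are made up for illustration and do not come from the original post. Inside square brackets, the comma, dot, and exclamation mark are literal characters, so they need no backslashes.

```scala
import org.apache.spark.sql.functions.{col, regexp_replace}
import spark.implicits._ // assumes `spark` is an existing SparkSession, as in spark-shell

// Hypothetical sample data, one row per character we want to replace.
val messy = Seq("a,b", "c.d", "e!f").toDF("raw")

// The class [,.!] matches a single comma, dot, or exclamation mark;
// every match is replaced by an underscore.
val cleaned = messy.withColumn("clean", regexp_replace(col("raw"), "[,.!]", "_"))
cleaned.show()
```

Commas inside the brackets are not separators: each one is itself a character to match, so list only the characters that should actually be replaced.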
C
-2

We could use the map method to do this transformation:

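import spark.implicits._ // supplies the encoder that map needs (already in scope in spark-shell)

// Rebuild each row, replacing commas in the string column with periods.
df.map(row => (row.getInt(0), row.getString(1).replaceAll(",", ".")))
  .toDF("Id", "x4")
  .show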

Output:

+---+--------+
| Id|      x4|
+---+--------+
|  1|  1.3435|
|  2|  1.6566|
|  3|-0.34435|
+---+--------+
Centerboard answered 6/6, 2020 at 15:16 Comment(0)
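
Since the question also asks about other suitable approaches: when each character is swapped for exactly one other character, Spark's translate function avoids regular expressions altogether. The following is a minimal sketch, assuming a spark-shell session where `spark` is already defined; the name sample is hypothetical, and the rows are the same ones used in the regexp_replace answer.

```scala
import org.apache.spark.sql.functions.{col, translate}
import spark.implicits._ // assumes `spark` is an existing SparkSession, as in spark-shell

// Same sample rows as in the regexp_replace answer.
val sample = Seq((1, "1,3435"), (2, "1,6566"), (3, "-0,34435")).toDF("Id", "x4")

// translate maps each character of the second argument to the character
// at the same position in the third argument, so "," becomes ".".
val translated = sample.withColumn("x4New", translate(col("x4"), ",", "."))
translated.show()
```

Because translate works character by character, it cannot express patterns; regexp_replace remains the better fit when the match is more than a fixed set of single characters.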
