Apply several string transformations in scala
S

5

8

I want to perform several ordered, successive replaceAll(...,...) calls on a string in a functional way in Scala.

What's the most elegant solution? Scalaz welcome! ;)

Sandpit answered 5/7, 2012 at 17:30 Comment(0)
O
13

First, let's get a function out of the replaceAll method:

scala> val replace = (from: String, to: String) => (_:String).replaceAll(from, to)
replace: (String, String) => String => java.lang.String = <function2>

Now you can use the Functor instance for functions, defined in Scalaz. That way you can compose functions using map (or, to make it look better, its Unicode aliases).

It will look like this:

scala> replace("from", "to") ∘ replace("to", "from") ∘ replace("some", "none")
res0: String => java.lang.String = <function1>

If you prefer Haskell-style composition (right to left), use contramap:

scala> replace("some", "none") ∙ replace("to", "from") ∙ replace ("from", "to")
res2: String => java.lang.String = <function1>

You can also have some fun with Category instance:

scala> replace("from", "to") ⋙ replace("to", "from") ⋙ replace("some", "none")
res5: String => java.lang.String = <function1>

scala> replace("some", "none") ⋘ replace("to", "from") ⋘ replace ("from", "to")
res7: String => java.lang.String = <function1>

And applying it:

scala> "somestringfromto" |> res0
res3: java.lang.String = nonestringfromfrom

scala> res2("somestringfromto")
res4: java.lang.String = nonestringfromfrom

scala> "somestringfromto" |> res5
res6: java.lang.String = nonestringfromfrom

scala> res7("somestringfromto")
res8: java.lang.String = nonestringfromfrom
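
For comparison, the same two pipelines can be sketched with only the standard library, using andThen (left to right) and compose (right to left) on Function1. This is a minimal sketch, not part of the original answer; the object name ReplaceChain and the value names leftToRight and rightToLeft are made up for illustration.

```scala
object ReplaceChain {
  // Same curried helper as in the answer: builds a String => String from a regex and a replacement.
  val replace = (from: String, to: String) => (_: String).replaceAll(from, to)

  // andThen applies the left-hand function first, then the next one, and so on.
  val leftToRight: String => String =
    replace("from", "to") andThen replace("to", "from") andThen replace("some", "none")

  // compose applies the right-hand function first, so the steps are written in reverse.
  val rightToLeft: String => String =
    replace("some", "none") compose replace("to", "from") compose replace("from", "to")
}

// ReplaceChain.leftToRight("somestringfromto")  // "nonestringfromfrom"
// ReplaceChain.rightToLeft("somestringfromto")  // "nonestringfromfrom"
```

Both values run the replacements in the same effective order (from, then to, then some), which is why they agree with the Scalaz results above.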
Obreption answered 6/7, 2012 at 7:40 Comment(6)
Bull's eye ! Great and complete answer using Scalaz.Photosynthesis
@M'λ', if you don't want to pull in the Scalaz dependency, you can use andThen instead of ∘ and compose instead of ∙, which are both part of the standard library.Ballroom
It blows my mind that people find ∘ and ∙ readable. Much prefer andThen and compose, I can understand the code without having to look up those symbols (not to mention how difficult it is to look symbols up through google)Takeover
@MikeMcFarland those are "official" function composition symbols en.wikipedia.org/wiki/Function_composition . They are readable because they are recognizable: I don't need to look them up, I already know what they are doing.Obreption
@folone It really does help to know where they come from, thanks. I prefer to get used to symbolic representations, and these concepts seem sensible to have symbols for. However, it's more than a little awkward to use unicode.Takeover
@MikeMcFarland not if you have a proper dev environment for that. There are languages with syntax based on math notation, with all the unicode madness :) en.wikipedia.org/wiki/Agda_(programming_language)Obreption
D
16

If it's just a few invocations then just chain them. Otherwise I guess I'd try this:

Seq("a" -> "b", "b" -> "a").foldLeft("abab"){case (z, (s,r)) => z.replaceAll(s, r)}

Or if you like shorter code with confusing wildcards and extra closures:

Seq("a" -> "b", "b" -> "a").foldLeft("abab"){_.replaceAll _ tupled(_)}
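
The fold can also be wrapped in a small reusable function. The following is a minimal sketch; the names FoldReplace and replaceAllInOrder are hypothetical, and each pair is assumed to be (regex, replacement) exactly as replaceAll expects. Because the replacements are applied one after another, a later rule sees the output of the earlier ones.

```scala
object FoldReplace {
  // Hypothetical helper: threads the string through each (regex, replacement) pair in order.
  def replaceAllInOrder(input: String, rules: Seq[(String, String)]): String =
    rules.foldLeft(input) { case (acc, (pattern, replacement)) =>
      acc.replaceAll(pattern, replacement)
    }
}

// "abab" becomes "bbbb" after the first rule, then "aaaa" after the second.
// FoldReplace.replaceAllInOrder("abab", Seq("a" -> "b", "b" -> "a"))  // "aaaa"
```

With an empty rule list the fold returns the input unchanged.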
Day answered 5/7, 2012 at 18:23 Comment(2)
With Scalaz, one could also do case (str, tup) => tup fold str.replaceAll.Moskowitz
OK, that's a 'standard' and concise Scala solution that suits me well without Scalaz at hand. But I'm sorry, I will accept the Scalaz solution by folone as the best because I was hoping for something like what he proposed!Photosynthesis
N
4

Another Scalaz-based solution to this problem would be to use the Endo monoid. This monoid captures the identity function (as the monoid's identity element) and function composition (as the monoid's append operation). This solution would be particularly useful if you have an arbitrarily-sized (even possibly empty) list of functions to apply.

val replace = (from: String, to: String) => (_:String).replaceAll(from, to)

val f: Endo[String] = List(
  replace("some", "none"),
  replace("to", "from"),
  replace("from", "to")    
).foldMap(_.endo)

e.g. (using one of folone's examples)

scala> f.run("somestringfromto")
res0: String = nonestringfromfrom
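
A rough standard-library counterpart, for readers without Scalaz, is Function.chain, which turns a sequence of A => A functions into one function and yields the identity for an empty sequence. This is a sketch under one assumption worth stating: Function.chain applies the functions in list order (first element first), so the list below is written in application order, the reverse of the Endo example. The names ChainReplace, steps, f and noOp are illustrative only.

```scala
object ChainReplace {
  val replace = (from: String, to: String) => (_: String).replaceAll(from, to)

  // Listed in the order they are applied: from -> to, then to -> from, then some -> none.
  val steps: List[String => String] = List(
    replace("from", "to"),
    replace("to", "from"),
    replace("some", "none")
  )

  // Function.chain folds the input through each function in turn.
  val f: String => String = Function.chain(steps)

  // An empty list yields a function that returns its input unchanged.
  val noOp: String => String = Function.chain(List.empty[String => String])
}

// ChainReplace.f("somestringfromto")  // "nonestringfromfrom"
```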
Nevlin answered 11/3, 2013 at 12:14 Comment(2)
Seeing this today I was thinking of debasishg.blogspot.com/2013/03/… too :)Hexarchy
sourcedelica: That is similar to another answer I wrote a few weeks ago: #14900805Nevlin
S
3

Define a replace function with anonymous parameters and then you can chain successive replace functions together.

scala> val s = "hello world"
s: java.lang.String = hello world

scala> def replace = s.replaceAll(_, _)
replace: (java.lang.String, java.lang.String) => java.lang.String

scala> replace("h", "H")  replace("w", "W")
res1: java.lang.String = Hello World
Sheikh answered 5/7, 2012 at 17:47 Comment(1)
A confused and very misleading REPL session. The 1st replace("h","H") does, indeed, invoke the just defined replaceAll() wrapper, but the 2nd replace("w","W") invokes the java.lang.String.replace() method, which isn't the same as replaceAll().Detritus
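
The difference between the two String methods mentioned in that comment can be seen in a small sketch: replace treats its first argument as literal text, while replaceAll treats it as a regular expression. The object and value names here are illustrative only.

```scala
object ReplaceVsReplaceAll {
  // replace: the "." is matched literally.
  val literal: String = "a.b".replace(".", "-")

  // replaceAll: the "." is a regex that matches any character.
  val regex: String = "a.b".replaceAll(".", "-")

  // Quoting the pattern makes replaceAll behave like a literal replacement.
  val quoted: String = "a.b".replaceAll(java.util.regex.Pattern.quote("."), "-")
}

// ReplaceVsReplaceAll.literal  // "a-b"
// ReplaceVsReplaceAll.regex    // "---"
// ReplaceVsReplaceAll.quoted   // "a-b"
```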
N
-1
To replace or remove multiple substrings in a DataFrame's string column in Scala:

import org.apache.spark.sql.functions.{lit, udf}
// Assumes a SparkSession named spark is in scope, with
// import spark.implicits._ for the $"..." column syntax.

// To find: true if the regex matches anywhere in the string
def isContainingContent(str: String, regexStr: String): Boolean = {
  val regex = new scala.util.matching.Regex(regexStr)
  regex.findFirstIn(str).isDefined
}
val colContentPresent = udf((str: String, regex: String) => {
  isContainingContent(str, regex)
})

// To remove: delete every match of the regex from the string
val cleanPayloadOfRemovableContent = udf((str: String, regexStr: String) => {
  val regex = new scala.util.matching.Regex(regexStr)
  regex.replaceAllIn(str, "")
})

// To define: each alternative separated by | is removed in the same pass
val removableContentRegex =
  "<log:Logs>[\\s\\S]*?</log:Logs>|\\\\n<![\\s\\S]*?-->|<\\?xml[\\s\\S]*?\\?>"

// To call: dfXMLCheck is the input DataFrame with a string column "payload"
val dfPayloadLogPresent = dfXMLCheck.withColumn("logsPresentInit", colContentPresent($"payload", lit(removableContentRegex)))
val dfCleanedXML = dfPayloadLogPresent.withColumn("payload", cleanPayloadOfRemovableContent($"payload", lit(removableContentRegex)))
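
The core idea here, removing several kinds of substring in one pass by joining patterns with |, can be tried on a plain string without Spark. The following is a minimal sketch using a shortened form of the pattern above; XmlCleanup, removable and clean are hypothetical names.

```scala
import scala.util.matching.Regex

object XmlCleanup {
  // Two alternatives joined by |: a <log:Logs> element and an XML declaration.
  val removable: Regex = """<log:Logs>[\s\S]*?</log:Logs>|<\?xml[\s\S]*?\?>""".r

  // Deletes every match of either alternative.
  def clean(payload: String): String = removable.replaceAllIn(payload, "")
}

// XmlCleanup.clean("<?xml version=\"1.0\"?><a><log:Logs>x</log:Logs>b</a>")  // "<a>b</a>"
```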
Nebulize answered 6/3, 2019 at 17:47 Comment(2)
Please consider adding some context around your code so we know what it does and why.Lilialiliaceous
As far as I can tell, this doesn't answer the OP's question. The OP wants several replacements on one string; yours does one replacement on a dataframe of strings. Not at all the same thing.Muhammad