Remove special characters from string in Red language
Asked Answered
P

3

6

I want to remove all characters in a string except:

  • - or _ or .
  • A thru Z
  • a thru z
  • 0 to 9
  • space

On linux command line, using sed I would do this:

$ echo "testing-#$% yes.no" | sed 's/[^-_.a-zA-Z0-9 ]//g'

Output:

testing- yes.no

How can I achieve the same effect in Red language with PARSE? I looked at:

However, I could not codify it. I tried:

>> parse "mystring%^&" [#a - #z #A - #Z #0 - #9]
== false
>> parse "mystring%^&" [#a-#z#A-#Z#0-#9]        
== false
Propeller answered 20/9, 2017 at 18:3 Comment(1)
Remember to use trim if you want to remove some chars. trim/with "testing-#$% yes.no" "-#$%." == "testing yesno"Bournemouth
I
4

First, characters must be in quotes, #a is issue!, char! is #"a". You've got the specification right, but you must pass it to charset function, to make a bitset! form it.

Then you can parse your string, keeping valid characters and skiping invalid:

>> chars: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
== make bitset! #{000000000000FFC07FFFFFE07FFFFFE0}
>> rejoin parse "mystring%^&asdf" [collect some [keep chars | skip]]
== "mystringasdf"
Irradiation answered 20/9, 2017 at 18:24 Comment(0)
T
6

First note the difference between ISSUE! and CHAR!

#a #b #c  ; issues
#"a" #"b" #"c"   ; chars

You can then establish a character set (BITSET! type) either for the characters you want to keep or those you wish to discard. We'll do the former here:

good-chars: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]

Now that we have that, we can approach this in some different ways:

Parse

A fairly basic parse loop—skips any good-chars and removes anything else.

parse "mystring%^&" [any [some good-chars | remove skip]]

Remove-Each

Hopefully self-explanatory:

remove-each char "mystring%^&" [not find good-chars char]
Taishataisho answered 20/9, 2017 at 18:24 Comment(5)
On Red command line, your first answer (parse...) produces ==true as output. The second answer (remove-each...) produces no output. Why is this happening?Propeller
@Propeller Parse returns true if the rule matches. In Red, you can wrap the rule thus: take parse "mystring%^&" [collect [keep any [some good-chars | remove skip]]] but that copies the string. Apparently remove-each returns the product of the last iteration—in Rebol 2 it would return the modified series. Note that both my answers as-is modify the existing string, they do not create new strings.Taishataisho
@Propeller If your string is set to a word, you can use ALSO: also mystring parse mystring [any [...]] or: also mystring remove-each char mystring [...]Taishataisho
Good information. You may add these comments to your answer above since many readers just read answers and not comments.Propeller
@Propeller It's implied in that I answered your question as asked. Your chosen answer actually creates several new strings in the process.Taishataisho
I
4

First, characters must be in quotes, #a is issue!, char! is #"a". You've got the specification right, but you must pass it to charset function, to make a bitset! form it.

Then you can parse your string, keeping valid characters and skiping invalid:

>> chars: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
== make bitset! #{000000000000FFC07FFFFFE07FFFFFE0}
>> rejoin parse "mystring%^&asdf" [collect some [keep chars | skip]]
== "mystringasdf"
Irradiation answered 20/9, 2017 at 18:24 Comment(0)
T
2

An alternative solution to PARSE would be to use REPLACE here with a COMPLEMENT CHARSET:

replace/all "mystring%^&" complement charset [{-_. } #"a" - #"z" #"0" - #"9"] {}

NB. Above works in Rebol (2 & 3). Unfortunately it currently hangs in Red (tested on 0.63 on MacOS).

Tierney answered 21/9, 2017 at 16:3 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.