How to ensure replaceAll will replace a whole word and not a subString

I have a dictionary as input. I iterate over it and replace every occurrence of each key in the text with its value. However, replaceAll also replaces the key when it appears as a substring of a larger token.

How can I make sure the key is matched only as a whole word and not as a substring?

String text = "Synthesis of 1-(2,6-dimethylbenzyl)-1H-indole-6-carboxylic acid [69-3] The titled compound (883 mg) sdvfshd[69-3]3456 as a white solid was prepared";

dictionary = {[69-3]=1-(2,6-dimethylbenzyl)-1H-indole-6-carboxylic acid}

for (Map.Entry<String, String> entry : dictionary.entrySet()) {
    text = text.replaceAll("\\b" + Pattern.quote(entry.getKey()) + "\\b", entry.getValue());
}
Loreanloredana answered 9/9, 2014 at 6:54 Comment(2)
Have you tried checking the elements for equality before replacing? - Tav
I didn't understand the question. What do you mean by equal? I am running the replacement on the entire text and haven't tokenized it. - Loreanloredana

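replaceAll takes a regular expression as its first parameter.

In regular expressions, a word boundary is written \b (use \\b in a Java string literal). Word boundaries are the best way to ensure you're matching a whole word and not part of one: "\\bword\\b"

But in your case you can't use word boundaries, because you're not looking for a word ([69-3] isn't a word). \b only matches between a word character and a non-word character, so it never matches between a space and [, while it does match between a letter and [.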
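To make that concrete, here is a minimal sketch of what \b does with a plain word and with the bracketed key. The class and method names (WordBoundaryDemo, replaceWithBoundaries) are made up for illustration, and the input strings are shortened versions of the text in the question.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Hypothetical helper: wraps the quoted key in \b on both sides,
    // exactly as the loop in the question does.
    static String replaceWithBoundaries(String text, String key, String value) {
        return text.replaceAll("\\b" + Pattern.quote(key) + "\\b", value);
    }

    public static void main(String[] args) {
        // A plain word: \b keeps "concat" untouched.
        System.out.println(replaceWithBoundaries("cat concat cat.", "cat", "dog"));
        // prints: dog concat dog.

        // A bracketed key surrounded by spaces: there is no boundary
        // between ' ' and '[', so nothing is replaced.
        System.out.println(replaceWithBoundaries("acid [69-3] The", "[69-3]", "X"));
        // prints: acid [69-3] The

        // The same key glued to word characters: here \b DOES match,
        // so the unwanted occurrence is the one that gets replaced.
        System.out.println(replaceWithBoundaries("sdvfshd[69-3]3456", "[69-3]", "X"));
        // prints: sdvfshdX3456
    }
}
```

So for keys that start and end with punctuation, \b ends up selecting exactly the occurrences you wanted to skip.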

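I suggest this (note that the check before the key has to be a lookbehind, (?<=...), because a lookahead placed there would only inspect the key's own first character):

text = text.replaceAll("(?<=\\W|^)" + Pattern.quote("[69-3]") + "(?=\\W|$)", ...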

The idea is to require, on each side of the key, either a non-word character or the start or end of the string. I can't guarantee this will be the right solution for you, though: such a pattern must be tuned to the exact, complete use case.
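As a rough sketch of how this could be plugged into the loop from the question: the version below expresses the same idea with negative lookarounds, (?<!\\w) and (?!\\w), which is my own variant and assumes that "whole word" means "not directly adjacent to a letter, digit or underscore". It also passes the value through Matcher.quoteReplacement, since replaceAll treats $ and \ in the replacement specially. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeTokenReplace {
    // Hypothetical helper: replaces each dictionary key only where it is
    // not directly preceded or followed by a word character.
    static String replaceWholeTokens(String text, Map<String, String> dictionary) {
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            String regex = "(?<!\\w)" + Pattern.quote(entry.getKey()) + "(?!\\w)";
            // quoteReplacement keeps '$' and '\' in the value literal.
            text = text.replaceAll(regex, Matcher.quoteReplacement(entry.getValue()));
        }
        return text;
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("[69-3]", "NAME");

        String text = "acid [69-3] The sdvfshd[69-3]3456 end";
        System.out.println(replaceWholeTokens(text, dictionary));
        // prints: acid NAME The sdvfshd[69-3]3456 end
    }
}
```

Keep in mind that with several keys, a later iteration can also match inside a value inserted by an earlier one; whether that matters depends on your data.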

Note that if all your keys follow a similar pattern, there might be a better solution than iterating through a dictionary: you might, for example, use a single pattern like "(?<=\\W|^)\\[\\d+-\\d+\\](?=\\W|$)" (again with a lookbehind, not a lookahead, in front of the key).
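A possible shape for that single-pattern approach, assuming every key looks like [digits-digits]: scan the text once, look each match up in the dictionary, and leave unknown keys as they are. The names SinglePassReplace and replaceKeys are hypothetical, and the negative lookarounds are my own choice of edge check.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SinglePassReplace {
    // Assumption: all keys have the form [digits-digits] and must not be
    // directly adjacent to a word character.
    private static final Pattern KEY =
            Pattern.compile("(?<!\\w)\\[\\d+-\\d+\\](?!\\w)");

    static String replaceKeys(String text, Map<String, String> dictionary) {
        Matcher m = KEY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Fall back to the matched text when the key is not in the dictionary.
            String value = dictionary.getOrDefault(m.group(), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("[69-3]", "NAME");

        System.out.println(replaceKeys("a [69-3] b [70-1] c x[69-3]9", dictionary));
        // prints: a NAME b [70-1] c x[69-3]9
    }
}
```

Because the text is scanned only once, inserted values are never re-examined, and the cost no longer grows with the number of dictionary entries.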

Thormora answered 9/9, 2014 at 6:55 Comment(2)
I am using Pattern.quote to keep the special characters intact, but even that doesn't work. I have also tried \\b, and it doesn't work either. - Loreanloredana
@Loreanloredana Please add the code that doesn't work to your question. - Enrol
"\\bword\\b" is working for me.

Sample code:

val rc_applied_hiveQuery = "select * from emp_details_Spark2 where empid_details= 'empid' limit 10"
for (row <- df.rdd.collect) {
  // Each row is assumed to hold a key/value pair, presumably ("empid", "5") here
  val fields = row.mkString(",").split(",")
  val config_key = fields(0)
  val config_value = fields(1)
  // \\b on both sides leaves the "empid" inside "empid_details" untouched
  val str_row = rc_applied_hiveQuery.replaceAll("\\b" + config_key + "\\b", config_value)
  println(str_row)
}

Output : select * from emp_details_Spark2 where empid_details= '5' limit 10

Wellintentioned answered 13/11, 2018 at 16:53 Comment(0)
