I needed something programmable that also handles punctuation, brackets, etc.
http://jsfiddle.net/AQvyd/
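var wordToReplace = '買い手',
    replacementWord = '[[BUYER]]',
    text = 'Mange 買い手 information. The selected Store and Classification will be the default on the สั่งซื้อ.';
// Replace whole-word occurrences, keeping the delimiters around each match.
function replaceWord(text, wordToReplace, replacementWord) {
    // Escape regexp metacharacters so the word is matched literally.
    var escaped = wordToReplace.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    var re = new RegExp('(^|\\s|\\(|\'|"|,|;)' + escaped + '($|\\s|\\)|\\.|\'|"|!|,|;|\\?)', 'gi');
    // Put the captured delimiters back; replacing the whole match would drop them.
    return text.replace(re, function (match, before, after) {
        return before + replacementWord + after;
    });
}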
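One thing to watch with this kind of pattern: if the trailing delimiter is consumed by the match, two occurrences separated by a single space cannot both match, because the second one has lost its leading delimiter. Here is a rough sketch of a variant that checks the trailing delimiter with a lookahead instead. The name `replaceWordKeepingDelimiters` is made up for illustration, and the delimiter lists are simply copied from the regexp above.

```javascript
// Hypothetical helper; delimiter lists are copied from the regexp in the answer.
function replaceWordKeepingDelimiters(text, wordToReplace, replacementWord) {
    // Escape regexp metacharacters so the word is matched literally.
    var escaped = wordToReplace.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // Group 1 captures the leading delimiter. The lookahead only checks the
    // trailing delimiter without consuming it, so it is still available as
    // the leading delimiter of the next occurrence.
    var re = new RegExp(
        '(^|\\s|\\(|\'|"|,|;)' + escaped + '(?=$|\\s|\\)|\\.|\'|"|!|,|;|\\?)',
        'gi'
    );
    // A callback is used so that "$" in replacementWord is not treated specially.
    return text.replace(re, function (match, before) {
        return before + replacementWord;
    });
}

var sample = 'Mange 買い手 information (買い手 買い手).';
console.log(replaceWordKeepingDelimiters(sample, '買い手', '[[BUYER]]'));
// → Mange [[BUYER]] information ([[BUYER]] [[BUYER]]).
```

The word is left alone when it is glued to other characters, e.g. `'買い手x'` comes back unchanged.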
I've written a JavaScript resource editor, which is how I found this page. I answered out of necessity, since I couldn't find a parameterized word-boundary regexp that worked well for Unicode.
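To illustrate why the built-in `\b` doesn't help here: it is defined in terms of `\w`, which only covers `[A-Za-z0-9_]`, so there is no boundary between a space and a CJK character. The second half of the sketch is an alternative, not what this answer uses: Unicode property escapes with lookarounds, which assumes an engine with ES2018 support. The character classes chosen (`\p{L}`, `\p{M}`, `\p{N}`, `_`) are my own assumption of what counts as a "word character".

```javascript
var sampleText = 'Mange 買い手 information.';

// \b relies on \w, which is ASCII-only, so it never fires around 買い手.
console.log(/\b買い手\b/.test(sampleText)); // false
console.log(/\bMange\b/.test(sampleText));  // true

// Alternative (assumes ES2018): treat letters, marks, digits and "_" in any
// script as word characters, and require that none of them touches the word.
var unicodeWord = /(?<![\p{L}\p{M}\p{N}_])買い手(?![\p{L}\p{M}\p{N}_])/u;
console.log(unicodeWord.test(sampleText));   // true
console.log(unicodeWord.test('買い手数料')); // false, 数 is a letter
```

Note that scripts written without spaces (Japanese, Thai) may make the property-escape version too strict for running text; the explicit delimiter list is more predictable for resource strings.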
UTF-8 for Unicode. According to the standard, an implementation may use either UCS-2 or UTF-16, I believe. This means either you are operating on text that has been converted to one of these formats, or you could be operating on text where each "octet" (byte) of each Unicode codepoint has been converted to one of these formats, depending on how your code gets the text. – Microampere
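A quick way to see the 16-bit code units the comment is talking about, as a small sketch (the sample characters are arbitrary): a character inside the Basic Multilingual Plane takes one code unit, while one outside it takes a surrogate pair.

```javascript
var bmpChar = '\u8CB7';          // 買, U+8CB7, inside the BMP
var astralChar = '\uD83D\uDE00'; // U+1F600, outside the BMP

// length counts 16-bit code units, not code points.
console.log(bmpChar.length);    // 1
console.log(astralChar.length); // 2

// charCodeAt sees the individual surrogates; codePointAt decodes the pair.
console.log(astralChar.charCodeAt(0).toString(16));  // d83d
console.log(astralChar.codePointAt(0).toString(16)); // 1f600

// Iterating the string (as Array.from does) walks code points.
console.log(Array.from(astralChar).length); // 1
```

For the words in the answer this makes no difference, since they are all BMP characters, but it matters if the word to replace can contain characters beyond U+FFFF.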