Regular Expression - replace word except within a URL/URI

I'm writing a globalization module for a web application, and I need a regular expression that replaces every instance of a word with another word (its translation), except where the word appears inside a URL/URI.

EDIT: I forgot to mention that I'm using Ruby, so I can't use lookbehind.

Asked by License on 29/1/2010 at 15:20. Comments (3):
Predikant: Doing translation by word replacement is doomed to failure.
License: I've tried '/((?<=>|^)[^<]*)(\bfoo\b)([^<]*(?=<|$))/i', but it requires lookbehind, which Ruby doesn't support.
Ritz: Ruby 1.9 supports lookbehind. Are you using 1.8?

  • Split on a URI regular expression, keeping the URIs in the result (split includes captured groups in its output).
  • For each piece:
    • if it is a URI, leave it alone;
    • otherwise, do the word replacement.
  • Join the pieces back together.

Code:

# From RFC 3986 Appendix B, with these modifications:
#   o Whitespace disallowed (so a URI can't run across a line break)
#   o All groups non-capturing, except for the added outermost group,
#     which makes split keep the URIs
#   o Not anchored
#   o Scheme required
#   o Authority required
URI_REGEX = %r"((?:(?:[^\s:/?#]+):)(?://(?:[^\s/?#]*))(?:[^\s?#]*)(?:\?(?:[^\s#]*))?(?:#(?:\S*))?)"

def replace_except_uris(text, old, new)
  text.split(URI_REGEX).collect do |s|
    if s =~ URI_REGEX
      s
    else
      s.gsub(old, new)
    end
  end.join
end

text = <<END
stack http://www.stackoverflow.com stack
stack http://www.somewhere.come/stack?stack=stack#stack stack
END

puts replace_except_uris(text, /stack/, 'LINKED-LIST')

# => LINKED-LIST http://www.stackoverflow.com LINKED-LIST
# => LINKED-LIST http://www.somewhere.come/stack?stack=stack#stack LINKED-LIST
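
A variant that avoids both lookbehind and the split/join step is to match either a URI or the word in a single alternation, and have the gsub block hand URIs back unchanged. This is a minimal sketch: URI_PATTERN and replace_outside_uris are hypothetical names, and the URI pattern (a scheme, then "://", then non-whitespace) is a simplifying assumption rather than the RFC 3986 pattern above.

```ruby
# Simplified URI pattern (assumption, not RFC 3986 complete):
# a scheme, "://", then a run of non-whitespace characters.
URI_PATTERN = %r{[a-z][a-z0-9+.\-]*://\S+}i

# Replace +word+ (as a whole word) with +replacement+ everywhere except
# inside URIs. Group 1 is set only when the URI branch matched; in that
# case the block returns the URI itself, so it is left untouched.
def replace_outside_uris(text, word, replacement)
  pattern = /(#{URI_PATTERN})|\b#{Regexp.escape(word)}\b/
  text.gsub(pattern) { Regexp.last_match(1) || replacement }
end

puts replace_outside_uris("stack http://www.stackoverflow.com/stack stack", "stack", "LINKED-LIST")
# => LINKED-LIST http://www.stackoverflow.com/stack LINKED-LIST
```

Because the URI branch is tried first at each position, a word that sits inside a URI is consumed as part of the URI match and never reaches the replacement branch.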
Answered by Rerun on 31/1/2010 at 12:38.

You can probably use something like

(?<!://[^ ]*)\bfoo\b

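This probably isn't perfect: it only checks that the word isn't preceded by :// within the same run of non-space characters. It also relies on variable-length lookbehind, which the .NET engine behind PowerShell's -replace supports but Ruby's regex engine does not, so this pattern won't compile in Ruby.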

PS Home:\> "foo foobar http://foo_bar/baz?gak=foobar baz foo" -replace '(?<!://[^ ]*)\bfoo\b', 'FOO'
FOO foobar http://foo_bar/baz?gak=foobar baz FOO
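
Ruby can't compile that variable-length lookbehind, so here is a minimal sketch of the same heuristic without it: split each non-whitespace run at its first "://" and replace the word only in the part before it. The method name is hypothetical, and unlike the pattern above (which uses [^ ], spaces only) it treats any whitespace as a run boundary.

```ruby
# Replace +word+ with +replacement+ only where no "://" precedes it
# within the same run of non-whitespace characters.
def replace_before_scheme_separator(text, word, replacement)
  word_re = /\b#{Regexp.escape(word)}\b/
  text.gsub(/\S+/) do |token|
    # partition splits at the first "://"; sep and tail are "" if absent.
    head, sep, tail = token.partition("://")
    # The block form inserts the replacement literally (no \1 expansion).
    head.gsub(word_re) { replacement } + sep + tail
  end
end

puts replace_before_scheme_separator("foo foobar http://foo_bar/baz?gak=foobar baz foo", "foo", "FOO")
# => FOO foobar http://foo_bar/baz?gak=foobar baz FOO
```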
Answered by Fulmar on 29/1/2010 at 15:26.

Have you tried splitting your text into words and iterating over them? Then you can examine each word, determine whether it's a URI, and translate it if it isn't.
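
A minimal sketch of that word-by-word approach follows. The translations hash, the translate_words name, and the URI_LIKE test (a token counts as a URI if it starts with a scheme followed by "://") are illustrative assumptions; a real module would likely need a more careful URI check.

```ruby
# Assumption: a token is a URI if it begins with an RFC 3986-style scheme
# (a letter, then letters/digits/+/./-) followed by "://".
URI_LIKE = %r{\A[a-z][a-z0-9+.\-]*://}i

# Translate words outside URIs using a hash such as { "stack" => "pila" }.
def translate_words(text, translations)
  # The capturing group makes split keep the whitespace runs, so join
  # reproduces the original spacing exactly.
  text.split(/(\s+)/).map do |token|
    if token.match?(URI_LIKE)
      token # leave URIs untouched
    else
      # Whitespace runs contain no \w characters and pass through unchanged;
      # punctuation attached to a word (e.g. "stack.") is preserved.
      token.gsub(/\w+/) { |w| translations.fetch(w, w) }
    end
  end.join
end

puts translate_words("stack http://stackoverflow.com/stack stack.", { "stack" => "pila" })
# => pila http://stackoverflow.com/stack pila.
```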

Answered by Ritz on 29/1/2010 at 17:11.
