I have this regex:
([-@.\/,':\w]*[\w])*
It matches all words within a text (including punctuated words like I.B.M), but I want it to exclude underscores, and I can't figure out how to do it. I tried adding ^[_], e.g.
(^[_][-@.\/,':\w]*[\w])*
but that just breaks all the words up into letters. I want to preserve the word matching, but I don't want to match words that have underscores in them, or words that are made up entirely of underscores.
What's the proper way to do this?
P.S.
- My app is written in C# (if that makes any difference).
- I can't use A-Za-z0-9 because I have to match words regardless of the language (could be Chinese, Russian, Japanese, German, English).
Update
Here is an example:
"I.B.M should be parsed as one word w_o_r_d! Russian should work too: мплекс исторических событий."
The matches should be:
I.B.M
should
be
parsed
as
one
word
Russian
should
work
too
мплекс
исторических
событий
Note that w_o_r_d should not get matched.
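One possible way to get the matches listed above is to tokenize with the original character set and then discard any token that contains an underscore, instead of trying to exclude the underscore inside the pattern itself. This is only a sketch, written in Python for illustration rather than C#; the names TOKEN and find_words are made up for the example, and the behavior has not been checked against .NET's Regex class.

```python
import re

# Same character set as the original pattern, without the outer (...)* wrapper:
# any run of word characters and the allowed punctuation, ending in a word character.
TOKEN = re.compile(r"[-@./,':\w]*\w")

def find_words(text):
    """Return the tokens found in text, dropping every token that contains '_'."""
    return [tok for tok in TOKEN.findall(text) if "_" not in tok]

sample = ("I.B.M should be parsed as one word w_o_r_d! "
          "Russian should work too: мплекс исторических событий.")
words = find_words(sample)
print(words)
```

Because the underscore check runs on the whole token, w_o_r_d is dropped as a unit rather than being split into single letters, and a token made up only of underscores is dropped as well.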
^[_] should be [^_]. The former will match a _ at the beginning of the string (or line if multiline). – Spline
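The difference the comment describes can be seen in a small sketch. Python is used here only for illustration, and the variable names are made up; the point is just where the ^ sits relative to the brackets.

```python
import re

# ^[_] : the ^ is outside the brackets, so it is an anchor.
# The pattern matches a single "_" only at the start of the string.
anchored = re.findall(r"^[_]", "_a_b")   # ['_']

# [^_] : the ^ is the first character inside the brackets, so it negates the class.
# The pattern matches any single character that is not "_".
negated = re.findall(r"[^_]", "_a_b")    # ['a', 'b']

# With the multiline flag, ^ also matches at the start of each line.
per_line = re.findall(r"^[_]", "_x\n_y", re.MULTILINE)  # ['_', '_']

print(anchored, negated, per_line)
```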