Regex to exclude non-word Characters but leave spaces

I am trying to write a regex to stop a user from entering invalid characters into a postcode field.

From this link I managed to exclude all "non-word" characters like so:

Regex regex = new Regex(@"[\W_]+");
string cleanText = regex.Replace(messyText, "").ToUpper();

But this also removes the space characters.

I am sure this is possible but I find regex very confusing!

Can someone help out with an explanation of the regex pattern used?

Danielldaniella answered 13/6, 2017 at 12:3 Comment(5)
What are invalid postcode characters? It sounds like you're in the UK, but many states have some sort of equivalent (with different formats). Do they also need validation? - Pacian
Due to the principle of least astonishment, it would also be advisable to validate rather than correct. - Pacian
@Pacian I already validate the postcode using a UK postcode regex. But I would also like to stop the user inputting invalid characters, as I think it gives a better UX. I assumed that postcodes can only be alphanumeric with an optional space. I may be incorrect though. - Danielldaniella
I'd argue that preventing certain inputs makes for a worse UX. I'm not alone. ux.stackexchange.com/questions/1242/… - Pacian
@Pacian Thanks for the link. I will give the UX question a read and consider it. - Danielldaniella

You may use character class subtraction:

[\W_-[\s]]+

It matches one or more non-word and underscore symbols with the exception of any whitespace characters.
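As a rough illustration, the subtraction pattern can be dropped into the code from the question. This is only a sketch: the class name PostcodeCleaner and the method name Clean are made up for the example, and ToUpperInvariant is used instead of ToUpper to avoid culture-specific casing, which is a choice of this sketch rather than something the question requires.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named only for this example.
public static class PostcodeCleaner
{
    // [\W_-[\s]]+ : non-word characters and underscores, minus whitespace.
    // Character class subtraction is .NET-specific syntax.
    private static readonly Regex Invalid = new Regex(@"[\W_-[\s]]+");

    public static string Clean(string messyText)
    {
        // Remove every invalid run, then upper-case the remainder.
        return Invalid.Replace(messyText, "").ToUpperInvariant();
    }
}
```

For example, PostcodeCleaner.Clean("sw1a 1a-a!") drops the hyphen and the exclamation mark but keeps the space.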

To exclude just horizontal whitespace characters use [\p{Zs}\t] in the subtraction part:

[\W_-[\p{Zs}\t]]+

To exclude just vertical whitespace characters (line break chars) use [\n\v\f\r\u0085\u2028\u2029] in the subtraction part:

[\W_-[\n\v\f\r\u0085\u2028\u2029]]+

A solution that does not rely on character class subtraction (and is therefore more portable) is:

[^\w\s]+

It matches one or more characters other than word and whitespace characters. Note that this will not match _, because the underscore counts as a word character. That matters in string tokenization scenarios, where (?:[^\w\s]|_)+ or [_\W-[\s]] is preferable.
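The underscore difference is easy to miss, so here is a small comparison. It is a sketch only; the class and method names are hypothetical and exist just to put the two patterns side by side.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, named only for this example.
public static class UnderscoreDemo
{
    // Portable form: removes anything that is neither a word character
    // nor whitespace. The underscore is a word character, so it stays.
    public static string StripPortable(string input)
    {
        return Regex.Replace(input, @"[^\w\s]+", "");
    }

    // Variant with an explicit alternative for the underscore,
    // so underscores are removed as well.
    public static string StripWithUnderscore(string input)
    {
        return Regex.Replace(input, @"(?:[^\w\s]|_)+", "");
    }
}
```

With the input "AB_1 2C!", StripPortable leaves the underscore in place, while StripWithUnderscore removes it along with the exclamation mark.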

Sister answered 13/6, 2017 at 12:8 Comment(0)

You can invert your character class to make it a negated character class like this:

[^\sa-zA-Z0-9]+

This will match any character except a whitespace or alphanumerical character.
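One practical consequence, sketched below under the assumption that the pattern is used for replacement as in the question: because the class lists only ASCII letters and digits, it also matches underscores and non-ASCII letters, which a \w-based class would leave alone. The class and method names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, named only for this example.
public static class AsciiClassDemo
{
    // Explicit ASCII ranges: everything else except whitespace is removed,
    // including '_' and letters such as U+00E9.
    public static string StripAsciiOnly(string input)
    {
        return Regex.Replace(input, @"[^\sa-zA-Z0-9]+", "");
    }

    // \w-based class for comparison: in .NET, \w covers Unicode letters,
    // digits and the underscore, so those characters are kept.
    public static string StripUnicodeAware(string input)
    {
        return Regex.Replace(input, @"[^\w\s]+", "");
    }
}
```

For a UK postcode field the ASCII-only behaviour is probably the desired one, but that depends on which inputs you consider valid.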

RegEx Demo (as this is not a .NET regex)

Orchestra answered 13/6, 2017 at 12:7 Comment(0)

Assuming valid postcodes comprise only alphanumeric characters, you may replace anything but alphanumerics and spaces with an empty string:

Regex regex = new Regex(@"[^a-zA-Z0-9\s]");
string cleanText = regex.Replace(messyText, "").ToUpper();

Please note that \s includes tabs, newlines and some other non-printable characters, which you may not want to treat as valid. If that is the case, just list the space character literally:

[^a-zA-Z0-9 ]
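To make the difference concrete, here is a minimal sketch comparing the two classes on input that contains a tab and a newline. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, named only for this example.
public static class WhitespaceDemo
{
    // \s in the negated class: tabs and newlines survive the replacement.
    public static string KeepAnyWhitespace(string input)
    {
        return Regex.Replace(input, @"[^a-zA-Z0-9\s]", "");
    }

    // Literal space in the negated class: only U+0020 survives.
    public static string KeepSpacesOnly(string input)
    {
        return Regex.Replace(input, @"[^a-zA-Z0-9 ]", "");
    }
}
```

Given "SW1A\t1AA\n", the first method returns the string unchanged, while the second strips the tab and the newline.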
Cote answered 13/6, 2017 at 12:7 Comment(0)

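This regex will match everything except letters, digits, and whitespace.

[^\w\s\d]|_

The ^ at the start of the [ ] negates the class, so it matches any character that is not a word character, whitespace, or digit. The |_ alternative is needed because \w treats the underscore as a word character. (\d is redundant here, since \w already includes digits.)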

Unroot answered 16/2, 2021 at 21:38 Comment(0)
