Remove all special characters with RegExp
P

11

312

I would like a RegExp that will remove all special characters from a string. I am trying something like this but it doesn’t work in IE7, though it works in Firefox.

var specialChars = "!@#$^&%*()+=-[]\/{}|:<>?,.";

for (var i = 0; i < specialChars.length; i++) {
  stringToReplace = stringToReplace.replace(new RegExp("\\" + specialChars[i], "gi"), "");
}

A detailed description of the RegExp would be helpful as well.

Poignant answered 7/12, 2010 at 8:47 Comment(10)
Something like this would be better off as a white-list, not a black-list. Then you could just do [a-z]|[0-9]|\sDescendent
Any script error? Did you debug? Or else put a try...catch block in the javascript code.Jeri
@Ape-inago, can you please explain RegExp to me a bit more?Poignant
Please define "special character"! Is "風" special for you? (Thinking about this you'll see @Ape-inago's point.)Rhineland
look at my variable specialChars. Anything like that.Poignant
What about "!@#$^&%*()+=ー"? (No, these are not the same as above.) :-PRhineland
@Rhineland I do realise that there are like 300 ASCII characters; these characters were for the example. I didn't know about RegExp and that I could do a white list.Poignant
@Timothy Better try 109,000+ characters supported by Unicode, which is what JavaScript uses internally. Just a piece of general, well-intentioned advice: whenever you think "special characters", be a little more precise. :-)Rhineland
Well, I am sorry for not knowing everything, decezePoignant
I don't think anyone here meant any offence. I've got burned before by doing it as a blacklist since there always are those little "gotcha's" that end up getting through (like deceze's examples). Ultimately the correct approach is more about why you are trying to do this.Descendent
M
757
var desired = stringToReplace.replace(/[^\w\s]/gi, '')

As was mentioned in the comments it's easier to do this as a whitelist - replace the characters which aren't in your safelist.

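The caret (^) at the start of the set [...] negates it, so the pattern matches every character that is not in the safelist. The flags gi mean global and case-insensitive (the latter is redundant here, since \w already covers both cases, but I wanted to mention it). The safelist in this example is word characters (\w: the letters a-z and A-Z, digits and the underscore) and whitespace (\s).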
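As a rough illustration of the whitelist approach, here is a small sketch. The helper name stripSpecials and the sample strings are made up for this example and are not part of the question.

```javascript
// Hypothetical helper: keep only word characters (\w) and whitespace (\s).
function stripSpecials(input) {
  // [^...] negates the set, so everything that is NOT \w or \s is matched;
  // the g flag replaces every match rather than only the first one.
  return input.replace(/[^\w\s]/g, '');
}

console.log(stripSpecials('Hello, World! (test_1) #42'));
// "Hello World test_1 42"  (the underscore survives because \w includes it)

console.log(stripSpecials('café'));
// "caf"  (\w is ASCII-only, so the accented letter is removed too)
```

The second call shows the main limitation: anything outside a-z, A-Z, 0-9 and _ counts as "special" for this pattern.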

Moonlit answered 7/12, 2010 at 8:55 Comment(9)
This solution does not work for non-English symbols, "Їжак" for example.Disarticulate
You can also use uppercase \W instead of ^\w. \W : Matches any non-word character. Equivalent to [^A-Za-z0-9_]. developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/…Portsmouth
@Disarticulate I have added an answer which handles Unicodes.Essentiality
To accept accented words, as in Portuguese, do this: stringToReplace.replace(/[^a-zÀ-ú\s]/gi, '')Automatic
this replaces also chinese characters, how to exclude them from this replace?Wit
To add most European languages (Norwegian, Swedish, German, Portuguese, Spanish): stringToReplace.replace(/[^\w\s\xc0-\xff]/gi, ''). To include other languages, Unicode ranges can be used. See: #150533Ghislainegholston
best for me considering I don't want any accents / specials. I don't even want space, I removed \sClypeate
var sessionName = '\ / ? * [ ]' sessionName.replace(/[^\w\s]/gi, '-'); When I try your script, it should return six "-" characters, but it returns only five. It seems to skip *. Why?Puseyism
This looks like skipping _Heteromorphic
D
170

Note that if you still want to exclude a set, including things like slashes and special characters you can do the following:

var outString = sourceString.replace(/[`~!@#$%^&*()_|+\-=?;:'",.<>\{\}\[\]\\\/]/gi, '');

Take special note that in order to include the "minus" character in the set, you need to escape it with a backslash, as with the brackets and slashes at the end of the set. If you don't, +-= is read as a range from + to =, which also selects 0-9 and is probably undesired.
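To make the hyphen pitfall concrete, here is a minimal sketch comparing the two spellings. The variable names and the sample string are invented for the illustration.

```javascript
// Unescaped: "+-=" is read as the range U+002B (+) through U+003D (=),
// which contains the digits 0-9 and several punctuation marks.
var unescaped = /[+-=]/g;
// Escaped: matches only the three literal characters +, - and =.
var escaped = /[+\-=]/g;

var sample = 'a+b-c=2019';
console.log(sample.replace(unescaped, '')); // "abc"
console.log(sample.replace(escaped, ''));   // "abc2019"
```

Placing the hyphen first or last in the set (for example [+=-]) also makes it literal, but escaping it is harder to break by accident when the set is edited later.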

Dorm answered 18/6, 2012 at 20:10 Comment(5)
excellent solution! the accepted answer only works in English, this works on any languages (as far as I checked). thanks :)Seaver
@knutole remove the ? from the character set portion towards the front. this lists the characters you want to remove, so excluding it from being stripped will inherently include it in the final result.Dorm
This works great, fits perfectly for any language, just need to add the char that you want replace and that's all. Thanks.Carnation
How would I implement this on a search input? How do I test the input against this RegEx?Designed
By the way, there is no need to escape { and }. Like: var outString = sourceString.replace(/[`~!@#$%^&*()_|+\-=?;:'",.<>{}\[\]\\\/]/gi, '');Mach
E
31

Plain JavaScript regex shorthand classes such as \w only cover ASCII; they do not handle Unicode letters.

Do not use [^\w\s]: it removes letters with accents (like àèéìòù), and letters from scripts such as Cyrillic or Chinese are removed completely.

You really don't want to remove these letters together with all the special characters. You have two options:

  • Add to your regex all the extra letters you don't want to remove,
    for example: [^èéòàùì\w\s].
  • Have a look at xregexp.com. XRegExp adds base support for Unicode matching via the \p{...} syntax.

var str = "Їжак::: résd,$%& adùf"
var search = XRegExp('[^\\pL ]+'); // anything that is not a Unicode letter or a space
var res = XRegExp.replace(str, search, '', 'all');

console.log(res); // returns "Їжак résd adùf"
console.log(str.replace(/[^\w\s]/gi, '') ); // returns " rsd adf"
console.log(str.replace(/[^\wèéòàùì\s]/gi, '') ); // returns " résd adùf"
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.js"></script>
Essentiality answered 27/11, 2016 at 17:25 Comment(3)
Good to know for internationalization, i had no idea JS regex wasn't UTF-8 minded.Thrilling
You can't put all valid UTF-8 letters into var strDisarticulate
@Disarticulate yes, but if you're not writing a worldwide-compatible application, you can pragmatically list only the valid UTF-8 letters for your current localizations. In my case, for Italian, there are only a few letters.Essentiality
T
14

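Using \W or [a-z0-9] in the regex won't work for non-English languages such as Chinese.

It's better to list all the special characters in the regex and strip them from the given string. The hyphen is escaped so that ?-_ is not read as a range, which would also strip the uppercase letters A-Z:

str.replace(/[~`!@#$%^&*()+={}\[\];:'"<>.,\/\\?\-_]/g, '');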
Toulouse answered 18/5, 2021 at 11:53 Comment(0)
D
11

The first solution does not work for non-Latin alphabets (it cuts text such as Їжак). I have managed to create a function that does not use RegExp and instead relies on the Unicode support in the JavaScript engine. The idea is simple: if a symbol is the same in uppercase and lowercase, it is treated as a special character. The only exception is made for whitespace. Digits have no case either, so they are removed as well.

function removeSpecials(str) {
    var lower = str.toLowerCase();
    var upper = str.toUpperCase();

    var res = "";
    for(var i=0; i<lower.length; ++i) {
        if(lower[i] != upper[i] || lower[i].trim() === '')
            res += str[i];
    }
    return res;
}

Update: Please note that this solution works only for languages that have lowercase and capital letters. In languages like Chinese, this won't work.

Update 2: I came to the original solution when I was working on a fuzzy search. If you are also trying to remove special characters to implement search functionality, there is a better approach. Use any transliteration library that produces a string made only of Latin characters, and then a simple RegExp will do all the magic of removing special characters. (This will work for Chinese as well, and you also get the side benefit of making Tromsø == Tromso.)
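Short of a full transliteration library, a partial approximation is possible with the built-in String.prototype.normalize. Note that this is a swapped-in technique, not the transliteration the update describes: it only folds characters that decompose into a base letter plus a combining mark (é, ù), so ø and Chinese characters are left untouched. The helper name foldAccents is hypothetical.

```javascript
// Hypothetical helper, NOT a real transliteration library.
function foldAccents(str) {
  return str
    .normalize('NFD')                 // split "é" into "e" + combining accent
    .replace(/[\u0300-\u036f]/g, ''); // drop the combining diacritical marks
}

console.log(foldAccents('résumé'));
// "resume"

// After folding, the simple ASCII whitelist keeps the base letters.
console.log(foldAccents('résd, adùf!').replace(/[^\w\s]/g, ''));
// "resd aduf"
```

Because ø has no canonical decomposition, foldAccents('Tromsø') returns the string unchanged, which is where a real transliteration library would still be needed.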

Disarticulate answered 21/10, 2014 at 8:50 Comment(4)
Excellent, I like this answer! I use it for creating valid filenames and have extended your solution to remove spaces (Linux/Unix compatible) and allow numbers as well. So I extended the if statement (jQuery involved): if(str[i] !== ' ' && (lower[i] != upper[i] || lower[i].trim() === '' || $.isNumeric(str[i])))Calmas
in many languages there are no uppercase letters... therefore the function will consider valid input as special charactersChrisom
Chinese characters are one example that get stripped out by thisLilllie
When I created this solution, unfortunately, I was not thinking about languages like Chinese. The solution has to be proposed, as the previous answers won't work either.Disarticulate
N
2

I use RegexBuddy for debugging my regexes; it supports almost all languages and is very useful. Then I copy/paste for the targeted language. Terrific tool and not very expensive.

So I copy/pasted your regex, and your issue is that [ and ] are special characters in regex, so you need to escape them. So the regex should be: /[!@#$^&%*()+=\-\x5B\x5D\/{}|:<>?,.]/g

Novelia answered 7/12, 2010 at 8:54 Comment(0)
T
2

str.replace(/\s|[0-9_]|\W|[#$%^&*()]/g, "") I did something like this. But some people did it much more simply, like str.replace(/[\W_]/g, "");

Thorianite answered 22/6, 2017 at 21:16 Comment(1)
Most of the things in your approach are redundant, since \W contains some of the characters. But why would you filter out numbers? Those aren’t special characters.Emboss
N
1

Removing all characters except letters and numbers:

str.replace(/[^\p{L}\d]+/gu, '')

If you need to leave spaces:

str.replace(/[^\p{L}\d\s]+/gu, '')
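A quick sketch of both patterns in action, assuming an engine with Unicode property escapes (ES2018 or later). The helper names are hypothetical and the sample string is adapted from another answer on this page.

```javascript
// Hypothetical helper names, used only for this sketch.
// \p{L} matches any Unicode letter; the u flag is required for \p{...}.
function keepLettersAndDigits(s) {
  return s.replace(/[^\p{L}\d]+/gu, '');
}
function keepLettersDigitsSpaces(s) {
  return s.replace(/[^\p{L}\d\s]+/gu, '');
}

const sample = 'Їжак::: résd,$%& adùf 42';
console.log(keepLettersDigitsSpaces(sample)); // "Їжак résd adùf 42"
console.log(keepLettersAndDigits(sample));    // "Їжакrésdadùf42"
```

Note that \d still matches only the ASCII digits 0-9; \p{N} could be used instead if digits from other scripts should be kept as well.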
Near answered 26/10, 2023 at 14:29 Comment(0)
H
0

@Seagull's answer (https://mcmap.net/q/24958/-remove-all-special-characters-with-regexp) looks good, but you get the string "undefined" in the result when there are some special (Turkish) characters. See the example below.

let str="bənövşəyi 😟пурпурный İdÖĞ";

I slightly improved it and patched it with an undefined check.

function removeSpecials(str) {
    let lower = str.toLowerCase();
    let upper = str.toUpperCase();

    let res = "",i=0,n=lower.length,t;
    for(i; i<n; ++i) {
        if(lower[i] !== upper[i] || lower[i].trim() === ''){
            t=str[i];
            if(t!==undefined){
                res +=t;
            }
        }
    }
    return res;
}
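The "undefined" comes from case conversion changing the string length: lowercasing İ yields two code units, so lower, upper and str no longer line up index by index. A possible alternative, sketched here under the hypothetical name removeSpecialsPerChar, converts one character at a time so that no index can drift. It shares the limitation of the case-comparison idea: digits and letters from caseless scripts are still removed.

```javascript
// Hypothetical variant: compare each character with its own case forms.
function removeSpecialsPerChar(str) {
  let res = "";
  // for...of walks the string by code point, so emoji stay in one piece.
  for (const ch of str) {
    const hasCase = ch.toLowerCase() !== ch.toUpperCase();
    const isWhitespace = ch.trim() === "";
    if (hasCase || isWhitespace) {
      res += ch;
    }
  }
  return res;
}

console.log(removeSpecialsPerChar("İd! 😟ok")); // "İd ok"
```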
Heaps answered 14/4, 2022 at 10:17 Comment(1)
Wow, that's a pretty amazing idea. I'd implement it differently. However, it doesn't support languages that have no uppercase letters, such as Hebrew, Arabic, Chinese, etc.Audreyaudri
P
0
text.replace(/[`~!@#$%^*()_|+\-=?;:'",.<>\{\}\[\]\\\/]/gi, '');
Pall answered 14/9, 2022 at 5:58 Comment(0)
C
-1

Why don't you do something like:

var re = /^[a-z0-9 ]*$/i;
var isValid = re.test(yourInput);

to check whether your input contains any special characters?

Czarevitch answered 7/12, 2010 at 8:57 Comment(2)
The OP says he's trying to remove special characters not see if they exist.Moonlit
This is a good solution, but it only allows English letters, numbers and the space; it excludes characters like èéòàùì, so in some cases it will not be the solution.Lambent
