No \p{L} for JavaScript regex? Use Unicode in JS regex [duplicate]

I need to add a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ many times, and I find this very ugly. So I tried \p{L}, but it does not work in JavaScript.

Any idea?

My current regex: [a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ][a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ' ,"-]*[a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ'",]+

I'd like something like this: [\p{L}][\p{L}' ,"-]*[\p{L}'",]+ (or at least something shorter than the current expression).

Carlow answered 4/5, 2018 at 15:35 Comment(7)
I find it hard to understand the question. Do you want to match multiple occurrences of that character set? Can you provide an example of text that should be matched by your regex? – Choreograph
You can use a regex library that handles non-Latin letters better, like XRegExp. – Hettie
Actually, I'd like to write something like ` [\p{L}][\p{L}' ,"-]*[\p{L}'",]+ ` instead of: ` [a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ][a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ' ,"-]*[a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ'",]+ ` – Carlow
That's ugly, but it's probably the best solution. Don't worry about performance; it's all the same. – Maraschino
If I forget a character that I need (like î or ò) and I don't test every character, how can I be sure I haven't missed one? Moreover, I need to use this many times, so my expressions become unreadable, and if I come back to them later I may not understand why they are so long. – Carlow
In the end I used this: ^(?!.*\/\/)[A-zÀ-ž][A-zÀ-ž\/]*[A-zÀ-ž-'" ]*[A-zÀ-ž'"]$ – Carlow
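A quick check of the two ranges in that final pattern, as a small sketch: A-z and À-ž are code-point ranges, not "letters only", so each one also admits a few non-letters. The sample characters below are illustrative choices, not taken from the thread.

```javascript
// [A-z] spans U+0041..U+007A, which also contains the six ASCII
// characters that sit between "Z" and "a": [ \ ] ^ _ `
const asciiRange = /^[A-z]$/;
console.log(asciiRange.test("_")); // true, although "_" is not a letter
console.log(asciiRange.test("[")); // true

// [À-ž] spans U+00C0..U+017E, which includes the symbols
// × (U+00D7) and ÷ (U+00F7) alongside the accented letters.
const latinRange = /^[À-ž]$/;
console.log(latinRange.test("×")); // true
console.log(latinRange.test("é")); // true

// Listing the two ASCII letter ranges separately avoids the gap characters.
const lettersOnly = /^[A-Za-z]$/;
console.log(lettersOnly.test("_")); // false
```

Writing [A-Za-z] instead of [A-z] costs three characters and removes the punctuation from the ASCII part; the × and ÷ in À-ž remain unless that range is split around them.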
Add the "u" flag to your regex for \p{L} to work. The official JS guide says it clearly: "For Unicode property escapes to work, a regular expression must use the u flag". – Origami
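As the last comment says, in an engine that implements ES2018 Unicode property escapes, the pattern from the question works once the u flag is added. In this sketch the pattern is wrapped in ^ and $ (an assumption, so the whole input has to match); the sample names are made up for illustration.

```javascript
// The question's pattern, with the "u" flag so that \p{L} (any Unicode
// letter) is recognized. ^ and $ are added here so the whole string must match.
const namePattern = /^[\p{L}][\p{L}' ,"-]*[\p{L}'",]+$/u;

const samples = ["Émile", "Jean-Luc", "O'Brien", "Dvořák", "Anna1", "a"];
for (const s of samples) {
  // "Anna1" fails on the digit; "a" fails because the pattern needs at least two characters.
  console.log(s, namePattern.test(s));
}
```

"Dvořák" matches even though ř is not in the hand-written list, which is exactly the "what if I forgot a character" problem raised above.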

What you need is a subset of what you asked for. First, you should define which set of characters you need: \pL means every letter in every language.
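To illustrate the difference in scope, here is a sketch comparing \p{L} with \p{Script=Latin}, another ES2018 property escape that restricts matching to characters of the Latin script. Whether Latin-only is the right subset is an assumption about the asker's needs; the words are illustrative.

```javascript
// \p{L}: any letter in any script. \p{Script=Latin}: characters of the
// Latin script only. Both forms require the "u" flag.
const anyLetters = /^\p{L}+$/u;
const latinLetters = /^\p{Script=Latin}+$/u;

for (const word of ["Dvořák", "Œuvre", "Ωμέγα", "Москва"]) {
  console.log(word, anyLetters.test(word), latinLetters.test(word));
}
```

The Greek and Cyrillic words pass \p{L} but not \p{Script=Latin}, while accented Latin letters and ligatures such as Œ pass both.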

It's somewhat ugly, but it doesn't affect performance and is probably the best way to work around this kind of problem in JS. ECMAScript 2018 adds support for \pL, but at the time of writing it is far from being implemented in all major browsers.
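If both kinds of engines have to be served, one option is to feature-detect property escapes and fall back to the hand-written list. This is a minimal sketch; letterClass and FALLBACK_LETTERS are hypothetical names, and the fallback list is copied from the question.

```javascript
// Hypothetical fallback list, copied from the question's character class.
const FALLBACK_LETTERS =
  "a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ";

// Engines without ES2018 property escapes (or without the "u" flag at all)
// throw a SyntaxError when compiling \p{L} in "u" mode.
function letterClass() {
  try {
    new RegExp("\\p{L}", "u");
    return { letters: "\\p{L}", flags: "u" };
  } catch (e) {
    return { letters: FALLBACK_LETTERS, flags: "" };
  }
}

const { letters, flags } = letterClass();
// Same three-part pattern as in the question, built from whichever class is available.
const re = new RegExp(
  "[" + letters + "][" + letters + "' ,\"-]*[" + letters + "'\",]+",
  flags
);
console.log(flags === "u" ? "using \\p{L}" : "using the hand-written list");
```

The fallback path keeps the hand-written list's blind spots, so the detection only helps on engines that already support \p{L}.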

If it's a matter of personal taste, you could reduce the ugliness a bit:

// Letters to accept, written once and reused in every character class
var characterSet = 'a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ';
var re = new RegExp('[' + characterSet + ']' + '[' + characterSet + '\' ,"-]*' + '[' + characterSet + '\'",]+');

Credit for this update goes to @Francesco:

var pCL = 'a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ';
var re = new RegExp(`[${pCL}][${pCL}' ,"-]*[${pCL}'",]+`);
console.log(re.source);
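The template approach also makes it easy to compare the hand-written set with \p{L} directly. In this sketch, buildNamePattern is a hypothetical helper, the anchors are an added assumption, and the names are illustrative; it assumes an engine with ES2018 property escapes for the second pattern.

```javascript
// The hand-written set from the answer above.
const pCL = "a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ";

// Hypothetical helper: builds the same three-part pattern from any letter
// class, anchored so that the whole string has to match.
function buildNamePattern(letters, flags) {
  return new RegExp(`^[${letters}][${letters}' ,"-]*[${letters}'",]+$`, flags);
}

const handWritten = buildNamePattern(pCL, "");
const unicode = buildNamePattern("\\p{L}", "u");

for (const name of ["Émile", "Dvořák", "Łukasz"]) {
  console.log(name, handWritten.test(name), unicode.test(name));
}
```

Only "Émile" passes the hand-written set; ř and Ł are missing from it, while the \p{L} version accepts all three.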
Maraschino answered 4/5, 2018 at 16:7 Comment(4)
I was thinking about something like that. It probably looks better with a template string: [a-z${pL}][a-z${pL}\\ ,"-]* etc. Or maybe not. – Choreograph
Thank you, updated accordingly. – Maraschino
@Maraschino They're not adding support for \pL, only for \p{L}. – Gutshall
@Gutshall I could have written \p{Letter} instead. I'm mainly talking about a known Unicode property, not about which exact syntax for it will end up supported. – Maraschino

You can use the XRegExp add-on, which supports matching Unicode letters:

var unicodeWord = XRegExp("^\\pL+$"); // L: Letter

You can see more examples of matching Unicode in JavaScript here:

http://xregexp.com/plugins/
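For comparison, a sketch of the native equivalent of the XRegExp pattern ^\\pL+$ above, assuming an engine with ES2018 Unicode property escapes. Natively, the braces in \p{L} and the u flag are both required. The sample strings are illustrative.

```javascript
// Native equivalent of XRegExp("^\\pL+$"): one or more letters and nothing else.
const unicodeWord = /^\p{L}+$/u;

console.log(unicodeWord.test("Ærøskøbing")); // true
console.log(unicodeWord.test("東京"));       // true
console.log(unicodeWord.test("R2D2"));       // false
```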

Drab answered 4/5, 2018 at 16:1 Comment(0)
