Matching case sensitive unicode strings with regular expressions in Python - McMap



Asked 13/9, 2011 at 6:50 Answered 13/9, 2011 at 6:59

Solved python regex unicode case-insensitive character-properties


Suppose I want to match a lowercase letter followed by an uppercase letter. I could do something like

re.compile(r"[a-z][A-Z]")

Now I want to do the same thing for Unicode strings, i.e. match something like 'aÅ' or 'yÜ'.

I tried

re.compile(r"[a-z][A-Z]", re.UNICODE)

but that does not work.

Any clues?

Yesterday asked 13/9, 2011 at 6:50


This is hard to do with Python's re module because the current implementation doesn't support Unicode property classes such as \p{Lu} (uppercase letter) and \p{Ll} (lowercase letter).
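Since re has no \p{Ll} or \p{Lu}, one possible workaround is to generate equivalent character classes from the general categories reported by the standard unicodedata module. This is a sketch, not part of the original answer: the helper name category_class is hypothetical, the classes reflect whichever Unicode version the running Python ships with, and scanning every code point takes a moment when the module loads.

```python
import re
import sys
import unicodedata


def category_class(category: str) -> str:
    """Hypothetical helper (not from the original answer): build a regex
    character class holding every code point whose Unicode general
    category, per unicodedata, equals `category` (e.g. "Ll" or "Lu")."""
    chars = (
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == category
    )
    # re.escape guards against any character that is special in a class.
    return "[" + "".join(re.escape(c) for c in chars) + "]"


# Stands in for \p{Ll}\p{Lu} in engines that support Unicode properties.
LOWER_UPPER = re.compile(category_class("Ll") + category_class("Lu"))

print(LOWER_UPPER.findall("aÅ yÜ aB ab AB"))  # ['aÅ', 'yÜ', 'aB']
```

The classes are built once and compiled once, so later matches cost the same as any other regex.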

[A-Za-z] will, of course, only match ASCII letters, regardless of whether the re.UNICODE flag is set.
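A quick check of that claim, assuming Python 3 (where str patterns are Unicode-aware by default, so re.UNICODE is redundant there): the flag changes what shorthand classes such as \w mean, but an explicit range like [A-Z] still covers only U+0041 to U+005A.

```python
import re

# Same pattern as in the question, with the UNICODE flag set.
pattern = re.compile(r"[a-z][A-Z]", re.UNICODE)

print(bool(pattern.search("aB")))  # True
print(bool(pattern.search("aÅ")))  # False: 'Å' (U+00C5) is outside A-Z
```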

So until the re module is updated (or unless you install the regex package, currently in development), you need either to do it programmatically, iterating through the string and calling char.islower()/char.isupper() on each character, or to list all the relevant Unicode code points manually, which probably isn't worth the effort.
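The programmatic route could be sketched as follows. The function name lower_upper_pairs is hypothetical; it simply walks the string and tests each character against its right-hand neighbour with str.islower() and str.isupper().

```python
from typing import List


def lower_upper_pairs(text: str) -> List[str]:
    """Hypothetical helper: return every lowercase character that is
    immediately followed by an uppercase character."""
    pairs = []
    # zip pairs each character with the one that follows it.
    for first, second in zip(text, text[1:]):
        if first.islower() and second.isupper():
            pairs.append(first + second)
    return pairs


print(lower_upper_pairs("aÅ yÜ aB ab AB"))  # ['aÅ', 'yÜ', 'aB']
```

Because no single character is both lowercase and uppercase, two found pairs can never share a character, so there is no overlap to worry about.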

Rifle answered 13/9, 2011 at 6:59

This was useful. I only have to deal with Danish letters, so adding 'æøå' and 'ÆØÅ' to the character classes is probably OK. – Yesterday 13/9, 2011 at 7:07
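For the Danish-only case described in this comment, the hand-extended classes might look like the sketch below (assuming Python 3, where source files are UTF-8 by default). Letters that are not listed, such as 'Ü', are still not matched.

```python
import re

# ASCII letters plus the three extra Danish letters, in each case.
DANISH_LOWER_UPPER = re.compile(r"[a-zæøå][A-ZÆØÅ]")

print(DANISH_LOWER_UPPER.findall("aÅ øB yÜ"))  # ['aÅ', 'øB']
```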
