Python: POSIX character class in regex?

1

14

How can I search for, say, a sequence of 10 isprint characters in a given string in Python?

With GNU grep, I would simply run grep -E '[[:print:]]{10}'

Dues answered 10/8, 2015 at 8:53 Comment(0)
12

Since Python's re module does not support POSIX character classes, you have to emulate them with an explicit character class.

You can use the ASCII equivalent of [[:print:]] listed on regular-expressions.info and add the limiting quantifier {10}:

[\x20-\x7E]{10}
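For illustration, here is a minimal sketch of how that pattern could be used with the standard re module. The names PRINTABLE_RUN, find_printable_runs and sample are made up for this example, and the class only covers printable ASCII (space through tilde), not printable non-ASCII characters.

```python
import re

# ASCII stand-in for POSIX [[:print:]]: space (0x20) through tilde (0x7E).
# PRINTABLE_RUN and find_printable_runs are illustrative names, not a library API.
PRINTABLE_RUN = re.compile(r'[\x20-\x7E]{10}')

def find_printable_runs(text):
    """Return all non-overlapping runs of exactly 10 printable ASCII characters."""
    return PRINTABLE_RUN.findall(text)

# Control characters surround a 13-character printable stretch.
sample = "\x00\x01Hello, World!\x02abc"
print(find_printable_runs(sample))  # ['Hello, Wor']
```

Use re.search instead of findall if you only need to know whether such a run exists.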


Alternatively, you can use Matthew Barnett's regex module, whose documentation states that "POSIX character classes are supported."
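A rough sketch of that alternative, assuming the third-party regex package is installed (pip install regex). The helper name find_print_runs is hypothetical, and which characters [[:print:]] matches is decided by the regex module itself, so check its documentation if you need behaviour identical to grep.

```python
try:
    import regex  # third-party package, not part of the standard library
except ImportError:
    regex = None  # lets this file import cleanly when regex is missing

def find_print_runs(text):
    """Return runs of exactly 10 characters matched by POSIX [[:print:]]."""
    if regex is None:
        raise RuntimeError("install the third-party 'regex' module first")
    return regex.findall(r'[[:print:]]{10}', text)

if regex is not None:
    print(find_print_runs("\x00Hello, World!\x01"))
```

The pattern is the same one you would give to grep -E, which is the main attraction of this route over a hand-written range.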

Oxford answered 10/8, 2015 at 9:1 Comment(5)
This character class worked for me in Python 3 when used inside the re.sub() method: [`~!@#$%^&*()_=+\[\]{}\\\|;:\"\'<>.,/?] - Mccubbin
@Iota, the class [`~!@#$%^&*()_=+\[\]{}\\\|;:\"\'<>.,/?] only matches ASCII punctuation; it has nothing to do with the concept of "printable chars". If you were to use a POSIX character class for it, it would be [[:punct:]]. To match punctuation in Python, you can use [^\w\s], although there are better and more precise patterns. - Bleeding
My mistake! I misread the [[:print:]] class as [[:punct:]]. Appreciate your correction. - Mccubbin
The regex in the answer will not match non-ASCII Unicode characters, unlike grep (GNU grep). - Pandorapandour
@pabouk-Ukrainestaystrong Then see the bottom of the answer. Just install the PyPI regex module (pip install regex in the console/terminal), then use import regex and pattern = regex.compile(r'[[:print:]]{10}'). - Bleeding
