Alternatives for regex in Python

6

19

Regular expressions are highly unreadable and difficult to debug. Does there exist any replacement for text processing which could be handled by mere mortals?

Criteria include

  • It's a library or a tool (please point the answer to the library itself)

  • Human readable syntax (no cheatsheets needed)

  • Documentation with examples

  • Able to debug expressions

If possible, please mention both language-specific and language-independent solutions. I mainly develop in Python, but I'd hope to see a library which could be ported to other languages/platforms.

I once read that Haskell has nice text processing capabilities, but again, this is a built-in language solution, not a generic solution.

Edit: Please do not answer with "regular expressions are not bad, do it like this!" Stackoverflow.com is not a place for subjective opinions, but I think regular expressions are bad and I want to see what alternatives to them exist.

Obsolete answered 16/7, 2011 at 11:42 Comment(11)
See programmers.stackexchange.com/questions/75905/… for why this may be a bad idea if the "human readable syntax" bit is taken too far. As for documentation, there's plenty of documentation on the "usual" regular expressions, and even if you think those suck, you don't need to reinvent the technology to write "better" documentation for it. And as for debugging, there are tools for this; give some examples of what you mean.Carcassonne
I seek a replacement, not an enhancement of my existing regexp experience.Obsolete
What the OP of that question proposes is a replacement for current regular expressions. And even if there was a difference, many if not all of those points still apply. Perhaps you have an example of what you're looking for?Carcassonne
I don't think there exists what you call a human readable syntax without cheatsheets. Every single language (programming or otherwise) requires time to learn and get used to the language constructs.Hamper
deinan: thanks for the clarification. I only read the question part of the link, which pointed to some ad hoc invention. I have clarified the question to say that I am looking for a library.Obsolete
Rosh: If regexps, invented four decades ago (?), are still the best solution, I am a little disappointed in the progress of development tools. Programming languages have progressed, but regexps are still ugly?Obsolete
Best solution for what? Regular expressions are rarely the best solution to a problem, but when they are, they are. If nobody has invented anything better for this set of problems, perhaps there isn't really room for improvement?Hamper
I'm a human and I can read regular expressions. Regular expressions aren't unreadable; only poorly-written regular expressions are. (Tip: r"break" + r"the" + r"expression" apart and comment it line by line, like you would any other block of complex code.)Hump
Another tip: using something nonstandard will make your code less understandable; every competent developer in the world already understands regular expressions, but nobody will understand whatever obscure, niche pattern matching scheme you end up with.Hump
re "Stackoverflow.com is not a place for subjective opinions" - fair point. I've removed my answer.Jonell
To some of the guys who commented here 'bragging' about how they enjoy regular expressions. Your earned proficiency in composing and reading regular expressions or joy of exercising it does not mean that millions of developers, not to mention new ones, should use their arcane syntax, which is not suitable for algorithmic construction and manipulation, and notorious for its quirky and closed-garden approach to what constitutes a word and what does not. It is really time to move on to better things that algorithm, engineering design and usability design can offer. And yes, I read regexes too :)Nevis
12

I know this post is old, but people might still benefit from this question and its answers. VerbalExpressions still uses regex behind the scenes, but in a friendlier way.

Intro: http://thechangelog.com/stop-writing-regular-expressions-express-them-with-verbal-expressions/

Python fork: https://github.com/VerbalExpressions

Orelie answered 6/8, 2013 at 17:21 Comment(1)
VerbalExpressions looks great for readability and composability, within the flexibility limitations of regular expressions! I couldn't tell, however, whether its API has anything for indicating a white-space character (?). Does it simply interpret " " as any white-space character?Nevis
7

You could use the re.VERBOSE flag, which lets you spread the pattern over several lines and comment each part:

import re

charref = re.compile(r"""
 &[#]                # Start of a numeric entity reference
 (
     0[0-7]+         # Octal form
   | [0-9]+          # Decimal form
   | x[0-9a-fA-F]+   # Hexadecimal form
 )
 ;                   # Trailing semicolon
""", re.VERBOSE)
Binal answered 16/7, 2011 at 11:46 Comment(0)
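A short usage sketch for the re.VERBOSE answer above. The pattern is repeated so the snippet runs on its own; the helper name find_charrefs and the sample string are illustrative assumptions, not part of the original answer.

```python
import re

# Same pattern as in the answer, repeated so this snippet is self-contained.
charref = re.compile(r"""
 &[#]                # Start of a numeric entity reference
 (
     0[0-7]+         # Octal form
   | [0-9]+          # Decimal form
   | x[0-9a-fA-F]+   # Hexadecimal form
 )
 ;                   # Trailing semicolon
""", re.VERBOSE)


def find_charrefs(text):
    """Return the captured body of every numeric reference found in text."""
    return [match.group(1) for match in charref.finditer(text)]


# "&amp;" is a named entity, so the pattern deliberately skips it.
sample = "caf&#233; &#xe9; &#0351; plain &amp; text"
print(find_charrefs(sample))
```

Because whitespace and # comments are ignored in verbose mode, a literal "#" has to be written inside a character class (as [#] above) or escaped.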
7

pyparsing offers another way to create and execute (simple) grammars. I've been using it in a project to parse different kinds of log files, and it was rather simple to use and somewhat more intuitive than regexps.

Linskey answered 16/7, 2011 at 12:20 Comment(1)
Link was broken, looks like this is the project: github.com/pyparsing/pyparsingParthenon
4

Take a look at Ned Batchelder's list of Python parsing tools.

Isar answered 18/7, 2011 at 11:28 Comment(1)
Excellent source - precisely the kind of information I was looking for!Obsolete
1

LPeg is a Lua library, not a Python one, I am afraid, but it might have been ported by someone. Either way, it is open source, so you could port it yourself if you wanted to. It takes a somewhat different approach to text matching than regular expressions do, and as such I find it has a considerable learning curve. However, where efficiency is concerned, it has the potential to outperform regular expressions, though such a statement obviously depends strongly on the test case and on one's ability in both languages.

Abecedarian answered 16/7, 2011 at 12:34 Comment(0)
1

If you're concerned about understanding and debugging others' regexes, there are tools that translate them into a more easily understandable form. My favorite is RegexBuddy on Windows. On Mac, RegExRx in the App Store is helpful.
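As a rough, Python-only approximation of what such tools do (not a replacement for them), the standard library's re.DEBUG flag prints the parsed structure of a pattern while compiling it. The sketch below captures that output as a string; the helper name explain is an assumption made for illustration, and the exact text printed is an implementation detail that differs between Python versions.

```python
import contextlib
import io
import re


def explain(pattern, flags=0):
    """Return the parse tree that re.DEBUG prints while compiling pattern."""
    buffer = io.StringIO()
    # re.DEBUG writes to standard output, so redirect it into a buffer.
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, flags | re.DEBUG)
    return buffer.getvalue()


# Show how the regex engine reads someone else's pattern.
print(explain(r"&#(x[0-9a-fA-F]+|[0-9]+);"))
```

Reading the dump next to the original pattern makes it easier to see where groups, alternations and repetitions begin and end.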

Chloris answered 13/9, 2018 at 17:43 Comment(2)
While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From ReviewTanah
Thank you for the improvements. Fixed. Someone can certainly search for those tools in an engine. Link not necessary.Chloris
