What are "steps" in RegexBuddy?

RegexBuddy's "Debug" tab shows how a regular expression is executed step by step. But what exactly do those steps mean? What operations are behind each step?

Riyal answered 3/1, 2017 at 16:19 Comment(7)
This question is off-topic because it is about general computing hardware and software. – Quin
It's a tool for discovering how the corresponding regex engine actually works, compared to how you think it works. – Urba
Are you aware that RegexBuddy has a private forum, and that if you ask this question there, Jan, the author, is likely to reply? Also, this is an awesome RegexBuddy tutorial. Scroll down a bit for the direct link to the Debug section. – Hockey
@WiktorStribiżew Actually, I'm asking about this tool here because (1) it is not "general" software but specialized software for developers, and (2) I suppose the answer may be interesting to others. – Riyal
@Hockey OK... I could say that 90% of all questions here can be answered by reading the official documentation. So what? Should we close this site because every question can be answered somewhere else? – Riyal
The private forum within RegexBuddy, being private, is only available to people who have already purchased RegexBuddy. If you want to ask questions about RegexBuddy on a public forum, then Stack Overflow is a good fit. RegexBuddy is a technical tool and programmers are a key group of users. – Irreverent
You can think of steps as individual checks, similar to basic if-checks in programming, made each time the position changes. The more checks, the less efficient the regex. That doesn't always mean good or bad, since the context of your regex and its goals are unknown, but typically you want to use as few steps as needed -- note: I didn't say as few as possible; that level of detail and optimization is rarely needed. Regex optimization matters most when badly written regexes cause massive performance hits. Such regexes are sometimes exploited intentionally in denial-of-service (ReDoS) attacks. – Chevrotain
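
To make the "checks" idea and the performance warning concrete, here is a toy backtracking matcher, hand-written for the single regex (a|aa)*c and anchored at position 0 (both choices are assumptions for illustration; this is not how RegexBuddy counts steps). It counts one check per character comparison. On an input of only "a"s with no "c", the engine has to try every way of splitting the "a"s between the two alternatives before giving up, so the count grows exponentially with the input length.

```python
def count_checks(text: str) -> tuple[bool, int]:
    """Toy backtracking matcher for the regex (a|aa)*c, anchored at position 0.

    Every single-character comparison counts as one check.
    """
    checks = 0

    def match_from(pos: int) -> bool:
        nonlocal checks
        # Greedy star: first try one more iteration, alternative "a" before "aa".
        for alt in ("a", "aa"):
            end = pos
            ok = True
            for ch in alt:
                checks += 1
                if end < len(text) and text[end] == ch:
                    end += 1
                else:
                    ok = False
                    break
            if ok and match_from(end):
                return True
        # The star gives up; try the literal "c" at the current position.
        checks += 1
        return pos < len(text) and text[pos] == "c"

    return match_from(0), checks


for n in (5, 10, 15, 20):
    print(n, count_checks("a" * n))
```

Each extra "a" multiplies the number of checks by roughly 1.6, which is the kind of blow-up that makes badly written regexes usable for denial-of-service attacks.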

The steps count is basically how many times the current position in the input was changed, which is a very good indicator of performance.

The "current position" may be at any character or between characters (including before and after the entire input).

Simplifying a bit, a regex engine processes the input by moving the current position along it and evaluating whether the regex matches at that position. It also keeps track of how far through the regex the match has progressed.
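
A minimal sketch of that process, assuming a tiny pattern language with only literal characters, "." and a greedy "*" (the names Token, parse and search are hypothetical). The engine keeps two positions: ti, its place in the regex, and pos, the current position in the input, which can sit before, between or after characters. It tries each start position in turn and counts one check every time it tests a token at a position. This is one possible definition of a step, not RegexBuddy's exact accounting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    char: str   # a literal character, or "." for "any character"
    star: bool  # True if the atom is followed by a greedy "*"


def parse(pattern: str) -> list[Token]:
    """Split a tiny pattern language (literals, ".", "*") into tokens; no escaping."""
    tokens = []
    i = 0
    while i < len(pattern):
        star = i + 1 < len(pattern) and pattern[i + 1] == "*"
        tokens.append(Token(pattern[i], star))
        i += 2 if star else 1
    return tokens


def search(pattern: str, text: str) -> tuple[Optional[tuple[int, int]], int]:
    """Return ((start, end) of the first match or None, number of checks made)."""
    tokens = parse(pattern)
    checks = 0

    def atom_matches(tok: Token, pos: int) -> bool:
        nonlocal checks
        checks += 1  # one check of one token at one input position
        return pos < len(text) and (tok.char == "." or text[pos] == tok.char)

    def match_here(ti: int, pos: int) -> Optional[int]:
        # ti = position in the regex, pos = current position in the input
        if ti == len(tokens):
            return pos
        tok = tokens[ti]
        if tok.star:
            end = pos
            while atom_matches(tok, end):  # greedy: consume as much as possible
                end += 1
            while end >= pos:              # then give characters back one at a time
                result = match_here(ti + 1, end)
                if result is not None:
                    return result
                end -= 1
            return None
        if atom_matches(tok, pos):
            return match_here(ti + 1, pos + 1)
        return None

    # Move the start position along the input, including the position after the last character.
    for start in range(len(text) + 1):
        end = match_here(0, start)
        if end is not None:
            return (start, end), checks
    return None, checks


print(search(".*1.*", "12345"))
print(search("ab*c", "xabbc"))
```

Real engines add many optimizations (for example skipping start positions that cannot match), so their step counts are usually lower than a naive sketch like this.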

I don't want to turn this answer into a regex tutorial, but... greedy quantifiers like * consume as much of the input as possible while still allowing the overall match to succeed. To give a simple example, given the input "12345" and the regex .*1.*, the regex engine will first apply .*, consuming all of the input and leaving the position at the end of the input. It then fails to match a 1, so it backtracks by "unconsuming" one character at a time until it finds a 1, and then continues. You can see that this would take 9 steps just to process the initial .*.

By contrast, if the regex were [^1]*1.*, the engine would match the "1" in just one step.
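
The difference between the two patterns can be sketched by simulating only the part that differs: a greedy star followed by the literal "1", anchored at the start of "12345". The helper name star_then_literal is hypothetical, and here a step is assumed to be one character consumed by the star, one character given back while backtracking, or the literal matching. The absolute numbers depend on what counts as a step, so they need not equal RegexBuddy's figures; the point is the gap between the two patterns. The trailing .* costs the same for both, so the sketch stops after the literal.

```python
import re
from typing import Callable


def star_then_literal(text: str, accepts: Callable[[str], bool], literal: str) -> tuple[int, int]:
    """Simulate a greedy `X*` followed by one literal, anchored at position 0.

    Returns (end of match or -1, number of position changes).
    """
    changes = 0
    pos = 0
    # Greedy phase: consume every character the star accepts.
    while pos < len(text) and accepts(text[pos]):
        pos += 1
        changes += 1
    # Backtracking phase: give characters back until the literal fits.
    while True:
        if pos < len(text) and text[pos] == literal:
            return pos + 1, changes + 1
        if pos == 0:
            return -1, changes
        pos -= 1
        changes += 1


print(star_then_literal("12345", lambda c: True, "1"))       # like .*1
print(star_then_literal("12345", lambda c: c != "1", "1"))   # like [^1]*1

# Both patterns match the same text; only the amount of work differs.
print(re.match(r".*1", "12345").end(), re.match(r"[^1]*1", "12345").end())
```

The negated class [^1] can never consume the "1" it is waiting for, so the engine never has to backtrack to find it.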

Wellmannered answered 3/1, 2017 at 18:43 Comment(4)
Thank you! But could you provide some more details about this "how many times the current position in the input was changed"? – Riyal
@KonstantinSmolyanin I updated the answer, but a visit to regular-expressions.info is probably in order. – Wellmannered
@SeinopSys If you have RegexBuddy, just try "Debug Everywhere". It shows you when the engine has to backtrack and so on, so you can see how many steps it takes to match or fail to match, depending on your goal. – Chevrotain
@Chevrotain I was pointing out a minor grammar issue, but since the author removed the backticks, the second half of the last sentence has become even more confusing to me. – Chauvinism

In RegexBuddy's debugger, a step is when the regex engine matches something or fails to match something. Steps that match a character are indicated by all the characters matched by the regex so far, which will usually be one character more than at the previous step. Steps that match a position, like a word boundary, are indicated by the characters matched so far plus "ok". Steps that fail to match something are indicated by the characters matched so far plus "backtrack".
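
A toy tracer that produces step labels in the spirit of that description: the characters matched so far for a character match, plus "ok" for a matched zero-width token, or plus "backtrack" for a failure. It is a sketch under simplifying assumptions: the regex is a plain sequence of literal characters and r"\b" word boundaries (no quantifiers), and the function names and exact label format are hypothetical, not RegexBuddy's actual output.

```python
def is_word(ch: str) -> bool:
    return ch.isalnum() or ch == "_"


def at_word_boundary(text: str, pos: int) -> bool:
    # A boundary lies between a word character and a non-word character (or the text edge).
    before = pos > 0 and is_word(text[pos - 1])
    after = pos < len(text) and is_word(text[pos])
    return before != after


def trace(tokens: list[str], text: str) -> list[str]:
    """Return one label per step for a regex made of literal characters and r"\\b"."""
    steps = []
    for start in range(len(text) + 1):  # try each start position in turn
        matched = ""
        pos = start
        for tok in tokens:
            if tok == r"\b":
                if at_word_boundary(text, pos):
                    steps.append(f"{matched} ok".strip())  # zero-width match
                    continue
            elif pos < len(text) and text[pos] == tok:
                matched += tok
                pos += 1
                steps.append(matched)  # one more character matched
                continue
            steps.append(f"{matched} backtrack".strip())  # token failed here
            break
        else:
            return steps  # every token matched: overall match found
    return steps


for step in trace([r"\b", "c", "a", "t"], "a cat"):
    print(step)
```

For \bcat against "a cat", the boundary matches at the first two start positions but "c" does not, so each attempt ends in "backtrack" before the third attempt matches "c", "ca" and "cat".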

If you click on any of the matched characters in the debugger, RegexBuddy selects the token in the regular expression that matched those characters and highlights all the characters in the debugger matched by that token. If you click on an "ok" or "backtrack" indicator, RegexBuddy selects the token in the regex that matched or failed to match.

Moving the cursor with the keyboard has the same effect as clicking. Pressing the End key moves the cursor to the end of a step. Then pressing Arrow Up or Arrow Down moves the cursor to the previous or next step while keeping it at the end of that step. By moving the cursor this way, you can easily follow how the regex engine steps through your regular expression and which characters it matches and backtracks over along the way.

For more details, see these two pages in RegexBuddy's help file: https://www.regexbuddy.com/manual.html#debug https://www.regexbuddy.com/manual.html#benchmark

Irreverent answered 4/1, 2017 at 0:51 Comment(0)
