Optional Group Capture with Lua Pattern Matching

I am trying to parse chemical formulas in Lua using simple pattern matching. However, I do not know how to specify a capture group as being optional. Here is the pattern I have come up with:

pattern = "(%u%l*)(%d*)"

The first group captures the atomic symbol ("H", "He", etc.) and the second group captures how many of that atom the molecule contains. The count is usually an integer, but when it is 1 it is often omitted, as in:

formula = "C2H6O"

When I run a global match and an atom has no explicit count, count comes back as '' rather than the nil I expected.

compound = {}
for atom,count in string.gmatch(formula, pattern) do
    compound[atom] = count or 1
end

Obviously I could just check whether count == '', but I was curious whether Lua has an optional capturing group.

Limekiln answered 25/9, 2014 at 17:45 Comment(0)

if there was an optional capturing group in Lua.

No; pattern items don't list captures as acceptable options, so you can't have, for example, (%d*)? like you'd do in Perl.
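
As a rough illustration of why this fails (a sketch added for clarity, using made-up input strings): Lua quantifiers attach only to a single character class, so a ? written after a closing parenthesis is read as a literal question mark rather than as "optional".

```lua
-- The trailing ? is not a quantifier here; it must match a literal "?".
print(string.match("C2", "(%u)(%d*)?"))    -- no match, prints nil
print(string.match("C2?", "(%u)(%d*)?"))   -- matches, captures "C" and "2"

-- A quantifier inside the capture is allowed, but the capture then
-- yields "" (not nil) when there is nothing to match.
local symbol, digit = string.match("C", "(%u)(%d?)")
print(symbol, digit == "")
```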

Li answered 25/9, 2014 at 18:3 Comment(2)
Thanks, great work on ZeroBrane BTW, it is what I am currently using :) - Limekiln
Thanks Moop for the feedback! - Li

There is no optional capturing group in Lua.

count is the empty string instead of nil because the empty string matches %d*.

Try this instead:

compound[atom] = tonumber(count) or 1

Note that tonumber will return nil if count is the empty string, which is what you want to check.
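
Putting the pieces together, a minimal sketch of the full loop might look like the following. The helper name parse_formula is hypothetical, and summing repeated symbols (as in CH3COOH) is an added assumption rather than something the question asked for. Nested groups such as parentheses are not handled.

```lua
-- Hypothetical helper: parse a flat formula such as "C2H6O" into a table
-- mapping each atomic symbol to its count.
local function parse_formula(formula)
    local compound = {}
    for atom, count in string.gmatch(formula, "(%u%l*)(%d*)") do
        -- count is "" when the digits are omitted; tonumber("") is nil,
        -- so the `or 1` fallback applies.
        local n = tonumber(count) or 1
        -- Assumption: repeated symbols are summed rather than overwritten.
        compound[atom] = (compound[atom] or 0) + n
    end
    return compound
end

local ethanol = parse_formula("C2H6O")
print(ethanol.C, ethanol.H, ethanol.O)  -- prints 2, 6 and 1 (tab-separated)
```

Storing numbers rather than the raw captured strings also means the counts can be used in arithmetic later without further conversion.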

Parts answered 25/9, 2014 at 18:3 Comment(4)
I don't get what you mean when it matches the empty string. Could you explain more on that? Wouldn't anything * match everything? - Limekiln
@Moop, %d* means zero or more digits. - Parts
@ihf Sure, but doesn't that mean it matches every "empty string"? - Limekiln
@Moop, %d* matches the empty string but does not have to end there; in fact, it matches the longest possible string of digits, or none, if there is none. - Parts
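
The behaviour described in these comments can be checked directly. The snippet below is only an illustrative sketch with made-up inputs: a successful match always yields a string for each capture, and because the empty string is truthy in Lua, `count or 1` never falls through to 1.

```lua
-- %d* is greedy: it takes the longest run of digits available.
local atom, count = string.match("C12", "(%u%l*)(%d*)")
print(atom, count)        -- captures "C" and "12"

-- With no digits, %d* still succeeds by matching zero characters,
-- so the capture is "" rather than nil.
atom, count = string.match("He", "(%u%l*)(%d*)")
print(count == "")        -- true

-- Only nil and false are falsy in Lua, so "" survives an `or`.
print(("" or 1) == "")    -- true
```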
