Number of repetitions of a symbol in a Lua pattern
I'm looking for a way to specify the number of repetitions of a symbol in a Lua pattern, because I want to check how many characters a string contains. The manual says: "Even with character classes this is still very limiting, because we can only match strings with a fixed length.

To solve this, patterns support these four repetition operators:

  • '*' Match the previous character (or class) zero or more times, as many times as possible.
  • '+' Match the previous character (or class) one or more times, as many times as possible.
  • '-' Match the previous character (or class) zero or more times, as few times as possible.
  • '?' Make the previous character (or class) optional.
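To make the four operators quoted above concrete, here is a minimal sketch. The sample strings are made up for illustration; only the standard string.match is used.

```lua
-- Greedy vs. lazy: '*' takes as much as possible, '-' as little as possible.
local tags = "<a><b>"
local greedy = tags:match("<.*>")   -- "<a><b>"
local lazy   = tags:match("<.->")   -- "<a>"

-- '+' requires at least one occurrence; '*' also accepts zero.
local digits = ("abc123"):match("%d+")   -- "123"
local empty  = ("abc"):match("%d*")      -- "" (zero digits at position 1)
local none   = ("abc"):match("%d+")      -- nil

-- '?' makes the previous character optional.
local us = ("color"):match("^colou?r$")    -- "color"
local uk = ("colour"):match("^colou?r$")   -- "colour"
```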

So there is no mention of braces {}. For example,

{1,10}; {1,}; {10};

do not work.

local np = '1'
local a =  np:match('^[a-zA-Z0-9_]{1}$' )

returns a = nil.

local np = '1{1}'
local a =  np:match('^[a-zA-Z0-9_]{1}$' )

returns a = '1{1}' :)

This URL says that braces are not among the magic characters:

Some characters, called magic characters, have special meanings when used in a pattern. The magic characters are

( ) . % + - * ? [ ^ $

So curly brackets work only as literal text and nothing more. Am I right? What is the best way to work around this 'bug'?

The usual usage of braces is described, for instance, here.

Almonte answered 1/10, 2015 at 9:31 Comment(8)
Lua does not provide it. You can write the repetition yourself, e.g. \d{2,} is %d%d+. You can also use the Lua rex_pcre library.Snapp
@moteus, that is quite monstrous and ugly to use, but thanks for the idea.Almonte
Lua patterns don't support the full set of Perl regex features. Braces are not supported. Use an explicit count: np:match('^'..('[%w_]'):rep(k)..'$')Wulf
@EgorSkriptunoff, doesn't it dramatically increase the amount of computation?Almonte
@trololo - Regex has never been fast; it's always CPU-intensive. Look for another approach that computes faster: #np==k and not np:find'[^%w_]'Wulf
Another limitation of the supported quantifiers: Unlike some other systems, in Lua a modifier can only be applied to a character class; there is no way to group patterns under a modifier. Try the PCRE library: > require "rex_pcre" > return rex_pcre.new("^[a-zA-Z0-9_]{2}$"):exec("12").Moralez
@EgorSkriptunoff, I think you are right: checking the string length beforehand is the most practical solution in my simple case without using external libraries.Almonte
@stribizhev, thanks, I will try.Almonte

We have to admit that Lua pattern quantifiers are very limited in functionality.

  1. There are just the 4 you mentioned (+, -, * and ?).
  2. There is no support for limiting quantifiers (the ones you need).
  3. Unlike some other systems, in Lua a modifier can only be applied to a character class; there is no way to group patterns under a modifier (see source). Lua patterns do not support '(foo)+' or '(foo|bar)': only single characters or character classes can be repeated or chosen between, not sub-patterns or strings.
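The third point can be checked directly, and plain Lua code can stand in for the missing features. The sketch below is only an illustration: match_any and is_repeated are hypothetical helper names, not part of any library, and is_repeated assumes the repeated unit is a plain string rather than a pattern.

```lua
-- A quantifier after ')' is not applied to the group: the '+' is matched
-- as a literal plus sign instead.
local grouped = ("foofoo"):match("^(foo)+$")   -- nil
local literal = ("foo+"):match("^(foo)+$")     -- "foo"

-- Hypothetical helper: emulate (foo|bar) by trying each pattern in turn
-- and returning the first match.
local function match_any(s, patterns)
  for _, p in ipairs(patterns) do
    local m = s:match(p)
    if m then return m end
  end
  return nil
end

-- Hypothetical helper: emulate ^(word)+$ for a plain, non-pattern word by
-- comparing against the word repeated the right number of times.
local function is_repeated(s, word)
  if #word == 0 or #s == 0 or #s % #word ~= 0 then return false end
  return s == word:rep(math.floor(#s / #word))
end
```

For example, match_any("bar", {"^foo$", "^bar$"}) returns "bar", and is_repeated("foofoo", "foo") returns true.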

As a workaround, to use limiting quantifiers and all the other PCRE regex perks, you can use the rex_pcre library.

Or, as @moteus suggests, you can partially emulate limiting quantifiers that have only a lower bound: repeat the character class as many times as the minimum requires and apply one of the available Lua quantifiers to the last copy. E.g. to match 3 or more occurrences of a class:

local np = 'abc_123'
local a = np:match('^[a-zA-Z0-9_][a-zA-Z0-9_][a-zA-Z0-9_]+$' )

See IDEONE demo
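The repeat-the-pattern idea above, combined with the string.rep and length-check suggestions from the comments, can be wrapped in small helpers. This is a sketch under assumptions: bounded and length_between are hypothetical names, and bounded expects a single character or character class (such as "[%w_]" or "%d"), not a multi-character sub-pattern.

```lua
-- Hypothetical helper: build an anchored pattern that behaves like
-- ^class{min,max}$. A nil max means "no upper bound", i.e. {min,}.
local function bounded(class, min, max)
  local body = class:rep(min)                     -- required copies
  if max == nil then
    body = body .. class .. "*"                   -- any number of extra copies
  else
    body = body .. (class .. "?"):rep(max - min)  -- optional copies
  end
  return "^" .. body .. "$"
end

-- Hypothetical helper: the same check without building a long pattern.
-- Test the length first, then reject any character outside [%w_].
local function length_between(s, min, max)
  return #s >= min and #s <= max and not s:find("[^%w_]")
end

local one_to_three = bounded("[%w_]", 1, 3)   -- "^[%w_][%w_]?[%w_]?$"
local exactly_one  = bounded("[%w_]", 1, 1)   -- "^[%w_]$"
```

With these, ("1"):match(exactly_one) returns "1" while ("1{1}"):match(exactly_one) returns nil, which is the behaviour the question expected from {1}. Large counts produce long patterns and may run into implementation limits, so the length check is probably the safer choice there.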

Another library to consider instead of PCRE is LPeg.

Neon answered 1/10, 2015 at 10:31 Comment(4)
Thanks for summarizing the comments.Almonte
Just noting: [a-zA-Z0-9_] can be replaced with [_%w] which would shorten that pattern a fair bit.Hyperplasia
You could also use the string.rep function, e.g. if np:match(("x"):rep(50)) to match 50 x charactersGenitor
@Genitor Yes, but that way we already need some code, and in my answer I was focusing more on what can be done with pure Lua patterns. Also, that was answered 7 years ago, when I knew Lua much less well.Moralez
