Regex named capture groups in Delphi XE

I have built a match pattern in RegexBuddy that behaves exactly as I expect, but I cannot get it to work in Delphi XE, at least not with the new built-in TRegEx or TPerlRegEx.

My real-world code has six capture groups, but I can illustrate the problem with a simpler example. This code shows "3" in the first dialog and then raises an exception (-7 index out of bounds) when executing the second one.

var
  Regex: TRegEx;
  M: TMatch;
begin
  Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00  X1 90  55KENNY BENNY');
  ShowMessage(IntToStr(M.Groups.Count));
  ShowMessage(M.Groups['time'].Value);
end;

But if I use only one capture group:

Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})');

The first dialog shows "2" and the second dialog will show the time "00:00" as expected.

That would be rather limiting if only one named capture group were allowed, but that's not the whole story: if I change the name of the first capture group to, for example, "atime", it works:

var
  Regex: TRegEx;
  M: TMatch;
begin
  Regex := TRegEx.Create('(?P<atime>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00  X1 90  55KENNY BENNY');
  ShowMessage(IntToStr(M.Groups.Count));
  ShowMessage(M.Groups['atime'].Value);
end;

I get "3" and "00:00", just as expected. Are there reserved words I cannot use? I don't think so, because in my real example I've tried completely random names. I just cannot figure out what causes this behaviour.

Mcnulty, 16/3, 2011 at 8:45
Comment (Floyd): You should report this in QC, as this is obviously a bug.

When pcre_get_stringnumber does not find the name, PCRE_ERROR_NOSUBSTRING is returned.

PCRE_ERROR_NOSUBSTRING is defined in RegularExpressionsAPI as PCRE_ERROR_NOSUBSTRING = -7.

Some testing shows that pcre_get_stringnumber returns PCRE_ERROR_NOSUBSTRING for every name whose first letter is in the range k to z, and that this range depends on the first letter of judge: renaming judge changes the range.
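A test of that kind might look like the console sketch below. It reuses the pcre_compile and pcre_get_stringnumber calls from RegularExpressionsAPI in the same way as the program in the next answer, and assumes both accept PAnsiChar arguments as those calls suggest. The probe names (atime, btime, ..., ztime) are made up for illustration; a correct PCRE should report 1 for each of them, while -7 is PCRE_ERROR_NOSUBSTRING.

```delphi
program ProbeGroupNames;

{$APPTYPE CONSOLE}

uses
  SysUtils, RegularExpressionsAPI;

var
  Code: Pointer;
  Error: PAnsiChar;
  ErrorOffset: Integer;
  C: AnsiChar;
  Name, Pattern: AnsiString;
begin
  for C := 'a' to 'z' do
  begin
    // Hypothetical probe names: 'atime', 'btime', ..., 'ztime'
    Name := 'xtime';
    Name[1] := C;
    // Same pattern as in the question, with only the first group renamed
    Pattern := '(?P<' + Name + '>\d{1,2}:\d{1,2})(?P<judge>.{1,3})';
    Code := pcre_compile(PAnsiChar(Pattern), 0, @Error, @ErrorOffset, nil);
    if Code = nil then
      WriteLn(Name, ': compile error at offset ', ErrorOffset)
    else
      // A correct PCRE returns 1 here; -7 means PCRE_ERROR_NOSUBSTRING
      WriteLn(Name, ' -> ', pcre_get_stringnumber(Code, PAnsiChar(Name)));
    // Compiled patterns are deliberately not freed in this short probe
  end;
  ReadLn;
end.
```

Running it with judge renamed should show whether the failing range of first letters moves, as described above.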

As I see it, there are at least two bugs involved here: one in pcre_get_stringnumber, and one in TGroupCollection.GetItem, which should raise a proper exception instead of SRegExIndexOutOfBounds.
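Until GetItem reports a missing name more clearly, a caller could wrap named-group access itself. The sketch below uses a hypothetical helper, NamedGroupValue (not part of the RTL). It assumes Groups accepts a group name as its index, as in the question's code, and it catches Exception rather than a narrower class because the exact class GetItem raises in XE is not confirmed here.

```delphi
program NamedGroupHelper;

{$APPTYPE CONSOLE}

uses
  SysUtils, RegularExpressions;

// Hypothetical helper: returns the value of a named group, or raises an
// exception that names the group instead of a bare "index out of bounds"
function NamedGroupValue(const M: TMatch; const GroupName: string): string;
begin
  try
    Result := M.Groups[GroupName].Value;
  except
    on E: Exception do
      raise Exception.CreateFmt('Named group "%s" could not be resolved (%s)',
        [GroupName, E.Message]);
  end;
end;

var
  M: TMatch;
begin
  M := TRegEx.Match('00:00  X1 90  55KENNY BENNY',
    '(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  try
    WriteLn(NamedGroupValue(M, 'time'));
  except
    // On an affected XE build this reports which name failed
    on E: Exception do
      WriteLn(E.ClassName, ': ', E.Message);
  end;
  ReadLn;
end.
```

This does not work around the lookup bug; it only makes the failure easier to diagnose.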

Marrin answered 16/3, 2011 at 10:09
Comment (Mcnulty): Thanks a lot for digging into the problem to this level. I've found a workaround using the TPerlRegEx library from RegexBuddy's author. As I understand it, this library forms the basis of XE's implementation, so I find the differences very mysterious.

The bug seems to be in the RegularExpressionsAPI unit that wraps the PCRE library, or in the PCRE OBJ files that it links. If I run this code:

program Project1;

{$APPTYPE CONSOLE}

uses
  SysUtils, RegularExpressionsAPI;

var
  myregexp: Pointer;
  Error: PAnsiChar;
  ErrorOffset: Integer;
  Offsets: array[0..300] of Integer;
  OffsetCount, Group: Integer;

begin
  try
    myregexp := pcre_compile('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})', 0, @error, @erroroffset, nil);
    if (myregexp <> nil) then begin
      offsetcount := pcre_exec(myregexp, nil, '00:00  X1 90  55KENNY BENNY', Length('00:00  X1 90  55KENNY BENNY'), 0, 0, @offsets[0], High(Offsets));
      if (offsetcount > 0) then begin
        Group := pcre_get_stringnumber(myregexp, 'time');
        WriteLn(Group);
        Group := pcre_get_stringnumber(myregexp, 'judge');
        WriteLn(Group);
      end;
    end;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  ReadLn;
end.

It prints -7 and 2 instead of 1 and 2.

If I remove RegularExpressionsAPI from the uses clause and add the pcre unit from my TPerlRegEx component, then it does correctly print 1 and 2.

The RegularExpressionsAPI in Delphi XE is based on my pcre unit, and the RegularExpressionsCore unit is based on my PerlRegEx unit. Embarcadero did make some changes to both units. They also compiled their own OBJ files from the PCRE library that are linked by RegularExpressionsAPI.
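Since the failure is in mapping a name to a group number, one possible stopgap, assuming the group numbering itself is unaffected (the program above still reports 2 for judge, and Groups.Count is 3 in the question), is to read the groups by number in the order they appear in the pattern. This is a sketch against XE's RegularExpressions unit, not a fix for the underlying bug.

```delphi
program NumberedGroups;

{$APPTYPE CONSOLE}

uses
  SysUtils, RegularExpressions;

var
  M: TMatch;
begin
  M := TRegEx.Match('00:00  X1 90  55KENNY BENNY',
    '(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  if M.Success then
  begin
    // Group 0 is the whole match; capture groups are numbered by the
    // position of their opening parenthesis, so time = 1 and judge = 2
    WriteLn('time:  "', M.Groups[1].Value, '"');
    WriteLn('judge: "', M.Groups[2].Value, '"');
  end;
  ReadLn;
end.
```

With this input, time is "00:00" and judge is two spaces followed by X, because .{1,3} greedily takes the next three characters. The drawback is that the indexes must be kept in sync by hand if groups are added or reordered.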

I have reported this bug as QC 92497.

I have also created a separate report QC 92498 to request that TGroupCollection.GetItem raise a more sensible exception when requesting a named group that does not exist. (This code is in the RegularExpressions unit which is based on code written by Vincent Parrett, not myself.)

Syrian answered 23/3, 2011 at 3:25

© 2022 - 2024 — McMap. All rights reserved.