Any build-in function like PosEx that finds a sub-string starting from the back of the string?
Asked Answered
S

8

12

Is there any Delphi D2010 function like PosEx that finds a sub-string inside a string starting from the end of the string?

I'm removing all the calls to the FastStrings library and one of the functions I was using was FastPosBack:

function FastPosBack(const aSourceString, aFindString : AnsiString; const aSourceLen, aFindLen, StartPos : Integer) : Integer;

I found LastDelimiter but it's not quite the same thing since it only finds the last delimiter and I can't specify a start position.

Update: Following DR comment, I've created this function:

function FastPosBack(const aSourceString, aFindString : String; const aSourceLen, aFindLen, StartPos : Integer) : Integer;
var
  RevSourceString, RevFindString: string;
begin
  RevSourceString := AnsiReverseString(aSourceString);
  RevFindString := AnsiReverseString(aFindString);

  Result := Length(aSourceString) - PosEx(RevFindString, RevSourceString, StartPos) + 1;
end;

Is there any more effective way of doing this? On a 1000000 loop cycle, Pos takes 47ms while FastPosBack takes 234ms to complete.

Squilgee answered 10/10, 2009 at 14:29 Comment(5)
Just out of curiosity: how did your test look like exactly?Blocking
I call GetTickCount, followed by a 1000000 loop of the call to the function and then obtain the difference, GetTickCount - TickCount.Squilgee
I was more interested in what strings that you pass to the functions for testing...Blocking
For the SourceString "dfkfkL%&/s"#<.676505" and for the SearchString "#<". Pretty small strings to search for.Squilgee
I don't think any function that copies the entire string can be considered "fast"Kos
P
14

Try this/these:

function RPos(const aSubStr, aString : String; const aStartPos: Integer): Integer; overload;
var
  i: Integer;
  len: Integer;
  pStr: PChar;
  pSub: PChar;
begin
  pSub := Pointer(aSubStr);
  len := Length(aSubStr) * SizeOf(Char);

  for i := aStartPos downto 1 do
  begin
    pStr := @(aString[i]);
    if (pStr^ = pSub^) and CompareMem(pSub, pStr, len) then
    begin
      result := i;
      EXIT;
    end;
  end;

  result := 0;
end;


function RPos(const aSubStr, aString : String): Integer; overload;
begin
  result := RPos(aSubStr, aString, Length(aString) - Length(aSubStr) + 1);
end;

The overload provides a way to call RPos using the most efficient startpos for searching from the very end of the string without having to calculate that yourself. For efficiency no checking is performed on startpos when explicitly specified.

In my SmokeTest performance testing suite this comes out about 20% faster than your FastPosBack (which incidentally contains an "off by one" error as well as requiring some parameters which it doesn't actually use).

Parette answered 10/10, 2009 at 22:44 Comment(10)
Hi. Thanks for your function, it is indeed much faster than the one I can up with (344ms vs 109ms). The extra parameters on the function where meant so that I don't need to change any of the FastStrings functions calls and just replace them with my own functions.Squilgee
Deltics, is your function Unicode compatible? I noticed that Marco's functions parameters are AnsiStrings and I'll be using this code in D2010.Squilgee
It was developed and tested using Delphi 2009, so yes I believe it is Unicode "safe". However, it does assume that String and SubStr contain the same 'payload' types (i.e. both string and substr are UTF16 or ANSI etc).Parette
I just tested it with some unicode specific characters and indeed it works just fine. I'm going to accept your answer since it provides a comparable performance to the regular Pos which is more than enough for me. Thanks again for all the replies. You guys are the best!Squilgee
Deltics, do you have a similar function that is not case-sensitive? I'm going to open a new question for this specific question but perhaps you have something already.Squilgee
I'm afraid not - this wasn't a routine I had "lying around". The question appealed to my sense of curiosity (and I like finding efficient solutions to such problems), so I created the routine to see what sort of results I got - they were good enough that I thought it was worth sharing. :) Removing case sensitivity makes the problem significantly harder, especially if looking for a single solution that is portable across ANSI and Unicode strings, but you've gone and appealed to that part of me again, so I'll spend a few minutes on it tonight and see how far I get.Parette
I know this is very old, but wanted to mention that the CompareMem call should be using the byte length of aSubStr, not the char length.Clavate
You're right, and I find it curious that this was discussed in the comments and appears to have tested successfully... which is odd because it appears it should not work, at least not reliably. Very odd. I don't recall it very well, to be honest. It was, as you say, a LONG time ago. :) I can't check right now, but if I can find time to resurrect a Delphi environment I'll update with a verified correction.Parette
This method results in a wrong result if the substr is the last part of the stringLys
@Lys the substring being at the end of the string shouldn't be a problem. However, MarkF was quite right that the (previous) code did not compare a sufficient number of bytes (if used with Wide/Unicode strings). Could that be your problem? (I have updated the answer to accommodate that).Parette
W
9

You can use Pos in combination with ReverseString (from StrUtils)

Wobble answered 10/10, 2009 at 14:32 Comment(1)
Thanks for your reply, using your hints I've created a function that is surprisingly fast compared with the other alternatives so far.Squilgee
H
3

Delphi comes with a function that can search backward, SearchBuf in the StrUtils unit. It's specialized for searching for words, though, so it might not behave quite the way you want. Below I've wrapped it into a function matching your desired interface.

function FastPosBack(const aSourceString, aFindString: AnsiString;
                     const aSourceLen, aFindLen, StartPos: Integer): Integer;
var
  Source, Match: PAnsiChar;
begin
  Source := PAnsiChar(ASourceString);
  Match := SearchBuf(Source, ASourceLen, ASourceLen, 0,
                     AFindString, [soMatchCase]);
  if Assigned(Match) then
    Result := Match - Source + 1
  else
    Result := 0;
end;
Hardden answered 10/10, 2009 at 15:57 Comment(1)
Thanks for the reply Rob. Unfortunately this function is even slower than the one I can me up, taking 1310ms to complete. I've changed the AnsiString and PAnsiChar to String and PChar respectively because that's what I'm trying to accomplish, a decent replacement to these ultra-fast AnsiString functions.Squilgee
K
2

First, consider if a speed optimized solution is necessary. If its not likely that it will be called 100000 times in real use reversing the strings and using the existing substring search is fine.

If speed is an issue, there are plenty of good resources for writing you own. Look on wikipedia for "string search algorithms" for ideas. I'll post a link and an example algorithm when I'm at a computer. I'm typing this from my phone at the moment.

Update:

Here's the example I promised:

function RPOS(pattern: string; text:string): Integer;
var patternPosition,
    textPosition: Integer;
begin
  Result := -1;
  for textPosition := Length(text) downto 0 do
  begin
    for patternPosition := Length(pattern) downto 0 do
      if not (pattern[patternPosition] = (text[textPosition - (Length(pattern) - patternPosition)])) then
        break;
    if patternPosition = 0 then
      Result := textPosition -Length(pattern) + 1;
  end;
end;

Its basically an inverted naive(brute force) string search algorithm. It starts at the end of both the pattern and text and works its way to the beginning. I can guarantee it is less efficient than Delphi's Pos() function though I can't say whether its faster or slower than the Pos()-ReverseString() combination as I haven't tested it. There is an error in it which I haven't found the cause of. If the two strings are identical its returning -1 (not found).

Kooky answered 10/10, 2009 at 16:4 Comment(7)
You are probably right, these differences won't be noticeable unless doing a huge number of iterations.Squilgee
Small differences soon add up, and in this case the routine required is so small that it makes sense to optimise since the effort to do so is negligible and means that you then don't have to worry about using it in performance critical code if the need arises in the future. Premature optimisation is a mistake when the effort to optimise is disproportionate to the gains. In this case the effort is minimal and so is justified for any gain if it can be achieved without sacrificing ease of use, imho.Parette
I disagree. I think simplicity wins over speed even in trivial cases. Anytime you optimize you increase the complexity of the code. The only time I optimize is when the existing solution has proven to be inadequate in a test or production environment.Kooky
Kenneth thanks for the function. I tested it and comes up at 453ms against 344ms of my own. Deltics function is 300% faster than mine at 109ms.Squilgee
I expected as much. The naive string search is known for its simplicity not its speed. In fact, unless you add unnecessary operations to it its probably the least efficient search algorithm there is. I used it because it only took about 5 minutes to write.Kooky
Kenneth, generally I'd agree with you, but in this case the "optimization" does not result in over complication. If anything my solution, of all those offered, most directly implements the problem as stated (look for a substring starting at the end of some other string). Others involved juggling the inputs and re-expressing the problem differently before again juggling the outputs. That the most direct approach also turned out to be the most efficient was a bonus. I'd also say that users of poorly performing software generally aren't impressed by how much simpler the source code is. :)Parette
@Kenneth: On my own experience, not anytime you optimize the complexity of the code is increased. Sometimes, less complex code performs better than previous. The main factor is how bad the original programmer was.Halfandhalf
K
2

I use the RPOS variants from FreePascal's strutils function:

http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/rtl/objpas/strutils.pp?view=markup

the string,string version is nearly the same as Deltics', but there are variants:

Function RPosEX(C:char;const S : AnsiString;offs:cardinal):Integer; overload;<br>
Function RPosex (Const Substr : AnsiString; Const Source : AnsiString;offs:cardinal) : Integer; overload;<br>
Function RPos(c:char;const S : AnsiString):Integer; overload;<br>
Function RPos (Const Substr : AnsiString; Const Source : AnsiString) : Integer; overload;

They are licensed under FPC's LGPL+linking exception license, but since I wrote them, I hereby release them under BSD license.

Kos answered 11/10, 2009 at 17:28 Comment(3)
Marco, I noticed that these functions all take AnsiStrings so I take it that they are not Unicode compatible in D2009/D2010. Do you know if there are any plans on making these Unicode compatible?Squilgee
FPC is working on unicode compiler modes. It will probably follow the same model as with Turbo Pascal->Delphi, namely that the type of the default string can be selected per unit. But base language support will take a while, and for the libraries to catch up will take longer. surrogates make the unicode case considerably more difficult, since most document surrogate scanning techniques work forward, not backwards.Kos
I wouldn't expect anything production ready in the next year, year and an half.Kos
S
1

Not in the standard RTL but in INDY (unit idGlobalProtocols according to the online help), which is part of recent Delphi installations:

function RPos(
    const ASub: String, 
    const AIn: String, 
    AStart: Integer = -1
): Integer;
Snider answered 10/10, 2009 at 15:47 Comment(1)
Thanks for the reply. This function is way way slower than the one I came up with, it took 5913ms (RPos) against 234ms to complete.Squilgee
M
1

Maybe adding Uppercasing or lowercasing aSubstr and aString parameters before doing the search can make Deltics purpose case insensitive. I think he left you to do this before calling RPos. but maybe an optional parameter can do the job.

this is how Deltic's purpose should look:

function RPos(const aSubStr, aString : String; const aStartPos: Integer;
              const aCaseInSensitive:boolean=true): Integer; overload;
var
  i, _startPos: Integer;
  pStr: PChar;
  pSub: PChar;
  _subStr, _string: string;
begin

 if aCaseInSensitive then
 begin
  _subStr := lowercase( aSubstr );
  _string := lowercase( aString );
 end
 else 
 begin
  _subStr := aSubstr:
  _string := aString;
 end;

 pSub := Pointer(_subStr);

 if aStartPos = -1 then
    _startPos :=  Length(_string) - Length(_subStr) + 1
 else
    _startPos := aStartPos;

 for i := _startPos downto 1 do
 begin
   pStr := @(_string[i]);
   if (pStr^ = pSub^) then
   begin
     if CompareMem(pSub, pStr, Length(_subStr)) then
     begin
       result := i;
       EXIT;
     end;
   end;
 end;

 result := 0;
end;


function RPos(const aSubStr, aString : String; 
              const aCaseInSensitive:boolean=true): Integer; overload;
begin
  result := RPos(aSubStr, aString, Length(aString) - Length(aSubStr) + 1,
                 aCaseInSensitive);
end;
Millburn answered 4/8, 2011 at 1:49 Comment(0)
L
0

Another simple way

function ReversePosStr(Const substr, str: string): Integer;
Var i: Integer;
begin
  Result := Pos(substr, str);
  if Result = 0 then Exit(0);
  i := Pos(substr, str, Result+1);
  While i <> 0 do begin
    Result := i;
    i := Pos(substr, str, Result+1);
  end;
end;
Lys answered 25/6 at 23:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.