regex - negative lookahead to exclude strings

I am trying to find (and replace with something else) in a text all parts which

  1. start with '/'
  2. end with '/'
  3. between the two /'s there can be anything, except the strings '.' and '..'.

(For your info, I am searching for and replacing directory and file names, hence the '.' and '..' should be excluded.)

This is the regular expression I came up with:

/(?!\.|\.\.)([^/]+)/

The second part

([^/]+)

matches every sequence of characters, '/' excluded. No character restrictions are required; I am simply interpreting the input.

The first part

(?!\.|\.\.)

uses the negative lookahead assertion to exclude the strings '.' and '..'.

However, this doesn't seem to work in PHP with mb_ereg_replace().

Can somebody help me out? I fail to see what's wrong with my regex.

Thank you.

Startling answered 14/6, 2011 at 22:27 Comment(5)
What do you want to replace it with? Also, are you aware that a file name can contain a '.'? Your solution seems to exclude that. - Wheatear
It's like this: for (; ($r = mb_ereg_replace('/(?!\.|\.\.)([^/]+)/', 'BLA', $path)) !== $input; $input = $r) { } If the input is, for example, '.../..', the last '..' isn't replaced by BLA. - Startling
Looks correct to me. Does PHP have any escape characters? - Alica
@yes123 - What do you mean? I want to allow anything (even '...'), unless it's '.' or '..'. - Startling
Ah, OK, never mind. Also, what do you want to replace it with? - Wheatear
S
4

POSIX regexes probably don't support negative lookaheads. (I may be wrong, though.)

Anyway, since PCRE regexes are usually faster than POSIX ones, I think you can use the PCRE version of the same function; PCRE also supports UTF-8 via the u flag.

Consider this code as a substitute:

preg_replace('~/(?!\.|\.\.)([^/]+)/~u', "", $str);

EDIT: Even better is to use:

preg_replace('~/(?!\.)([^/]+)/~u', "", $str);
Swastika answered 14/6, 2011 at 22:44 Comment(8)
I've never worked with PCRE regexes before. What does the '~' mean? Why does it need to be added? - Startling
@SevyD: The ~ is just a choice of delimiter. Usually / is used, but since / features prominently in your expression, anubhava has picked a different one. - Pirouette
@Tomalak Geret'kal: Thanks for explaining that. @Sevy D: PCRE regexes are generally considered better and faster than POSIX ones. You can find plenty of information about them by searching on Google. Also take a look at: php.net/manual/en/reference.pcre.pattern.modifiers.php - Swastika
(?!\.|\.\.) is better written as (?!\.\.?), or just (?!\.), since the end is not anchored in any way, and that match is not used. - Michalemichalski
@Qtax: I would prefer (?!\.\.?); it's compact and more readable. - Swastika
@Qtax: And what do you think about (?!\.{1,2})? - Swastika
In this case all other versions are equivalent to (?!\.), and thus meaningless. If you want these longer lookaheads to make sense you have to anchor them, e.g. (?!\.\.?/). Of \.\.? and \.{1,2} I'd prefer the shorter one. - Michalemichalski
@Qtax: Yes, that makes sense; better to use (?!\.). I have edited my answer's EDIT section. - Swastika
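
The point about anchoring made in the comments can be checked with a small experiment. The following is only a sketch in Python's re module (picked for illustration, on the assumption that these lookaheads behave the same way as in PCRE); the sample strings and the names SAMPLES and matched are invented for the demonstration.

```python
import re

# Sample path fragments (illustrative, not taken from the question).
SAMPLES = ["/hi/", "/.hi/", "/..hi/", "/.../", "/./", "/../"]

# The two unanchored lookaheads discussed in the comments.
UNANCHORED_LONG = re.compile(r"/(?!\.|\.\.)([^/]+)/")
UNANCHORED_SHORT = re.compile(r"/(?!\.)([^/]+)/")

# The anchored variant: only reject '.' or '..' when followed by '/'.
ANCHORED = re.compile(r"/(?!\.\.?/)([^/]+)/")


def matched(pattern):
    """Return the samples in which the pattern finds a match."""
    return [s for s in SAMPLES if pattern.search(s)]


print(matched(UNANCHORED_LONG))
print(matched(UNANCHORED_SHORT))
print(matched(ANCHORED))
```

Both unanchored patterns accept only '/hi/', because they reject every name that merely starts with a dot. The anchored pattern also accepts '/.hi/', '/..hi/' and '/.../', while still rejecting '/./' and '/../'.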
P
3

This is a little verbose, but it definitely does work:

#/((\.[^./][^/]*)|(\.\.[^/]+)|([^.][^/]*))/#
^  |------------| |---------| |---------|
|        |             |               |
|        |        text starting with   |
|        |        two dots, that isn't |
|        |             "." or ".."     |
|  text starting with                  |
|  a dot, that isn't                text not starting
|  "." or ".."                         with a dot
|
delimiter

Does not match:

  • hi
  • //
  • /./
  • /../

Does match:

  • /hi/
  • /.hi/
  • /..hi/
  • /.../

Have a play around with it on http://regexpal.com/.

I wasn't sure whether or not you wanted to allow //. If you do, stick * before the last /.
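
If regexpal is not at hand, the two lists above can also be checked with a short script. This is a sketch in Python's re module rather than PHP, assuming the pattern (written without the # delimiters) means the same thing in both engines; the names PATTERN, SHOULD_MATCH, SHOULD_NOT_MATCH and check are made up here.

```python
import re

# The pattern from the answer, without the '#' delimiters.
PATTERN = re.compile(r"/((\.[^./][^/]*)|(\.\.[^/]+)|([^.][^/]*))/")

SHOULD_NOT_MATCH = ["hi", "//", "/./", "/../"]
SHOULD_MATCH = ["/hi/", "/.hi/", "/..hi/", "/.../"]


def check(pattern=PATTERN):
    """Map each sample string to True if the pattern matches somewhere in it."""
    return {s: bool(pattern.search(s)) for s in SHOULD_NOT_MATCH + SHOULD_MATCH}


for text, found in check().items():
    print(f"{text!r}: {'match' if found else 'no match'}")
```

One detail this makes easy to spot: the last alternative starts with [^.], which also admits '/', so a run of three slashes ('///') is matched even though '//' on its own is not.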

Pirouette answered 14/6, 2011 at 23:1 Comment(1)
Even though I am hating you, I must give a +1 here for the ASCII art. - Wheatear
P
1

I'm not against regex, but I would have done this instead:

function simplify_path($path, $directory_separator = "/", $equivalent = true){
  $path = trim($path);
  // if it's absolute, it stays absolute:
  $prepend = (substr($path,0,1) == $directory_separator)?$directory_separator:"";
  $path_array = explode($directory_separator, $path);
  if($prepend) array_shift($path_array);
  $output = array();
  foreach($path_array as $val){
    // keep a '..' only when there is nothing left for it to cancel
    // (empty output, or the output already ends in '..');
    // end($output) is used so the check stays correct after array_pop()
    if($val != '..' || ((empty($output) || end($output) == '..') && $equivalent)) {
      // skip empty segments (from '//') and '.'
      if($val != '' && $val != '.'){
        array_push($output, $val);
      }
    } elseif(!empty($output)) {
        // a '..' cancels the previous segment
        array_pop($output);
    }
  }
  return $prepend.implode($directory_separator,$output);
}

Tests:

echo(simplify_path("../../../one/no/no/../../two/no/../three"));
// =>  ../../../one/two/three
echo(simplify_path("/../../one/no/no/../../two/no/../three"));
// =>  /../../one/two/three
echo(simplify_path("/one/no/no/../../two/no/../three"));
// =>  /one/two/three
echo(simplify_path(".././../../one/././no/./no/../../two/no/../three"));
// =>  ../../../one/two/three
echo(simplify_path(".././..///../one/.///./no/./no/../../two/no/../three/"));
// =>  ../../../one/two/three

I thought that it would be better to return an equivalent string, so I preserved the occurrences of .. at the beginning of the string.

If you don't want them, you can call it with the third parameter $equivalent = false:

echo(simplify_path("../../../one/no/no/../../two/no/../three", "/", false));
// =>  one/two/three
echo(simplify_path("/../../one/no/no/../../two/no/../three", "/", false));
// =>  /one/two/three
echo(simplify_path("/one/no/no/../../two/no/../three", "/", false));
// =>  /one/two/three
echo(simplify_path(".././../../one/././no/./no/../../two/no/../three", "/", false));
// =>  one/two/three
echo(simplify_path(".././..///../one/.///./no/./no/../../two/no/../three/", "/", false));
// =>  one/two/three
Peep answered 14/6, 2011 at 23:14 Comment(0)
A
0

/(?!(\.|\.\.)/)([^/]+)/ This will allow ... as a valid name.

Alica answered 14/6, 2011 at 22:38 Comment(2)
This goes in the right direction, but it is still not working. What I am doing is this: replacing '/w/..' by '/' to shorten my paths. Given your expression, I made up this: for (; ($r = self::mb_ereg_replace('(^|/)(?!(\.|\.\.)/)([^/]+)/\.\.(/|$)', '\\1', $path)) !== $path; $path = $r) { } However, no replacement is done for the input '/name/..', for example. I can't see where it is going wrong now. - Startling
Can you try a simple lookahead to make sure your regex engine supports them? - Alica
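
For comparison, here is what the replace-until-stable loop from the comment above does in an engine that is known to support lookaheads. It is a sketch in Python's re module, not PHP, and it does not settle whether mb_ereg_replace() accepts the same pattern, which is the open question here. The names PARENT_REF and collapse_parent_refs are hypothetical.

```python
import re

# Pattern from the comment above: a name that is not '.' or '..',
# followed by '/..' and then either '/' or the end of the string.
PARENT_REF = re.compile(r"(^|/)(?!(\.|\.\.)/)([^/]+)/\.\.(/|$)")


def collapse_parent_refs(path):
    """Replace 'name/..' with the leading separator until nothing changes."""
    while True:
        # r"\1" plays the role of '\\1' in the PHP call: keep the leading '/' (or nothing).
        reduced = PARENT_REF.sub(r"\1", path)
        if reduced == path:
            return path
        path = reduced


print(collapse_parent_refs("/name/.."))
print(collapse_parent_refs("/a/b/../../c"))
```

With this engine, '/name/..' collapses to '/' and '/a/b/../../c' collapses to '/c' after two passes, while '../..' is left alone. If the PHP loop behaves differently on the same inputs, that would point at the engine or its options rather than at the pattern.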
