Regex pattern that does not match certain extensions?
L

5

13

I have this pattern written

^.*\.(?!jpg$|png$).+$

However, there is a problem: this pattern matches file.name.jpg (2 dots).

It works correctly (does not match) on filename.jpg. I am trying to figure out how to make it not match ANY .jpg files, even if the file's name has 2 or more dots in it. I tried using a lookbehind, but Python complains that it is not fixed width (I'm not exactly sure what that means, but the file name will be variable length).

Lexington answered 7/4, 2012 at 5:24 Comment(0)
O
19

This should work: ^.*\.(?!jpg$|png$)[^.]+$

Orthocephalic answered 7/4, 2012 at 5:53 Comment(2)
Can someone explain this regex? - Inez
@Inez You can take a look at negative lookaheads; they have an almost equivalent example there. - Orthocephalic
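To the question in the comments, one way to read the pattern is to spell it out with Python's re.VERBOSE flag. This is only an illustrative sketch: the names PATTERN and is_allowed are made up here, and using re.match is an assumption about how the consuming program applies the regex.

```python
import re

# Same pattern as in the answer above, annotated piece by piece.
PATTERN = re.compile(r"""
    ^.*\.            # greedily consume everything up to and including a dot
    (?!jpg$|png$)    # negative lookahead: the rest must not be exactly jpg or png
    [^.]+$           # the rest may not contain another dot, so this is the last dot
""", re.VERBOSE)

def is_allowed(name):
    """Return True when the name has an extension other than jpg or png."""
    return PATTERN.match(name) is not None

for name in ["filename.jpg", "file.name.jpg", "file.name.gif", "file.jpg.gif"]:
    print(name, is_allowed(name))
```

Because [^.]+$ cannot cross a dot, the engine can no longer satisfy the lookahead at an earlier dot and then swallow the rest of the name, which is what the original .+$ allowed. Note that a name with no dot at all does not match either.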
B
3

Use os.path's nifty functions to properly split up the filepath into components for easier parsing:

import os

# path is the file path string to check
filepath, filename = os.path.split(path)
basename, extension = os.path.splitext(filename)

if extension[1:] in ['jpg', 'png']:
    pass  # The extension matches

Try this regex (don't do it; it does the exact opposite of what you want to do):

\.(jpg|png)([^\.]|$)
Bother answered 7/4, 2012 at 5:29 Comment(2)
I don't have access to Python. It is Python's regex engine, but I only have access to a JSON config file where I can put the regex for a Python program. I removed the Python tag to prevent confusion. - Lexington
Your regex looks like it's trying to exclude strings that contain .jpg. or .png., but I believe the idea is to exclude anything that ends with .jpg or .png. The OP's regex is failing because both the lookahead and the final .+$ can match after the first . in file.name.jpg. Changing that to [^.]+$, as @Orthocephalic did, forces the lookahead to apply only to the final dot-whatever sequence. - Flowers
F
2

If you only care that the string doesn't end with .jpg or .png, you can use this:

^.+$(?<!\.jpg)(?<!\.png)

The ^.+ isn't strictly necessary, but depending on how the JSON parser is coded you might need to force the regex to consume the whole string. If you're using the regex for other validations as well, you might want something more elaborate, like:

^\w+(?:\.\w+)+$(?<!\.jpg)(?<!\.png)
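As a quick check, here is a small sketch that runs both patterns through Python's re module. The variable names and the sample file names are made up for illustration, and re.match is only an assumption about how the program applies the regex.

```python
import re

# The two patterns from this answer, unchanged.
ENDS_CHECK = re.compile(r"^.+$(?<!\.jpg)(?<!\.png)")
STRICT = re.compile(r"^\w+(?:\.\w+)+$(?<!\.jpg)(?<!\.png)")

results = {}
for name in ["filename.jpg", "file.name.png", "file.name.gif", "README"]:
    # Each lookbehind is evaluated at the end of the string, after $ has matched.
    results[name] = (bool(ENDS_CHECK.match(name)), bool(STRICT.match(name)))
    print(name, results[name])
```

The simple pattern accepts README because it only checks how the string ends, while the more elaborate one rejects it because it requires at least one dot-separated part.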

You probably tried a single lookbehind that can match more than one length, such as (?<!\.jpg|\.jpeg|\.png). That doesn't work because Python's regex flavor is one of the most restrictive when it comes to lookbehinds: the whole expression has to match one fixed width, so alternatives are accepted only if they are all the same length, as in (?<!\.jpg|\.png). PHP and Ruby 1.9+ would accept (?<!\.jpg|\.jpeg|\.png) because each of the alternatives has a fixed length; they don't even have to be the same length. Just don't try to factor out the dot there, as in (?<!\.(?:jpg|jpeg|png)); the alternation has to be at the top level of the lookbehind.

Java would accept the factored-out version because it does a little more work at compile time to determine the maximum number of characters the lookbehind might need to match. The lookbehind expression needs to be fairly simple though, and it can't use the + or * quantifiers. Finally, the .NET and JGSoft flavors place no restrictions at all on lookbehinds. But Python makes a very simple-minded attempt to figure out the exact number of characters the lookbehind needs to match, generating that cryptic error message when it fails.
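The fixed-width restriction in Python can be seen directly by trying to compile a few lookbehinds. This is a minimal sketch; the helper compiles() is a made-up name, and the patterns are illustrative rather than taken from the question.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Alternatives of different widths inside one lookbehind: rejected by Python.
mixed_width = compiles(r"(?<!\.jpg|\.jpeg|\.png)$")
# An optional character makes the width variable too: rejected.
quantified = compiles(r"(?<!\.jpe?g)$")
# One fixed-width lookbehind per extension: accepted.
chained = compiles(r"(?<!\.jpg)(?<!\.jpeg)(?<!\.png)$")

print(mixed_width, quantified, chained)
```

Chaining separate lookbehinds, as in the patterns above, sidesteps the restriction because each one has a single width of its own.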

Flowers answered 7/4, 2012 at 15:23 Comment(0)
E
1

Looks like you almost had it:

.*\.(?!jpg$|png$)[^.]+

According to my tests (in java) I get these results:

file.jpg - false
file.png - false
file.name.jpg - false
file.name.png - false
file.gif - true
file.name.gif - true
file.jpg.gif - true
file.jpge - true
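The same table can be reproduced with Python's engine. This sketch assumes the Java test used a whole-string match (as String.matches does), so it uses re.fullmatch as the closest counterpart; the names PATTERN and results are made up here.

```python
import re

PATTERN = re.compile(r".*\.(?!jpg$|png$)[^.]+")

names = [
    "file.jpg", "file.png", "file.name.jpg", "file.name.png",
    "file.gif", "file.name.gif", "file.jpg.gif", "file.jpge",
]

# fullmatch requires the pattern to cover the entire file name.
results = {name: PATTERN.fullmatch(name) is not None for name in names}

for name, matched in results.items():
    print(f"{name} - {str(matched).lower()}")
```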

If this is not what you wanted, please update your question with your expectations.

Electrotherapeutics answered 7/4, 2012 at 7:45 Comment(0)
O
0

Please try

    .*\.(jpg$|png$)

It will correctly match filename.jpg. If you are trying to match ANY .jpg file, even when the file's name has 2 or more dots in it, this will work fine.
If you use a Python script instead, be sure to use the right kind of split: split (which splits from the left) or rsplit (which splits from the right).

Othilia answered 7/4, 2012 at 7:14 Comment(1)
You've got it backward: the regex should NOT match filename.jpg OR file.name.png. filename.txt or file.name.foo are okay, I presume. - Flowers
