Can I use a boolean AND condition in a regular expression?

Asked 31/5, 2012 at 4:32 Answered 31/5, 2012 at 4:50

Say, if I have a DN string, something like this:

OU=Karen,OU=Office,OU=admin,DC=corp,DC=Fabrikam,DC=COM

How can I write a regular expression that matches only DNs containing both OU=Karen and OU=admin?

Andrews answered 31/5, 2012 at 4:32 Comment(0)

For reference, this is the regex lookahead solution. It matches the whole string if it contains the required parts, in any order. Unless you need to store the pattern in some sort of configurable variable, though, I'd stick with nhahtdh's solution.

/^(?=.*OU=Karen)(?=.*OU=admin).*$/

^        - line start
(?=      - start zero-width positive lookahead
.*       - anything or nothing
OU=Karen - literal
)        - end zero-width positive lookahead
         - place as many positive or negative look-aheads as required
.*       - the whole line
$        - line end
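
As a quick check of the pattern above, here is a minimal Python sketch. The names has_both and has_both_strict are only for illustration, and the STRICT variant is my own assumption about how one might avoid matching longer names such as OU=Karen7; it assumes the components are separated by plain, unescaped commas.

```python
import re

# The pattern from the answer: two lookaheads anchored at the line start,
# so each one scans the whole line and the order of the parts is irrelevant.
BOTH = re.compile(r"^(?=.*OU=Karen)(?=.*OU=admin).*$")

# Hypothetical stricter variant: each part must be a complete component,
# i.e. preceded by line start or a comma and followed by a comma or line end.
STRICT = re.compile(
    r"^(?=.*(?:^|,)OU=Karen(?:,|$))(?=.*(?:^|,)OU=admin(?:,|$)).*$"
)

def has_both(dn: str) -> bool:
    """True if the DN contains both substrings, in any order."""
    return BOTH.match(dn) is not None

def has_both_strict(dn: str) -> bool:
    """True only if both appear as whole comma-separated components."""
    return STRICT.match(dn) is not None

dn = "OU=Karen,OU=Office,OU=admin,DC=corp,DC=Fabrikam,DC=COM"
print(has_both(dn))                              # True
print(has_both("OU=Karen,OU=Office,DC=corp"))    # False
print(has_both("OU=Karen7,OU=admin"))            # True (substring match)
print(has_both_strict("OU=Karen7,OU=admin"))     # False
```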

Combes answered 31/5, 2012 at 4:50 Comment(0)

You realise you don't have to do everything with a single regex, or even with regexes at all.

Regular expressions are very good for catching classes of input but, if you have two totally fixed strings, you can just use a contains()-type method for each of them and then AND the results.

Alternatively, if you need to use regexes, you can do that twice (once per string) and AND the results together.
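
Both of those options might look something like this in Python. This is only a sketch; the function names are made up for the example, and `in` plays the role of the contains()-type method.

```python
import re

def has_both_substrings(dn: str) -> bool:
    # Two plain substring tests, ANDed together. No regex involved.
    return "OU=Karen" in dn and "OU=admin" in dn

def has_both_regexes(dn: str) -> bool:
    # Two independent regex searches (one per string), ANDed together.
    found_karen = re.search(r"OU=Karen", dn) is not None
    found_admin = re.search(r"OU=admin", dn) is not None
    return found_karen and found_admin

dn = "OU=Karen,OU=Office,OU=admin,DC=corp,DC=Fabrikam,DC=COM"
print(has_both_substrings(dn))                  # True
print(has_both_regexes(dn))                     # True
print(has_both_regexes("OU=Karen,DC=corp"))     # False
```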

If you need to do it with a single regex, you could try something like:

,OU=Karen,.*,OU=admin,|,OU=admin,.*,OU=Karen,

but you'll then have to also worry about when those stanzas appear at the start or end of the line, and all sorts of other edge cases (one or both at start or end, both next to each other, names like Karen7 or administrator-lesser, and so on).

Having to allow for all possibilities will probably end up with something monstrous like:

^OU=Karen(,[^,]*)*,OU=admin,|
^OU=Karen(,[^,]*)*,OU=admin$|
,OU=Karen(,[^,]*)*,OU=admin,|
,OU=Karen(,[^,]*)*,OU=admin$|
^OU=admin(,[^,]*)*,OU=Karen,|
^OU=admin(,[^,]*)*,OU=Karen$|
,OU=admin(,[^,]*)*,OU=Karen,|
,OU=admin(,[^,]*)*,OU=Karen$

With an advanced enough regex engine, this may be reducible to something smaller, although it would be unlikely to be any faster, simply because of all the forward-looking and backtracking.

One way to improve on that without a complex regex is to massage your string slightly beforehand so that boundary checks aren't needed:

newString = "," + origString.replace (",", ",,") + ","

so that it starts and ends with a comma and all commas within it are duplicated:

,OU=Karen,,OU=Office,,OU=admin,,DC=corp,,DC=Fabrikam,,DC=COM,

Then you need only check for the much simpler:

,OU=Karen,.*,OU=admin,|,OU=admin,.*,OU=Karen,

and this removes all the potential problems mentioned:

either at start of string.
either at end of string.
both abutting each other.
extended names like Karen2 being matched accidentally.
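
The massaging trick above could be sketched in Python like this. The names massage and has_both are only illustrative, and the sketch assumes the DN contains no escaped commas.

```python
import re

# The simple pattern from above; it is only safe on the massaged string.
SIMPLE = re.compile(r",OU=Karen,.*,OU=admin,|,OU=admin,.*,OU=Karen,")

def massage(dn: str) -> str:
    # Double every comma and wrap the string in commas, so that each
    # component ends up with its own leading and trailing comma.
    return "," + dn.replace(",", ",,") + ","

def has_both(dn: str) -> bool:
    return SIMPLE.search(massage(dn)) is not None

print(massage("OU=Karen,OU=admin"))       # ,OU=Karen,,OU=admin,
print(has_both("OU=Karen,OU=admin"))      # True (abutting, at start and end)
print(has_both("OU=admin,DC=corp,OU=Karen"))  # True (reverse order)
print(has_both("OU=Karen2,OU=admin"))     # False (extended name)
```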

Probably the best way to do this (if your language allows) is to simply split the string on commas and examine them, something like:

str = "OU=Karen,OU=Office,OU=admin,DC=corp,DC=Fabrikam,DC=COM"
elems[] = str.splitOn(",")

gotKaren = false
gotAdmin = false
for each elem in elems:
    if elem = "OU=Karen": gotKaren = true
    if elem = "OU=admin": gotAdmin = true

if gotKaren and gotAdmin:
    weaveYourMagicHere()

This both ignores the order in which they may appear and bypasses any regex "gymnastics" that may be required to detect the edge cases.

It also has the advantage of probably being more readable than the equivalent regex :-)
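
For what it's worth, a rough Python equivalent of that pseudocode could use a set comparison instead of two flags. The function name has_all is made up for the example, and the sketch assumes there are no escaped commas inside the components.

```python
from typing import Iterable

def has_all(dn: str, required: Iterable[str]) -> bool:
    # Split on commas and check that every required component is present.
    elems = set(dn.split(","))
    return set(required) <= elems

dn = "OU=Karen,OU=Office,OU=admin,DC=corp,DC=Fabrikam,DC=COM"
print(has_all(dn, ["OU=Karen", "OU=admin"]))                    # True
print(has_all("OU=Karen7,OU=admin", ["OU=Karen", "OU=admin"]))  # False
```

Adding a third or fourth required component is then just a matter of extending the list.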

Ixtle answered 31/5, 2012 at 4:35 Comment(4)

Again, as I said above, thanks for the explanation but this has to be a regexp only. – Andrews 31/5, 2012 at 5:5

@ahmd0, then you can look into the regex I gave, accounting for all the possibilities I warned against. Or, provided your regex engine is advanced enough, Eugene's solution is probably the best. – Ixtle 31/5, 2012 at 5:6

+1. I forgot about splitting and checking (in my post I only considered it for repeated checks). Not sure, but the performance may be better this way. – Pistachio 31/5, 2012 at 5:9

@ahmd0, I've also added another possibility you should examine, that of massaging the string in advance (as a temporary string, of course) to make the RE a lot simpler. – Ixtle 31/5, 2012 at 5:23

If you must use a regex, you can use

/OU=Karen.*?OU=admin|OU=admin.*?OU=Karen/
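
A small Python sketch of how this pattern behaves; the name has_both_loose is only for illustration. Note that, as a plain substring pattern, it also accepts longer names that merely start with Karen or admin.

```python
import re

# Either order: Karen somewhere before admin, or admin somewhere before Karen.
EITHER_ORDER = re.compile(r"OU=Karen.*?OU=admin|OU=admin.*?OU=Karen")

def has_both_loose(dn: str) -> bool:
    return EITHER_ORDER.search(dn) is not None

print(has_both_loose("OU=Karen,OU=Office,OU=admin,DC=corp"))   # True
print(has_both_loose("OU=admin,OU=Karen"))                     # True
print(has_both_loose("OU=Karen,DC=corp"))                      # False
print(has_both_loose("OU=Karen7,OU=administrator-lesser"))     # True (false positive)
```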

Lammergeier answered 31/5, 2012 at 4:35 Comment(0)

You can call contains() or indexOf() once per condition to check for each exact string. No need for regex.

An extensible regex (one that can support more conditions) may be possible with lookahead, but I doubt it will perform any better.

If you want to perform this type of check multiple times on the same string, and the string has many tokens, then you may consider parsing the string and storing the tokens in some data structure.
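
One possible shape for that data structure, sketched in Python: parse the DN once into a dictionary mapping each attribute type to the set of its values, then query it as often as needed. The name parse_dn is hypothetical, and the sketch assumes simple type=value components with no escaped commas.

```python
def parse_dn(dn: str) -> dict:
    # Map each attribute type (e.g. "OU", "DC") to the set of its values.
    parsed = {}
    for token in dn.split(","):
        key, _, value = token.partition("=")
        parsed.setdefault(key, set()).add(value)
    return parsed

parsed = parse_dn("OU=Karen,OU=Office,OU=admin,DC=corp,DC=Fabrikam,DC=COM")

# Repeated checks are now cheap set lookups on the parsed result.
ous = parsed.get("OU", set())
print({"Karen", "admin"} <= ous)    # True
print("Sales" in ous)               # False
print("corp" in parsed.get("DC", set()))  # True
```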

Pistachio answered 31/5, 2012 at 4:34 Comment(2)

Sometimes the best answer to the question "How do I X?" is "Don't do X, do Y instead." REs are a fantastic tool but so are chainsaws - I still wouldn't use them to bang in a nail :-) – Ixtle 31/5, 2012 at 5:2

I can't use any programming languages. It must be the regexp. Sorry. – Andrews 31/5, 2012 at 5:3

No, not unless you're using vi: it has an \& operator

/(OU=Karen.*OU=admin|OU=admin.*OU=Karen)/

This might be close enough, though, or similar.

Turves answered 31/5, 2012 at 4:35 Comment(0)

You can use something like (OU\=Karen

Plural answered 31/5, 2012 at 4:42 Comment(1)

Hi and welcome to Stack Overflow. This may well answer the question - but a bit of explanation is always a good idea. There are heaps of newbies on S/O that may learn a thing or two from you, and what may be obvious to you, won't be to them. – Gearwheel 16/7, 2014 at 0:42
