regex to find files containing one word but not another [duplicate]

I am trying to quickly find all .java files which contain one term but are missing another term. I'm using MyEclipse 10.7 and its 'Search | File Search' feature, which supports regular expressions.

Will regex work in this scenario? What would the correct regex be?

Isoagglutinin asked 4/3, 2013 at 19:39
This question is not a duplicate of the linked question. This one asks about searching for FILES that contain one string and do not contain another; the linked question is about a single string that contains one substring but not another. Two completely different things. - Shillyshally
@Shillyshally You're wrong: the accepted answer to this question is (bar the search terms themselves) identical to the one in the duplicate. Whether you're searching in a single file or over all files depends on how the tool searches, not on the search regex itself. - Harwell
@MarkRotteveel - You are viewing these two questions purely in terms of their regex part. That is fine for the linked question, as it asks only for a regex solution for a single string, so a regex-only solution is sufficient. However, this question brings two distinct requirements into scope: 1) can regex be used with the tools the OP is using, and 2) if so, what form of regex can search an entire file to isolate the files that meet its criteria? Even the solutions that satisfy the OP's two questions are distinct. (Note the last line of the accepted answer below.) - Shillyshally
@MarkRotteveel - i.e. "The key is the \s\S, which ensures the whole file is searched and not each line." - Shillyshally

The only solution I found that works is the following regex:

^(?!.[\s\S]*MISSING_TERM).[\s\S]*INCLUDED_TERM.*$

It finds every file that includes INCLUDED_TERM but lacks MISSING_TERM, regardless of which line each term is on.

The key is the \s\S, which ensures the whole file is searched and not each line: the character class [\s\S] matches any character, including line breaks, whereas . does not.
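To reproduce this outside the IDE, here is a minimal Java sketch that reads each .java file as one string and tests it. It assumes the IDE compiles the pattern with Pattern.MULTILINE (an assumption about MyEclipse's File Search, not something stated above). Under that flag, ^ can also match at the start of any later line, so the negative lookahead only sees the text from that line onward, and the leading . skips a term that starts at the file's very first character; either effect can let a file containing both terms through. The sketch therefore also includes a variant anchored with \A. The class and method names (WholeFileSearch, originalMatches, anchoredMatches, findJavaFiles) are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WholeFileSearch {
    // The answer's pattern, compiled with MULTILINE as the IDE is assumed to do.
    static boolean originalMatches(String content, String included, String missing) {
        String regex = "^(?!.[\\s\\S]*" + Pattern.quote(missing) + ").[\\s\\S]*"
                + Pattern.quote(included) + ".*$";
        return Pattern.compile(regex, Pattern.MULTILINE).matcher(content).find();
    }

    // Hypothetical variant: \A pins both checks to the start of the file, and
    // [\s\S]* (instead of .[\s\S]*) also sees a term starting at offset 0.
    static boolean anchoredMatches(String content, String included, String missing) {
        String regex = "\\A(?![\\s\\S]*" + Pattern.quote(missing) + ")[\\s\\S]*"
                + Pattern.quote(included);
        return Pattern.compile(regex).matcher(content).find();
    }

    // Walks a directory tree and keeps the .java files whose whole content qualifies.
    static List<Path> findJavaFiles(Path root, String included, String missing) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(p -> p.toString().endsWith(".java"))
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        try {
                            String text = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
                            return anchoredMatches(text, included, missing);
                        } catch (IOException e) {
                            return false; // this sketch simply skips unreadable files
                        }
                    })
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findJavaFiles(root, "connectReadOnly", "disconnect")) {
            System.out.println(p);
        }
    }
}
```

Run it with a source directory as the first argument; it prints the files that contain connectReadOnly but not disconnect (the terms used in the comments on the other answers).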

Isoagglutinin answered 13/3, 2013 at 15:44

If both terms only need to be checked within a single line, use it like this:

^(?!.*MISSING_TERM).*INCLUDED_TERM.*$
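A minimal Java sketch of the single-line variant, assuming the tool applies Pattern.MULTILINE so that ^ and $ match at every line (SingleLineSearch and matchingLines are hypothetical names). It collects each line that contains the included term but not the missing one; a file counts as a hit if at least one line qualifies, which is a weaker condition than the whole-file check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleLineSearch {
    // Collects every line that contains `included` but not `missing`.
    static List<String> matchingLines(String content, String included, String missing) {
        Pattern p = Pattern.compile(
                "^(?!.*" + Pattern.quote(missing) + ").*" + Pattern.quote(included) + ".*$",
                Pattern.MULTILINE);
        Matcher m = p.matcher(content);
        List<String> lines = new ArrayList<>();
        while (m.find()) {
            lines.add(m.group());
        }
        return lines;
    }

    public static void main(String[] args) {
        String content = "connectReadOnly(); disconnect();\nconnectReadOnly();\n";
        // Only the second line qualifies, since the first also contains disconnect.
        System.out.println(matchingLines(content, "connectReadOnly", "disconnect"));
    }
}
```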

You can also use \ as an escape character when a term contains regex metacharacters; for example, write class\.variable to match class.variable literally.
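A short Java sketch of why the escape matters, using java.util.regex (assumed here to be the engine behind Eclipse-based tools). An unescaped . matches any character, so class.variable also matches classXvariable; Pattern.quote is the standard-library way to make a whole term literal. EscapeDemo and found are hypothetical names.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // True if `regex` is found anywhere in `text`.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found("class.variable", "classXvariable"));   // true: unescaped dot
        System.out.println(found("class\\.variable", "classXvariable")); // false: needs a literal '.'
        System.out.println(found("class\\.variable", "class.variable")); // true
        // Pattern.quote wraps the term in \Q...\E so every character is literal.
        System.out.println(found(Pattern.quote("class.variable"), "classXvariable")); // false
    }
}
```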

Manlove answered 20/4, 2017 at 10:34
very nice answer (y). viva live broda. - Dagan

(?m)\A(?=.*REGEX_TO_FIND)(?!.*MISSING_REGEX.*).*\z

The regex can get a bit tricky, but it breaks down into two pieces:

  1. Finding the matching term/phrase/word. This part isn't too tricky, as this is what a regex normally looks for.
  2. Making sure the other term is not present. This is the tricky part, but it's possible.
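The two pieces map onto a positive lookahead (the term must occur) and a negative lookahead (the other term must not), both tested from the start of the text. Here is a minimal Java sketch of a hypothetical helper that assembles them for arbitrary literal terms; it uses (?s) rather than (?m) so that . can cross line breaks, which is this sketch's own choice and not part of the answer.

```java
import java.util.regex.Pattern;

class TwoPieceRegex {
    // Piece 1: (?=.*INCLUDED) requires the included term somewhere after \A.
    // Piece 2: (?!.*MISSING) rejects the text if the missing term appears anywhere.
    // (?s) lets . match line breaks, so both lookaheads can scan the whole text.
    static Pattern build(String included, String missing) {
        return Pattern.compile("(?s)\\A(?=.*" + Pattern.quote(included) + ")(?!.*"
                + Pattern.quote(missing) + ")");
    }

    static boolean qualifies(String text, String included, String missing) {
        return build(included, missing).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(qualifies("open();\nconnectReadOnly();\n", "connectReadOnly", "disconnect")); // true
        System.out.println(qualifies("connectReadOnly();\ndisconnect();\n", "connectReadOnly", "disconnect")); // false
    }
}
```

Because the whole pattern is zero-width, find() only reports whether both conditions hold at the start of the text; no characters are actually consumed.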

I have an example that shows how to find the word connectReadOnly in the text while requiring that disconnect is not found. Because the text contains connectReadOnly, the first piece succeeds and the engine moves on to the second piece, which checks that disconnect is absent. Because disconnect is in the text, the match fails on the entire string, which is the behavior you need for your entire file. If you play around with the second piece, the negation part (?!.*disconnect.*), you can set it to whatever regex you need. In my example I don't want to find disconnect anywhere in my code :) You can easily replace that with your own word, or even with a more complex regex that must not be found.

The key is to use multi-line mode, which is set with the leading (?m), together with the start/end-of-string anchors. ^ and $ match the start/end of a line, whereas \A and \z match the start/end of the whole string, thus extending the match over the entire file.

EDIT: For the connectReadOnly and disconnect question use: (?m)\A(?=.*connectReadOnly)(?!.*disconnect.*).*\z. The updated example can be found here.
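Checked against java.util.regex (assumed to behave like the IDE's engine), (?m) only changes what ^ and $ match, and \A and \z refer to the whole input in any mode. What lets . cross line breaks is (?s), i.e. DOTALL. The sketch below compares the EDIT's regex with a copy that uses (?s) instead of (?m); ModeFlagDemo and its members are hypothetical names.

```java
import java.util.regex.Pattern;

class ModeFlagDemo {
    // The regex from the EDIT, exactly as written.
    static final String MULTILINE_VERSION =
            "(?m)\\A(?=.*connectReadOnly)(?!.*disconnect.*).*\\z";
    // Same regex with (?s), so . also matches line terminators.
    static final String DOTALL_VERSION =
            "(?s)\\A(?=.*connectReadOnly)(?!.*disconnect.*).*\\z";

    static boolean matches(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String good = "class A {\n  void f() { connectReadOnly(); }\n}\n";
        String bad = "class B {\n  void f() { connectReadOnly(); disconnect(); }\n}\n";
        System.out.println(matches(MULTILINE_VERSION, good)); // false: .* stops at the first line break
        System.out.println(matches(DOTALL_VERSION, good));    // true
        System.out.println(matches(DOTALL_VERSION, bad));     // false
    }
}
```

With (?m), the first lookahead can only see the first line of the file, so the pattern only behaves as intended on single-line input. This is consistent with the follow-up comment below reporting that it still did not work.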

Sleepwalk answered 4/3, 2013 at 20:06
This seems close, but it still returns a number of false positives (e.g. files that contain both terms). Here is the expression I tried, which returns files that contain both terms: (?m)^(?=.*connectReadOnly)((?!disconnect).)*$ My goal is to find the files that have 'connectReadOnly' on any line but are missing the term 'disconnect'. - Isoagglutinin
@SAL Changes have been made to the answer; try them out... it should work now :) - Sleepwalk
That isn't working either. I'm wondering if perhaps the regex parser in Eclipse is different from the one you're using? What I have found works, after piecing together tips from various sources, is: ^(?!.[\s\S]*disconnect).[\s\S]*connect.*$ - Isoagglutinin

You could use something like:

(?<!.*bar)foo(?!.*bar)

This matches if "foo" is found but "bar" is not.

Note: you must configure your search engine so that . also matches line breaks (e.g. Notepad++ has an option called ". matches newline"), because . usually matches any character except a line break.
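A minimal Java sketch of the lookahead half, foo(?!.*bar), with and without Pattern.DOTALL, which plays the role of Notepad++'s ". matches newline" option (an analogy, not a claim about how Notepad++ is implemented). The lookbehind half is left out, so this sketch does not reject a "bar" that appears before "foo". LookaheadDotallDemo and found are hypothetical names.

```java
import java.util.regex.Pattern;

class LookaheadDotallDemo {
    // Lookahead half of the answer's regex: "foo" not followed by "bar".
    static final String LOOKAHEAD_ONLY = "foo(?!.*bar)";

    static boolean found(String text, int flags) {
        return Pattern.compile(LOOKAHEAD_ONLY, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "foo();\nbar();\n";
        // Default flags: . stops at the line break, so "bar" on the next line is not seen.
        System.out.println(found(text, 0));             // true
        // DOTALL: the lookahead scans past line breaks and sees "bar".
        System.out.println(found(text, Pattern.DOTALL)); // false
    }
}
```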

Soemba answered 4/3, 2013 at 20:14
