How do I find the text that matches a pattern?

NOTE: This is not a duplicate of any existing question. It is intended to show why such an extremely common and seemingly simple question is unanswerable, and to provide guidance on how people posting such questions can modify them to make them answerable, so that we don't have to keep providing the same guidance in comments almost every day and can just refer to this instead.

Given the following input file:

foo
o.b
bar

I need to output all lines that match the pattern o.b so my expected output is:

o.b

and I have tried awk '"o.b"' file but that output all lines (this part just added to avoid complaints that no attempted solution was posted in the question).

Abatement answered 7/1, 2021 at 23:20 Comment(0)

While on the surface this seems to be a simple question with an obvious answer, it actually is not, because of two factors:

  1. The word pattern is ambiguous - we don't know if the OP wants to do a regexp match or a string match, and
  2. The word match is ambiguous - we don't know if the OP wants to do a full match on each line (consider line and record synonymous for simplicity of this answer) or a full match on specific substrings (e.g. "words" or fields) on a line or a partial match on part of each line or something else.

Any of these would produce the expected output from the posted sample input:

  1. awk '/o.b/' file
  2. awk '/^o.b$/' file
  3. awk 'index($0,"o.b")' file
  4. awk '$0 == "o.b"' file

but we don't know which is correct, if any; all we know is that they produce the expected output from the specific sample input in the question.

Consider how each would behave if the OP's real data contains additional strings like this rather than just the minimal example shown in the question:

$ cat file
foo
foo.bar
foobar
o.b
orb
bar

Here are 4 possible answers. All of them produce the expected output given the sample input from the question, but they produce very different output given just slightly different input, and we have no way of knowing from the question as asked which output would be correct for the OP's needs:

  1. Partial regexp match:
$ awk '/o.b/' file
foo.bar
foobar
o.b
orb
  2. Full-line regexp match:
$ awk '/^o.b$/' file
o.b
orb
  3. Partial string match:
$ awk 'index($0,"o.b")' file
foo.bar
o.b
  4. Full-line string match:
$ awk '$0 == "o.b"' file
o.b

There are various other possibilities that might also be the correct answer when you consider full-word, full-field, and other types of matching against specific substrings on each line.
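
To make a few of those other possibilities concrete, here is a minimal sketch of full-field and full-word matching. The multi-field input is hypothetical (it is not from the question), fields are assumed to be whitespace-separated, and a "word" is assumed to be a run of letters, digits, and underscores; a real question would need to state those definitions too.

```shell
# Hypothetical multi-field input, not taken from the question.
input=$(mktemp)
printf '%s\n' 'x o.b y' 'x orb y' 'x foo.bar y' 'x (o.b) y' > "$input"

# Full-field string match: some whitespace-separated field is exactly "o.b".
awk '{ for (i = 1; i <= NF; i++) if ($i == "o.b") { print; next } }' "$input"
# prints: x o.b y

# Full-field regexp match: some field matches the regexp o.b in its entirety.
awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^o.b$/) { print; next } }' "$input"
# prints: x o.b y
#         x orb y

# Full-word regexp match, assuming word characters are letters, digits and "_".
awk '/(^|[^A-Za-z0-9_])o.b([^A-Za-z0-9_]|$)/' "$input"
# prints: x o.b y
#         x orb y
#         x (o.b) y
```

Again, all three agree on the line `x o.b y` and disagree on the rest, so the sample input alone cannot tell you which one the OP needs.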

So whenever you ask a question about matching some text against other text:

  1. Never use the word pattern but instead use string or regexp, whichever it is you mean, and
  2. Always state whether you want the match to be on a full line or part of a line or full substring (e.g. word or field) or part of a substring of a line.
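
As a rough illustration of how those two statements determine the command, the sketch below passes the same text `o.b` to awk in a variable and treats it as a string or a regexp, against a full line or part of a line. The variable names `str` and `re` are just illustrative choices, and note that `-v` interprets backslash escapes in the value, so text containing backslashes would need different handling.

```shell
# Extended sample input from above, written to a temporary file.
input=$(mktemp)
printf '%s\n' foo foo.bar foobar o.b orb bar > "$input"

# "string" + "full line": literal comparison against the whole line.
awk -v str='o.b' '$0 == str' "$input"
# prints: o.b

# "string" + "part of a line": index() looks for a literal substring.
awk -v str='o.b' 'index($0, str)' "$input"
# prints: foo.bar
#         o.b

# "regexp" + "part of a line": "." matches any single character.
awk -v re='o.b' '$0 ~ re' "$input"
# prints: foo.bar
#         foobar
#         o.b
#         orb

# "regexp" + "full line": anchor the dynamic regexp at both ends.
awk -v re='o.b' '$0 ~ ("^(" re ")$")' "$input"
# prints: o.b
#         orb
```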

Otherwise you may end up with a solution to a problem that you don't have, which could be inefficient and/or simply wrong. Even if it produces the expected output for the specific input you run it against now, it may well come back to bite you when run against some other input later.

Also see https://unix.stackexchange.com/a/631532/133219 for more examples of this issue.

Abatement answered 7/1, 2021 at 23:20 Comment(1)
Ed, I bookmarked this one. Nice reference; it provides clarity and attention to minute details. (Yetac)
