How to use word boundaries in awk without using match() function?

Asked 13/3, 2012 at 0:44 Answered 19/1, 2017 at 21:2

I want to add word boundaries to this awk command:

awk '{$0=tolower($0)};/wordA/&&/wordB/ { print FILENAME ":" $0; }' myfile.txt

I tried putting \y on either side of wordA and wordB, but it didn't work in my tests. This is the pattern I used:
/\ywordA\y/&&/\ywordB\y/

Thanks all!

(ps: I'm new to awk so I was trying to avoid the match() function.)

Neela answered 13/3, 2012 at 0:44 Comment(4)

Your curly-brackets don't seem to be balanced properly; you have more }s than {s. – Zanezaneski 13/3, 2012 at 0:52

I only know one version of awk that understands \b word boundaries: the one you get when you run it through a2p. :) – Monocarpic 13/3, 2012 at 0:52

I also tried \b and \<wordA\>, but neither works (Mac OS X). – Neela 13/3, 2012 at 0:59

@Neela You don’t understand. a2p someawkcode | perl is the awk-to-perl translator. That way you can get real perl regexes. – Monocarpic 13/3, 2012 at 2:18

You want to use gawk instead of awk:

gawk '{$0=tolower($0)};/\ywordA\y/&&/\ywordB\y/ { print FILENAME ":" $0; }' myfile.txt

will do what you want, if your system has gawk (e.g. on Mac OS X). \y is a GNU extension to awk.
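Since \y is a GNU extension, it can help to check which awk on your machine actually honors it before relying on it. The following is only a sketch, assuming a POSIX shell; supports_y is a hypothetical helper name, and the probe assumes that an awk without the extension treats \y as something other than a word boundary (typically a literal "y"), so the test string fails to match there.

```shell
# supports_y is a hypothetical helper: it succeeds only if the awk
# named in $1 matches "a" as a whole word using \y on both sides.
supports_y() {
  "$1" 'BEGIN { exit !("a b" ~ /\ya\y/) }' 2>/dev/null
}

# Report each candidate that is installed and passes the probe.
for cmd in gawk awk; do
  if command -v "$cmd" >/dev/null 2>&1 && supports_y "$cmd"; then
    printf '%s understands \\y\n' "$cmd"
  fi
done
```

If nothing is printed, neither command treats \y as a word boundary and one of the workarounds in the other answers is needed.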

Schenk answered 13/3, 2012 at 0:59 Comment(0)

GNU awk also supports the \< and \> conventions for word boundaries.
On a Mac, /usr/bin/awk version 20070501 does not support [[:<:]] or [[:>:]].
If you're stuck with a recalcitrant awk, then since awk normally splits each line into fields anyway, it might make sense to use:

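# Return the index of the first field that matches regex s in full, or 0 if none does (i is a local variable).
function word(s,    i) { for (i = 1; i <= NF; i++) if ($i ~ ("^(" s ")$")) return i; return 0 }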

So, for example, instead of writing

/\<[abc]\>/ { print "matched"; }

you could just as easily write:

word("[abc]") { print "matched"; }
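To make the field-based approach concrete, here is a minimal runnable sketch. The shell wrapper single_letter_lines and the sample input are made up for illustration. Note that this compares whole whitespace-separated fields, so a token with punctuation attached, such as "a,", does not count as a match, which differs from a true \< \> boundary.

```shell
# single_letter_lines is a hypothetical wrapper around the word() idea.
single_letter_lines() {
  awk '
    # Return the index of the first field that matches regex s in full,
    # or 0 if no field does. i is local (declared as an extra parameter).
    function word(s,    i) {
      for (i = 1; i <= NF; i++)
        if ($i ~ ("^(" s ")$"))
          return i
      return 0
    }
    # A nonzero return value is true, so this acts as a pattern.
    word("[abc]") { print "matched: " $0 }
  '
}

printf '%s\n' 'a cat' 'cab' 'x b y' | single_letter_lines
```

Only the first and third input lines contain a field that is exactly a, b, or c, so "cab" is skipped.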

Sodomy answered 19/1, 2017 at 21:2 Comment(0)

This might work for you on Mac OS X:

awk '{$0=tolower($0)};/[[:<:]]wordA[[:>:]]/&&/[[:<:]]wordB[[:>:]]/ { print FILENAME ":" $0; }' myfile.txt

But since it won't work on Linux, you're best off installing GNU awk.

Jonniejonny answered 13/3, 2012 at 1:55 Comment(1)

As of MacOS Mojave, this doesn't work for me. (^|[^A-Za-z0-9_]) works for a starting word boundary and ([^A-Za-z0-9_]|$) for an ending one (maybe exclude the underscore or perform other tweaks depending on your use case). I don't think this is portable between Awks, either, unfortunately. – Dupondius 1/8, 2022 at 10:0
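For reference, the two bracket expressions from the comment above can be combined into a complete command along these lines. This is a sketch rather than a tested recipe for every awk; has_both is a hypothetical helper name, and the words are passed as regular expressions in lowercase because each line is lowercased before matching. It uses only standard ERE syntax, so it should not depend on GNU or BSD extensions.

```shell
# has_both is a hypothetical helper: print the lines from stdin that
# contain both words (case-insensitively) as whole words.
has_both() {
  awk -v a="$1" -v b="$2" '
    BEGIN {
      # Left edge: start of line or a non-word character.
      L = "(^|[^A-Za-z0-9_])"
      # Right edge: a non-word character or end of line.
      R = "([^A-Za-z0-9_]|$)"
    }
    {
      line = tolower($0)            # match on a lowercased copy
      if (line ~ (L a R) && line ~ (L b R))
        print $0                    # print the original line
    }
  '
}

printf '%s\n' 'WordA meets wordB here' 'wordAB and wordB' 'wordb, then worda.' |
  has_both worda wordb
```

The second sample line is rejected because "wordAB" continues past "worda" with a word character, while the third is accepted because punctuation counts as a boundary.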
