Extract substring from a field with single AWK in AIX

I have a file, file, with content like:

stringa    8.0.1.2     stringx
stringb    12.01.0.0    stringx

I have to get a substring from field 2 (the first two values with the dot). I am currently doing cat file | awk '{print $2}' | awk -F. '{print $1"."$2}' and am getting the expected output:

8.0
12.01

How can I do this with a single AWK?

I have tried with match(), but I am not seeing an option for a back reference.

Aforethought answered 18/11, 2021 at 14:38 Comment(1)
Single AWK what? Single AWK line? Single AWK file? Single AWK expression? Single AWK regular expression? A one-liner? - Require
D
4

You can do something like this.

awk '{ split($2,str,"."); print str[1]"."str[2] }' file
8.0
12.01

Also, keep in mind that your cat is not needed. Simply give the file directly to awk.
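
If some lines might carry a second field with no dot in it, the return value of split() (the number of pieces) can guard against printing a stray trailing dot. A minimal sketch, assuming whitespace-separated input like the sample; first_two is a hypothetical helper name, not something from the question:

```shell
# first_two: hypothetical helper; reads the named files or stdin.
first_two() {
    awk '{
        # split() returns how many pieces it produced.
        n = split($2, parts, ".")
        if (n >= 2)
            print parts[1] "." parts[2]
        else
            print $2    # fewer than two pieces: print the field unchanged
    }' "$@"
}

# Usage with the sample data from the question:
printf '%s\n' 'stringa    8.0.1.2     stringx' \
              'stringb    12.01.0.0    stringx' | first_two
```

Only split(), print and if are used, all of which are in POSIX awk, so this should not depend on gawk.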

Depilatory answered 18/11, 2021 at 14:45 Comment(0)
F
4

With GNU grep, you could try the following command.

grep -oP '^\S+\s+\K[[:digit:]]+\.[[:digit:]]+' Input_file

Explanation: This uses GNU grep. The -o option prints only the matched part of each line, and -P enables PCRE. The pattern first matches the leading run of non-space characters followed by one or more whitespace characters, then \K discards everything matched so far. After that it matches one or more digits, a dot, and one or more digits. If a line matches, only that last part is printed.
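
Where GNU grep is not installed (which may well be the case on AIX; that is an assumption to check on your system), roughly the same match can be sketched with POSIX sed, using a capture group in place of \K. second_field_prefix is a hypothetical helper name:

```shell
# second_field_prefix: hypothetical helper; reads the named files or stdin.
second_field_prefix() {
    # Skip the first word and the whitespace after it, capture
    # "digits.digits", drop the rest of the line, and print only
    # the lines where the substitution succeeded.
    sed -n 's/^[^[:space:]][^[:space:]]*[[:space:]][[:space:]]*\([0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p' "$@"
}

printf '%s\n' 'stringa    8.0.1.2     stringx' \
              'stringb    12.01.0.0    stringx' | second_field_prefix
```

Lines that do not match the pattern are skipped, as they are with grep -o.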

Frankfurter answered 18/11, 2021 at 16:29 Comment(5)
grep -oP '^\S+\s+\K([[:digit:]]+\.){3}[[:digit:]]+' file 8.0.1.2 12.01.0.0 - Aforethought
@vijesh, it's been edited; please see my latest solution. - Frankfurter
It prints the whole of field 2. - Aforethought
@vijesh, yes. It's been edited; please see my latest/updated solution. - Frankfurter
grep -oP '^\S+\s+\K[[:digit:]]+\.[[:digit:]]+' file Works! - Aforethought
S
2

I would use AWK's split function as follows (split is part of POSIX awk, so this is not specific to GNU AWK). Let the file.txt content be

stringa    8.0.1.2     stringx
stringb    12.01.0.0    stringx

Then

awk '{split($2,arr,".");print arr[1]"."arr[2]}' file.txt

Output:

8.0
12.01

Explanation: split the second field at . and put the pieces into the array arr, then print the first two pieces joined by a dot.

(Tested in Gawk 4.2.1)

Stillmann answered 18/11, 2021 at 14:45 Comment(0)
D
2

You could match digits, a dot, and digits at the start of the second column, and print the matched part if there is a match:

awk 'match($2, /^[[:digit:]]+\.[[:digit:]]+/) {
    print substr($2, RSTART, RLENGTH)
}
' file

Output

8.0
12.01
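
To see what match() leaves behind, here is a small sketch that prints RSTART and RLENGTH next to the extracted text. It uses [0-9] instead of [[:digit:]] only so that it also runs on awk builds without POSIX character classes; show_match is a hypothetical helper name:

```shell
# show_match: hypothetical helper; reads the named files or stdin.
show_match() {
    awk 'match($2, /^[0-9]+\.[0-9]+/) {
        # RSTART is the 1-based position where the match begins,
        # RLENGTH is the length of the matched text.
        printf "%s RSTART=%d RLENGTH=%d -> %s\n", $2, RSTART, RLENGTH, substr($2, RSTART, RLENGTH)
    }' "$@"
}

printf '%s\n' 'stringa    8.0.1.2     stringx' \
              'stringb    12.01.0.0    stringx' | show_match
# 8.0.1.2 RSTART=1 RLENGTH=3 -> 8.0
# 12.01.0.0 RSTART=1 RLENGTH=5 -> 12.01
```
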
Deandreadeane answered 18/11, 2021 at 15:17 Comment(0)
B
2

Also with GNU awk and gensub():

awk '{print gensub(/([[:digit:]]+[.][[:digit:]]+)(.*)/,"\\1","g",$2)}' file
8.0
12.01
  • gensub() provides the ability to specify components of a regexp in the replacement text using parentheses in the regexp to mark the components and then specifying \\n in the replacement text, where n is a digit from 1 to 9.
Blotch answered 18/11, 2021 at 16:55 Comment(0)
A
0

You should perhaps not use AWK at all (or any other external program, for that matter), but rely on the field-splitting capabilities of the shell and some variable expansion. For instance:

 # printf "%s\n%s\n" "stringa    8.0.1.2     stringx" \
                     "stringb    12.01.0.0    stringx" |\
   while read first second third junk ; do
        printf "=%s= =%s= =%s=\n" "$first" "$second" "$third"
   done
   =stringa= =8.0.1.2= =stringx=
   =stringb= =12.01.0.0= =stringx=

As you can see, the value is captured in the variable "$second" already and you just need to further isolate the parts you want to see—the first and second part separated by a dot. You can do that either with parameter expansion:

 # variable="8.0.1.2"
 # echo ${variable%.*.*}
   8.0

Or like this:

 # variable="12.01.0.0"
 # echo ${variable%.${variable#*.*.}}
   12.01
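
The suffix-stripping idea can be wrapped in a small function. This is only a sketch: major_minor is a hypothetical name, and it assumes the value contains no shell glob characters. Note the literal dot before the inner expansion, so that the separator is removed together with the tail:

```shell
# major_minor: hypothetical helper; prints the first two dot-separated parts.
major_minor() {
    # Everything after the second dot (the whole value if there is none).
    rest=${1#*.*.}
    # Strip ".<rest>" from the end; nothing is stripped if it does not match.
    printf '%s\n' "${1%.$rest}"
}

major_minor 8.0.1.2      # 8.0
major_minor 12.01.0.0    # 12.01
major_minor 1.2.3        # 1.2
```

Unlike ${variable%.*.*}, this does not assume exactly four components.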

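Or you can use a further read statement to separate the parts and then put them back together. This relies on ksh (the default shell on AIX) running the last command of a pipeline in the current shell; in bash, by default, read runs in a subshell there and the variables are lost: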

 # variable="12.01.0.0"
 # echo ${variable} | IFS=. read parta partb junk
 # echo ${parta}.${partb}
   12.01

So, putting it all together:

 # printf "%s\n%s\n" "stringa    8.0.1.2     stringx" \
                     "stringb    12.01.0.0    stringx" |\
   while read first second third junk ; do
        printf "%s\n" "$second" | IFS=. read parta partb junk
        printf "%s.%s\n" "$parta" "$partb"
   done
   8.0
   12.01
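
For shells where read at the end of a pipeline runs in a subshell, a here-document avoids the inner pipeline altogether. A minimal sketch using only POSIX shell features; extract_versions is a hypothetical helper name:

```shell
# extract_versions: hypothetical helper; reads lines on stdin.
extract_versions() {
    while read -r first second rest; do
        # The here-document feeds read directly, so parta and partb
        # are set in the shell that runs the loop body.
        IFS=. read -r parta partb junk <<EOF
$second
EOF
        printf '%s.%s\n' "$parta" "$partb"
    done
}

printf '%s\n' 'stringa    8.0.1.2     stringx' \
              'stringb    12.01.0.0    stringx' | extract_versions
```

The IFS=. assignment applies only to that one read, so the outer loop keeps splitting on whitespace.
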
Ashlynashman answered 22/12, 2021 at 21:24 Comment(0)
