The premise of the awk language is that it should only provide constructs for things that aren't easy to do with its other constructs. That keeps the language concise and avoids the bloat that some other tools/languages suffer from. For example, some people like that perl has many unique language constructs to do anything you could possibly want to do, while others express their opposing view of the language in cartoons like https://www.zoitz.com/comics/perl_small.png.
Skipping leading fields is just one of the many things that it'd be nice to have a function to do, but it's so easy to code whatever you actually need to skip a couple of fields for any specific input that a function for it would just clutter up the language. And if we had a function for THIS, there are 100s of other functions that should also be created to do all of the other things it'd be nice to have a function to do.
Using GNU awk for the \s/\S shorthand:
$ awk 'sub(/^\s*(\S+\s+){2}/,"")' file
data for row1
data for row2
data for row3
data for row4
and the same with any POSIX awk:
$ awk 'sub(/^[[:space:]]*([^[:space:]]+[[:space:]]+){2}/,"")' file
data for row1
data for row2
data for row3
data for row4
Note that the awk output from above would retain any trailing white space, unlike a shell read loop.
Both of those rely on the FS being the default blank character but are easily modified for any other FS that can be negated in a bracket expression (or opposite character class).
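As a sketch of that kind of modification, assume a hypothetical comma-separated input: the negated bracket expression becomes [^,] and each separator is exactly one comma. The function name skip2_csv and the sample data below are made up for illustration, and the regexp is written out twice rather than with {2} only so that it also runs on awks without interval-expression support:

```shell
# skip2_csv: hypothetical helper that drops the first 2 comma-separated fields.
# [^,]* is used instead of [^,]+ because with FS="," a field may be empty.
skip2_csv() {
    awk 'sub(/^[^,]*,[^,]*,/,"")'
}

# Sample data is invented; the second line has an empty 2nd field
# and a trailing space, which is retained in the output.
printf '%s\n' 'id1,1.2,data for row1' 'id2,,data for row2 ' | skip2_csv
```

As with the commands above, a line is only printed when sub() succeeds, so input lines with fewer than 3 fields produce no output.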
Note that the entire approach relies on being able to negate the FS in a bracket expression, so it wouldn't work if the FS was some arbitrary regexp or even a multi-char string, but then neither would the shell read loop whose behavior you're asking to duplicate.
If you do happen to have an FS you can't just negate in a bracket expression, e.g. if your fields are separated by 3 digits or 2 punctuation characters so you have something like:
$ echo 'abc345def;%ghi+klm;%nop345qrs' |
awk -v FS='[[:digit:]]{3}|[[:punct:]]{2}' '{for (i=1; i<=NF; i++) print i, $i}'
1 abc
2 def
3 ghi+klm
4 nop
5 qrs
then here's a more general approach using GNU awk for the 4th arg to split():
$ echo 'abc345def;%ghi+klm;%nop345qrs' |
awk -v FS='[[:digit:]]{3}|[[:punct:]]{2}' '{
    split($0,f,FS,s)
    print substr( $0, length(s[0] f[1] s[1] f[2] s[2]) + 1 )
}'
ghi+klm;%nop345qrs
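Where the 4-argument split() isn't available, a rough equivalent can be sketched with match(), which is in POSIX awk: strip one field plus its separator per iteration. This is a different technique from the split() approach, not a drop-in replacement. In particular it treats the separator purely as a regexp, so it doesn't reproduce the special handling of the default FS. The function name skip_fields and its two parameters are hypothetical:

```shell
# skip_fields N REGEXP: hypothetical helper that drops the first N fields,
# where REGEXP describes the separator. Lines containing fewer than N
# separators are not printed.
skip_fields() {
    awk -v n="$1" -v sep="$2" '{
        rest = $0
        for (i = 1; i <= n; i++) {
            if (!match(rest, sep)) next           # too few fields: skip the line
            rest = substr(rest, RSTART + RLENGTH) # drop field i and its separator
        }
        print rest
    }'
}

# Prints ghi+klm;%nop345qrs, given an awk that supports interval expressions.
echo 'abc345def;%ghi+klm;%nop345qrs' |
    skip_fields 2 '[[:digit:]]{3}|[[:punct:]]{2}'
```

Because -v assignments process backslash escapes, a separator regexp containing backslashes would need them doubled when passed this way.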
$1 = ""; $2 = "" and then leverage the reconstituted $0. (Not building this into an answer because that's obviously not what you want, but it may work for someone else). – Professorate
while read -r _ _ data; do printf '<%s>\n' "$data"; done <<< 'a b c ', i.e. with spaces after the c, and note that it outputs just <c>, not <c > with the spaces after c that you want to keep. – Bluestone
data appears in the input on rather than wanting to print from the 3rd field on. Those are 2 very different problems that each can benefit from quite different solutions. If you change one of the 2nd field values, e.g. 1.2 to data, and one of the 3rd field values from data to foodatabar, that should make what you're asking about even clearer (though I thought it was already clear that you wanted "a method in awk for splitting only the first N columns"). – Bluestone