Print column contents by column name
Asked Answered
A

8

10

I want to input a string name (i.e. "COL2") to an awk or cut command and print the column that matches that column header string.

the datafile looks like this:

COL1 COL2 COL3 COL4 COL5 COL6
a a b d c f
a d g h e f
c v a s g a

If I pass in COL3, I want it to print the third column, etc. I'm thinking awk might be the easiest thing to use, but cut may also work. I'm just not sure how to go about doing this.

Abednego answered 1/5, 2011 at 3:49 Comment(3)
okay, now that we've answered your question, let me ask one: why not use cut(1)?Guizot
I would love to use cut. The problem is that I don't know how to match the column name :-D. cut -f $COLUMN_NAME is looking for a number, not a string to match...Abednego
And what happens if there are duplicate "column" names?Breadthways
S
8

Awk 1 liner for above problem (if you are interested):

awk -v col=COL2 'NR==1{for(i=1;i<=NF;i++){if($i==col){c=i;break}} print $c} NR>1{print $c}' file.txt

awk -v col=COL3 'NR==1{for(i=1;i<=NF;i++){if($i==col){c=i;break}} print $c} NR>1{print $c}' file.txt

Just pass your column name COL1, COL2, COL3 etc with -vcol= flag.

Setaceous answered 1/5, 2011 at 4:46 Comment(6)
I'm very interested. Thanks for the one-liner. However it says "-v is an invalid option". Not quite sure what to make of that. EDIT: it worked when I put a space between -v and col. ABSOLUTELY AMAZING. Thanks!Abednego
any multiple columns option ?Paugh
@lenzai: Try this code for multiple columns: awk -v col1=COL2 -v col2=COL6 'NR==1{for(i=1;i<=NF;i++){if($i==col1)c1=i; if ($i==col2)c2=i;}} NR>1{print $c1 " " $c2}' file.txtSetaceous
@Setaceous How should the latter command be modified such that also the column names are printed? So COL2 and COL6. Using awk -v col=COL2 'NR==1{for(i=1;i<=NF;i++){if($i==col){c=i;break}} print $c} NR>1{print $c}' file.txt does print the column name COL2.Clang
@Setaceous Indeed! That's what I said ;). I said it was not working for the multiple column case ;). Anyways, I believe the answer is awk -v col1=h -v col2=N_bf 'NR==1{for(i=1;i<=NF;i++){if($i==col1)c1=i; if ($i==col2)c2=i;} print $c1 " " $c2} NR>1{print $c1 " " $c2}' file.txt. Thanks!Clang
Then you can use: awk -v col1=COL2 -v col2=COL6 'NR==1{for(i=1;i<=NF;i++){if($i==col1)c1=i; if ($i==col2)c2=i;}} {print $c1 " " $c2}' fileSetaceous
S
2

a slight modification of anubhava post on top, for multiple columns

awk -vcol1="COL2" -vcol2="COL6" 'NR==1{for(i=1;i<=NF;i++){if($i==col1)c1=i; if ($i==col2)c2=i;}} NR>0{print $c1 " " $c2}' file.txt

when NR>1 does not print the column headers. This was modified to NR>0 which should print the columns with header names.

Shelli answered 19/7, 2012 at 15:36 Comment(0)
C
2

Note that the first solution prints out the whole file if the named column does not exist. To output a warning message if this occurs try

awk -v col=NoneSuch 'NR==1{for(i=1;i<=NF;i++){if($i==col){c=i;break}}   if (c > 0) {print $c}} else {print "Column " col "does not exist"} NR>1 && c > 0 {print $c}' file1.txt
Collywobbles answered 29/9, 2014 at 18:44 Comment(0)
G
1

It's a little unclear what you're trying to do.

If you want to get the single column from the data, use substr().

If you want to use an argument to choose the column use something like

BEGIN { mycol = ARGV[1] ; }
      { print $mycol }

Update

Hmmm, so you want generalized column names?

Okay, we'll assume that your data is organized like this:

 XXXXX YYYYY ZZZZZ

and you want to name the columns "harpo", "groucho" and "zeppo", and the column name is in ARGV[1]:

 BEGIN { cols["harpo"] = 1; cols["groucho"] = 2; cols["zeppo"] = 3; }
       { print $cols[ARGV[1]]   }

Second update

Yup, this trick will do it. Replace "harpo" etc with "COL1", "Col2", and so on.

Guizot answered 1/5, 2011 at 4:10 Comment(1)
Hardcoding the column names is less than optimal unless you know that they are fixed for all time.Choreograph
B
0

say column is the variable you declared that is the column you want from the shell. You pass it in using awk's -v option

column=3
awk -vcol="$column" '{print $col}' file
Brokenhearted answered 1/5, 2011 at 5:11 Comment(0)
R
0

I'm about a decade late, but here is another simple way to extract a column from .csv file using column name:

column_number=$(awk -v RS=',' '/column_name/{print NR; exit}' file.csv)
column_output=$(awk -F "\"*,\"*" -vcol="$column_number" '{print $col}' file.csv)
echo $column_output

You can change the -F flag in the second command from comma to tab for a .tsv file

Readability answered 18/5, 2022 at 7:41 Comment(0)
M
0
gawk 'NR==1 {for(i=1; i<=NF; i++) _[$i]=i} NR>1 {print $_["COL2"]}'

There's nothing special about the name "_" btw, you could easily have written:

gawk 'NR==1 {for(i=1; i<=NF; i++) n[$i]=i} NR>1 {print $n["COL2"]}'

But _ conflicts less with variable names people tend to use.

Mach answered 20/5 at 4:51 Comment(0)
C
-1

When you say "pass a string" to awk, I guess you want to give the string on the command line. One option is to use the -v feature for defining variables

$ gawk -f columnprinter.awk -v col=thecolumnnameyouwant

Alternately you can use the built-in variable ARGV as Charlie explains.

That only leaves the matter for forming an array to associate column names with column numbers. If the first line of the input contains the column names (a common convention) this becomes pretty easy.

Use

NR==1{...}

to process the first column to get the mapping

NR==1{
   colnum=-1;
   for(i=1; i<=NF; i++)
     if ($i == col) {
        colnum=i
        break
     }
}

which you can use like

{
  print $colnum
}
Choreograph answered 1/5, 2011 at 4:1 Comment(4)
That's not really true -- just use the ARGV and ARGC built-ins.Guizot
that doesn't seem to work. The tough part is getting awk to match the string I input...Abednego
@Charlie: Mea Culpa. I've used awk for years, but I never really "learned" it. I just read the man page when I need it. Somehow I missed that.Choreograph
Nick columnprinter.awk is the script you are writing. Or you could do it inline, but I was guessing that this was something that you were going to want to do repeatedly.Choreograph

© 2022 - 2024 — McMap. All rights reserved.