How to Skip 1st line of file - awk

I am a beginner with awk. I have created a file that contains employee information, with employees in different departments. I want to count how many employees are in each department, like this:

marketing        3
sales            3
production       4

For that I used the following command:

awk 'NR>1 {dept=$5} {count[dept]++} END {for (dept in count) {print dept count[dept]}}' emp

But the above command also counts and displays the first line, i.e. the header, like this:

marketing 3
sales 3
department 1
production 4

Here department is the column header, which is also counted even though I used NR>1. Also, how can I add space or increase the width of the columns? The output currently looks like the above, but I want the columns properly aligned. Is there a solution for this?

Here is my input file:

empid       empname     department
101         ayush    sales
102         nidhi    marketing
103         priyanka    production  
104         shyam    sales
105         ami    marketing
106         priti    marketing
107         atuul    sales
108         richa    production
109         laxman    production
110         ram     production
Philis answered 27/9, 2016 at 11:39 Comment(3)
Show your input file. Also try NR>1{count[$5]++}END{yadayda} - Pruritus
Oh thanks, it works. And how do I increase the width of the columns? - Philis
@Philis Show an example input file. Otherwise it is just guessing for others. - Tasteful
V
22

Use awk's printf with field widths for properly aligned, fixed-width columns:

awk 'NR>1 {count[$3]++} END {for (dept in count) {printf "%-15s%-15s\n", dept, count[dept]}}' file

You can give printf a field width, for example printf "%3s":

  • 3: the output is padded to at least 3 characters.

From man awk, you can see more details:

width   The field should be padded to this width. The field is normally padded
        with spaces. If the 0  flag  has  been  used, it is padded with zeroes.

.prec   A number that specifies the precision to use when printing.  For the %e,
        %E, %f and %F, formats, this specifies the number of digits you want
        printed to the right of the decimal point. For the %g, and %G formats,
        it specifies the maximum number of significant  digits. For the %d, %o,
        %i, %u, %x, and %X formats, it specifies the minimum number of digits to
        print. For %s, it specifies the maximum number of characters from the
        string that should be printed.
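
To make the width and precision rules above concrete, here is a minimal sketch that prints a few values inside square brackets so the padding is visible. The sample values ("abc" and 42) and the variable name out are only for illustration.

```shell
# Each value is wrapped in [] so the padding is visible.
out=$(awk 'BEGIN {
    printf "[%5s]\n",  "abc"   # width 5, right-aligned (the default)
    printf "[%-5s]\n", "abc"   # width 5, left-aligned because of the - flag
    printf "[%.2s]\n", "abc"   # precision 2: at most 2 characters of the string
    printf "[%05d]\n", 42      # width 5 with the 0 flag: zero-padded number
}')
printf '%s\n' "$out"
# prints:
# [  abc]
# [abc  ]
# [ab]
# [00042]
```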

Adjust the width as needed. For the input file you specified:

$ awk 'NR>1 {count[$3]++} END {for (dept in count) {printf "%-15s%-15s\n", dept, count[dept]}}' file
production     4
marketing      3
sales          3
Vociferation answered 27/9, 2016 at 12:1 Comment(3)
It will give a tab between the two fields. I want to increase the width. - Philis
Will you please tell me exactly where to place NR in any code/query? - Philis
You can google it a bit, some examples here, unix.com/shell-programming-and-scripting/… - Vociferation
C
20

You can use tail to skip a specific number of header lines. Here is an example:

command | awk  '{print $1}' | tail -n +2

Here awk prints the first column of the command's output, and tail -n +2 then starts its output at line 2, which drops the first line of awk's result.
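
Placing tail after awk only helps when awk prints one line per input line. For a per-department count, where the output is produced in END, one option is to strip the header before awk sees it. The following is a sketch under assumptions: emp_sample.txt is a hypothetical file with the same layout as the question's input, and the result is piped through sort only to give a stable order.

```shell
# Hypothetical sample file with the same layout as the question's input.
cat > emp_sample.txt <<'EOF'
empid       empname     department
101         ayush    sales
102         nidhi    marketing
103         priyanka    production
104         shyam    sales
EOF

# tail -n +2 starts output at line 2, so awk never sees the header.
counts=$(tail -n +2 emp_sample.txt |
    awk '{ count[$3]++ } END { for (dept in count) printf "%-15s%d\n", dept, count[dept] }' |
    sort)
printf '%s\n' "$counts"
# prints:
# marketing      1
# production     1
# sales          2
```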

Camaraderie answered 22/4, 2022 at 12:36 Comment(1)
Nice and simple +1 - Remuneration
W
1

I encourage you to make the title of the question more specific. The answer to the question in the title is to use NR>1, as you found.

$ awk 'NR>1 { print $0 }' emp
101         ayush    sales
102         nidhi    marketing
103         priyanka    production  
104         shyam    sales
105         ami    marketing
106         priti    marketing
107         atuul    sales
108         richa    production
109         laxman    production
110         ram     production

Next, I was not able to reproduce your output using the input and command you provided. It helps a lot to provide a reproducible example.

$ awk 'NR>1 {dept=$5} {count[dept]++} END {for (dept in count) {print dept count[dept]}}' emp
11
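
A likely explanation for the lone 11: the posted input has three whitespace-separated fields, so $5 is empty on every line and all 11 lines are counted under the same empty key. A quick way to check is to print NF, the number of fields, for each line. This sketch assumes a hypothetical shortened sample file, emp_sample.txt, in the same layout as emp.

```shell
# Hypothetical three-line sample in the same layout as emp.
cat > emp_sample.txt <<'EOF'
empid       empname     department
101         ayush    sales
102         nidhi    marketing
EOF

# NF is the number of fields on the current line; $5 is empty when NF < 5.
fields=$(awk '{ printf "line %d: NF=%d, $5=[%s]\n", NR, NF, $5 }' emp_sample.txt)
printf '%s\n' "$fields"
# prints:
# line 1: NF=3, $5=[]
# line 2: NF=3, $5=[]
# line 3: NF=3, $5=[]
```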

There are 3 things we need to do to this command to get the desired output.

  1. It refers to column 5, which does not exist. Instead, it should be column 3.
    $ awk 'NR>1 { dept=$3 } { count[dept]++ } END { for (dept in count) { print dept count[dept] } }' emp
    1
    production4
    sales3
    marketing3
    
  2. The next problem is that while the pattern NR>1 has been provided, it is only being applied to the first action. Each action is designated with curly braces ({ }). This can be resolved by combining the two actions, and separating them with a semicolon (;).
    $ awk 'NR>1 { dept=$3; count[dept]++ } END { for (dept in count) { print dept count[dept] } }' emp
    production4
    sales3
    marketing3
    
  3. The final part is to format the output in an appealing way. This can be done using the code inspired by @Inian's answer. The example below aligns the text to the left, separates the columns with a tab, and aligns the numbers to the right.
    $ awk 'NR>1 { dept=$3; count[dept]++ } END { for (dept in count) { printf "%-16s\t%4d\n", dept, count[dept] } }' emp
    production             4
    sales                  3
    marketing              3
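
Another common way to handle step 2 is to give the header its own rule, NR == 1 { next }, so that no later rule runs for that line. The sketch below uses that idiom. It assumes a hypothetical copy of the input named emp_sample.txt, and it pipes the result through sort because the order of for (dept in count) is unspecified in awk.

```shell
# Hypothetical copy of the question's input file.
cat > emp_sample.txt <<'EOF'
empid       empname     department
101         ayush    sales
102         nidhi    marketing
103         priyanka    production
104         shyam    sales
105         ami    marketing
106         priti    marketing
107         atuul    sales
108         richa    production
109         laxman    production
110         ram     production
EOF

report=$(awk '
    NR == 1 { next }     # skip the header; no later rule runs for it
    { count[$3]++ }      # runs for every remaining line
    END {
        for (dept in count)
            printf "%-16s%4d\n", dept, count[dept]
    }
' emp_sample.txt | sort)
printf '%s\n' "$report"
# prints:
# marketing          3
# production         4
# sales              3
```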
    
Wisecrack answered 16/5 at 19:35 Comment(0)
