Should I use cut or awk to extract fields and field substrings?

I have a file with pipe-separated fields. I want to print a subset of field 1 and all of field 2:

cat tmpfile.txt

# 10 chars.|variable length num|text
ABCDEFGHIJ|99|U|HOMEWORK
JIDVESDFXW|8|C|CHORES
DDFEXFEWEW|73|B|AFTER-HOURS

I'd like the output to look like this:

# 6 chars.|variable length num
ABCDEF|99
JIDVES|8
DDFEXF|73

I know how to get fields 1 & 2:

cat tmpfile.txt | awk -F'|' '{print $1"|"$2}'

And know how to get the first 6 characters of field 1:

cat tmpfile.txt | cut -c 1-6

I know this is fairly simple, but what I can't figure out is how to combine the awk and cut commands.

Any suggestions would be greatly appreciated.

Faucal asked 1/4, 2014 at 17:26

You could use awk. Use the substr() function to trim the first field:

awk -F'|' '{print substr($1,1,6),$2}' OFS='|' inputfile

For your input, it'd produce:

ABCDEF|99
JIDVES|8
DDFEXF|73
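A self-contained sketch that recreates the sample rows and runs the substr() command on them. It assumes a POSIX shell, awk, and the widely available mktemp utility; the header comment line is left out of the sample data here, which is an assumption about what the real file contains.

```shell
#!/bin/sh
f=$(mktemp)
trap 'rm -f "$f"' EXIT
# The question's data rows (header comment omitted by assumption).
printf '%s\n' 'ABCDEFGHIJ|99|U|HOMEWORK' 'JIDVESDFXW|8|C|CHORES' \
              'DDFEXFEWEW|73|B|AFTER-HOURS' > "$f"

# -F'|' splits input on pipes; the OFS='|' operand is assigned before the
# file is read, so the comma in print joins the two values with a pipe.
out=$(awk -F'|' '{print substr($1,1,6),$2}' OFS='|' "$f")
printf '%s\n' "$out"
```

Putting OFS='|' after the program (rather than -v) works because awk processes variable-assignment operands in order, just before reading the next file operand.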

Using sed, you could say:

sed -r 's/^(.{6})[^|]*([|][^|]*).*/\1\2/' inputfile

to produce the same output.
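Note that -r is the GNU sed spelling for extended regular expressions; BSD/macOS sed uses -E, which current GNU sed also accepts. A sketch of the same substitution with -E, assuming a sed that supports that flag:

```shell
#!/bin/sh
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'ABCDEFGHIJ|99|U|HOMEWORK' 'JIDVESDFXW|8|C|CHORES' \
              'DDFEXFEWEW|73|B|AFTER-HOURS' > "$f"

# \1 = first 6 characters; [^|]* swallows the rest of field 1;
# \2 = the pipe plus field 2; .* drops everything after field 2.
out=$(sed -E 's/^(.{6})[^|]*([|][^|]*).*/\1\2/' "$f")
printf '%s\n' "$out"
```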

Reiss answered 1/4, 2014 at 17:34
Stockholder: +1; slightly shorter: awk -F'|' '{print substr($1,1,6) FS $2}' inputfile
Ajani: If shortness is important: awk -F\| '{$0=substr($1,1,6)FS$2}1'
Faucal: Thank you! What does the "1" (not the $1) mean in this context?
Reiss: @Faucal You can consider it equivalent to print.
Ajani: @Faucal It's a very basic thing in awk: it's the same as 1 {print $0}, i.e. it just prints the current line. Or do you mean the 1 in substr? That means start at position 1 of the string.
Meggs: @Ajani Why even bother with the braces and the 1? awk -F\| '$0=substr($1,1,6)FS$2' should suffice. ;)
Ajani: @jaypal Clever :) The assigned value is always a non-empty string, so the pattern is always true and the line is printed.
Aid: Good one! I sometimes find it clearer to use BEGIN{FS=OFS="|"} when FS and OFS are the same.
Reiss: @Aid You're perhaps right that it's clearer. I find it cumbersome to add a BEGIN block and those curly braces that demand pressing the Shift key. It hurts, ouch!
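On the BEGIN{...} point: where FS is assigned matters. In a POSIX-conforming awk such as gawk, assigning FS inside a main action, e.g. awk '{FS="|"} ...', only changes how the next record is split, so the first line is still split on whitespace. A small demonstration using the question's header line (some older awks split lazily and behave differently for the first record, so treat that line's result as implementation-dependent):

```shell
#!/bin/sh
# The question's header line (contains spaces) followed by one data row.
input='# 10 chars.|variable length num|text
ABCDEFGHIJ|99|U|HOMEWORK'

# FS assigned in a main action: the first record was already split on
# whitespace, so its $1 is "#" in a POSIX awk. Record 2 is split on "|".
late=$(printf '%s\n' "$input" | awk '{FS="|"} {print $1}')

# FS assigned in BEGIN, before any input is read: every record is split on "|".
early=$(printf '%s\n' "$input" | awk 'BEGIN{FS="|"} {print $1}')

printf 'FS in main action:\n%s\nFS in BEGIN:\n%s\n' "$late" "$early"
```

Setting FS with -F'|' on the command line has the same effect as the BEGIN form.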

You could use cut and paste, but then the file has to be read twice, which can be costly if the file is very large:

paste -d '|' <(cut -c 1-6 tmpfile.txt ) <(cut -d '|' -f2 tmpfile.txt )
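The <(...) process substitution above needs bash, ksh or zsh; plain POSIX sh does not have it. A sketch of the same two-pass cut-and-paste approach under /bin/sh, assuming temporary files created with mktemp are acceptable:

```shell
#!/bin/sh
in=$(mktemp); a=$(mktemp); b=$(mktemp)
trap 'rm -f "$in" "$a" "$b"' EXIT
printf '%s\n' 'ABCDEFGHIJ|99|U|HOMEWORK' 'JIDVESDFXW|8|C|CHORES' \
              'DDFEXFEWEW|73|B|AFTER-HOURS' > "$in"

# Two passes over the input, one cut per output column.
cut -c 1-6 "$in" > "$a"        # first 6 characters of each line
cut -d '|' -f 2 "$in" > "$b"   # second pipe-delimited field

# paste joins line N of each file, separated by the -d delimiter.
out=$(paste -d '|' "$a" "$b")
printf '%s\n' "$out"
```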

Adamo answered 1/4, 2014 at 17:48

Just for another variation: awk -F\| -vOFS=\| '{print $1,$2}' t.in | cut -c 1-6,11-

Also, as tripleee points out, two cuts can do this too: cut -c 1-6,11- t.in | cut -d\| -f 1,2
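A runnable sketch of the two-cut chain on the sample rows. It relies on field 1 being exactly 10 characters wide, so that character 11 is always the first pipe; mktemp is assumed to be available.

```shell
#!/bin/sh
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'ABCDEFGHIJ|99|U|HOMEWORK' 'JIDVESDFXW|8|C|CHORES' \
              'DDFEXFEWEW|73|B|AFTER-HOURS' > "$f"

# First cut: keep characters 1-6 and everything from character 11
# (the first '|') onward. Second cut: keep the first two '|' fields.
out=$(cut -c 1-6,11- "$f" | cut -d '|' -f 1,2)
printf '%s\n' "$out"
```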

Pignus answered 1/4, 2014 at 17:41
Torpedo: Or, if you can guesstimate the maximal length of the second field, use two cuts: cut -c1-6,11-16 t.in | cut -d'|' -f1-2

I like a combination of cut and sed, but that's just a preference:

cut -f1-2 -d"|" tmpfile.txt|sed 's/\([A-Z]\{6\}\)[A-Z]\{4\}/\1/g'

Result:

# 10 chars.|variable length num
ABCDEF|99
JIDVES|8
DDFEXF|73

Edit: (Removed the useless cat) Thanks!
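The sed expression above only matches runs of uppercase letters. A sketch of a variant that keeps the first six characters of field 1 whatever they are, assuming only that field 1 contains no pipe characters; the lowercase row is hypothetical and added only to show the difference:

```shell
#!/bin/sh
f=$(mktemp)
trap 'rm -f "$f"' EXIT
# The question's data rows plus one hypothetical lowercase row.
printf '%s\n' 'ABCDEFGHIJ|99|U|HOMEWORK' 'JIDVESDFXW|8|C|CHORES' \
              'DDFEXFEWEW|73|B|AFTER-HOURS' 'abcdefghij|5|x|test' > "$f"

# cut keeps fields 1-2; sed keeps the first 6 non-pipe characters at the
# start of the line and drops the rest of field 1.
out=$(cut -d '|' -f 1-2 "$f" | sed 's/^\([^|]\{6\}\)[^|]*/\1/')
printf '%s\n' "$out"
```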

Lola answered 1/4, 2014 at 17:35
