Extracting and rearranging columns

Asked 26/8, 2022 at 12:2 Answered 26/8, 2022 at 18:14

I read lines from stdin, each of which contains fields. The field delimiter is a semicolon. There are no quoting characters in the input (i.e. the fields themselves can't contain semicolons or newline characters). The number of input fields is unknown, but it is at least 4.

The output is supposed to be a similar file, consisting of fields 2 through the last, but with fields 2 and 3 swapped.

I'm using zsh.

I came up with a solution, but find it clumsy. In particular, I could not think of anything specific to zsh which would help me here, so basically I reverted to awk. This is my approach:

awk -F ';' '{printf("%s", $3 ";" $2); for(i=4;i<=NF;i++) printf(";%s", $i); print "" }' <input_file >output_file

The first printf takes care of the two swapped fields, and then I use an explicit loop to write out the remaining fields. Is there a way in awk (or gawk) to print a range of fields in a single command? Or did I miss some incredibly clever feature in zsh which could make my life simpler?

UPDATE: Example input data

a;bb;c;D;e;fff
gg;h;ii;jj;kk;l;m;n

Should produce the output

c;bb;D;e;fff
ii;h;jj;kk;l;m;n

Intermixture answered 26/8, 2022 at 12:2 Comment(0)

Another awk variant:

awk 'BEGIN{FS=OFS=";"} {$1=$3; $3=""; sub(/;;/, ";")} 1' file

c;bb;D;e;fff
ii;h;jj;kk;l;m;n

Jsandye answered 26/8, 2022 at 18:14 Comment(0)

Using any awk in any shell on every Unix box:

$ awk 'BEGIN{FS=OFS=";"} {t=$3; $3=$2; $2=t; sub(/[^;]*;/,"")} 1' file
c;bb;D;e;fff
ii;h;jj;kk;l;m;n

Beaconsfield answered 26/8, 2022 at 12:27 Comment(0)

With GNU awk you could try the following code. It uses GNU awk's match function with the regex ^[^;]*;([^;]*;)([^;]*;)(.*)$ to capture the required values. The regex creates 3 capturing groups, whose values are stored in the array arr (the array argument to match is a GNU awk extension), and the program then prints them in the required order.

Here is the online demo for the regex used.

awk 'match($0,/^[^;]*;([^;]*;)([^;]*;)(.*)$/,arr){
  print arr[2] arr[1] arr[3]
}
' Input_file

Christ answered 26/8, 2022 at 12:55 Comment(0)

If perl is acceptable, it provides a join() function to join elements with a delimiter. In awk, though, you'd have to define one yourself (which isn't complex, just more lines of code).

perl -F';' -nlae '$t = $F[2]; $F[2] = $F[1]; $F[1] = $t; print join(";", @F[1..$#F])' file
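For comparison, here is a minimal sketch of the kind of user-defined awk helper mentioned above. The names rearrange and join_fields are made up for illustration and are not awk builtins. The sketch relies on the question's guarantee that every line has at least 4 fields.

```shell
# rearrange and join_fields are hypothetical names, not part of awk.
# join_fields(first, last, sep) concatenates fields first..last of the
# current record, separated by sep. Assumes first <= last (>= 4 fields).
rearrange() {
    awk -F ';' '
    function join_fields(first, last, sep,    i, out) {
        out = $first
        for (i = first + 1; i <= last; i++)
            out = out sep $i
        return out
    }
    { print $3 ";" $2 ";" join_fields(4, NF, ";") }
    ' "$@"
}

printf '%s\n' 'a;bb;c;D;e;fff' 'gg;h;ii;jj;kk;l;m;n' | rearrange
# prints:
# c;bb;D;e;fff
# ii;h;jj;kk;l;m;n
```

The extra parameters i and out after the wide gap are the usual awk idiom for declaring local variables.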

Bruckner answered 26/8, 2022 at 12:40 Comment(5)

You can slice it directly too: perl -F';' -lane 'print join ";", @F[2,1,3..$#F]' – Lamoureux 26/8, 2022 at 12:44

I also thought about Perl or Ruby, but ideally I wanted to do it with zsh/awk. BTW, if I remember my Perl days correctly, the field exchange in your solution could be written more simply as ($F[2], $F[1]) = ($F[1], $F[2]). – Intermixture 26/8, 2022 at 12:45

@Intermixture if you don't get a zsh answer here, I'd recommend asking over at unix.stackexchange.com – Lamoureux 26/8, 2022 at 12:47

@Lamoureux : The slicing idea is cute. I wonder if we couldn't use -p instead of -n, since we are outputting each line anyway. – Intermixture 26/8, 2022 at 12:47

@Intermixture we need join, so -n is needed here, unless you assign join to $_ (which will indeed help with golfing) – Lamoureux 26/8, 2022 at 12:49

With sed, perl, hck and rcut (my own script):

$ sed -E 's/^[^;]+;([^;]+);([^;]+)/\2;\1/' ip.txt
c;bb;D;e;fff
ii;h;jj;kk;l;m;n

# can also use: perl -F';' -lape '$_ = join ";", @F[2,1,3..$#F]' ip.txt
$ perl -F';' -lane 'print join ";", @F[2,1,3..$#F]' ip.txt
c;bb;D;e;fff
ii;h;jj;kk;l;m;n

# -d and -D specify the input and output separators
$ hck -d';' -D';' -f3,2,4- ip.txt
c;bb;D;e;fff
ii;h;jj;kk;l;m;n

# syntax similar to cut, but output field order can be different
$ rcut -d';' -f3,2,4- ip.txt
c;bb;D;e;fff
ii;h;jj;kk;l;m;n

Note that the sed version leaves input lines with fewer than 3 fields unchanged:

$ cat ip.txt
1;2;3
apple;fig
abc

$ sed -E 's/^[^;]+;([^;]+);([^;]+)/\2;\1/' ip.txt
3;2
apple;fig
abc

$ perl -F';' -lane 'print join ";", @F[2,1,3..$#F]' ip.txt
3;2
;fig
;
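If short lines should pass through unchanged in awk as well, one option is to guard the swap with a field-count check. The following is only a sketch under that assumption: the function name swap_guarded is made up, and the NF >= 3 threshold only roughly mirrors what the sed regex requires (the sed pattern additionally needs the first three fields to be non-empty).

```shell
# swap_guarded is a hypothetical name. Lines with fewer than 3 fields
# skip the swap block and are printed as-is by the trailing "1".
swap_guarded() {
    awk 'BEGIN { FS = OFS = ";" }
         NF >= 3 { t = $3; $3 = $2; $2 = t; sub(/[^;]*;/, "") }
         1' "$@"
}

printf '%s\n' '1;2;3' 'apple;fig' 'abc' | swap_guarded
# prints:
# 3;2
# apple;fig
# abc
```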

Lamoureux answered 26/8, 2022 at 12:53 Comment(2)

By rcut, do you mean this regexp-cut? – Intermixture 27/8, 2022 at 9:5

Yeah, that's the same link as mentioned in my answer... repo name is regexp-cut, and tool name is rcut – Lamoureux 27/8, 2022 at 11:27

With GNU awk and gensub, swapping the positions of the 2 capture groups:

awk '{print gensub(/^[^;]*;([^;]*);([^;]*)/, "\\2;\\1", 1)}' file

The pattern matches

^ Start of string
[^;]*; Negated character class, match optional chars other than ; and then match ;
([^;]*);([^;]*) 2 capture groups, both capturing chars other than ; and match ; in between

Output

c;bb;D;e;fff
ii;h;jj;kk;l;m;n

Hackman answered 26/8, 2022 at 15:42 Comment(0)

awk '{print $3, $0}' {,O}FS=\; < file | cut -d\; -f1,3,5-

This uses awk to prepend the third column, then pipes to cut to extract the desired columns.
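To see why cut is asked for fields 1,3,5-, it helps to look at the intermediate stream. This sketch uses -v assignments instead of the {,O}FS brace expansion so that it also runs under a plain POSIX sh; the function name prepend_third is made up for illustration.

```shell
# Stage 1: prepend a copy of field 3 to every line.
# prepend_third is a hypothetical name used only in this sketch.
prepend_third() {
    awk -v FS=';' -v OFS=';' '{ print $3, $0 }' "$@"
}

printf '%s\n' 'a;bb;c;D;e;fff' | prepend_third
# prints: c;a;bb;c;D;e;fff

# Stage 2: keep the copied field (now 1), the original field 2 (now 3),
# and everything from the original field 4 (now 5) onward.
printf '%s\n' 'a;bb;c;D;e;fff' | prepend_third | cut -d';' -f1,3,5-
# prints: c;bb;D;e;fff
```

Every original field shifts one position to the right after the prepend, which is where the indices 3 and 5 come from.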

Boatbill answered 26/8, 2022 at 15:35 Comment(0)

Here is one way to do it using only zsh:

rearrange() {
    local -a lines=(${(@f)$(</dev/stdin)})
    for line in $lines; do
        local -a flds=(${(s.;.)line})
        print $flds[3]';'$flds[2]';'${(j.;.)flds[4,-1]}
    done
}

The same idea in a single line. This may not be an improvement over your awk script:

for l in ${(@f)$(<&0)}; print ${${(A)i::=${(s.;.)l}}[3]}\;$i[2]\;${(j.;.)i:3}

Some of the pieces:

$(</dev/stdin) - read from stdin using pseudo-device.
$(<&0) - another way to read from stdin.
(f) - parameter expansion flag to split by newlines.
(@) - treat split as an array.
(s.;.) - split by semicolon.
$flds[3] - expands to the third array element.
$flds[4,-1] - fourth, fifth, etc. array elements.
${i:3} - ksh-style array slice (zero-based offset) for the fourth, fifth ... elements.
Mixing styles like this can be confusing, even if it is slightly shorter.
(j.;.) - join array by semicolon.
i::= - assign the result of the expansion to the variable i.
This lets us use the semicolon-split fields later.
(A)i::= - the (A) flag ensures i is an array.

Stokowski answered 26/8, 2022 at 15:37 Comment(1)

While this version does not look very simple at first glance, I understand that we can't make it much simpler in plain zsh, and I appreciate it very much for its educational value, in particular the use of the various flags. – Intermixture 27/8, 2022 at 9:11
