Transpose columns and rows using gawk

I am trying to transpose a really long file and I am concerned that it will not be transposed entirely.

My data looks something like this:

Thisisalongstring12345678   1   AB  abc 937 4.320194
Thisisalongstring12345678   1   AB  efg 549 0.767828
Thisisalongstring12345678   1   AB  hi  346 -4.903441
Thisisalongstring12345678   1   AB  jk  193 7.317946

I want my data to look like this:

Thisisalongstring12345678 Thisisalongstring12345678 Thisisalongstring12345678 Thisisalongstring12345678
1                         1                         1                         1
AB                        AB                        AB                        AB
abc                       efg                       hi                        jk
937                       549                       346                       193
4.320194                  0.767828                  -4.903441                 7.317946

Would the length of the first string be an issue? My real file is much longer than this, approximately 2000 lines. Also, is it possible to change the first string to Thisis234 and then transpose?

Wallah answered 4/4, 2012 at 0:8 Comment(1)
If you're willing to put up with lines of 20,000 * 25 characters (or so) per column (so 100 KiB or so per line), and the applications you work with are too, then the chances are that gawk will be fine with it too. Yes, you can trim the long names; devise the algorithm and apply it on output or during input. - Sphery
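
Renaming during input, as the comment suggests, could look roughly like the sketch below. It assumes the first field is simply overwritten with one fixed name; `transpose_renamed`, `newname`, and `sample.txt` are illustrative names that do not appear in the thread.

```shell
# Two sample lines in the same whitespace-separated layout as the question.
printf '%s\n' \
  'Thisisalongstring12345678 1 AB abc 937 4.320194' \
  'Thisisalongstring12345678 1 AB efg 549 0.767828' > sample.txt

# Hypothetical helper: rename field 1, then transpose the file given as $1.
transpose_renamed() {
  awk -v newname="Thisis234" '
  {
    $1 = newname                                  # overwrite the long name
    for (c = 1; c <= NF; c++) cell[c, NR] = $c    # cell[field, line]
    if (NF > max_nf) max_nf = NF                  # widest line so far
  }
  END {
    for (r = 1; r <= max_nf; r++) {               # one output row per field
      line = cell[r, 1]
      for (c = 2; c <= NR; c++) line = line " " cell[r, c]
      print line
    }
  }' "$1"
}

transpose_renamed sample.txt
# Thisis234 Thisis234
# 1 1
# AB AB
# abc efg
# 937 549
# 4.320194 0.767828
```

The output is separated by single spaces rather than padded into aligned columns; piping it through `column -t` is one way to align it, if that tool is available.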

I don't see why it would not be transposed entirely, unless you run out of memory. Try the script below and see whether you run into problems.

Input:

$ cat inf.txt 
a b c d
1 2 3 4
. , + -
A B C D

Awk program:

$ cat mkt.sh
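awk '
{
  # Store every field, keyed by (field number, line number).
  for(c = 1; c <= NF; c++) {
    a[c, NR] = $c
  }
  # Track the widest line seen so far.
  if(max_nf < NF) {
    max_nf = NF
  }
}
END {
  # These bounds match the a[field, line] layout only when the input is
  # square (NR equal to max_nf), as in inf.txt.
  for(r = 1; r <= NR; r++) {
    for(c = 1; c <= max_nf; c++) {
      printf("%s ", a[r, c])
    }
    print ""
  }
}
' inf.txt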

Run:

$ ./mkt.sh 
a 1 . A 
b 2 , B 
c 3 + C 
d 4 - D 
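
If the worry is that part of a long file might be silently dropped, one way to check is to compare shapes: the output should have as many lines as the input has fields, and as many fields per line as the input has lines. The sketch below assumes whitespace-separated files; `shape`, `check_in.txt`, and `check_out.txt` are illustrative names, and `check_out.txt` stands in for whatever the transpose script produced.

```shell
# Stand-in files: a 3-line, 4-field input and its transpose.
printf '%s\n' 'a b c d' '1 2 3 4' '. , + -' > check_in.txt
printf '%s\n' 'a 1 .' 'b 2 ,' 'c 3 +' 'd 4 -' > check_out.txt

# Hypothetical helper: print "<lines>x<widest field count>" for a file.
shape() {
  awk 'NF > w { w = NF } END { print NR "x" (w + 0) }' "$1"
}

in_shape=$(shape check_in.txt)     # 3x4
out_shape=$(shape check_out.txt)   # 4x3

# Swap the two numbers of the input shape: "3x4" becomes "4x3".
flipped="${in_shape#*x}x${in_shape%x*}"

if [ "$out_shape" = "$flipped" ]; then
  echo "complete: $in_shape -> $out_shape"
else
  echo "mismatch: $in_shape -> $out_shape (expected $flipped)"
fi
```

This only compares dimensions, so it catches dropped rows or columns but not cells that ended up in the wrong place.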

Credits:

Hope this helps.

Redhanded answered 4/4, 2012 at 0:32 Comment(2)
Similar to command line pivot - Durable
@Durable Agreed, it's a similar topic with a different approach. Good for the OP to have options! - Redhanded

This can be done with the BSD rs command:

http://www.unix.com/man-page/freebsd/1/rs/

Check out the -T option.

Layer answered 4/4, 2012 at 2:30 Comment(1)
This is brilliant; it is also available (stock) in OS X. rs has many features. I suggest reading the man page. - Lamina

I tried icyrock.com's answer, but found that I had to change:

for(r = 1; r <= NR; r++) {
  for(c = 1; c <= max_nf; c++) {

to

for(r = 1; r <= max_nf; r++) {
  for(c = 1; c <= NR; c++) {

to get the NR columns and max_nf rows. So icyrock's code becomes:

$ cat mkt.sh
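awk '
{
  # Store every field, keyed by (field number, line number).
  for(c = 1; c <= NF; c++) {
    a[c, NR] = $c
  }
  # Track the widest line seen so far.
  if(max_nf < NF) {
    max_nf = NF
  }
}
END {
  # One output row per input field, one output column per input line.
  for(r = 1; r <= max_nf; r++) {
    for(c = 1; c <= NR; c++) {
      printf("%s ", a[r, c])
    }
    print ""
  }
}
' inf.txt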

If you don't do that and use an asymmetrical input, like:

a b c d
1 2 3 4
. , + -

You get:

a 1 .
b 2 ,
c 3 +

i.e. still 3 rows and 4 columns (the last of which is blank), so the fourth input column (d 4 -) is lost.
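
For comparison, here is a sketch of the swapped loop bounds run on the same asymmetrical input. `transpose_fixed` and `asym.txt` are illustrative names, and the separator handling is changed slightly so that lines do not end in a trailing space.

```shell
# The 3-line, 4-field input from above.
printf '%s\n' 'a b c d' '1 2 3 4' '. , + -' > asym.txt

# Hypothetical helper: transpose the file given as $1.
transpose_fixed() {
  awk '
  {
    for (c = 1; c <= NF; c++) a[c, NR] = $c   # a[field, line]
    if (max_nf < NF) max_nf = NF
  }
  END {
    for (r = 1; r <= max_nf; r++)             # one output row per input field
      for (c = 1; c <= NR; c++)               # one output column per input line
        printf("%s%s", a[r, c], (c < NR ? " " : "\n"))
  }' "$1"
}

transpose_fixed asym.txt
# a 1 .
# b 2 ,
# c 3 +
# d 4 -
```

All four input columns now appear as output rows, including the `d 4 -` column that the original bounds dropped.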

Sunken answered 23/1, 2015 at 0:51 Comment(0)

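Regarding the code from @ScubaFishi and @icyrock:

The "if (max_nf < NF)" check seems unnecessary. I deleted it (with max_nf = NF left in place), and the code works just fine as long as every line has the same number of fields.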
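
Dropping the check is only safe when every line has the same number of fields. The sketch below compares both variants on a ragged input; `transpose_ragged`, `keep_max`, `ragged.txt`, and the `-` placeholder for missing cells are illustrative choices, not part of the answers above.

```shell
# Ragged input: the second line is one field shorter than the first.
printf '%s\n' 'a b c' '1 2' > ragged.txt

# Hypothetical helper: $1 = 1 keeps the max_nf check, 0 drops it; $2 = file.
transpose_ragged() {
  awk -v keep_max="$1" '
  {
    for (c = 1; c <= NF; c++) cell[c, NR] = $c
    # Without the check, max_nf ends up as the field count of the last line.
    if (!keep_max || NF > max_nf) max_nf = NF
  }
  END {
    for (r = 1; r <= max_nf; r++) {
      line = ""
      for (c = 1; c <= NR; c++) {
        if ((r, c) in cell) value = cell[r, c]; else value = "-"
        line = (c == 1) ? value : line " " value
      }
      print line
    }
  }' "$2"
}

transpose_ragged 1 ragged.txt
# a 1
# b 2
# c -

transpose_ragged 0 ragged.txt
# a 1
# b 2
```

Without the check, the row for the third field (`c`) disappears because the last line has only two fields.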

Fourwheeler answered 25/2, 2017 at 3:15 Comment(0)
