Format text with sed or awk
Z

11

5

I am trying to format the actual output below so that the details for each disk are on a single line:

0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>
   /pci@4,0/pci8086,347c@4/e,487c@0/disk@1
   /dev/chassis/SYS/DBP/HDD0/NVME/disk
1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>
   /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1
   /dev/chassis/DBP/HDD1/NVME/disk
2. c3t0d0 <ATA-Min_5300_MAAAD-D3MU-223.57GB>
   /pci@0,0/pci8e,4872@17/disk@0,0
   /dev/chassis/MB/SSDR0/SSD0/disk
3. c4t2d0 <ATA-Min_5300_MTFD-D3MU-223.57GB>
   /pci@0,0/pci08e,4872@17/disk@2,0
   /dev/chassis/SYS/MB/SSDR0/SSD1/disk

I am trying to get output like the following:

0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347c@4/e,487c@0/disk@1| /dev/chassis/SYS/DBP/HDD0/NVME/disk|
1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1| /dev/chassis/DBP/HDD1/NVME/disk|
2. c3t0d0 <ATA-Min_5300_MAAAD-D3MU-223.57GB>| /pci@0,0/pci108e,4872@17/disk@0,0| /dev/chassis/MB/SSDR0/SSD0/disk|
3. c4t2d0 <ATA-Min_5300_MTFD-D3MU-223.57GB>| /pci@0,0/pci108e,4872@17/disk@2,0| /dev/chassis/SYS/MB/SSDR0/SSD1/disk|

I tried the following:

cat actual_output | tr -s " " | tr "\n" "|"

This puts everything on a single line:

0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347c@4/e,487c@0/disk@1| /dev/chassis/SYS/DBP/HDD0/NVME/disk|1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1| /dev/chassis/DBP/HDD1/NVME/disk|2. c3t0d0 <ATA-Micron_5300_MAAAD-D3MU-223.57GB>| /pci@0,0/pci108e,4872@17/disk@0,0| /dev/chassis/MB/SSDR0/SSD0/disk|3. c4t2d0 <ATA-Micron_5300_MTFD-D3MU-223.57GB>| /pci@0,0/pci108e,4872@17/disk@2,0| /dev/chassis/SYS/MB/SSDR0/SSD1/disk|

Now I need to insert a newline (\n) before each record number (0., 1., and so on) so that I get the expected result. Is there a regex that can do this?

TIA

Zoo answered 24/3, 2023 at 10:11 Comment(4)
The last 2 devices have an input string of pci08e, but the expected output shows pci108e ... 08 vs 108 ... typo? - Bamboo
FYI according to the OP there actually aren't always 3 lines per record. - Infralapsarian
Do you REALLY want a | stuck on the end of every output line? - Infralapsarian
There is no need to have "|" at the end of lines. - Zoo
B
1

Modifying one data set to have only 2 lines:

$ cat disk.dat
0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>
   /pci@4,0/pci8086,347c@4/e,487c@0/disk@1
   /dev/chassis/SYS/DBP/HDD0/NVME/disk
1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>
   /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1
   /dev/chassis/DBP/HDD1/NVME/disk
2. c3t0d0 <ATA-Min_5300_MAAAD-D3MU-223.57GB>
   /pci@0,0/pci8e,4872@17/disk@0,0
3. c4t2d0 <ATA-Min_5300_MTFD-D3MU-223.57GB>
   /pci@0,0/pci08e,4872@17/disk@2,0
   /dev/chassis/SYS/MB/SSDR0/SSD1/disk

Extending OP's current code:

cat disk.dat | tr -s " " | tr "\n" "|" | sed -E "s/\|([0-9])/\|\n\1/g; s/$/\n/"

Where:

  • the 1st half of the sed script places a \n between a pipe (|) and a number ([0-9])
  • the 2nd half of the sed script adds a \n at the end of the line

An alternative awk idea:

awk -F'.' '                                        # input field delimiter is a period
           { sub(/[[:space:]]+/,"",$1) }           # remove leading white space from 1st field
($1+0)==$1 { if (NR>1) print ""; pfx="" }          # if 1st field is numeric; if beyond 1st row then terminate previous line of output; reset prefix to empty string
           { printf "%s%s|", pfx, $0; pfx=" " }    # print prefix plus current line; reset prefix to a single space
END        { if (NR>=1) print "" }                 # if we had at least one row of input then terminate previous line of output
' disk.dat

Both of these generate:

0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347c@4/e,487c@0/disk@1| /dev/chassis/SYS/DBP/HDD0/NVME/disk|
1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1| /dev/chassis/DBP/HDD1/NVME/disk|
2. c3t0d0 <ATA-Min_5300_MAAAD-D3MU-223.57GB>| /pci@0,0/pci8e,4872@17/disk@0,0|
3. c4t2d0 <ATA-Min_5300_MTFD-D3MU-223.57GB>| /pci@0,0/pci08e,4872@17/disk@2,0| /dev/chassis/SYS/MB/SSDR0/SSD1/disk|
Bamboo answered 24/3, 2023 at 13:29 Comment(0)
N
4

paste in serial mode could work if it is always 3 lines per group:

paste -sd'||\n'

Output:

0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>|   /pci@4,0/pci8086,347c@4/e,487c@0/disk@1|   /dev/chassis/SYS/DBP/HDD0/NVME/disk
1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>|   /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1|   /dev/chassis/DBP/HDD1/NVME/disk
2. c3t0d0 <ATA-Min_5300_MAAAD-D3MU-223.57GB>|   /pci@0,0/pci8e,4872@17/disk@0,0|   /dev/chassis/MB/SSDR0/SSD0/disk
3. c4t2d0 <ATA-Min_5300_MTFD-D3MU-223.57GB>|   /pci@0,0/pci08e,4872@17/disk@2,0|   /dev/chassis/SYS/MB/SSDR0/SSD1/disk
Novice answered 24/3, 2023 at 10:19 Comment(1)
Pipe to tr -s ' ' to squeeze the whitespace. - Harakiri
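
Combining the paste answer with the comment above gives something like the following. This is only a sketch: the function name join_fixed_groups and the shortened sample data are made up for illustration, and it assumes every disk record is exactly 3 lines long.

```shell
# Sketch (hypothetical helper name): join every 3 lines read from stdin
# with "|" and squeeze runs of spaces. Assumes exactly 3 lines per record.
join_fixed_groups() {
    # paste -s joins lines serially; the delimiter list '||\n' is cycled,
    # so every third line boundary stays a newline.
    paste -sd'||\n' - | tr -s ' '
}

# Made-up, shortened sample input.
printf '%s\n' \
    '0. c1 <A>' '   /pci@1' '   /dev/a' \
    '1. c2 <B>' '   /pci@2' '   /dev/b' | join_fixed_groups
```

If a record has only 2 lines, the grouping drifts from that point on, which is why the pattern-based answers are safer for this data.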
I
3

Using any awk and not relying on there being 3 lines per input record:

$ awk '/^[0-9]/{ if (NR>1) print rec; rec=$0; next} {sub(/ */,"| "); rec=rec $0} END{print rec}' file
0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347c@4/e,487c@0/disk@1| /dev/chassis/SYS/DBP/HDD0/NVME/disk
1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1| /dev/chassis/DBP/HDD1/NVME/disk
2. c3t0d0 <ATA-Min_5300_MAAAD-D3MU-223.57GB>| /pci@0,0/pci8e,4872@17/disk@0,0| /dev/chassis/MB/SSDR0/SSD0/disk
3. c4t2d0 <ATA-Min_5300_MTFD-D3MU-223.57GB>| /pci@0,0/pci08e,4872@17/disk@2,0| /dev/chassis/SYS/MB/SSDR0/SSD1/disk

If you really do want a | added to the end of every output line then just change each print rec to print rec"|".
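
A quick way to check the "not relying on 3 lines" claim is to feed the same approach a record with only 2 lines. The sketch below uses made-up, shortened sample data and a hypothetical wrapper name (join_records); the only change to the logic is a guard in END so that empty input prints nothing.

```shell
# Sketch (hypothetical helper name): a line starting with a digit begins a
# new record; any other line is appended to the current record with its
# leading spaces replaced by "| ".
join_records() {
    awk '/^[0-9]/ { if (NR>1) print rec; rec=$0; next }
         { sub(/^ */, "| "); rec = rec $0 }
         END { if (NR) print rec }'
}

# Made-up sample: the second record has only 2 lines.
printf '%s\n' \
    '0. c1 <A>' '   /pci@1' '   /dev/a' \
    '1. c2 <B>' '   /pci@2' | join_records
```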

Infralapsarian answered 24/3, 2023 at 16:27 Comment(0)
Z
2

With GNU awk:

$ awk '/^\s/{r=r "| " $0;next} NR!=1{print r "|"} {r=$0} END{print r "|"}' data.txt
0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>|    /pci@4,0/pci8086,347c@4/e,487c@0/disk@1|    /dev/chassis/SYS/DBP/HDD0/NVME/disk|
1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>|    /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1|    /dev/chassis/DBP/HDD1/NVME/disk|
2. c3t0d0 <ATA-Min_5300_MAAAD-D3MU-223.57GB>|    /pci@0,0/pci8e,4872@17/disk@0,0|    /dev/chassis/MB/SSDR0/SSD0/disk|
3. c4t2d0 <ATA-Min_5300_MTFD-D3MU-223.57GB>|    /pci@0,0/pci08e,4872@17/disk@2,0|    /dev/chassis/SYS/MB/SSDR0/SSD1/disk|

With GNU sed:

sed -zE 's/\n\s+/| /g;s/(\n|$)/|&/g' data.txt
0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347c@4/e,487c@0/disk@1| /dev/chassis/SYS/DBP/HDD0/NVME/disk|
1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1| /dev/chassis/DBP/HDD1/NVME/disk|
2. c3t0d0 <ATA-Min_5300_MAAAD-D3MU-223.57GB>| /pci@0,0/pci8e,4872@17/disk@0,0| /dev/chassis/MB/SSDR0/SSD0/disk|
3. c4t2d0 <ATA-Min_5300_MTFD-D3MU-223.57GB>| /pci@0,0/pci08e,4872@17/disk@2,0| /dev/chassis/SYS/MB/SSDR0/SSD1/disk|

-z makes sed treat the whole input as a single record (records end at NUL instead of newline), and -E enables extended regex. s/\n\s+/| /g replaces each newline followed by one or more whitespace characters with "| ". s/(\n|$)/|&/g inserts a | before every remaining newline or the end of input.
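
To see what each substitution contributes, the two steps can be run separately. This is a sketch that assumes GNU sed (for -z and \s); the function names and the shortened sample data are made up for illustration.

```shell
# Sketch, GNU sed assumed. Helper names are hypothetical.

# Step 1 only: fold each indented continuation line into the line before it.
fold_continuations() { sed -zE 's/\n\s+/| /g'; }

# Step 2 only: put a "|" before every remaining newline (or the end of input).
terminate_records() { sed -zE 's/(\n|$)/|&/g'; }

# Made-up sample: the first record has only 2 lines.
printf '%s\n' '0. c1 <A>' '   /pci@1' '1. c2 <B>' '   /pci@2' '   /dev/b' |
    fold_continuations | terminate_records
```

Because step 1 keys on "newline followed by whitespace", it does not care how many continuation lines a record has.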

Zoonosis answered 24/3, 2023 at 14:0 Comment(0)
G
2

With GNU awk, please try the following code. It was written and tested with the shown samples only.

awk -v RS='(^|\n)[0-9]+\\.' -v OFS="| " '
rt{
  sub(/^\n/,"",RT)
  $1=$1
  print rt " " $0,_
}
{ rt=RT }
'  Input_file
Groggy answered 24/3, 2023 at 15:40 Comment(0)
R
1

Using GNU sed

$ sed -Ez ':a;s/([0-9]+\.[^\n]*)\n +/\1| /;ta;s/\n|$/|&/g' input_file
0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347c@4/e,487c@0/disk@1| /dev/chassis/SYS/DBP/HDD0/NVME/disk|
1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1| /dev/chassis/DBP/HDD1/NVME/disk|
2. c3t0d0 <ATA-Min_5300_MAAAD-D3MU-223.57GB>| /pci@0,0/pci8e,4872@17/disk@0,0| /dev/chassis/MB/SSDR0/SSD0/disk|
3. c4t2d0 <ATA-Min_5300_MTFD-D3MU-223.57GB>| /pci@0,0/pci08e,4872@17/disk@2,0| /dev/chassis/SYS/MB/SSDR0/SSD1/disk|
Rousseau answered 24/3, 2023 at 13:10 Comment(0)
A
1

This might work for you (GNU sed):

sed -E '/^\S/{:a;x;1!s/\n(\s)+|$/|\1/gp;d};H;$!d;ba' file

There are two cases: a line either starts with a non-space character or it starts with spaces.

If a line does not begin with spaces (new record):

  1. Switch to the hold space
  2. If it is not the first line, replace each newline and the spaces that follow it with "| ", append a | at the end, and print the result.
  3. Delete the result.

If the line starts with spaces (mid record):

  1. Append the current line to the hold space
  2. Delete the line if it is not the last line
  3. Otherwise, jump back and process as if it were a new record.

N.B. Each time a new record is encountered, the previously stored record is processed and printed. Note also the symmetry between the first and last lines, and the asymmetry in the use of x and H.
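
The steps above can be laid out as a commented, multi-line script. This is a sketch assuming GNU sed (for \S and \s); it writes the back-reference as \1 because the pattern has only one capture group, the function name and sample data are made up, and it assumes the last record has at least one continuation line.

```shell
# Sketch, GNU sed assumed. The helper name is hypothetical.
join_with_hold_space() {
    sed -E '
        # A line starting with a non-space character begins a new record.
        /^\S/{
            :a
            # Swap: previous record into pattern space, new line into hold space.
            x
            # Except on line 1 (nothing stored yet): join the stored record
            # with "| ", append a final "|", and print it.
            1!s/\n(\s)+|$/|\1/gp
            # Discard the pattern space and start the next cycle.
            d
        }
        # Continuation line: append it to the record in the hold space.
        H
        # Not the last line: nothing to print yet.
        $!d
        # Last line: flush the stored record through the block above.
        ba
    '
}

# Made-up sample: the second record has only 2 lines.
printf '%s\n' \
    '0. c1 <A>' '   /pci@1' '   /dev/a' \
    '1. c2 <B>' '   /pci@2' | join_with_hold_space
```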


Alternative:

sed -zE 's/\n\s+/| /g;s/.*/&|/gm' file

N.B. A good example of wholemeal programming. The first substitution reduces all lines to separate records. The second substitution appends | to each record.

Aurum answered 25/3, 2023 at 11:43 Comment(0)
I
0

You can use awk with the modulo operator. If the current line number is not divisible by 3, the pipe symbol "|" is appended; otherwise a newline is appended:

awk 'BEGIN{ ORS=""; }{printf "%s%s", $0,(NR%3?"|":"\n");}' actual_output
Insulting answered 24/3, 2023 at 11:6 Comment(2)
Sometimes the output may not contain 3 lines in a group. It may be 2 or 3, and the only common thing I found is the starting number pattern like "0." or "1." or "2." - Zoo
@Zoo you've GOT to show stuff like that in the sample input/output in your question, as it's a huge detail that greatly impacts potential solutions. - Infralapsarian
S
0

There is always Perl:

perl -0777 -nE 'for $_ (split(/(?<!\A)(?=^\d+\.\h+)/m)) {s/\R\h*/| /g; say}' file

Or an awk:

awk '/^[0-9]+\.[ \t]+/{
    if (s) print s
    s=$0 "|"
    next
}
{sub(/^[ \t]+/," ")
s=s $0 "|"
}
END{print s}
' file 

Either prints:

0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347c@4/e,487c@0/disk@1| /dev/chassis/SYS/DBP/HDD0/NVME/disk| 
1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1| /dev/chassis/DBP/HDD1/NVME/disk| 
2. c3t0d0 <ATA-Min_5300_MAAAD-D3MU-223.57GB>| /pci@0,0/pci8e,4872@17/disk@0,0| /dev/chassis/MB/SSDR0/SSD0/disk| 
3. c4t2d0 <ATA-Min_5300_MTFD-D3MU-223.57GB>| /pci@0,0/pci08e,4872@17/disk@2,0| /dev/chassis/SYS/MB/SSDR0/SSD1/disk|
Sizzle answered 25/3, 2023 at 19:15 Comment(0)
F
0

With any awk (this assumes exactly 3 lines per record):

awk 'ORS = NR % 3 ? "|" : "\n"'
0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>|   /pci@4,0/pci8086,347c@4/e,487c@0/disk@1|   /dev/chassis/SYS/DBP/HDD0/NVME/disk
1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>|   /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1|   /dev/chassis/DBP/HDD1/NVME/disk
2. c3t0d0 <ATA-Min_5300_MAAAD-D3MU-223.57GB>|   /pci@0,0/pci8e,4872@17/disk@0,0|   /dev/chassis/MB/SSDR0/SSD0/disk
3. c4t2d0 <ATA-Min_5300_MTFD-D3MU-223.57GB>|   /pci@0,0/pci08e,4872@17/disk@2,0|   /dev/chassis/SYS/MB/SSDR0/SSD1/disk
Fellner answered 25/3, 2023 at 19:52 Comment(0)
E
0
awk '
    /^[0-9]*\./{
        printf "%s|", (NR==1 ? $0 : "\n"$0); next
    }
    {
        sub(/^ */," "); printf "%s|", $0
    }
    END{printf "\n"}
' file

0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347c@4/e,487c@0/disk@1| /dev/chassis/SYS/DBP/HDD0/NVME/disk|
1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1| /dev/chassis/DBP/HDD1/NVME/disk|
2. c3t0d0 <ATA-Min_5300_MAAAD-D3MU-223.57GB>| /pci@0,0/pci8e,4872@17/disk@0,0| /dev/chassis/MB/SSDR0/SSD0/disk|
3. c4t2d0 <ATA-Min_5300_MTFD-D3MU-223.57GB>| /pci@0,0/pci08e,4872@17/disk@2,0| /dev/chassis/SYS/MB/SSDR0/SSD1/disk|
Endanger answered 25/3, 2023 at 20:32 Comment(0)
