Remove lines from a file corresponding to blank lines of another file

I have two files with the same number of rows and columns, both delimited with semicolons (;). Example:

file_a:

1;1;1;1;1
2;2;2;2;2
3;3;3;3;3
4;4;4;4;4

file_b:

A;A;A;A;A
B;B;;;B
;;;;
D;D;D;D;D

Ignoring delimiters, line 3 of file_b is empty, so I want to remove line 3 from file_a as well before running

paste -d ';' file_a file_b

in order to get output like this:

1;1;1;1;1;A;A;A;A;A
2;2;2;2;2;B;B;;;B
4;4;4;4;4;D;D;D;D;D

Edit: The number of columns is 93, the same for every row and in both files, so both files have exactly the same matrix of rows and columns.

Recurrence asked 24/9, 2020 at 7:38

Could you please try the following, written and tested with the shown samples in GNU awk.

awk '
BEGIN{
  FS=OFS=";"
}
FNR==NR{
  arr[FNR]=$0
  next
}
!/^;+$/{
  print arr[FNR],$0
}
' file_a file_b

Explanation: here is a detailed explanation of the above.

awk '                 ##Starting awk program from here.
BEGIN{                ##Starting BEGIN section from here.
  FS=OFS=";"          ##Setting field separator and output field separator as ; here.
}
FNR==NR{              ##Checking condition FNR==NR, which is TRUE only while file_a is being read.
  arr[FNR]=$0         ##Creating arr with index FNR and the current line as value.
  next                ##next skips all further statements for this line.
}
!/^;+$/{              ##Checking condition: if the line does NOT consist only of ; characters, do the following.
  print arr[FNR],$0   ##Printing arr at index FNR and the current line, joined by OFS (;).
}
' file_a file_b       ##Mentioning Input_file names here.
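To try this answer end to end, here is a minimal sketch that recreates the sample files from the question and runs the same awk program. The scratch directory from mktemp -d and the result variable are assumptions for illustration only; they are not part of the original answer.

```shell
#!/bin/sh
# Sketch: recreate the question's sample files in a scratch directory
# (assumes mktemp -d is available, as on GNU and BSD systems).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' '1;1;1;1;1' '2;2;2;2;2' '3;3;3;3;3' '4;4;4;4;4' > "$dir/file_a"
printf '%s\n' 'A;A;A;A;A' 'B;B;;;B' ';;;;' 'D;D;D;D;D' > "$dir/file_b"

# Same program as the answer: remember file_a by line number, then print it
# next to every file_b line that is not made up only of semicolons.
result=$(awk '
BEGIN { FS = OFS = ";" }
FNR == NR { arr[FNR] = $0; next }
!/^;+$/ { print arr[FNR], $0 }
' "$dir/file_a" "$dir/file_b")

printf '%s\n' "$result"
```

With the sample data this prints the three combined lines requested in the question; line 3 is dropped because ";;;;" matches /^;+$/.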
Lithopone answered 24/9, 2020 at 7:42

Since you mention that both files have the same number of lines, getline fits well here (f1 and f2 stand for file_a and file_b):

$ awk '(getline line < "f2")==1 && line ~ /[^;]/' f1
1;1;1;1;1
2;2;2;2;2
4;4;4;4;4

And you can do the paste functionality within awk as well:

$ awk '(getline line < "f2")==1 && line ~ /[^;]/{print $0 ";" line}' f1
1;1;1;1;1;A;A;A;A;A
2;2;2;2;2;B;B;;;B
4;4;4;4;4;D;D;D;D;D

The return value of getline is 1 if a line was read successfully. line ~ /[^;]/ checks whether the line contains any character other than ;. If both conditions are satisfied, the required result is printed.
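getline can return 1 (a line was read), 0 (end of file) or -1 (the file could not be opened). Since -1 is also a true value in awk, comparing against 1 is what keeps a missing file from counting as success. A small sketch showing all three codes; the two-line file and the ".missing" file name are hypothetical and exist only for this demonstration.

```shell
#!/bin/sh
# Sketch: print getline's return codes for a short file and a missing one
# (assumes mktemp -d is available).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' 'A;A' ';' > "$dir/f2"    # hypothetical two-line file

result=$(awk -v f="$dir/f2" 'BEGIN {
    r = (getline line < f); print "read 1:", r, line          # 1: line read
    r = (getline line < f); print "read 2:", r, line          # 1: line read
    r = (getline line < f); print "read 3:", r                # 0: end of file
    r = (getline line < (f ".missing")); print "missing:", r  # -1: cannot open
}')

printf '%s\n' "$result"
```

The parentheses around f ".missing" matter: without them, awk may parse the concatenation after the < redirection differently.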

Friseur answered 24/9, 2020 at 7:48

This is basically a modification of @RavinderSingh13's solution, but I only store the NRs of the empty records (it prints the matching lines of file_a only, without the paste step):

$ awk '
NR==FNR {            # process the b file
    if($0~/^;+$/)    # when an empty record is met
        a[NR]        # hash the record number NR
    next
}
!(FNR in a)          # print lines of the a file whose b line is not empty
' fileb filea

Output:

1;1;1;1;1
2;2;2;2;2
4;4;4;4;4
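The bare a[NR] line works because in awk merely referencing an array element creates it (with an empty value), while the in operator only tests membership and never creates an element. A tiny sketch of that behaviour, independent of the input files; the index values 3 and 4 are arbitrary.

```shell
#!/bin/sh
# Sketch: a bare reference creates an array element; "in" does not.
result=$(awk 'BEGIN {
    a[3]                    # bare reference: element 3 now exists (empty value)
    has3 = (3 in a)         # 1: created above
    has4 = (4 in a)         # 0: testing with "in" does not create it
    n = 0
    for (k in a) n++        # count the elements actually stored
    print has3, has4, n
}')

printf '%s\n' "$result"
```

This prints "1 0 1": only element 3 was ever stored.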
Goodkin answered 24/9, 2020 at 7:55

Filtering after paste is easier. Assuming the lines to exclude look exactly as shown in the question, you can filter the output of paste with a grep pattern anchored to the end of the line: an empty file_b row shows up as 5 trailing semicolons (its own 4 separators plus the one added by paste).

paste -d ';' file_a file_b | grep -v ';;;;;$'

With the input files shown in the question, this prints exactly the requested output.
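A self-contained sketch of this pipeline, rebuilding the sample files in a scratch directory so the result can be checked. The mktemp -d directory and the result variable are assumptions for testing only.

```shell
#!/bin/sh
# Sketch: recreate the sample files (assumes mktemp -d is available).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' '1;1;1;1;1' '2;2;2;2;2' '3;3;3;3;3' '4;4;4;4;4' > "$dir/file_a"
printf '%s\n' 'A;A;A;A;A' 'B;B;;;B' ';;;;' 'D;D;D;D;D' > "$dir/file_b"

# The empty file_b row contributes 4 semicolons and paste adds 1 more,
# so the line to drop ends in exactly 5 semicolons.
result=$(paste -d ';' "$dir/file_a" "$dir/file_b" | grep -v ';;;;;$')

printf '%s\n' "$result"
```

Line 2 ("...;B;B;;;B") survives because it ends in B, not in semicolons.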

Edit:
To fulfill an additional requirement from a comment, the grep command can be modified to specify the number of semicolons corresponding to the number of empty columns. For different input files, simply change the number 5 accordingly.

paste -d ';' file_a file_b | grep -v ';\{5\}$'

If the number of columns is 93, as now specified in the question, the command would be:

paste -d ';' file_a file_b | grep -v ';\{93\}$'

Edit2:
You can also derive the required number of semicolons from the first line of file_b (its own separators, plus the extra ; written in the pattern for the paste delimiter):

SEMICOLONS=$(head -1 file_b | sed 's/[^;]*//g')
paste -d ';' file_a file_b | grep -v ";$SEMICOLONS"'$'

or, combined into one command:

paste -d ';' file_a file_b | grep -v ";$(head -1 file_b | sed 's/[^;]*//g')"'$'
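As a variant (not from the original answer), the count can also be read as the field count NF of the first line of file_b: an empty row has NF - 1 separators, and paste adds one more, so the line to drop ends in exactly NF semicolons. This also works when the first line itself is empty. A sketch with the sample files recreated for testing; the scratch directory and the cols variable are assumptions.

```shell
#!/bin/sh
# Sketch: recreate the sample files (assumes mktemp -d is available).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' '1;1;1;1;1' '2;2;2;2;2' '3;3;3;3;3' '4;4;4;4;4' > "$dir/file_a"
printf '%s\n' 'A;A;A;A;A' 'B;B;;;B' ';;;;' 'D;D;D;D;D' > "$dir/file_b"

# Number of columns in file_b: 5 for the sample, 93 for the real data.
cols=$(awk -F';' '{ print NF; exit }' "$dir/file_b")

# Drop pasted lines ending in $cols semicolons (BRE interval expression).
# Inside double quotes the backslashes before { and } are kept literally.
result=$(paste -d ';' "$dir/file_a" "$dir/file_b" | grep -v ";\{$cols\}"'$')

printf '%s\n' "$result"
```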
Nuris answered 24/9, 2020 at 8:59