Print first few and last few lines of file through a pipe with "..." in the middle

Asked 7/12, 2021 at 21:3 Answered 8/12, 2021 at 16:36

Problem Description

This is my file

I would like to send the cat output of this file through a pipe and receive this

% cat file | some_command
1
2
...
9
10

Attempted solutions

Here are some solutions I've tried, with their output

% cat temp | (head -n2 && echo '...' && tail -n2)
1
2
...

% cat temp | tee >(head -n3) >(tail -n3) >/dev/null
1
2
3
8
9
10
# I don't know how to get the ...

% cat temp | sed -e 1b -e '$!d'
1
10

% cat temp | awk 'NR==1;END{print}'
1
10
# Can only get 2 lines

Leanora answered 7/12, 2021 at 21:3 Comment(0)

An awk:

awk -v head=2 -v tail=2 'FNR==NR && FNR<=head
FNR==NR && cnt++==head {print "..."}
NR>FNR && FNR>(cnt-tail)' file file

Or if a single pass is important (and memory allows), you can use perl:

perl -0777 -lanE 'BEGIN{$head=2; $tail=2;}
END{say join("\n", @F[0..$head-1],("..."),@F[-$tail..-1]);}' file

Or, an awk that is one pass:

awk -v head=2 -v tail=2 'FNR<=head
{lines[FNR]=$0}
END{
    print "..."
    for (i=FNR-tail+1; i<=FNR; i++) print lines[i]
}' file

Or, there is nothing wrong with the direct 'caveman' approach:

head -n 2 file; echo "..."; tail -n 2 file

Any of these prints:

1
2
...
9
10

In terms of efficiency, here are some stats.

For small files (i.e., less than 10 MB or so), all of these take less than 1 second, and the 'caveman' approach takes 2 ms.

I then created a 1.1 GB file with seq 99999999 >file

Two-pass awk: 50 seconds
One-pass perl: 10 seconds
One-pass awk: 29 seconds
'Caveman': 2 ms

Betaine answered 7/12, 2021 at 21:13 Comment(7)

Now handle the cases where the line count is less than head and tail, and where the head and tail lines intersect ^^ – Pecan 7/12, 2021 at 22:42

They all handle overlapping head and tail. – Betaine 8/12, 2021 at 1:20

Especially with large files, the "caveman" approach is the best, because it's the only one that won't read the whole file (head stops after a few lines, and tail seeks to the end and works its way back). Try the perl version with a file that's larger than your available RAM and you're in for a surprise. – Seleta 8/12, 2021 at 9:42

@dawg, I think that by overlapping head and tail, they mean e.g. a case where the file has only three lines. Given three lines 1, 2, and 3, that last head+tail solution would print 1, 2, ..., 2, 3, which is probably technically correct at least for some phrasings of the problem, but it might also be considered misleading. Looks like the others print the same. – Causal 8/12, 2021 at 10:41

@ilkkachu: I think that for a three-line file it is at best ambiguous what the 'correct result' is. In my view, 1\n2\n...\n2\n3 is the most correct. What do you think would be a better result for that? – Betaine 8/12, 2021 at 13:42

@GuntramBlohm: Agreed and I added a note to that effect. The two pass awk is reasonable as well in that situation. – Betaine 8/12, 2021 at 13:45

@dawg, in this narrow context of this Q, we don't know, since the post doesn't say. But more generally, 1\n2\n...\n2\n3 implies that there's something removed in the part where it says ..., and that's not true in the case of a three or four-line file. It would make more sense to me to print a three line file just as-is, without the ellipsis. In general. Of course we don't know what they're doing in this particular case, if there's a use-case that requires/expects all four lines and the ..., and where the doubled 2 line makes sense, then that needs to be done. – Causal 8/12, 2021 at 13:56

You may consider this awk solution:

awk -v top=2 -v bot=2 'FNR == NR {++n; next} FNR <= top || FNR > n-bot; FNR == top+1 {print "..."}' file{,}

1
2
...
9
10

Megdal answered 7/12, 2021 at 21:12 Comment(0)

Two single pass sed solutions:

sed '1,2b
     3c\
...
     N
     $!D'

and

sed '1,2b
     3c\
...
     $!{h;d;}
     H;g'

Samarskite answered 7/12, 2021 at 22:4 Comment(1)

How does this work? It would be more helpful for future readers with related problems (like a count other than 2) if you commented the code and said what you're doing with the pattern / hold space. – Merger 8/12, 2021 at 8:36
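
To address the comment above, here is my reading of the first sed script, written out with comments. This is only an annotated sketch, not the answerer's own explanation. The wrapper name head_tail_sed is hypothetical, and the script assumes the input has at least five lines, since line 3 is always replaced by the marker.

```shell
# Hypothetical wrapper name; reads the stream on stdin.
head_tail_sed() {
  sed '
    # Lines 1 and 2: branch to the end of the script, which prints them as-is.
    1,2b
    # Line 3: replace it with the marker text and start the next cycle.
    3c\
...
    # Line 4 onward: append the next input line, so the pattern space
    # holds two consecutive lines joined by a newline.
    N
    # Not at the last line yet: drop the older line (up to the first
    # newline) and restart the cycle without reading new input.
    # At the last line the two remaining lines are printed automatically.
    $!D
  '
}
```

With this, seq 10 | head_tail_sed prints 1, 2, ..., 9, 10. The pattern space acts as a sliding two-line window, which is why the count of 2 is baked into the script; the second variant keeps the previous line in the hold space instead.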

Assumptions:

as OP has stated, a solution must be able to work with a stream from a pipe
the total number of lines coming from the stream is unknown
if the total number of lines is less than the sum of the head/tail offsets then we'll print duplicate lines (we can add more logic if OP updates the question with more details on how to address this situation)

A single-pass awk solution that implements a queue in awk to keep track of the most recent N lines; the queue allows us to limit awk's memory usage to just N lines (as opposed to loading the entire input stream into memory, which could be problematic when processing a large volume of lines/data on a machine with limited available memory):

h=2 t=3

cat temp | awk -v head=${h} -v tail=${t} '
    { if (NR <= head) print $0
      lines[NR % tail] = $0
    }

END { print "..."

      if (NR < tail) i=0
      else           i=NR

      do { i=(i+1)%tail
           print lines[i]
         } while (i != (NR % tail) )
    }'

This generates:

1
2
...
8
9
10

Demonstrating the overlap issue:

$ cat temp4
1
2
3
4

With h=3;t=3 the proposed awk code generates:

$ cat temp4 | awk -v head=${h} -v tail=${t} '...'
1
2
3
...
2
3
4

Whether or not this is the 'correct' output will depend on OP's requirements.
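
If the requirement turns out to be "print short inputs as-is, without duplicates and without the ellipsis" (an assumption, since the question does not say), the queue idea could be adapted roughly like this. It is a sketch only: the function name head_tail and its two positional arguments are made up for illustration. Lines after the head section go into a ring buffer of at most tail entries, so memory use stays bounded and it still works on a pipe.

```shell
# Hypothetical helper: head_tail HEAD TAIL, reading the stream on stdin.
head_tail() {
  awk -v head="${1:-2}" -v tail="${2:-2}" '
    # The first "head" lines are printed as soon as they arrive.
    NR <= head { print; next }
    # Later lines go into a ring buffer holding at most "tail" lines.
    tail > 0   { buf[NR % tail] = $0 }
    END {
      rest = NR - head                 # lines seen after the head section
      if (rest <= 0) exit              # nothing beyond the head section
      if (rest > tail) {               # lines were really skipped
        print "..."
        start = NR - tail + 1
      } else {                         # short input: print the remainder as-is
        start = head + 1
      }
      for (i = start; i <= NR; i++) print buf[i % tail]
    }'
}
```

For example, seq 10 | head_tail 2 3 prints 1, 2, ..., 8, 9, 10, while seq 4 | head_tail 3 3 prints just 1 through 4 with no ellipsis and no repeated lines.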

Catt answered 8/12, 2021 at 16:36 Comment(0)
