Removing trailing / starting newlines with sed, awk, tr, and friends

I would like to remove all of the empty lines from a file, but only when they are at the end/start of a file (that is, if there are no non-empty lines before them, at the start; and if there are no non-empty lines after them, at the end).

Is this possible outside of a fully-featured scripting language like Perl or Ruby? I’d prefer to do this with sed or awk if possible. Basically, any lightweight and widely available UNIX-y tool would be fine, especially one I can learn more about quickly (Perl, thus, not included).

Octosyllabic asked 9/9, 2011 at 9:20

From Useful one-line scripts for sed:

# Delete all leading blank lines at top of file (only).
sed '/./,$!d' file

# Delete all trailing blank lines at end of file (only).
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file

Therefore, to remove both leading and trailing blank lines from a file, you can combine the above commands into:

sed -e :a -e '/./,$!d;/^\n*$/{$d;N;};/\n$/ba' file
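A minimal sketch of the combined command packaged as a reusable filter, assuming GNU sed. The function name `trim_blank_lines` is hypothetical; it reads the files named as arguments, or standard input if there are none.

```shell
# Hypothetical wrapper around the combined one-liner (GNU sed assumed).
#   /./,$!d   delete every line that comes before the first non-empty one
#   /^\n*$/   pattern space is empty or only newlines: delete it at end of
#             input ($d), otherwise append the next line (N)
#   /\n$/ba   the appended line was empty too, so loop back to :a
trim_blank_lines() {
    sed -e :a -e '/./,$!d;/^\n*$/{$d;N;};/\n$/ba' "$@"
}
```

For example, `trim_blank_lines notes.txt > trimmed.txt` (file names illustrative), or use it at the end of a pipeline.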
Patmos answered 9/9, 2011 at 9:52
According to the note at that site, the trailing-blank-line script won't work for gsed 3.02.*. This one will work: sed -e :a -e '/^\n*$/{$d;N;ba' -e '}' – Randirandie
If it fails, try running dos2unix first. That reference is a really useful, complete set of sed examples. – Longsufferance
This isn't appropriate for large files. – Nob
It will not remove lines that contain only whitespace. To remove leading blank or whitespace-only lines, use: sed '/\S/,$!d' – Field

So I'm going to borrow part of @dogbane's answer for this, since that sed line for removing the leading blank lines is so short...

tac is part of coreutils, and reverses a file. So do it twice:

tac file | sed -e '/./,$!d' | tac | sed -e '/./,$!d'

It's certainly not the most efficient, but unless you need efficiency, I find it more readable than everything else so far.
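The same pipeline can read standard input instead of a named file, which makes it usable as a filter. A sketch, assuming GNU coreutils `tac` (on macOS, `tail -r` is the usual stand-in); the function name is hypothetical.

```shell
# Hypothetical filter. The first tac puts the trailing blank lines on top,
# where sed's /./,$!d deletes them; the second tac restores the original
# order so the last sed can delete the leading blank lines.
trim_blank_lines_tac() {
    tac | sed -e '/./,$!d' | tac | sed -e '/./,$!d'
}
```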

Sedimentology answered 27/5, 2014 at 16:27
There's an edge case worth mentioning: if the file doesn't have a trailing \n, the last line won't be handled correctly: try tac <(printf 'a\nb'). Arguably, this behavior is flawed; it also affects tac's OSX equivalent, tail -r. – Tachylyte
paste can solve this edge case. I've added an answer below showing how. – Veracity

Here's a one-pass solution in awk: it does not start printing until it sees a non-empty line, and when it sees an empty line, it remembers it until the next non-empty line.

awk '
    /[[:graph:]]/ {
        # a non-empty line
        # set the flag to begin printing lines
        p=1      
        # print the accumulated "interior" empty lines 
        for (i=1; i<=n; i++) print ""
        n=0
        # then print this line
        print
    }
    p && /^[[:space:]]*$/ {
        # a potentially "interior" empty line. remember it.
        n++
    }
' filename

Note: because of the way I classify lines as empty or non-empty (with [[:graph:]] and /^[[:space:]]*$/), interior lines containing only whitespace are truncated to truly empty lines.
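To see that truncation in action, here is the program wrapped in a hypothetical shell function `trim_blank_lines_awk`, assuming an awk that supports POSIX bracket classes such as `[[:graph:]]` (GNU awk does).

```shell
# Hypothetical wrapper around the one-pass program above; assumes an awk
# that understands POSIX bracket classes such as [[:graph:]].
trim_blank_lines_awk() {
    awk '
        /[[:graph:]]/ {
            p = 1                               # first real line: start printing
            for (i = 1; i <= n; i++) print ""   # replay the buffered blank lines
            n = 0
            print
        }
        p && /^[[:space:]]*$/ { n++ }           # blank after content: just count it
    ' "$@"
}

# Example: the interior line holding a single space comes back empty.
# printf '\n a\n \nb\n\n' | trim_blank_lines_awk
```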

Tercentenary answered 9/9, 2011 at 14:42
+1 for a single-pass, single-utility solution that is also memory-efficient (though, as noted, its behavior differs slightly from what was asked for). – Tachylyte

As mentioned in another answer, tac is part of coreutils and reverses a file. Combining the idea of doing it twice with the fact that command substitution strips trailing newlines, we get

echo "$(echo "$(tac "$filename")" | tac)"

which doesn't depend on sed. You can use echo -n to strip the remaining trailing newline off.
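A sketch of the same trick with `printf '%s\n'` swapped in for `echo`, because some shells' `echo` interprets backslashes or a leading `-n` in the data. The function name is hypothetical and GNU coreutils `tac` is assumed.

```shell
# Hypothetical helper. Every $(...) strips trailing newlines. The inner one
# drops the original leading blank lines (the first tac moved them to the
# end); the second tac then moves the original trailing blank lines to the
# end, where the outer $(...) drops them.
trim_blank_lines_subst() {
    printf '%s\n' "$(printf '%s\n' "$(tac "$1")" | tac)"
}
```

A file consisting only of blank lines comes out as a single empty line, and the command substitutions hold the whole file in memory, so this suits small files.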

Amylopsin answered 7/7, 2014 at 12:35
+1 for (relative) simplicity (albeit at the expense of efficiency). OSX version (where tac is not available by default): echo "$(echo "$(tail -r "$filename")" | tail -r)"
I ran tests to compare relative execution speed with a 1-million-line file for several answers (I didn't pay attention to memory use); earlier means faster:
  • OSX 10.10: sed (dogbane) < bash (mklement0) < awk (glenn jackman) < tac (tail -r; you)
  • Ubuntu 14.04: sed (dogbane) < tac (you) < bash (mklement0) < awk (glenn jackman)
One interesting difference is that tac is much faster on Ubuntu than on OSX. – Tachylyte
There's an edge case worth mentioning: if the file doesn't have a trailing \n, the last line won't be handled correctly: try echo "$(echo "$(printf 'a\nb' | tac)" | tac)". This is inherent in the (arguably flawed) behavior of tac (and also tail -r on OSX) with input not ending in \n. – Tachylyte
Using echo "$(echo "$(cat "$filename")" | tac)" | tac fixes the edge case that @Tachylyte mentioned. – Tamalatamale
paste can also solve this edge case. I've added an answer below showing how. – Veracity

Here's an adapted sed version, which also treats lines containing only spaces and tabs as empty.

sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}'

It's basically the accepted answer's version (taking BryanH's comment into account), but the dot . in the first command was changed to [^[:blank:]] (anything not blank), and the \n inside the second command's address was changed to [[:space:]] to allow newlines, spaces and tabs.

An alternative version that doesn't use the POSIX classes; for it to work, your sed must support \t and \n inside […]. GNU sed does; BSD sed doesn't.

sed -e :a -e '/[^\t ]/,$!d; /^[\n\t ]*$/{ $d; N; ba' -e '}'

Testing:

prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' 



foo

foo



prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' | sed -n l
$
 \t $
$
foo$
$
foo$
$
 \t $
$
prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' | sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}'
foo

foo
prompt$
Alidia answered 5/3, 2015 at 14:58

This can be solved easily with GNU sed's -z option:

sed -rz 's/^\n+//; s/\n+$/\n/g' file
Hello

Welcome to
Unix and Linux
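A reproducible version of that example, assuming GNU sed (`-z` and `-r` are GNU extensions). With `-z`, input is split on NUL bytes instead of newlines, so a text file without NULs becomes a single pattern space in which `^` and `$` anchor to the start and end of the whole file. The `sample` function is just a hypothetical stand-in for the file.

```shell
# Hypothetical sample input: two leading and three trailing newlines.
sample() {
    printf '\n\nHello\n\nWelcome to\nUnix and Linux\n\n\n'
}

# GNU sed assumed. The whole (NUL-free) input is one pattern space:
#   s/^\n+//      delete all leading newlines
#   s/\n+$/\n/g   collapse the run of trailing newlines into one
sample | sed -rz 's/^\n+//; s/\n+$/\n/g'
```

Lines containing only spaces or tabs are not treated as blank here, and the whole file is held in memory at once.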
Lawrence answered 30/7, 2020 at 17:50

Using awk:

awk '{a[NR]=$0; if($0 != "" && !s) s=NR}
    END{if(!s) exit;
        e=s;
        for(i=NR;i>s;i--)
            if(a[i] != ""){ e=i; break; }
        for(i=s;i<=e;i++)
            print a[i];}' yourFile
Quake answered 9/9, 2011 at 9:42
I wonder if there’s a way to reduce/refactor that to handle it in one pass? (I’m not massively familiar with awk; I can read what you wrote, but I’m not sure how to refactor it.) – Octosyllabic
Basically, this is a one-line command; the only dynamic part is 'yourFile', which is the file you want to process. Why do you need to reduce/refactor it? – Quake
Because it’s long and complex, even if it doesn’t need any newlines? Several for loops, multiple statements; unnecessary complexity. (= – Octosyllabic

For an efficient, non-recursive way of stripping trailing blank lines (including lines of only whitespace), I've developed this sed script.

sed -n '/^[[:space:]]*$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^[[:space:]]*$/H'

It uses the hold buffer to store all blank lines and prints them only after it finds a non-blank line. If you want only truly empty lines to count as blank, it's enough to get rid of the two [[:space:]]* parts:

sed -n '/^$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^$/H'
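A sketch that chains this answer's leading-line script with the streaming trailing-line script into one filter. It assumes GNU sed (among other things, `s/.*//` must be able to match the newlines collected in the hold space, which GNU sed's `.` does), and the function name is hypothetical.

```shell
# Hypothetical helper: the first sed deletes leading lines made only of
# spaces/tabs; the second keeps blank lines in the hold space and prints
# them only once a non-blank line follows, so trailing ones never print.
trim_blank_lines_stream() {
    sed '/[^[:blank:]]/,$!d' |
    sed -n '/^[[:space:]]*$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^[[:space:]]*$/H'
}
```

Interior whitespace-only lines are passed through unchanged, since `H` stores each blank line verbatim.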

I've tried a simple performance comparison with the well-known recursive script

sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba'

on a 3 MB file with 1 MB of random blank lines (spaces, tabs and newlines) before and after 1 MB of random base64 text.

shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M > bigfile
base64 </dev/urandom | dd bs=1 count=1M >> bigfile
shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M >> bigfile

The streaming script took roughly 0.5 seconds to complete; the recursive one hadn't finished after 15 minutes. Win :)

For completeness' sake, the sed scripts for stripping leading lines already stream fine. Use whichever suits you:

sed '/[^[:blank:]]/,$!d'
sed '/./,$!d'
Craps answered 30/6, 2017 at 16:12

@dogbane has a nice simple answer for removing leading empty lines. Here's a simple awk command which removes just the trailing lines. Use this with @dogbane's sed command to remove both leading and trailing blanks.

awk '{ LINES=LINES $0 "\n"; } /./ { printf "%s", LINES; LINES=""; }'

This is pretty simple in operation.

  • Add every line to a buffer as we read it.
  • For every line which contains a character, print the contents of the buffer and then clear it.

So the only things that get buffered and never displayed are any trailing blanks.

I used printf instead of print to avoid the automatic addition of a newline, since I'm using newlines to separate the lines in the buffer already.
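Chained as suggested, the two commands fit in one filter. A minimal sketch with a hypothetical function name; note that both `/./` tests count any character as content, so a line holding just a space is kept.

```shell
# Hypothetical filter: sed deletes the empty lines before the first
# non-empty one; awk appends every line to a buffer and flushes the buffer
# only when a non-empty line arrives, so trailing empty lines never print.
trim_empty_lines() {
    sed '/./,$!d' "$@" |
    awk '{ LINES = LINES $0 "\n" } /./ { printf "%s", LINES; LINES = "" }'
}
```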

Laurelaureano answered 30/1, 2015 at 9:0

This AWK script will do the trick:

BEGIN {
    ne=0;
}

/^[[:space:]]*$/ {
    ne++;
}

/[^[:space:]]+/ {
    for(i=0; i < ne; i++)
        print "";
    ne=0;
    print
}

The idea is simple: empty lines are not echoed immediately. Instead, we wait until we get a non-empty line, then first echo as many empty lines as were seen before it, and only then echo the new non-empty line.

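Krenn answered 3/11, 2018 at 8:57
This successfully removes trailing blank lines (including lines containing nothing but white space). However, it does not preserve white space in intermediate blank lines; these are truncated to empty lines. Example: $'a\n \nb' is transformed into $'a\n\nb'. – Quietly

perl -0pe 's/^\n+|\n+(\n)$/$1/gs'

Here -0 sets perl's input record separator to the NUL byte, so an ordinary text file is read in one go; the substitution deletes the leading newlines and collapses the trailing run of newlines into a single one.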
Ricardoricca answered 5/9, 2019 at 21:40

Here's an awk version that removes trailing blank lines (both empty lines and lines consisting of nothing but white space).

It is memory efficient; it does not read the entire file into memory.

awk '/^[[:space:]]*$/ {b=b $0 "\n"; next;} {printf "%s",b; b=""; print;}'

The b variable buffers up the blank lines; they get printed when a non-blank line is encountered. When EOF is encountered, they don't get printed. That's how it works.
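A quick check of that behaviour, using a hypothetical wrapper function and assuming an awk that supports `[[:space:]]`: the whitespace-only line between `a` and `b` passes through untouched, the trailing space/tab and empty lines are dropped, and a leading empty line is left alone.

```shell
# Hypothetical wrapper: b collects blank (or whitespace-only) lines verbatim
# and is flushed only when a non-blank line follows, so trailing blank lines
# are never printed. Memory use is roughly bounded by the longest blank run.
strip_trailing_blank_lines() {
    awk '/^[[:space:]]*$/ { b = b $0 "\n"; next } { printf "%s", b; b = ""; print }' "$@"
}
```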

If you're using GNU awk, [[:space:]] can be replaced with \s. (See the full list of gawk-specific Regexp Operators.)

If you want to remove only those trailing lines that are empty, see @AndyMortimer's answer.

Quietly answered 30/4, 2020 at 5:53

In bash, using grep, sed, wc, head and tail:

# number of the first line that contains a non-empty character
i=$(grep -n . your_file | sed -e 's/:.*//' | head -n 1)
# number of the last one
j=$(grep -n . your_file | sed -e 's/:.*//' | tail -n 1)
# overall number of lines
k=$(wc -l < your_file)
# how many empty lines do we have at the end of the file?
m=$((k - j))
# strip the last m lines (a negative count needs GNU head),
# then strip the first i-1 lines, and we are done 8-)
head -n "-$m" your_file | tail -n "+$i"

Man, it's definitely worth learning a "real" programming language to avoid that ugliness!

Ambler answered 9/9, 2011 at 9:36
Well, that part is easy enough with sed! Let me play with it and try to get back here with a completed command. Thanks! – Octosyllabic
Actually, that won’t work for the last lines, because it removes all newlines in the grep stage, thus throwing off the count at the end. /= – Octosyllabic
Nope: after executing these commands you still have your original file. The second command prints all non-blank lines prepended with their line numbers, so you get the number of the last non-blank one. – Ambler
Ah! It seems I misunderstood how grep -n works. Yes! – Octosyllabic
(Accepted, though I used a one-line variant without any shell variables, expressing a bit more with the sed commands instead.) – Octosyllabic
(Also, for what it’s worth: I know many ‘real languages,’ not to mention having written a few of them. They just weren’t appropriate for this task space ;D) – Octosyllabic
That's heavy-handed: 11 invocations of external utilities, and a bunch of subshells. – Tachylyte

Using bash

$ filecontent=$(<file)
$ echo "${filecontent#$'\n'}"
Harald answered 9/9, 2011 at 9:38
This only removes a single blank line from the start, and none from the end. – Peppers
@me_and: While you're correct about it only removing one empty line from the start, this actually does remove all trailing newlines, because command substitution ($(<file)) does that implicitly. – Tachylyte
@mklement0: Huh, so it does. Learn a new thing every day! – Peppers

A bash solution.

Note: Only useful if the file is small enough to be read into memory at once.

[[ $(<file) =~ ^$'\n'*(.*)$ ]] && echo "${BASH_REMATCH[1]}"
  • $(<file) reads the entire file and trims trailing newlines, because command substitution ($(...)) implicitly does that.
  • =~ is bash's regular-expression matching operator, and =~ ^$'\n'*(.*)$ optionally matches any leading newlines (greedily) and captures whatever comes after. Note the potentially confusing $'\n', which inserts a literal newline using ANSI C quoting, because the escape sequence \n is not supported in the regex.
  • Note that this particular regex always matches, so the command after && is always executed.
  • The special array variable BASH_REMATCH contains the results of the most recent regex match, and array element [1] contains what the (first and only) parenthesized subexpression (capture group) captured, which is the input string with any leading newlines stripped. The net effect is that ${BASH_REMATCH[1]} contains the input file content with both leading and trailing newlines stripped.
  • Note that printing with echo adds a single trailing newline. If you want to avoid that, use echo -n instead (or use the more portable printf '%s').
Tachylyte answered 7/7, 2014 at 13:30

I'd like to introduce another variant for gawk v4.1+

result=($(gawk '
    BEGIN {
        lines_count         = 0;
        empty_lines_in_head = 0;
        empty_lines_in_tail = 0;
    }
    /[^[:space:]]/ {
        found_not_empty_line = 1;
        empty_lines_in_tail  = 0;
    }
    /^[[:space:]]*$/ {
        if ( found_not_empty_line ) {
            empty_lines_in_tail ++;
        } else {
            empty_lines_in_head ++;
        }
    }
    {
        lines_count ++;
    }
    END {
        print (empty_lines_in_head " " empty_lines_in_tail " " lines_count);
    }
' "$file"))

empty_lines_in_head=${result[0]}
empty_lines_in_tail=${result[1]}
lines_count=${result[2]}

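if [ "$empty_lines_in_head" -gt 0 ] || [ "$empty_lines_in_tail" -gt 0 ]; then
    echo "Removing leading/trailing blank lines from \"$file\""
    # pass the bounds with -v instead of splicing them into an eval'd string,
    # so file names containing quotes or spaces are handled safely
    gawk -i inplace \
        -v first="$empty_lines_in_head" \
        -v last="$((lines_count - empty_lines_in_tail))" \
        'NR > first && NR <= last' "$file"
fi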
Sinusitis answered 2/11, 2014 at 18:7

Because I was already writing a bash script containing some functions, I found it convenient to write these:

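function strip_leading_empty_lines()
{
    # IFS= and -r keep whitespace and backslashes intact; the || test keeps
    # a final line that lacks a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        if [ -n "$line" ]; then
            printf '%s\n' "$line"
            break
        fi
    done
    cat
}

function strip_trailing_empty_lines()
{
    acc=""
    while IFS= read -r line || [ -n "$line" ]; do
        acc+="$line"$'\n'
        if [ -n "$line" ]; then
            printf '%s' "$acc"
            acc=""
        fi
    done
}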

Bhagavadgita answered 22/6, 2021 at 11:24

@mklement0 notes that @Izkata's answer has an issue when the last line doesn't end in a newline.

You can solve this problem using paste from coreutils. The following code works whether or not the last line ends in a newline.

sed '/\S/,$!d' | paste | tac | sed '/\S/,$!d' | tac

Example:

printf '\n\na\nb\nc' and printf '\n\na\nb\nc\n' piped to this code both give

a
b
c

The use of /\S/ (a GNU sed extension) means that lines with at least one non-whitespace character are classed as not blank; all other leading and trailing lines are deleted. To delete empty lines only, use:

sed '/./,$!d' | paste | tac | sed '/./,$!d' | tac
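A sketch of the empty-lines-only pipeline as a filter, assuming GNU coreutils `paste` and `tac` (the function name is hypothetical). `paste` with a single input simply copies it, but it ends the last line with a newline, which is what protects `tac` from the unterminated-last-line edge case.

```shell
# Hypothetical filter: drop leading empty lines, make sure the last line is
# newline-terminated (paste), reverse, drop the (originally trailing) empty
# lines now on top, and reverse back.
trim_empty_lines_paste() {
    sed '/./,$!d' | paste | tac | sed '/./,$!d' | tac
}
```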
Veracity answered 2/7, 2023 at 15:46

This might not be foolproof, but it seems to kinda work:

 __=$'\n\nline 3\n\nline 5\n\nline 7\n\n'

 printf '%s' "$__" | gcat -b | gcat -n

 1  
 2  
 3       1  line 3
 4  
 5       2  line 5
 6  
 7       3  line 7
 8

mawk 'NF,EOF' RS='\n|[ \t-\r]+$'

 1       1  line 3
 2  
 3       2  line 5
 4  
 5       3  line 7
Genoa answered 3/7, 2023 at 0:34