Why limit yourself to newline (\n) as the RS? Maybe something like this:
- \056 is the period and \040 is the space. I add the + in case there's
a legacy habit of typing 2 spaces after each sentence and you
want to standardize it.
- I presume the question mark (\077) is more frequent
than the exclamation mark (\041), so ? gets the FS slot and ! is left
to gsub(). The only reason I'm using octal throughout is that these are
all characters that can wreak havoc at the command line on the
slightest slip in quoting or escaping.
- Unlike FS or RS, which can be regexes, OFS and ORS are plain constant strings, so typing the characters in literally is safe there.
- The periods are taken care of by RS, so they need no special processing. If the record contains neither ? nor !, just print it as is and move on (ORS supplies the ". \n").
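A quick way to check the points above, as a minimal sketch: the function names show_octal and join_demo are made up for illustration, and plain awk is assumed to behave like mawk here. The first function decodes the four octal escapes, the second shows that FS is treated as a regex while OFS is pasted into the output literally.

```shell
# show_octal (hypothetical name): print the characters behind the octal escapes.
show_octal() {
    awk 'BEGIN { printf "[%s][%s][%s][%s]\n", "\056", "\040", "\077", "\041" }'
}

# join_demo (hypothetical name): the same text is used for FS and OFS.
# FS splits on runs of digits (regex), OFS is written out character for character.
join_demo() {
    printf 'a1b22c\n' | awk 'BEGIN { FS = "[0-9]+"; OFS = "[0-9]+" } { $1 = $1; print }'
}

show_octal
join_demo
```

The first call prints the period, space, question mark and exclamation mark in brackets. The second prints a[0-9]+b[0-9]+c, so nothing in OFS was interpreted as a pattern.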
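mawk 'BEGIN { RS = "[\056][\040]+" ; ORS = ". \n"     # record ends at period + spaces
              FS = "[\077][\040]+" ; OFS = "? \n" }   # field ends at question mark + spaces
      ($0 !~ /[\041\077]/) { print; next }            # neither mark present: print untouched
      /[\041]/ { gsub("[\041][\040]+", "\041 \n") }   # newline after each exclamation mark
      ( NF==1 ) || ( $1=$1 )'                         # rejoin on OFS only if FS split the record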
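For a sample run, here is a sketch of the same program wrapped in a function. The name split_sentences and the sample text are made up, \077 is used for the question mark, and plain awk stands in for mawk on the assumption that it accepts a regex RS (gawk and mawk do).

```shell
# split_sentences (hypothetical name): one sentence per output line.
split_sentences() {
    awk 'BEGIN { RS = "[\056][\040]+"; ORS = ". \n"
                 FS = "[\077][\040]+"; OFS = "? \n" }
         ($0 !~ /[\041\077]/) { print; next }
         /[\041]/ { gsub("[\041][\040]+", "\041 \n") }
         (NF == 1) || ($1 = $1)'
}

# The two spaces after "One." exercise the + in RS.
# head keeps the first four sentences; the final record still carries the
# trailing ".\n" of the input, because a period followed by a newline
# does not match RS.
printf 'One.  Two? Three! Four. End.\n' | split_sentences | head -n 4
```

Each of the four lines that come out holds one sentence, with the double space after the first one normalized to a single trailing space.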
As fast as mawk is, a gsub() or a $1=$1 rebuild still costs something, so skip the costly parts unless the record actually contains a ? or !.
The last line is the fun trick, done outside the braces as a bare pattern. The ! was already handled on the line before, so if no ? was found (i.e. NF is 1), the first operand is true, awk short-circuits without executing part 2, and the default action simply prints the record.
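But if any ? marks were found, the assignment $1=$1 makes awk rebuild the record with OFS between the fields. Because it's an assignment and not an equality comparison, the expression evaluates to the new value of $1, which doubles as the print flag. That value is true for any normal sentence, but not strictly always: if $1 is empty (the record starts with "? ") or looks like a numeric zero, the pattern is false and the record is silently dropped. Appending || 1 makes it unconditional.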
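To see the last-line pattern in isolation, here is a small sketch. The names trick_demo and trick_safe are made up, records are ordinary newline-separated lines, and OFS is set to "? | " only so the rebuild is visible on one line. It also shows the one case where the assignment does not come out true, an empty first field, and a variant with || 1 appended.

```shell
# trick_demo (hypothetical name): only the final pattern of the script.
trick_demo() {
    awk 'BEGIN { FS = "[\077][\040]+"; OFS = "? | " }
         (NF == 1) || ($1 = $1)'
}

# trick_safe (hypothetical name): same, but the trailing || 1 forces a print
# even when the value assigned to $1 is empty.
trick_safe() {
    awk 'BEGIN { FS = "[\077][\040]+"; OFS = "? | " }
         (NF == 1) || ($1 = $1) || 1'
}

# First line has no question mark and passes through untouched;
# second line is split into three fields and rejoined with OFS.
printf 'no question here\nA? B? C\n' | trick_demo

# Leading question mark: $1 is empty, so trick_demo prints nothing
# while trick_safe still prints the rebuilt record.
printf '? orphan\n' | trick_demo
printf '? orphan\n' | trick_safe
```

In real prose a sentence rarely starts with a question mark, so the shorter form is usually fine; the || 1 variant just removes the edge case.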
I used this script :
- you should read why-is-using-a-shell-loop-to-process-text-considered-bad-practice. – Pentagrid