Sed/Awk - remove blankspaces / join lines in ldif dump

Asked 30/10, 2012 at 12:46 Answered 31/10, 2012 at 15:47

Some entries in my LDIF dump are wrapped across multiple lines, which breaks the next import:

sambaPasswordHistory: 712BC301C488FD2651BEF5AA11899950547B9ED3C059FF83CE39049B
 BAEECB31692629A94A3C1F4737E3EA854C001704793DB9A67EB977563CE601DF98E7E23C2851F
 082D3D695C8655378629DCCDAF125ACA63141B361190ABC750AF403FDEF000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 000000000000000000000000000000000000000000000000000000000
homeDirectory: /home_nfs/

How can I use sed/awk/etc. to change it to:

sambaPasswordHistory: 712BC301C488FD2651BEF5AA11899950547B9ED3C059FF83CE39049BBAEECB31692629A94A3C1F4737E3EA854C001704793DB9A67EB977563CE601DF98E7E23C2851F082D3D695C8655378629DCCDAF125ACA63141B361190ABC750AF403FDEF000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
homeDirectory: /home_nfs/

In other words, keep each attribute on a single line.

Redcap answered 30/10, 2012 at 12:46 Comment(0)

One way using GNU sed:

sed -n 'H; ${ x; s/\n//; s/\n //g; p}' file.txt

Result:

sambaPasswordHistory: 712BC301C488FD2651BEF5AA11899950547B9ED3C059FF83CE39049BBAEECB31692629A94A3C1F4737E3EA854C001704793DB9A67EB977563CE601DF98E7E23C2851F082D3D695C8655378629DCCDAF125ACA63141B361190ABC750AF403FDEF000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
homeDirectory: /home_nfs/

Kozlowski answered 30/10, 2012 at 13:2 Comment(5)

+1. This will also work in non-GNU sed if you put a semi-colon between the p and the }. :) – Mclellan 30/10, 2012 at 13:13

I'd also like to print the length of each line before printing it - how would I tweak the sed command to do that? My point is you should not use sed for anything other than simple substitutions on a single line as even the simplest requirements change usually means a total re-write of the script using some other arcane combination of single letters and punctuation marks. Just use awk... – Sturges 30/10, 2012 at 13:20

@EdMorton - I've run into many problems that could be solved in awk but not in sed. But I've also run into problems that were completely impractical in awk, yet easy to solve in sed. Just because you're really skilled with a hammer doesn't mean you should forsake the needle nosed pliers. Every tool has its place. – Mclellan 30/10, 2012 at 17:3

@Mclellan - I completely agree on using the right tool for the job. For simple substitutions on a single line you should use sed. For anything else, though, you should use awk or perl or ruby or... Basically, if you find yourself using more than "s" and "g" commands in a sed script you should reconsider. – Sturges 30/10, 2012 at 17:35

I think this approach might be a little bit heavy, if the file is large, as it reads everything into the buffer – Bibliomania 5/6, 2018 at 22:14

$ cat file
sambaPasswordHistory: abc
 def
 12345
 67
homeDirectory: /home_nfs/
$
$ awk 'NR>1 && !sub(/^ /,""){print s; s=""} {s = s $0} END{print s}' file
sambaPasswordHistory: abcdef1234567
homeDirectory: /home_nfs/
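For readers who find the one-liner dense, here is the same awk logic spread over several lines with comments. It is only a sketch: the wrapper name unfold_awk and the sample data are made up for illustration, and it assumes continuation lines start with exactly one space, as in the question.

```shell
# Hypothetical helper wrapping the awk program from the answer above.
unfold_awk() {
  awk '
    # sub() strips the leading space of a continuation line and returns 1.
    # If it returns 0, the line starts a new attribute, so flush the
    # logical line collected so far. NR > 1 skips the flush on line 1.
    NR > 1 && !sub(/^ /, "") { print s; s = "" }
    # Append the current line (already stripped if it was a continuation).
    { s = s $0 }
    # Flush the last logical line at end of input.
    END { print s }
  ' "$@"
}

# Made-up sample input with one folded attribute.
sample=$(mktemp)
printf '%s\n' 'sambaPasswordHistory: abc' ' def' ' 12345' ' 67' \
  'homeDirectory: /home_nfs/' > "$sample"

result=$(unfold_awk "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Because only one logical line is buffered at a time, this streams through the file rather than holding the whole dump in memory.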

Sturges answered 30/10, 2012 at 13:1 Comment(2)

+1. Nice. Warning about shells though - this will work in Bourne and bash, but in csh and tcsh the ! will be grabbed as an attempted history reference. – Mclellan 30/10, 2012 at 13:12

Just one more reason not to write scripts in [t]csh! Thanks for the tip @ghoti. – Sturges 30/10, 2012 at 13:16

This might work for you (GNU sed):

sed ':a;N;s/\n //;ta;P;D' file

Open a window of two lines. Remove a newline followed by a space and repeat until the substitution fails. Finally, print the first line and, if there is still a second line in the pattern space, repeat.
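The steps above can be tried on a small file with more than one folded attribute. This is a minimal sketch assuming GNU sed; the sample attributes are invented for the demonstration.

```shell
# Made-up sample: two folded attributes followed by an unfolded one.
sample=$(mktemp)
printf '%s\n' \
  'sambaPasswordHistory: abc' \
  ' def' \
  ' 12345' \
  'homeDirectory: /home' \
  ' _nfs/' \
  'uid: x' > "$sample"

# N   appends the next input line to the pattern space
# s   joins it to the previous line if it begins with a space
# ta  loops back to :a for as long as a join succeeded
# P   prints the completed first line of the pattern space
# D   deletes that line and restarts the script on what remains
result=$(sed ':a;N;s/\n //;ta;P;D' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

With GNU sed this prints three lines: sambaPasswordHistory: abcdef12345, homeDirectory: /home_nfs/ and uid: x. Only the current logical line plus one look-ahead line is kept in the pattern space at any time.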

Woothen answered 30/10, 2012 at 14:58 Comment(3)

This also works only for the first attribute. That matches the question, but it doesn't process the whole file correctly if there are multiple multiline attributes. – Bibliomania 6/6, 2018 at 5:52

@PhilippGrigoryev thank you, I've removed the original solution and replaced it with a better one. – Woothen 6/6, 2018 at 12:31

Perfect, I like yours; it runs much faster than the one where everything is stored in the hold space. Thank you! – Bibliomania 6/6, 2018 at 19:22

One way to do it using sed:

sed ':a;$!N;s/\n //;ta'  file

sed appends the next line (N) on every line except the last ($!). After joining, a newline followed by a space (\n ) is removed. 'ta' loops back to the label 'a' until the substitution fails.

Lalittah answered 30/10, 2012 at 12:52 Comment(2)

This doesn't work in FreeBSD or OSX. Apparently, you are using GNU sed. Since the OP didn't mention a platform, you should qualify your answer as being platform-specific, or modify it so that it will work in other environments than yours. – Mclellan 30/10, 2012 at 13:8

It only joins some entries in the file. If the second attribute also continues onto the next line, it will not be joined. – Bibliomania 6/6, 2018 at 5:50
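The limitation described in this comment can be reproduced with a tiny input. This is a sketch assuming GNU sed; the attribute names a and b are placeholders.

```shell
# Made-up sample in which both attributes are folded.
sample=$(mktemp)
printf '%s\n' 'a: 1' ' 2' 'b: 3' ' 4' > "$sample"

# When the join fails on "b: 3", the script ends and both lines in the
# pattern space are printed. The next cycle then starts on " 4" alone,
# so there is no preceding newline left for s/\n // to match.
result=$(sed ':a;$!N;s/\n //;ta' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

With GNU sed the first attribute comes out as a: 12, but b: 3 and its continuation line remain on two separate lines.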

If the only occurrences of "\n " (a newline followed by a space) are where the lines need to be joined, you could use bbe like this:

<file bbe -e 's/\n //'

Lenticel answered 30/10, 2012 at 13:29 Comment(0)

Another solution:

awk 'ORS="";!/home/{$1=$1; print}{RS="\n"}END{print "\n" $0 "\n"}' file

Balaam answered 31/10, 2012 at 15:47 Comment(0)
