Identifying and removing null characters in UNIX

Asked 7/3, 2010 at 23:12 Answered 15/9, 2022 at 2:8

Solved unix shell null special-characters

120

I have a text file containing unwanted null characters (ASCII NUL, \0). When I try to view it in vi I see ^@ symbols, interleaved in normal text. How can I:

1. Identify which lines in the file contain null characters? I have tried grepping for \0 and \x0, but this did not work.
2. Remove the null characters? Running strings on the file cleaned it up, but I'm just wondering if this is the best way?

Flack answered 7/3, 2010 at 23:12 Comment(2)

This kind of question probably belongs to SuperUser.com – Peddling 8/3, 2010 at 7:15

In fact, this question is on superuser.com: superuser.com/questions/75130/how-to-remove-ths-symbol-with-vim – Musical 26/4, 2011 at 9:23

154

I’d use tr:

tr < file-with-nulls -d '\000' > file-without-nulls

If you are wondering if input redirection in the middle of the command arguments works, it does. Most shells will recognize and deal with I/O redirection (<, >, …) anywhere in the command line, actually.
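
As a rough check that the filter did what was intended, the byte counts before and after can be compared. This is only a sketch: the sample content and the `workdir` and `removed` variables are made up for the demonstration, while the file names follow the command above.

```shell
# Build a small sample file with NUL bytes embedded (hypothetical content).
workdir=$(mktemp -d)
printf 'foo\000bar\nclean line\nba\000z\n' > "$workdir/file-with-nulls"

# Delete every NUL byte; the redirections may sit anywhere on the command line.
tr < "$workdir/file-with-nulls" -d '\000' > "$workdir/file-without-nulls"

# The difference in byte counts is the number of NUL bytes removed.
before=$(wc -c < "$workdir/file-with-nulls")
after=$(wc -c < "$workdir/file-without-nulls")
removed=$((before - after))
echo "removed $removed NUL byte(s)"
```

The temporary directory in `$workdir` can be deleted once the result has been inspected.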

Illaffected answered 7/3, 2010 at 23:14 Comment(17)

and a "diff file-with-nulls file-without-nulls" should show me which lines had null characters? It brings back a lot more than expected. – Flack 7/3, 2010 at 23:27

Actually, I believe it should be tr -d '\000' < file-with-nulls > file-without-nulls since < is part of the shell pipe functionality and not tr. – Bobettebobina 7/3, 2010 at 23:50

Most shells will recognize & deal with < or > anywhere in the argument string, actually. Surprised me too. – Lethargy 8/3, 2010 at 18:16

+1 For usage of input redirection instead of cat |. A fine, clean solution and it solved my problem. – Rezzani 13/2, 2014 at 7:14

This is an order of magnitude slower than sed for me – Yager 30/10, 2017 at 4:6

@Yager that's pretty interesting. I wonder what's behind that; buffering? – Illaffected 30/10, 2017 at 13:0

@Illaffected I have no idea how the internals of either tool work, so I can't hazard a guess – Yager 2/11, 2017 at 1:46

@Illaffected Is there a reason you use '\000' instead of '\0'? On the surface they seem to have the same effect – Larrikin 31/5, 2018 at 1:38

@HaroldFischer I don't recall why I wrote it that way 8 years ago I'm afraid :) – Illaffected 31/5, 2018 at 2:8

@Illaffected '\000' is used in lieu of '\0' in the POSIX opengroup specification for tr. That is a good reason to prefer it – Larrikin 31/5, 2018 at 2:45

@HaroldFischer well I'm not sure what you're trying to do; if you want to see if file has any nulls in it you could use wc to compare the size of the file pre-filtering and post-filtering. In a Unicode world it's generally a better idea to be prepared for non-ASCII characters than to worry about them. – Illaffected 31/5, 2018 at 17:35

@Illaffected I do apologize, that question was meant for someone else – Larrikin 31/5, 2018 at 18:3

I manage to detect nulls using grep -Poa '\000' and using wc. Seems easier and more direct / less error-prone. – Rearmost 26/4, 2019 at 3:7

@Rearmost I'm glad that works for you, but I'm not sure what's less "error-prone" about it. – Illaffected 26/4, 2019 at 3:25

I can identify nulls more exactly matched to the content rather than aggregate file size. I guess if we are replacing all nulls then maybe that's less of a problem. Grepping just gives a more focused approach that I was trying to mention. – Rearmost 26/4, 2019 at 4:17

This is an extra-good answer for the description of the file redirects. That has the potential to make things so much clearer! – Nonoccurrence 10/6, 2019 at 16:38

How can we automate the process, so that null characters are removed the moment multiple files arrive? – Principate 6/1, 2021 at 5:58

Use the following sed command for removing the null characters in a file.

sed -i 's/\x0//g' null.txt

This solution edits the file in place, which is important if the file is still being used. Passing -i'ext' creates a backup of the original file with the 'ext' suffix added.
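
A minimal sketch of the backup variant, assuming GNU sed (the \x0 escape and the behaviour of -i differ between implementations). The sample file and the `workdir` variable are invented for the demonstration.

```shell
# Assumes GNU sed; the sample null.txt is a throwaway file created for this demo.
workdir=$(mktemp -d)
printf 'a\000b\000c\n' > "$workdir/null.txt"

# -i.bak edits null.txt in place and keeps the original as null.txt.bak.
sed -i.bak 's/\x0//g' "$workdir/null.txt"

# The cleaned file and its untouched backup now sit side by side.
cat "$workdir/null.txt"
ls "$workdir"
```

Keeping the suffix attached to -i (no space) is the form that avoids the extra-argument difference between sed implementations.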

Sonora answered 8/3, 2010 at 7:13 Comment(6)

Note: In FreeBSD (and I believe also Mac OS X), sed -i requires an extension in the next argument, but it may be empty. In those systems, add a '', as in: sed -i '' 's/\x0//g "$FILE". – Henry 1/2, 2017 at 21:5

This is an order of magnitude faster than tr for me – Yager 30/10, 2017 at 4:6

For me, using Git for Windows and $ sed --version -> sed (GNU sed) 4.7, I had to use the following invocation to get a backup file called example.csv.bak: sed -i.bak 's/\x0//g' example.csv – Sair 22/1, 2020 at 18:21

@TimČas you did it great, just missed one ' so it should be sed -i '' 's/\x0//g' some_file.xml – Atrophied 29/4, 2020 at 7:29

On mac this only did the first null character and not all of them. gsed did work to do all of them. – Sansone 8/9, 2021 at 18:36

Is sed -i '/\x0/d' null.txt a valid alternative? Maybe more elegant – Newsboy 17/11, 2022 at 20:28

A large number of unwanted NUL characters, say one every other byte, indicates that the file is encoded in UTF-16 and that you should use iconv to convert it to UTF-8.
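
One way to check for this case before converting is to look for a byte-order mark at the start of the file. The following is a sketch under assumptions: the sample file is a made-up UTF-16LE snippet, and `workdir` and `bom` are illustrative names.

```shell
workdir=$(mktemp -d)
# Hypothetical sample: "hi\n" encoded as UTF-16LE, preceded by a byte-order mark.
printf '\377\376h\000i\000\n\000' > "$workdir/utf16.txt"

# Inspect the first two bytes: ff fe or fe ff is a UTF-16 byte-order mark.
bom=$(head -c 2 "$workdir/utf16.txt" | od -An -tx1 | tr -d ' \n')
echo "first two bytes: $bom"

# Convert instead of stripping NULs, so non-ASCII characters survive intact.
iconv -f UTF-16 -t UTF-8 "$workdir/utf16.txt" > "$workdir/utf8.txt"
```

A file without a byte-order mark may still be UTF-16; in that case the byte order has to be named explicitly (for example UTF-16LE) rather than detected.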

Predatory answered 7/3, 2010 at 23:16 Comment(2)

I ran out of disk space while my application was logging. This resulted in these characters. – Flack 7/3, 2010 at 23:21

For example, it works using this command: iconv -f UTF-16 -t UTF-8 file. – Sidewalk 27/4, 2020 at 15:30

I discovered the following, which prints out which lines, if any, have null characters:

perl -ne '/\000/ and print;' file-with-nulls

Also, an octal dump can tell you if there are nulls:

od file-with-nulls | grep ' 000'

Flack answered 8/3, 2010 at 8:8 Comment(1)

For me, only 'od -to1 ..' had the desired behaviour; otherwise 2 bytes were put out together, like '001002'. It seems the default settings or implementation of od have changed over time. – Rheology 14/2 at 16:37

If the lines in the file end with \r\n\000 then what works is to delete the \n\000 then replace the \r with \n.

tr -d '\n\000' <infile | tr '\r' '\n' >outfile
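
To see the effect of the two steps on concrete data, here is a small sketch. The two-line sample and the `workdir` variable are made up; the file names infile and outfile follow the command above.

```shell
workdir=$(mktemp -d)
# Hypothetical sample: every line ends with \r\n followed by a NUL byte.
printf 'one\r\n\000two\r\n\000' > "$workdir/infile"

# Step 1 drops \n and NUL, leaving \r as the only line terminator;
# step 2 turns each remaining \r into a plain \n.
tr -d '\n\000' < "$workdir/infile" | tr '\r' '\n' > "$workdir/outfile"

# Show the result byte by byte to confirm that only \n terminators remain.
od -c "$workdir/outfile"
```

Note that this also joins any lines that end in a bare \n without a preceding \r, so it only suits files where every line has the \r\n\000 ending.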

Stoltz answered 24/11, 2015 at 10:41 Comment(1)

PS. If you find yourself in a Windows DOS shell, you can get the GNU/win32 versions of Unix commands from Sourceforge.net. I use them all the time. Check out "od" the octal dump command for analysing what's in a file... – Stoltz 20/6, 2016 at 14:22

Here is an example of how to remove NUL characters using ex (in-place):

ex -s +"%s/\%x00//g" -cwq nulls.txt

and for multiple files:

ex -s +'bufdo!%s/\%x00//g' -cxa *.txt

For recursion, you may use the globbing pattern **/*.txt (if your shell supports it).

Useful for scripting, since sed's -i option is a non-standard extension that behaves differently between implementations.
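
Where ex is not available, a plain loop gives a similar result for multiple files. Note that this swaps in tr and a temporary file rather than ex; it is a sketch, and the sample files and the `workdir`, `f` and `tmp` names are assumptions.

```shell
workdir=$(mktemp -d)
# Hypothetical sample files, each containing NUL bytes.
printf 'a\000b\n' > "$workdir/one.txt"
printf 'c\000\000d\n' > "$workdir/two.txt"

for f in "$workdir"/*.txt; do
    tmp=$(mktemp)
    # Filter into a temporary file, then copy the result back over the original.
    # Writing back with cat keeps the original file's inode and permissions.
    tr -d '\000' < "$f" > "$tmp" && cat "$tmp" > "$f"
    rm -f "$tmp"
done
```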

Bennink answered 29/5, 2015 at 23:1 Comment(0)

I used:

recode UTF-16..UTF-8 <filename>

to get rid of zeroes in file.

Brandybrandyn answered 22/6, 2015 at 10:4 Comment(0)

I faced the same error with:

import codecs as cd
f=cd.open(filePath,'r','ISO-8859-1')

I solved the problem by changing the encoding to utf-16

f=cd.open(filePath,'r','utf-16')

Deanadeanda answered 4/9, 2018 at 6:57 Comment(0)

Remove a trailing null character at the end of a PDF file using PHP. This is independent of the OS.

This script uses PHP to remove a trailing NULL value at the end of a binary file, solving a crashing issue that was triggered by the NULL value. You can edit this script to remove all NULL characters, but seeing it done once will help you understand how this works.

Backstory
We were receiving PDFs from a 3rd party that we needed to upload to our system using a PDF library. In the files being sent to us, there was a null value that was sometimes being appended to the PDF file. When our system processed these files, files that had the trailing NULL value caused the system to crash.

Originally we were using sed, but sed behaves differently on Macs and Linux machines. We needed a platform-independent method to extract the trailing null value. PHP was the best option. Also, it was a PHP application so it made sense :)

This script performs the following operation:

Take the binary file, convert it to hex (binary files don't like exploding by new lines or carriage returns), explode the string using the hex for carriage return plus line feed (0d0a) as the delimiter, pop the last member of the array if its value is null (00), implode the array using the same delimiter, then process the file.

// In this case we are getting the file as a string from another application.
// We use this line to get a sample bad file.
$fd = file_get_contents($filename);

// We convert the string into hex and trim any leading and trailing whitespace
$bin2hex = trim(bin2hex($fd));

// We create an array using CRLF (hex 0d0a) as the delimiter
$bin2hex_ex = explode('0d0a', $bin2hex);

// Look at the last element. If it is equal to 00 (a lone NUL), we pop it off.
// Note that the CRLF in front of that NUL is dropped along with it.
$end = end($bin2hex_ex);
if ($end === '00') {
    array_pop($bin2hex_ex);
}

// We implode the array using CRLF (hex 0d0a) as the glue
$bin2hex = implode('0d0a', $bin2hex_ex);

// The new string no longer has the null character at the EOF
$fd = hex2bin($bin2hex);

Bickering answered 15/9, 2022 at 2:8 Comment(0)
