Trying to delete non-ASCII characters only [duplicate]

Asked 22/2, 2013 at 23:26 Answered 23/2, 2013 at 0:30

I am trying to manipulate a text file and remove non-ASCII characters from the text. I don't want to remove the line. I only want to remove the offending characters. I am trying to get the following expression to work:

sed '/[\x80-\xFF]/d'

Eunaeunice asked 22/2, 2013 at 23:26 Comment(6)

See this answer. – Phosphatize 22/2, 2013 at 23:38

This thread might have the answer you are looking for: #8572101 – Waly 22/2, 2013 at 23:38

Your command will delete all lines containing non-ASCII characters. If that's not what you want, check the duplicate questions. – Larios 23/2, 2013 at 0:2

I have tried two commands: 1) sed -E 's/[^[:print:]]//', which should remove non-printable characters, but non-printable characters still appear; 2) sed -E 's/[\d128-\d255]//', which gives an Invalid Collation error. Are there any other commands that someone can suggest to remove non-ASCII characters only? – Eunaeunice 23/2, 2013 at 0:15

There is a decent Perl example behind the link in the first comment, if that is what you mean by "any other commands". – Bus 23/2, 2013 at 0:29

Thanks, Josh, but I am looking to do it with sed or maybe tr. – Eunaeunice 23/2, 2013 at 0:32

The suggested solutions may fail with specific versions of sed, e.g. GNU sed 4.2.1.
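If sed is still preferred, one thing that may help is forcing the C locale so that sed works on single bytes. The following is a minimal sketch under that assumption, not a guaranteed fix for every sed build; strip_high_bytes is a hypothetical helper name and the sample text is made up. It uses s///g rather than d, because d deletes every matching line while s///g removes only the matched characters.

```shell
#!/bin/sh
# Hypothetical helper (not from the original thread).
# Assumption: under LC_ALL=C sed treats each byte as one character, so the
# negated class below matches exactly the bytes that are neither printable
# ASCII nor ASCII control codes, i.e. 0x80-0xFF.
strip_high_bytes() {
    LC_ALL=C sed 's/[^[:print:][:cntrl:]]//g'
}

# Sample input: "naïve" with ï encoded as the UTF-8 bytes 0xC3 0xAF
# (octal 303 257), followed by a second line.
result=$(printf 'na\303\257ve text\nsecond line\n' | strip_high_bytes)
printf '%s\n' "$result"
```

The trailing g flag matters: without it, only the first match on each line is removed.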

Using tr:

tr -cd '[:print:]' < yourfile.txt

This will remove any characters not in [\x20-\x7e].

If you want to keep e.g. line feeds, just add \n:

tr -cd '[:print:]\n' < yourfile.txt

If you really want to keep all ASCII characters (even the control codes):

tr -cd '[:print:][:cntrl:]' < yourfile.txt

This will remove any characters not in [\x00-\x7f].
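As a rough illustration of the tr approach, the sketch below wraps one variant in a function and runs it on a small sample. The function name strip_non_ascii and the sample text are hypothetical, and LC_ALL=C is added here as an assumption so that the character classes refer to ASCII only regardless of the current locale.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): keep printable ASCII
# plus newline and tab, and delete every other byte.
strip_non_ascii() {
    LC_ALL=C tr -cd '[:print:]\n\t'
}

# Sample input: "café" with é encoded as the UTF-8 bytes 0xC3 0xA9
# (octal 303 251), then a tab and some more text.
cleaned=$(printf 'caf\303\251\tau lait\n' | strip_non_ascii)
printf '%s\n' "$cleaned"
```

To process a file, the same function could be used as strip_non_ascii < yourfile.txt > output.txt.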

Phosphatize answered 23/2, 2013 at 0:30 Comment(5)

Hey speakr, is there a way to preserve the format of the text file? The tr command feeds everything onto one continuous line, right? – Eunaeunice 23/2, 2013 at 0:39

@bosra: I added an example to preserve line feeds. – Phosphatize 23/2, 2013 at 0:44

Man, if I could upvote this a few more times I would. Thanks. – Eunaeunice 23/2, 2013 at 21:18

Any idea why meld would still consider the fixed files as binary? By the way, the result seems different from tr -cd '\11\12\15\40-\176', which worked with meld (at least with my files). – Kerf 1/3, 2015 at 19:45

This question helped me a lot, but since I wanted to keep the \n and \t in the output file, I used the command below instead: tr -cd '[:print:]\n\t' < yourfile.txt > output.txt – Proprietress 15/5, 2015 at 13:10
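On the difference raised in the comments between '[:print:][:cntrl:]' and '\11\12\15\40-\176': the first set keeps every ASCII control code, including NUL and ESC, while the second keeps only tab, line feed, carriage return and printable ASCII. Leftover control bytes are one plausible reason a tool might still treat a file as binary, though that is an assumption and not something confirmed in the thread. A small sketch with made-up sample bytes shows the difference in what survives:

```shell
#!/bin/sh
# Made-up sample: "a", NUL, ESC, "b", tab, UTF-8 é (0xC3 0xA9), newline.
# That is 8 bytes in total.
sample() {
    printf 'a\000\033b\t\303\251\n'
}

# Keeps all ASCII bytes, control codes included; only the two bytes of é go.
keep_all_ascii=$(sample | LC_ALL=C tr -cd '[:print:][:cntrl:]' | wc -c | tr -d ' ')

# Keeps only tab, LF, CR and printable ASCII; NUL and ESC are removed too.
keep_text_only=$(sample | LC_ALL=C tr -cd '\11\12\15\40-\176' | wc -c | tr -d ' ')

printf 'bytes kept with [:print:][:cntrl:]: %s\n' "$keep_all_ascii"
printf 'bytes kept with octal ranges:       %s\n' "$keep_text_only"
```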
