Trying to delete non-ASCII characters only [duplicate]

I am trying to manipulate a text file and remove non-ASCII characters from the text. I don't want to remove the line. I only want to remove the offending characters. I am trying to get the following expression to work:

sed '/[\x80-\xFF]/d'

Eunaeunice answered 22/2, 2013 at 23:26 Comment(6)
See this answer. - Phosphatize
This thread might have the answer you are looking for: #8572101 - Waly
Your command will delete all lines containing non-ASCII characters. If that's not what you want, check the duplicate questions. - Larios
I have tried two commands. 1) sed -E 's/[^[:print:]]//' should remove non-printable characters, but non-printable characters still appear. 2) sed -E 's/[\d128-\d255]//' gives an Invalid Collation error. Are there any other commands that someone can suggest to remove only the non-ASCII characters? - Eunaeunice
There is a decent Perl example in the first comment's link, if that is what you mean by "any other commands". - Bus
Thanks Josh, but I am looking to do it with sed or maybe tr. - Eunaeunice
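
To illustrate the point made in the comments above, here is a minimal sketch contrasting sed's d command (which drops whole lines) with s///g (which drops only the matching characters). The sample text and variable names are hypothetical, and LC_ALL=C is an assumption added here so that sed treats every byte as a single character; behaviour may differ in other locales or sed versions.

```shell
# Hypothetical sample: one clean line and one line containing UTF-8 "é" and "ö".
sample='keep\nh\303\251llo w\303\266rld\n'

# /pattern/d deletes every LINE that contains a non-printable byte.
deleted=$(printf "$sample" | LC_ALL=C sed '/[^[:print:]]/d')

# s/pattern//g deletes only the offending BYTES; the g flag matters,
# because without it only the first match on each line is removed.
stripped=$(printf "$sample" | LC_ALL=C sed 's/[^[:print:]]//g')

printf '%s\n' "$deleted"
printf '%s\n' "$stripped"
```

Note that [^[:print:]] also matches tabs and other control codes, so this variant removes more than just the non-ASCII bytes.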

The suggested solutions may fail with specific versions of sed, e.g. GNU sed 4.2.1.

Using tr:

tr -cd '[:print:]' < yourfile.txt

This will remove any characters not in [\x20-\x7e], so line feeds and tabs are deleted as well.

If you want to keep e.g. line feeds, just add \n:

tr -cd '[:print:]\n' < yourfile.txt
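
As a quick check of the two commands above, the following sketch pipes a small sample through tr with and without \n in the set. The sample text and variable names are made up for illustration, and LC_ALL=C is an assumption added here so that [:print:] is evaluated against plain ASCII regardless of the user's locale.

```shell
# Hypothetical sample: two lines, each containing a non-ASCII character
# encoded as UTF-8 ("é" = \303\251, "ï" = \303\257).
sample='caf\303\251 au lait\nna\303\257ve\n'

# Without \n in the set, the line feeds are deleted along with the non-ASCII bytes.
no_newlines=$(printf "$sample" | LC_ALL=C tr -cd '[:print:]')

# With \n in the set, the line structure is preserved.
with_newlines=$(printf "$sample" | LC_ALL=C tr -cd '[:print:]\n')

printf '%s\n' "$no_newlines"
printf '%s\n' "$with_newlines"
```

The first result is a single run-together line, while the second keeps the two original lines with only the accented characters removed.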

If you really want to keep all ASCII characters (even the control codes):

tr -cd '[:print:][:cntrl:]' < yourfile.txt

This will remove any characters not in [\x00-\x7f].
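
Because tr only reads standard input and writes standard output, saving the result means redirecting to a second file. The sketch below shows one way to do that and to check the result by counting the bytes outside \000-\177 before and after. The temporary directory, the file name cleaned.txt, and the variable names are hypothetical, and mktemp -d is assumed to be available on the system.

```shell
# Work in a scratch directory so no real file is overwritten (hypothetical paths).
tmpdir=$(mktemp -d)
src="$tmpdir/yourfile.txt"
dst="$tmpdir/cleaned.txt"

# Sample input containing a tab and two UTF-8 "é" characters (2 bytes each).
printf 'r\303\251sum\303\251\tdone\n' > "$src"

# Keep all 7-bit ASCII (printable plus control codes). Write to a new file:
# redirecting back onto the input would truncate it before tr reads it.
LC_ALL=C tr -cd '[:print:][:cntrl:]' < "$src" > "$dst"

# Count non-ASCII bytes: delete the ASCII range and count what is left.
before=$(LC_ALL=C tr -d '\000-\177' < "$src" | wc -c)
after=$(LC_ALL=C tr -d '\000-\177' < "$dst" | wc -c)
echo "non-ASCII bytes: before=$((before)) after=$((after))"
```

Once the count for the cleaned file is zero, it can be moved over the original with mv if an in-place result is wanted.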

Phosphatize answered 23/2, 2013 at 0:30 Comment(5)
Hey speakr, is there a way to preserve the format of the text file? The tr command puts everything onto one continuous line, right? - Eunaeunice
@bosra: I added an example to preserve line feeds. - Phosphatize
Man, if I could upvote this a few more times I would. Thanks! - Eunaeunice
Any idea why meld would still consider the fixed files as binary? By the way, the result seems different from tr -cd '\11\12\15\40-\176', which worked with meld (at least with my files). - Kerf
This question helped me a lot, but since I wanted to keep the \n and \t in the output file, I used the command below instead: tr -cd '[:print:]\n\t' < yourfile.txt > output.txt - Proprietress
