Strange character for empty line in TextWrangler and cat -v
Q

1

5

I have a text file, which on my Mac I open with TextWrangler. I enable the invisible characters to see the line endings. I see that every empty line has a red, upside down question mark in it. Which character is this?

When I run cat -v file.txt in the terminal, it shows these characters as ^@ (and the line endings themselves as ^M). What I need to know is the regex for that specific character, like \n for the end of a line.

In the hex dump, I see the following:

0000000: 312e 300d 0a00 0d0a 2231 3130 3030 3030  1.0....."1100000
0000010: 3030 3222 3b22 3922 3b22 5354 4422 3b3b  002";"9";"STD";;
0000020: 3b0d 0a22 3131 3030 3030 3030 3639 223b  ;.."1100000069";

If I manually remove the strange characters, and make a new hex dump, I see:

0000000: 312e 300d 0a0d 0a22 3131 3030 3030 3030  1.0...."11000000
0000010: 3032 223b 2239 223b 2253 5444 223b 3b3b  02";"9";"STD";;;
0000020: 0d0a 2231 3130 3030 3030 3036 3922 3b22  .."1100000069";"

The difference is a byte sequence 00. Is there an encoding in which this 00 is required for empty lines?

Quadrisect answered 27/5, 2015 at 14:17 Comment(0)
A
10

The red inverted question mark you are looking at is apparently a NUL (null) character, i.e. the 00 byte in your hex dump, which cat -v displays as ^@. Whether it makes any difference depends on the application writing or reading the files in question, so it is most likely not a general encoding issue. (Compare: Wikipedia.)
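To double-check this outside the editor, here is a minimal Python sketch. It assumes nothing beyond the first 16 bytes of the hex dump in the question, copied verbatim; the variable names are only illustrative.

```python
# First 16 bytes of the hex dump from the question, copied verbatim.
original = bytes.fromhex("312e300d0a000d0a2231313030303030")

# Offsets of every NUL byte (0x00) in the sample.
nul_offsets = [i for i, b in enumerate(original) if b == 0]

# Split on CRLF to see which "line" the NUL byte sits on.
lines = original.split(b"\r\n")

print(nul_offsets)  # [5]
print(lines)        # [b'1.0', b'\x00', b'"1100000']
```

In this sample the "empty" line is not actually empty: it holds a single 00 byte between two CRLF pairs.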
Once you have made the hidden characters visible in TextWrangler, you can select that character (or any character or character sequence, for that matter) and copy it to the Find input field using Cmd+E. The NUL character shows up as \x{00} on my machine.
Alternatively, you can use Text -> Zap Gremlins... with (at least) Null (ASCII 0) characters checked and Replace with code selected, which shows the character as \x00. Either notation should work when searching for these characters, whether or not Grep is enabled. I am not sure whether \s should also find it in Grep mode; it does not on my machine. \W does match it, though.
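The same search can be sketched outside TextWrangler. The example below uses Python's re module, which is not the editor's grep engine, so treat it only as an illustration of the \x00 escape and of the \s versus \W behaviour; the sample bytes are modelled on the question's hex dump.

```python
import re

# Sample modelled on the question's data: a NUL byte on an otherwise empty line.
sample = b'1.0\r\n\x00\r\n"1100000002";"9";"STD";;;\r\n'

# \x00 is the regex escape for the NUL byte; replace it with nothing.
nul = re.compile(rb"\x00")
cleaned = nul.sub(b"", sample)

# NUL is not whitespace, so \s misses it; it is a non-word character, so \W hits.
matches_s = re.search(rb"\s", b"\x00") is not None
matches_W = re.search(rb"\W", b"\x00") is not None

print(cleaned)               # b'1.0\r\n\r\n"1100000002";"9";"STD";;;\r\n'
print(matches_s, matches_W)  # False True
```

Searching for \x00 explicitly is safer than \W, since \W also matches the quotes, semicolons, and line endings in the file.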

Please comment if this requires adjustment or further detail.

Adust answered 4/7, 2015 at 7:16 Comment(9)
The trick with Cmd+E worked brilliantly! I had tried a similar thing with selecting the text, copy (Cmd+C) and paste in search field (Cmd+V), but that didn't work. With Cmd+E, it shows the strange character indeed as \x{00}.Quadrisect
This just helped me too. Have no idea why suddenly newlines turn up as upside down question marks in TextWrangler. But at least this way I could find-replace all those to real newlines too.Palaeography
@Palaeography You might have used an application that uses NUL-terminated strings and, for one reason or another, wrote them to your file.Adust
FYI The application I used is Google Spreadsheets. docs.google.com/spreadsheets/u/0 I copied a few lines of a column and got this result.Palaeography
The CMD+E trick shows me \r\t, which is exactly what should be there (but I see the red, upside-down question marks instead).Discreditable
@JustinPutney OP had a specific character in their data. That is what I responded to. Apparently the TextWrangler programmers use the upside-down question mark for (at least) \r as well, even for files marked "Legacy Mac OS", where \r was actually used to mark the end of a line. As a matter of fact, with View -> Text Display -> Show Invisibles, my machine shows the upside-down question mark followed by a triangle (for the tab) for \r\t. If I missed your point or question, please rephrase.Adust
@Adust I'm seeing the upside-down question marks even when Show Invisibles is toggled off. Zap Gremlins does seem to get rid of them. Wish I knew why they were suddenly showing up in the first place. Thanks!Discreditable
@JustinPutney Yes, the upside-down question mark is always shown for certain 'special characters', nicknamed gremlins (and not just by the TextWrangler programmers). You, however, mentioned \t in the sequence you discovered via Cmd+E, and tabs are represented by a triangle, only with invisibles shown. Earlier, you seemed to suggest that the indicated sequence rightfully belongs in your file. With no further context, it's hard to tell why the characters are flagged only now. Do you have older versions of your file(s) for comparison? Did you use a different application for processing?Adust
@Adust I'll have to look for older versions. I've only noticed them recently.Discreditable
