How to get vim to show a byte-by-byte representation of file data

Asked 31/8, 2012 at 17:19 Answered 13/2, 2015 at 16:34

I don't want vim to ever interpret my data in any encoding-specific way. In other words, when I'm in vim, I want the character that my cursor is on to correspond to one actual byte, not to a UTF-8 (or other encoding) interpretation of that byte.

I need to use vim to analyze issues caused by Unicode conversion errors made by other people (using other software) so it's important that I see what is actually there.

For example, in Cygwin's vim, I have been able to see UTF-8 BOMs as

ï»¿ [START OF FILE DATA]

This is perfect. I recognize this as a UTF-8 BOM and if I want to know what the hex for each character is, I can put the cursor on the characters and use 'ga'.

I recently got a proper Linux machine (Fedora). In /etc/vimrc, this line exists

set fileencodings=ucs-bom,utf-8,latin1

When I look at a UTF-8 BOM on this machine, the BOM is completely hidden.

When I add the following line to ~/.vimrc

set fileencodings=latin1

I see

Ã¯Â»Â¿

The first 3 characters are the BOM (when ga is used against them). I don't know what the last 3 characters are.

At one point, I even saw the UTF-8 BOM represented as "feff" - the UTF-16 BOM.

Anyway, you see my problem. I need to see exactly what is in my file without vim interpreting the bytes for me. I know I could use xxd, od, etc., but vim has always been very convenient as an analysis tool. Plus, I want to be able to edit the files and save them without any conversion problems.

Thanks for your help.

Toh answered 31/8, 2012 at 17:19 Comment(1)

Mind you: whenever someone writes, says or even thinks "UTF-8 BOM", a kitten gets killed. – Rosco 1/9, 2012 at 15:9

Use 'binary' mode:

:edit ++bin file

vim -b file

From :help 'binary':

The 'fileencoding' and 'fileencodings' options will not be used, the file is read without conversion.
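As a rough analogy outside vim, the difference between reading "without conversion" and reading with BOM detection can be sketched in Python. This is only an illustration: the helper name `read_both_ways` is made up here, and Python's `utf-8-sig` codec is assumed to behave similarly to vim's `ucs-bom` detection in that it consumes the BOM, not to be the same mechanism.

```python
import codecs
import os
import tempfile


def read_both_ways(data: bytes):
    """Hypothetical helper: write data to a temp file, then return
    (raw bytes, text decoded with a BOM-aware codec)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Binary read: no decoding at all, every byte is preserved.
        with open(path, "rb") as f:
            raw = f.read()
        # BOM-aware text read: the leading EF BB BF is consumed silently.
        with open(path, "r", encoding="utf-8-sig", newline="") as f:
            text = f.read()
    finally:
        os.remove(path)
    return raw, text


raw, text = read_both_ways(codecs.BOM_UTF8 + b"data")
print(raw)         # the three BOM bytes are still there
print(repr(text))  # the BOM is gone from the decoded text
```

The binary read keeps the three BOM bytes, while the BOM-aware read drops them, which resembles the "BOM is completely hidden" behaviour described in the question.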

Phila answered 31/8, 2012 at 17:48 Comment(1)

Thanks. It's a very logical suggestion but I get the same results. – Toh 31/8, 2012 at 21:30

I get some good mileage from doing :e ++enc=latin1 after loading the file (Vim's initial guess at the encoding isn't important at this stage, because the command re-reads the file as latin1).
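The reason latin1 works as a byte-by-byte view is that it maps each of the 256 byte values to exactly one character with the same number, so nothing is merged, dropped, or rejected. A minimal Python sketch of that property (the variable names are illustrative only):

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Decoding as latin1 never fails and yields one character per byte.
as_text = all_bytes.decode("latin1")
code_points = [ord(ch) for ch in as_text]

# Encoding back gives the original bytes, so the round trip is lossless.
round_tripped = as_text.encode("latin1")

# By contrast, UTF-8 rejects byte sequences that are not valid UTF-8.
try:
    b"\xff".decode("utf-8")
    utf8_accepts_ff = True
except UnicodeDecodeError:
    utf8_accepts_ff = False

print(len(as_text), round_tripped == all_bytes, utf8_accepts_ff)
```

This is why a character under the cursor corresponds to a single byte when the buffer is read as latin1, and presumably why writing the buffer back with the same encoding leaves the bytes unchanged.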

Inimical answered 13/2, 2015 at 16:34 Comment(1)

this was super helpful – Showdown 2/9, 2016 at 15:4

The sequence Ã¯Â»Â¿ is actually U+FEFF (the BOM) encoded as UTF-8, decoded as latin1, encoded as UTF-8 again, and decoded as latin1 again. ï»¿ is U+FEFF encoded as UTF-8 and decoded as latin1 once. You can't get away from encodings. Those aren't the actual bytes; they are the latin1 characters displayed as the result of an incorrect decoding. If you want bytes, use a hex editor; otherwise, use the correct decoding.
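The encode/decode chain described above can be reproduced in a few lines of Python. This is a sketch for checking the explanation, not a description of what vim does internally; the variable names are illustrative.

```python
# U+FEFF encoded as UTF-8 is the three bytes EF BB BF.
bom_bytes = "\ufeff".encode("utf-8")

# Decoding those bytes as latin1 gives three characters: U+00EF, U+00BB, U+00BF.
once = bom_bytes.decode("latin1")

# Re-encoding that text as UTF-8 and decoding as latin1 again doubles it up
# into six characters: C3 AF C2 BB C2 BF.
twice = once.encode("utf-8").decode("latin1")

# Decoding the original bytes correctly as UTF-8 yields the single code point
# U+FEFF, which may explain seeing "feff" for a UTF-8 BOM.
correct = bom_bytes.decode("utf-8")

print(bom_bytes.hex(), [hex(ord(c)) for c in once])
print([hex(ord(c)) for c in twice], hex(ord(correct)))
```

Reversing the chain (`twice.encode("latin1").decode("utf-8").encode("latin1")`) recovers the original three bytes, which is one way to confirm that a file has been double-encoded.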

Sherlocke answered 1/9, 2012 at 0:59 Comment(0)
