You are confusing 'encoding', which is a global Vim setting, with 'fileencoding', which is local to each buffer.
When opening a file, the option 'fileencodings' (note the final s) determines which encodings Vim will try to open the file with. If it starts with ucs-bom, then any file with a BOM will be opened properly, provided it parses correctly.
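To make the "try each encoding in turn" idea concrete, here is a rough Python sketch of that kind of detection. It is not Vim's actual implementation: the function name `detect_encoding` and the candidate list are made up for illustration, and only the UTF-8 BOM is checked, whereas Vim's ucs-bom also covers other Unicode BOMs.

```python
import codecs

def detect_encoding(data: bytes, candidates=("ucs-bom", "utf-8", "latin-1")) -> str:
    """Return the first candidate that decodes `data` without error (hypothetical helper)."""
    for name in candidates:
        if name == "ucs-bom":
            # Simplification: only the UTF-8 BOM (ef bb bf) is recognised here.
            if data.startswith(codecs.BOM_UTF8):
                try:
                    data[len(codecs.BOM_UTF8):].decode("utf-8")
                    return "utf-8 (with BOM)"
                except UnicodeDecodeError:
                    pass  # BOM present but the rest does not parse: try the next candidate
            continue
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "unknown"
```

With this list, a latin1 file containing umlauts fails the UTF-8 attempt and falls through to the last candidate, which is roughly why the order of the entries matters.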
If you want to change the encoding of a file, you should use :set fenc=<foo>. If you want to remove the BOM, you should use :set nobomb (and :set bomb to add one). Then use :w to save.
Avoid changing enc after having opened a buffer; it could mess things up. enc determines which characters Vim can work with, and it has nothing to do with the files that you are working on.
Details
c:\> gvim umlaute.txt
You are opening Vim with a nonexistent file name. Vim creates a buffer, gives it that name, and sets fenc to an empty value, since there is no file associated with it.
:set enc
(VIM echoes encoding=latin1)
This means that Vim stores the buffer contents in ISO-8859-1 (or possibly another ISO-8859 variant).
and then I check the file encoding ...
:set fenc
(VIM echoes fileencoding=)
This is normal, there is no file for the moment.
Then I write the file
:w
Since 'fileencoding' is empty, Vim will write the file to disk using the internal encoding, latin1.
And check the file's size on the harddisk:
!dir umlaute.txt
(The size is 5 bytes) That is of course expected, 3 bytes for the text and 2 for the \x0d \x0a.
Ok, so I now set the encoding to
:set enc=utf8
WRONG! You are telling Vim that it must interpret the buffer contents as UTF-8. The buffer contains, in hexadecimal, e4 f6 fc 0d 0a, and the first three bytes are not valid UTF-8 sequences. You should have typed :set fenc=utf-8 instead. Vim would then have converted the text to UTF-8 when writing the file.
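The difference between reinterpreting bytes and converting them can be shown outside Vim. The following Python sketch is only an analogy for what happens to the three umlaut bytes, not a description of Vim's internals; the variable names are arbitrary.

```python
# The three umlauts as they sit in a latin1 buffer.
raw = bytes([0xE4, 0xF6, 0xFC])

# Reinterpreting the same bytes as UTF-8 (the analogue of :set enc=utf8) fails,
# because 0xE4 announces a multi-byte sequence that never follows.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Converting (the analogue of :set fenc=utf-8 followed by :w) first decodes
# the bytes with the encoding they were written in, then re-encodes the text.
text = raw.decode("latin-1")
converted = text.encode("utf-8")  # c3 a4 c3 b6 c3 bc
```

Here `valid_utf8` ends up False, while `converted` holds six bytes, two per character.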
The buffer gets weird
That's what happens when you force Vim to interpret an illegal UTF-8 file as UTF8.
I guess this is the hex representation of the ascii characters I previously typed in. So I rewrite them
äöü
Writing, checking size:
:w
:$ dir umlaute.txt
This time, it's 8 bytes. I guess that makes sense 2 bytes for every character plus \x0d \x0a.
Ok, so I want to make sure the next time I open the file it will be opened with encoding=utf8.
:set bomb
:w
:$ dir umlaute.txt
11 Bytes. This is of course 8 (previous) Bytes + 3 Bytes for the BOM (ef bb bf).
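The three file sizes seen so far (5, 8 and 11 bytes) can be checked with a few lines of Python. This is just arithmetic on the encoded text, assuming Windows line endings (\x0d \x0a) as in the question.

```python
import codecs

text = "äöü\r\n"  # three umlauts plus a CRLF line ending

latin1_size = len(text.encode("latin-1"))                   # 3 + 2
utf8_size = len(text.encode("utf-8"))                       # 3 * 2 + 2
utf8_bom_size = len(codecs.BOM_UTF8 + text.encode("utf-8"))  # BOM (ef bb bf) + the above

print(latin1_size, utf8_size, utf8_bom_size)
```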
So I :quit vim and open the file again and check if the encoding is set:
:set enc
But VIM insists its encoding=latin1.
You should run :set fenc? to see which encoding Vim detected for your file. And if you want Vim to be able to work with Unicode files, you should set 'enc' to utf-8 in your vimrc.