I develop cross-platform C++ using Microsoft Visual Studio on Windows and GCC on Ubuntu Linux.
In Visual Studio, I can use Unicode symbols like "π" and "²" in my code. Visual Studio always saves the source files as UTF-8 with BOM (Byte Order Mark).
For example:
// A = π.r²
double π = 3.14;
GCC happily compiles these files only if I remove the BOM first. If I do not remove the BOM, I get errors like these:
wwga_hydutils.cpp:28:9: error: stray ‘\317’ in program
wwga_hydutils.cpp:28:9: error: stray ‘\200’ in program
Which brings me to the question:
Is there a way to get GCC to compile UTF-8 files without first removing the BOM?
I'm using:
- Windows 7
- Visual Studio 2010
and:
- Ubuntu 11.10 (Oneiric Ocelot)
- GCC 4.6.1, 2011-06-27 (as provided by apt-get install gcc)
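As the first commenter pointed out, my problem was not the BOM, but having non-ASCII characters outside of string constants. The bytes in the error messages confirm this: \317 \200 (0xCF 0x80) is the UTF-8 encoding of π, whereas the BOM would be \357 \273 \277 (0xEF 0xBB 0xBF). GCC does not like non-ASCII characters in symbol names, but it turns out GCC is fully compatible with UTF-8 with BOM.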
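For anyone hitting the same errors, here is a minimal sketch of the layout that should compile with both compilers: the non-ASCII characters stay in comments and string literals, and the identifier gets an ASCII name. The names pi, circle_area and area_formula are mine, chosen for illustration, and the sketch assumes the file is saved as UTF-8 (with or without BOM).

```cpp
#include <cassert>
#include <string>

// A = π.r²   (non-ASCII characters in a comment are not a problem)
const double pi = 3.14159265358979323846;  // ASCII identifier instead of π

// Area of a circle with the given radius.
double circle_area(double radius) {
    assert(radius >= 0.0);  // a negative radius makes no sense here
    return pi * radius * radius;
}

// Non-ASCII characters inside a string literal are also accepted;
// with a UTF-8 source file the bytes end up in the string as UTF-8.
std::string area_formula() {
    return "A = π.r²";
}
```

Only the symbol name changed; the comment and the string literal keep their Unicode characters.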
double π = 3.14;: typography +1, math -1. – Hothouse
funΛ() would be written as fun\u039B() to be able to run in gcc. I changed my compiler to clang, and things worked fine. gcc's -finput-charset=UTF-8 -fextended-identifiers don't help either. -fextended-identifiers is simply for supporting universal character name format, if turn off (-fno-extended-identifiers) even fun\u039B() fails. – Lupulin
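The universal character name spelling from the comment above, applied to the declaration in the question, might look like the sketch below. It assumes the compiler accepts universal character names in identifiers; per the comment, GCC needs -fextended-identifiers for that on the version discussed here, and whether newer releases still need the flag is not something this thread establishes. The helper circle_area_ucn is a hypothetical name used only for illustration.

```cpp
// Same declaration as "double π = 3.14;", with π spelled as the
// universal character name \u03C0 (GREEK SMALL LETTER PI).
double \u03C0 = 3.14;

// Hypothetical helper: uses the variable through the same \u03C0 spelling.
double circle_area_ucn(double r) {
    return \u03C0 * r * r;
}
```

The source stays pure ASCII this way, at the cost of the readability that motivated writing π in the first place.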