The problem appears when the MATLAB character encoding is UTF-8, which is usually the case for Linux users (hence no problem for Amro's configuration, which uses CP1252). When the MATLAB character set encoding (get it with slCharacterEncoding()) is UTF-8, the MATLAB eps export function is buggy (at least until R2011b): it exports non-ASCII characters as octal-escaped UTF-8 (2 bytes), whereas the PostScript interpreter is set up to decode 1-byte character codes.
Let's illustrate the bug with the character ö (U+00F6), some representations of which are:
- UTF-16: 0x00F6
- UTF-8: 0xC3 0xB6
- C octal escaped UTF-8: \303\266
- XML decimal entity: &#246;
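The representations listed above can be reproduced with a few lines of Python. This is only an illustrative sketch; the helper name `representations` is made up for this example.

```python
def representations(ch):
    """Return a few encodings of a single character as printable strings."""
    utf8 = ch.encode("utf-8")
    return {
        "utf16": "0x%04X" % ord(ch),  # valid for BMP characters only
        "utf8": " ".join("0x%02X" % b for b in utf8),
        "octal_utf8": "".join("\\%03o" % b for b in utf8),
        "xml_decimal": "&#%d;" % ord(ch),
    }

# U+00F6 is written as an escape so the snippet itself stays ASCII.
for name, value in representations("\u00f6").items():
    print(name, value)
```

The `octal_utf8` entry is the form that ends up in the eps file.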
The eps file created by MATLAB contains:
/Helvetica /ISOLatin1Encoding 120 FMSR
(\303\266) s
MATLAB defines in the eps file a function FMSR that re-encodes the Helvetica font into another encoding, here ISOLatin1Encoding, which is one of the two built-in encoding vectors and closely matches the ISO-8859-1 (Latin-1) standard (see p. 329-330 of the PostScript Language Reference Manual for more details). Briefly, an encoding vector is a 256-element array that associates a character name with each character code, so the interpreter only reads 1-byte character codes. In ISO-8859-1, \303=195=Ã and \266=182=¶. As a result, it prints Ã¶ instead of ö.
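The mis-decoding can be reproduced outside PostScript. The sketch below (the function name is hypothetical) encodes a string as UTF-8 and then reads the bytes back one at a time as ISO-8859-1, which is roughly what the interpreter does with the re-encoded font.

```python
import unicodedata

def seen_by_latin1_reader(text):
    """Decode the UTF-8 bytes of `text` one byte at a time, as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")

# One input character (U+00F6) comes back as two characters.
for ch in seen_by_latin1_reader("\u00f6"):
    print("%d U+%04X %s" % (ord(ch), ord(ch), unicodedata.name(ch)))
```

The two lines printed correspond to codes 195 and 182, the two glyphs that show up in the exported figure.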
Options for exporting non-ASCII ISO-8859-1 characters with a UTF-8 locale environment
Convert the octal UTF-8 codes into octal ISO-8859-1 codes, which is easy because non-ASCII ISO-8859-1 characters follow the same layout in UTF-8. For example, with the program sed, which can be run from the Command window or from your export script:
!sed -i -e 's/\\302\(\\2[4-7][0-7]\)/\1/g' -e 's/\\303\\2\([0-7][0-7]\)/\\3\1/g' file.eps
Thus, \303\266 becomes \366=246=ö. With this option, you can type the non-ASCII characters directly in MATLAB.
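For readers without sed, the same two substitutions can be sketched in Python. The function name is an assumption for this example; the regular expressions mirror the sed expressions above, including the restriction of the first one to \240-\277.

```python
import re

def utf8_octal_to_latin1_octal(ps_text):
    """Rewrite octal-escaped 2-byte UTF-8 sequences as 1-byte Latin-1 escapes."""
    # \302\2xy (U+00A0..U+00BF) -> \2xy: just drop the lead byte.
    ps_text = re.sub(r"\\302(\\2[4-7][0-7])", r"\1", ps_text)
    # \303\2xy (U+00C0..U+00FF) -> \3xy: drop the lead byte, add octal 100.
    ps_text = re.sub(r"\\303\\2([0-7][0-7])", r"\\3\1", ps_text)
    return ps_text

print(utf8_octal_to_latin1_octal(r"(\303\266) s"))  # prints (\366) s
```

Text that contains no such escapes passes through unchanged, so the function can be applied to every line of the eps file.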
Change the MATLAB character set encoding with slCharacterEncoding('ISO-8859-1') before adding text to the figure and, if you add text from the Command window, use char(number) for non-ASCII characters. If you add text directly in the figure with the plot tools, you can enter the non-ASCII characters. This solution is not ideal because the non-ASCII characters do not appear on the figure in the default font (Helvetica by default with MATLAB on Linux) and because you have to use char(number) if you script the creation of the figure.
Render the text later with LaTeX by using a user-submitted MATLAB function such as LaPrint or one of its forks, which creates a tex file with the text of the figure and an eps file with the non-text part of the figure. A similar solution is matlab2tikz, which creates a TikZ/pgfplots file and a tex file.
Use the LaTeX interpreter of MATLAB: \"{o}. MATLAB creates the character by combining the ASCII character with its diacritic, but the result is of low quality because of bad relative positioning (the diacritic sits slightly too far to the right of the character). MATLAB uses the glyphs from the Computer Modern font and embeds the font in the eps file (which adds ~80 kB). Furthermore, the raw text in the pdf created from the eps does not contain ö but o ̈.
Exporting non-ISO-8859-1 characters
For exporting characters that are not in ISO-8859-1, which was also asked here, there is probably a reasonable solution if fewer than 256 characters are needed (8-bit format), ideally all from one standard encoding set. It involves the following steps:
- Convert the octal code into the Unicode character;
- Save the file in the target encoding standard (in an 8-bit format);
- Add the encoding vector for the target encoding set.
For example, if you want to export Polish text, you need to convert the file into ISO-8859-2. Here is an implementation on Linux with Bash:
#!/bin/bash
name=$(basename "$1" .eps)
ascii2uni -a K "$1" > /tmp/eps_uni.eps
iconv -f UTF-8 -t ISO-8859-2 /tmp/eps_uni.eps -o "$name"_latin2.eps
sed -i -e '/%EndPageSetup/ r ISOLatin2Encoding.ps' -e 's/ISOLatin1Encoding/MyEncoding/' "$name"_latin2.eps
Save it as eps_lat2; running the command sh eps_lat2 file.eps then creates file_latin2.eps with Latin-2 encoding. The file ISOLatin2Encoding.ps contains this:
/MyEncoding
% The first 144 entries are the same as the ISO Latin-1 encoding.
ISOLatin1Encoding 0 144 getinterval aload pop
% \22x
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
% \24x
/nbspace /Aogonek /breve /Lslash /currency /Lcaron /Sacute /section
/dieresis /Scaron /Scedilla /Tcaron /Zacute /hyphen /Zcaron /Zdotaccent
/degree /aogonek /ogonek /lslash /acute /lcaron /sacute /caron
/cedilla /scaron /scedilla /tcaron /zacute /hungarumlaut /zcaron /zdotaccent
% \30x
/Racute /Aacute /Acircumflex /Abreve /Adieresis /Lacute /Cacute /Ccedilla
/Ccaron /Eacute /Eogonek /Edieresis /Ecaron /Iacute /Icircumflex /Dcaron
/Dcroat /Nacute /Ncaron /Oacute /Ocircumflex /Ohungarumlaut /Odieresis /multiply
/Rcaron /Uring /Uacute /Uhungarumlaut /Udieresis /Yacute /Tcedilla /germandbls
% \34x
/racute /aacute /acircumflex /abreve /adieresis /lacute /cacute /ccedilla
/ccaron /eacute /eogonek /edieresis /ecaron /iacute /icircumflex /dcaron
/dcroat /nacute /ncaron /oacute /ocircumflex /ohungarumlaut /odieresis /divide
/rcaron /uring /uacute /uhungarumlaut /udieresis /yacute /tcedilla /dotaccent
256 packedarray def
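As a quick sanity check of the vector above, a few slots can be compared with Python's built-in iso-8859-2 codec. This is a sketch only; `latin2_octal` is a hypothetical helper, and the glyph names in the loop are the ones used in the vector.

```python
def latin2_octal(ch):
    """Return the PostScript octal escape of one character in ISO-8859-2."""
    (code,) = ch.encode("iso-8859-2")  # raises UnicodeEncodeError if absent
    return "\\%03o" % code

# Polish letters are given as escapes so the snippet itself stays ASCII.
for ch, glyph in [("\u0141", "Lslash"), ("\u0142", "lslash"), ("\u0105", "aogonek")]:
    print(glyph, latin2_octal(ch))
```

Each printed escape should point at the slot that holds the matching glyph name, counting from the \24x row of the vector.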
Here is another implementation with Python (so it can also work on Windows and Mac):
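#!/usr/bin/python
# -*- coding: utf-8 -*-
# Python 2 only: the 'string_escape' codec does not exist in Python 3.
import sys,codecs
input = sys.argv[1]
# Output file, written in ISO-8859-2 (Latin-2)
fo = codecs.open(input[:-4]+'_latin2.eps','w','latin2')
# Read the eps file, turning octal escapes such as \303\266 into raw bytes
with codecs.open(input,'r','string_escape') as fi:
    data = fi.readlines()
with open('ISOLatin2Encoding.ps') as fenc:
    for line in data:
        # Decode the UTF-8 bytes and switch to the custom encoding vector
        fo.write(line.decode('utf-8').replace('ISOLatin1Encoding','MyEncoding'))
        # Insert the Latin-2 encoding vector right after the page setup
        if line.startswith('%%EndPageSetup'):
            fo.write(fenc.read())
fo.close()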
Save it as eps_lat2.py; running the command python eps_lat2.py file.eps then creates file_latin2.eps with Latin-2 encoding.
It can easily be adapted to other 8-bit encoding standards by changing the encoding vector and the iconv (or codecs.open) parameter in the script.
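The Python script above relies on the Python 2 `string_escape` codec. Under Python 3 the same three steps might be sketched as follows. The function names and the `codec` parameter are assumptions for this example, and unlike `string_escape` it only unescapes the octal codes \200 to \377, leaving every other backslash sequence in the eps file untouched.

```python
import re

def unescape_high_octal(data):
    """Replace octal escapes \\200..\\377 in a bytes object by the raw bytes."""
    return re.sub(rb"\\([23][0-7]{2})",
                  lambda m: bytes([int(m.group(1), 8)]), data)

def convert_eps(eps_bytes, encoding_vector, codec="iso-8859-2"):
    """Return the eps content re-encoded to `codec`, with the vector inserted."""
    out = []
    for line in eps_bytes.splitlines(keepends=True):
        # Step 1: octal escapes -> UTF-8 bytes -> Unicode text.
        text = unescape_high_octal(line).decode("utf-8")
        # Step 3a: make the font use the custom encoding vector.
        text = text.replace("ISOLatin1Encoding", "MyEncoding")
        # Step 2: write the text in the 8-bit target encoding.
        out.append(text.encode(codec))
        # Step 3b: insert the vector definition right after the page setup.
        if line.startswith(b"%%EndPageSetup"):
            out.append(encoding_vector)
    return b"".join(out)

# Tiny in-memory example; \305\202 is U+0142 in octal-escaped UTF-8.
sample = b"%%EndPageSetup\n/Helvetica /ISOLatin1Encoding 120 FMSR\n(\\305\\202) s\n"
print(convert_eps(sample, b"% encoding vector goes here\n"))
```

For real files, read the eps file and ISOLatin2Encoding.ps in binary mode, pass both to `convert_eps`, and write the result in binary mode. Another 8-bit standard only needs a different `codec` value and the matching encoding vector.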