How to export an umlaut (or any foreign character) in MATLAB EPS format?

I'm trying to use an umlaut in a legend command in MATLAB. A quick Google search tells me the form I want is char(146), and that works fine for displaying the figure or printing it to TIFF.

But when I print to EPS format (or epsc, eps2, epsc2), a different character is displayed in the file. I've tried printing the first 300 or so characters, and the output certainly changes (albeit very slowly; a good half of them show up as an "A" with a symbol immediately after it), but this is a slow approach and I'm not guaranteed to find the symbol I want. Does anyone here have any ideas on what I can try?

I'm using MATLAB R2011a and my default character set is UTF-8. My legend line looks something like:

legend( plot_id , strcat('lala',char(146)) )

and my print line looks like:

print -depsc2 -tiff -r600 <filename>

(switching off the TIFF thumbnail generation doesn't have any effect, though)

Crow answered 9/8, 2012 at 20:57

The problem appears when MATLAB's character encoding is UTF-8, which is usually the case for Linux users (hence there is no problem with Amro's configuration, which uses CP1252). When the MATLAB character set encoding (returned by slCharacterEncoding()) is UTF-8, the MATLAB EPS export function is buggy (at least up to R2011b): it exports non-ASCII characters in octal-escaped UTF-8 format (2 bytes for a character such as ö), whereas the PostScript interpreter is set up to decode a 1-byte format.

Let's illustrate the bug with the character ö (U+00F6), which has the following representations:

UTF-16: 0x00F6
UTF-8: 0xC3 0xB6
C octal escaped UTF-8: \303\266
XML decimal entity: &#246;
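These representations can be reproduced with a short Python 3 sketch (standard library only; the helper name `representations` is ours, not part of MATLAB):

```python
def representations(ch):
    """Return the representations of a single character listed above."""
    utf8 = ch.encode("utf-8")
    return {
        # Code point; for characters in the BMP this is also the single UTF-16 unit
        "UTF-16": "0x%04X" % ord(ch),
        "UTF-8": " ".join("0x%02X" % b for b in utf8),
        # Each UTF-8 byte written as a 3-digit octal escape, as in the EPS file
        "C octal escaped UTF-8": "".join("\\%03o" % b for b in utf8),
        "XML decimal entity": "&#%d;" % ord(ch),
    }


for name, value in representations("ö").items():
    print(f"{name}: {value}")
```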

The eps file created by MATLAB contains:

/Helvetica /ISOLatin1Encoding 120 FMSR
(\303\266) s

In the EPS file, MATLAB defines a procedure FMSR that re-encodes the Helvetica font with another encoding, here ISOLatin1Encoding, which is one of the two built-in encoding vectors and closely matches the ISO-8859-1 (Latin-1) standard (see pp. 329-330 of the PostScript Language Reference Manual for more details). Briefly, an encoding vector is a 256-element array that associates a character name with each character code, so the interpreter reads only 1-byte character codes. In ISO-8859-1, \303 = 195 = Ã and \266 = 182 = ¶. As a result, ö is printed as Ã¶.
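A rough Python 3 model of this mix-up: the hypothetical helper below turns the octal escapes MATLAB writes into bytes, and decoding those bytes one at a time with Latin-1 (standing in for ISOLatin1Encoding) gives exactly the two characters seen in the output. It models only the byte-to-character lookup, not the PostScript interpreter itself.

```python
def octal_escapes_to_bytes(ps_string):
    r"""Turn a PostScript string body such as \303\266 into raw bytes.

    Only \ooo octal escapes and plain ASCII characters are handled.
    """
    out = bytearray()
    i = 0
    while i < len(ps_string):
        if ps_string[i] == "\\":
            out.append(int(ps_string[i + 1:i + 4], 8))  # \ooo -> one byte
            i += 4
        else:
            out.append(ord(ps_string[i]))
            i += 1
    return bytes(out)


raw = octal_escapes_to_bytes(r"\303\266")  # what MATLAB writes for "ö"
# A 1-byte encoding vector looks up each byte on its own (Latin-1 as a stand-in):
print(raw.decode("latin-1"))  # Ã¶
# Decoding the two bytes together as UTF-8 gives the intended character:
print(raw.decode("utf-8"))  # ö
```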

Options for exporting non-ASCII ISO-8859-1 characters with a UTF-8 locale environment

1. Convert the octal UTF-8 codes into octal ISO-8859-1 codes. This is easy because the non-ASCII ISO-8859-1 characters follow the same layout in UTF-8: a lead byte (\302 or \303) followed by a byte from which the Latin-1 code follows directly. For example, with the program sed, which can be run from the Command Window or from your export script:
```
!sed -i -e 's/\\302\(\\2[4-7][0-7]\)/\1/g' -e 's/\\303\\2\([0-7][0-7]\)/\\3\1/g' file.eps
```
Thus, \303\266 becomes \366 = 246 = ö. With this option you can type the non-ASCII characters directly in MATLAB.
2. Change the MATLAB character set encoding with slCharacterEncoding('ISO-8859-1') before adding text to the figure and, if you add text from the Command Window, use char(number) for non-ASCII characters. If you add text directly in the figure with the plot tools, you can type the non-ASCII characters. This solution is not ideal: the non-ASCII characters do not appear on the figure in the default font (Helvetica by default with MATLAB on Linux), and scripting the creation of the figure requires char(number).
3. Render the text later with LaTeX by using a user-submitted MATLAB function such as LaPrint or one of its forks, which creates a .tex file with the text of the figure and an .eps file with the non-text part of the figure. A similar solution is matlab2tikz, which creates a TikZ/PGFPlots file and a .tex file.
4. Use MATLAB's LaTeX interpreter: \"{o}. MATLAB creates the character by combining the ASCII character with its diacritic, but the result is of low quality because of poor relative positioning (the diacritic sits slightly too far to the right of the letter). MATLAB uses the glyphs from the Computer Modern font and embeds the font in the EPS file (which adds about 80 KB). Furthermore, the raw text in a PDF created from the EPS does not contain ö but o ̈ (an o followed by a separate diaeresis).
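Where sed is not available (for example on Windows), the two substitutions of the sed command above can be reproduced with Python 3's `re` module. This is a sketch: `utf8_octal_to_latin1` and `fix_eps_in_place` are hypothetical names, and like the sed version it only handles U+00A0 to U+00FF, the non-ASCII printable part of ISO-8859-1.

```python
import re


def utf8_octal_to_latin1(text):
    r"""Turn octal-escaped UTF-8 (two bytes) into octal-escaped ISO-8859-1 (one byte).

    U+00A0..U+00BF are written as \302\240..\302\277: drop the \302 lead byte.
    U+00C0..U+00FF are written as \303\200..\303\277: drop \303 and add octal 100
    to the second byte, so \2XY becomes \3XY.
    """
    text = re.sub(r"\\302(\\2[4-7][0-7])", r"\1", text)
    text = re.sub(r"\\303\\2([0-7][0-7])", r"\\3\1", text)
    return text


def fix_eps_in_place(path):
    """Rewrite an EPS file in place, like sed -i."""
    # Latin-1 maps every byte to one character, so all other bytes round-trip unchanged
    with open(path, encoding="latin-1", newline="") as f:
        eps_text = f.read()
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(utf8_octal_to_latin1(eps_text))
```

For example, calling fix_eps_in_place('file.eps') after the print command turns (\303\266) into (\366), just like the sed line.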

Exporting non-ISO-8859-1 characters

For exporting characters that are not in ISO-8859-1, as asked in a related question, there is probably a reasonable solution if fewer than 256 distinct characters are needed (so that they fit in an 8-bit format), ideally all from a standard encoding set. It involves the following steps:

1. Convert the octal escape codes back into Unicode characters;
2. Save the file in the target encoding standard (an 8-bit format);
3. Add the encoding vector for the target encoding set.
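At the level of a single character, the first two steps amount to the following Python 3 sketch (the helper `reencode_escape` is a hypothetical name; ISO-8859-2 is assumed as the target, as in the Polish example):

```python
def reencode_escape(escaped, target="iso8859_2"):
    r"""Apply the first two steps to one octal-escaped UTF-8 sequence like \305\202."""
    # Step 1: octal escapes -> UTF-8 bytes -> Unicode character
    raw = bytes(int(o, 8) for o in escaped.split("\\")[1:])
    char = raw.decode("utf-8")
    # Step 2: Unicode character -> its single byte in the 8-bit target encoding
    return char, char.encode(target)


char, code = reencode_escape(r"\305\202")  # Polish "ł" as MATLAB writes it
print(char, "\\%03o" % code[0])  # ł \263
```

The last step is then to make sure the encoding vector maps code \263 to the glyph /lslash, which is what the Latin-2 vector in this answer does.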

For example, if you want to export Polish text, you need to convert the file into ISO-8859-2. Here is an implementation on Linux with Bash:

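#!/bin/bash
# Usage: sh eps_lat2 file.eps  (writes file_latin2.eps)
name=$(basename "$1" .eps)
# Step 1: turn the octal escapes (\303\266, ...) back into UTF-8 characters
ascii2uni -a K "$1" > /tmp/eps_uni.eps
# Step 2: re-encode the file from UTF-8 to ISO-8859-2 (one byte per character)
iconv -f UTF-8 -t ISO-8859-2 /tmp/eps_uni.eps -o "$name"_latin2.eps
# Step 3: insert the Latin-2 vector after %%EndPageSetup and use it instead of ISOLatin1Encoding
sed -i -e '/%EndPageSetup/ r ISOLatin2Encoding.ps' -e 's/ISOLatin1Encoding/MyEncoding/' "$name"_latin2.eps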

Save this as eps_lat2; running the command sh eps_lat2 file.eps then creates file_latin2.eps with Latin-2 encoding. The file ISOLatin2Encoding.ps contains this:

/MyEncoding
% The first 144 entries are the same as the ISO Latin-1 encoding.
ISOLatin1Encoding 0 144 getinterval aload pop
% \22x
    /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
    /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
% \24x
    /nbspace /Aogonek /breve /Lslash /currency /Lcaron /Sacute /section
    /dieresis /Scaron /Scedilla /Tcaron /Zacute /hyphen /Zcaron /Zdotaccent
    /degree /aogonek /ogonek /lslash /acute /lcaron /sacute /caron
    /cedilla /scaron /scedilla /tcaron /zacute /hungarumlaut /zcaron /zdotaccent
% \30x
    /Racute /Aacute /Acircumflex /Abreve /Adieresis /Lacute /Cacute /Ccedilla
    /Ccaron /Eacute /Eogonek /Edieresis /Ecaron /Iacute /Icircumflex /Dcaron
    /Dcroat /Nacute /Ncaron /Oacute /Ocircumflex /Ohungarumlaut /Odieresis /multiply
    /Rcaron /Uring /Uacute /Uhungarumlaut /Udieresis /Yacute /Tcedilla /germandbls
% \34x
    /racute /aacute /acircumflex /abreve /adieresis /lacute /cacute /ccedilla
    /ccaron /eacute /eogonek /edieresis /ecaron /iacute /icircumflex /dcaron
    /dcroat /nacute /ncaron /oacute /ocircumflex /ohungarumlaut /odieresis /divide
    /rcaron /uring /uacute /uhungarumlaut /udieresis /yacute /tcedilla /dotaccent
256 packedarray def
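To write a vector like this for another 8-bit encoding, it helps to list which character sits at each code from \240 to \377. A minimal Python 3 sketch (the helper name `upper_half` is hypothetical) does this with the standard codecs. It prints Unicode names, not PostScript glyph names, so mapping e.g. LATIN SMALL LETTER L WITH STROKE to /lslash is still done by hand, for example with the help of the Adobe Glyph List.

```python
import unicodedata


def upper_half(codec, first=0o240):
    """Yield (octal code, character, Unicode name) for bytes first..0o377 of an 8-bit codec."""
    for code in range(first, 0o400):
        try:
            ch = bytes([code]).decode(codec)
        except UnicodeDecodeError:
            continue  # this byte is not defined in the codec
        yield "\\%03o" % code, ch, unicodedata.name(ch, "?")


for octal, ch, name in upper_half("iso8859_2"):
    print(octal, ch, name)
```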

Here is another implementation in Python 2 (so it also works on Windows and Mac):

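#!/usr/bin/python
# -*- coding: utf-8 -*-
# Python 2 only: the 'string_escape' codec and str.decode() are gone in Python 3
import sys,codecs
input = sys.argv[1]
# Output file, written in ISO-8859-2 (Latin-2)
fo = codecs.open(input[:-4]+'_latin2.eps','w','latin2')
# 'string_escape' turns the octal escapes (\303\266, ...) back into raw UTF-8 bytes
with codecs.open(input,'r','string_escape') as fi:
    data = fi.readlines()
with open('ISOLatin2Encoding.ps') as fenc:
    for line in data:
        # Decode the UTF-8 bytes and switch the font to the Latin-2 vector
        fo.write(line.decode('utf-8').replace('ISOLatin1Encoding','MyEncoding'))
        # Insert the MyEncoding definition right after %%EndPageSetup
        if line.startswith('%%EndPageSetup'):
            fo.write(fenc.read())
fo.close()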

Save this as eps_lat2.py; running the command python eps_lat2.py file.eps then creates file_latin2.eps with Latin-2 encoding.

It can easily be adapted to other 8-bit encoding standards by changing the encoding vector and the iconv (or codecs.open) parameter in the script.
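The Python script above only runs under Python 2, because the `string_escape` codec was removed in Python 3. A possible Python 3 sketch of the same conversion follows, with the target codec as a parameter. It is a variant rather than a straight port: instead of writing raw 8-bit bytes, it re-emits each character as an octal escape in the target encoding, so the output stays ASCII, and it only touches runs of \ooo escapes, leaving other PostScript escapes such as \( and \) alone. The names `to_8bit` and `convert_file` are ours; `MyEncoding` and `ISOLatin2Encoding.ps` are the ones used above.

```python
import re

# One or more consecutive \ooo escapes (octal 000-377)
OCTAL_RUN = re.compile(r"(?:\\[0-3][0-7]{2})+")


def to_8bit(eps_text, codec="latin2", vector=""):
    """Re-encode octal-escaped UTF-8 into `codec` and switch the fonts to MyEncoding."""
    def reencode(match):
        # Steps 1-2: \ooo escapes -> UTF-8 bytes -> text -> bytes of the 8-bit codec
        raw = bytes(int(esc, 8) for esc in match.group(0).split("\\")[1:])
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            return match.group(0)  # not UTF-8: leave the escapes unchanged
        # Raises UnicodeEncodeError if a character is missing from `codec`
        return "".join("\\%03o" % b for b in text.encode(codec))

    out = []
    for line in eps_text.splitlines(keepends=True):
        line = OCTAL_RUN.sub(reencode, line)
        out.append(line.replace("ISOLatin1Encoding", "MyEncoding"))
        # Step 3: insert the encoding vector right after %%EndPageSetup
        if line.startswith("%%EndPageSetup"):
            out.append(vector)
    return "".join(out)


def convert_file(path, codec="latin2", vector_file="ISOLatin2Encoding.ps"):
    """Write e.g. file_latin2.eps next to file.eps and return its path."""
    with open(vector_file, encoding="ascii") as f:
        vector = f.read()
    # Latin-1 maps every byte to one character, so other bytes pass through unchanged
    with open(path, encoding="latin-1", newline="") as f:
        eps_text = f.read()
    out_path = path[:-4] + "_" + codec + ".eps"
    with open(out_path, "w", encoding="latin-1", newline="") as f:
        f.write(to_8bit(eps_text, codec, vector))
    return out_path
```

For example, convert_file('file.eps') writes file_latin2.eps; another 8-bit encoding needs a different codec name and a matching vector file.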

Finbur answered 8/2, 2013 at 9:50

+1, great answer, thanks for sharing. I know very little about PostScript, but I think the real issue is not related to printing; rather, it is the fact that MATLAB is not capable of displaying Unicode text in Handle Graphics. I previously showed the problem with some possible workarounds; unfortunately, they will not help in this case, since printing UI controls produces raster graphics. – Stemma 4/6, 2013 at 7:54

Side note: the slCharacterEncoding function is part of Simulink. I'd prefer using feature('DefaultCharacterSet','xxx') for getting/setting the charset, which is part of core MATLAB (although it is undocumented). – Stemma 4/6, 2013 at 7:56

Here is a simple test:

%# common text properties
props = {'FontSize',30};

%# LaTeX
str = '\"a\"o\"u';
subplot(121), plot(1:10)
text(5, 5, str, 'Interpreter','latex', props{:})
legend({str}, 'Interpreter','latex', props{:})
xlabel(str, 'Interpreter','latex', props{:})
title(str, 'Interpreter','latex', props{:})

%# normal text
str = 'äöü';
subplot(122), plot(10:-1:1)
text(5, 5, str, props{:})
legend({str}, props{:})
title(str, props{:})
xlabel(str, props{:})

%# export as EPS file
print -depsc2 -tiff -r600 file.eps

[screenshot of the resulting figure]

The resulting EPS file looks the same.

Notes:

I am on Windows XP, and the default character encoding is Windows-1252:

>> feature('DefaultCharacterSet')
ans =
windows-1252

So you can type those umlauts directly using their Windows-1252 ("extended ASCII") codes, Alt+0228, Alt+0246 and Alt+0252 for ä, ö and ü respectively, or create them with char:

>> char([228 246 252])
ans =
äöü
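As a quick cross-check outside MATLAB (Python 3, standard library only), the same codes give ä, ö and ü in Windows-1252, and the last line shows the two-byte UTF-8 form that causes the EPS problem in a UTF-8 locale:

```python
codes = [228, 246, 252]  # the Alt+0228, Alt+0246, Alt+0252 codes

# Windows-1252 and ISO-8859-1 agree on these codes, which equal the Unicode code points
print(bytes(codes).decode("cp1252"))   # äöü
print(bytes(codes).decode("latin-1"))  # äöü
print("".join(chr(c) for c in codes))  # äöü

# Under UTF-8 each of these characters takes two bytes instead of one
print("äöü".encode("utf-8"))  # b'\xc3\xa4\xc3\xb6\xc3\xbc'
```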

Also note that I am using the Arial font by default:

>> get(0, 'defaultTextFontName')
ans =
Arial

>> get(0, 'defaultAxesFontName')
ans =
Arial

Stemma answered 9/8, 2012 at 22:24