Apache FOP Displaying ### with SimSun

I am maintaining a program which uses Apache FOP for printing PDF documents. There have been a couple of complaints about Chinese characters coming up as "####". I have found an existing thread about this problem and done some research on my side.

http://apache-fop.1065347.n5.nabble.com/Chinese-Fonts-td10789.html

I do have the uming.ttf font file installed on my system. Unlike the person in this thread, I am still getting the "####".

From this point forward, has anyone seen a workaround that would allow you to print complex characters in a PDF document using Apache FOP?

Disentomb answered 17/9, 2014 at 16:40 Comment(0)

Three steps must be taken for Chinese characters to show correctly in a PDF file created with FOP (the same steps apply to any character not available in the default font and, more generally, to using any non-default font).

Let us use this simple fo example to show the warnings produced by FOP when something is wrong:

<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
        <fo:simple-page-master master-name="one">
            <fo:region-body />
        </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="one">
        <fo:flow flow-name="xsl-region-body">
            <!-- a block of chinese text -->
            <fo:block>博洛尼亚大学中国学生的毕业论文</fo:block>
        </fo:flow>
    </fo:page-sequence>
</fo:root>

Processing this input, FOP gives several warnings similar to this one:

org.apache.fop.events.LoggingEventListener processEvent
WARNING: Glyph "?" (0x535a) not available in font "Helvetica".
...

Without any explicit font-family indication in the FO file, FOP defaults to using Helvetica, which is one of the Base-14 fonts (fonts that are available everywhere, so there is no need to embed them).

Each font supports a set of characters, assigning a visible glyph to each of them; when a font does not support a character, the above warning is produced, and the PDF shows "#" instead of the missing glyph.

Step 1: set font-family in the FO file

If the default font doesn't support the characters of our text (or we simply want to use a different font), we must use the font-family property to state the desired one.

The value of font-family is inherited, so if we want to use the same font for the whole document we can set the property on the fo:page-sequence; if we need a special font just for some paragraphs or words, we can set font-family on the relevant fo:block or fo:inline.
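As a rough sketch of that inheritance (a fragment to be placed inside fo:root, not a complete document), the property can be set once on the fo:page-sequence and overridden on a single fo:inline. The second block and its "(thesis)" text are made up for illustration, and "SimSun" only works once FOP knows how to resolve it, as the rest of this answer explains:

```xml
<!-- Fragment only: goes inside fo:root, after the layout-master-set -->
<fo:page-sequence master-reference="one" font-family="SimSun">
    <fo:flow flow-name="xsl-region-body">
        <!-- no font-family here: the block inherits SimSun from the page-sequence -->
        <fo:block>博洛尼亚大学中国学生的毕业论文</fo:block>
        <!-- hypothetical second block: the fo:inline overrides the inherited font for a few words -->
        <fo:block>毕业论文 <fo:inline font-family="Helvetica">(thesis)</fo:inline></fo:block>
    </fo:flow>
</fo:page-sequence>
```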

So, our input becomes (using a font I have as example):

<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
        <fo:simple-page-master master-name="one">
            <fo:region-body />
        </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="one">
        <fo:flow flow-name="xsl-region-body">
            <!-- a block of chinese text -->
            <fo:block font-family="SimSun">博洛尼亚大学中国学生的毕业论文</fo:block>
        </fo:flow>
    </fo:page-sequence>
</fo:root>

But now we get a new warning, in addition to the old ones!

org.apache.fop.events.LoggingEventListener processEvent
WARNING: Font "SimSun,normal,400" not found. Substituting with "any,normal,400".
org.apache.fop.events.LoggingEventListener processEvent
WARNING: Glyph "?" (0x535a) not available in font "Times-Roman".
...

FOP doesn't know how to map "SimSun" to a font file, so it falls back to a generic Base-14 font (Times-Roman) which does not support our Chinese characters, and the PDF still shows "#".

Step 2: configure font mapping in FOP's configuration file

Inside FOP's folder, the file conf/fop.xconf is an example configuration; we can directly edit it or make a copy to start from.

The configuration file is an XML file, and we have to add the font mappings inside /fop/renderers/renderer[@mime = 'application/pdf']/fonts/ (there is a renderer section for each possible output mime type, so check you are inserting your mapping in the right one):

<?xml version="1.0"?>
<fop version="1.0">
  ...
  <renderers>
    <renderer mime="application/pdf">
      ...
      <fonts>

        <!-- specific font mapping -->
        <font kerning="yes" embed-url="/Users/furini/Library/Fonts/SimSun.ttf" embedding-mode="subset">
          <font-triplet name="SimSun" style="normal" weight="normal"/>
        </font>

        <!-- "bulk" font mapping -->
        <directory>/Users/furini/Library/Fonts</directory>

      </fonts>
      ...
    </renderer>
    ...
  </renderers>
</fop>
  • each font element points to a font file
  • each font-triplet entry identifies a combination of font-family + font-style (normal, italic, ...) + font-weight (normal, bold, ...) mapped to the font file in the parent font element
  • using directory elements it is also possible to automatically configure all the font files inside the indicated folders (but this takes some time if the folders contain a lot of fonts)
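For the bulk mapping, FOP's font configuration documentation also describes a recursive attribute on directory and an auto-detect element that looks for fonts installed in the operating system. The fragment below is only a sketch of how they could be combined inside the fonts element; the folder path is a placeholder, and availability should be checked against the FOP version in use:

```xml
<fonts>
  <!-- placeholder path: scan this folder and all of its subfolders -->
  <directory recursive="true">/path/to/our/fonts</directory>

  <!-- also register the fonts FOP can find in the system font folders -->
  <auto-detect/>
</fonts>
```

With bulk mapping, FOP registers each font it finds under the name stored in the font file, so the font-family value in the FO file has to match that name.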

If we have a complete file set with specific versions of the desired font (normal, italic, bold, light, bold italic, ...) we can map each file to the precise font triplet, thus producing a very sophisticated PDF.

On the opposite end of the spectrum, we can map all the triplets to the same font file, if that is all we have available: in the output all text will look the same, even if parts of it were marked as italic or bold in the FO file.

Note that we don't need to register all possible font triplets; if one is missing, FOP will use the font registered for a "similar" one (for example, if we don't map the triplet "SimSun,italic,400" FOP will use the font mapped to "SimSun,normal,400", warning us about the font substitution).
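A minimal sketch of the "one file for every triplet" case, reusing the font element from the example above: a single font element can contain several font-triplet children, so bold and italic requests all resolve to the same file without substitution warnings. The extra triplets are illustrative, and the text will not actually look bold or italic:

```xml
<font kerning="yes" embed-url="/Users/furini/Library/Fonts/SimSun.ttf" embedding-mode="subset">
  <!-- every style/weight combination points to the same font file -->
  <font-triplet name="SimSun" style="normal" weight="normal"/>
  <font-triplet name="SimSun" style="normal" weight="bold"/>
  <font-triplet name="SimSun" style="italic" weight="normal"/>
  <font-triplet name="SimSun" style="italic" weight="bold"/>
</font>
```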

We are not done yet, as without the next and last step nothing changes when we process our input file.

Step 3: tell FOP to use the configuration file

If we are calling FOP from the command line, we use the -c option to point to our configuration file, for example:

$ fop -c /path/to/our/fop.xconf input.fo input.pdf

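From Java code we can use the following (this is the FOP 1.x API; in FOP 2.x the configuration file is instead passed when the factory is created, via FopFactory.newInstance(File); see also FOP's site):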

fopFactory.setUserConfig(new File("/path/to/our/fop.xconf"));

Now, at last, the PDF should correctly use the desired fonts and appear as expected.

If instead FOP terminates abruptly with an error like this:

org.apache.fop.cli.Main startFOP
SEVERE: Exception org.apache.fop.apps.FOPException: Failed to resolve font with embed-url '/Users/furini/Library/Fonts/doesNotExist.ttf'

it means that FOP could not find the font file, and the font configuration needs to be checked again; typical causes are

  • a typo in the font URL
  • insufficient privileges to access the font file
Musjid answered 31/1, 2015 at 13:8 Comment(11)
This partly helped solve my issue displaying Japanese on my PDF. However, setting "<xsl:attribute name="font-family">Arial Unicode MS</xsl:attribute>" in my XSLT for the page-sequence nailed it. - Marcelo
Is it possible to display Japanese without a configuration file? We are using a properties bundle for different languages and it works for all the other languages. I want to use a font that supports all languages, including Japanese. Is it possible? - Kristoforo
@RKG: no, without font configuration you cannot embed fonts, and Japanese characters are not supported by the "Base-14" fonts; however, you could configure the same font to be used for all languages if you can find one supporting both Latin and Japanese characters (another comment mentioned Arial Unicode MS, it could be worth a try). - Musjid
Yes, it is working fine after creating a configuration file with the Arial Unicode MS font. - Kristoforo
How do I give the TTF file path in the configuration file? My font file is placed under /webapp/src/main/resources/fonts/ARIALUNI.ttf and the configuration file under /webapp/src/main/resources/fop.xconf. Currently I'm using the "Arial Unicode MS" font family. - Kristoforo
I think relative font paths refer to the configuration file path, so yours should be embed-url="fonts/ARIALUNI.ttf". - Musjid
The disadvantage of using the Arial Unicode MS font is that bold text will be displayed as normal text. I use a separate FOP configuration file for the Chinese text and simply map it to Arial. This way I'm able to use the same report stylesheet with font-family Arial for all languages including Chinese, and I still have bold text for the non-Chinese reports. - Dannydannye
Has anybody used Chinese bold fonts successfully? I tried the Windows-supplied font mingliub.ttc, "MingLiU-ExtB", but it won't render (it displays ###). The normal font mingliu.ttc, "MingLiU", works perfectly, but only for non-bold text. - Dannydannye
@Dannydannye Do you get any warning / error message? Note that each font file must be configured. - Musjid
@Musjid yes, I configured both font collections in exactly the same way with the proper sub-font name: mingliu.ttc with sub-font MingLiU vs. mingliub.ttc with sub-font MingLiU-ExtB; mingliu.ttc works, mingliub.ttc does not. However, I've now switched to uming.ttc due to licensing restrictions. For uming.ttc there doesn't seem to be a bold collection (freedesktop.org/wiki/Software/CJKUnifonts/Download). - Dannydannye
@Musjid I actually see that the fonts are selected correctly in the logfile: 18:27:45,793 INFO TTFFile:1794 - This is a TrueType collection file with 3 fonts 18:27:45,793 INFO TTFFile:1796 - Containing the following fonts: 18:27:45,793 INFO TTFFile:1813 - MingLiU-ExtB <-- selected 18:27:45,793 INFO TTFFile:1815 - PMingLiU-ExtB 18:27:45,793 INFO TTFFile:1815 - MingLiU_HKSCS-ExtB - but the bold glyphs don't show up (instead ##). - Dannydannye
