ReportLab: working with Chinese/Unicode characters

TL;DR: Is there some way of telling ReportLab to use a specific font, and fall back to another if glyphs for some characters are missing? Alternatively, do you know of a condensed TrueType font which contains the glyphs for all European languages, Hebrew, Russian, Chinese, Japanese and Arabic?

I've been creating reports with ReportLab, and have encountered problems rendering strings containing Chinese characters. The font I've been using is DejaVu Sans Condensed, which does not contain glyphs for Chinese. It does, however, contain Cyrillic, Hebrew, Arabic and all sorts of umlauts for European language support, which makes it pretty versatile, and I need them all from time to time.

Chinese, however, is not supported by the font, and I've not been able to find a TrueType font which supports ALL languages and meets our graphic design requirements. As a temporary workaround, I made reports for Chinese customers use an entirely different font, containing only English and Chinese glyphs, hoping that characters in other languages won't be present in the strings. However, this is, for obvious reasons, clunky, and it breaks the graphic design, since it's not DejaVu Sans, around which the whole look and feel has been designed.

So the question is: how would you deal with the need to support multiple languages in one document while keeping a specified font for each language? This is complicated by the fact that strings sometimes contain a mix of languages, so determining which ONE font should be used for each string is not an option.

Is there some way of telling ReportLab to use a specific font, and fall back to another if glyphs for some characters are missing? I found vague hints in the docs that it should be possible, although I may have misunderstood them.

Alternatively, do you know of a condensed TrueType font which contains the glyphs for all European languages, Hebrew, Russian, Chinese, Japanese and Arabic?

Thanks.

Massimo answered 3/2, 2016 at 8:45 Comment(1)
I don't know a full answer, but I believe using any Unicode font (en.wikipedia.org/wiki/Unicode_font) should help you show characters in many languages. - Merkley

This question fascinated me all week, so when the weekend came I dived right into it and found a solution, which I called MultiFontParagraph. It is a normal Paragraph with one big difference: you can set an exact font fallback order.

[image: example of the font fallback working]

For example, this random Japanese text I pulled off the internet was rendered with the font fallback order "Bauhaus", "Arial", "HanaMinA". The code checks whether the first font has a glyph for the character; if so, it uses that font, and if not, it falls back to the next font. Currently the code isn't very efficient, as it places tags around each character. This can easily be fixed, but for clarity I didn't do it here.

Using the following code I created the above example:

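from reportlab.lib.styles import getSampleStyleSheet

styles = getSampleStyleSheet()
foreign_string = u'6905\u897f\u963f\u79d1\u8857\uff0c\u5927\u53a6\uff03\u5927'
# Raw strings keep the backslashes in the Windows paths literal
P = MultiFontParagraph(foreign_string, styles["Normal"],
                     [  ("Bauhaus", r"C:\Windows\Fonts\BAUHS93.TTF"),
                        ("Arial", r"C:\Windows\Fonts\arial.ttf"),
                        ("HanaMinA", r"C:\Windows\Fonts\HanaMinA.ttf")])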

The source of the MultiFontParagraph (git) is as follows:

from xml.sax.saxutils import escape

from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.platypus import Paragraph


class MultiFontParagraph(Paragraph):
    # Created by B8Vrede for http://stackoverflow.com/questions/35172207/
    def __init__(self, text, style, fonts_locations):

        font_list = []
        for font_name, font_location in fonts_locations:
            # Load the font
            font = TTFont(font_name, font_location)

            # Get the widths of all characters the font knows (keyed by code point)
            font_widths = font.face.charWidths

            # Register the font so that it can be used
            pdfmetrics.registerFont(font)

            # Store the font name and its widths in a list for lookup
            font_list.append((font_name, font_widths))

        # Set up the string to hold the new text
        new_text = u''

        # Loop through the string
        for char in text:

            # Escape markup characters such as "<" and "&" for the Paragraph parser
            safe_char = escape(char)

            # Loop through the fonts in fallback order
            for font_name, font_widths in font_list:

                # Check whether this font knows the width of the character;
                # if so, it has a glyph for it, so use it
                if ord(char) in font_widths:

                    # Set the working font for the current character
                    new_text += u'<font name="{}">{}</font>'.format(font_name, safe_char)
                    break
            else:
                # No listed font has the glyph: keep the character in the style's own font
                new_text += safe_char

        Paragraph.__init__(self, new_text, style)
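
The per-character tagging mentioned earlier can be avoided by grouping consecutive characters that resolve to the same font, so each run gets a single tag. The following is only a minimal sketch, assuming the same list of (font_name, font_widths) pairs that MultiFontParagraph builds. The helper names pick_font and build_markup are hypothetical, and the width tables at the bottom are stand-ins so the sketch runs without any font files:

```python
from itertools import groupby
from xml.sax.saxutils import escape


def pick_font(char, font_list):
    """Return the name of the first font whose width table has char, else None."""
    for font_name, font_widths in font_list:
        if ord(char) in font_widths:
            return font_name
    return None


def build_markup(text, font_list):
    """Wrap each run of consecutive characters sharing a font in one <font> tag."""
    parts = []
    for font_name, run in groupby(text, key=lambda ch: pick_font(ch, font_list)):
        # Escape "<", ">" and "&" so they survive the Paragraph markup parser
        chunk = escape(u''.join(run))
        if font_name is None:
            # No listed font has these glyphs: leave them in the paragraph's own font
            parts.append(chunk)
        else:
            parts.append(u'<font name="{}">{}</font>'.format(font_name, chunk))
    return u''.join(parts)


# Stand-in width tables (code point -> width); real ones would come from
# TTFont(...).face.charWidths as in the class above
latin = {ord(c): 500 for c in u'abc 123'}
cjk = {0x4E16: 1000, 0x754C: 1000}

markup = build_markup(u'abc \u4e16\u754c', [("Latin", latin), ("CJK", cjk)])
# markup == u'<font name="Latin">abc </font><font name="CJK">\u4e16\u754c</font>'
```

The resulting string could then be passed to Paragraph.__init__ in place of new_text. For mostly single-script text this reduces the number of tags from one per character to one per run.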
Birdie answered 12/2, 2016 at 20:33 Comment(3)
I haven't tested it yet, but it seems it would work. Thing is, although this solution is correct, it was exactly the solution I was trying to avoid :) Since there's no way other than iterating through all the fonts for each character in the text, and some of the reports are hundreds of pages long, this might result in quite the performance hit. Plus, Paragraph() is not the only problematic element. I also draw text directly on a canvas in some cases (not using the Flowable construct), although this solution can be replicated there. Anyway, thanks for the response and kudos on the solution. - Massimo
Oh, by the way, I ended up merging the different fonts I need into one TTF file. This makes the whole thing work seamlessly. - Massimo
Merging the fonts is always the easiest solution, I think. But with this solution the complexity isn't that high: in the worst case it is O(N*F*1), where N is the number of characters, F is the number of fonts specified, and the 1 is for the dictionary lookup. If the proper fonts are chosen, it will only have to check 2 or 3 fonts to find one that can supply the character needed. - Birdie
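
For text drawn directly on a canvas, as raised in the comments above, the same lookup could be applied run by run. This is only a rough sketch under several assumptions: draw_string_with_fallback and its default_font parameter are hypothetical names, the fonts in font_list are assumed to be registered already, and it relies only on the canvas methods setFont, drawString and stringWidth. It does not handle right-to-left scripts such as Hebrew or Arabic, nor line wrapping.

```python
from itertools import groupby


def draw_string_with_fallback(canvas, x, y, text, font_list, size,
                              default_font="Helvetica"):
    """Draw text left to right, switching fonts per run; return the final x."""

    def pick_font(char):
        # First font in fallback order whose width table contains the character
        for font_name, font_widths in font_list:
            if ord(char) in font_widths:
                return font_name
        # Assumption: fall back to a standard font when nothing matches
        return default_font

    for font_name, run in groupby(text, key=pick_font):
        chunk = u''.join(run)
        canvas.setFont(font_name, size)
        canvas.drawString(x, y, chunk)
        # Advance by the width of the run in the font that was just used
        x += canvas.stringWidth(chunk, font_name, size)
    return x
```

A call would look like draw_string_with_fallback(c, 50, 750, foreign_string, font_list, 12), where c is a reportlab.pdfgen.canvas.Canvas and font_list holds (font_name, font_widths) pairs as in MultiFontParagraph.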

From Google Noto Fonts:

Google has been developing a font family called Noto, which aims to support all languages with a harmonious look and feel.

The unified Noto Sans font includes a single font, supporting 581 languages from the following areas:

[image: list of the supported language areas]

Others such as Hebrew, Arabic and Japanese are listed as separate items on the Noto website.

Bandeen answered 13/2, 2016 at 10:12 Comment(0)

We can also use ReportLab's built-in CID font support for Chinese:

from reportlab.pdfgen import canvas
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.cidfonts import UnicodeCIDFont

# Register the Chinese font with Reportlab
pdfmetrics.registerFont(UnicodeCIDFont('STSong-Light'))

# Create a new canvas
c = canvas.Canvas("sample.pdf")

# Set the font to the Chinese font
c.setFont('STSong-Light', 32)

# Draw some Chinese characters
c.drawString(50, 750, '世界,你好!')

# Save the PDF
c.save()
Uveitis answered 10/3, 2023 at 4:25 Comment(0)
