Delete all unused characters from a TTF-font with Fontforge

How can I delete all characters from a TTF font file that are not used in a given text sample with Fontforge? In other words I want to create a subset from an existing font which contains only the characters that actually show in my text.

Here's a small example: There is a text file that contains the words "사슴 코끼리 당나귀". So how can I delete all the other characters from the font file that are not part of that small text sample? In the end I want to end up with a new TTF file that contains only the used characters. The actual text is longer than this, so searching manually in fontforge is not an option.

FontForge has several selection options (Main Menu > Edit > Select), and I tried "Select by Wildcards" after converting the Korean characters into their Unicode sequences, but no luck yet.

Thanks a lot for any ideas! kind regards

EDIT: Use case: I am creating children's ebooks, which by their nature consist mainly of images. The text, however, is not part of the pictures but is displayed on an additional layer in front of them. The ebook files (I produce mainly for Amazon Kindle) consist of some metadata, the image files, layout information and, of course, the font files. The Amazon Kindle publishing program has very strict file size restrictions: for a book to be sold in the price range I'm going for, the file size must not exceed 3 megabytes. That is fine when I use a Western font. But my ebooks are bilingual, and for the Korean edition I need to add a Korean font (in addition to the Western font). Asian font files are comparatively huge due to the nature of their scripts: storing 20,000 (in extreme cases up to 200,000) glyphs makes for roughly 7-12 megabytes per font weight. Again, my overall limit of 3 megabytes has to cover all the pictures and the font files (plus the layout and metadata files). Since the reader cannot alter the text of an ebook, it is safe to discard all glyphs that are not used in my text. Not filling up the storage of the reader's device unnecessarily is another consideration. I have already compressed the image files heavily and cannot go any further, as the quality starts to suffer beyond a certain compression rate. I hope it is now clear why I think subsetting the font is a good solution.

Flush answered 22/2, 2016 at 14:11 Comment(7)
Question: why? What problem are you trying to solve that you think requires subsetting? - Tansy
Thanks for pointing that out. I just added a use case to the question. - Flush
The use case you describe is already covered by telling Acrobat etc. to generate your document master with subset fonts instead of full embeds. This is not a thing you need to do yourself. - Tansy
Thanks for your answer. I am not using Acrobat. Did you assume that the ebooks are in PDF format, or why did you assume I use Acrobat? The format is .mobi (which is an Amazon version of the .epub format) and it has nothing to do with PDF. There is a program called "Kindle Children's Book Creator" which assists in assembling the .mobi files (which is otherwise done by hand). It does not have the functionality you describe (i.e. subsetting embedded fonts). - Flush
I never said, nor assumed, you were making a PDF. I used "Acrobat etc." because I assume you're using desktop publishing tools, not just "a text editor" or something. Acrobat for PDF, Calibre for EPUB, etc. So: which tools are you using? Because that's the most important information (those tools should take care of subsetting already), and it is still missing. - Tansy
The only DPT software I use is Kindle Kids' Book Creator. As I said before, it does not support font subsetting. (Before that I actually made the books in a text editor, because none of the existing DPTs can create compliant picture(!) books.) - Flush
Can Book Creator read in EPUBs itself? If so, you could run your book through Calibre, tell it to do the font subsetting for you, then load the result back into Book Creator. - Tansy

I have found a way to create a subset of an existing font in FontForge on a semi-automated basis. The key was to use FontForge's scripting capabilities. I used an internet service (see link below) to get the Unicode code points of all the characters that I use in my book. The output looks like this: "\uc6d0\uc22d\uc774\uac1c\ubbf8\uacf0\ubc8c\ub3cc\uace0\ub798"

I took the output of the service and used Notepad++'s "search and replace" functionality to turn it into the following script structure:

SelectMore("uc6d0")
SelectMore("uc22d")
SelectMore("uc774")
SelectMore("uac1c")
SelectMore("ubbf8")
SelectMore("uacf0")
SelectMore("ubc8c")
SelectMore("ub3cc")
SelectMore("uace0")
SelectMore("ub798")

The script just calls the same function, SelectMore(), repeatedly. This function selects the glyph passed as its argument without clearing any previous selection. Note also that this script assumes that FontForge is running and has the font file open. (Link to FontForge scripting help, see below.) To execute a script from within FontForge, select "File -> Execute Script..." from the main menu, paste the script and hit run.
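As an alternative to the web service plus search-and-replace step, the SelectMore() lines can be generated directly from the text. The following is only a minimal sketch: the function name select_more_script is made up for illustration, and it assumes that line breaks and other non-printable characters should be skipped while the ordinary space is kept.

```python
def select_more_script(text):
    """Build one SelectMore("uXXXX") line per distinct printable character.

    Hypothetical helper, not part of FontForge: it only produces script text
    in the same format as the hand-made script above.
    """
    # A set removes duplicates; isprintable() drops newlines, tabs and other
    # control characters but keeps the ordinary space (U+0020).
    codepoints = sorted({ord(ch) for ch in text if ch.isprintable()})
    return "\n".join('SelectMore("u%04x")' % cp for cp in codepoints)


if __name__ == "__main__":
    # Sample text taken from the question; in practice, read the book text
    # from a UTF-8 file instead.
    print(select_more_script("사슴 코끼리 당나귀"))
```

The printed lines can then be pasted into the "Execute Script" dialog in the same way as the hand-made script.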

Now all the used glyphs are selected and all unused glyphs are deselected. In the main menu, choose "Edit -> Select -> Invert Selection" to select the unused glyphs instead. Then run "Encoding -> Detach and Remove Glyphs" from the main menu to remove all the selected (unused) glyphs. Saving the result as a new font gives the required subset.
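The same select, invert, remove and save sequence could probably also be run without the GUI through FontForge's Python bindings. This is an untested sketch rather than the method used above: it assumes the fontforge Python module is available (for example when run as "fontforge -script subset.py"), the helper names used_codepoints and subset_font are invented here, and the file names are placeholders.

```python
def used_codepoints(text):
    """Return the sorted, distinct code points of all printable characters."""
    return sorted({ord(ch) for ch in text if ch.isprintable()})


def subset_font(src_path, dst_path, text):
    """Keep only the glyphs needed for `text` and write a new font file."""
    # Imported here because the module only exists inside a FontForge
    # installation; the helper above stays usable without it.
    import fontforge

    font = fontforge.open(src_path)
    font.selection.none()
    for cp in used_codepoints(text):
        # "more" adds to the current selection, like SelectMore();
        # "unicode" makes the integer a code point, not an encoding slot.
        font.selection.select(("more", "unicode"), cp)
    font.selection.invert()  # now the unused glyphs are selected
    for glyph in list(font.selection.byGlyphs):
        font.removeGlyph(glyph)
    font.generate(dst_path)  # output format follows the file extension
    font.close()


if __name__ == "__main__":
    # Placeholder file names; replace with the real font and book text.
    with open("book.txt", encoding="utf-8") as handle:
        subset_font("korean.ttf", "korean-subset.ttf", handle.read())
```

It is worth opening the generated file in FontForge afterwards to check that every character of the book is still present.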

Flush answered 2/3, 2016 at 17:43 Comment(0)

Great question/answer user3725694. To make script generation more automatic, the following Python code can be used to generate the SelectMore() calls for the printable ASCII characters:

import string

# Emit one FontForge SelectMore() call per character in string.printable.
for c in string.printable:
    print('SelectMore("u%04x")' % ord(c))

It prints:

SelectMore("u0030")
SelectMore("u0031")
...

There's a great article on automatic alternatives to manual FontForge activities

Dudleyduds answered 23/2, 2018 at 21:17 Comment(0)
