OpenType Font Parsing for Pleasure and Profit (anyone understand these stupid tables?)

This is mainly for fun: I'm poking around inside a few fonts, and I have a few questions I'd really appreciate some help with if anyone has done this kind of thing.

cmap table

The fonts I am testing with contain several cmap subtables of different formats. I can read them, but I don't understand which one I should be using, i.e. what is the strategy for choosing the most appropriate subtable? Does the question even make sense?

glyf table

This is really making my head hurt. I'm going by what is on here. Looking at the second table on that page, I've got 'n' endPtsOfContours, 'n' instructions and 'n' flags, but it is not clear to me whether I have the same number of flags as contours (I know how many contours I have). Then, to make matters worse (fun!), I have an array of xCoords and an array of yCoords. These arrays seem to be of indeterminate length and may contain data of either BYTE or SHORT, but we are not going to tell you which.

Ok, I suppose this is what the instructions and flags are for but as you can probably tell I don't really know how to deal with them. Do I need a TrueType interpreter to access the coordinate data?

Vaporish answered 2/6, 2010 at 12:2 Comment(2)
Python's FontTools is very helpful for hacking OTFs, if that's a possibility. - Embank
Thanks @Embank, any code that can do this is a great help. - Vaporish

You are correct, of course.

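flags bit 1: If set, the corresponding x-coordinate is 1 byte long (an unsigned magnitude whose sign comes from bit 4). If not set, it is a signed 2-byte value, unless bit 4 is set, in which case the x-coordinate is omitted because it is the same as the previous one.

flags bit 2: If set, the corresponding y-coordinate is 1 byte long (an unsigned magnitude whose sign comes from bit 5). If not set, it is a signed 2-byte value, unless bit 5 is set, in which case the y-coordinate is omitted because it is the same as the previous one.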
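
To make the flag handling concrete, here is a minimal Python sketch of decoding one simple glyph record, based on my reading of the glyf table layout. It assumes one logical flag per point, a point count equal to the last endPtsOfContours value plus one, and that bit 3 of a flag means the next byte is a repeat count. The name parse_simple_glyph and the constant names are made up for illustration. The instruction bytes are only skipped here, since they hint the outline rather than store it, so a TrueType interpreter should not be needed just to read the coordinates.

```python
import struct

# Flag bits for simple glyphs (names are illustrative, not from any library).
ON_CURVE = 0x01       # bit 0: point is on the curve
X_SHORT = 0x02        # bit 1: x delta is one unsigned byte
Y_SHORT = 0x04        # bit 2: y delta is one unsigned byte
REPEAT = 0x08         # bit 3: next byte says how many more times to repeat this flag
X_SAME_OR_POS = 0x10  # bit 4: sign of a short x delta, or "x unchanged"
Y_SAME_OR_POS = 0x20  # bit 5: sign of a short y delta, or "y unchanged"


def parse_simple_glyph(data):
    """Return (end_pts, points) for one simple glyph record, header included."""
    (num_contours,) = struct.unpack_from(">h", data, 0)
    if num_contours < 0:
        raise ValueError("composite glyph, not handled in this sketch")
    pos = 10  # skip numberOfContours plus the xMin/yMin/xMax/yMax bounding box
    end_pts = struct.unpack_from(">%dH" % num_contours, data, pos)
    pos += 2 * num_contours
    (instr_len,) = struct.unpack_from(">H", data, pos)
    pos += 2 + instr_len  # skip the hinting instructions entirely
    num_points = end_pts[-1] + 1 if end_pts else 0

    # Expand the run-length-compressed flags to one flag per point.
    flags = []
    while len(flags) < num_points:
        flag = data[pos]
        pos += 1
        flags.append(flag)
        if flag & REPEAT:
            flags.extend([flag] * data[pos])
            pos += 1

    def read_coords(pos, short_bit, same_bit):
        coords, value = [], 0
        for flag in flags:
            if flag & short_bit:        # one byte, sign taken from same_bit
                delta = data[pos] if flag & same_bit else -data[pos]
                pos += 1
            elif flag & same_bit:       # no bytes stored, coordinate unchanged
                delta = 0
            else:                       # signed 16-bit delta
                (delta,) = struct.unpack_from(">h", data, pos)
                pos += 2
            value += delta              # stored values are deltas, not absolutes
            coords.append(value)
        return coords, pos

    xs, pos = read_coords(pos, X_SHORT, X_SAME_OR_POS)
    ys, pos = read_coords(pos, Y_SHORT, Y_SAME_OR_POS)
    points = [(x, y, bool(f & ON_CURVE)) for x, y, f in zip(xs, ys, flags)]
    return end_pts, points


# Hand-built triangle: one contour, three points, no instructions.
demo = (
    struct.pack(">hhhhh", 1, -200, 0, 100, 200)  # header
    + struct.pack(">HH", 2, 0)                   # endPtsOfContours, instructionLength
    + bytes([0x33, 0x35, 0x21])                  # flags
    + bytes([100]) + struct.pack(">h", -300)     # x deltas: +100, (same), -300
    + bytes([200])                               # y deltas: (same), +200, (same)
)
print(parse_simple_glyph(demo))
# -> ((2,), [(100, 0, True), (100, 200, True), (-200, 200, True)])
```

The x array is read in full before the y array, and both are driven by the same expanded flag list, which is why the arrays look like they have no fixed length until the flags are decoded.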

I wrote code to walk the TrueType tables a long time ago--in C of course. I suppose you can compare your results with the output of TTFDump (still available from Microsoft).

Trahurn answered 2/6, 2010 at 15:53 Comment(2)
Thanks @Trahurn, I'm still struggling to see how many flags and coords I might have, but having the output of TTFDump would be a great help. - Vaporish
So I used TTFDump on a few fonts and it makes it pretty clear. - Vaporish

For the cmap encoding, favor any Unicode encoding first: either platform ID 0, or platform ID 3 with encoding ID 10 or 1 (the platform ID values are listed with the name table). Among those, favor cmap format 12 (the complete Unicode space) over format 4 (only the Basic Multilingual Plane). After that, the relative priorities of the encodings become more vague {Wansung, BIG5, PRC, Shift-JIS...}, but also less important, since a font tends to be mainly a Japanese, Chinese, or Korean font, not all of the above at once. Formats 4 and 12 are by far the most common; formats 0, 2, and 6 are much rarer. Format 14 can be found in CJK fonts with variation selectors as a supplement to format 4 or 12, and format 13 can be found in a special "last resort" font (used during font fallback when no good choice supports the given text).
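
As a rough illustration of that ordering, here is a small Python sketch that ranks subtable records. It assumes the cmap header has already been read into (platform_id, encoding_id, format) tuples; the function name pick_cmap_subtable and the exact tie-breaking are my own choices, not something the spec prescribes. Format 14 is skipped as a primary mapping because it only supplements another subtable.

```python
# Lower rank is better; formats other than 12 and 4 share the last place.
FORMAT_RANK = {12: 0, 4: 1}


def is_unicode(platform_id, encoding_id):
    """Platform 0 is Unicode; platform 3 is Unicode for encodings 1 and 10."""
    return platform_id == 0 or (platform_id == 3 and encoding_id in (1, 10))


def pick_cmap_subtable(records):
    """Pick a preferred (platform_id, encoding_id, format) record, or None."""
    # Format 14 (variation selectors) only supplements another subtable.
    candidates = [rec for rec in records if rec[2] != 14]
    if not candidates:
        return None

    def rank(rec):
        platform_id, encoding_id, fmt = rec
        unicode_rank = 0 if is_unicode(platform_id, encoding_id) else 1
        return (unicode_rank, FORMAT_RANK.get(fmt, 2))

    # min() keeps the first record among equally ranked ones.
    return min(candidates, key=rank)


subtables = [(1, 0, 0), (0, 3, 4), (3, 1, 4), (3, 10, 12), (0, 5, 14)]
print(pick_cmap_subtable(subtables))
# -> (3, 10, 12)
```

If no Unicode subtable exists, this falls back to whatever legacy encoding comes first, which matches the point that the remaining priorities are vague and rarely matter in practice.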

Wrasse answered 26/10, 2012 at 4:10 Comment(0)
