Get Unicode character by glyph index in a CTFontRef or CGFontRef object

The CTFontRef API provides excellent functions such as CTFontGetGlyphsForCharacters for mapping character(s) to glyph(s). My question is: is there any function for the inverse mapping? That is to say, can I get the character(s) for given glyph(s)? Since I found CTFontCopyCharacterSet for getting all supported characters, I expect there is a good solution.
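A minimal sketch of the brute-force inversion the question hints at, assuming a cmap-level answer is enough (only characters the font maps directly to a glyph): walk every code point in CTFontCopyCharacterSet, ask CTFontGetGlyphsForCharacters for its glyph, and invert the result. `invertMapping` and `glyphToCharacters` are hypothetical helper names. Ligatures and other substitutions are not covered, and the scan is slow for fonts with large repertoires.

```swift
#if canImport(CoreText)
import CoreText
import Foundation
#endif

/// Hypothetical helper: inverts a forward mapping, collecting every key that
/// maps to the same value (several characters can share one glyph).
func invertMapping<K: Hashable & Comparable, V: Hashable>(_ forward: [K: V]) -> [V: [K]] {
    var inverse: [V: [K]] = [:]
    for (key, value) in forward {
        inverse[value, default: []].append(key)
    }
    // Sort so the result does not depend on dictionary iteration order.
    return inverse.mapValues { $0.sorted() }
}

#if canImport(CoreText)
/// Hypothetical helper: glyph -> code points, built by brute force from the
/// font's own character set. Only direct (cmap-level) mappings are found.
func glyphToCharacters(font: CTFont) -> [CGGlyph: [UInt32]] {
    let charSet = CTFontCopyCharacterSet(font) as CharacterSet
    var forward: [UInt32: CGGlyph] = [:]
    // Unicode has 17 planes; skip the ones the font does not cover at all.
    for plane in UInt8(0)...16 where charSet.hasMember(inPlane: plane) {
        for offset in UInt32(0)..<0x1_0000 {
            guard let scalar = Unicode.Scalar((UInt32(plane) << 16) | offset),
                  charSet.contains(scalar) else { continue }
            // The API takes UTF-16, so supplementary characters become surrogate
            // pairs; for a pair the glyph lands in the first slot, the second is 0.
            let utf16 = Array(String(Character(scalar)).utf16)
            var glyphs = [CGGlyph](repeating: 0, count: utf16.count)
            if CTFontGetGlyphsForCharacters(font, utf16, &glyphs, utf16.count) {
                forward[scalar.value] = glyphs[0]
            }
        }
    }
    return invertMapping(forward)
}
#endif
```

If a font's cmap sends both SPACE and NO-BREAK SPACE to the same glyph, that glyph's entry in the result lists both code points, which is exactly why the inverse is one-to-many.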

Marieann answered 12/2, 2011 at 10:8

I think you may end up having to parse the font’s mapping tables yourself. You can obtain access to the tables using CGFontCopyTableForTag(); the table you're after is the 'cmap' table, the format of which is documented here:

http://www.microsoft.com/typography/otspec/cmap.htm

and also here:

http://developer.apple.com/fonts/TTRefMan/RM06/Chap6cmap.html

Unfortunately, as you’ll discover by reading through these, the business of mapping characters to glyphs is decidedly non-trivial, and in addition any given font may have more than one mapping table (i.e. the set of characters that use a given glyph may depend on which mapping table format you, or the renderer, choose).
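A minimal sketch of that table-parsing route, under stated assumptions: it reads only a Unicode format 4 subtable (platform 0, or platform 3 encoding 1), which covers the Basic Multilingual Plane, and ignores other formats such as 12 and 14 as well as any substitutions. `invertCmapFormat4` and `cmapBytes` are hypothetical names; the raw bytes come from CGFontCopyTableForTag, which Swift exposes as `CGFont.table(for:)`.

```swift
#if canImport(CoreGraphics)
import CoreGraphics
import Foundation
#endif

// Big-endian readers; OpenType tables store all integers big-endian.
func readU16(_ bytes: [UInt8], _ offset: Int) -> Int {
    return Int(bytes[offset]) << 8 | Int(bytes[offset + 1])
}

func readU32(_ bytes: [UInt8], _ offset: Int) -> Int {
    return readU16(bytes, offset) << 16 | readU16(bytes, offset + 2)
}

/// Hypothetical helper: glyph ID -> code points, taken from the first Unicode
/// format 4 subtable in a raw 'cmap' table. Assumes a well-formed table.
func invertCmapFormat4(_ cmap: [UInt8]) -> [Int: [Int]] {
    let numTables = readU16(cmap, 2)
    for t in 0..<numTables {
        let record = 4 + 8 * t                // platformID, encodingID, offset32
        let platform = readU16(cmap, record)
        let encoding = readU16(cmap, record + 2)
        let sub = readU32(cmap, record + 4)   // offset from the start of 'cmap'
        guard platform == 0 || (platform == 3 && encoding == 1),
              readU16(cmap, sub) == 4 else { continue }

        let segCount = readU16(cmap, sub + 6) / 2
        let endCodes = sub + 14
        let startCodes = endCodes + 2 * segCount + 2   // skip reservedPad
        let idDeltas = startCodes + 2 * segCount
        let idRangeOffsets = idDeltas + 2 * segCount

        var inverse: [Int: [Int]] = [:]
        for seg in 0..<segCount {
            let start = readU16(cmap, startCodes + 2 * seg)
            let end = readU16(cmap, endCodes + 2 * seg)
            // idDelta is signed, but adding it modulo 65536 gives the same result.
            let delta = readU16(cmap, idDeltas + 2 * seg)
            let rangeOffsetPos = idRangeOffsets + 2 * seg
            let rangeOffset = readU16(cmap, rangeOffsetPos)
            guard start <= end else { continue }
            for code in start...end where code != 0xFFFF {
                var glyph: Int
                if rangeOffset == 0 {
                    glyph = (code + delta) & 0xFFFF
                } else {
                    // idRangeOffset is a byte offset from its own position into glyphIdArray.
                    glyph = readU16(cmap, rangeOffsetPos + rangeOffset + 2 * (code - start))
                    if glyph != 0 { glyph = (glyph + delta) & 0xFFFF }
                }
                if glyph != 0 { inverse[glyph, default: []].append(code) }
            }
        }
        return inverse
    }
    return [:]
}

#if canImport(CoreGraphics)
/// Copies a font's raw 'cmap' table (CGFontCopyTableForTag, table(for:) in Swift).
func cmapBytes(of font: CGFont) -> [UInt8]? {
    let tag: UInt32 = 0x636D_6170   // 'c' 'm' 'a' 'p' packed big-endian
    guard let data = font.table(for: tag) else { return nil }
    return [UInt8](data as Data)
}
#endif
```

On macOS, something like `CGFont("Helvetica" as CFString).flatMap(cmapBytes).map(invertCmapFormat4)` would then give the glyph-to-code-point table for one font; a glyph missing from the result has no direct cmap entry.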

Furthermore, advanced font technology like OpenType or AAT may result in the existence of glyphs for which there is no direct mapping from characters, but that are nevertheless present in the output as a result of substitutions made by the smart font technology. Inverting the OpenType or AAT substitution mechanisms would be tricky, and might also not lead to a single Unicode code point (or indeed even a single grapheme cluster).

Incitement answered 8/4, 2011 at 10:22

TL;DR: CTFont/CTFontRef/CTGlyph aren't sufficient; CTLine and CTRun need to get involved, and even then it's only meaningful if you have the original string that was mapped to glyphs.

I'm coming back to this a few years late in case others end up hitting this question. As alastair noted, there is no way to generically map glyphs back to characters. Simple examples: there are multiple Unicode characters for 'space', often mapped to the same glyph, and the same is often true for 'micro' and Greek 'mu'.
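A small check of that claim for one font, assuming Helvetica (Core Text substitutes a fallback if it is missing): ask CTFontGetGlyphsForCharacters for MICRO SIGN (U+00B5) and GREEK SMALL LETTER MU (U+03BC), and for SPACE (U+0020) and NO-BREAK SPACE (U+00A0), then compare the glyph IDs. Whether each pair shares a glyph depends on the font, so no output is claimed here; `glyphID`, `describe` and `codePointLabel` are hypothetical helpers.

```swift
#if canImport(CoreText)
import CoreText
#endif

/// Formats a UTF-16 code unit as "U+XXXX".
func codePointLabel(_ unit: UInt16) -> String {
    let hex = String(unit, radix: 16, uppercase: true)
    return "U+" + String(repeating: "0", count: max(0, 4 - hex.count)) + hex
}

#if canImport(CoreText)
/// The glyph `font` maps a single BMP code unit to, or nil if there is no mapping.
func glyphID(for unit: UInt16, in font: CTFont) -> CGGlyph? {
    var character = unit
    var result: CGGlyph = 0
    return CTFontGetGlyphsForCharacters(font, &character, &result, 1) ? result : nil
}

func describe(_ glyph: CGGlyph?) -> String {
    return glyph.map { "glyph \($0)" } ?? "no glyph"
}

let font = CTFontCreateWithName("Helvetica" as CFString, 24, nil)
// MICRO SIGN vs GREEK SMALL LETTER MU, and SPACE vs NO-BREAK SPACE.
let pairs: [(UInt16, UInt16)] = [(0x00B5, 0x03BC), (0x0020, 0x00A0)]
for (a, b) in pairs {
    let ga = glyphID(for: a, in: font)
    let gb = glyphID(for: b, in: font)
    let shared = ga != nil && ga == gb
    print("\(codePointLabel(a)): \(describe(ga)), \(codePointLabel(b)): \(describe(gb)), shared: \(shared)")
}
#endif
```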

However, it is sometimes (often?) the case that you have the original string, and what you really want to know is how it was mapped to glyphs. In other words, I've got my string and I've got the resulting glyphs: for each glyph index, which character index in the string does it contribute to? I wrote this sample to demonstrate a way to do this. (Aside: lesson learned, Swift gets a little rough when working with some Core Foundation APIs.)

import AppKit
import CoreText

// Usage: main [/lig] [FontName]
func main(_ args: [String])
{
    var stringAttributes: [NSAttributedString.Key: Any] = [:]
    var fontName = "Zapfino"
    var useLigatures = false

    // args[0] is the executable path; "/lig", if present, precedes the font name.
    var fontNameIndex = 0
    if args.count > 1
    {
        if args[1] == "/lig"
        {
            useLigatures = true
            if args.count > 2 { fontNameIndex = 2 }
        }
        else { fontNameIndex = 1 }
    }

    if fontNameIndex > 0 { fontName = args[fontNameIndex] }

    if let font = NSFont(name: fontName, size: 24.0)
        { stringAttributes[.font] = font }

    // 0 = no ligatures, 2 = all ligatures
    stringAttributes[.ligature] = useLigatures ? 2 : 0

    let string = NSAttributedString(
        string: "This is \(fontName)!",
        attributes: stringAttributes)

    let line = CTLineCreateWithAttributedString(string as CFAttributedString)

    // The demo assumes a single run; font fallback or mixed attributes can produce more.
    let runs = CTLineGetGlyphRuns(line) as! [CTRun]
    assert(runs.count == 1)
    let run = runs[0]

    let glyphCount = CTRunGetGlyphCount(run)
    print("String: \(string.string)")
    print("\tStrLen: \(string.length), Count Of Glyphs: \(glyphCount)")

    // For each glyph, the (UTF-16) index in the string of the character it came from.
    var clusters = [CFIndex](repeating: 0, count: glyphCount)
    CTRunGetStringIndices(run, CFRange(location: 0, length: glyphCount), &clusters)

    for (idx, idxString) in clusters.enumerated()
    {
        print("Glyph @ \(idx) maps to String @ \(idxString)")
    }
}

main(CommandLine.arguments)

If you run this without params and then with /lig at the command line you will get the following output:

    String: This is Zapfino!
        StrLen: 16, Count Of Glyphs: 16
    Glyph @ 0 maps to String @ 0
    Glyph @ 1 maps to String @ 1
    Glyph @ 2 maps to String @ 2
    Glyph @ 3 maps to String @ 3
    Glyph @ 4 maps to String @ 4
    Glyph @ 5 maps to String @ 5
    Glyph @ 6 maps to String @ 6
    Glyph @ 7 maps to String @ 7
    Glyph @ 8 maps to String @ 8
    Glyph @ 9 maps to String @ 9
    Glyph @ 10 maps to String @ 10
    Glyph @ 11 maps to String @ 11
    Glyph @ 12 maps to String @ 12
    Glyph @ 13 maps to String @ 13
    Glyph @ 14 maps to String @ 14
    Glyph @ 15 maps to String @ 15

    joes-mac: Tue Apr 14, 10:26:00
    ~/Source/FontGlyph/./main /lig
    String: This is Zapfino!
        StrLen: 16, Count Of Glyphs: 7
    Glyph @ 0 maps to String @ 0
    Glyph @ 1 maps to String @ 2
    Glyph @ 2 maps to String @ 4
    Glyph @ 3 maps to String @ 5
    Glyph @ 4 maps to String @ 7
    Glyph @ 5 maps to String @ 8
    Glyph @ 6 maps to String @ 15

I added the ligature option to help visualize how easily glyphs and characters can fail to be one-to-one. Here is a visual representation of the two strings:

[Image: the two strings as rendered]
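The indices printed by the /lig run can be taken one step further: within a single run, each glyph covers the characters from its own string index up to the next larger index that appears (or the end of the string). A minimal sketch, assuming one run with contiguous clusters; `characterRanges` is a hypothetical helper whose input is the array CTRunGetStringIndices fills in.

```swift
/// Hypothetical helper: for each glyph, the half-open range of UTF-16 indices in
/// the source string that produced it. Assumes a single run whose clusters are
/// contiguous; glyph order may be left-to-right or right-to-left.
func characterRanges(stringIndices: [Int], stringLength: Int) -> [Range<Int>] {
    // Distinct cluster starts, in string order.
    let starts = Set(stringIndices).sorted()
    return stringIndices.map { start -> Range<Int> in
        // A cluster ends where the next cluster starts, or at the end of the string.
        let end = starts.first(where: { $0 > start }) ?? stringLength
        return start..<end
    }
}

// Usage with the /lig output above.
let text = Array("This is Zapfino!".utf16)
let ranges = characterRanges(stringIndices: [0, 2, 4, 5, 7, 8, 15], stringLength: text.count)
for (glyph, range) in ranges.enumerated() {
    print("Glyph \(glyph) covers \(range): \"\(String(decoding: text[range], as: UTF16.self))\"")
}
```

For those indices the helper assigns glyph 1 the range 2..<4 ("is") and glyph 5 the range 8..<15 ("Zapfino"). Several glyphs sharing one index (a decomposed character) simply get the same range. In a line with several runs, the last cluster of each run should instead end at that run's string range end.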

Bergamo answered 14/4, 2015 at 17:42

© 2022 - 2024 — McMap. All rights reserved.