OS X - how to calculate normalized file name

I need to create a mapping between file names generated on Windows and on OS X. I know that OS X "converts all file names to decomposed Unicode"; however, "most volume formats do not follow the exact specification for these normal forms".

So it does not seem to be a simple matter of converting the Windows name to NFD with a standard Unicode normalization API and being sure that I have the correct OS X name. Is there a way to determine what the actual OS X file name will be without creating the file in the file system and then scanning the directory to see what was actually created?

Pompidou answered 26/10, 2012 at 15:12 Comment(0)

I think the answer is this note from Technical Note TN1150, "HFS Plus Volume Format":

Note: The Mac OS Text Encoding Converter provides several constants that let you convert to and from the canonical, decomposed form stored on HFS Plus volumes. When using CreateTextEncoding to create a text encoding, you should set the TextEncodingBase to kTextEncodingUnicodeV2_0, set the TextEncodingVariant to kUnicodeCanonicalDecompVariant, and set the TextEncodingFormat to kUnicode16BitFormat. Using these values ensures that the Unicode will be in the same form as on an HFS Plus volume, even as the Unicode standard evolves.

Pompidou answered 26/10, 2012 at 19:22 Comment(2)
That's specific to HFS+, while your question appears to want a generic answer! - Hanseatic
This changed in Mac OS X 10.3. Shortly below that part, it says: “Mac OS versions 8.1 through 10.2.x used decompositions based on Unicode 2.1. Mac OS X version 10.3 and later use decompositions based on Unicode 3.2.” So arguably the right encoding base was kTextEncodingUnicodeV2_1, and is now kTextEncodingUnicodeV3_2 (at least for volumes last mounted writable on post-10.3 systems). - Exiguous

You're probably looking for the -[NSString fileSystemRepresentation] method.

Note that there is no general solution for this task. What counts as a valid file name depends on the file system of the volume you're saving to. Not every file name valid on HFS+ is valid on FAT32, for example.

For Mac's “standard” filesystem (currently HFS+), fileSystemRepresentation should give what you need; for other file systems, there is no general way. Think about ones that don't exist but will be introduced in the future, for example :)

Ackley answered 26/10, 2012 at 19:34 Comment(0)

According to your link, filesystem drivers appear to (mostly) follow one of two behaviours:

* Return all names in NFD, and convert names as appropriate.
* Don't perform any conversions.

In both these cases, if you create a file on OSX in NFD, reading it back on OSX should give you the name in NFD.
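
If that holds, one way to build the mapping is to compare names by their NFD form instead of trying to predict the exact on-disk bytes. The following is only a minimal sketch in Python: the helper names nfd_key and map_names are made up for illustration, and it assumes the Windows names are ordinary Unicode strings (typically precomposed) and that both lists are already in memory.

```python
import unicodedata


def nfd_key(name: str) -> str:
    """Return the canonical decomposed (NFD) form, used only as a comparison key."""
    return unicodedata.normalize("NFD", name)


def map_names(windows_names, mac_names):
    """Map each Windows name to the macOS name with the same NFD key, or None."""
    by_key = {nfd_key(n): n for n in mac_names}
    return {w: by_key.get(nfd_key(w)) for w in windows_names}


# Usage: the same visible name in precomposed and decomposed spelling.
precomposed = "caf\u00e9.txt"   # e-acute as a single code point
decomposed = "cafe\u0301.txt"   # "e" followed by a combining acute accent
mapping = map_names([precomposed], [decomposed, "notes.txt"])
```

Here mapping[precomposed] is the decomposed spelling, because both names share one NFD key. This uses whatever Unicode version the Python build ships with, so it can still disagree with a volume whose decomposition tables are pinned to an older version.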

OTOH, if your filename goes from Windows → NFS → Mac and you want to do some sort of sync, you're out of luck. This is not an easy thing to do, since the underlying problem is a little philosophical: Should filenames be byte strings or Unicode strings? I believe Unix traditionally does the former, and at least in Linux, UTF-8 NFC names are merely a convention.

(It gets worse, since IIRC HFS+ is defined to use Unicode 3.something, so a naïve conversion to NFD might be wrong for characters added/changed since then unless the API you use can guarantee a specific Unicode version.)
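
To illustrate the version problem, here is a rough Python sketch that pins the decomposition to Unicode 3.2 through the standard library's unicodedata.ucd_3_2_0 object. The exempt ranges below are an assumption taken from my reading of Apple's documentation of the HFS+ variant of NFD (U+2000-U+2FFF, U+F900-U+FAFF and U+2F800-U+2FAFF left undecomposed), and the name hfs_like_nfd is hypothetical. Treat it as an approximation, not as what the driver actually does.

```python
import unicodedata

# Assumption: code point ranges that HFS+ reportedly leaves undecomposed.
HFS_EXEMPT = ((0x2000, 0x2FFF), (0xF900, 0xFAFF), (0x2F800, 0x2FAFF))


def _is_exempt(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HFS_EXEMPT)


def hfs_like_nfd(name: str) -> str:
    """Approximate HFS+ decomposition: Unicode 3.2 NFD, skipping exempt ranges."""
    ucd = unicodedata.ucd_3_2_0  # Unicode 3.2.0 database shipped with CPython
    out = []
    run = []  # consecutive non-exempt characters, normalized together
    for ch in name:
        if _is_exempt(ch):
            out.append(ucd.normalize("NFD", "".join(run)))
            run = []
            out.append(ch)  # copied through unchanged
        else:
            run.append(ch)
    out.append(ucd.normalize("NFD", "".join(run)))
    return "".join(out)


# Usage: U+212B ANGSTROM SIGN sits in an exempt range, e-acute does not.
plain_nfd = unicodedata.normalize("NFD", "\u212b")
hfs_form = hfs_like_nfd("\u212b caf\u00e9")
```

Standard NFD turns U+212B into "A" plus a combining ring, while the sketch leaves it alone and still decomposes the e-acute. Splitting the string at exempt characters also means combining marks are never reordered across them, which may or may not match the real implementation.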

Hanseatic answered 26/10, 2012 at 19:41 Comment(0)
