How to convert a path to a Mac OS X path, the almost-NFD normal form?

1

8

Macs normally operate on the HFS+ file system, which normalizes paths. That is, if you save a file with an accented é in its name (u'\xe9'), for example, and then call os.listdir, you will see that the filename was converted to u'e\u0301'. This is ordinary Unicode NFD normalization, which the Python unicodedata module can handle. Unfortunately, HFS+ is not fully consistent with NFD, meaning some paths will not be normalized. For example, 福 (u'\ufa1b') is left unchanged, although its NFD form is u'\u798f'.
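
To make the mismatch concrete, here is a small sketch of what plain NFD does with the two characters mentioned above. It only uses the standard unicodedata module; the helper name nfd is made up for illustration, and nothing here touches the file system.

```python
import unicodedata

def nfd(text):
    # Plain Unicode NFD; this is NOT exactly what HFS+ applies to file names.
    return unicodedata.normalize('NFD', text)

# é decomposes into "e" + combining acute accent, matching what HFS+ stores.
assert nfd(u'\xe9') == u'e\u0301'

# 福 (U+FA1B) decomposes to U+798F under NFD, whereas HFS+ leaves it unchanged.
assert nfd(u'\ufa1b') == u'\u798f'
```

So unicodedata.normalize alone gives the right answer for é but the wrong one for 福.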

So, how can I do this normalization in Python? I would be fine using native APIs as long as I can call them from Python.

Extract answered 8/8, 2013 at 22:50 Comment(4)
A stupid hack that should work: make an empty file in a temp directory and list it. - Waver
Note that the temp file hack gets very expensive when you consider that you can be passed a path that represents a deep directory structure. You would need to do os.makedirs, touch the file, and then walk the directory structure to see what got created. - Extract
Presumably the normalization is consistent between directory and file names, so you could split the parts and only make files for the ones that have possibly-changing characters, to avoid walking directories. But yes, this is obviously not a very good solution. - Waver
It seems like this is practically a duplicate of #13090082, and that seems to have the answer I need: NSString fileSystemRepresentation. Not sure if this should be marked duplicate or deleted or what... - Extract
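
For reference, the temp-file hack from the comments above could look roughly like the sketch below. It is written for Python 3, the function names are hypothetical, and it assumes the temp directory lives on the same kind of file system as the paths you care about. As suggested, it splits the path and only creates a file for components that contain non-ASCII characters.

```python
import os
import shutil
import tempfile

def fs_normalize_component(name):
    # Create an empty file called `name` in a fresh temp directory and
    # read back the name the file system actually stored.
    tmpdir = tempfile.mkdtemp()
    try:
        open(os.path.join(tmpdir, name), 'w').close()
        return os.listdir(tmpdir)[0]
    finally:
        shutil.rmtree(tmpdir)

def fs_normalize_path(path):
    # Pure-ASCII components (including '', '.' and '..') are kept as they
    # are; only components with non-ASCII characters hit the disk.
    parts = path.split('/')
    return '/'.join(
        p if all(ord(c) < 128 for c in p) else fs_normalize_component(p)
        for p in parts
    )
```

This still costs one file creation per non-ASCII component, so it is more of a fallback than a real solution.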
5

Well, I decided to write out the Python solution, since the related question I pointed to was more about Objective-C.

First you need to install https://pypi.python.org/pypi/pyobjc-core and https://pypi.python.org/pypi/pyobjc-framework-Cocoa. Then the following should work:

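import sys

from Foundation import NSString, NSAutoreleasePool

def fs_normalize(path):
    # `path` should be unicode (Python 2); see the note below about str input.
    # The pool collects objects autoreleased during this call.
    _pool = NSAutoreleasePool.alloc().init()
    # Returns the path as the file system would store it, as an encoded str.
    normalized_path = NSString.fileSystemRepresentation(path)
    # Decode back to unicode using the file system encoding.
    upath = unicode(normalized_path, sys.getfilesystemencoding() or 'utf8')
    return upath

if __name__ == '__main__':
    e = u'\xe9'              # precomposed é
    j = u'\ufa1b'            # 福, which HFS+ leaves alone
    e_expected = u'e\u0301'  # e + combining acute accent

    assert fs_normalize(e) == e_expected
    assert fs_normalize(j) == j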

Note that NSString.fileSystemRepresentation() also seems to accept str input. I had some cases where it returned garbage for str input, so I think it is safer to pass it unicode. It always returns a str, so you need to convert the result back to unicode.
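
If installing PyObjC is not an option, a pure-Python approximation might be good enough. The sketch below assumes, based on my reading of Apple's description of the HFS+ decomposition, that characters in the ranges U+2000 to U+2FFF, U+F900 to U+FAFF and U+2F800 to U+2FAFF are left undecomposed and everything else gets NFD. The function name is made up, and I have not verified it against a real HFS+ volume beyond the two examples from the question, so treat it as a starting point only.

```python
import re
import unicodedata

# Runs of characters assumed to be skipped by the HFS+ decomposition.
# The ranges are an assumption taken from Apple's description, not measured.
_HFS_SKIPPED = re.compile(u'([\u2000-\u2fff\uf900-\ufaff\U0002f800-\U0002faff]+)')

def hfs_nfd_approx(text):
    # With one capturing group, re.split puts the skipped runs at odd
    # indices and the text between them at even indices.
    pieces = _HFS_SKIPPED.split(text)
    return u''.join(
        piece if i % 2 else unicodedata.normalize('NFD', piece)
        for i, piece in enumerate(pieces)
    )

assert hfs_nfd_approx(u'\xe9') == u'e\u0301'    # decomposed like NFD
assert hfs_nfd_approx(u'\ufa1b') == u'\ufa1b'   # left unchanged
```

Python's unicodedata may also use a newer Unicode version than the tables built into HFS+, so recently added characters could still come out differently.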

Extract answered 9/8, 2013 at 21:37 Comment(0)
