Unable to locate files with long names on Windows with Python

I need to walk through folders with long file names in Windows.

I tried using os.listdir(), but it crashes with long pathnames, which is bad.

I tried using os.walk(), but it silently skips pathnames longer than ~256 characters, which is worse.

I tried the magic word workaround described here, but it only works with mapped drives, not with UNC pathnames.

Here is an example with short pathnames, that shows that UNC pathnames don't work with the magic word trick.

>>> os.listdir('c:\\drivers')
['nusb3hub.cat', 'nusb3hub.inf', 'nusb3hub.sys', 'nusb3xhc.cat', 'nusb3xhc.inf', 'nusb3xhc.sys']
>>> os.listdir('\\\\Uni-hq-srv6\\router')
['2009-04-0210', '2010-11-0909', ... ]

>>> mw=u'\\\\?\\'
>>> os.listdir(mw+'c:\\drivers')
[u'nusb3hub.cat', u'nusb3hub.inf', u'nusb3hub.sys', u'nusb3xhc.cat', u'nusb3xhc.inf', u'nusb3xhc.sys']
>>> os.listdir(mw+'\\\\Uni-hq-srv6\\router')

Traceback (most recent call last):
  File "<pyshell#160>", line 1, in <module>
    os.listdir(mw+'\\\\Uni-hq-srv6\\router')
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: u'\\\\?\\\\\\Uni-hq-srv6\\router\\*.*'

Any idea on how to deal with long pathnames or with unicode UNC pathnames?

Edit:

Following the suggestion of the comments below, I created some test functions to compare Python 2.7 and 3.3, and I added the test of glob.glob and os.listdir after os.chdir.

The os.chdir didn't help as expected (see this comment).

The glob.glob is the only one that in Python 3.3 works better, but only in one condition: using the magic word and with the drive name.

Here is the code I used (it works on both 2.7 and 3.3). I am learning Python now, and I hope these tests make sense:

from __future__ import print_function
import os, glob

mw = u'\\\\?\\'

def walk(root):
    n = 0
    for root, dirs, files in os.walk(root):
        n += len(files)
    return n

def walk_mw(root):
    n = 0
    for root, dirs, files in os.walk(mw + root):
        n += len(files)
    return n

def listdir(root):
    try:
        folders = [f for f in os.listdir(root) if os.path.isdir(os.path.join(root, f))]
        files = [f for f in os.listdir(root) if os.path.isfile(os.path.join(root, f))]
        n = len(files)
        for f in folders:
            n += listdir(os.path.join(root, f))
        return n
    except:
        return 'Crash'

def listdir_mw(root):
    if not root.startswith(mw):
        root = mw + root
    try:
        folders = [f for f in os.listdir(root) if os.path.isdir(os.path.join(root, f))]
        files = [f for f in os.listdir(root) if os.path.isfile(os.path.join(root, f))]
        n = len(files)
        for f in folders:
            n += listdir_mw(os.path.join(root, f))
        return n
    except:
        return 'Crash'

def listdir_cd(root):
    try:
        os.chdir(root)
        folders = [f for f in os.listdir('.') if os.path.isdir(os.path.join(f))]
        files = [f for f in os.listdir('.') if os.path.isfile(os.path.join(f))]
        n = len(files)
        for f in folders:
            n += listdir_cd(f)
        return n
    except:
        return 'Crash'

def listdir_mw_cd(root):
    if not root.startswith(mw):
        root = mw + root
    try:
        os.chdir(root)
        folders = [f for f in os.listdir('.') if os.path.isdir(os.path.join(f))]
        files = [f for f in os.listdir('.') if os.path.isfile(os.path.join(f))]
        n = len(files)
        for f in folders:
            n += listdir_cd(f) # the magic word can only be added the first time
        return n
    except:
        return 'Crash'

def glb(root):
    folders = [f for f in glob.glob(root + '\\*') if os.path.isdir(os.path.join(root, f))]
    files = [f for f in glob.glob(root + '\\*') if os.path.isfile(os.path.join(root, f))]
    n = len(files)
    for f in folders:
        n += glb(os.path.join(root, f))
    return n

def glb_mw(root):
    if not root.startswith(mw):
        root = mw + root
    folders = [f for f in glob.glob(root + '\\*') if os.path.isdir(os.path.join(root, f))]
    files = [f for f in glob.glob(root + '\\*') if os.path.isfile(os.path.join(root, f))]
    n = len(files)
    for f in folders:
        n += glb_mw(os.path.join(root, f))
    return n

def test():
    for txt1, root in [('drive ', r'C:\test'),
                    ('UNC   ', r'\\Uni-hq-srv6\router\test')]:
        for txt2, func in [('walk                    ', walk),
                           ('walk     magic word     ', walk_mw),
                           ('listdir                 ', listdir),
                           ('listdir  magic word     ', listdir_mw),
                           ('listdir              cd ', listdir_cd),
                           ('listdir  magic word  cd ', listdir_mw_cd),
                           ('glob                    ', glb),
                           ('glob     magic word     ', glb_mw)]:
            print(txt1, txt2, func(root))

test()

And here is the result:

  • The number 8 means all the files were found
  • The number 0 means it found nothing at all, but did not crash
  • Any number between 1 and 7 means it failed halfway, without crashing
  • The word Crash means it crashed

-

Python 2.7
drive  walk                     5
drive  walk     magic word      8      * GOOD *
drive  listdir                  Crash
drive  listdir  magic word      8      * GOOD *
drive  listdir              cd  Crash
drive  listdir  magic word  cd  5
drive  glob                     5
drive  glob     magic word      0
UNC    walk                     6
UNC    walk     magic word      0
UNC    listdir                  5
UNC    listdir  magic word      Crash
UNC    listdir              cd  5
UNC    listdir  magic word  cd  Crash
UNC    glob                     5
UNC    glob     magic word      0

Python 3.3
drive  walk                     5
drive  walk     magic word      8      * GOOD *
drive  listdir                  Crash
drive  listdir  magic word      8      * GOOD *
drive  listdir              cd  Crash
drive  listdir  magic word  cd  5
drive  glob                     5
drive  glob     magic word      8      * GOOD *
UNC    walk                     6
UNC    walk     magic word      0
UNC    listdir                  5
UNC    listdir  magic word      Crash
UNC    listdir              cd  5
UNC    listdir  magic word  cd  Crash
UNC    glob                     5
UNC    glob     magic word      0
Benempt answered 22/8, 2013 at 20:35 Comment(6)
how about using net use and assign a drive letter for the UNC?Germanophobe
@iTayb: Thanks. It's ugly, but it should work. I need to scan several network drives, so I should net use them and then net use /delete them.Benempt
I would be interested to know whether this is on py2 or py3 and if py2, whether the behaviour is different on py3?Cheju
I don't use Windows for a long time, just curious: would glob.glob("*") work for you?Ephraimite
This is a great question. Could you make the title more descriptive so people searching in the future will be more likely to find it? Something like "Unable to locate files with long names on Windows with Python"Gallinule
is file name or its full path longer than 256? if the second one, have you tried doing chdir and listing relative paths: os.chdir(mv) os.listdir('Uni-hq-srv6\\router')Bactria
D
7

Use the 8.3 fallback to avoid the long pathname. Browsing in Windows 7 Explorer, this seems to be what Windows itself does, i.e. every long path has a shorter 'true name':

>>> long_unc="\\\\K53\\Users\\Tolan\\testing\\xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\\xxxxxxxxxxxxxxxxxxxxxxxxdddddddddddddddddddddwgggggggggggggggggggggggggggggggggggxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\\esssssssssssssssssssssggggggggggggggggggggggggggggggggggggggggggggggeee"
>>> os.listdir(long_unc)
FileNotFoundError: [WinError 3]

but you can use win32api (pywin32) to build up a shorter version step by step, i.e.:

short_unc=win32api.GetShortPathName(win32api.GetShortPathName(win32api.GetShortPathName("\\\\K53\\Users\\Tolan\\testing\\xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")+"\\xxxxxxxxxxxxxxxxxxxxxxxxdddddddddddddddddddddwgggggggggggggggggggggggggggggggggggxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx") + "\\esssssssssssssssssssssggggggggggggggggggggggggggggggggggggggggggggggeee")
>>> print(short_unc)
\\K53\Users\Tolan\testing\XXXXXX~1\XXXXXX~1\ESSSSS~1
>>> import os
>>> os.listdir(short_unc)
['test.txt']

Clearly you can fold the win32api.GetShortPathName call into your directory exploration rather than nesting the calls as in my example. I used three calls because if you already have a 'too long' path, win32api.GetShortPathName won't cope with it either, but you can call it once per directory and stay below the limit.
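
A minimal sketch of that per-directory approach, assuming pywin32 is installed and 8.3 short names are enabled on the volume. The helper name shorten_stepwise and its get_short parameter are hypothetical, introduced only so the shortening call can be swapped out; by default it uses win32api.GetShortPathName. It shortens the path one component at a time, so each call only sees one long component appended to an already shortened prefix:

```python
import ntpath


def shorten_stepwise(path, get_short=None):
    """Shorten a Windows path one component at a time (hypothetical helper).

    get_short is assumed to behave like win32api.GetShortPathName:
    it takes an existing path and returns its 8.3 form.
    """
    if get_short is None:
        import win32api  # pywin32, Windows only
        get_short = win32api.GetShortPathName

    # For a UNC path the "drive" is \\server\share; for a local path it is C:
    drive, rest = ntpath.splitdrive(ntpath.normpath(path))
    current = drive + u'\\'
    for part in rest.split(u'\\'):
        if part:
            # Only the last component of this path is still long.
            current = get_short(ntpath.join(current, part))
    return current
```

On Windows, os.listdir(shorten_stepwise(long_unc)) would then be expected to behave like the nested calls above, provided every directory in the path actually has a short name.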

Decimate answered 23/8, 2013 at 23:41 Comment(1)
It looks like the nested call is never required. Calling win32api.GetShortPathName('C:\\test\\123456789 123456789 123456789 123456789 123456789\\123456789 123456789 123456789 123456789 123456789') returns C:\test\123456~1\123456~1.Benempt
A
5

To locate files on UNC paths, the magic prefix is \\?\UNC\ rather than just \\?\.

Reference: https://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx#maxpath

So to access //server/share/really/deep/path/etc/etc, you'd need to

  1. Convert it to unicode (in Python 2, use the unicode() constructor)
  2. Add the magic prefix (\\?\UNC\), and
  3. Ensure all directory separators are "\" (see os.path.normpath())

Resulting unicode string: \\?\UNC\server\share\really\deep\path\etc\etc
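
The three steps above can be sketched as a small helper. This is an illustration, not a tested recipe: the name to_extended_path is hypothetical, the input is assumed to be an absolute path, and in Python 2 it should already be a unicode string. It uses ntpath (the Windows flavour of os.path) so the string handling behaves the same on any platform:

```python
import ntpath

EXT_PREFIX = u'\\\\?\\'           # \\?\      for drive-letter paths
EXT_UNC_PREFIX = u'\\\\?\\UNC\\'  # \\?\UNC\  for \\server\share paths


def to_extended_path(path):
    """Return an absolute Windows path with the extended-length prefix.

    Hypothetical helper; assumes `path` is absolute.
    """
    if path.startswith(EXT_PREFIX):
        return path  # already prefixed, leave untouched
    # normpath turns every '/' into '\' and collapses redundant separators
    path = ntpath.normpath(path)
    if path.startswith(u'\\\\'):
        # UNC path: drop the leading '\\' and use the UNC form of the prefix
        return EXT_UNC_PREFIX + path[2:]
    return EXT_PREFIX + path
```

The result could then be passed to os.walk(), subject to the caveat described in this answer.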

I've only experimented a little (much less than @stenci did) but with Python 2.7 it seems to work OK with os.walk(), and to fail with os.listdir().

Caveat: It only works with os.walk() if the starting path for the traversal is within the MAX_PATH limit, and none of the subdirectories in the starting path would push it over the limit either. This is because os.walk() calls os.listdir() on the top directory.

Auspicate answered 17/7, 2015 at 10:47 Comment(1)
One important trick : your long file name must be of unicode and not str type if you use Python 2.x : u'\\\\?\\UNC\\'Gillispie
B
1

In my previous comment I said that nesting the calls to GetShortPathName is not required. It turns out that it is not required most of the time, but once in a while the single call fails. I wasn't able to figure out when, so I wrote this small function, which has been working smoothly for some time:

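import os
import win32api  # pywin32

def short_name(name):
    # Try to shorten the whole path in a single call first.
    try:
        return win32api.GetShortPathName(name)
    except win32api.error:
        # The full path was rejected: shorten the parent directory first,
        # then shorten the (now shorter) joined path.
        dirname = os.path.dirname(name)
        basename = os.path.basename(name)
        short_dirname = win32api.GetShortPathName(dirname)
        return win32api.GetShortPathName(os.path.join(short_dirname, basename))

# Example use: fall back to the short name only when the long one fails.
try:
    mtime = os.path.getmtime(name)
except FileNotFoundError:
    name = short_name(name)
    mtime = os.path.getmtime(name)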
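
A possible variation, sketched under assumptions rather than tested on Windows: when shortening the parent directory also fails, recurse on the parent so the path is split only as far as needed. The name short_name_recursive and the get_short and error parameters are hypothetical; on Windows one would presumably pass win32api.GetShortPathName and win32api.error.

```python
import ntpath


def short_name_recursive(name, get_short, error=Exception):
    """Shorten `name`, falling back to shortening its parents first.

    get_short: callable like win32api.GetShortPathName (assumption)
    error:     exception type raised by get_short on failure (assumption)
    """
    try:
        return get_short(name)
    except error:
        dirname, basename = ntpath.split(name)
        if not basename or dirname == name:
            # Reached the root: nothing left to split, so give up.
            raise
        # Shorten the parent (recursively if needed), then retry the tail.
        short_dirname = short_name_recursive(dirname, get_short, error)
        return get_short(ntpath.join(short_dirname, basename))
```

Because the recursion only starts after a failure, paths that a single call can handle still cost one call.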
Benempt answered 15/7, 2016 at 17:37 Comment(1)
Would calling short_name(dirname) instead of GetShortPathName(dirname) be a good improvement? It would decompose the path to the minimum required and then add all the remaining bits...Spew
