cross-platform splitting of path in python
W

8

48

I'd like something that has the same effect as this:

>>> path = "/foo/bar/baz/file"
>>> path_split = path.rsplit('/')[1:]
>>> path_split
['foo', 'bar', 'baz', 'file']

But that will work with Windows paths too. I know that there is an os.path.split() but that doesn't do what I want, and I didn't see anything that does.

Wolters answered 2/1, 2011 at 18:52 Comment(1)
BTW Python has os.path that assumes your current OS path syntax but there are also OS-specific path modules called posixpath, ntpath, macpath and os2emxpath with the same interface.Fisherman
O
23

The OP specified "will work with Windows paths too". There are a few wrinkles with Windows paths.

Firstly, Windows has the concept of multiple drives, each with its own current working directory, so 'c:foo' and 'c:\\foo' are often not the same. Consequently it is a very good idea to separate out any drive designator first, using os.path.splitdrive(). Reassembling the path (if required) can then be done correctly with drive + os.path.join(*other_pieces).
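As a rough illustration of the drive point, here is a small sketch that uses ntpath (the Windows flavour of os.path, importable on any OS) so the result does not depend on where it runs. The variable names are only for illustration.

```python
import ntpath  # Windows path rules, available on every platform

# 'c:foo' is relative to the current directory of drive c:
drive_rel, rest_rel = ntpath.splitdrive("c:foo")

# 'c:\\foo' is rooted at the top of drive c:
drive_abs, rest_abs = ntpath.splitdrive("c:\\foo")

# Reassemble from the drive plus the remaining pieces
rebuilt_abs = drive_abs + ntpath.join("\\", "foo")
rebuilt_rel = drive_rel + ntpath.join("foo")

print(drive_rel, repr(rest_rel))   # c: 'foo'
print(drive_abs, repr(rest_abs))   # c: '\\foo'
print(rebuilt_abs, rebuilt_rel)    # c:\foo c:foo
```

Note that ntpath.join() needs at least one argument, so an empty list of pieces would have to be handled separately.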

Secondly, Windows paths can contain slashes or backslashes or a mixture. Consequently, using os.sep when parsing an unnormalised path is not useful.
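To make the mixed-separator problem concrete, the sketch below compares a naive split on the separator with a split done after normalising. It again uses ntpath so it behaves the same on any OS; the sample path is made up. Bear in mind that normpath() also collapses '..' segments and drops trailing separators, which may or may not be what you want.

```python
import ntpath

mixed = "c:/users\\john/foo.txt"

# Naive: splitting on the backslash alone leaves the forward slashes in place
naive = mixed.split(ntpath.sep)

# Normalise first so that every separator becomes a backslash
normalised = ntpath.normpath(mixed)
drive, rest = ntpath.splitdrive(normalised)
pieces = [p for p in rest.split(ntpath.sep) if p]

print(naive)          # ['c:/users', 'john/foo.txt']
print(normalised)     # c:\users\john\foo.txt
print(drive, pieces)  # c: ['users', 'john', 'foo.txt']
```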

More generally:

The results produced for 'foo' and 'foo/' should not be identical.

The loop termination condition seems to be best expressed as "os.path.split() treated its input as unsplittable".

Here's a suggested solution, with tests, including a comparison with @Spacedman's solution

import os.path

def os_path_split_asunder(path, debug=False):
    parts = []
    while True:
        newpath, tail = os.path.split(path)
        if debug: print("%r %r" % (path, (newpath, tail)))
        if newpath == path:
            assert not tail
            if path: parts.append(path)
            break
        parts.append(tail)
        path = newpath
    parts.reverse()
    return parts

def spacedman_parts(path):
    components = [] 
    while True:
        (path,tail) = os.path.split(path)
        if not tail:
            return components
        components.insert(0,tail)

if __name__ == "__main__":
    tests = [
        '',
        'foo',
        'foo/',
        'foo\\',
        '/foo',
        '\\foo',
        'foo/bar',
        '/',
        'c:',
        'c:/',
        'c:foo',
        'c:/foo',
        'c:/users/john/foo.txt',
        '/users/john/foo.txt',
        'foo/bar/baz/loop',
        'foo/bar/baz/',
        '//hostname/foo/bar.txt',
        ]
    for i, test in enumerate(tests):
        print("\nTest %d: %r" % (i, test))
        drive, path = os.path.splitdrive(test)
        print("drive, path %r %r" % (drive, path))
        a = os_path_split_asunder(path)
        b = spacedman_parts(path)
        print("a ... %r" % a)
        print("b ... %r" % b)
        print(a == b)

and here's the output (Python 2.7.1, Windows 7 Pro):

Test 0: ''
drive, path '' ''
a ... []
b ... []
True

Test 1: 'foo'
drive, path '' 'foo'
a ... ['foo']
b ... ['foo']
True

Test 2: 'foo/'
drive, path '' 'foo/'
a ... ['foo', '']
b ... []
False

Test 3: 'foo\\'
drive, path '' 'foo\\'
a ... ['foo', '']
b ... []
False

Test 4: '/foo'
drive, path '' '/foo'
a ... ['/', 'foo']
b ... ['foo']
False

Test 5: '\\foo'
drive, path '' '\\foo'
a ... ['\\', 'foo']
b ... ['foo']
False

Test 6: 'foo/bar'
drive, path '' 'foo/bar'
a ... ['foo', 'bar']
b ... ['foo', 'bar']
True

Test 7: '/'
drive, path '' '/'
a ... ['/']
b ... []
False

Test 8: 'c:'
drive, path 'c:' ''
a ... []
b ... []
True

Test 9: 'c:/'
drive, path 'c:' '/'
a ... ['/']
b ... []
False

Test 10: 'c:foo'
drive, path 'c:' 'foo'
a ... ['foo']
b ... ['foo']
True

Test 11: 'c:/foo'
drive, path 'c:' '/foo'
a ... ['/', 'foo']
b ... ['foo']
False

Test 12: 'c:/users/john/foo.txt'
drive, path 'c:' '/users/john/foo.txt'
a ... ['/', 'users', 'john', 'foo.txt']
b ... ['users', 'john', 'foo.txt']
False

Test 13: '/users/john/foo.txt'
drive, path '' '/users/john/foo.txt'
a ... ['/', 'users', 'john', 'foo.txt']
b ... ['users', 'john', 'foo.txt']
False

Test 14: 'foo/bar/baz/loop'
drive, path '' 'foo/bar/baz/loop'
a ... ['foo', 'bar', 'baz', 'loop']
b ... ['foo', 'bar', 'baz', 'loop']
True

Test 15: 'foo/bar/baz/'
drive, path '' 'foo/bar/baz/'
a ... ['foo', 'bar', 'baz', '']
b ... []
False

Test 16: '//hostname/foo/bar.txt'
drive, path '' '//hostname/foo/bar.txt'
a ... ['//', 'hostname', 'foo', 'bar.txt']
b ... ['hostname', 'foo', 'bar.txt']
False
Osculate answered 2/1, 2011 at 22:46 Comment(2)
I'm on OS X. os_path_split_asunder(r'c:\windows\program files\winword.exe') >>> ['c:\\windows\\program files\\winword.exe']Inflexible
Except in Python 2.7 you have os.path.normpath(), so normalize the path, then split it using os.sep. The objective should be behavioral identity, not literal identity. ('C:/dir/..\\file2' === 'C:\\file2').Eyewitness
H
77

Python 3.4 introduced a new module, pathlib. pathlib.Path provides file-system-related methods, while pathlib.PurePath operates completely independently of the file system. PurePath uses the path flavour of the OS it runs on; the output below is from Windows:

>>> from pathlib import PurePath
>>> path = "/foo/bar/baz/file"
>>> path_split = PurePath(path).parts
>>> path_split
('\\', 'foo', 'bar', 'baz', 'file')

You can use PurePosixPath and PureWindowsPath explicitly when desired:

>>> from pathlib import PureWindowsPath, PurePosixPath
>>> PureWindowsPath(path).parts
('\\', 'foo', 'bar', 'baz', 'file')
>>> PurePosixPath(path).parts
('/', 'foo', 'bar', 'baz', 'file')

And of course, it works with Windows paths as well:

>>> wpath = r"C:\foo\bar\baz\file"
>>> PurePath(wpath).parts
('C:\\', 'foo', 'bar', 'baz', 'file')
>>> PureWindowsPath(wpath).parts
('C:\\', 'foo', 'bar', 'baz', 'file')
>>> PurePosixPath(wpath).parts
('C:\\foo\\bar\\baz\\file',)
>>>
>>> wpath = r"C:\foo/bar/baz/file"
>>> PurePath(wpath).parts
('C:\\', 'foo', 'bar', 'baz', 'file')
>>> PureWindowsPath(wpath).parts
('C:\\', 'foo', 'bar', 'baz', 'file')
>>> PurePosixPath(wpath).parts
('C:\\foo', 'bar', 'baz', 'file')

Huzzah for Python devs constantly improving the language!
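If you want exactly the list from the question (no root or drive entry), one possible sketch is to drop the anchor from .parts. The helper name components and its flavour parameter are my own, not part of pathlib. PureWindowsPath is used as the default because it accepts both kinds of separator; note that on POSIX a backslash is a legal filename character, so pass PurePosixPath if that matters to you.

```python
from pathlib import PurePosixPath, PureWindowsPath

def components(path, flavour=PureWindowsPath):
    """Split path into components, dropping the anchor (drive and/or root)."""
    pure = flavour(path)
    parts = pure.parts
    # When an anchor exists, it is always the first element of .parts
    return list(parts[1:] if pure.anchor else parts)

print(components("/foo/bar/baz/file"))
print(components(r"C:\foo/bar/baz/file"))
print(components("/foo/bar/baz/file", PurePosixPath))
# each call prints ['foo', 'bar', 'baz', 'file']
```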

Haemoid answered 7/7, 2015 at 15:50 Comment(4)
looks great but need for 2.7Gaivn
@user966588: that is no problem at all, just run pip install pathlibCotton
Backport of pathlib (named pathlib2) is here.Prehistory
this library is a godsendQuack
F
20

Someone said "use os.path.split". This got deleted unfortunately, but it is the right answer.

os.path.split(path)

Split the pathname path into a pair, (head, tail) where tail is the last pathname component and head is everything leading up to that. The tail part will never contain a slash; if path ends in a slash, tail will be empty. If there is no slash in path, head will be empty. If path is empty, both head and tail are empty. Trailing slashes are stripped from head unless it is the root (one or more slashes only). In all cases, join(head, tail) returns a path to the same location as path (but the strings may differ).
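The rules quoted above can be checked with a few cases. This sketch uses posixpath (the POSIX flavour of os.path) so that the results are the same on every OS; the sample paths are arbitrary.

```python
import posixpath

# path -> (head, tail) as described in the documentation excerpt
examples = {
    "/foo/bar": ("/foo", "bar"),    # tail is the last component
    "/foo/bar/": ("/foo/bar", ""),  # trailing slash gives an empty tail
    "foo": ("", "foo"),             # no slash gives an empty head
    "": ("", ""),                   # empty path gives two empty strings
    "/": ("/", ""),                 # the root is left intact in head
    "/foo": ("/", "foo"),           # slashes are not stripped from a root head
}

for path, expected in examples.items():
    assert posixpath.split(path) == expected
    print(repr(path), "->", posixpath.split(path))
```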

So it's not just splitting the dirname and filename. You can apply it several times to get the full path in a portable and correct way. Code sample:

from os.path import split

dirname = path
path_split = []
while True:
    dirname, leaf = split(dirname)
    if leaf:
        path_split = [leaf] + path_split  # Prepend the component that was just split off
    else:
        # Uncomment the following line to also keep the root or drive, e.g. "Z:\"
        # path_split = [dirname] + path_split
        break

Please credit the original author if that answer gets undeleted.

Fisherman answered 2/1, 2011 at 19:6 Comment(5)
+1: Just using the proper library functions the right way is really the best solution.Cayes
@lunaryorn: Not necessarily the best; this is O(n^2), after all. It's not likely to matter for path-length strings, though.Sublime
Why is this solution O(n^2)? Sorry, if the answer is obvious, but I'm not getting it. "os.split()" likely just traverses the given path from the right side up to the last occurrence of a path separator. So even if the function itself is applied multiple times, the overall complexity should just be O(n) as well (with n being the length of the initial path).Cayes
lunaryorn my intuition for why it's O(n^2) is that if i think of every iteration before the last taking a little longer until it's the entire length of the string... add up all the time it takes 1 + 2 + ... + n and that series is n * (n+1) / 2, which is O(n^2)Nephew
Since in both posix and nt implementations it iterates over both parts of the path on every os.path.split, it most surely is O(n^2). If speed is important it may be worthwhile to switch to pathlib, and use Path.parts instead.Nuncupative
B
5

Use the functionality provided in os.path, e.g.

os.path.split(path)

As written elsewhere, you can call it multiple times to split longer paths.

Bollworm answered 2/1, 2011 at 18:54 Comment(2)
Please read my question. os.path.split() just splits it into a pair in the form (dir, basename)—not what I want.Wolters
I'm sorry for not getting what you said. If you want, edit your question so I can upvote you.Wolters
W
3

Here's an explicit implementation of the approach that just iteratively uses os.path.split; uses a slightly different loop termination condition than the accepted answer.

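import os

def splitpath(path):
    parts = []
    (path, tail) = os.path.split(path)
    # Peel components off the end until either the head or the tail is empty
    while path and tail:
        parts.append(tail)
        (path, tail) = os.path.split(path)
    # Whatever is left is the root or the first relative component
    parts.append(os.path.join(path, tail))
    # A list comprehension (not map) so the result can be reversed on Python 3 too
    return [os.path.normpath(p) for p in parts][::-1]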

This should satisfy os.path.join( *splitpath(path) ) is path in the sense that they both indicate the same file/directory.

Tested in linux:

In [51]: current='/home/dave/src/python'

In [52]: splitpath(current)
Out[52]: ['/', 'home', 'dave', 'src', 'python'] 

In [53]: splitpath(current[1:])
Out[53]: ['home', 'dave', 'src', 'python']

In [54]: splitpath( os.path.join(current, 'module.py'))
Out[54]: ['/', 'home', 'dave', 'src', 'python', 'module.py']

In [55]: splitpath( os.path.join(current[1:], 'module.py'))
Out[55]: ['home', 'dave', 'src', 'python', 'module.py']

I hand-checked a few DOS paths by replacing os.path with the ntpath module. They look OK to me, but I'm not too familiar with the ins and outs of DOS paths.

Wingfooted answered 21/11, 2014 at 16:26 Comment(0)
S
2

Use the functionality provided in os.path, e.g.

os.path.split(path)

(This answer was by someone else and was mysteriously and incorrectly deleted, since it's a working answer; if you want to split each part of the path apart, you can call it multiple times, and each call will pull a component off of the end.)

Sublime answered 2/1, 2011 at 19:7 Comment(2)
He asked how to do it, and I told him how. I'm not going to hold his hand for such a simple thing.Sublime
(I'm surprised people are willing to admit publicly that they can't figure out how to call a function twice without it being written for them.)Sublime
V
0

One more try, with a maxsplit option, as a replacement for os.path.split():

import os.path

def pathsplit(pathstr, maxsplit=1):
    """split relative path into list"""
    path = [pathstr]
    while True:
        oldpath = path[:]
        path[:1] = list(os.path.split(path[0]))
        if path[0] == '':
            path = path[1:]
        elif path[1] == '':
            path = path[:1] + path[2:]
        if path == oldpath:
            return path
        if maxsplit is not None and len(path) > maxsplit:
            return path
Verecund answered 7/11, 2012 at 8:31 Comment(0)
A
-3

So keep using os.path.split until you get to what you want. Here's an ugly implementation using an infinite loop:

import os.path
def parts(path):
    components = [] 
    while True:
        (path,tail) = os.path.split(path)
        if tail == "":
            components.reverse()
            return components
        components.append(tail)

Stick that in parts.py, import parts, and voila:

>>> parts.parts("foo/bar/baz/loop")
['foo', 'bar', 'baz', 'loop']

Probably a nicer implementation using generators or recursion out there...
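For what it's worth, here is one way the generator/recursion idea might look. This is only a sketch: the name iter_parts and the split parameter are my own, and unlike parts() above it keeps the root and any empty trailing component instead of stopping at them. Passing posixpath.split or ntpath.split makes the behaviour independent of the OS it runs on.

```python
import os.path
import posixpath

def iter_parts(path, split=os.path.split):
    """Yield the components of path from left to right."""
    head, tail = split(path)
    if head == path:
        # split() could not reduce the path further: it is a root or empty
        if head:
            yield head
        return
    # Recurse on everything before the last component, then emit that component
    yield from iter_parts(head, split)
    yield tail

print(list(iter_parts("foo/bar/baz/loop", posixpath.split)))
# ['foo', 'bar', 'baz', 'loop']
print(list(iter_parts("/foo", posixpath.split)))
# ['/', 'foo']
```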

Aggri answered 2/1, 2011 at 19:22 Comment(3)
how about the_path.split(os.path.sep)? just saying…Caramel
@aharon: -1 Multiple problems, see my answer.Osculate
@hop: as already commented elsewhere, Windows paths can MIX slashes and backslashes.Osculate