The best option is to use the posixpath
module when working with the path component of URLs. This module has the same interface as os.path
and consistently operates on POSIX paths when used on POSIX and Windows NT based platforms.
Sample Code:
#!/usr/bin/env python3
import urllib.parse
import sys
import posixpath
import ntpath
import json
def path_parse( path_string, *, normalize = True, module = posixpath ):
result = []
if normalize:
tmp = module.normpath( path_string )
else:
tmp = path_string
while tmp != "/":
( tmp, item ) = module.split( tmp )
result.insert( 0, item )
return result
def dump_array( array ):
string = "[ "
for index, item in enumerate( array ):
if index > 0:
string += ", "
string += "\"{}\"".format( item )
string += " ]"
return string
def test_url( url, *, normalize = True, module = posixpath ):
url_parsed = urllib.parse.urlparse( url )
path_parsed = path_parse( urllib.parse.unquote( url_parsed.path ),
normalize=normalize, module=module )
sys.stdout.write( "{}\n --[n={},m={}]-->\n {}\n".format(
url, normalize, module.__name__, dump_array( path_parsed ) ) )
test_url( "http://eg.com/hithere/something/else" )
test_url( "http://eg.com/hithere/something/else/" )
test_url( "http://eg.com/hithere/something/else/", normalize = False )
test_url( "http://eg.com/hithere/../else" )
test_url( "http://eg.com/hithere/../else", normalize = False )
test_url( "http://eg.com/hithere/../../else" )
test_url( "http://eg.com/hithere/../../else", normalize = False )
test_url( "http://eg.com/hithere/something/./else" )
test_url( "http://eg.com/hithere/something/./else", normalize = False )
test_url( "http://eg.com/hithere/something/./else/./" )
test_url( "http://eg.com/hithere/something/./else/./", normalize = False )
test_url( "http://eg.com/see%5C/if%5C/this%5C/works", normalize = False )
test_url( "http://eg.com/see%5C/if%5C/this%5C/works", normalize = False,
module = ntpath )
Code output:
http://eg.com/hithere/something/else
--[n=True,m=posixpath]-->
[ "hithere", "something", "else" ]
http://eg.com/hithere/something/else/
--[n=True,m=posixpath]-->
[ "hithere", "something", "else" ]
http://eg.com/hithere/something/else/
--[n=False,m=posixpath]-->
[ "hithere", "something", "else", "" ]
http://eg.com/hithere/../else
--[n=True,m=posixpath]-->
[ "else" ]
http://eg.com/hithere/../else
--[n=False,m=posixpath]-->
[ "hithere", "..", "else" ]
http://eg.com/hithere/../../else
--[n=True,m=posixpath]-->
[ "else" ]
http://eg.com/hithere/../../else
--[n=False,m=posixpath]-->
[ "hithere", "..", "..", "else" ]
http://eg.com/hithere/something/./else
--[n=True,m=posixpath]-->
[ "hithere", "something", "else" ]
http://eg.com/hithere/something/./else
--[n=False,m=posixpath]-->
[ "hithere", "something", ".", "else" ]
http://eg.com/hithere/something/./else/./
--[n=True,m=posixpath]-->
[ "hithere", "something", "else" ]
http://eg.com/hithere/something/./else/./
--[n=False,m=posixpath]-->
[ "hithere", "something", ".", "else", ".", "" ]
http://eg.com/see%5C/if%5C/this%5C/works
--[n=False,m=posixpath]-->
[ "see\", "if\", "this\", "works" ]
http://eg.com/see%5C/if%5C/this%5C/works
--[n=False,m=ntpath]-->
[ "see", "if", "this", "works" ]
Notes:
- On Windows NT based platforms
os.path
is ntpath
- On Unix/Posix based platforms
os.path
is posixpath
ntpath
will not handle backslashes (\
) correctly (see last two cases in code/output) - which is why posixpath
is recommended.
- remember to use
urllib.parse.unquote
- consider using
posixpath.normpath
- The semantics of multiple path separators (
/
) is not defined by RFC 3986. However, posixpath
collapses multiple adjacent path separators (i.e. it treats ///
, //
and /
the same)
- Even though POSIX and URL paths have similar syntax and semantics, they are not identical.
Normative References:
scheme://domain:port/path?query_string#fragment_id
, so 'hithere' is the wholepath
in the first case and 1 section of the it in the second. Just urlparse it then 'hithere' is going to be path.split('/')[1] – Virtuosopath.split('/')[0]
? (the first item of the list) – Ballenger