Convert a filename to a file:// URL

4

58

In WeasyPrint’s public API I accept filenames (among other types) for the HTML inputs. Any filename that works with the built-in open() should work, but I need to convert it to a URL in the file:// scheme that will later be passed to urllib.urlopen().

(Everything is in URL form internally. I need to have a "base URL" for documents in order to resolve relative URL references with urlparse.urljoin().)

urllib.pathname2url is a start:

Convert the pathname path from the local syntax for a path to the form used in the path component of a URL. This does not produce a complete URL. The return value will already be quoted using the quote() function.

The emphasis is mine, but I do need a complete URL. So far this seems to work:

def path2url(path):
    """Return file:// URL from a filename."""
    path = os.path.abspath(path)
    if isinstance(path, unicode):
        path = path.encode('utf8')
    return 'file:' + urlparse.pathname2url(path)

UTF-8 seems to be recommended by RFC 3987 (IRI). But in this case (the URL is meant for urllib, eventually) maybe I should use sys.getfilesystemencoding()?
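To make the encoding question concrete, here is a minimal Python 3 sketch (the question's code is Python 2). The helper name quote_path_bytes is hypothetical and not part of WeasyPrint or urllib; it only shows that the bytes chosen for the path determine the percent-encoded result, so whatever later decodes the URL has to assume the same encoding.

```python
import os
from urllib.parse import quote


def quote_path_bytes(path, encoding=None):
    """Percent-encode a text path (hypothetical helper for illustration).

    With encoding=None the filesystem encoding is used via os.fsencode();
    otherwise the path is encoded with the given codec first.
    """
    if encoding is None:
        raw = os.fsencode(path)
    else:
        raw = path.encode(encoding)
    # quote() accepts bytes and leaves "/" unescaped by default.
    return quote(raw)


# The same filename yields different URLs depending on the byte encoding.
print(quote_path_bytes("/tmp/café.html", "utf-8"))    # /tmp/caf%C3%A9.html
print(quote_path_bytes("/tmp/café.html", "latin-1"))  # /tmp/caf%E9.html
```

If the URL is only ever decoded again on the same machine, the filesystem encoding is arguably the safer choice; UTF-8 is the better fit if the URL may be shown to users or exchanged with other tools.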

However, based on the literature I should prepend not just file: but file:// ... except when I should not: On Windows the results from nturl2path.pathname2url() already start with three slashes.

So the question is: is there a better way to do this and make it cross-platform?

Vaishnava answered 27/7, 2012 at 12:8 Comment(3)
Couldn't you just check for something like url[0:3] == '///', and if false add the two extra slashes? - Microclimatology
Joachim, maybe that would work. I just don’t know what rules to follow to avoid surprising corner cases. - Vaishnava
Hey, your example code uses urlparse.pathname2url, which doesn't exist. Did you mean urllib.pathname2url? - Extravasation
M
106

For completeness, in Python 3.4+, you should do:

import pathlib

pathlib.Path(absolute_path_string).as_uri()
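
As a small sketch of how this behaves with the relative filenames the question wants to accept: as_uri() raises ValueError for a relative path, so the path has to be made absolute first. The wrapper name path_to_file_url and the example filename are hypothetical, and resolve() is only one option (it also follows symlinks, which Path.absolute() does not).

```python
from pathlib import Path


def path_to_file_url(path):
    """Return a file:// URL for any filename (hypothetical helper).

    resolve() makes the path absolute relative to the current working
    directory; the file does not need to exist on Python 3.6+.
    """
    return Path(path).resolve().as_uri()


# A relative path cannot be turned into a file URI directly.
try:
    Path("docs/index.html").as_uri()
except ValueError as exc:
    print("relative path rejected:", exc)

# After resolving, the same relative path works.
print(path_to_file_url("docs/index.html"))
```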
Manis answered 9/8, 2015 at 15:43 Comment(4)
This module is also on PyPI (for other Python versions): pypi.python.org/pypi/pathlib - Vaishnava
pathlib2 should now be used for other Python versions. - Lumper
as_uri() doesn't work on relative filenames (there are use cases for converting only a partial filename to a (partial) URL). - Soprano
Relative filenames can easily be supported using pathlib's resolve() method, e.g. pathlib.Path(src).resolve().as_uri() - Yasmineyasu
P
33

I'm not sure the docs are rigorous enough to guarantee it, but I think this works in practice:

import urlparse, urllib

def path2url(path):
    return urlparse.urljoin(
      'file:', urllib.pathname2url(path))
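
A short Python 3 sketch of why urljoin() copes with both shapes mentioned in the question. The two strings below are hand-written stand-ins for what pathname2url() may return on POSIX (one leading slash) and on Windows (three leading slashes); they are assumptions for illustration, not captured output.

```python
from urllib.parse import urljoin

# Hand-written stand-ins for pathname2url() results (assumed shapes).
posix_style = "/home/user/my%20file.html"
windows_style = "///C:/foo%20bar/spam.html"

# urljoin() treats the leading "//" as an empty authority, so both
# inputs end up with exactly three slashes after the scheme.
print(urljoin("file:", posix_style))    # file:///home/user/my%20file.html
print(urljoin("file:", windows_style))  # file:///C:/foo%20bar/spam.html
```

This is what removes the need to special-case the number of slashes by hand.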
Pinero answered 12/1, 2013 at 21:38 Comment(6)
Tested on Linux, Windows, and OS X and it works fine on all three. - Incept
And in py3k this becomes import urllib.parse as urlparse and import urllib.request as urllib. - Larentia
You should probably call os.path.abspath(path) here. - Faucal
If you use the six library to ensure Python 2 and 3 portability: return six.moves.urllib_parse.urljoin("file://", six.moves.urllib.request.pathname2url(path)) - Numerical
This produces URLs that look like file:///C:/foo%20bar/spam/eggs. Shouldn't it be file:///C%3A/foo%20bar/spam/eggs, with the colon turned into %3A? - Tiepolo
@stib: apparently not, at least not on Windows, where the drive letter colon is to be kept intact. See blogs.msdn.microsoft.com/ie/2006/12/06/file-uris-in-windows and en.wikipedia.org/wiki/File_URI_scheme#Windows_2 - Clingy
H
6

Credit to comment from @danodonovan above.

For Python 3, the following code will work:

from urllib.parse import urljoin
from urllib.request import pathname2url

def path2url(path):
    return urljoin('file:', pathname2url(path))
Hoo answered 8/6, 2015 at 6:14 Comment(0)
S
1

Does the following work for you?

from urlparse import urlparse, urlunparse

urlunparse(urlparse('yourURL')._replace(scheme='file'))
Sarasvati answered 27/7, 2012 at 12:48 Comment(1)
The idea is interesting, but I don’t know if this is enough. In particular, \ in Windows filenames is supposed to become /. Still on Windows, the C in C:\foo\bar.html is parsed as a scheme and then replaced. The expected output would be file:///C:/foo/bar.html. - Vaishnava
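
The drive-letter problem described in the comment above can be seen with a minimal Python 3 sketch (the answer itself uses the Python 2 urlparse module). Only the parsing step is shown, since that is where the information is lost.

```python
from urllib.parse import urlparse

# A Windows filename is not a URL: the drive letter looks like a scheme.
parts = urlparse(r"C:\foo\bar.html")

print(parts.scheme)  # c
print(parts.path)    # \foo\bar.html
```

Once the drive letter has been consumed as the scheme, replacing the scheme with 'file' discards it, and the backslashes are never converted to forward slashes.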
