Modify URL components in Python 2

Is there a cleaner way to modify some parts of a URL in Python 2?

For example

http://foo/bar -> http://foo/yah

At present, I'm doing this:

import urlparse

url = 'http://foo/bar'

# Modify path component of URL from 'bar' to 'yah'
# Use nasty convert-to-list hack due to urlparse.ParseResult being immutable
parts = list(urlparse.urlparse(url))
parts[2] = 'yah'

url = urlparse.urlunparse(parts)

Is there a cleaner solution?

Marlanamarlane answered 13/6, 2014 at 8:33 Comment(1)
What exactly do you mean by 'clean'?Mannose
24

Unfortunately, the documentation is out of date; the results produced by urlparse.urlparse() (and urlparse.urlsplit()) use a collections.namedtuple()-produced class as a base.

Don't turn this namedtuple into a list, but make use of the utility method provided for just this task:

parts = urlparse.urlparse(url)
parts = parts._replace(path='yah')

url = parts.geturl()

The namedtuple._replace() method lets you create a new copy with specific elements replaced. The ParseResult.geturl() method then re-joins the parts into a url for you.

Demo:

>>> import urlparse
>>> url = 'http://foo/bar'
>>> parts = urlparse.urlparse(url)
>>> parts = parts._replace(path='yah')
>>> parts.geturl()
'http://foo/yah'
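_replace() accepts several keyword arguments at once, so more than one component can be changed in a single call. A minimal sketch, assuming a made-up URL with a query string; the try/except import is only there so the snippet also runs on Python 3, where the module was renamed to urllib.parse:

```python
try:
    from urllib.parse import urlparse  # Python 3 name of the module
except ImportError:
    from urlparse import urlparse      # Python 2, as used in this answer

# Hypothetical example URL; only the scheme and host match the question.
url = 'http://foo/bar?x=1'
parts = urlparse(url)

# Any of the six field names (scheme, netloc, path, params, query,
# fragment) can be passed as keyword arguments in the same call.
new_parts = parts._replace(path='/yah', query='x=2', fragment='top')
new_url = new_parts.geturl()
print(new_url)
```

The original parts object is left unchanged; _replace() always returns a new tuple.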

mgilson filed a bug report (with patch) to address the documentation issue.

Skiles answered 13/6, 2014 at 8:35 Comment(10)
I was going to point this out. The utility methods are provided to urlparse.ParseResult by the subclass returned by namedtuple. I think that this should be pointed out in the 2.7 docs, because without knowing that, you have no way of knowing that _replace actually is part of the public API for this class...Diffractive
Even more interesting is the mention of BaseResult in the docs which doesn't appear in the source at all ... (sorry about the digression ... It's late ... +1 anyway)Diffractive
@mgilson: heh, indeed, that must be a leftover from before namedtuple was used.Skiles
Thanks - that's a nicer solution. Although, as pointed out in the other comments, there seems to be no way to know about it, based on the docs alone.Marlanamarlane
@GarethStockwell: yeah, looks like a doc bug; none filed yet, I'll do that later.Skiles
@MartijnPieters -- Ninja'd you on the doc bugDiffractive
@mgilson: \o/ less work for me! :-P Can I push you to add a decent example to the docs as well, along the lines of what I did in this answer?Skiles
Whoa there buddy, let's not get carried away now. An example?!?! Who's gonna use that :-)Diffractive
Sadly, this does not allow setting attributes that are not always available. For example, the username and password attributes are only available on the result when they were part of the URL that was parsed.Sansculotte
@TimVisée: those extra attributes are derived values. They are part of the netloc value; use _replace() to set a netloc string that includes the login info: parts._replace(netloc='{}:{}@{}'.format(newusername, newpassword, parts.netloc.rpartition('@')[-1]))Skiles
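The netloc approach from the last comment can be wrapped in a small helper. This is only a sketch: with_credentials is a hypothetical name, the URLs are made up, and the credentials are assumed to need no percent-encoding. The try/except import lets it run on Python 3 as well.

```python
try:
    from urllib.parse import urlparse  # Python 3 name of the module
except ImportError:
    from urlparse import urlparse      # Python 2, as used in this answer


def with_credentials(url, username, password):
    """Return url with its login info replaced (hypothetical helper)."""
    parts = urlparse(url)
    # Everything after the last '@' is host[:port]; if there is no '@',
    # rpartition() returns the whole netloc as the last element.
    host = parts.netloc.rpartition('@')[-1]
    netloc = '{}:{}@{}'.format(username, password, host)
    return parts._replace(netloc=netloc).geturl()


replaced = with_credentials('http://old:secret@foo:8080/bar', 'alice', 'pw')
added = with_credentials('http://foo/bar', 'alice', 'pw')
print(replaced)
print(added)
```

Because username and password are derived from netloc, reading parts.username on the new result reflects the replaced value.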
-1

I guess the proper way to do it is the following, since using private methods or variables such as _replace is not recommended.

from urlparse import urlparse, urlunparse

res = urlparse('http://www.goog.com:80/this/is/path/;param=paramval?q=val&foo=bar#hash')
l_res = list(res)
# l_res is now ['http', 'www.goog.com:80', '/this/is/path/', 'param=paramval', 'q=val&foo=bar', 'hash']
l_res[2] = '/new/path'
urlunparse(l_res)
# outputs 'http://www.goog.com:80/new/path;param=paramval?q=val&foo=bar#hash'
Hitchhike answered 24/8, 2017 at 7:33 Comment(1)
It's part of the public interface; it's just prefixed with an underscore so that it doesn't clash with actual field names.Leaves
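To illustrate the point in the comment above: the parse result is a namedtuple subclass, and the namedtuple helpers all carry a leading underscore so they cannot collide with field names. A short sketch, assuming the URL from the question; the try/except import is only there so it also runs on Python 3:

```python
try:
    from urllib.parse import urlparse  # Python 3 name of the module
except ImportError:
    from urlparse import urlparse      # Python 2, as used in this thread

parts = urlparse('http://foo/bar')

# The result is a tuple subclass produced via collections.namedtuple().
is_tuple = isinstance(parts, tuple)

# _fields lists the names that _replace() accepts as keyword arguments.
fields = parts._fields
print(is_tuple)
print(fields)
```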
