change urlparse.path of a url

Here is the Python code:

from urllib.parse import urlparse

url = "http://www.phonebook.com.pk/dynamic/search.aspx?searchtype=cat&class_id=4520&page=1"
path = urlparse(url)
print(path)

>>>ParseResult(scheme='http', netloc='www.phonebook.com.pk', path='/dynamic/search.aspx', params='', query='searchtype=cat&class_id=4520&page=1', fragment='')

print (path.path)
>>>/dynamic/search.aspx

Now I need to change path.path to suit my requirement. For example, if the path is "/dynamic/search.aspx", I only need the part from the first slash to the last slash, slashes included, which is "/dynamic/".

I have tried the two lines below, but the result is not what I expected. My knowledge of urllib.parse is insufficient, which is why I am asking this question.

result = path.path[:path.path.index("/")]
print(result)
>>>(prints an empty string)
result = path.path[path.path.index("/"):]
print(result)
>>>/dynamic/search.aspx (same as before, no change)

In short, whatever path.path contains, I only need the directory names. For example, given "dynamic/search/search.aspx", I need "dynamic/search/".

Trahan answered 24/7, 2016 at 12:53 Comment(0)

I looked through urlparse for a method that would help in your situation but didn't find one (I may have overlooked it). At this level you will probably have to write your own helper or hack:

>>> path.path
'/dynamic/search.aspx'

>>> import re
>>> d = re.search(r'/.*/', path.path)
>>> d.group(0)
'/dynamic/'

This is just an example; you can also use built-in string methods, like so (this cuts after the second slash, so it only works when the path has a single directory):

>>> i = path.path.index('/', 1)
>>> 
>>> path.path[:i+1]
'/dynamic/'

EDIT:

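I didn't notice your last example, so here is another way that works for any number of directories. posixpath is used instead of os.path so that the separator is always "/", whatever the operating system:

>>> import posixpath
>>> posixpath.dirname(path.path) + '/'
'/dynamic/'
>>> s = 'dynamic/search/search.aspx'
>>> posixpath.dirname(s) + '/'
'dynamic/search/'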

Or with re:

>>> s
'dynamic/search/search.aspx'
>>> d = re.search(r'.*/', s)
>>> d
<_sre.SRE_Match object; span=(0, 15), match='dynamic/search/'>
>>> d.group(0)
'dynamic/search/'
>>> 
>>> s = '/dynamic/search.aspx'
>>> d = re.search(r'.*/', s)
>>> d.group(0)
'/dynamic/'
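
As a rough side-by-side of the approaches above, here is a minimal sketch. The function names (dir_by_index, dir_by_regex, dir_by_dirname) are made up for illustration and are not part of urllib. It also shows why the attempts in the question gave nothing useful: index("/") returns 0 for a path that starts with a slash, so [:0] is empty and [0:] is the whole string.

```python
import posixpath
import re


def dir_by_index(p):
    # Cuts right after the second slash, so only the first directory survives.
    # Raises ValueError if there is no slash after position 0.
    return p[:p.index("/", 1) + 1]


def dir_by_regex(p):
    # Greedy ".*/" matches up to and including the last slash.
    m = re.search(r".*/", p)
    return m.group(0) if m else ""


def dir_by_dirname(p):
    # posixpath always treats "/" as the separator, on every platform.
    d = posixpath.dirname(p)
    if not d or d.endswith("/"):
        return d  # "" (no directory part) or "/" (root): nothing to append
    return d + "/"


nested = "/dynamic/search/search.aspx"
print(dir_by_index(nested))    # /dynamic/
print(dir_by_regex(nested))    # /dynamic/search/
print(dir_by_dirname(nested))  # /dynamic/search/
```

For a path with a single directory all three agree; they only diverge once the path is nested, which is the case in the last example of the question.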
Gawen answered 24/7, 2016 at 13:44 Comment(0)

First, the desired part of the path can be obtained using rfind, which returns the index of the last occurrence of the slash. The + 1 keeps the trailing slash.

desired_path = path.path[:path.path.rfind("/") + 1]

Second, use the _replace method to replace the path attribute of the ParseResult returned by urlparse, then rebuild the URL with urlunparse:

desired_url = urlunparse(path._replace(path=desired_path))

The full working example:

from urllib.parse import urlparse, urlunparse

url = "http://www.phonebook.com.pk/dynamic/search/search.aspx"
path = urlparse(url)

desired_path = path.path[:path.path.rfind("/") + 1]  # "/dynamic/search/"
desired_url = urlunparse(path._replace(path=desired_path))
print(desired_url)  # http://www.phonebook.com.pk/dynamic/search/
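
If this is needed in more than one place, the two steps could be wrapped in a small helper. This is only a sketch: the function name parent_directory_url and the keep_query flag are assumptions for illustration, not something urllib provides.

```python
from urllib.parse import urlparse, urlunparse


def parent_directory_url(url, keep_query=True):
    """Return url with its path cut back to the last slash (hypothetical helper)."""
    parts = urlparse(url)
    # rfind returns -1 when the path has no slash; -1 + 1 == 0 gives an empty path.
    directory = parts.path[:parts.path.rfind("/") + 1]
    if keep_query:
        parts = parts._replace(path=directory)
    else:
        # Optionally drop everything after the path as well.
        parts = parts._replace(path=directory, params="", query="", fragment="")
    return urlunparse(parts)


example = "http://www.phonebook.com.pk/dynamic/search.aspx?searchtype=cat&class_id=4520&page=1"
print(parent_directory_url(example))
print(parent_directory_url(example, keep_query=False))
```

Whether the query string should survive depends on the use case, which is why the sketch leaves it as a flag rather than deciding either way.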
Ares answered 24/3, 2019 at 18:29 Comment(4)
_replace looks like a private API that could disappear without warning :( - Tatiana
@Tatiana I agree it is not ideal. But pinning the Python version also freezes the urllib version, so a controlled environment should not break if the method changes. - Ares
No, _replace is not a private API (even though it seriously looks like one). It is only underscored so that it doesn't collide with field names on the namedtuple. - Celebrated
_replace is mentioned in the documentation: docs.python.org/3/library/… So it's not going away any time soon. - Iota
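
To illustrate the point made in the comments, here is a small sketch of _replace on an ordinary namedtuple next to the result of urlparse. The Point type is an invented example; the only assumption about urllib is that the value returned by urlparse behaves as a named tuple, as the comments describe.

```python
from collections import namedtuple
from urllib.parse import urlparse

# A throwaway namedtuple, used only to show where _replace comes from.
Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)
moved = p._replace(x=10)  # returns a new tuple; p itself is unchanged

parts = urlparse("http://www.phonebook.com.pk/dynamic/search.aspx")
is_tuple = isinstance(parts, tuple)  # the parse result is a tuple subclass
changed = parts._replace(path="/dynamic/")

print(moved, p)
print(is_tuple, changed.path, parts.path)
```

Because tuples are immutable, _replace never modifies the original object, so the parsed URL can be reused after deriving a changed copy.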
