How do I remove a query string from a URL using Python?

9

40

Example:

http://example.com/?a=text&q2=text2&q3=text3&q2=text4

After removing "q2", it will return:

http://example.com/?a=text&q3=text3

In this case, there were multiple "q2" parameters, and all of them were removed.

Thierry answered 12/10, 2011 at 2:19 Comment(0)

87

import sys

if sys.version_info.major == 3:
    from urllib.parse import urlencode, urlparse, urlunparse, parse_qs
else:
    from urllib import urlencode
    from urlparse import urlparse, urlunparse, parse_qs

url = 'http://example.com/?a=text&q2=text2&q3=text3&q2=text4&b#q2=keep_fragment'
u = urlparse(url)
query = parse_qs(u.query, keep_blank_values=True)
query.pop('q2', None)
u = u._replace(query=urlencode(query, True))
print(urlunparse(u))

Output:

http://example.com/?a=text&q3=text3&b=#q2=keep_fragment
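The same idea can be sketched as a reusable Python 3 helper. This is only a sketch: the name `remove_query_param` is hypothetical, and it swaps `parse_qs` for `parse_qsl` and `urlparse` for `urlsplit` so that the remaining parameters keep their original order.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def remove_query_param(url, name):
    """Return url without any query parameter called name (hypothetical helper)."""
    parts = urlsplit(url)
    # parse_qsl yields (key, value) pairs in order; keep blank values such as "b"
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != name]
    # Rebuild the URL; the fragment is left untouched
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = 'http://example.com/?a=text&q2=text2&q3=text3&q2=text4&b#q2=keep_fragment'
print(remove_query_param(url, 'q2'))
# http://example.com/?a=text&q3=text3&b=#q2=keep_fragment
```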

Penalty answered 12/10, 2011 at 2:42 Comment(6)

Best answer. One addition: the geturl() method of the parsed result can be used instead of urlunparse, i.e. print(u.geturl()) – Flofloat 29/3, 2016 at 10:2

If the goal is to remove all query params, wouldn't u = u._replace(query='') also work? We could avoid an extra import this way. – Apuleius 12/10, 2017 at 19:1

@Apuleius The OP wanted to remove only one parameter, not the whole query. – Penalty 14/10, 2017 at 7:35

Python 3 imports: from urllib.parse import urlencode, urlparse, urlunparse, parse_qs – Burnham 21/11, 2017 at 11:44

access to a protected member _replace of a class.... how can we avoid this warning message? – Wheelhouse 9/5, 2019 at 2:9

@Wheelhouse This is how namedtuples work - docs.python.org/3/library/… It's probably some kind of linter adding the warning. – Penalty 10/5, 2019 at 9:33
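A small sketch of the two points raised in these comments, using an assumed example URL: `_replace` is a regular namedtuple method despite the leading underscore, and `geturl()` rebuilds the same string as `urlunparse()`.

```python
from urllib.parse import urlparse, urlunparse

u = urlparse('http://example.com/?a=text&q2=text2#frag')

# _replace returns a new tuple with the given field changed
cleared = u._replace(query='')

# geturl() and urlunparse() rebuild the same string from the parts
print(cleared.geturl())                         # http://example.com/#frag
print(cleared.geturl() == urlunparse(cleared))  # True
```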

86

To remove all query string parameters:

from urllib.parse import urljoin, urlparse

url = 'http://example.com/?a=text&q2=text2&q3=text3&q2=text4'
urljoin(url, urlparse(url).path)  # 'http://example.com/'

For Python2, replace the import with:

from urlparse import urljoin, urlparse

Lollygag answered 25/8, 2015 at 21:56 Comment(3)

I like this approach a bit better than the popular answer because it doesn't call any internal APIs, but it will also eliminate URL fragments, whereas the popular answer will preserve them. It also doesn't solve the OP's exact question (it deletes all query string parameters), but it solves mine :) – Hubby 10/8, 2018 at 14:4

Was first looking into furl, but this removes the need to install another library. Works perfectly! – Infantryman 7/6, 2020 at 15:33

This should be the accepted answer. I came here twice after a few weeks since it is hard to remember and searched again for it. – Infantryman 25/12, 2020 at 14:24
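Two edge cases of the urljoin approach are worth knowing; the sketch below uses assumed example URLs and a hypothetical wrapper name, `strip_with_urljoin`. The fragment is dropped along with the query, and when the URL has an empty path, `urljoin` receives an empty string and returns the base URL unchanged.

```python
from urllib.parse import urljoin, urlparse


def strip_with_urljoin(url):
    # Hypothetical wrapper around the one-liner from the answer
    return urljoin(url, urlparse(url).path)


# The fragment is removed together with the query
print(strip_with_urljoin('http://example.com/?a=text#frag'))
# http://example.com/

# Empty path: urljoin(base, '') returns base, so the query survives
print(strip_with_urljoin('http://example.com?a=text'))
# http://example.com?a=text
```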

28

Isn't this just a matter of splitting a string on a character?

>>> url = 'http://example.com/?a=text&q2=text2&q3=text3&q2=text4'
>>> url.split('?')[0]
'http://example.com/'

Ant answered 6/2, 2019 at 8:3 Comment(4)

I was thinking about this solution as well. Can anyone tell me if there are any issue (potential bug/loophole) in this proposed solution? – Understrapper 7/5, 2019 at 0:0

@ProgramerBeginner There isn't one, really! – Moresque 29/8, 2019 at 18:3

The problem will be clear if you carefully read the original question. The OP wanted to remove only one parameter, not all the query parameters. – Autoionization 7/12, 2022 at 10:34

This solution works only if the parameters are always sent in the same order. If you work with URLs whose parameters are unordered, you end up deleting the wrong parameter. – Franco 28/10, 2023 at 12:42
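As a rough illustration of what splitting on '?' does, using assumed example URLs: everything after the first '?' is discarded, which includes a fragment that follows the query, and a '?' that appears only inside a fragment is treated as a split point too.

```python
# Query followed by a fragment: both are discarded
with_fragment = 'http://example.com/?a=text&q2=text2#section'
print(with_fragment.split('?')[0])   # http://example.com/

# No query at all, but a '?' inside the fragment still triggers the split
fragment_only = 'http://example.com/page#what?'
print(fragment_only.split('?')[0])   # http://example.com/page#what
```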

11

Using python's url manipulation library furl:

import furl
f = furl.furl("http://example.com/?a=text&q2=text2&q3=text3&q2=text4")
f.remove(['q2'])
print(f.url)

Sienese answered 2/12, 2016 at 9:8 Comment(1)

Calling it 'python's url manipulation library' makes it sound like it's included in the standard lib, which it isn't. – Consultant 23/4, 2020 at 17:21

3

query_string = "https://example.com/api/api.php?user=chris&auth=true"
url = query_string[:query_string.find('?', 0)]

Kalfas answered 17/9, 2018 at 9:8 Comment(2)

This does not exactly provide a solution for the given question. Please try improving your answer or deleting it. – Calyces 17/9, 2018 at 9:13

While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value. – Meridional 17/9, 2018 at 11:39
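For context, the slice keeps everything before the first '?'. One caveat, sketched below with an assumed URL and a hypothetical helper name `before_query`: when there is no '?', `str.find` returns -1 and the slice silently drops the last character, whereas `str.partition` returns the whole string.

```python
def before_query(url):
    # partition returns the full string as the first item when '?' is absent
    return url.partition('?')[0]


no_query = 'https://example.com/api/api.php'

# find() returns -1 here, so the slice becomes no_query[:-1]
print(no_query[:no_query.find('?')])   # https://example.com/api/api.ph
print(before_query(no_query))          # https://example.com/api/api.php
print(before_query('https://example.com/api/api.php?user=chris&auth=true'))
# https://example.com/api/api.php
```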

1

Or simply put, just use url_query_cleaner() from w3lib.url

from w3lib.url import url_query_cleaner

url = 'http://example.com/?a=text&q2=text2&q3=text3&q2=text4'
url_query_cleaner(url, ('q2',), remove=True)

Output: http://example.com/?a=text&q3=text3

Poucher answered 14/11, 2018 at 7:57 Comment(0)

0

Another method that you can use to have more control over what you want to do is urlunparse() which takes a tuple of the parts returned from urlparse().

For example, recently I needed to change the path but keep the query:

from urllib.parse import urlparse, urlunparse

url = 'https://test.host.com/some/path?type_id=7'
parsed_url = urlparse(url)

modified_path = f'{parsed_url.path}/new_path_ending'

output_url = urlunparse((
    parsed_url.scheme,
    parsed_url.netloc,
    modified_path,
    parsed_url.params,
    parsed_url.query,
    parsed_url.fragment
))

print(output_url)
# https://test.host.com/some/path/new_path_ending?type_id=7

This method preserves all of the URL and gives you granular control of what you want to keep, change, and remove.

Beadledom answered 16/5 at 17:14 Comment(0)

-2

import re

q = "http://example.com/?a=text&q2=text2&q3=text3&q2=text4"
todelete = "q2"
# Delete every "q2=value" pair (word-character values only) and any '&' after it
r = re.sub(re.escape(todelete) + r'=[a-zA-Z_0-9]*&*', '', q)
# Delete the possible trailing '&'
r = re.sub(r'&$', '', r)

print(r)

Useful answered 12/10, 2011 at 2:49 Comment(0)

-2

Or you could just use strip

>>> l='http://example.com/?a=text&q2=text2&q3=text3&q2=text4'
>>> l.strip('&q2=text4')
'http://example.com/?a=text&q2=text2&q3=text3'
>>>
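Note that `str.strip` treats its argument as a set of characters to trim from both ends, not as a substring. The sketch below, with an assumed URL, shows a case where it removes more than the intended parameter; the example above only works because the '3' before '&q2=text4' is not in that character set.

```python
url = 'http://example.com/?a=text&q2=text4'

# Every trailing character in the set {&, q, 2, =, t, e, x, 4} is trimmed,
# so the value of "a" is eaten as well
stripped = url.strip('&q2=text4')
print(stripped)   # http://example.com/?a
```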

Ochrea answered 30/8, 2021 at 10:1 Comment(0)
