How do I remove a query string from URL using Python
T

9

40

Example:

http://example.com/?a=text&q2=text2&q3=text3&q2=text4

After removing "q2", it will return:

http://example.com/?a=text&q3=text3

In this case, "q2" appeared more than once, and every occurrence has been removed.

Thierry answered 12/10, 2011 at 2:19 Comment(0)
P
87
import sys

if sys.version_info.major == 3:
    from urllib.parse import urlencode, urlparse, urlunparse, parse_qs
else:
    from urllib import urlencode
    from urlparse import urlparse, urlunparse, parse_qs

url = 'http://example.com/?a=text&q2=text2&q3=text3&q2=text4&b#q2=keep_fragment'
u = urlparse(url)
query = parse_qs(u.query, keep_blank_values=True)
query.pop('q2', None)
u = u._replace(query=urlencode(query, True))
print(urlunparse(u))

Output:

http://example.com/?a=text&q3=text3&b=#q2=keep_fragment
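A possible variation, sketched here for Python 3 only: parse_qsl() returns the (key, value) pairs as a list, so the remaining parameters keep their original order and repeated keys need no special handling. The helper name remove_query_param is made up for this illustration and is not part of the standard library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def remove_query_param(url, name):
    """Return url with every occurrence of the query parameter `name` removed.

    Hypothetical helper for illustration; not a standard library function.
    """
    parts = urlsplit(url)
    # parse_qsl yields (key, value) pairs in the order they appear in the URL
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key != name]
    # _replace is the documented namedtuple method for swapping one field
    return parts._replace(query=urlencode(kept)).geturl()


cleaned = remove_query_param(
    'http://example.com/?a=text&q2=text2&q3=text3&q2=text4&b#q2=keep_fragment',
    'q2',
)
print(cleaned)  # http://example.com/?a=text&q3=text3&b=#q2=keep_fragment
```

As in the code above, the blank-valued parameter b is re-encoded as b= and the fragment is left alone.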
Penalty answered 12/10, 2011 at 2:42 Comment(6)
Best answer. One addition: the geturl() method of the parsed result can be used instead of urlunparse: print(u.geturl())Flofloat
If the goal is to remove all query params, wouldn't u = u._replace(query='') also work? We could avoid an extra import this way.Apuleius
@Apuleius The OP wanted to remove only one parameter, not the whole query.Penalty
Python 3 imports: from urllib.parse import urlencode, urlparse, urlunparse, parse_qsBurnham
access to a protected member _replace of a class.... how can we avoid this warning message?Wheelhouse
@Wheelhouse This is how namedtuples work - docs.python.org/3/library/… It's probably some kind of linter adding the warning.Penalty
L
86

To remove all query string parameters:

from urllib.parse import urljoin, urlparse

url = 'http://example.com/?a=text&q2=text2&q3=text3&q2=text4'
urljoin(url, urlparse(url).path)  # 'http://example.com/'

For Python2, replace the import with:

from urlparse import urljoin, urlparse
Lollygag answered 25/8, 2015 at 21:56 Comment(3)
I like this approach a bit better than the popular answer because it doesn't call any internal APIs, but it will also eliminate URL fragments, whereas the popular answer will preserve them. It also doesn't solve the OP's exact question (it deletes all query string parameters), but it solves mine :)Hubby
Was first looking into furl, but this removes the need to install another library. Works perfectly!Infantryman
This should be the accepted answer. I came here twice after a few weeks since it is hard to remember and searched again for it.Infantryman
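If the fragment should survive while the whole query is dropped, one option is to blank out only the query component, as a comment on the top answer suggests. This is a rough Python 3 sketch; the #section fragment is added to the example URL purely for illustration.

```python
from urllib.parse import urlsplit

# The '#section' fragment is an assumption added to show that it is preserved
url = 'http://example.com/?a=text&q2=text2&q3=text3&q2=text4#section'

# Replace only the query component; scheme, host, path and fragment are kept
without_query = urlsplit(url)._replace(query='').geturl()
print(without_query)  # http://example.com/#section
```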
A
28

Isn't this just a matter of splitting a string on a character?

>>> url = 'http://example.com/?a=text&q2=text2&q3=text3&q2=text4'
>>> url.split('?')[0]
'http://example.com/'
Ant answered 6/2, 2019 at 8:3 Comment(4)
I was thinking about this solution as well. Can anyone tell me if there are any issue (potential bug/loophole) in this proposed solution?Understrapper
@ProgramerBeginner There isn't one, really!Moresque
The problem will be clear if you carefully read the original question. The OP wanted to remove only one parameter, not all the query parameters.Autoionization
This solution only works if the parameters are always sent in the same order. If you work with URLs whose parameters are unordered, you end up deleting the wrong parameter.Franco
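To make the behaviour discussed in these comments concrete, here is a small sketch of what splitting on '?' actually keeps. It reuses the example URL with a fragment from the top answer; the variable names are only illustrative.

```python
url = 'http://example.com/?a=text&q2=text2&q3=text3&q2=text4&b#q2=keep_fragment'

# Everything after the first '?' is discarded: all parameters and the fragment
base = url.split('?', 1)[0]
print(base)  # http://example.com/

# A URL without a '?' comes back unchanged
no_query = 'http://example.com/page#top'.split('?', 1)[0]
print(no_query)  # http://example.com/page#top
```

So this is suitable only when the entire query string, and any fragment after it, can go.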
S
11

Using python's url manipulation library furl:

import furl
f = furl.furl("http://example.com/?a=text&q2=text2&q3=text3&q2=text4")
f.remove(['q2'])
print(f.url)
Sienese answered 2/12, 2016 at 9:8 Comment(1)
Calling it 'python's url manipulation library' makes it sound like it's included in the standard lib, which it isn't.Consultant
K
3
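query_string = "https://example.com/api/api.php?user=chris&auth=true"
# Keep everything before the first '?'. This assumes a '?' is present:
# find() returns -1 otherwise, which would cut off the last character.
url = query_string[:query_string.find('?', 0)]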
Kalfas answered 17/9, 2018 at 9:8 Comment(2)
This does not exactly provide a solution to the given question. Please try improving your answer or deleting it.Calyces
While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value.Meridional
P
1

Or simply put, just use url_query_cleaner() from w3lib.url

from w3lib.url import url_query_cleaner

url = 'http://example.com/?a=text&q2=text2&q3=text3&q2=text4'
url_query_cleaner(url, ('q2',), remove=True)

Output: http://example.com/?a=text&q3=text3

Poucher answered 14/11, 2018 at 7:57 Comment(0)
B
0

Another method that you can use to have more control over what you want to do is urlunparse() which takes a tuple of the parts returned from urlparse().

For example, recently I needed to change the path but keep the query:

from urllib.parse import urlparse, urlunparse

url = 'https://test.host.com/some/path?type_id=7'
parsed_url = urlparse(url)

modified_path = f'{parsed_url.path}/new_path_ending'

output_url = urlunparse((
    parsed_url.scheme,
    parsed_url.netloc,
    modified_path,
    parsed_url.params,
    parsed_url.query,
    parsed_url.fragment
))

print(output_url)
# https://test.host.com/some/path/new_path_ending?type_id=7

This method preserves all of the URL and gives you granular control of what you want to keep, change, and remove.
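The same tuple-based rebuild could also be applied to the original question, as a sketch for Python 3: filter the query pairs, then pass the six components back to urlunparse(). The name kept_pairs is just an illustrative choice. This route also avoids calling _replace, which some linters flag.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

url = 'http://example.com/?a=text&q2=text2&q3=text3&q2=text4'
parsed_url = urlparse(url)

# Drop every 'q2' pair and keep the remaining pairs in their original order
kept_pairs = [
    (key, value)
    for key, value in parse_qsl(parsed_url.query, keep_blank_values=True)
    if key != 'q2'
]

output_url = urlunparse((
    parsed_url.scheme,
    parsed_url.netloc,
    parsed_url.path,
    parsed_url.params,
    urlencode(kept_pairs),  # re-encoded query without 'q2'
    parsed_url.fragment,
))

print(output_url)  # http://example.com/?a=text&q3=text3
```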

Beadledom answered 16/5 at 17:14 Comment(0)
U
-2
import re

q = "http://example.com/?a=text&q2=text2&q3=text3&q2=text4"
todelete = "q2"
# Delete every "q2=value" pair, along with the '&' that follows it, if any.
# Only values made of letters, digits and underscores are matched.
r = re.sub(re.escape(todelete) + r'=[a-zA-Z_0-9]*&*', '', q)
# Delete the possible trailing &
r = re.sub(r'&$', '', r)

print(r)
Useful answered 12/10, 2011 at 2:49 Comment(0)
O
-2

Or you could just use strip

>>> l='http://example.com/?a=text&q2=text2&q3=text3&q2=text4'
>>> l.strip('&q2=text4')
'http://example.com/?a=text&q2=text2&q3=text3'
>>> 
Ochrea answered 30/8, 2021 at 10:1 Comment(0)
