I get InvalidURL: URL can't contain control characters when I try to send a request using urllib
I am trying to get a JSON response from the URL I pass to urllib.request.urlopen, but it raises an error saying the URL can't contain control characters.

How can I solve this?

start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq="
    
source = urllib.request.urlopen(start_url).read()

The error I get is:

URL can't contain control characters. '/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq=' (found at least ' ')
Progress answered 25/10, 2020 at 7:8 Comment(4)
Is it a valid URL? It doesn't work in a browser and has a strange part, /10//0/. Normally a double slash // can be replaced with a single slash, but then the URL gives a page-not-found error. - Jacobs
You are right, I provided the wrong URL. I fixed it, thank you. Small mistakes are always big headaches. - Progress
Please make sure you encode your URLs: urlencoder.io/python. In a URL, " " should actually be "%20" or "+". - Puli
Looks like an unprintable character slipped into the URL somehow. - Encephalo
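To see which characters trigger the error, the URL can be scanned before it is sent. This is a minimal sketch: the character range (ASCII 0x00-0x20 plus 0x7f) is an assumption based on what CPython's http.client appears to reject, and `find_disallowed` and the example.com URL are hypothetical names used only for illustration.

```python
import re

# Assumed range: ASCII control characters, the space (0x20), and DEL (0x7f).
DISALLOWED = re.compile(r"[\x00-\x20\x7f]")


def find_disallowed(url: str) -> list:
    """Return (index, repr of character) for each disallowed character."""
    return [(m.start(), repr(m.group())) for m in DISALLOWED.finditer(url)]


sample = "https://example.com/search?bundle_fq =notice&x=\t1"
for index, char in find_disallowed(sample):
    # repr() makes invisible characters such as '\t' readable.
    print(index, char)
```

Printing the repr of each match makes tabs, newlines, and other invisible characters visible, which a plain print of the URL would hide.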
F
25

If the problem is whitespace in the URL, replace each space with %20:

url = url.replace(" ", "%20")

Format answered 2/11, 2021 at 13:11 Comment(1)
So simple, and it does the job. - Eligible
P
6

Spaces are not allowed in a URL. I removed them and it seems to work now:

import urllib.request
start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq="
url = start_url.replace(" ", "")
source = urllib.request.urlopen(url).read()
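If the URL was pasted from somewhere, tabs or line breaks may have slipped in along with the spaces. A possible variant of the same idea removes every whitespace character; `strip_whitespace` and the example.com URL are hypothetical, and this only makes sense when the whitespace is accidental rather than part of a parameter value.

```python
def strip_whitespace(url: str) -> str:
    """Remove every whitespace character (spaces, tabs, newlines) from url."""
    # str.split() with no argument splits on any run of whitespace.
    return "".join(url.split())


pasted = "https://example.com/search?bundle_fq =notice\n&lang= English"
print(strip_whitespace(pasted))
# https://example.com/search?bundle_fq=notice&lang=English
```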
Piggott answered 25/10, 2020 at 7:18 Comment(0)
I
4

Solr search strings can get pretty weird. It is better to use the quote function from urllib.parse to encode characters before making the request. See the example below:

import urllib.request
from urllib.parse import quote

start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq="
    
source = urllib.request.urlopen(quote(start_url)).read()

Better late than never...

Inimical answered 6/7, 2021 at 21:57 Comment(2)
urlopen raises a ValueError, because quote also escapes the colon after https. It is better to run urlparse on the URL first and then quote only what is needed. - Gymkhana
See my suggested answer: https://mcmap.net/q/892710/-i-get-invalidurl-url-can-39-t-contain-control-characters-when-i-try-to-send-a-request-using-urllib - Gymkhana
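The effect described in the comment can be seen on a short URL. This is only a sketch with a made-up example.com address: by default quote treats just "/" as safe, so the URL's own delimiters get escaped too, while passing them in the safe parameter leaves the structure intact.

```python
from urllib.parse import quote

url = "https://example.com/a path/?bundle_fq =notice&lang=English"

# Default safe="/" escapes ':', '?', '=' and '&' along with the spaces,
# so the scheme is no longer recognizable.
print(quote(url))

# Listing the URL delimiters in `safe` encodes the spaces only.
encoded = quote(url, safe=":/?&=")
print(encoded)
```

One caveat: quote also escapes "%", so a URL that is already partly percent-encoded would be encoded twice.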
O
2

You have probably found this out by now, but let's get it written down here.

There can't be any space characters in the URL, and there are three: after bundle_fq, after sm_vid_Sectors_fq=, and after dm_field_deadlineTo_fq.

Remove those and you're good to go.

Outface answered 19/3, 2021 at 18:24 Comment(0)
G
1

Parsing the URL first and then encoding its components works.

import urllib.request
from urllib.parse import urlparse, quote

def make_safe_url(url: str) -> str:
    """
    Returns a parsed and quoted url
    """
    _url = urlparse(url)
    # Keep "=" and "&" unescaped so the query parameters stay separate.
    url = _url.scheme + "://" + _url.netloc + quote(_url.path) + "?" + quote(_url.query, safe="=&")
    return url

start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq="
start_url = make_safe_url(start_url)
source = urllib.request.urlopen(start_url).read()

The code returns the JSON document despite the double forward slash and the whitespace in the URL.
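A variation on the same parse-then-encode idea is to rebuild the query from its key/value pairs, so that each key and value is encoded on its own. This is a sketch, not the answer's method: `encode_url` and the example.com URL are hypothetical, and parse_qsl decodes any existing percent escapes and "+" signs before they are re-encoded.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def encode_url(url: str) -> str:
    """Percent-encode the path and each query key/value of url."""
    parts = urlsplit(url)
    # keep_blank_values=True preserves parameters such as "empty=".
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # quote_via=quote encodes spaces as %20 instead of "+".
    query = urlencode(pairs, quote_via=quote)
    return urlunsplit(
        (parts.scheme, parts.netloc, quote(parts.path), query, parts.fragment)
    )


print(encode_url("https://example.com/a b//0/desc?bundle_fq =notice&empty=&lang=English"))
```

The scheme and host are passed through untouched, which avoids escaping the colon after https.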

Gymkhana answered 16/12, 2022 at 17:16 Comment(0)
I
0

As the error message says, there are control characters in your URL, which by the way doesn't seem to be a valid one.

Installation answered 25/10, 2020 at 7:16 Comment(0)
D
0

You need to encode the control characters inside the URL. In particular, spaces need to be encoded as %20.
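For a single parameter value, the standard library offers two encodings of a space. A small illustration, with a made-up value:

```python
from urllib.parse import quote, quote_plus

value = "procurement notice"

print(quote(value))       # procurement%20notice
print(quote_plus(value))  # procurement+notice
```

quote_plus produces the "+" form used for HTML form data in query strings, while "%20" is the form to use in the path part of a URL.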

Decree answered 25/10, 2020 at 7:18 Comment(0)
C
-2

Please check whether the proxy address contains a blank space.

A blank space there causes this error.

Cronyism answered 13/7, 2023 at 10:14 Comment(1)
This was already said more than a year ago. - Laporte
