Python: How to get the Content-Type of a URL?

3

18

I need to get the content type of an internet (or intranet) resource, not a local file. How can I get the MIME type of a resource behind a URL?

I tried this:

res = urllib.urlopen("http://www.iana.org/assignments/language-subtag-registry")
http_message = res.info()
message = http_message.getplist()

I get: ['charset=UTF-8']

How can I get the Content-Type? Can it be done with urllib, and if so, how? If not, what is the alternative?

Rifleman answered 18/9, 2012 at 9:45 Comment(2)

See #843892 – Slob 18/9, 2012 at 9:48

https://mcmap.net/q/669046/-how-to-check-the-url-is-either-web-page-link-or-file-link-in-python – Gestalt 3/2, 2014 at 20:26

20

import urllib  # Python 2 only: urllib.urlopen was removed in Python 3

res = urllib.urlopen("http://www.iana.org/assignments/language-subtag-registry")
http_message = res.info()
full = http_message.type      # 'text/plain'
main = http_message.maintype  # 'text'

Innoxious answered 18/9, 2012 at 10:3 Comment(1)

what to do if it gives 403 error? – Irredentist 23/2 at 15:34

31

A Python 3 solution:

import urllib.request
with urllib.request.urlopen('http://www.google.com') as response:
    info = response.info()
    print(info.get_content_type())      # -> text/html
    print(info.get_content_maintype())  # -> text
    print(info.get_content_subtype())   # -> html

Dubois answered 27/4, 2016 at 7:7 Comment(0)
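For context on the ['charset=UTF-8'] result in the question: a Content-Type header such as text/plain; charset=UTF-8 carries a media type plus parameters, and getplist() returns only the parameters. The sketch below parses a hard-coded header value offline with email.message.Message, the base class of the object that response.info() returns in Python 3. The helper name split_content_type is made up for illustration.

```python
from email.message import Message


def split_content_type(header_value):
    """Split a raw Content-Type header value into (media_type, charset).

    Hypothetical helper: it only parses a string and makes no network request.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() drops the parameters; get_content_charset()
    # returns the charset parameter in lower case, or None if it is absent.
    return msg.get_content_type(), msg.get_content_charset()


if __name__ == "__main__":
    print(split_content_type("text/plain; charset=UTF-8"))
```

The same two methods are available directly on the headers of a real response, so the helper is only needed when you already have the header as a string.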

2

Update: the info() method has been deprecated since Python 3.9 in favor of the headers attribute, which holds the same message object:

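import urllib.request

url = "http://www.iana.org/assignments/language-subtag-registry"  # example URL from the question
with urllib.request.urlopen(url) as r:
    header = r.headers                              # http.client.HTTPMessage, a subclass of email.message.Message
    contentType = header.get_content_type()         # or header.get('content-type')
    contentLength = header.get('content-length')
    filename = header.get_filename()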

Also, a quick way to guess the MIME type without actually loading the URL:

import mimetypes
contentType, encoding = mimetypes.guess_type(url)

The second method does not guarantee an answer (the guessed type is None when the extension is missing or unrecognized), but it is a quick-and-dirty trick, since it only looks at the URL string rather than actually opening the URL.
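A minimal sketch of that guess, assuming you want to ignore any query string or fragment before looking at the extension. The function name guess_type_from_url and the example.com URL are placeholders, not part of the answer above.

```python
import mimetypes
from urllib.parse import urlsplit


def guess_type_from_url(url):
    """Guess a MIME type from the extension in the URL path.

    Returns None when the path has no recognized extension.
    No request is sent; only the URL string is inspected.
    """
    path = urlsplit(url).path  # drop scheme, host, query string and fragment
    content_type, _encoding = mimetypes.guess_type(path)
    return content_type


if __name__ == "__main__":
    print(guess_type_from_url("http://example.com/report.pdf?download=1"))
    print(guess_type_from_url("http://www.iana.org/assignments/language-subtag-registry"))
```

The URL from the question has no file extension, so this approach cannot classify it; in that case the header returned by the server is the only reliable source.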

Foul answered 23/2, 2022 at 14:57 Comment(2)

res = urllib.urlopen("#12474906" ) gives 403 error, is there any other workaround to get the content_type? – Irredentist 23/2 at 14:57

Try the requests library: import requests; r = requests.get("https://stackoverflow.com"); r.headers.get('content-type') – Foul 21/3 at 6:59
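Regarding the 403 question in the comments: one possible cause, and this is an assumption rather than something confirmed for that site, is that the server rejects urllib's default User-Agent. A hedged sketch using only the standard library sends an explicit User-Agent and a HEAD request, so the body is never downloaded. The function names and the User-Agent string are illustrative.

```python
import urllib.error
import urllib.request


def build_head_request(url, user_agent="Mozilla/5.0 (compatible; content-type-check)"):
    """Build a HEAD request with an explicit User-Agent header.

    The default User-Agent value is an arbitrary example, not a required string.
    """
    return urllib.request.Request(
        url, headers={"User-Agent": user_agent}, method="HEAD"
    )


def fetch_content_type(url):
    """Return the raw Content-Type header of url, or None.

    None means the server answered with an HTTP error status
    or sent no Content-Type header at all.
    """
    try:
        with urllib.request.urlopen(build_head_request(url), timeout=10) as response:
            return response.headers.get("Content-Type")
    except urllib.error.HTTPError:
        return None
```

Usage would be fetch_content_type("http://www.iana.org/assignments/language-subtag-registry"). Some servers do not support HEAD or block clients for other reasons, so this may still fail; falling back to a normal GET request is one option.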
