How can I unshorten a URL?
I want to be able to take a shortened or non-shortened URL and return its un-shortened form. How can I write a Python program to do this?

Additional Clarification:

  • Case 1: shortened --> unshortened
  • Case 2: unshortened --> unshortened

e.g. bit.ly/silly in the input array should be google.com in the output array
e.g. google.com in the input array should be google.com in the output array

Sandman answered 17/11, 2010 at 2:56 Comment(2)
Are you talking about a specific URL shortening service, and does this service have an API you can retrieve the info from?Delaminate
If you are in a hurry, you could also use this API rapidapi.com/logicione/api/url-expander1Aggy
40

Send an HTTP HEAD request to the URL and look at the response code. If the code is 30x, look at the Location header to get the unshortened URL. Otherwise, if the code is 20x, then the URL is not redirected; you probably also want to handle error codes (4xx and 5xx) in some fashion. For example:

# This is for Py2k.  For Py3k, use http.client and urllib.parse instead, and
# use // instead of / for the division
import httplib
import urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    h = httplib.HTTPConnection(parsed.netloc)
    h.request('HEAD', parsed.path)
    response = h.getresponse()
    if response.status/100 == 3 and response.getheader('Location'):
        return response.getheader('Location')
    else:
        return url
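A Python 3 translation of the same approach, as the code comment suggests — a sketch using `http.client` and `urllib.parse` that also preserves the query string (which the minimal version above drops) and picks HTTPS when the scheme calls for it:

```python
import http.client
import urllib.parse

def head_path(parsed):
    """Build the HEAD request target, keeping any query string."""
    path = parsed.path or '/'
    if parsed.query:
        path += '?' + parsed.query
    return path

def unshorten_url(url):
    parsed = urllib.parse.urlparse(url)
    conn_cls = (http.client.HTTPSConnection if parsed.scheme == 'https'
                else http.client.HTTPConnection)
    conn = conn_cls(parsed.netloc)
    conn.request('HEAD', head_path(parsed))
    response = conn.getresponse()
    # A 3xx status with a Location header means the URL redirects elsewhere
    if response.status // 100 == 3 and response.getheader('Location'):
        return response.getheader('Location')
    return url
```

Calling `unshorten_url` once resolves a single hop; wrap it in a loop if the shortener chains redirects.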
Bus answered 17/11, 2010 at 3:20 Comment(3)
ignores url query, better version here: https://mcmap.net/q/584051/-how-can-i-un-shorten-a-url-using-pythonTurfman
do note when using above code does not unshorten recursively in case you want to obtain the actual URL. Try on http://t.co/hAplNMmSTg. You need to do return unshorten_url(response.getheader('Location')) for recursivity.Herculaneum
Possibly also keep track of previous urls in a set to prevent cyclic recursion.Purify
34

Using requests:

import requests

session = requests.Session()  # so connections are recycled
resp = session.head(url, allow_redirects=True)
print(resp.url)
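`requests` also records the intermediate hops, so the whole redirect chain can be inspected — a sketch (the function name `redirect_chain` is my own, and a timeout is added so a dead shortener cannot hang the call):

```python
import requests

def redirect_chain(url, timeout=10):
    """Return (final_url, [intermediate URLs]) for a possibly-shortened URL."""
    session = requests.Session()
    resp = session.head(url, allow_redirects=True, timeout=timeout)
    # resp.history holds the intermediate 30x responses, oldest first;
    # it is empty when no redirect occurred
    return resp.url, [hop.url for hop in resp.history]

if __name__ == "__main__":
    final, hops = redirect_chain("http://bit.ly/silly")
    for u in hops:
        print("via:", u)
    print("final:", final)
```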
Philter answered 7/3, 2015 at 18:0 Comment(3)
I like this solution, it automatically follows multiple redirectsGooding
I had to set verify=False as Requests could not validate the certDemers
Is there a way to have requests display the url of each redirect?Tuberose
5

Unshorten.me has an API that lets you send a JSON or XML request and get the full URL returned.

Heder answered 17/11, 2010 at 3:0 Comment(0)
5

If you are using Python 3.5+ you can use the Unshortenit module that makes this very easy:

from unshortenit import UnshortenIt
unshortener = UnshortenIt()
uri = unshortener.unshorten('https://href.li/?https://example.com')
Disparity answered 4/5, 2020 at 7:51 Comment(0)
4

Open the url and see what it resolves to:

>>> import urllib2
>>> a = urllib2.urlopen('http://bit.ly/cXEInp')
>>> print a.url
http://www.flickr.com/photos/26432908@N00/346615997/sizes/l/
>>> a = urllib2.urlopen('http://google.com')
>>> print a.url
http://www.google.com/
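On Python 3, `urllib2` became `urllib.request`, and issuing a HEAD request avoids downloading the whole page just to learn the final address — a sketch (`resolve_url` is a hypothetical helper name, and it assumes the server answers HEAD requests like GET):

```python
import urllib.request

def resolve_url(url):
    """Follow redirects with a HEAD request and return the final URL."""
    req = urllib.request.Request(url, method="HEAD")
    # urlopen follows redirects automatically; no body is fetched for HEAD
    with urllib.request.urlopen(req) as resp:
        return resp.geturl()

if __name__ == "__main__":
    print(resolve_url("http://bit.ly/cXEInp"))
```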
Lingulate answered 17/11, 2010 at 3:19 Comment(3)
This does a GET of the whole page. If the page isn't a redirect and happens to be very large, you're wasting a huge amount of bandwidth just to determine that it's not a redirect. Much better to use a HEAD request instead.Bus
@Adam Rosenfeld: It's probably an appropriate answer for a side project for someone beginning python. I don't recommend that Google or Yahoo spider pages like this to find the real URL.Lingulate
Doing this is NOT a good idea; you waste a lot of bandwidth. Just using the unshort.me API is better and faster, as @Heder suggestedMenstruate
4

To unshorten a URL, you can use requests. This is a simple solution that works for me.

import requests
url = "http://foo.com"

site = requests.get(url)
print(site.url)
Promethium answered 1/5, 2017 at 0:3 Comment(0)
1

http://github.com/stef/urlclean

sudo pip install urlclean
urlclean.unshorten(url)
Turfman answered 12/7, 2013 at 13:34 Comment(1)
Unfortunately this is Python 2 only, and why would one write unparenthesized prints in Python code in 2012 :(Purify
1

Here is source code that handles most of the useful corner cases:

  • sets a custom timeout.
  • sets a custom User-Agent.
  • checks whether to use an HTTP or HTTPS connection.
  • resolves the input URL recursively and guards against ending up in a loop.

The src code is on github @ https://github.com/amirkrifa/UnShortenUrl

comments are welcome ...

import logging
logging.basicConfig(level=logging.DEBUG)

TIMEOUT = 10
class UnShortenUrl:
    def process(self, url, previous_url=None):
        logging.info('Init url: %s'%url)
        import urlparse
        import httplib
        try:
            parsed = urlparse.urlparse(url)
            if parsed.scheme == 'https':
                h = httplib.HTTPSConnection(parsed.netloc, timeout=TIMEOUT)
            else:
                h = httplib.HTTPConnection(parsed.netloc, timeout=TIMEOUT)
            resource = parsed.path
            if parsed.query != "": 
                resource += "?" + parsed.query
            try:
                h.request('HEAD',
                          resource,
                          headers={'User-Agent': 'curl/7.38.0'})
                response = h.getresponse()
            except:
                import traceback
                traceback.print_exc()
                return url

            logging.info('Response status: %d'%response.status)
            if response.status/100 == 3 and response.getheader('Location'):
                red_url = response.getheader('Location')
                logging.info('Red, previous: %s, %s'%(red_url, previous_url))
                if red_url == previous_url:
                    return red_url
                return self.process(red_url, previous_url=url) 
            else:
                return url 
        except:
            import traceback
            traceback.print_exc()
            return None
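The `previous_url` guard above only catches a URL that redirects to itself; a longer loop (A points to B, B points back to A) would still recurse forever. A hedged sketch of a visited set plus a hop cap, with the actual HEAD lookup abstracted into a `resolve` callable (a hypothetical stand-in for the `httplib` code above):

```python
def follow_redirects(url, resolve, max_hops=10):
    """Follow a chain of redirects safely.

    `resolve(url)` should return the Location header for a 30x
    response, or None when the URL does not redirect.
    """
    seen = {url}
    for _ in range(max_hops):
        nxt = resolve(url)
        if nxt is None:        # not a redirect: done
            return url
        if nxt in seen:        # cycle detected: stop here
            return nxt
        seen.add(nxt)
        url = nxt
    return url                 # hop cap reached
```

With a fake resolver such as `lambda u: {'a': 'b', 'b': 'a'}.get(u)`, the call terminates as soon as the cycle closes instead of recursing forever.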
Pinpoint answered 15/7, 2015 at 21:22 Comment(3)
If I understand your flow correctly, you might want to put a cap on how many redirects you'll tolerateCalender
@Calender in some cases, the redirect points to the same previous url, so, to prevent the trap of an infinite loop, i propagate the previous url within the recusive call and if i end up with red_url == previous_url, i stop and return that url. Otherwise, in a normal case, at some iteration, the response.status will not be equal anymore to a redirection status, so, we return the retrieved url.Pinpoint
@AmirKrifa does that handle link.foo which points to link.bar which points back to link.foo? (I don't know httplib to know if there's an option to follow redirects, in which case, this sort of link would throw an exception before you called the recursive call)Calender
1

You can use geturl()

from urllib.request import urlopen
url = "http://bit.ly/silly"  # urlopen needs an explicit scheme
unshortened_url = urlopen(url).geturl()
print(unshortened_url)
# e.g. https://www.google.com/
Dragon answered 17/6, 2020 at 7:23 Comment(0)
0

This is a very easy task; you only need four lines of code:

import requests
url = input('Enter url : ')
site = requests.get(url)
print(site.url)

Just run this code and it will unshorten the URL.

Betthezel answered 3/9, 2021 at 17:14 Comment(1)
It's the same as this answer: https://mcmap.net/q/554101/-how-can-i-unshorten-a-urlDeviate

© 2022 - 2024 — McMap. All rights reserved.