Given a "t.co" link, how can I find out what the link resolves to? For example, if I have "t.co/foo", I want a function or process that returns "domain.com/bar".
I would stay away from external APIs over which you have no control. That will simply introduce a dependency into your application that is a potential point of failure, and could cost you money to use.
CURL can do this quite nicely. Here's how I did it in PHP:
    function unshorten_url($url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, array(
            CURLOPT_FOLLOWLOCATION => TRUE,  // the magic sauce
            CURLOPT_RETURNTRANSFER => TRUE,
            CURLOPT_SSL_VERIFYHOST => FALSE, // suppress certain SSL errors
            CURLOPT_SSL_VERIFYPEER => FALSE,
        ));
        curl_exec($ch);
        return curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    }
I'm sure this could be adapted to other languages or even scripted with the curl
command on UNIXy systems.
http://jonathonhill.net/2012-05-18/unshorten-urls-with-php-and-curl/
Why not also set CURLOPT_NOBODY => true, so that a HEAD request is performed instead and the final resource isn't actually fetched? – Taxiplane
If you want to do it from the command line, curl's verbose option comes to the rescue:
    curl -v <url>
gives you the HTTP reply. For t.co it seems to give you an HTTP 301 reply (Moved Permanently). The reply contains a Location header, which points to the URL behind the shortened one.
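To make the role of the Location header concrete, here is a minimal Python sketch that sends a single HEAD request and reports the status code and Location header without following the redirect. The function name first_redirect and the timeout default are my own choices for illustration, not part of the answer above.

```python
import http.client
from urllib.parse import urlsplit


def first_redirect(url, timeout=10):
    """Send one HEAD request and return (status, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        # http.client never follows redirects, so a 3xx reply is returned as-is.
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling first_redirect("https://t.co/6e7LFNBv") should give a 3xx status plus the target URL, assuming t.co still answers the way described above. A non-redirect reply simply yields None for the second value.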
    curl -s -o /dev/null --head -w "%{url_effective}\n" -L "https://t.co/6e7LFNBv"
--head or -I: only downloads HTTP headers
-w or --write-out: prints the specified string after the output
-L or --location: follows location headers
Here is a Python solution.
    import urllib2

    class HeadRequest(urllib2.Request):
        def get_method(self):
            return "HEAD"

    def get_real(url):
        res = urllib2.urlopen(HeadRequest(url))
        return res.geturl()
Tested with an actual twitter t.co link:
    url = "http://t.co/yla4TZys"
    expanded = get_real(url)
    # expanded is now "http://twitter.com/shanselman/status/276958062156320768/photo/1"
Wrap it up with a try-except and you are good to go.
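Since urllib2 only exists in Python 2, a rough Python 3 equivalent with the try-except already in place might look like the sketch below. It assumes urllib.request; the timeout parameter and the choice to return None on failure are illustrative rather than taken from the answer. As far as I can tell, only the first request is sent as HEAD, because urllib's redirect handler builds a fresh request for each hop and that request defaults to GET.

```python
import urllib.error
import urllib.request


def get_real(url, timeout=10):
    """Return the final URL after redirects, or None if the lookup fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects; geturl() reports where it ended up.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.geturl()
    except urllib.error.HTTPError as err:
        # Raised for 4xx/5xx replies and for redirect loops.
        print("HTTP error %d while resolving %s" % (err.code, url))
    except urllib.error.URLError as err:
        # Raised for DNS failures, refused connections and similar problems.
        print("Could not reach %s: %s" % (url, err.reason))
    return None
```

Returning None keeps the caller's code simple (a plain "if expanded is None" check), at the cost of hiding the exact failure; re-raising or logging may suit a larger application better.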
http://t.co/OFlTpTzCqt throws HTTPError: HTTP Error 303: The HTTP server returned a redirect error that would lead to an infinite loop. The last 30x error message was: See Other – Coastwise
Another Python solution, this time relying on the requests module instead of urllib2 (and all the rest of those libraries):
    #!/usr/bin/env python
    import requests

    # Python 3; on Python 2 use raw_input() instead of input()
    shorturl = input("Enter the shortened URL in its entirety: ")
    r = requests.get(shorturl)  # follows redirects by default
    print("""
    The shortened URL forwards to:
    %s
    """ % r.url)
Here is an R solution, ported from other answers in this thread, and from example()
code of the RCurl Package:
    unshorten_url <- function(uri){
        require(RCurl)
        if(RCurl::url.exists(uri)){
            # listCurlOptions()
            opts <- list(
                followlocation = TRUE,  # resolve redirects
                ssl.verifyhost = FALSE, # suppress certain SSL errors
                ssl.verifypeer = FALSE,
                nobody = TRUE,          # perform HEAD request
                verbose = FALSE
            );
            curlhandle = getCurlHandle(.opts = opts)
            getURL(uri, curl = curlhandle)
            info <- getCurlInfo(curlhandle)
            rm(curlhandle)  # release the curlhandle!
            info$effective.url
        } else {
            # just return the url as-is
            uri
        }
    }
Twitter already expands the URL for you. Assume you have a single tweet, retrieved through the Twitter API, encoded as a JSON string.
    import json

    urlInfo = []
    tweet = json.loads(tweet)   # `tweet` initially holds the JSON text
    keyList = tweet.keys()      # list of all possible keys
    tweet['entities']           # gives us the values linked to entities
    # You can observe that there is a key called 'urls':
    tweet['entities']['urls']   # gives the values mapped to the key 'urls'
    urlInfo = tweet['entities']['urls']   # move it to a list
    # iterating over the list gives the shortened URL
    # and the expanded URL
    for item in urlInfo:
        if "url" in item and "expanded_url" in item:
            print(item["url"] + " " + item["expanded_url"])
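For anyone who wants to try this without API access, here is a self-contained sketch of the same lookup. SAMPLE_TWEET is a hypothetical, heavily trimmed payload that only mimics the entities/urls layout described above (reusing the t.co link from the earlier answer), and expanded_urls is an illustrative helper name, not part of any Twitter library.

```python
import json

# Hypothetical, heavily trimmed tweet payload; real API responses carry many more fields.
SAMPLE_TWEET = """
{
  "text": "Example tweet http://t.co/yla4TZys",
  "entities": {
    "urls": [
      {
        "url": "http://t.co/yla4TZys",
        "expanded_url": "http://twitter.com/shanselman/status/276958062156320768/photo/1"
      }
    ]
  }
}
"""


def expanded_urls(tweet_json):
    """Map each shortened URL in a tweet's entities to its expanded form."""
    tweet = json.loads(tweet_json)
    mapping = {}
    # .get() with defaults tolerates tweets that carry no URL entities at all.
    for item in tweet.get("entities", {}).get("urls", []):
        if "url" in item and "expanded_url" in item:
            mapping[item["url"]] = item["expanded_url"]
    return mapping
```

Calling expanded_urls(SAMPLE_TWEET) returns a dict keyed by the shortened URL, so no network request is needed when the tweet data is already at hand.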