Accessing LinkedIn public pages using Python

I want to access my publicly available LinkedIn page. On my local machine, the following code works:

import requests
url = "http://de.linkedin.com/pub/ankush-shah/73/9/982"
html = requests.get(url).text
print html

It returns the correct HTML of my profile.

But when I execute the same code on my Heroku server, I am (I guess) redirected somewhere and get this HTML instead.

Also, when I try with urllib2 on the Heroku server:

import urllib2
url = "http://de.linkedin.com/pub/ankush-shah/73/9/982"
u = urllib2.urlopen(url)

This raises urllib2.HTTPError: HTTP Error 999: Request denied

As I am using virtualenv, all the libraries on my local machine are identical to those installed on the Heroku server. Does LinkedIn block HTTP requests from servers like Heroku? Any help or suggestions would be appreciated.

Herminahermine answered 24/5, 2014 at 9:38 Comment(7)
Why not test for this directly? Change the user agent on the request on the Heroku server to match the user agent from the other machine. - Chavey
You mean something like this: requests.get(url, headers={'User-agent': 'Mozilla/5.0'}).text? This works on my local machine but still not on Heroku. - Herminahermine
There's no platform information in that user agent string. Try a string from here. - Chavey
I tried a couple of strings from there but still no luck. - Herminahermine
Hang on. If Heroku is a hosted service, it (probably) has a static IP range. Perhaps LinkedIn has IP-blocked Heroku itself. This means you might need to use a proxy (or not use Heroku). - Chavey
Yes, you are right. LinkedIn does not allow such requests: developer.linkedin.com/forum/heroku-requests-return-999 - Herminahermine
You should post that as the answer. - Chavey

As mentioned here, LinkedIn does not allow direct access. They have blacklisted Heroku's IP addresses, and the only way to access the data is to use their APIs.
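For anyone who wants to confirm they are hitting the same block rather than some other failure, here is a minimal sketch in Python 3 (where urllib2 became urllib.request and urllib.error). The names BLOCKED_STATUS, classify_http_error and fetch are made up for illustration, and treating status 999 as "blocked" is an assumption based only on the error reported in the question.

```python
import urllib.error
import urllib.request

# Non-standard status code reported in the question; assumed here to mean
# that the server refused the request outright.
BLOCKED_STATUS = 999


def classify_http_error(err):
    """Return a short label for an HTTPError raised by urlopen."""
    if err.code == BLOCKED_STATUS:
        return "blocked"
    return "http-error-%d" % err.code


def fetch(url, user_agent="Mozilla/5.0"):
    """Fetch url and return (label, body); body is None when the request fails."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return "ok", response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Covers the "HTTP Error 999: Request denied" case from the question.
        return classify_http_error(err), None
```

Calling fetch(url) on both machines and comparing the returned label should show whether the server-side request is rejected with 999 while the local one succeeds. It only diagnoses the block; it does not work around it.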

Herminahermine answered 24/5, 2014 at 21:1 Comment(2)
@Ankush_Shah: Did they remove the IP address from their blacklist after a while? - Canna
I am not aware of it, as I switched to using their API, which is way better than scraping the data directly. I doubt they have any reason to remove the blacklisted IP addresses. - Herminahermine
