999 Error Code on HEAD request to LinkedIn
P

4

55

We're using a curl HEAD request in a PHP application to verify the validity of generic links. We check the status code just to make sure that the link the user has entered is valid. Links to all websites have succeeded, except LinkedIn.

While it seems to work locally (Mac), when we attempt the request from any of our Ubuntu servers, LinkedIn returns a 999 status code. This is not an API request, just a plain curl request like the one we make for every other link. We've tried a few different machines and tried altering the user agent, but no dice. How do I modify our curl request so that working links return a 200?

A sample HEAD request:

curl -I --url https://www.linkedin.com/company/linkedin

Sample Response on Ubuntu machine:

HTTP/1.1 999 Request denied
Date: Tue, 18 Nov 2014 23:20:48 GMT
Server: ATS
X-Li-Pop: prod-lva1
Content-Length: 956
Content-Type: text/html

To respond to @alexandru-guzinschi a little better. We've tried masking the User Agents. To sum up our trials:

  • Mac machine + Mac UA => works
  • Mac machine + Windows UA => works
  • Ubuntu remote machine + (no UA change) => fails
  • Ubuntu remote machine + Mac UA => fails
  • Ubuntu remote machine + Windows UA => fails
  • Ubuntu local virtual machine (on Mac) + (no UA change) => fails
  • Ubuntu local virtual machine (on Mac) + Windows UA => works
  • Ubuntu local virtual machine (on Mac) + Mac UA => works

So now I'm thinking they block any curl requests that don't provide an alternate UA, and they also block hosting providers?

Is there any other way I can check, from an Ubuntu machine using PHP, whether a link to LinkedIn is valid or will lead to their 404 page?

Politburo answered 1/12, 2014 at 14:57 Comment(7)
Chances are they've blacklisted hosting companies to force them to use the API.Apologetics
What happens when you load the link via a command-line browser like lynx? Same HTTP error?Overtake
I get 999 with curl and wget, but elinks works from the same IP. My guess, too, would be that they detect curl and wget somehow.Algia
@Overtake same 999. with lynx.Politburo
@Apologetics We've tried a few different hosting companies, including some smaller boutique ones. I guess next step is Virtual Box Ubuntu to see if it has to do with OS or they've just blocked a whole bunch of hosting providers' IP blocks.Politburo
@Politburo - Any updates on this? Did you get this working? If yes, how?Jemima
I'm running into this issue while trying to maintain npmjs.com/broken-link-checkerBigler
M
25

It looks like they filter requests based on the user-agent:

$ curl -I --url https://www.linkedin.com/company/linkedin | grep HTTP
HTTP/1.1 999 Request denied

$ curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" -I --url https://www.linkedin.com/company/linkedin | grep HTTP
HTTP/1.1 200 OK
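
For a link checker, one way to act on these results is to treat 999 as "the checker was refused" rather than "the link is dead". Below is a minimal sketch of that idea, not a definitive implementation. The function names and the verdict labels (valid, redirect, broken, blocked, unknown) are hypothetical, and grouping 429 with 999 is an assumption about how you want to handle throttling.

```shell
#!/bin/sh
# classify_status: map an HTTP status code to a link-check verdict.
# The verdict labels are hypothetical; adjust them to your application.
classify_status() {
    case "$1" in
        2??)     echo "valid" ;;     # request succeeded
        3??)     echo "redirect" ;;  # follow it or inspect the Location header
        404|410) echo "broken" ;;    # the page really is gone
        999|429) echo "blocked" ;;   # the checker was refused; link may be fine
        *)       echo "unknown" ;;   # includes curl's 000 on connection failure
    esac
}

# check_link: send a HEAD request, keep only the status code, classify it.
# $1 = URL, $2 = optional User-Agent string.
check_link() {
    if [ -n "$2" ]; then
        code=$(curl -s -o /dev/null -I -A "$2" -w '%{http_code}' "$1")
    else
        code=$(curl -s -o /dev/null -I -w '%{http_code}' "$1")
    fi
    classify_status "$code"
}
```

Calling check_link "https://www.linkedin.com/company/linkedin" from a blocked IP would then report "blocked" instead of marking the link as invalid.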
Mckee answered 1/12, 2014 at 15:21 Comment(8)
We tried altering the user agent, though. So our responses have been: [Mac machine + Mac UA => works] [Mac machine + Windows UA => works] [Ubuntu machine + Ubuntu UA => fails] [Ubuntu machine + Mac UA => fails] [Ubuntu machine + Windows UA => fails] No access to a Windows machine at the moment, so I'm not sure about that.Politburo
@Politburo That is strange, because I just tried with the current UA of Chrome: curl -A "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36" -I --url https://www.linkedin.com/company/linkedin | grep HTTP, which gives me an HTTP/1.1 200 OK from my Ubuntu machine. Maybe you tried with an old (or incorrect) UA that they block? Run a new test with the UA that I used.Mckee
That works on my virtual machine, but fails on remote ones/servers. See above for the full trial matrix. May I ask whether you're trying from a remote machine, and if so, what provider?Politburo
@Politburo No, the tests were made from my local machine. If you are sure that you (or some "neighbor", if you are sharing an IP) did not make enough requests so you could be throttled (those are cleared after 24 hours, if I remember correctly), most likely they have some restrictions in place for your IP range.Mckee
They filter both user agent AND ip address. So you need some kind of valid proxy address.Jequirity
This answer may be correct but it's not really helpful in trying to figure out how to do link checking for linkedin URLs. Providing a fake User-Agent is not something I would like to do or recommend to others. I think bots and link checkers should correctly identify themselves and provide contact information. That is what I do with my link checkers.Candiot
I'm getting HTTP 999 with the user agent header as wellTeraterai
FWIW, I get 999 on my home machine, regardless of being on a VPN or not, checking the URL from within a Word macro link checker. The link works fine if I click on it inside the Word document.Talaria
H
15

I found a workaround. It is important to set the accept-encoding header:

curl --url "https://www.linkedin.com/in/izman" \
--header "user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36" \
--header "accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" \
--header "accept-encoding:gzip, deflate, sdch, br" \
| gunzip
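
One caveat with piping straight into gunzip: the pipe fails whenever the body is not gzip, for example when the server answers with a plain 999 error page or picks another encoding from the accept-encoding list. Here is a small sketch of a more forgiving decoder, assuming a POSIX shell with mktemp and gzip available. The name decode_body is hypothetical.

```shell
#!/bin/sh
# decode_body: read a response body on stdin, gunzip it if it is valid gzip,
# otherwise pass it through unchanged. Buffers the body in a temp file.
decode_body() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    if gzip -t < "$tmp" 2>/dev/null; then
        gzip -dc < "$tmp"   # body is gzip: decompress to stdout
    else
        cat "$tmp"          # not gzip: emit as received
    fi
    rm -f "$tmp"
}
```

Replacing "| gunzip" with "| decode_body" keeps the rest of the command as it is. Alternatively, curl's own --compressed option asks for a compressed response and decodes it for you.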
Histrionics answered 18/8, 2016 at 14:53 Comment(1)
This is limited to about ~30-50 requests/day or so. After that you'll get blocked.Englacial
J
5

It seems like LinkedIn filters on both user agent AND IP address. I tried this both at home and from a DigitalOcean node:

curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" -I --url https://www.linkedin.com/company/linkedin

From home I got a 200 OK, from DO I got 999 Denied...

So you need a proxy service like HideMyAss or other (haven't tested it so I couldn't say if it's valid or not). Here is a good comparison of proxy services.

Or you could set up a proxy on your home network, for example by using a Raspberry Pi to proxy your requests. Here is a guide on that.

Jequirity answered 5/6, 2015 at 7:15 Comment(2)
A proxy is a viable solution for small projects, but unfortunately this is for a larger web application. We verify thousands of links per hour this way. We're not going to be able to proxy all of those requests, I'm afraid. Plus, LinkedIn URLs account for only a small fraction of them.Politburo
A proxy alone wouldn't help. We've tried an HMA proxy, but LinkedIn still blocks URLs to profiles even from actual Chrome. After changing IP, clearing all cookies and history in Firefox and requesting some other profile, LI still responded with 999 and redirected to the login page. Perhaps they know and block HMA IP ranges?Cathode
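
Since LinkedIn URLs are only a small fraction of the links being verified, one option is to single them out and handle them separately (for example, send only those through a proxy, or mark them as unverifiable) while every other link takes the normal path. A rough sketch of the host check follows. The function names are hypothetical, and the host parsing is deliberately simple (no IPv6 literals).

```shell
#!/bin/sh
# url_host: print the lowercased host part of a URL.
url_host() {
    rest=${1#*://}          # drop the scheme, if any
    rest=${rest%%[/?#]*}    # drop path, query string and fragment
    rest=${rest##*@}        # drop userinfo, if any
    rest=${rest%%:*}        # drop the port, if any
    printf '%s\n' "$rest" | tr 'A-Z' 'a-z'
}

# is_linkedin_url: succeed if the URL points at linkedin.com or a subdomain.
is_linkedin_url() {
    case "$(url_host "$1")" in
        linkedin.com|*.linkedin.com) return 0 ;;
        *) return 1 ;;
    esac
}
```

A checker could then branch on is_linkedin_url "$url" before deciding which request path to use.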
O
4

A proxy would work, but I think there's another way around it. From AWS and other clouds, I see that the request is blocked by IP. I can issue the same request from my own machine and it works just fine.

I did notice that the response sent to the cloud service contains some JS that the browser has to execute to take you to a login page. Once there, you can log in and access the page. The login page is only shown to those accessing via a blocked IP.

If you use a headless client that executes JS, or maybe go straight to the subsequent link and provide the credentials of a LinkedIn user, you may be able to bypass it.

Osmo answered 16/9, 2015 at 17:37 Comment(1)
Tried this. After about 20 logins, you'll get a 'We're getting things cleaned up. We'll be back' message after login.Englacial
