Scrape LinkedIn people search with Python

3

5

I want to scrape the results of a people search on LinkedIn.

import bs4
import requests

url = 'https://www.linkedin.com/search/results/people/?facetCurrentCompany=%5B%222525300%22%5D&facetGeoRegion=%5B%22fi%3A0%22%5D&keywords=python&origin=FACETED_SEARCH'
res = requests.get(url)  # plain request: no login session or cookies are sent
soup = bs4.BeautifulSoup(res.text, 'lxml')

There is no error. The problem is that when I open the link in my browser, the results show one person matching my search criteria, but I cannot find that person in the soup generated by the Python code.

Does anyone know how to fix this?

Filippa answered 14/8, 2018 at 11:30 Comment(0)
7

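You are trying to scrape data that is available only to logged-in users. A plain requests.get call carries no login session, so the HTML you receive is not the results page you see in your browser.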
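One way to confirm this is to look at where the request actually ended up after redirects. The following is only a rough sketch: the path fragments in `LOGIN_PATH_MARKERS` are assumptions about how LinkedIn redirects anonymous visitors, and `looks_like_login_wall` and `fetch_and_check` are hypothetical helper names.

```python
from urllib.parse import urlparse

# Assumed path fragments that suggest an anonymous visitor was sent to a
# sign-in page. These are guesses for illustration, not a documented list.
LOGIN_PATH_MARKERS = ("/login", "/authwall")


def looks_like_login_wall(final_url: str) -> bool:
    """Return True if the path of the final URL looks like a sign-in page."""
    path = urlparse(final_url).path.lower()
    return any(marker in path for marker in LOGIN_PATH_MARKERS)


def fetch_and_check(url: str) -> bool:
    """Fetch url without credentials and report whether a login wall was hit."""
    import requests  # third-party; imported lazily so the check above stays dependency-free

    res = requests.get(url, timeout=10)
    # res.url holds the URL after any redirects were followed.
    return looks_like_login_wall(res.url)
```

If `fetch_and_check(url)` returns True for the search URL, the soup was built from a sign-in page rather than from search results.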

You should use the official LinkedIn REST API and authenticate via OAuth2. Give it a try: https://developer.linkedin.com/docs/rest-api
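As a rough illustration of the first step of that OAuth2 flow, the sketch below builds the URL a user would visit to grant access. It assumes the standard authorization-code parameters and the authorization endpoint as I understand them from LinkedIn's documentation, so check both against the current docs. `build_authorization_url` is a hypothetical helper, and the client ID, redirect URI and scope are placeholders you would take from your own app registration.

```python
import secrets
from urllib.parse import urlencode

# Authorization endpoint as I understand it from LinkedIn's OAuth 2.0 docs;
# verify against the current documentation before relying on it.
AUTHORIZATION_ENDPOINT = "https://www.linkedin.com/oauth/v2/authorization"


def build_authorization_url(client_id, redirect_uri, scope, state=None):
    """Return (url, state) for the first leg of the authorization-code flow."""
    # state is an opaque random value that you check when the user is
    # redirected back, to guard against cross-site request forgery.
    state = state or secrets.token_urlsafe(16)
    params = {
        "response_type": "code",
        "client_id": client_id,        # placeholder: from your app registration
        "redirect_uri": redirect_uri,  # placeholder: must match the registered URI
        "state": state,
        "scope": scope,                # placeholder: depends on what your app was granted
    }
    return f"{AUTHORIZATION_ENDPOINT}?{urlencode(params)}", state
```

After the user approves, the service redirects to `redirect_uri` with a `code` parameter, which is then exchanged for an access token in a second request.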

Restoration answered 14/8, 2018 at 11:41 Comment(2)
Thank you for your answer. I have tried the LinkedIn API, but I think it no longer supports searching for people or jobs; I can only use the API to get my public profile. - Filippa
The LinkedIn docs are quite obscure. Maybe check out: developer.linkedin.com/docs/guide/v2 - Restoration
5

I would use an open-source project that has already done the hard work and modify it to fit my needs. For example:

https://github.com/ericfourrier/scrape-linkedin

Note: this will only work for public data.

Cockpit answered 14/8, 2018 at 11:45 Comment(4)
Hi, I have tried this one, but it raises an error: "pylinkedin.exceptions.ServerIpBlacklisted: Linkedin blacklists ips for unauthentified http requests, Aws, Digital Ocean" - Filippa
It seems LinkedIn blocked the default IP that this code uses (which makes sense). You'll have to find or use a proxy from a different source. - Cockpit
Also, this library is meant for looking up a specific profile, so I have to know the name of the person I am searching for. In my case, I want to search by criteria, such as people who have Python skills, so I do not know in advance whom I am looking for. - Filippa
Like I said in the original answer, this code will need modification for your own needs. But at least it provides a way to access LinkedIn out of the box. - Cockpit
1

The REST API is not well suited to scraping, as it has several restrictions and limits.

Automating a real browser with Selenium lets you scrape whatever a logged-in user can see, and it even lets you perform actions on LinkedIn.
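A minimal sketch of the Selenium approach, assuming you log in by hand in the browser window and that the results page loads more content as you scroll (both are assumptions, as are the helper names `load_and_scroll` and `example_usage`). No LinkedIn-specific selectors are used, because the page markup changes often.

```python
import time


def load_and_scroll(driver, url, pause_seconds=2.0, max_scrolls=5):
    """Open url in an already logged-in session and return the rendered HTML."""
    driver.get(url)
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_scrolls):
        # Scroll to the bottom so lazily loaded results get a chance to render.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_seconds)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # page height stopped growing, so nothing more was loaded
        last_height = new_height
    return driver.page_source


def example_usage(search_url):
    """Illustrative only: needs the selenium package and a Chrome driver."""
    from selenium import webdriver  # third-party; imported lazily

    driver = webdriver.Chrome()
    try:
        driver.get("https://www.linkedin.com/login")
        input("Log in in the browser window, then press Enter here...")
        return load_and_scroll(driver, search_url)
    finally:
        driver.quit()
```

The returned HTML can then be passed to bs4.BeautifulSoup exactly as in the question, since it now reflects what the logged-in browser rendered.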

For scraping, I recommend https://github.com/austinoboyle/scrape-linkedin-selenium. It covers most needs but currently has several bugs, since LinkedIn updates its site frequently.

I am using a modified version in a Flask backend here.

It's better to fork the library and adapt the scraping methods to your needs.

Momentarily answered 28/5, 2020 at 6:43 Comment(1)
The Flask backend will be open-sourced soon. - Momentarily
