There is no short answer to your question. I will show you a complete solution and comment this code.
First, import necessary modules:
from bs4 import BeautifulSoup
import requests
import re
Next, get index page and create BeautifulSoup
object:
req = requests.get("http://www.chessgames.com/perl/chesscollection?cid=1014492")
soup = BeautifulSoup(req.text, "lxml")
I strongly advice to use lxml
parser, not common html.parser
After that, you should prepare game's links list:
pages = soup.findAll('a', href=re.compile('.*chessgame\?.*'))
You can do it by searching links containing 'chessgame' word in it.
Now, you should prepare function which will download files for you:
def download_file(url):
path = url.split('/')[-1].split('?')[0]
r = requests.get(url, stream=True)
if r.status_code == 200:
with open(path, 'wb') as f:
for chunk in r:
f.write(chunk)
And final magic is to repeat all previous steps preparing links for file downloader:
host = 'http://www.chessgames.com'
for page in pages:
url = host + page.get('href')
req = requests.get(url)
soup = BeautifulSoup(req.text, "lxml")
file_link = soup.find('a',text=re.compile('.*download.*'))
file_url = host + file_link.get('href')
download_file(file_url)
(first you search links containing text 'download' in their description, then construct full url - concatenate hostname and path, and finally download file)
I hope you can use this code without correction!