Changes to IMDbPY and the IMDb data files format mean that the existing answers no longer work (as of January 2018).
I am using Ubuntu 17.10 and MariaDB 10.1 (not MySQL, but the following will also work with MySQL).
Changes to IMDbPY
The latest version of IMDbPY is 6.2. It is implemented in Python 3, and the dependencies on gcc and SQLObject have been removed. Also, the Python package MySQL-python is not available for Python 3, so we install mysqlclient instead; see below. (The API of mysqlclient is compatible with MySQL-python.)
Changes to the IMDb data files format
Changes to the format of the IMDb data files were introduced in December 2017, and IMDbPY 6.2 (the current version) does not yet work with the new file format. (See this GitHub issue.)
Until this is fixed, use the most recent version of the IMDb data published in the old format, which is available at ftp://ftp.fu-berlin.de/pub/misc/movies/database/frozendata/. Download all *.list.gz files from that directory (excluding files in subdirectories).
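If you would rather script the download than fetch each file by hand, here is a minimal sketch using Python's standard ftplib. It assumes the server accepts anonymous logins and that the directory listing returns bare file names; the function names select_list_files and download_list_files are my own, not part of IMDbPY.

```python
from ftplib import FTP
from pathlib import Path

# Host and directory taken from the FTP URL given above.
FTP_HOST = "ftp.fu-berlin.de"
FTP_DIR = "/pub/misc/movies/database/frozendata"


def select_list_files(names):
    """Keep only top-level *.list.gz entries; skip subdirectories and other files."""
    return sorted(n for n in names if n.endswith(".list.gz") and "/" not in n)


def download_list_files(dest_dir):
    """Download every *.list.gz file from FTP_DIR into dest_dir.

    Assumes anonymous FTP access is allowed (hypothetical; check the server).
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with FTP(FTP_HOST) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(FTP_DIR)
        for name in select_list_files(ftp.nlst()):
            with open(dest / name, "wb") as out:
                ftp.retrbinary("RETR " + name, out.write)
```

Calling download_list_files("imdb_data") would then place the files in a local imdb_data directory, which you can pass to imdbpy2sql.py with -d.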
New steps to follow
Install Python 3 and required packages:
sudo apt install python3
pip3 install mysqlclient
In MariaDB, create a database named imdb, and grant all privileges on it to a user named user with the password password.
CREATE DATABASE imdb;
GRANT ALL PRIVILEGES ON imdb.* TO 'user'@'localhost' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;
Get IMDbPY 6.2:
wget https://github.com/alberanid/imdbpy/archive/6.2.zip
unzip 6.2.zip
cd imdbpy-6.2
python3 setup.py install
Load IMDb data into MariaDB:
cd bin
python3 imdbpy2sql.py -d [imdb_dataset_directory] -u 'mysql://user:password@localhost/imdb'
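If your real password contains characters such as @ or /, putting it into the URI verbatim will confuse the parser. A small sketch of how the URI and command line could be assembled with the credentials percent-encoded is below. It assumes imdbpy2sql.py decodes percent-encoded credentials (as SQLAlchemy-style database URLs normally are), which I have not verified for version 6.2; build_db_uri and build_import_command are hypothetical helper names.

```python
import shlex
from urllib.parse import quote_plus


def build_db_uri(user, password, host="localhost", database="imdb"):
    """Build a mysql:// URI, percent-encoding the user name and password."""
    return "mysql://{}:{}@{}/{}".format(
        quote_plus(user), quote_plus(password), host, database
    )


def build_import_command(dataset_dir, uri):
    """Return the imdbpy2sql.py command line as a shell-safe string."""
    args = ["python3", "imdbpy2sql.py", "-d", dataset_dir, "-u", uri]
    return " ".join(shlex.quote(a) for a in args)


uri = build_db_uri("user", "password")
print(build_import_command("/path/to/imdb_data", uri))
```

The printed command is the same as the one above, just with the dataset directory and URI filled in; /path/to/imdb_data is a placeholder for your own directory.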
Edit: Version 6.2 of IMDbPY does not create foreign keys. See this GitHub issue. If you need foreign keys to be created automatically, you will have to use an older version of IMDbPY, but problems with foreign key generation have been reported for older versions as well (see the linked GitHub issue).
Update: It took 4.5 hours to import, and I had no problems using InnoDB tables.
Edit: If you wish to use version 6.2 of IMDbPY and require foreign keys, you will need to add them manually to the database after it is generated. A very small amount of data cleanup is required before the foreign keys can be added. The cleanup and the foreign keys that need to be added are described in this GitHub issue.