Change size of train and test set from MNIST Dataset

mnist.load_data has no argument for controlling the split sizes. Instead, you can concatenate the data with numpy and then re-split it with sklearn (or numpy):

from keras.datasets import mnist
import numpy as np
from sklearn.model_selection import train_test_split

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x = np.concatenate((x_train, x_test))
y = np.concatenate((y_train, y_test))

train_size = 0.7
# train_test_split takes random_state (not random_seed) to fix the shuffle
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=train_size, random_state=2019)

Set random_state (the random seed) for reproducibility.

Via numpy (if you don't use sklearn):

# do the same concatenation
np.random.seed(2019)
train_size = 0.7
index = np.random.rand(len(x)) < train_size  # boolean index
x_train, x_test = x[index], x[~index]  # index and its negation
y_train, y_test = y[index], y[~index]

You'll get arrays of approximately the required size (~210xx test samples instead of exactly 21000), because each sample is assigned to train or test independently at random.
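
If you need exactly 21000 test samples with numpy alone, one option is to shuffle the indices once and cut at a fixed position. This is only a minimal sketch: split_exact is a hypothetical helper name (not part of numpy or keras), and small zero-filled stand-in arrays are used here in place of the concatenated x and y.

```python
import numpy as np

def split_exact(x, y, train_size=0.7, seed=2019):
    """Shuffle indices once, then cut at an exact position (hypothetical helper)."""
    rng = np.random.RandomState(seed)            # local generator, reproducible
    perm = rng.permutation(len(x))               # shuffled indices 0..len(x)-1
    n_train = int(round(train_size * len(x)))    # exact number of training samples
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

# Stand-in arrays with the same number of samples as concatenated MNIST (70000);
# the tiny 2x2 "images" are an assumption to keep the example light.
x = np.zeros((70000, 2, 2), dtype=np.uint8)
y = np.arange(70000) % 10
x_train, x_test, y_train, y_test = split_exact(x, y)
print(x_train.shape[0], x_test.shape[0])
```

Because the same index arrays are applied to both x and y, images and labels stay aligned.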

Judging by the source code of mnist.load_data, the function just fetches the data from a URL, already split into 60000 training and 10000 test samples, so concatenating and re-splitting is the only workaround.

You could also download the MNIST dataset from http://yann.lecun.com/exdb/mnist/, preprocess it manually, and then concatenate and split it as you need. As far as I understand, though, it was divided into 60000 training examples and 10000 test examples because this split is used in standard benchmarks.
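
For the manual route, the raw files use the IDX format: a 4-byte magic number (two zero bytes, a type code, the number of dimensions), then one big-endian 32-bit size per dimension, then the data bytes. A rough sketch of decoding it, assuming the file has already been un-gzipped into a bytes object; parse_idx is a hypothetical helper, and the demo bytes are synthetic rather than real MNIST data:

```python
import struct
import numpy as np

def parse_idx(raw):
    """Decode un-gzipped IDX bytes into a uint8 array (hypothetical helper)."""
    # Magic number: two zero bytes, type code (0x08 = unsigned byte), dimension count
    zero, dtype_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("not a uint8 IDX file")
    # Each dimension size is a big-endian 32-bit unsigned integer
    header_len = 4 + 4 * ndim
    shape = struct.unpack(">" + "I" * ndim, raw[4:header_len])
    # Everything after the header is the pixel/label data
    return np.frombuffer(raw, dtype=np.uint8, offset=header_len).reshape(shape)

# Synthetic stand-in for an image file: 2 "images" of 2x3 pixels
demo = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 3) + bytes(range(12))
images = parse_idx(demo)
print(images.shape)
```

With real downloads you would pass in the bytes read through gzip.open(path, "rb").read(), then concatenate and split the resulting arrays as shown earlier.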
