What is the difference between the file extensions .h5, .hdf5 and .ckpt, and which one should I use?
I am trying to save my CNN to a file at every checkpoint. However, which extension should I use for the file path? Also, would I need to call model.save(filepath) at the end of the code, or would my model be saved automatically by ModelCheckpoint()?

I have my model saved as a .h5 file but I don't know whether I should change it.

from keras import Sequential
from keras_preprocessing.image import ImageDataGenerator
from keras.layers import *
from keras.callbacks import ModelCheckpoint
import numpy as np
import os

img_size = 500 # number of pixels for width and height

#Random Seed
np.random.seed(12321)


training_path = os.getcwd() + "/cats and dogs images/train"
testing_path = os.getcwd() + "/cats and dogs images/test"

#Defines the Model
model = Sequential([
        Conv2D(filters=64, kernel_size=(3,3), activation="relu", padding="same", input_shape=(img_size,img_size,3)),
        MaxPool2D(pool_size=(2,2), strides=2),
        Conv2D(filters=64, kernel_size=(3,3), activation="relu", padding="same"),
        MaxPool2D(pool_size=(2,2), strides=2),
        Flatten(),
        Dense(32, activation="relu"),
        Dense(1, activation="sigmoid")
])


#Scales the pixel values to between 0 to 1
datagen = ImageDataGenerator(rescale=1.0/255.0)

#Prepares Training Data
training_dataset = datagen.flow_from_directory(directory = training_path, target_size=(img_size,img_size), classes = ["cat","dog"], batch_size = 19)

#Prepares Testing Data
testing_dataset = datagen.flow_from_directory(directory = testing_path, target_size=(img_size,img_size), classes = ["cat","dog"], batch_size = 19)


#Compiles the model
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=['accuracy'])


#Checkpoint
checkpoint = ModelCheckpoint("trained_model.h5", monitor='loss', verbose=1, save_best_only=True, mode='min', save_freq='epoch')  # save_freq='epoch' replaces the deprecated period=1

#Fitting the model to the dataset (Training the Model)
model.fit(x = training_dataset, steps_per_epoch = 658, validation_data=testing_dataset, validation_steps=658, epochs = 10, callbacks=[checkpoint], verbose = 1)


# evaluate model on training dataset
loss, acc = model.evaluate(training_dataset, steps=len(training_dataset), verbose=0)  # returns [loss, accuracy]
print("Accuracy on training dataset:")
print('> %.3f' % (acc * 100.0))


#evaluate model on testing dataset
loss, acc = model.evaluate(testing_dataset, steps=len(testing_dataset), verbose=0)  # returns [loss, accuracy]
print("Accuracy on testing dataset:")
print('> %.3f' % (acc * 100.0))

##Saving the Model:
#model.save("trained model.h5")
#print("Saved model to disk")
Pirnot answered 26/10, 2020 at 9:29

What is the difference between the file extensions .h5, .hdf5 and .ckpt?

.h5 and .hdf5

According to this, .h5 and .hdf5 are essentially the same: both are extensions for a data file saved in the Hierarchical Data Format (HDF), which stores multidimensional arrays of scientific data.
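Because the extension is only a naming convention, what makes a file HDF5 is its contents. As a minimal sketch using only the Python standard library, the hypothetical helper `looks_like_hdf5` below checks for the 8-byte HDF5 format signature, which the HDF5 specification places at byte 0, or at 512, 1024, 2048, ... when the file has a user block. The file name is never inspected, so `.h5` and `.hdf5` are treated identically.

```python
import os
import tempfile

# 8-byte signature that marks the start of an HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Hypothetical helper: True if the file contains the HDF5 signature.

    Checks offset 0, then 512, 1024, 2048, ... (possible user-block sizes).
    Only the bytes matter; the extension is ignored.
    """
    size = os.path.getsize(path)
    offset = 0
    with open(path, "rb") as f:
        while offset + len(HDF5_SIGNATURE) <= size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            offset = 512 if offset == 0 else offset * 2
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # The same (fake) HDF5 bytes under both extensions, plus a non-HDF5 file.
        payload = HDF5_SIGNATURE + b"\x00" * 56
        for name in ("model.h5", "model.hdf5"):
            with open(os.path.join(tmp, name), "wb") as f:
                f.write(payload)
        with open(os.path.join(tmp, "notes.h5"), "wb") as f:
            f.write(b"not an HDF5 file")
        for name in ("model.h5", "model.hdf5", "notes.h5"):
            print(name, looks_like_hdf5(os.path.join(tmp, name)))
```

Renaming a Keras `.h5` file to `.hdf5` (or the reverse) therefore changes nothing about how it is read.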

According to this, saving a model in that format stores the following:

  1. The weight values.
  2. The model's architecture.
  3. The model's training configuration (what you pass to the .compile() method).
  4. The optimizer and its state, if any (this enables you to restart training where you left off).

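A minimal sketch of that round trip, assuming TensorFlow 2.x (`tf.keras`) with h5py installed. The tiny two-layer model and random data are hypothetical stand-ins for the CNN in the question; newer Keras versions may additionally print a warning that HDF5 is a legacy format.

```python
import numpy as np
import tensorflow as tf

# Hypothetical tiny model standing in for the question's CNN.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Random toy data, only so that there are trained weights to save.
x = np.random.rand(32, 4).astype("float32")
y = (x.sum(axis=1) > 2).astype("float32")
model.fit(x, y, epochs=1, verbose=0)

# One HDF5 file: architecture, weights, compile() config and optimizer state.
model.save("final_model.h5")

# No model-building code is needed to get a usable model back.
restored = tf.keras.models.load_model("final_model.h5")
print(np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0)))
```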
.ckpt

It is short for checkpoint, so, as the name suggests, it is used to save the state of the model during training once a certain condition is met (for example, the loss drops below a certain value or the accuracy rises above a certain value).
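To make "once a certain condition is met" concrete: with `save_best_only=True` and `mode='min'` (as in the question's ModelCheckpoint call), a file is written only when the monitored value beats the best value seen so far. The function below is a simplified re-implementation of that rule for illustration only, not Keras's actual code; `epochs_that_save` and the loss values are hypothetical.

```python
import math


def epochs_that_save(values, mode="min"):
    """Return the 1-based epochs at which a save_best_only checkpoint is written.

    Simplified rule: save only when the monitored value improves on the best
    seen so far (lower is better for mode="min", higher for mode="max").
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    best = math.inf if mode == "min" else -math.inf
    saved = []
    for epoch, value in enumerate(values, start=1):
        improved = value < best if mode == "min" else value > best
        if improved:
            best = value
            saved.append(epoch)
    return saved


# Hypothetical per-epoch training losses.
print(epochs_that_save([0.69, 0.55, 0.58, 0.41, 0.43, 0.40]))  # [1, 2, 4, 6]
```

The first epoch always triggers a save because anything beats the initial infinity; epochs 3 and 5 are skipped because their loss went up.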

Saving a model as .ckpt has a drawback: it only saves the values of the variables (the weights), not the architecture. To load those weights you therefore need the full code that defines the architecture and any functions it uses, so that you can rebuild the model, load the weights into it, and then use it.
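A minimal sketch of that drawback, assuming TensorFlow 2.x and its `tf.train.Checkpoint` API, which writes the `.index`/`.data-...` files behind a `.ckpt` prefix. `build_model` and the `ckpts/model.ckpt` prefix are hypothetical names; the point is that the architecture must come from code before the saved values can be restored into it.

```python
import os

import numpy as np
import tensorflow as tf


def build_model():
    # The architecture lives in code: the checkpoint cannot recreate it.
    # (Hypothetical tiny model standing in for the question's CNN.)
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])


model = build_model()
x = np.random.rand(5, 4).astype("float32")

# Writes variable values only, e.g. ckpts/model.ckpt-1.index
# and ckpts/model.ckpt-1.data-00000-of-00001.
os.makedirs("ckpts", exist_ok=True)
save_path = tf.train.Checkpoint(model=model).save("ckpts/model.ckpt")

# To use the checkpoint: rebuild the same architecture, then restore into it.
fresh = build_model()
tf.train.Checkpoint(model=fresh).restore(save_path).expect_partial()
print(np.allclose(model(x).numpy(), fresh(x).numpy()))
```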

This format is mainly used when you want to resume training. It also lets you customize which checkpoints are saved and load any of them later, which makes it easy to keep improving the model, adjust parameters based on the results, and create different models from different checkpoints.
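A sketch of resuming, assuming TensorFlow 2.x: `tf.train.CheckpointManager` keeps only the most recent checkpoints (`max_to_keep`), and checkpointing the optimizer alongside the model lets training continue from the saved optimizer state. The directory name `resume_ckpts`, the value 3, and the toy model and data are arbitrary choices for illustration. Optimizer variables that only appear during `fit()` rely on TensorFlow's deferred restoration, so behavior may vary between Keras versions.

```python
import numpy as np
import tensorflow as tf


def build_model():
    # Hypothetical stand-in for the question's CNN; must be identical between runs.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam")
    return model


x = np.random.rand(64, 4).astype("float32")
y = (x.sum(axis=1) > 2).astype("float32")

model = build_model()
ckpt = tf.train.Checkpoint(model=model, optimizer=model.optimizer)
manager = tf.train.CheckpointManager(ckpt, directory="resume_ckpts", max_to_keep=3)

# Resume from the newest checkpoint if there is one (None on the first run).
if manager.latest_checkpoint:
    ckpt.restore(manager.latest_checkpoint)

for epoch in range(5):
    model.fit(x, y, epochs=1, verbose=0)
    manager.save()  # checkpoints beyond max_to_keep are deleted automatically

print(manager.checkpoints)  # at most 3 paths, oldest first
```

Running the script a second time picks up the last saved checkpoint instead of starting from freshly initialized weights.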

Which extension should I use?

It depends on your goal for training the model. If you are still in the training process and experimenting a lot, I would suggest saving the model in the .ckpt format.

Once you are done experimenting and have finalized the model, I would suggest saving it in the .h5 format, so that you can load and use it without needing the code that creates the model architecture.

Also would I need to call model.save(filepath) at the end of the code or would my model be saved automatically by ModelCheckpoint()?

You can use both. I would suggest giving the file in ModelCheckpoint() a .ckpt extension (with save_weights_only=True, so that only the weights are written as a TensorFlow checkpoint), so that the best model state is saved during training. Then, when training is done, call model.save(filepath) with a .h5 file, so that the final model can be saved and used anywhere without the original architecture code.
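A minimal sketch of that workflow, assuming TensorFlow 2.x with Keras 2 (`tf.keras`, roughly TF 2.15 and earlier), where a non-`.h5` name passed to `save_weights` produces a TensorFlow checkpoint. Keras 3 rejects the `.ckpt` suffix here and expects names ending in `.weights.h5` for weights. The file names and the toy model and data are hypothetical.

```python
import numpy as np
import tensorflow as tf

# Assumes TF 2.x with Keras 2. Hypothetical tiny model and random data.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

x = np.random.rand(64, 4).astype("float32")
y = (x.sum(axis=1) > 2).astype("float32")

# During training: weights-only TensorFlow checkpoint, rewritten only when loss improves.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_weights.ckpt",
    monitor="loss",
    mode="min",
    save_best_only=True,
    save_weights_only=True,
    save_freq="epoch",  # replaces the deprecated period=1
)
model.fit(x, y, epochs=5, verbose=0, callbacks=[checkpoint])

# Optionally roll back to the best epoch before exporting.
model.load_weights("best_weights.ckpt")

# After training: one self-contained HDF5 file for later use.
model.save("final_model.h5")
```

ModelCheckpoint writes its files automatically during fit(), but only the final model.save() call produces the self-contained .h5 file.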

That way you keep both options: load the .ckpt checkpoint to continue improving the training, or, if you are satisfied with the final result, use the .h5 model as the final version of the model.

Serviette answered 30/9, 2021 at 8:3

© 2022 - 2024 — McMap. All rights reserved.