How to determine needed memory of Keras model?

I am working with Keras 2.0.0 and I'd like to train a deep model with a huge number of parameters on a GPU. With images that are too large I run out of memory (OOM); with images that are too small the model's accuracy is worse than it could be. I'd therefore like to find the biggest input image size that fits on my GPU. Is there any functionality to calculate the memory requirement given the model and the input data (e.g., comparable to model.summary())?

I appreciate your help.

Miserable asked 31/3, 2017 at 9:32 Comment(5)
You can look at how they compute the memory usage here (cs231n.github.io/convolutional-networks/#case); you could also try to reduce the batch size instead of the resolution.Greet
Thanks for your answer. Actually, I was reading the given link before posting my question, but I wanted to avoid computing it manually :D Also, I don't want to reduce the batch size, since I want a good representation of my whole data set in a statistical sense.Miserable
Trial and error will be the fastest way to answer your question. Keras isn't the computation library; it's only a wrapper around the backend you chose, and memory management is handled differently by different backends. Memory consumption doesn't depend only on the number of parameters: an LSTM will use a lot of memory even if its parameter count is low. You should just try it and see the actual memory consumption (see the measurement sketch after these comments) :)Tyus
I was afraid of that. But I will do so...thanks! (Nassim, you like answering my questions, don't you? :D)Miserable
love it :-) have funTyus
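Following up on the "just try it and measure" suggestion, here is a minimal measurement sketch. It assumes TensorFlow 2.5 or newer as the backend, at least one visible GPU, and a toy model chosen purely for illustration; tf.config.experimental.get_memory_info reports the current and peak device memory that TensorFlow has actually allocated.

import tensorflow as tf

# Build a small model at the candidate input resolution (224x224 is an arbitrary example).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Run one training pass on dummy data, then read the peak memory TensorFlow allocated.
x = tf.random.normal((8, 224, 224, 3))
y = tf.zeros((8,), dtype=tf.int32)
model.fit(x, y, epochs=1, verbose=0)
print(tf.config.experimental.get_memory_info("GPU:0")["peak"], "bytes peak on GPU:0")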

I created a complete function based on the answer of Fabrício Pereira.

def get_model_memory_usage(batch_size, model):
    import numpy as np
    try:
        from keras import backend as K
    except ImportError:
        from tensorflow.keras import backend as K

    shapes_mem_count = 0
    internal_model_mem_count = 0
    for l in model.layers:
        layer_type = l.__class__.__name__
        if layer_type == 'Model':
            # A nested model's usage is computed recursively (already in gigabytes).
            internal_model_mem_count += get_model_memory_usage(batch_size, l)
        # Multiply out the layer's output shape, skipping unknown (None) dimensions.
        single_layer_mem = 1
        out_shape = l.output_shape
        if type(out_shape) is list:
            out_shape = out_shape[0]
        for s in out_shape:
            if s is None:
                continue
            single_layer_mem *= s
        shapes_mem_count += single_layer_mem

    # Weight counts are independent of the batch size.
    trainable_count = np.sum([K.count_params(p) for p in model.trainable_weights])
    non_trainable_count = np.sum([K.count_params(p) for p in model.non_trainable_weights])

    # Bytes per value, depending on the backend's float precision.
    number_size = 4.0
    if K.floatx() == 'float16':
        number_size = 2.0
    if K.floatx() == 'float64':
        number_size = 8.0

    # Activations scale with the batch size; weights are counted once.
    total_memory = number_size * (batch_size * shapes_mem_count + trainable_count + non_trainable_count)
    gbytes = np.round(total_memory / (1024.0 ** 3), 3) + internal_model_mem_count
    return gbytes

UPDATE 2019.10.06: Added support for models which contain other models as layers.

UPDATE 2020.07.17: Function now works correctly in TensorFlow v2.
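For reference, a quick usage sketch (the small Sequential model below is just an illustrative assumption):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 3, activation="relu", input_shape=(512, 512, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

# Estimated memory in gigabytes for training with a batch size of 16.
print(get_model_memory_usage(16, model))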

Akmolinsk answered 14/9, 2017 at 9:57 Comment(10)
the calculation makes sense, but for some reason, it seems to output memory usage far beyond what my GPU has, while Keras is happily training on it. E.g: get_model_memory_usage(batch_size, model) => 28GB, while my GTX 1060 has 6GB :)Cullender
i wonder if it could be related to the loss function? which might require some memory ...Cullender
Probably Theano or TensorFlow don't store all intermediate shapes in memory, only the two shapes involved in the calculation of the current layer. So to find the memory required by the shapes, we would need the two largest consecutive shape volumes.Akmolinsk
There's also memory needed for the result of every layer and for the gradients. So this is incorrect.Pusillanimous
Gradients are covered by the "shapes_mem_count" part. I don't think we really need to store intermediate results for layers.Akmolinsk
Shouldn't it be: total_memory = 4.0*(batch_size*shapes_mem_count + trainable_count + non_trainable_count)? The weights are shared amongst all the batches; no matter the batch size, the weights will take up the same amount of memory, so there's no need to multiply the weights by the batch size.Synapsis
Would be nice to include the queue_size for models using a batch generator.Ingravescent
Note that if you are specifying your batch size in the model (by batch_input_shape or batch_shape) you shouldn't multiply by the batch size again for the calculation (passing 1 to the function will do).Euclid
Getting: `trainable_count = np.sum([K.count_params(p) for p in set(model.trainable_weights)])` → File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/ops/variables.py", line 1089, in __hash__, raise TypeError("Variable is unhashable if Tensor equality is enabled. ")Alvar
I made an update for TF2. Looks like it now works fine.Akmolinsk

Hope this can help you...

  • Here is how to determine the number of output-shape elements of your Keras model (variable model); each element occupies 4 bytes in memory for float32:

    import numpy

    shapes_count = int(numpy.sum([numpy.prod(numpy.array([s if isinstance(s, int) else 1 for s in l.output_shape])) for l in model.layers]))

    memory = shapes_count * 4

  • And here is how to determine the number of parameters of your Keras model (variable model):

    from keras import backend as K

    trainable_count = int(numpy.sum([K.count_params(p) for p in set(model.trainable_weights)]))

    non_trainable_count = int(numpy.sum([K.count_params(p) for p in set(model.non_trainable_weights)]))
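Putting the two snippets together, a rough total in the spirit of the accepted answer might look like the sketch below (assuming float32, i.e. 4 bytes per value, and multiplying only the activations by the batch size; the set() calls are dropped here because TF2 variables are not hashable):

import numpy
from keras import backend as K

def estimate_total_memory_bytes(model, batch_size):
    # Activation elements per sample, as computed above.
    shapes_count = int(numpy.sum([numpy.prod(numpy.array([s if isinstance(s, int) else 1 for s in l.output_shape])) for l in model.layers]))
    # Parameter counts, as computed above.
    trainable_count = int(numpy.sum([K.count_params(p) for p in model.trainable_weights]))
    non_trainable_count = int(numpy.sum([K.count_params(p) for p in model.non_trainable_weights]))
    # 4 bytes per float32 value; only the activations scale with the batch size.
    return 4 * (batch_size * shapes_count + trainable_count + non_trainable_count)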

Nimble answered 21/7, 2017 at 16:17 Comment(2)
Please add more description regarding your answer.Blanka
If you use batch training, to calculate the memory needed on the GPU, you additionally have to multiply the calculated memory by the batch size.Centrum

Here is my variant of @ZFTurbo's answer. It handles nested Keras models and different TensorFlow dtypes better, and it removes the dependency on NumPy. I've written and tested this on TensorFlow 2.3.0, and it may not work on earlier versions.

import tensorflow as tf


def keras_model_memory_usage_in_bytes(model, *, batch_size: int):
    """
    Return the estimated memory usage of a given Keras model in bytes.
    This includes the model weights and layers, but excludes the dataset.

    The model shapes are multiplied by the batch size, but the weights are not.

    Args:
        model: A Keras model.
        batch_size: The batch size you intend to run the model with. If you
            have already specified the batch size in the model itself, then
            pass `1` as the argument here.
    Returns:
        An estimate of the Keras model's memory usage in bytes.

    """
    default_dtype = tf.keras.backend.floatx()
    shapes_mem_count = 0
    internal_model_mem_count = 0
    for layer in model.layers:
        if isinstance(layer, tf.keras.Model):
            internal_model_mem_count += keras_model_memory_usage_in_bytes(
                layer, batch_size=batch_size
            )
        single_layer_mem = tf.as_dtype(layer.dtype or default_dtype).size
        out_shape = layer.output_shape
        if isinstance(out_shape, list):
            out_shape = out_shape[0]
        for s in out_shape:
            if s is None:
                continue
            single_layer_mem *= s
        shapes_mem_count += single_layer_mem

    trainable_count = sum(
        [tf.keras.backend.count_params(p) for p in model.trainable_weights]
    )
    non_trainable_count = sum(
        [tf.keras.backend.count_params(p) for p in model.non_trainable_weights]
    )

    total_memory = (
        batch_size * shapes_mem_count
        + internal_model_mem_count
        + trainable_count
        + non_trainable_count
    )
    return total_memory
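
A quick usage sketch (the toy model here is only an illustrative assumption):

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(256, 256, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

bytes_needed = keras_model_memory_usage_in_bytes(model, batch_size=32)
print(f"~{bytes_needed / 2**30:.2f} GiB estimated")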

Querida answered 14/10, 2020 at 18:7 Comment(3)
I have a question about internal_model_mem_count: it seems to always be 0. What does it mean, and why compute it? Thank youFauve
@Jing, it is possible to use a Keras model as an individual layer in a larger model. To account for this, keras_model_memory_usage_in_bytes() recursively calls itself to measure memory usage and tracks nested model memory usage in the internal_model_mem_count variable.Querida
The calculation doesn't seem correct: for my basic UNET model (1 GB on disk) with batch size 1, this yields 101,594,448,001 bytes, i.e. roughly 100 GB, yet it trains fine with 16 GB of RAM or a 12 GB NVidia GPU.Verbality

The previous answers do not take into account the memory required for gradients, intermediate outputs, mixed dtypes, or nested models, so I decided to give it a go as well. Note that the function returns the estimated memory requirement in bits, that the model must be compiled with a fully known input shape (including batch_size), and that the function does not consider the memory required for internal computations (e.g., neural attention). Microsoft has developed a method that is likely more accurate, but has not released the code.

import warnings

import tensorflow as tf

# Define function to calculate one layer's memory requirement
def layer_mem(layer: tf.keras.layers.Layer, prev_layer_mem: int) -> int:
    # Check whether calculations can be performed
    if not hasattr(layer, "output_shape") or (None in layer.output_shape):
        msg = f"Check `model.summary(expand_nested=True)` and recompile model to ensure that {layer.name} has a fully defined `output_shape`, including `batch_size`. Using previous layer's memory requirement."
        warnings.warn(msg)
        return prev_layer_mem
    # Collect sizes: output elements, parameters (one gradient per parameter),
    # and the bit width parsed from the layer's dtype name (e.g. "float32" -> 32)
    out_size = int(tf.reduce_prod(layer.output_shape))
    params = gradients = int(layer.count_params())
    bits = int(layer.dtype[-2:])
    # Calculate memory requirement
    return (params + gradients + out_size) * bits

# Define recursive function to gather all layers' memory requirements
def model_mem(model: tf.keras.Model) -> int:
    # Make limitations known
    warnings.warn("This function does not take into account the memory required for calculations (e.g., outer products)")
    # Initialize
    total_bits = 0
    prev_layer_mem = 0
    # Loop over layers in model
    for layer in model.layers:
        # In case of a nested model...
        if hasattr(layer, "layers"):
            # ... apply recursion
            total_bits += model_mem(layer)
        else:
            # Calculate and add the layer's memory requirement
            prev_layer_mem = layer_mem(layer, prev_layer_mem)
            total_bits += prev_layer_mem
    return total_bits
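
For reference, a possible usage sketch (the tiny functional model and the GiB conversion are illustrative assumptions; note the function returns bits):

# Fully defined input shape, including the batch size, as required above.
inputs = tf.keras.Input(shape=(128, 128, 3), batch_size=8)
outputs = tf.keras.layers.Dense(10)(tf.keras.layers.Flatten()(inputs))
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")

bits = model_mem(model)
print(f"~{bits / 8 / 2**30:.2f} GiB estimated")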
Cambridgeshire answered 14/11, 2023 at 14:47 Comment(0)

I believe that if you use a data generator, either custom-written or one of the existing generators in Keras, it will resolve your issue. Memory errors usually arise when all of the loaded data overwhelms the system; a generator instead feeds the dataset in segments, so you won't run out of memory and can train on any system.
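As a sketch of that idea, a custom generator based on tf.keras.utils.Sequence might look like the following (the image paths, labels, and target size are placeholder assumptions):

import math
import numpy as np
import tensorflow as tf

class ImageBatchGenerator(tf.keras.utils.Sequence):
    """Loads one batch of images from disk at a time instead of the whole dataset."""

    def __init__(self, image_paths, labels, batch_size=16):
        self.image_paths = image_paths
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch.
        return math.ceil(len(self.image_paths) / self.batch_size)

    def __getitem__(self, idx):
        # Load and scale only the images belonging to batch `idx`.
        start = idx * self.batch_size
        paths = self.image_paths[start:start + self.batch_size]
        images = np.stack([
            np.asarray(tf.keras.preprocessing.image.load_img(p, target_size=(256, 256)))
            for p in paths
        ]).astype("float32") / 255.0
        return images, np.asarray(self.labels[start:start + self.batch_size])

# model.fit(ImageBatchGenerator(train_paths, train_labels), epochs=5)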

Othaothe answered 24/8, 2020 at 19:19 Comment(1)
This advice is incorrect. While data generators or small batch sizes can help reduce memory usage, it is already common for research-grade models to consume more memory than consumer-grade GPUs can offer.Querida
