How to calculate the actual size of a .fit()-trained model in sklearn?

Is it possible to calculate the size of a model (let's say a random forest classifier) in scikit-learn?

For example:

  from sklearn.ensemble import RandomForestClassifier

  # X_train, y_train: your training features and labels
  clf = RandomForestClassifier(n_jobs=-1, n_estimators=10000, min_samples_leaf=50)
  clf.fit(X_train, y_train)

Can I determine the size of clf?

Romany asked 9/8, 2017 at 23:0
What do you mean by "the size of the model"? - Her
For example, Amazon ML limits the model size to between 1 MB and 1 GB. - Romany
sys.getsizeof() returns the size of that object alone in memory. If it references other objects, their sizes are not included, so there is a real risk of underestimating the size. See the getsizeof documentation. - Derte
I did some testing, and it doesn't work. I have also added more information to the question. - Romany
I have added an answer. - Romany
I know this is old, but I think it is n_estimators * (2^max_depth - 1) = number of bytes. - Ascidian
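A rough way to check the node-count side of that estimate is to count the nodes of a fitted forest. The sketch below is only an illustration: the make_classification toy data, n_estimators=50 and MAX_DEPTH = 8 are assumed values, not taken from the question. With the root at depth 0, a tree limited to max_depth has at most 2^(max_depth+1) - 1 nodes, and each node stores several fields (child indices, split feature, threshold, impurity, sample counts) plus its class values, so the byte count is a multiple of the node count rather than equal to it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

MAX_DEPTH = 8  # assumed depth limit, chosen only for this illustration

# Toy data and forest settings are illustrative assumptions.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=50, max_depth=MAX_DEPTH, random_state=0)
clf.fit(X, y)

# With the root at depth 0, a binary tree whose deepest node sits at
# depth <= MAX_DEPTH has at most 2**(MAX_DEPTH + 1) - 1 nodes.
per_tree_bound = 2 ** (MAX_DEPTH + 1) - 1
node_counts = [est.tree_.node_count for est in clf.estimators_]

total_nodes = sum(node_counts)
total_bound = clf.n_estimators * per_tree_bound
print(f"nodes in forest: {total_nodes} (upper bound {total_bound})")
```

The actual node count is usually well below the bound, because splitting stops early wherever a node becomes pure or other stopping criteria (such as min_samples_leaf in the question) kick in.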

Along the same lines as Nijan's answer, you can also do it without having to save the model, using pickle:

  import pickle
  import sys

  p = pickle.dumps(clf)  # serialize the fitted model to bytes in memory
  print(sys.getsizeof(p))

This prints the size of the pickled model in bytes (sys.getsizeof adds a few bytes of object overhead on top of len(p)).
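To see why the comment about sys.getsizeof matters, the sketch below compares three numbers on a small toy forest (the make_classification data and n_estimators=20 are illustrative assumptions): sys.getsizeof(clf), which measures only the estimator object itself and not the trees it references; len(p), the length of the pickled bytes; and sys.getsizeof(p), which on CPython is len(p) plus a small fixed header for the bytes object.

```python
import pickle
import sys

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data and a small forest; these values are illustrative assumptions.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

shallow_bytes = sys.getsizeof(clf)  # the estimator object only, not its trees
p = pickle.dumps(clf)
payload_bytes = len(p)              # size of the serialized model
object_bytes = sys.getsizeof(p)     # payload plus the bytes object's header

print(f"sys.getsizeof(clf): {shallow_bytes} bytes")
print(f"len(p):             {payload_bytes} bytes")
print(f"sys.getsizeof(p):   {object_bytes} bytes")
```

For the pickled bytes the difference between the two measurements is negligible, so either works; calling sys.getsizeof on clf directly is the case that badly underestimates.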

Unbodied answered 3/5, 2018 at 7:43

One way is to dump the model to a file with joblib.dump and then check the size of that file.

Based on the previous example, you would use:

  import os
  import joblib
  joblib.dump(clf, fname)  # fname: path of the output file
  print(os.path.getsize(fname))  # file size in bytes
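A self-contained version of the joblib approach might look like the sketch below. It writes to a temporary directory so nothing is left behind; the file names, toy data and forest settings are hypothetical. It also shows joblib's compress parameter: joblib.dump does not compress by default, so the measured file size depends on whether compression was requested.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data and a small forest; these values are illustrative assumptions.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmpdir:
    raw_path = os.path.join(tmpdir, "model.joblib")        # hypothetical file name
    packed_path = os.path.join(tmpdir, "model_z3.joblib")  # hypothetical file name

    joblib.dump(clf, raw_path)                 # default: no compression
    joblib.dump(clf, packed_path, compress=3)  # zlib compression, level 3

    raw_size = os.path.getsize(raw_path)
    packed_size = os.path.getsize(packed_path)

print(f"uncompressed: {raw_size} bytes, compress=3: {packed_size} bytes")
```

If a service enforces a size limit on the uploaded artifact, it is the size of the file actually uploaded (compressed or not) that counts.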

Romany answered 10/8, 2017 at 0:13

I do not have enough reputation points to comment on what seems to be the accepted answer, so sorry for the additional formal reply.

I tried the dronevil2 (https://mcmap.net/q/923098/-how-to-calculate-the-actual-size-of-a-fit-trained-model-in-sklearn) solution, but here is my worry: the size of the pickle file is not the actual size of the model as it would be in production. "Pickling" a Python object writes it in a serialized binary format (pickle does not compress by default), and that format need not match the object's in-memory layout. To actually use the model we would need to "unpickle" it, which can increase its size.
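One way to look into this concern is to compare the pickled size with the raw bytes of the arrays each fitted tree holds in memory. This is a sketch under assumptions: the toy data and forest are illustrative, and summing the nbytes of the public tree_ arrays counts only the node and value data, not Python object or allocator overhead, so the real in-memory footprint of the unpickled model will be somewhat larger than this figure.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data and a small forest; these values are illustrative assumptions.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)


def tree_array_bytes(tree):
    """Bytes of the per-node arrays exposed by a fitted sklearn Tree."""
    arrays = (
        tree.children_left, tree.children_right, tree.feature,
        tree.threshold, tree.impurity, tree.n_node_samples,
        tree.weighted_n_node_samples, tree.value,
    )
    return sum(a.nbytes for a in arrays)


raw_bytes = sum(tree_array_bytes(est.tree_) for est in clf.estimators_)
pickled_bytes = len(pickle.dumps(clf))

# raw_bytes ignores Python object and allocator overhead, so the real
# in-memory footprint of the unpickled model is larger than this figure.
print(f"raw tree arrays: {raw_bytes} bytes")
print(f"pickled model:   {pickled_bytes} bytes")
```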

Subscript answered 13/3 at 15:1
