I'm using python to analyse some large files and I'm running into memory issues, so I've been using sys.getsizeof() to try and keep track of the usage, but it's behaviour with numpy arrays is bizarre. Here's an example involving a map of albedos that I'm having to open:
>>> import numpy as np
>>> import struct
>>> from sys import getsizeof
>>> f = open('Albedo_map.assoc', 'rb')
>>> getsizeof(f)
144
>>> albedo = struct.unpack('%df' % (7200*3600), f.read(7200*3600*4))
>>> getsizeof(albedo)
207360056
>>> albedo = np.array(albedo).reshape(3600,7200)
>>> getsizeof(albedo)
80
Well the data's still there, but the size of the object, a 3600x7200 pixel map, has gone from ~200 Mb to 80 bytes. I'd like to hope that my memory issues are over and just convert everything to numpy arrays, but I feel that this behaviour, if true, would in some way violate some law of information theory or thermodynamics, or something, so I'm inclined to believe that getsizeof() doesn't work with numpy arrays. Any ideas?
sys.getsizeof
: "Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific. Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to." – Xerographygetsizeof
an unreliable indicator of memory consumption, especially for 3rd party extensions. – Xerographyresize
is returning aview
, not a new array. You're getting the size of the view, not the actual data. – Domingasys.getsizeof(albedo.base)
will give the size of the non-view. – Triplex