Caveat
This is NOT a duplicate of this question. I'm not interested in finding out what my memory consumption is, as I'm already doing that below; the question is WHY the memory consumption looks like this.
Also, even if I did need a way to profile my memory, note that guppy (the Python memory profiler suggested in the aforementioned link) does not support Python 3, and the alternative, guppy3, does not give accurate results whatsoever, yielding output such as the following (see the actual sizes below):
Partition of a set of 45968 objects. Total size = 5579934 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  13378  29  1225991  22   1225991  22 str
     1  11483  25   843360  15   2069351  37 tuple
     2   2974   6   429896   8   2499247  45 types.CodeType
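For completeness, that guppy3 output comes from a heap dump along these lines (a minimal sketch; the exact call site may have differed). guppy3 is installed as the guppy3 package but imported as guppy:

from guppy import hpy  # guppy3 installs under the "guppy" import name

heap = hpy().heap()  # snapshot of the objects tracked in the current process
print(heap)          # prints a partition table like the one shown above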
Background
Right, so I have this simple script which I'm using to do some RAM consumption tests, by reading a file in 2 different ways:

1. reading the file one line at a time, processing it, and discarding it (via generators), which is efficient and recommended for basically any file size (especially large files) and works as expected (see the sketch right after this list);
2. reading the whole file into memory (I know this is advised against, however this was just for educational purposes).
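For reference, a minimal sketch of the line-at-a-time approach (the function name and the filtering condition are illustrative only; they are not part of the test script below):

def read_lines(path):
    # Yield one line at a time so only a single line is held in memory.
    with open(path) as handle:
        for line in handle:
            yield line.rstrip('\n')

# Example usage: count error lines without loading the whole file.
error_count = sum(1 for line in read_lines('errors.log') if 'ERROR' in line)
print(f'{error_count} error lines')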
Test script
import os
import psutil
import time

with open('errors.log') as file_handle:
    statistics = os.stat('errors.log')  # see below for the contents of this file
    file_size = statistics.st_size / 1024 ** 2

    process = psutil.Process(os.getpid())
    ram_usage_before = process.memory_info().rss / 1024 ** 2

    print(f'File size: {file_size} MB')
    print(f'RAM usage before opening the file: {ram_usage_before} MB')

    file_handle.read()  # loading the whole file into memory

    ram_usage_after = process.memory_info().rss / 1024 ** 2
    print(f'Expected RAM usage after loading the file: {file_size + ram_usage_before} MB')
    print(f'Actual RAM usage after loading the file: {ram_usage_after} MB')

    # time.sleep(30)
Output
File size: 111.75 MB
RAM usage before opening the file: 8.67578125 MB
Expected RAM usage after loading the file: 120.42578125 MB
Actual RAM usage after loading the file: 343.2109375 MB
I also added a 30 second sleep so I could check the memory usage at the OS level with ps and awk, using the following command:
ps aux | awk '{print $6/1024 " MB\t\t" $11}' | sort -n
which yields:
...
343.176 MB python # my script
619.883 MB /Applications/PyCharm.app/Contents/MacOS/pycharm
2277.09 MB com.docker.hyperkit
The file contains about 800K copies of the following line:
[2019-09-22 16:50:17,236] ERROR in views, line 62: 404 Not Found: The
following URL: http://localhost:5000/favicon.ico was not found on the
server.
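For reproducibility, an equivalent test file can be generated with something like this (800,000 is my approximation of the "about 800K" figure above):

line = ('[2019-09-22 16:50:17,236] ERROR in views, line 62: 404 Not Found: The '
        'following URL: http://localhost:5000/favicon.ico was not found on the '
        'server.\n')

with open('errors.log', 'w') as handle:
    handle.writelines(line for _ in range(800_000))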
Is it because of block sizes or dynamic allocation, whereby the contents would be loaded in blocks and a lot of that memory would actually go unused?
open('errors.log', 'rb') makes a difference. – Citolerb
the size is roughly the same, although I'd like to understand a bit more about this... if you're up for a formal answer I'm happy to upvote & accept. – Terbium
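To make that comparison concrete, here is a hypothetical sketch of measuring the binary-mode read the same way as the text-mode read in the script above (the rss_mb helper is mine, not part of the original script):

import os
import psutil

def rss_mb():
    # Resident set size of the current process, in MB.
    return psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2

before = rss_mb()
with open('errors.log', 'rb') as file_handle:  # binary mode: no decoding to str
    data = file_handle.read()
print(f'RSS increase with binary read: {rss_mb() - before} MB')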