Here is a reference related to your problem: use rss or vms to track memory. The relationship between RSS and VMS is a bit confusing. You can learn about these concepts in detail, and you should also know how to calculate the total memory usage.
**TO SUMMARIZE AND COMPLEMENT MY OPINION:**
RSS:
Resident set size shows how much of the memory allocated to a process is in RAM. Remember: it doesn't include memory that is swapped out. It does include memory from shared libraries, as long as those pages are actually in memory, and it includes all stack and heap memory that is resident.
VMS:
Virtual memory size includes all memory that the process can access. This includes memory that is swapped out, memory that is allocated but not used, and memory that comes from shared libraries.
Example:
Let's assume a Process-X has a 500-K binary and is linked to 2500-K of shared libraries, has 200-K of stack/heap allocations of which 100-K is actually in memory (the rest is swapped or unused), and has actually loaded only 1000-K of the shared libraries and 400-K of its own binary. Then:
RSS: 400K + 1000K + 100K = 1500K
VMS: 500K + 2500K + 200K = 3200K
In this example, since part of the memory is shared, many processes may use it, so if you add up all of the RSS values you can easily end up with more space than your system has.
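The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not how an operating system accounts for memory: the `Segment` class, the helper names, and the second process (a hypothetical Process-Y with the same figures that links the same libraries) are assumptions made up for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    mapped_kb: int      # size mapped into the address space (counts toward VMS)
    resident_kb: int    # part currently in RAM (counts toward RSS)
    shared: bool = False


def rss_kb(segments):
    """Sum of the resident parts of every segment."""
    return sum(s.resident_kb for s in segments)


def vms_kb(segments):
    """Sum of everything mapped, resident or not."""
    return sum(s.mapped_kb for s in segments)


# Figures taken from the Process-X example above.
process_x = [
    Segment("binary", mapped_kb=500, resident_kb=400),
    Segment("shared libraries", mapped_kb=2500, resident_kb=1000, shared=True),
    Segment("stack/heap", mapped_kb=200, resident_kb=100),
]

print("RSS:", rss_kb(process_x), "K")  # 400 + 1000 + 100 = 1500
print("VMS:", vms_kb(process_x), "K")  # 500 + 2500 + 200 = 3200

# Hypothetical Process-Y: same figures, and it shares the same resident
# library pages with Process-X (an assumption made for this sketch).
process_count = 2
summed_rss = process_count * rss_kb(process_x)
shared_resident = sum(s.resident_kb for s in process_x if s.shared)
# The shared pages sit in RAM once, so subtract the extra copies.
physical_kb = summed_rss - (process_count - 1) * shared_resident

print("Sum of RSS:", summed_rss, "K")       # 3000
print("RAM actually used:", physical_kb, "K")  # 2000
```

The gap between the last two figures is the double counting described above: the shared pages are counted once per process in RSS but occupy RAM only once.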
You can see these values for your own process when you simply run this:
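import os
import psutil

# Inspect the current Python process
process = psutil.Process(os.getpid())
mem = process.memory_info()  # take one snapshot so both values match
print("vms:", mem.vms)  # virtual memory size, in bytes
print("rss:", mem.rss)  # resident set size, in bytes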
Output:
vms: 7217152
rss: 13975552
By simply adding import pandas as pd, you can see the difference:
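import os
import psutil
import pandas as pd  # importing pandas alone loads a lot of code and data

# Inspect the current Python process
process = psutil.Process(os.getpid())
mem = process.memory_info()  # take one snapshot so both values match
print("vms:", mem.vms)  # virtual memory size, in bytes
print("rss:", mem.rss)  # resident set size, in bytes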
Here is the output:
vms: 276295680
rss: 54116352
Memory that is allocated may also not show up in RSS until the program actually uses it. So if your program allocates a bunch of memory up front and then uses it over time, you could see RSS going up while VMS stays the same.
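A minimal sketch of that allocate-first, use-later pattern follows. It assumes an anonymous `mmap` region as the up-front allocation, and the helper names `rss_vms` and `allocate_then_touch` are made up for illustration. It uses psutil when it is installed and, as an assumption for machines without it, falls back to reading `/proc/self/statm` on Linux. The exact numbers depend on the platform and allocator, so no output is shown.

```python
import mmap
import os

try:
    import psutil  # third-party package, the same one used in the snippets above
except ImportError:  # assumption: fall back to /proc on Linux if it is missing
    psutil = None


def rss_vms():
    """Return (rss, vms) in bytes for the current process."""
    if psutil is not None:
        info = psutil.Process().memory_info()
        return info.rss, info.vms
    # Linux-only fallback: /proc/self/statm starts with total and resident pages.
    with open("/proc/self/statm") as f:
        total_pages, resident_pages = f.read().split()[:2]
    page_size = os.sysconf("SC_PAGE_SIZE")
    return int(resident_pages) * page_size, int(total_pages) * page_size


def allocate_then_touch(size=64 * 1024 * 1024):
    """Record (rss, vms) before allocating, after allocating, and after writing."""
    readings = {"start": rss_vms()}
    buf = mmap.mmap(-1, size)  # anonymous mapping; its pages are not touched yet
    readings["allocated"] = rss_vms()
    for offset in range(0, size, mmap.PAGESIZE):
        buf[offset] = 1  # write one byte per page so the OS has to back it
    readings["touched"] = rss_vms()
    buf.close()
    return readings


if __name__ == "__main__":
    for step, (rss, vms) in allocate_then_touch().items():
        print(f"{step:>9}: rss={rss:>12,} vms={vms:>12,}")
```

On Linux you would typically expect VMS to jump at the "allocated" step and RSS to catch up only at the "touched" step. Other platforms may report these fields differently, so treat the readings as indicative.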
Now, whether you go with df.memory_usage().sum() or Process.memory_info, I believe RSS does include memory from dynamically linked libraries. So the sum of their RSS values will be more than the actual memory used.
df.memory_usage().sum() > vms. – Drilling
df.memory_usage().sum() = 670,000,128 and vms = 815,214,592 consistently. I have 32GB of RAM and 5GB of virtual memory. It seems like your readings mean the size of df is larger than the amount of virtual memory (pagefile) being used. Incidentally, VM is just space allocated on the hard drive. – Drilling