How to release hugepages from the crashed application

I have an application that uses hugepages, and it suddenly crashed because of a bug. Because the application did not release its hugepages properly before crashing, the free hugepage count in sysfs has not gone back up.

$ sudo cat /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages 
0
$ sudo cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages 
1024

Is there a way to release the hugepages by force?

Raptorial answered 4/12, 2013 at 3:22 Comment(0)
T
1

HugeTLB memory is used in one of two ways: as shared memory (Mark J. Bobak's answer deals with that case), or by mmapping files created in a hugetlb filesystem. If the app crashes without removing those files, they survive and keep the corresponding memory 'allocated'.

Check the hugetlb filesystem for any leftover files from the app. Removing them releases the memory.

Truscott answered 4/12, 2013 at 4:55 Comment(2)
What happens if I do this while something using those hugepages is actually running? - Deracinate
Linux uses reference counting of file descriptors. If some app still has the file open, the file will be removed from the directory but the file descriptor and the underlying huge page will remain valid until the last open descriptor is closed. - Truscott
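The reference-counting behaviour described in the comment above can be seen with an ordinary temporary file. This is only a sketch: `demo_unlink_while_open` is a made-up name, and it uses a regular file rather than a hugetlbfs file, on the assumption that the unlink semantics are the same.

```shell
# Hypothetical demo: a file removed from its directory stays readable
# through a descriptor that was opened before the removal.
demo_unlink_while_open() {
    line=""
    tmp=$(mktemp) || return 1
    echo "still here" > "$tmp"
    exec 3< "$tmp"                 # keep the file open on descriptor 3
    rm -f "$tmp"                   # remove the directory entry
    if [ ! -e "$tmp" ]; then       # the name is gone ...
        IFS= read -r line <&3      # ... but the data is still reachable
    fi
    exec 3<&-                      # last descriptor closed: storage is freed
    printf '%s\n' "$line"
}
```

Calling `demo_unlink_while_open` prints the line that was written before the file was removed.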
P
8

Sometimes you need to check every directory where hugetlbfs has been mounted:

  1. Find the mounted directories with the command mount | grep huge.

  2. Check every one of those directories, not only /dev/hugepages.

  3. Delete all 2M-sized files. (2M is the size of a hugepage.)
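The steps above could be sketched as a pair of shell helpers. The function names are invented for illustration, and the sketch only lists files so they can be reviewed before anything is deleted. It assumes the usual /proc/mounts layout, where the second field is the mount point and the third is the filesystem type.

```shell
# Print every mount point whose filesystem type is hugetlbfs.
# $1: mounts table to read (defaults to /proc/mounts).
hugetlbfs_mounts() {
    awk '$3 == "hugetlbfs" { print $2 }' "${1:-/proc/mounts}"
}

# List the regular files under each hugetlbfs mount for manual review.
# Nothing is deleted here.
list_hugepage_files() {
    hugetlbfs_mounts "$@" | while IFS= read -r dir; do
        find "$dir" -xdev -type f -exec ls -l {} +
    done
}
```

Run `list_hugepage_files`, then remove only the files that belong to the crashed application. A tool such as `fuser` or `lsof` can show whether a running process still has a given file open.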

Peshawar answered 21/10, 2014 at 10:37 Comment(0)
S
2

Use ipcs -m to list the shared memory segments. Use ipcrm to remove the left over shared memory segments.
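As a rough aid for picking candidates out of the ipcs -m listing, the filter below prints the IDs of segments that no process is attached to. `unattached_shmids` is a hypothetical helper, and it assumes the util-linux column order (key, shmid, owner, perms, bytes, nattch, status). A segment with nattch 0 is only a candidate for removal, not proof that it is safe to remove.

```shell
# Read `ipcs -m` output on stdin and print the shmid of every segment
# whose nattch column is 0 (no process currently attached).
# Assumed columns: key shmid owner perms bytes nattch [status]
unattached_shmids() {
    awk '$1 ~ /^0x/ && $6 == 0 { print $2 }'
}
```

Typical use would be `ipcs -m | unattached_shmids`, followed by `ipcrm -m <shmid>` for each ID you have confirmed belongs to the crashed application.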

Edit on 06/24/2019: Ok, so, the above answer, while correct as far as it goes, was a bit brief. In particular, if you have a host with multiple DB instances, and only one is crashed how can you determine which (if any) memory segments should be cleaned up?

Well, this too can be done. For each running instance, connect with / as sysdba, then do oradebug setmypid (any pid will do, as all Oracle PIDs connect to the SGA). Then do oradebug ipc. That will (hopefully) write IPC information to the trace file. So, go to the udump (or diag_dest) directory and look for your trace file. It will contain all the IPC information for the instance, including ShmId. Look through the file for the ShmId(s) that this instance is using. Now look at the output of ipcs -m.

When you have done that for all the running instances, any memory segment output by ipcs -m that shows non-zero memory allocation, and that you cannot account for in the oradebug ipc information from any running instance, must be the left over memory segments from the crashed instance. Use ipcrm to remove it/them.

When doing this on a host with multiple running instances, this can be a bit fraught. Please proceed with caution. You don't want to remove the SGA of a running instance!

Hope that helps....

Slush answered 4/12, 2013 at 4:6 Comment(2)
Hi Mark, is it possible to tell which shmid maps to which hugepage file? In some cases a crashed application might only use one or two hugepages, so running ipcrm on everything seems a bit too aggressive. Thoughts? - Artiodactyl
Hi there, yes, if you're running multiple instances, it can be a bit tricky. The (somewhat tedious) solution is to connect to each running instance, and use 'oradebug' to determine the SHMIDs of the running instances. Then, by process of elimination, the ones you cannot identify must be from the crashed instance. Then use 'ipcrm' to remove only those memory segments. Full details will be in an updated answer in a few minutes. - Slush
G
1

If you follow the steps below, you can release the allocated hugepages:

1) First, check how many hugepages are free after a restart

dpdk@dpdkvm:~$ ls /mnt/huge/
empty

dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ cat /proc/meminfo
...
HugePages_Total:     256
HugePages_Free:      256
...

2) Start a DPDK application with wrong parameters, which produces an error

dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ sudo ./build/kni -c 0x03 -n 2 -- -P -p 0x03 --config="(0,0,1),(1,0,1)"
...
EAL: Error - exiting with code: 1
  Cause: No supported Ethernet device found

3) Checking the hugepages again shows that none are free

dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ cat /proc/meminfo
...
HugePages_Total:     256
HugePages_Free:        0
...

4) Now, when I check the mounted hugepage directory, I can see the files that the DPDK application did not give back to the OS.

dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ ls /mnt/huge/
...
rtemap_0    rtemap_137  rtemap_176  rtemap_214  rtemap_253  rtemap_62
...

5) Finally, removing the files whose names start with rtemap gives the hugepages back

dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ sudo rm /mnt/huge/*
[sudo] password for dpdk:
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ cat /proc/meminfo
...
HugePages_Total:     256
HugePages_Free:      256
...
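The before and after checks in this walkthrough can be condensed into one number. This is a small sketch with a made-up function name, and it reads only the two /proc/meminfo fields shown above.

```shell
# Print HugePages_Total minus HugePages_Free, i.e. how many hugepages
# are currently held.
# $1: file to read (defaults to /proc/meminfo).
hugepages_in_use() {
    awk '/^HugePages_Total:/ { total = $2 }
         /^HugePages_Free:/  { nfree = $2 }
         END { print total - nfree }' "${1:-/proc/meminfo}"
}
```

If `hugepages_in_use` still reports a non-zero count while no application is running, that suggests leftover files or segments are still holding pages.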
Godchild answered 18/11, 2017 at 18:59 Comment(0)
P
-3

Your hugetlb memory may be in use by shared memory or by mmapped files. Try removing the shared memory segments or unmounting the hugetlb filesystem.

Predacious answered 4/5, 2016 at 22:41 Comment(1)
These suggestions are in earlier answers; this response does not appear to add anything new. - Eating
