Empty core dump file after Segmentation fault

I am running a program and it is interrupted by a segmentation fault. The problem is that a core dump file is created, but it is of size zero.

Have you heard of such a case, and how can it be resolved?

I have enough space on the disk. I have already run ulimit -c unlimited to remove the limit on the core file size - both running it interactively and putting it at the top of the submitted batch file - but I still get 0-byte core dump files. The permissions of the folder containing these files are uog+rw, and the permissions on the created core files are u+rw only.
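
For reference, a minimal sketch of how the limit is raised and checked at the top of the batch script (the program name below is just a placeholder):

    ulimit -c unlimited   # remove the core file size limit for this shell
    ulimit -c             # should now print "unlimited"
    ./my_program          # placeholder for the submitted executable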

The program is written in C++ and submitted to a Linux cluster with the qsub command of Grid Engine; I don't know whether this information is relevant to the question.

Eared answered 15/11, 2012 at 18:28 Comment(9)
You do have free space on the drive I'm assuming?Nasty
What are the write permissions on the zero-length file?Amimia
Next questions: What are the permissions on the containing directory? Is the process running under an effective user id that's different than the directory owner?Amimia
You said you're using Grid Engine. Is it correct that there are multiple nodes in the cluster? It's easy for multiple nodes to share a single file system, but if they don't also share a user account system, it's likely that a job running on another node cannot run under your own user id, and thus looks to the file system like an "other" id.Amimia
Try making a temporary directory and setting its permissions to world-writable.Amimia
I'm out of ideas. Also, I'd recommend adding some of this information to the question, so we can clean up these comments.Amimia
Have you tried setting the file size on qsub? (e.g. -l file=100mb)Boilermaker
@Boilermaker It says: Unable to run job: unknown resource "file".Eared
@Eared my bad, I erroneously assumed a "linux like" qsub. However, there should be some related resource like "max filesize per job", or perhaps "max core size per job". Is there some man page on the job resources?Boilermaker

Setting ulimit -c unlimited turned on the generation of dumps. By default, core dumps were generated in the current directory, which was on NFS. Setting /proc/sys/kernel/core_pattern to /tmp/core solved the problem of empty dumps for me.
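
A minimal sketch of the commands involved, assuming root (or sudo) access to change the kernel setting; the %e/%p naming suffix is optional:

    ulimit -c unlimited                                                 # allow core files of any size
    echo '/tmp/core.%e.%p' | sudo tee /proc/sys/kernel/core_pattern    # write cores to local /tmp, named by executable (%e) and pid (%p)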

The comment from Ranjith Ruban helped me to develop this workaround.

What is the filesystem that you are using for dumping the core?

Careen answered 10/3, 2015 at 11:9 Comment(3)
I just had this problem on a Linux VirtualBox image with a vboxsf filesystem that mapped to an NTFS drive (the drive of the host machine).Quaky
modifying the core_pattern as the root user works miracles! The NFS drive path made core files zero bytes. https://mcmap.net/q/660469/-unable-to-create-a-core-file-for-my-crashed-program Besides setting the path where it gets created, there is some nifty syntax for changing how a core file gets named, too. linuxhowtos.org/Tips%20and%20Tricks/coredump.htmPiderit
Had the same problem with a mounted filesystem under VirtualBox. Thanks!Wakefield

It sounds like you're using a batch scheduler to launch your executable. Maybe the shell that Torque/PBS uses to spawn your job inherits a different ulimit value? Maybe the scheduler is configured by default not to preserve core dumps?

Can you run your program directly from the command line instead?

Or, if you add ulimit -c unlimited and/or ulimit -s unlimited to the top of your PBS batch script before invoking your executable, you might be able to override PBS's default ulimit behavior. Adding a plain ulimit -c call will also report what the limit actually is inside the job.
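
A sketch of the top of such a batch script (the executable name is a placeholder):

    #!/bin/bash
    ulimit -c unlimited      # allow core files of any size
    ulimit -s unlimited      # optionally raise the stack limit as well
    ulimit -c                # print the effective core limit into the job's output for verification
    ./my_program             # placeholder for the actual executable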

Lexeme answered 16/11, 2012 at 5:32 Comment(2)
I put both ulimit -c unlimited and ulimit -s unlimited in the PBS batch script, but the core dumps are still empty!Eared
What is the filesystem that you are using for dumping the core?Vaios

If you run the program on a mounted drive, the core file can't be written to the mounted drive; it must be written to the local drive.

You can copy the file to the local drive and run it from there.
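
A minimal sketch of that idea (the paths and program name are placeholders):

    cp /mnt/shared/my_program /tmp/    # copy the executable off the mounted drive
    cd /tmp && ./my_program            # run it from the local filesystem so the core can be written there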

Eliseo answered 2/5, 2017 at 6:45 Comment(0)

You can set resource limits, such as the memory required, by using a qsub option such as -l h_vmem=6G to reserve 6 GB of memory.

For file blocks you can set h_fsize to an appropriate value as well.
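
For example, a submission requesting both limits might look like this (the script name is a placeholder):

    qsub -l h_vmem=6G -l h_fsize=10G myjob.sh    # request 6 GB of memory and allow up to 10 GB of file output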

See the RESOURCE LIMITS section of the queue_conf(5) manpage:

http://gridscheduler.sourceforge.net/htmlman/htmlman5/queue_conf.html

s_cpu     The per-process CPU time limit in seconds.

s_core    The per-process maximum core file size in bytes.

s_data    The per-process maximum memory limit in bytes.

s_vmem    The same as s_data (if both are set the minimum is used).

h_cpu     The per-job CPU time limit in seconds.

h_data    The per-job maximum memory limit in bytes.

h_vmem    The same as h_data (if both are set the minimum is used).

h_fsize   The total number of disk blocks that this job can create.

Also, if the cluster uses a TMPDIR local to each node, and that is filling up, you can set TMPDIR to an alternate location with more capacity, e.g. an NFS share:

export TMPDIR=<some NFS mounted directory>

Then launch qsub with the -V option to export the current environment to the job.
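
Putting the two together, something like the following (the directory and script name are placeholders):

    export TMPDIR=/path/to/nfs/scratch    # hypothetical NFS-mounted directory with more capacity
    qsub -V myjob.sh                      # -V exports the current environment, including TMPDIR, to the job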

One or a combination of the above may help you solve your problem.

Obsess answered 11/5, 2015 at 13:16 Comment(0)
