Segmentation Fault and RAII

It's more of a philosophical type of question.

In C++ we have a nice, shiny idiom: RAII. But I often find it incomplete, because it does not align well with the fact that my application can be killed with SIGSEGV.

I know, I know: you will say that programs like that are malformed. But it is a sad fact that on POSIX (specifically Linux) you can allocate beyond the physical memory limits and hit SIGSEGV in the middle of execution while working with correctly allocated memory.

You may say: "The application dies, so why should you care about those poor destructors not being called?" Unfortunately, there are some resources that are not automatically freed when the application terminates, such as file system entities.

I am pretty sick of designing hacks that break good application design just to cope with this. So what I am asking for is a nice, elegant solution to this kind of problem.

Edit:

It seems that I was wrong, and on Linux applications in this situation are killed by the kernel (the OOM killer) rather than by SIGSEGV. In that case the question is still the same, but the cause of the application's death is different.

Code snippet:

#include <string>      // std::string
#include <utility>     // std::move
#include <unistd.h>    // ::unlink()

struct UnlinkGuard
{
    explicit UnlinkGuard(std::string path_to_file)
        : _path_to_file(std::move(path_to_file))
    { }

    /// Non-copyable: exactly one guard owns the pending unlink.
    UnlinkGuard(const UnlinkGuard &) = delete;
    UnlinkGuard & operator=(const UnlinkGuard &) = delete;

    ~UnlinkGuard() {
        unlink();
    }

    /// Unlink the file now. Returns true if the file is gone
    /// (or the guard was already disengaged).
    bool unlink() {
        if (_path_to_file.empty())
            return true;

        if (::unlink(_path_to_file.c_str())) {
            /// Probably some logging.
            return false;
        }

        disengage();
        return true;
    }

    /// Give up ownership; the destructor becomes a no-op.
    void disengage() {
        _path_to_file.clear();
    }

private:
    std::string _path_to_file;
};

void foo()
{
    /// Pick path to temp file.
    std::string path_to_temp_file = "...";

    /// Create file.
    /// ...

    /// Set up unlink guard.
    UnlinkGuard unlink_guard(path_to_temp_file);

    /// Call some potentially unsafe library function that can cause process to be killed either:
    ///  * by a SIGSEGV
    ///  * by out of memory
    /// ...

    /// Work done, file content is appropriate.
    /// Rename tmp file.
    /// ...

    /// Disengage unlink guard.
    unlink_guard.disengage();
}

On success I use the file. On failure I want the file to be missing.

This could be achieved if POSIX had support for link()-ing a previously unlinked file by its file descriptor, but there is no such feature :(.
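
For what it's worth, Linux (though not POSIX as such) does have something close to this: since kernel 3.11, open() accepts O_TMPFILE, which creates a file with no directory entry at all; on success the file can be given a name afterwards with linkat() via its /proc/self/fd/ entry, and if the process dies first, the kernel reclaims the file along with its last descriptor. A minimal sketch, assuming a Linux-only build and a filesystem that supports O_TMPFILE (the function name and paths are made up for illustration):

#include <cstddef>     // size_t
#include <cstdio>      // std::snprintf
#include <fcntl.h>     // ::open, O_TMPFILE, AT_FDCWD, AT_SYMLINK_FOLLOW
#include <unistd.h>    // ::write, ::close, ::linkat, ssize_t

bool write_file_atomically(const char * dir, const char * final_path,
                           const char * data, size_t size)
{
    /// Create an unnamed file inside `dir`: it has no directory entry,
    /// so if the process is killed at any point before linkat() below,
    /// the kernel reclaims it automatically. No guard object needed.
    int fd = ::open(dir, O_TMPFILE | O_WRONLY, 0644);
    if (fd < 0)
        return false;

    bool ok = ::write(fd, data, size) == static_cast<ssize_t>(size);

    if (ok) {
        /// Give the file a name only once its content is complete.
        /// Note: linkat() fails with EEXIST if final_path already exists.
        char fd_path[64];
        std::snprintf(fd_path, sizeof fd_path, "/proc/self/fd/%d", fd);
        ok = ::linkat(AT_FDCWD, fd_path, AT_FDCWD, final_path,
                      AT_SYMLINK_FOLLOW) == 0;
    }

    ::close(fd);
    return ok;
}

Since the file never has a name until the final step, a SIGSEGV or OOM kill anywhere in the middle leaves nothing behind, which is exactly the property UnlinkGuard tries to emulate. Like rename(), the linkat() step is atomic, though unlike rename() it refuses to overwrite an existing name.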

Corliss answered 11/9, 2014 at 7:16 Comment(20)
That doesn't sound quite right from my experience. If you overallocate memory, you get killed by the oom-killer, which uses a different signal than SIGSEGV. Can you explain exactly what you do to achieve this?Vinculum
"Unfortunately there are some resources that are not automatically freed when application terminates, such as File System entities." - um, file descriptors are freed by the kernel when the application terminates.Airstrip
@MatsPetersson (1) allocate 8G of memory (split into pieces) on a system with only 4G (x64); (2) write to the first byte of every page (a concrete sketch follows after these comments).Corliss
@DanielKamilKozar (1) files are not unlinked; (2) directories are not removed; ...Corliss
@grenscape Have you checked the return value of your 8GB pieces? If not, you probably wrote to address 0, since AFAIK all heap memory allocation functions return 0 if not enough memory is available (someone correct me if I'm wrong). And that write to address 0 causes your SIGSEGV.Mange
@Mange If you use non-throwing new, it will return a null pointer on failure. But the normal new just throws std::bad_alloc.Minestrone
@Slyps, not on UNIX-like systems. There you can allocate much more memory than the size of the available physical memory; it is called virtual memory. Every page of that memory is backed by real physical memory only when you access the page for the first time.Corliss
Although I'm pretty convinced that you're overthinking this, I'm eager to read an answer from one of the C++ experts around here.Airstrip
@Angew you will get a null pointer/exception only when the process is out of virtual address space, which on x64 is practically impossible. It does not fail because new (malloc()) is based on mmap(), and mmap(), let's say, just extends the virtual address space.Corliss
Ok, so I did what you describe, and what I got was a "killed" application, as I expected. I have done this quite a few times at work - run something that either has a bug that allocates more memory than it should, or simply doesn't fit in the memory available on that hardware - and it always gets killed by the OOM-killer, which is not the same as SIGSEGV. So I would like to know what makes it SIGSEGV rather than OOM-killed. Is it dying in a system call? Do you try to resist the OOM-killer with some magic code?Vinculum
And I think you do have something fundamentally wrong in your application if you RELY on overcommit in that way - if nothing else, it makes it INCREDIBLY slow.Vinculum
@Corliss I could not replicate your 8GB-on-4GB page-write issue (thankfully my rig has a decent SSD, or the paging would have been brutal). So how again are you causing this? And while your desired cleanup behavior being averted by a hard unhandled signal would indeed leave remnants, nothing process-wise (as far as the OS is concerned) failed to be "freed". All open kernel resources on behalf of your process are gone once your process is gone.Matthei
@WhozCraig, swap is turned off; without swap it is easier to reproduce. @MatsPetersson, @Matthei I may be wrong that it's a SIGSEGV specifically on Linux rather than being killed by a pager; maybe it's FreeBSD that behaves this way. We have both kinds of servers with different versions of FreeBSD. So, if it is the pager that kills the application, is there an elegant way to handle it?Corliss
@Corliss If it is that way on FreeBSD, I would have expected my OS X machine to puke, but I didn't have swap turned off for the process, so YMMV. Good to know.Matthei
It would help to give an actual example (preferably a code sample) showing a resource that is not released when the process is killed. "File system entities" is really vague.Rentschler
I wouldn't worry about "how to work around this issue"; I would work around the issue that you are using much more memory than what is available in the system... And I don't know if swap is on or off on the Linux boards that I use at work, but they certainly don't have a swap partition. On the other hand, when they are "stuck" just before being killed, the kswapd process is using 99% CPU (probably trying to find pages to swap out, for example parts of the executable - it's probably playing a very large version of the 15-puzzle with the last memory page(s) available).Vinculum
I think the 'filesystem entities' issue is that the OP is creating temporary files & directories and relying on the destructors to delete them again (good RAII design). Is that right?Brno
@Tom, yes that is correct.Corliss
In which case, the solution might be to put all your temporaries in one place, then write a shell script that runs your program and then deletes everything out of that temporary location.Brno
Note that both Windows and FreeBSD do not overcommit (this is tunable in FreeBSD); i.e. malloc will fail on these systems if they cannot guarantee that the memory is actually available. This is why these systems require swap to make full use of the RAM. It doesn't mean the swap actually gets used, because usually not all of the reserved memory is touched, but the system guarantees that it has a physical place for all the memory allocated.Rebbecarebbecca
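
To make the scenario from these comments concrete, here is a minimal sketch of the 8G-on-4G experiment (do not run it on a machine you care about: with Linux's default overcommit heuristic the allocations succeed, and touching the pages then gets the process killed):

#include <cstddef>   // size_t
#include <cstdlib>   // std::malloc

int main()
{
    constexpr size_t chunk_size = 1UL << 30;   // 1 GiB pieces
    constexpr int chunk_count = 8;             // 8 GiB total
    constexpr size_t page_size = 4096;         // assumed page size

    for (int i = 0; i < chunk_count; ++i) {
        char * p = static_cast<char *>(std::malloc(chunk_size));
        if (!p)
            return 1;   // with overcommit this branch is rarely taken

        /// Touching one byte per page forces the kernel to actually
        /// back the pages with physical memory.
        for (size_t off = 0; off < chunk_size; off += page_size)
            p[off] = 1;
    }
}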

So what I am asking for is a nice, elegant solution to this kind of problem.

None exists, neither for C++ nor for any other language. You are faced with a fundamental physical reality here, not a design decision: what happens when the user pulls the plug? No programming solution can guard against that (well, there's restore-upon-restart).

What you can do is catch POSIX signals, and sometimes you can even handle them - but it's flaky and there are tons of caveats, which another discussion on Stack Overflow details.

Most resources should not be cleaned up after a segfault. If you want to do it anyway, simply collect those resources (or rather, handlers for their cleanup) in a global array, trap SIGSEGV, iterate through the cleanup-routine array in the handler (hoping that the relevant memory is still intact), and perform the cleanup.
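
A minimal sketch of that idea, assuming a fixed-size registry (register_cleanup, segv_handler, and the example path are made-up names). A SIGSEGV handler may only call async-signal-safe functions, which rules out std::string and std::vector; unlink() itself is on POSIX's async-signal-safe list, which is the only reason this can work at all:

#include <signal.h>    // sigaction, SIGSEGV, sig_atomic_t (POSIX)
#include <string.h>    // strncpy
#include <unistd.h>    // ::unlink

/// Fixed-size, statically allocated registry: no dynamic allocation
/// is allowed inside a signal handler.
constexpr int max_guards = 64;
constexpr int max_path = 256;
char cleanup_paths[max_guards][max_path];   // zero-initialized (static storage)
volatile sig_atomic_t cleanup_count = 0;

void register_cleanup(const char * path)
{
    if (cleanup_count < max_guards) {
        strncpy(cleanup_paths[cleanup_count], path, max_path - 1);
        cleanup_count = cleanup_count + 1;
    }
}

void segv_handler(int sig)
{
    /// Best effort: the heap may already be corrupt, but this array
    /// lives in static storage and unlink() is async-signal-safe.
    for (int i = 0; i < cleanup_count; ++i)
        ::unlink(cleanup_paths[i]);

    signal(sig, SIG_DFL);   // restore the default action...
    raise(sig);             // ...and die with the usual core dump
}

int main()
{
    struct sigaction sa {};
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, nullptr);

    register_cleanup("/tmp/myapp/scratch.tmp");   // hypothetical temp file
    /// ... rest of the application ...
}

As said above, treat this strictly as best effort: by the time the handler runs, the process state is suspect, and the handler itself can fault again.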

More specifically, for temporary files it helps to create them inside one of the system’s temporary folders. It’s understood that these don’t always get cleaned up by their respective applications, and either the system or the user will periodically perform cleanup instead.

Topless answered 11/9, 2014 at 9:52 Comment(1)
I accept your answer and not Paul's because you stated it clearly: no programming solution can guard against that. I 99% knew this would be the answer, but hope dies last :(. There are cases when it's not possible to have a single temporary directory: I rely on rename(), and you can't rename files across file systems; I also rely on its atomicity. And it's not always cheap to detect those dangling files when they are (potentially) spread across many directories.Corliss

Usually the solution, regardless of language or OS, is to clean up when you start the program, not (only) when you terminate. If your program can create temporary files that it cleans up on shutdown, clean up the temporary files when you start the program too.
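
A sketch of that approach (the scratch directory and function name are made up; std::filesystem is C++17, so newer than this question, but any directory-walking API would do):

#include <filesystem>    // std::filesystem (C++17)
#include <system_error>  // std::error_code

namespace fs = std::filesystem;

/// Hypothetical scratch location; a real application would take this
/// from its configuration.
const fs::path scratch_dir = "/var/tmp/myapp";

void clean_scratch_on_startup()
{
    std::error_code ec;
    fs::remove_all(scratch_dir, ec);          // best effort: ignore errors
    fs::create_directories(scratch_dir, ec);  // recreate an empty scratch dir
}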

Almost everything else, like file handles, TCP connections, and so forth, is cleaned up by the OS when your application dies.

Holliehollifield answered 11/9, 2014 at 9:55 Comment(3)
Yes, that's what I do now, when possible. But sometimes it's impossible to do that at startup, because it would take time, and startup must complete as soon as possible. I am talking about a server application.Corliss
Correctness > speed. Figuring out what's wrong with a corrupt application and cleaning things up manually also takes time. The three things I can think of to speed up the cleanup are (a) optimize the resources to clean up, e.g. instead of having to check for the existence of various temp files, just delete an entire folder where the "cleanup" files are placed when they are created, or (b) a nested application with subsystems that do lazy initialization and clean up on initialization, or (c) split off some logic into independent processes, which survive a segfault of the main application.Holliehollifield
Yeah, I have considered those points, but not all are applicable in my case. (a) - I do have a separate folder, but I don't do an explicit remove, since in most cases the folder is empty and attempting a remove on every small request is a performance hit (I think). (b) - same performance hit. (c) - spawning a new process is too much overhead and also complicates things too much. Thanks anyway.Corliss
