How to optimize paging for a large in-memory database

I have an application where the entire database is implemented in memory, using an STL map for each table in the database.

Each item in an STL map is a complex object with references to items in the other STL maps.

The application works with a large amount of data, so it uses more than 500 MByte of RAM. Clients are able to contact the application and get a filtered version of the entire database. This is done by running through the entire database and finding the items relevant to the client.

When the application has been running for an hour or so, Windows 2003 SP2 starts to page out parts of the application's RAM (even though there is 16 GByte of RAM in the machine).

After the application has been partly paged out, a client logon takes a long time (10 minutes) because it now generates a page fault for each pointer lookup in the STL maps. If the client logon is run a second time right after, it is fast (a few seconds) because all the memory is back in RAM.

I can see it is possible to tell Windows to lock memory in RAM, but this is generally only recommended for device drivers, and only for "small" amounts of memory.

I guess a poor man's solution could be to loop through the entire memory database, and thus tell Windows we are still interested in keeping the data model in RAM.
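
A minimal sketch of that idea, with a made-up Record type and g_table standing in for the real data model:

    #include <map>

    struct Record { char payload[256]; };   // stand-in for a real table row
    std::map<int, Record> g_table;          // stand-in for one of the STL maps

    volatile char g_sink;                   // keeps the reads from being optimized away

    // Read one byte per record so every page of the data model counts as
    // recently used; any paged-out pages are faulted back in along the way.
    void TouchWorkingSet()
    {
        for (std::map<int, Record>::const_iterator it = g_table.begin();
             it != g_table.end(); ++it)
        {
            g_sink = it->second.payload[0];
        }
    }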

I guess another poor man's solution could be to disable the pagefile completely on Windows.

I guess the expensive solution would be an SQL database, rewriting the entire application to use a database layer and hoping the database system has implemented means for fast access.

Are there other, more elegant solutions?

Espadrille answered 7/6, 2010 at 13:36 Comment(5)
The application runs as a Windows service but still has a console window (uses AllocConsole). I wonder if Windows reacts to this console window being minimized and then decides to trim the working set.Espadrille
Also noticed that many working buffers were allocated using new or malloc without a uniform chunk size (this is an old application). Adjusting the allocation sizes to be divisible by 1024 halved the virtual bytes for the application.Espadrille
Have now used ProcDump to register stack traces when it was very busy. It revealed that it spent a lot of time on many large new/malloc operations. Have now implemented better buffer reuse, but I'm still puzzled why the first client logon takes time while the second is fast.Espadrille
Greetings, did you ever solve the unnecessary paging problem? And If yes, how?Disarrange
@Disarrange The problem became very small after changing the code: instead of each operation performing its own memory allocations and then freeing the memory again, a single memory allocation is now reused by all operations. By optimizing our program's memory requests, we were no longer hit by the Windows memory manager.Espadrille
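
A minimal sketch of that buffer-reuse change (ScratchBuffer is a hypothetical name, not the application's actual code):

    #include <cstddef>
    #include <vector>

    // One allocation that grows to the high-water mark and is then reused by
    // every operation, replacing a new/delete pair per operation.
    class ScratchBuffer
    {
    public:
        char* Acquire(std::size_t bytes)
        {
            if (m_storage.size() < bytes)
                m_storage.resize(bytes);   // grows at most a few times, then gets reused
            return &m_storage[0];
        }
    private:
        std::vector<char> m_storage;
    };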

This sounds like either a memory leak or a serious fragmentation problem. It seems to me that the first step would be to figure out what's causing 500 MB of data to use up 16 GB of RAM and still want more.

Edit: Windows has a working set trimmer that actively attempts to page out idle data. The basic idea is that it goes through and marks pages as being available, but leaves the data in them (and the virtual memory manager knows what data is in them). If, however, you attempt to access that memory before it's allocated to other purposes, it'll be marked as being in use again, which will normally prevent it from being paged out.

If you really think this is the source of your problem, you can indirectly control the working set trimmer by calling SetProcessWorkingSetSize. At least in my experience, this is only rarely of much use, but you may be in one of those unusual situations where it's really helpful.
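
A minimal sketch of such a call; the sizes are illustrative only, and raising the minimum working set may require additional privileges on some Windows versions:

    #include <windows.h>

    // Ask the OS for a floor under the working set so the trimmer is less
    // inclined to steal this process's pages. GetCurrentProcess() returns a
    // pseudo-handle with full access to the calling process.
    bool KeepDatabaseResident()
    {
        const SIZE_T minBytes = 600 * 1024 * 1024;    // ~600 MByte floor
        const SIZE_T maxBytes = 1024 * 1024 * 1024;   // 1 GByte ceiling
        return SetProcessWorkingSetSize(GetCurrentProcess(),
                                        minBytes, maxBytes) != FALSE;
    }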

Radioelement answered 7/6, 2010 at 13:43 Comment(5)
I agree - sounds like a leak to me. Have you tried using Valgrind?Austen
I can't find where he says he has only 500 MB of data which uses 16 GB of RAM. On the other hand, I also do not understand why the OP mentions 500 MB of RAM explicitly. Anyway, I do agree with the memory leak idea.Tsimshian
@PeterK:well, he says "over 500MByte", which I presume means only slightly over 500 MByte. In any case, it sounds like it's fast enough to start with, but eventually uses enough memory to start thrashing...Radioelement
The application uses 500 MByte of memory and is running on a computer with 16 GByte of RAM. Even though there is plenty of memory, Windows 2003 is rather aggressive about paging out aging memory pages to free even more memory. You can see the same behavior if you leave a Windows XP computer alone for a long time and then wake it: it takes a while before the applications react, because Windows XP has paged them all out.Espadrille
Thank you for the updated tip about SetProcessWorkingSetSize. Will try to google it and see how others use this API call.Espadrille

As @Jerry Coffin said, it really sounds like your actual problem is a memory leak. Fix that.

But for the record, none of your "poor man's solutions" would work. At all.

Windows pages out some of your data because there's no room for it in RAM. Looping through the entire memory database would load in every byte of the data model, yes... which would cause other parts of it to be paged out. You'd generate a lot of page faults, and the only difference in the end would be which parts of the data structure are paged out.

Disabling the page file? Yes, if you think a hard crash is better than low performance. Windows doesn't page data out because it's fun. It does that to handle situations where it would otherwise run out of memory. If you disable the pagefile, the app will just crash when it would otherwise page out data.

If your dataset really is so big it doesn't fit in memory, then I don't see why an SQL database would be especially "expensive". Unlike your current solution, databases are optimized for this purpose. They're meant to handle datasets too large to fit in memory, and to do this efficiently.

It sounds like you have a memory leak. Fixing that would be the elegant, efficient and correct solution.

If you can't do that, then either

  • throw more RAM at the problem (the app ends up using 16GB? Throw 32 or 64GB at it then), or
  • switch to a format that's optimized for efficient disk access (an SQL database, probably)
Galvanometer answered 7/6, 2010 at 14:50 Comment(3)
Again, the application is only using 500 MByte of RAM according to Task Manager. The problem is that the Windows paging algorithm swaps out the application even though there is enough RAM.Espadrille
@snakefoot: no. Windows doesn't do that. And Task Manager is not a reliable way to determine memory usage.Galvanometer
Have you ever had a Windows XP machine that was left alone for several hours before you started using it again? The first few minutes everything is quite sluggish because the memory manager has paged out most of the memory. I want to encourage Windows to keep my application in memory.Espadrille

---- Edit

Given snakefoot's explanation, the problem is Windows swapping out memory that has not been used for a longer period, and the data therefore not being in memory when needed. This is the same as:

Can I tell Windows not to swap out a particular processes’ memory?

and the VirtualLock function should do the job:

http://msdn.microsoft.com/en-us/library/aa366895(VS.85).aspx
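
A minimal sketch of how it might be used, assuming the data model can be placed in a dedicated region; note that the amount you can lock is bounded by the process's minimum working set, so SetProcessWorkingSetSize may be needed first:

    #include <windows.h>

    // Commit a region and pin it in physical memory.
    void* AllocateLocked(SIZE_T bytes)
    {
        void* p = VirtualAlloc(NULL, bytes, MEM_COMMIT | MEM_RESERVE,
                               PAGE_READWRITE);
        if (p != NULL && !VirtualLock(p, bytes))
        {
            VirtualFree(p, 0, MEM_RELEASE);   // could not lock it: give it back
            p = NULL;
        }
        return p;
    }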

---- Previous answer

First of all, you need to distinguish between a memory-leak problem and a memory-need problem.

If you have a memory leak, it would be a bigger effort to convert the entire application to SQL than to debug the application.

SQL cannot be faster than a well-designed, domain-specific in-memory database, and if you have bugs, chances are you will have different ones in an SQL version as well.

If this is a memory-need problem, then you will need to switch to SQL anyway, and this sounds like a good moment.

Cosecant answered 7/6, 2010 at 13:49 Comment(4)
I don't think there is a memory leak issue, since the application doesn't use more RAM over time. It just doesn't touch all the allocated memory constantly, so the Windows 2003 memory manager thinks it is okay to page out the memory. The Windows 2003 memory manager pages out memory even though there is plenty of memory in the machine.Espadrille
I agree, snakefoot; in theory memory should just be mirrored to the pagefile until it really needs to be squeezed out. But it can seem to get reallocated well before it needs to be.Buccaneer
Not sure I want the behavior of VirtualLock, since it prevents Windows from ever paging out the application, even if memory is needed in critical situations. I would prefer a solution where one could tell Windows not to be so aggressive about my application.Espadrille
@snakefoot - there are situations when you need to accept the world and not try to define it ;) VirtualLock seems OK in your scenario. Just ensure that you have twice the memory you lock.Cosecant

We have a similar problem, and the solution we chose was to allocate everything in a shared memory block. AFAIK, Windows doesn't page this out. However, using an STL map here is not for the faint of heart either, and it was beyond what we required.

We are using Boost shared memory (Boost.Interprocess) to implement this, and it works well. Follow the examples closely and you will be up and running quickly. Boost also has Boost.MultiIndex, which will do a lot of what you want.
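
A minimal sketch along the lines of the Boost.Interprocess examples (the segment name, size, and element types are illustrative):

    #include <boost/interprocess/managed_shared_memory.hpp>
    #include <boost/interprocess/allocators/allocator.hpp>
    #include <boost/interprocess/containers/map.hpp>
    #include <functional>
    #include <utility>

    namespace bip = boost::interprocess;

    // A map whose nodes live inside a named shared-memory segment instead of
    // the process heap.
    typedef std::pair<const int, double> ValueType;
    typedef bip::allocator<ValueType,
        bip::managed_shared_memory::segment_manager> ShmAllocator;
    typedef bip::map<int, double, std::less<int>, ShmAllocator> ShmMap;

    int main()
    {
        bip::managed_shared_memory segment(bip::open_or_create,
                                           "MyDatabase", 64 * 1024 * 1024);
        ShmAllocator alloc(segment.get_segment_manager());
        ShmMap* table = segment.find_or_construct<ShmMap>("MyTable")
                               (std::less<int>(), alloc);
        table->insert(ValueType(42, 3.14));
        return 0;
    }

One caveat: objects stored in the segment cannot hold raw pointers to each other; Boost.Interprocess provides offset_ptr for those cross-references.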

For a no-cost SQL solution, have you looked at SQLite? It has an option to run as an in-memory database.
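
A minimal sketch of SQLite's in-memory mode, using the special ":memory:" filename (the schema is made up):

    #include <sqlite3.h>
    #include <cstdio>

    int main()
    {
        sqlite3* db = 0;
        if (sqlite3_open(":memory:", &db) != SQLITE_OK)   // lives entirely in RAM
            return 1;

        char* err = 0;
        sqlite3_exec(db,
                     "CREATE TABLE items(id INTEGER PRIMARY KEY, name TEXT);"
                     "INSERT INTO items(name) VALUES ('example');",
                     0, 0, &err);
        if (err) { std::printf("%s\n", err); sqlite3_free(err); }

        sqlite3_close(db);
        return 0;
    }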

Good luck, sounds like an interesting application.

Kaufmann answered 7/6, 2010 at 14:55 Comment(2)
Actually, the benefit of not having to map to a database layer is that it gives a lot of freedom. We only serialize to XML when needing persistence. Using XML also makes it easy to integrate with other applications, as one can use stylesheets during import/export.Espadrille
@snakefoot - I can fully agree with the point on the DB. I was suggesting SQLite only because it is easy to run it as an in-memory DB.Kaufmann

I have an application where the entire database is implemented in memory, using an STL map for each table in the database.

That's the beginning of the end: STL's std::map is extremely memory-inefficient, and the same applies to std::list. Every element is allocated separately, causing rather serious memory waste. I often use std::vector + sort() + lower_bound() instead of std::map in applications where it is possible (more searches than modifications) and I know in advance that memory usage might become an issue.
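
A minimal sketch of that pattern (Record is a made-up row type):

    #include <algorithm>
    #include <vector>

    struct Record { int key; /* payload... */ };

    bool ByKey(const Record& a, const Record& b) { return a.key < b.key; }

    // Build once: fill the vector, then std::sort(table.begin(), table.end(), ByKey).
    // After that, every lookup is O(log n) in one contiguous allocation, with
    // no per-node heap overhead.
    const Record* Find(const std::vector<Record>& table, int key)
    {
        Record probe;
        probe.key = key;
        std::vector<Record>::const_iterator it =
            std::lower_bound(table.begin(), table.end(), probe, ByKey);
        return (it != table.end() && it->key == key) ? &*it : 0;
    }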

When the application has been running for an hour or so, Windows 2003 SP2 starts to page out parts of the application's RAM (even though there is 16 GByte of RAM in the machine).

Hard to tell without knowing how your application is written. Windows has a feature to unload from RAM whatever memory of idle applications can be unloaded, but that normally affects memory-mapped files and the like.

Otherwise, I would strongly suggest reading up on the Windows memory management documentation. It is not very easy to understand, yet Windows has all sorts and types of memory available to applications. I never had much luck with it, but in your application a custom std::allocator would probably work.
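
As an illustration (my assumption of one way it could look, not the poster's code): a deliberately naive C++11 allocator that pins each allocation with VirtualLock. A real version would pool many nodes per locked page instead of paying page granularity per node, and VirtualLock is bounded by the minimum working set size:

    #include <windows.h>
    #include <cstddef>
    #include <map>
    #include <new>

    template <typename T>
    struct LockedAllocator
    {
        typedef T value_type;

        LockedAllocator() {}
        template <typename U> LockedAllocator(const LockedAllocator<U>&) {}

        T* allocate(std::size_t n)
        {
            // Commit a fresh region and pin it so the node cannot be paged out.
            void* p = VirtualAlloc(NULL, n * sizeof(T),
                                   MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
            if (p == NULL)
                throw std::bad_alloc();
            if (!VirtualLock(p, n * sizeof(T)))
            {
                VirtualFree(p, 0, MEM_RELEASE);
                throw std::bad_alloc();
            }
            return static_cast<T*>(p);
        }

        void deallocate(T* p, std::size_t) { VirtualFree(p, 0, MEM_RELEASE); }
    };

    template <typename T, typename U>
    bool operator==(const LockedAllocator<T>&, const LockedAllocator<U>&) { return true; }
    template <typename T, typename U>
    bool operator!=(const LockedAllocator<T>&, const LockedAllocator<U>&) { return false; }

    // Hypothetical usage: a map whose nodes stay resident.
    // std::map<int, Record, std::less<int>,
    //          LockedAllocator<std::pair<const int, Record> > > table;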

Glantz answered 7/6, 2010 at 18:03 Comment(2)
The problem would be writing an STL allocator for these other types of memory, together with the fact that non-paged memory is a limited resource. I just want to encourage Windows to keep my application in memory instead of paging it out.Espadrille
"Together with the fact that non-paged memory is a limited resource." Well, if you know precisely what applications are going to run on the server and what their memory requirements, then it is perfectly OK to also use non-paged memory. That way one robs OS from physical RAM what is generally regarded as bad. But if task requires the storage to be in RAM for guaranteed fast access, then there is little choice.Glantz

I can believe it is the fault of flawed pagefile behaviour; I've run my laptops mostly with the pagefile turned off since NT 4.0. In my experience, at least up to XP Pro, Windows intrusively swaps pages out just to provide the dubious benefit of a really, really slow extension to the maximum working-set space.

Ask what benefit swapping to hard disk achieves with 16 gigabytes of real RAM available. If your working set is so big as to need more than 10 GB of virtual memory, then once swapping is actually required, processes will take anything from a bit longer to thousands of times longer to complete. On Windows, the untameable file-system cache seems to antagonise the relationship.

Now, when I (very) occasionally run out of working set on my XP laptops, there is no traffic jam; the guilty app just crashes. A utility to suspend memory-hogging processes before that point and raise an alert would be nice, but there is no such thing: just an access violation, a crash, and sometimes explorer.exe goes down too.

Pagefiles: who needs 'em?

Buccaneer answered 7/6, 2010 at 19:22 Comment(1)
Well, the pagefile and the paging algorithm were invented at a time when computers only had 16 MByte of RAM; a lot of applications depend on this behavior and would probably break if it changed. I believe Microsoft has changed things in Windows 2008 so that it actually tries to maximize the use of memory, but right now we are using Windows 2003 SP2.Espadrille
