Performance / stability of a Memory Mapped file - Native or MappedByteBuffer - vs. plain ol' FileOutputStream

I support a legacy Java application that uses flat files (plain text) for persistence. Due to the nature of the application, the size of these files can reach 100s of MB per day, and often the limiting factor in application performance is file IO. Currently, the application uses a plain ol' java.io.FileOutputStream to write data to disk.

Recently, we've had several developers assert that using memory-mapped files, implemented in native code (C/C++) and accessed via JNI, would provide greater performance. However, FileOutputStream already uses native methods for its core operations (e.g. write(byte[])), so the assertion seems tenuous without hard data or at least anecdotal evidence.

I have several questions on this:

  1. Is this assertion really true? Will memory mapped files always provide faster IO compared to Java's FileOutputStream?

  2. Does the class MappedByteBuffer, obtained from a FileChannel (see the sketch at the end of this question), provide the same functionality as a native memory-mapped file library accessed via JNI? What is MappedByteBuffer lacking that might lead you to use a JNI solution?

  3. What are the risks of using memory-mapped files for disk IO in a production application? That is, applications that have continuous uptime with minimal reboots (once a month, max). Real-life anecdotes from production applications (Java or otherwise) preferred.

Question #3 is important - I could answer this question myself partially by writing a "toy" application that perf tests IO using the various options described above, but by posting to SO I'm hoping for real-world anecdotes / data to chew on.

[EDIT] Clarification - each day of operation, the application creates multiple files that range in size from 100MB to 1 gig. In total, the application might be writing out multiple gigs of data per day.
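
For reference, here's a minimal sketch (not from the application - the file name and region size are invented) of the pure-Java route from question #2, mapping a file region through FileChannel.map():

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MapSketch {
        public static void main(String[] args) throws Exception {
            try (RandomAccessFile file = new RandomAccessFile("data.bin", "rw");
                 FileChannel channel = file.getChannel()) {
                // Map a 64 MB region for read/write; the file is grown to this
                // size if it is shorter.
                MappedByteBuffer buffer =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, 64L * 1024 * 1024);
                buffer.put("hello".getBytes());  // writes land in the page cache
                buffer.force();                  // optional: flush dirty pages to disk
            }
        }
    }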

Valentinevalentino answered 11/2, 2009 at 15:21 Comment(0)

You might be able to speed things up a bit by examining how your data is being buffered during writes. This tends to be application specific as you would need an idea of the expected data writing patterns. If data consistency is important, there will be tradeoffs here.

If you are just writing out new data to disk from your application, memory mapped I/O probably won't help much. I don't see any reason you would want to invest time in some custom coded native solution. It just seems like too much complexity for your application, from what you have provided so far.

If you are sure you really need better I/O performance - or just O performance in your case - I would look into a hardware solution such as a tuned disk array. Throwing more hardware at the problem is often more cost-effective from a business point of view than spending time optimizing software. It is also usually quicker to implement and more reliable.

In general, there are a lot of pitfalls in over-optimization of software. You will introduce new types of problems to your application. You might run into memory issues / GC thrashing, which would lead to more maintenance/tuning. The worst part is that many of these issues will be hard to test before going into production.

If it were my app, I would probably stick with FileOutputStream with some possibly tuned buffering (sketched below). After that I'd use the time-honored solution of throwing more hardware at it.
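
A minimal sketch of what "tuned buffering" could look like - the 1 MB buffer size and file name are placeholders, so benchmark against your real write patterns:

    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.io.OutputStream;

    public class TunedBuffering {
        public static void main(String[] args) throws Exception {
            byte[] line = "one record of application data\n".getBytes();
            try (OutputStream out = new BufferedOutputStream(
                    new FileOutputStream("data.log"), 1 << 20)) {
                for (int i = 0; i < 100_000; i++) {
                    out.write(line);  // small writes are coalesced into 1 MB chunks
                }
            }  // close() flushes the final partial buffer
        }
    }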

Stately answered 11/2, 2009 at 17:52 Comment(1)
Picked this answer to spread the points around. Also, the phrase "or just O performance in your case" really stuck with me.Valentinevalentino

Memory mapped I/O will not make your disks run faster(!). For linear access it seems a bit pointless.

A NIO mapped buffer is the real thing (usual caveat about any reasonable implementation).

As with other NIO direct-allocated buffers, mapped buffers are not normal heap memory and won't get GCed as efficiently. If you create many of them you may find that you run out of memory / address space without running out of Java heap. This is obviously a worry with long-running processes.

Earthstar answered 11/2, 2009 at 16:10 Comment(2)
The application writes to disk far more frequently (>99%) than it reads. Could you please elaborate on what you mean by "For linear access it seems a bit pointless" - does it apply to append operations?Valentinevalentino
Append operations would be linear (the filesystem may fragment your file, but that should be a minor issue).Earthstar

In my experience, memory-mapped files perform MUCH better than plain file access in both real-time and persistence use cases. I've worked primarily with C++ on Windows, but Linux performance is similar, and you're planning to use JNI anyway, so I think it applies to your problem.

For an example of a persistence engine built on memory-mapped files, see Metakit. I've used it in an application where objects were simple views over memory-mapped data; the engine took care of all the mapping work behind the curtains. This was both fast and memory-efficient (at least compared with traditional approaches like the one the previous version used), and we got commit/rollback transactions for free.

In another project I had to write multicast network applications. The data was sent in randomized order to minimize the impact of consecutive packet loss (combined with FEC and blocking schemes). Moreover, the data could well exceed the address space (video files were larger than 2 GB), so memory allocation was out of the question. On the server side, file sections were memory-mapped on demand and the network layer directly picked the data from these views; as a consequence, memory usage was very low. On the receiver side, there was no way to predict the order in which packets would be received, so the receiver had to maintain a limited number of active views on the target file, and data was copied directly into these views. When a packet had to be put in an unmapped area, the oldest view was unmapped (and eventually flushed to the file by the system) and replaced by a new view on the destination area. Performance was outstanding, notably because the system did a great job of committing data as a background task, and real-time constraints were easily met.

Since then I'm convinced that even the best finely-crafted software scheme cannot beat the system's default I/O policy with memory-mapped files, because the system knows more than user-space applications about when and how data must be written. Also, it is important to know that memory mapping is a must when dealing with large data, because the data is never allocated (hence consuming memory) but dynamically mapped into the address space and managed by the system's virtual memory manager, which is always faster than the heap. So the system always uses memory optimally, and commits data whenever it needs to, behind the application's back without impacting it.
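
To make the receiver-side idea concrete, here is a rough Java sketch of the same scheme - a limited number of mapped views (just one here) over a large file, remapped on demand. The window size, names, and single-view eviction policy are illustrative, not taken from the original C++ code:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class WindowedFile {
        private static final long WINDOW = 64L * 1024 * 1024;  // 64 MB per view

        private final FileChannel channel;
        private MappedByteBuffer view;   // the single active view
        private long viewStart = -1;

        public WindowedFile(String path) throws Exception {
            channel = new RandomAccessFile(path, "rw").getChannel();
        }

        // Write at an arbitrary offset, remapping when the current view does
        // not cover it. The old view is simply dropped; the OS flushes its
        // dirty pages in the background. (Writes that straddle a window
        // boundary are not handled, to keep the sketch short.)
        public void writeAt(long offset, byte[] data) throws Exception {
            long start = (offset / WINDOW) * WINDOW;
            if (start != viewStart) {
                view = channel.map(FileChannel.MapMode.READ_WRITE, start, WINDOW);
                viewStart = start;
            }
            view.position((int) (offset - start));
            view.put(data);
        }
    }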

Hope it helps.

Saxhorn answered 11/2, 2009 at 15:52 Comment(0)

As for point 3 - if the machine crashes and there are any pages that were not flushed to disk, then they are lost. Another thing is the waste of address space - mapping a file to memory consumes address space (and requires a contiguous area), and, well, on 32-bit machines it's a bit limited. But you've said about 100 MB, so it should not be a problem. And one more thing - expanding the size of the mmapped file requires some work.
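
To illustrate that last point, a small sketch (invented file name and sizes): a mapping has a fixed size, so "expanding" means creating a new, larger mapping. With MapMode.READ_WRITE, map() itself grows the file if it is shorter than the requested region:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class GrowMappedFile {
        public static void main(String[] args) throws Exception {
            try (RandomAccessFile file = new RandomAccessFile("data.bin", "rw");
                 FileChannel channel = file.getChannel()) {
                MappedByteBuffer small =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, 128L * 1024 * 1024);
                // ... the region fills up; an existing mapping cannot be resized,
                // so a new, larger mapping must be created ...
                MappedByteBuffer larger =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, 256L * 1024 * 1024);
                // 'small' stays mapped until the GC releases it, which adds to
                // the address-space cost described above.
            }
        }
    }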

By the way, this SO discussion can also give you some insights.

Unasked answered 11/2, 2009 at 15:52 Comment(1)
Actually 100s of MB - so up to a Gig per file. And some deployments of the application have multiple such files! I'll edit to be clearer.Valentinevalentino

If you write fewer bytes, it will be faster. What if you filtered it through a GZIPOutputStream, or what if you wrote your data into ZIP or JAR files?
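
For instance, a minimal sketch of the GZIPOutputStream route (file name and data are invented):

    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.util.zip.GZIPOutputStream;

    public class GzipWriteSketch {
        public static void main(String[] args) throws Exception {
            byte[] line = "plain text persistence record\n".getBytes();
            try (GZIPOutputStream out = new GZIPOutputStream(
                    new BufferedOutputStream(new FileOutputStream("data.log.gz")))) {
                for (int i = 0; i < 100_000; i++) {
                    out.write(line);  // compressed before it reaches the disk
                }
            }  // close() writes the gzip trailer
        }
    }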

Peregrination answered 12/2, 2009 at 14:38 Comment(1)
Then you're trading the overhead of IO operations for the overhead of encoding / decoding the data. It would take some experimentation to see if this was viable.Valentinevalentino

As mentioned above, use NIO (a.k.a. new IO). There's also a new, new IO (NIO.2) coming out.

The proper use of a RAID hard drive solution would help you, but that would be a pain.

I really like the idea of compressing the data. Go for the GZIPOutputStream, dude! That would double your throughput if the CPU can keep up. It is likely that you can take advantage of the now-standard dual-core machines, eh?

-Stosh

Hawthorne answered 21/4, 2009 at 1:59 Comment(0)

I did a study where I compared write performance to a raw ByteBuffer versus write performance to a MappedByteBuffer. Memory-mapped files are supported by the OS, and their write latencies are very good, as you can see in my benchmark numbers. Performing synchronous writes through a FileChannel is approximately 20 times slower, and that's why people do asynchronous logging all the time. In my study I also give an example of how to implement asynchronous logging through a lock-free and garbage-free queue, for ultimate performance very close to a raw ByteBuffer.
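
Since the original link is dead, here is a deliberately simplified sketch of the asynchronous-logging pattern described above. It uses a standard ArrayBlockingQueue rather than the lock-free, garbage-free queue from the study (so it allocates and can block), but the shape is the same: application threads hand records to a dedicated writer thread and never wait on the disk.

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class AsyncLogger {
        private final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(65536);

        public AsyncLogger(String path) throws Exception {
            OutputStream out = new FileOutputStream(path);
            Thread writer = new Thread(() -> {
                try {
                    while (true) {
                        out.write(queue.take());  // drain records in arrival order
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            writer.setDaemon(true);  // dies with the JVM; 'out' is never closed here
            writer.start();
        }

        // Called from application threads; blocks only when the queue is full.
        public void log(byte[] message) throws InterruptedException {
            queue.put(message);
        }
    }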

Aardwolf answered 2/12, 2012 at 20:12 Comment(4)
My finding was that I had to periodically call force() to ensure that my changes made it into the file. Hence, it was slower. Of course, this was several years ago...Valentinevalentino
You do NOT need to call force() unless you want to protect yourself from an OS crash.Aardwolf
The link in the answer is invalid.Mezzo
Can you post the result here instead? Thanks.Milanmilanese
