Concurrency of RandomAccessFile in Java

I am creating a RandomAccessFile object to write to a file (on an SSD) from multiple threads. Each thread writes a direct byte buffer at a specific position within the file, and I ensure that the position one thread writes to never overlaps with another thread's:

file_.getChannel().write(buffer, position);

where file_ is an instance of RandomAccessFile and buffer is a direct byte buffer.
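
To make the setup concrete, here is a stripped-down sketch of what I'm doing (the file name, region size, and thread count below are just placeholders):

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class PositionalWrites {
    public static void main(String[] args) throws Exception {
        final int regionSize = 4096;   // each thread owns one region (illustrative value)
        final int threads = 4;
        RandomAccessFile file = new RandomAccessFile("data.bin", "rw");
        FileChannel channel = file.getChannel();

        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final long position = (long) i * regionSize;   // non-overlapping offsets
            workers[i] = new Thread(() -> {
                try {
                    ByteBuffer buffer = ByteBuffer.allocateDirect(regionSize);
                    while (buffer.hasRemaining()) buffer.put((byte) 1);
                    buffer.flip();
                    // positional write; does not touch the channel's own position
                    channel.write(buffer, position);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        file.close();
    }
}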

For this RandomAccessFile, since I'm not using fallocate to pre-allocate the file and its length is therefore changing as the threads write, will these writes utilize the concurrency of the underlying media?

If not, is there any point in using the above call without calling fallocate when creating the file?

Ellerd answered 30/7, 2017 at 4:7 Comment(4)
Related question – Webb
This question is different since I am using the getChannel interface to write at a specific position rather than changing the file's current position. – Ellerd
Yes, they are different enough, which is why I did not use the mighty Mjölnir to close this question, but there are bits within the other question and its answer which hold relevance here. – Webb
From the documentation of FileChannel, your question answers itself; it depends on the OS implementation: "File channels are safe for use by multiple concurrent threads. [...] Other operations, in particular those that take an explicit position, may proceed concurrently; whether they in fact do so is dependent upon the underlying implementation and is therefore unspecified." – Ileum

I did some testing with the following code:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Random;
import java.util.concurrent.CountDownLatch;

import org.apache.commons.io.IOUtils;

// Each thread repeatedly writes its own 10-byte pattern at increasing positions
// via the shared FileChannel, so the printed output shows how the writes interleave.
public class App {
    public static CountDownLatch latch;

    public static void main(String[] args) throws InterruptedException, IOException {
        RandomAccessFile file = new RandomAccessFile("test.txt", "rw");
        latch = new CountDownLatch(5);
        for (int i = 0; i < 5; i++) {
            Thread t = new Thread(new WritingThread(i, (long) i * 10, file.getChannel()));
            t.start();
        }
        latch.await();
        file.close();
        InputStream fileR = new FileInputStream("test.txt");
        byte[] bytes = IOUtils.toByteArray(fileR);
        for (int i = 0; i < bytes.length; i++) {
            System.out.println(bytes[i]);
        }
        fileR.close();
    }

    public static class WritingThread implements Runnable {
        private long startPosition = 0;
        private FileChannel channel;
        private int id;

        public WritingThread(int id, long startPosition, FileChannel channel) {
            super();
            this.startPosition = startPosition;
            this.channel = channel;
            this.id = id;
        }

        // Fill a 10-byte buffer with values derived from the thread id,
        // so the output reveals which thread wrote which region.
        private ByteBuffer generateStaticBytes() {
            ByteBuffer buf = ByteBuffer.allocate(10);
            byte[] b = new byte[10];
            for (int i = 0; i < 10; i++) {
                b[i] = (byte) (this.id * 10 + i);
            }
            buf.put(b);
            buf.flip();
            return buf;
        }

        @Override
        public void run() {
            Random r = new Random();
            // Keep writing 10-byte blocks at increasing positions until a random
            // stop condition is hit, then release the latch.
            while (r.nextInt(100) != 50) {
                try {
                    System.out.println("Thread " + id + " is Writing");
                    this.channel.write(this.generateStaticBytes(), this.startPosition);
                    this.startPosition += 10;
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            latch.countDown();
        }
    }
}

So far what I've seen:

  • Windows 7 (NTFS partition): runs linearly (i.e. one thread writes and, when it is done, another one gets to run)

  • Linux Parrot 4.8.15 (ext4 partition, a Debian-based distro) with Linux kernel 4.8.0: the threads' writes interleave during execution

Again, as the documentation says:

File channels are safe for use by multiple concurrent threads. The close method may be invoked at any time, as specified by the Channel interface. Only one operation that involves the channel's position or can change its file's size may be in progress at any given time; attempts to initiate a second such operation while the first is still in progress will block until the first operation completes. Other operations, in particular those that take an explicit position, may proceed concurrently; whether they in fact do so is dependent upon the underlying implementation and is therefore unspecified.

So I'd suggest first giving it a try to see whether the OS(es) you are going to deploy your code on (and possibly the filesystem type) support parallel execution of FileChannel.write calls.

Edit: As pointed out, the above does not mean that threads can write concurrently to the file; it is actually the opposite, since the write call behaves according to the contract of a WritableByteChannel, which clearly specifies that only one thread at a time can write to a given channel:

If one thread initiates a write operation upon a channel then any other thread that attempts to initiate another write operation will block until the first operation is complete

Ileum answered 3/8, 2017 at 17:1 Comment(2)
Yes, but my question is: since writing to a specific position will always alter the length of the file (unless it is before the file's current EOF), won't all such operations always be serialized (irrespective of the underlying implementation)? – Ellerd
@Ellerd EOF is not so much of an issue, as the file is grown to accommodate a write that extends beyond the current file size. If by "serialized" you mean that only one thread can write at any given time, then the answer is yes, as the write call from a FileChannel behaves as a WritableByteChannel. – Ileum

As the documentation states, and as the other answer already mentions, a write can only be performed by one thread at a time. You won't achieve performance gains through concurrency; moreover, you should only worry about performance if it's an actual issue, because writing concurrently to a disk may actually degrade your performance (probably less so for SSDs than for HDDs).

The underlying media is in most cases (SSD, HDD, network) effectively single-threaded; in fact, there is no such thing as a thread at the hardware level, threads are nothing but an abstraction.

In your case the media is an SSD. While the SSD internally may write data to multiple modules concurrently (it may reach a level of parallelism where writes are as fast as, and even outperform, reads), the internal mapping data structures are a shared resource and therefore contended, especially by frequent updates such as concurrent writes. Nevertheless, updates to this data structure are quite fast and therefore nothing to worry about unless they become a problem.

But apart from this, those are just internals of the SSD. On the outside you communicate over a Serial ATA interface, thus one byte at a time (actually packets in a Frame Information Structure, FIS). On top of this sits an OS/filesystem that again has a probably contended data structure and/or applies its own optimizations such as write-behind caching.

Further, since you know what your media is, you can optimize specifically for it, and SSDs are really fast when a single thread writes a large piece of data.

Thus, instead of using multiple threads for writing, you may create one large in-memory buffer (or consider a memory-mapped file) and write concurrently into this buffer, as sketched below. The memory itself is not contended, as long as you ensure each thread accesses its own region of the buffer. Once all threads are done, you write this one buffer to the SSD (not needed if using a memory-mapped file).
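
As a rough illustration of that idea, here is a minimal sketch assuming the total size is known up front so the file can be mapped once (the file name, region size, and thread count are made up):

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedRegions {
    public static void main(String[] args) throws Exception {
        final int regionSize = 1024;
        final int threads = 4;
        final long totalSize = (long) regionSize * threads;

        try (RandomAccessFile file = new RandomAccessFile("mapped.bin", "rw");
             FileChannel channel = file.getChannel()) {

            // Map the whole file once; this also extends it to totalSize.
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_WRITE, 0, totalSize);

            Thread[] workers = new Thread[threads];
            for (int i = 0; i < threads; i++) {
                final int offset = i * regionSize;
                workers[i] = new Thread(() -> {
                    // Each thread gets an independent view of its own region,
                    // so no position or limit is shared between threads.
                    ByteBuffer region = map.duplicate();
                    region.position(offset).limit(offset + regionSize);
                    while (region.hasRemaining()) region.put((byte) 7);
                });
                workers[i].start();
            }
            for (Thread t : workers) t.join();
            map.force();   // flush the mapped content to the underlying file
        }
    }
}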

See also this good summary about developing for SSDs: A Summary – What every programmer should know about solid-state drives

The point of doing pre-allocation (or, to be more precise, of file_.setLength(), which actually maps to ftruncate) is that resizing the file may cost extra cycles and you may want to avoid that. But again, this may depend on the OS/filesystem.
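
If you do want to pre-size the file from Java, a minimal sketch might look like this (the target size is illustrative, and whether it actually saves anything depends on the OS/filesystem):

import java.io.RandomAccessFile;

public class Preallocate {
    public static void main(String[] args) throws Exception {
        long expectedSize = 1024L * 1024L;   // illustrative final size
        try (RandomAccessFile file = new RandomAccessFile("prealloc.bin", "rw")) {
            // Extend the file up front (typically ftruncate under the hood),
            // then hand file.getChannel() to the writer threads, each with its own offset.
            file.setLength(expectedSize);
        }
    }
}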

Horsehair answered 7/8, 2017 at 11:26 Comment(0)
