Java 7's nio.file package is uber slow at creating new files

I'm trying to create 300M files from a Java program. I switched from the old file API to the new Java 7 nio package, but the new package is even slower than the old one.

I see less CPU utilization than I did with the old file API, but with the simple code below I'm getting file transfer rates of only 0.5 MB/sec. The Java process reads from one disk and writes to another, and the writer is the only process accessing the destination disk.

Files.write(FileSystems.getDefault().getPath(filePath), fiveToTenKBytes, StandardOpenOption.CREATE);

Is there any hope of getting a reasonable throughput here?

Update:

I'm unpacking 300 million 5-10k byte image files from large files. I have 3 disks, 1 local and 2 SAN attached (all have a typical throughput rate of ~20MB/sec on large files).

I've also tried the code below, which improved throughput to just under 2MB/sec (9ish days to unpack these files).

ByteBuffer byteBuffer = ByteBuffer.wrap(imageBinary, 0, ((BytesWritable) value).getLength());
FileOutputStream fos = new FileOutputStream( imageFile );
fos.getChannel().write(byteBuffer);
fos.close();

I read from the local disk and write to the SAN-attached disk. The source is in Hadoop SequenceFile format, and Hadoop is typically able to read these files at 20MB/sec using basically the same code.

The only thing that appears out of place, other than the uber slowness, is that I see more read IO than write IO, by about 2:1. The sequence file is gzipped, but images barely compress (virtually a 1:1 ratio), so the compressed input should be approximately 1:1 with the output.

2nd UPDATE

Looking at iostat I see some odd numbers. The device of interest here is xvdf: I have one Java process reading from xvdb and writing to xvdf, and no other processes active on xvdf.

iostat -d 30
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdap1            1.37         5.60         4.13        168        124
xvdb             14.80       620.00         0.00      18600          0
xvdap3            0.00         0.00         0.00          0          0
xvdf            668.50      2638.40       282.27      79152       8468
xvdg           1052.70      3751.87      2315.47     112556      69464

The reads on xvdf are roughly 10x the writes (2638.40 vs. 282.27 kB/s), which is unbelievable.

fstab
/dev/xvdf       /mnt/ebs1       auto    defaults,noatime,nodiratime     0       0
/dev/xvdg       /mnt/ebs2       auto    defaults,noatime,nodiratime     0       0

Beldam answered 15/3, 2013 at 13:34 Comment(5)

How big are these files? – Harrod 15/3, 2013 at 14:7

@Harrod "I'm trying to create 300M files [...]" – Overhear 15/3, 2013 at 14:14

I read that as "I'm trying to create 300 million (or thousand) files", not "I'm trying to create one file that's 300 Mb in size" (otherwise, why use "M" and not "Mb"?). – Harrod 15/3, 2013 at 14:15

Second question: are these disks locally attached or accessed over a network? – Harrod 15/3, 2013 at 14:18

300 million 5-10k byte image files. On AWS unpacking from a large 12GB file on local disk to a SAN attached disk, both of which have typical large-file throughput rates of about 20MB/sec. – Beldam 16/3, 2013 at 2:7

I think your slowness is coming from creating new files, not actual transfer. I believe that creating a file is a synchronous operation in Linux: the system call will not return until the file has been created and the directory updated. This suggests a couple of things you can do:

Use multiple writer threads with a single reader thread. The reader thread will read data from the source file into a byte[], then create a Runnable that writes the output file from this array. Use a threadpool with lots of threads -- maybe 100 or more -- because they'll be spending most of their time waiting for the creat to complete. Set the capacity of this pool's inbound queue based on the amount of memory you have: if your files are 10k in size, then a queue capacity of 1,000 seems reasonable (there's no good reason to allow the reader to get too far ahead of the writers, so you could even go with a capacity of twice the number of threads).
Rather than NIO, use basic BufferedInputStreams and BufferedOutputStreams. Your problem here is syscalls, not memory speed (the NIO classes are designed to prevent copies between heap and off-heap memory).
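
The single-reader, many-writers arrangement described above might look roughly like the sketch below. It is only an illustration under stated assumptions: the class name `ParallelFileWriter`, the temp-directory demo in `main`, and the choice of `CallerRunsPolicy` for back-pressure are not from the answer. The pool size of 100 and the queue capacity of twice the thread count follow the numbers suggested in the text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: one reader thread calls submit(), a pool of writers creates the files.
final class ParallelFileWriter {
    private final ThreadPoolExecutor pool;
    private final Queue<IOException> failures = new ConcurrentLinkedQueue<IOException>();

    ParallelFileWriter(int writerThreads, int queueCapacity) {
        // Bounded queue: once it is full, CallerRunsPolicy makes the submitting
        // (reader) thread perform the write itself, so it cannot race ahead.
        pool = new ThreadPoolExecutor(
                writerThreads, writerThreads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Called by the reader thread for each record it has read into memory. */
    void submit(final Path target, final byte[] data) {
        pool.execute(new Runnable() {
            @Override
            public void run() {
                try {
                    // Default options: create the file, truncate if it exists, then write.
                    Files.write(target, data);
                } catch (IOException e) {
                    failures.add(e); // remember the failure instead of killing the worker
                }
            }
        });
    }

    /** Stops accepting work and blocks until every queued write has finished. */
    void awaitCompletion() throws InterruptedException {
        pool.shutdown();
        while (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            // keep waiting until the queue has drained
        }
    }

    Queue<IOException> failures() {
        return failures;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("unpack-demo"); // demo target, an assumption
        ParallelFileWriter writer = new ParallelFileWriter(100, 200);
        for (int i = 0; i < 1000; i++) {
            writer.submit(dir.resolve("img-" + i + ".bin"), new byte[8 * 1024]);
        }
        writer.awaitCompletion();
        System.out.println("Failed writes: " + writer.failures().size());
    }
}
```

Whether this helps depends on where the time actually goes; if the destination volume is limited by operations per second rather than by per-call latency, more threads may not raise throughput.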

I'm going to assume that you already know not to attempt to store all the files into a single directory. Or even store more than a few hundred files in one directory.
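
One possible way to keep directories small is to derive a nested path from a numeric file id. The sketch below assumes each file has a sequential id that fits in 32 bits; the `ShardedPaths` class, the `.img` extension, and the `images` root are made up for illustration. With two hex digits per level, no directory ends up with more than 256 entries.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: spreads files over a three-level directory tree.
final class ShardedPaths {
    private ShardedPaths() {
    }

    /**
     * Maps an id to root/aa/bb/cc/aabbccdd.img, where aabbccdd is the id as
     * eight hex digits. Only the last two digits vary inside a leaf directory,
     * so each leaf holds at most 256 files.
     */
    static Path shardedPath(Path root, long id) {
        if (id < 0 || id > 0xFFFF_FFFFL) {
            throw new IllegalArgumentException("id must fit in 32 bits: " + id);
        }
        String hex = String.format("%08x", id);
        return root.resolve(hex.substring(0, 2))
                .resolve(hex.substring(2, 4))
                .resolve(hex.substring(4, 6))
                .resolve(hex + ".img");
    }

    public static void main(String[] args) {
        Path root = Paths.get("images");
        System.out.println(shardedPath(root, 0L));
        System.out.println(shardedPath(root, 299_999_999L));
    }
}
```

The caller would still need to create the parent directories (for example with `Files.createDirectories(path.getParent())`) before writing each file.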

And as another alternative, have you considered S3 for storage? I'm guessing that its bucket keys are far more efficient than actual directories, and there is a filesystem that lets you access buckets as if they were files (haven't tried it myself).

Harrod answered 16/3, 2013 at 18:54 Comment(3)

I did run 2 processes doing this. Per-process disk speed dropped dramatically, but the aggregate across both was 2MB/sec: a bit better, though it didn't look like more async processes would help the situation. As for S3, that was my first thought and it failed with a huge explosion. Two weeks online with their techs trying to get 300M files uploaded failed and cost me 10k in usage charges. Even if it worked the first time (which it certainly would not), you're talking 3k just to upload the files. Watch that little $0.10 / 100 puts charge; it creeps up on you real quick!! – Beldam 17/3, 2013 at 3:13

I'm now trying large files (which I can create magnificently fast), and storing a pointer to the bytes in the large file. This is all going much more smoothly so far, and from what I've read it's the approach Facebook uses. I'll post on its success when I'm finished. – Beldam 17/3, 2013 at 3:15

Final result: Don't do 300M small files. We're moving to a more complex system in which we load the binary data into large files and keep an index of offsets into the binary data. We're also experimenting with large MySQL/MyISAM tables as a good option. – Beldam 17/3, 2013 at 13:17
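
The large-file-plus-index idea from the comment above could be sketched along these lines. This is not the poster's actual system: the `BlobPack` class, its string keys, and the in-memory `HashMap` index are assumptions made to keep the example short. An index for 300 million entries would need to live somewhere persistent and more compact.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: append blobs to one big file and remember (offset, length) per key.
final class BlobPack implements AutoCloseable {
    private static final class Entry {
        final long offset;
        final int length;

        Entry(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    private final FileChannel channel;
    private final Map<String, Entry> index = new HashMap<String, Entry>();
    private long end; // position where the next blob will be appended

    BlobPack(Path packFile) throws IOException {
        channel = FileChannel.open(packFile,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
        end = channel.size();
    }

    /** Appends the blob to the pack file and records where it landed. */
    void put(String key, byte[] data) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(data);
        long pos = end;
        while (buf.hasRemaining()) {
            pos += channel.write(buf, pos); // positional write, may be partial
        }
        index.put(key, new Entry(end, data.length));
        end = pos;
    }

    /** Returns the blob stored under key, or null if the key is unknown. */
    byte[] get(String key) throws IOException {
        Entry e = index.get(key);
        if (e == null) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.allocate(e.length);
        long pos = e.offset;
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos);
            if (n < 0) {
                throw new EOFException("pack file shorter than index entry for " + key);
            }
            pos += n;
        }
        return buf.array();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("blobs", ".pack"); // demo location, an assumption
        try (BlobPack pack = new BlobPack(file)) {
            pack.put("img-1", new byte[] {1, 2, 3});
            pack.put("img-2", new byte[] {4, 5});
            System.out.println("img-2 length: " + pack.get("img-2").length);
        }
    }
}
```

Appending to a single open file avoids one file-creation call per image, which is the cost the answer above suspects is dominating.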

If I understood your code correctly, you're splitting/writing the 300M files in small chunks ("fiveToTenKBytes").

Consider using a stream approach.

If you're writing to a disk, consider wrapping the OutputStream in a BufferedOutputStream.

E.g. something like:

try (BufferedOutputStream bos = new BufferedOutputStream(Files.newOutputStream(Paths.get(filePathString), StandardOpenOption.CREATE))) {

 ...

}

Overhear answered 15/3, 2013 at 14:0 Comment(3)

@JoachimSauer Thanks for editing, but StackOverflow has issues with method links... – Overhear 15/3, 2013 at 14:13

I know, but the link I added worked fine (at least for me). And the one that stands now only brings you to the Files documentation, because of the space in it. – Syntax 15/3, 2013 at 14:14

See the update in the question for answers, I believe I am using a buffered approach. – Beldam 16/3, 2013 at 2:14
