most efficient way to write data into a file
I want to write 2TB data into one file, in the future it might be a petabyte.

The data is composed entirely of '1' characters. For example, 2TB of data looks like "1111111111111......11111" (each byte is the character '1').

Following is my way:

File.open("data",File::RDWR||File::CREAT) do |file|
  2*1024*1024*1024*1024.times do
  file.write('1')
  end
end

That means file.write is called 2 TB times, once per byte. Is there a better way to implement this in Ruby?

Godly answered 8/8, 2012 at 20:46 Comment(2)
I guess it would be faster to build the string before writing it instead of calling write each time. Strings in Ruby are mutable, so you don't need to create a new string on each mutation.Oxalis
Do you want binary 1 bits (0b1111) or ASCII "1" (0x31) on the disk?Nominalism

You have a few problems:

  1. File::RDWR||File::CREAT always evaluates to File::RDWR. You mean File::RDWR|File::CREAT (| rather than ||).

  2. 2*1024*1024*1024*1024.times do runs the loop 1024 times then multiplies the result of the loop by the stuff on the left. You mean (2*1024*1024*1024*1024).times do.
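
Both pitfalls are easy to check in irb:

```ruby
# 1. || short-circuits on the first truthy value, so the CREAT flag is lost:
File::RDWR || File::CREAT   # => File::RDWR
File::RDWR | File::CREAT    # bitwise OR of both flags, which is what open() expects

# 2. `.times` binds tighter than `*`: the block runs 3 times, `.times` returns
#    the receiver (3), and only then is the result multiplied by 2:
2 * 3.times { }             # => 6, but the block ran only 3 times
(2 * 3).times { }           # runs 6 iterations
```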

Regarding your question, I get significant speedup by writing 1024 bytes at a time:

File.open("data",File::RDWR|File::CREAT) do |file|
  buf = "1" * 1024
  (2*1024*1024*1024).times do
    file.write(buf)
  end
end

You might experiment and find a better buffer size than 1024.
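
One quick way to experiment is with Ruby's Benchmark module. A rough sketch (writing a much smaller 64 MB total so it finishes quickly; the file name is arbitrary):

```ruby
require "benchmark"

TOTAL = 64 * 1024 * 1024  # 64 MB test run, not the full 2 TB

[1024, 16 * 1024, 256 * 1024, 1024 * 1024].each do |bufsize|
  buf = "1" * bufsize
  time = Benchmark.realtime do
    File.open("data.tmp", "wb") do |file|
      (TOTAL / bufsize).times { file.write(buf) }
    end
  end
  puts "%8d-byte buffer: %.3fs" % [bufsize, time]
end
File.delete("data.tmp")
```

The sweet spot depends on your OS and filesystem; buffers around the filesystem block size or a small multiple of it usually do well.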

Terresaterrestrial answered 8/8, 2012 at 21:34 Comment(2)
You may also shift the bits instead of multiplying, which would be a little faster: (2<<10<<10<<10) == (2<<30) for (2*1024*1024*1024)Ucayali
@Ucayali Sure, but it's a calculation that only happens once, so that wouldn't make a measurable difference, just a theoretical one measured in nanoseconds. We could really save time by just writing 2147483648 directly ;)Terresaterrestrial

Don't know which OS you are using, but the fastest approach would be to use a system copy to concatenate files into one big file; you can script that. An example: if you start with a string like "1" and echo it to a file

echo "1" > file1

you can concatenate this file with itself a number of times into a new file. On Windows you have to use the /b parameter for a binary copy to do that.

copy /b file1+file1 file2

gives you a file2 of 12 bytes (including the CR)

copy file2+file2 file1

gives you 24 bytes etc

I will leave the math (and the fun of doing this in Ruby) to you, but you will reach your target size quickly enough, and probably faster than the accepted answer.
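
The same doubling trick can be scripted in Ruby itself. A sketch (seed size and iteration count are arbitrary; 1 KiB doubled 11 times gives 2 MiB):

```ruby
path = "data"
File.binwrite(path, "1" * 1024)                  # 1 KiB seed file
11.times do
  chunk = File.binread(path)                     # read the whole current file...
  File.open(path, "ab") { |f| f.write(chunk) }   # ...and append it to itself
end
# each pass doubles the size: 1024 * 2**11 bytes = 2 MiB
```

Each doubling is one large sequential read plus one large sequential write, which is close to the best case for disk throughput.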

Submariner answered 9/8, 2012 at 0:8 Comment(0)

A related answer, if you want to write binary zeros with any size, just do this using the dd command (Linux/Mac):

dd if=/dev/zero of=output_file bs=128K count=8000

bs is the block size (the number of bytes to read/write at a time); count is the number of blocks. The above line writes 1 gigabyte of zeros to output_file in just 10 seconds on my machine:

1048576000 bytes (1.0 GB) copied, 10.275 s, 102 MB/s

Could be inspiring to someone!

Drandell answered 10/9, 2013 at 13:33 Comment(0)

You can set file.sync to false. Then Ruby buffers the writes and flushes them to disk in batches instead of one at a time.

File.open("data", File::RDWR | File::CREAT) do |file|
  file.sync = false
  (2*1024*1024*1024*1024).times do
    file.write('1')
  end
end
Govea answered 23/5 at 1:29 Comment(0)

The data is all ones? Then there is no need to write the ones; just write the number of ones.

file.write( 2*1024*1024*1024*1024 )

Simple, yes?
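
A toy sketch of the idea, which is essentially run-length encoding (the file name and format here are made up for illustration):

```ruby
# Store the run length instead of the run itself (toy run-length encoding).
COUNT = 2 * 1024**4                  # 2 TB worth of '1' bytes
File.write("data.rle", COUNT.to_s)   # a few bytes on disk instead of 2 TB

# Reconstruct on demand, streaming rather than materializing:
count = File.read("data.rle").to_i
# count.times { |i| consumer << '1' }  # expand only when something actually needs the bytes
```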

Caravan answered 8/8, 2012 at 21:32 Comment(2)
I'm giving the questioner the benefit of the doubt and assuming there's actually a reason they want to do what they're asking.Terresaterrestrial
Well, questioner should have said 'random ones and zeros'... But, in nearly all cases runtime encoding will be a better solution anyway.Caravan