Quickly create a large file on a Linux system

How can I quickly create a large file on a Linux (Red Hat Linux) system?

dd will do the job, but reading from /dev/zero and writing to the drive can take a long time when you need a file several hundred GB in size for testing... If you need to do that repeatedly, the time really adds up.

I don't care about the contents of the file, I just want it to be created quickly. How can this be done?

Using a sparse file won't work for this. I need the file to be allocated disk space.

Interference answered 3/11, 2008 at 3:8 Comment(7)
Ext4 has much better file allocation performance, since whole blocks of up to 100MB can be allocated at once.Galvanism
The 'truncate' command creates a sparse file, by the way. E.g. see en.wikipedia.org/wiki/Sparse_fileLubberly
People seem to be grossly ignoring the "sparse file won't work with this", with their truncate and dd seeks below.Synonymy
You should have defined what you meant by "for testing". Testing the writing speed of your hard disk? Testing what df will report? Testing an app that does something particular. The answer depends on what you want to test. Anyway I'm a bit late -- I see now that it's been years since your question :-)Typeface
random: superuser.com/questions/470949/… |Xanthine
Just in case you are looking for a way to simulate a full partition, like I was, look no further than /dev/fullGermanic
What's the fastest way to generate a 1 GB text file containing random digits?Geanticline

dd from the other answers is a good solution, but it is slow for this purpose. On Linux (and other POSIX systems) we have fallocate, which reserves the desired space without having to actually write anything to it. It works with most modern disk-based file systems and is very fast:

For example:

fallocate -l 10G gentoo_root.img
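
If you need the same preallocation from inside a program rather than from the shell, posix_fallocate(3) is the portable call for it. A minimal sketch, assuming an illustrative file name and size:

#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    const char *path = "gentoo_root.img";   /* illustrative name */
    off_t size = 10LL * 1024 * 1024 * 1024; /* 10 GiB */

    int fd = open(path, O_CREAT | O_WRONLY, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Reserves the blocks without writing them; returns an errno value, not -1. */
    int err = posix_fallocate(fd, 0, size);
    if (err != 0) {
        fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
        close(fd);
        return 1;
    }
    close(fd);
    return 0;
}

One caveat: on filesystems without real preallocation support, glibc emulates posix_fallocate by writing to every block, which is as slow as dd.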
Schaumberger answered 16/4, 2011 at 18:28 Comment(17)
Is it possible that dd is internally using that already? If I do 'dd if=/dev/zero of=zerofile bs=1G count=1' on a 3.0.0 kernel, the write finishes in 2 seconds, with a write data rate of over 500 megabytes per second. That's clearly impossible on a 2.5" laptop harddrive.Scorn
I have just tried on 3.0.0-14 kernel dd if=/dev/zero of=zerofile bs=1G count=1 and it was very slow: 37,3 MB/s. Maybe it depends on filesystem…Schaumberger
fallocate is exactly what I was looking for.Dunno
This (fallocate) will also not work on a Linux ZFS filesystem - github.com/zfsonlinux/zfs/issues/326Johm
very fast. recommended.Kennithkennon
For those having a problem creating a large file on your filesystem of choice (because fallocate is not allowing it for some reason), you can create a relatively large file (say 2GB) with fallocate, use mv to move the file over to your other filesystem. Then you can use cat to put together the size of file you need. For example, if you have a 2GB file named "a" and you wanted to create a 10 GB file called "b", you can execute "cat a a a a a > b". That will take a bit, but you will have a large file where you want it when fallocate won't create it directly at the location for you.Millpond
^^ Note: The above, of course, will not work if your file system is a 32-bit file system or has file limitations smaller than the file you're trying to create on it.Millpond
fallocate is not supported by ext3 either. bugzilla.redhat.com/show_bug.cgi?id=563492Ru
May I get more information on fallocate? I'm unable to call it in adb shell on Android. How do I check support for commands like fallocate/truncate/mkfile? Does it require another shell like bash?Exit
In Debian GNU/Linux fallocate is part of the util-linux package. This tool was written by Karel Zak from RedHat and source code can be found here: kernel.org/pub/linux/utils/util-linuxSchaumberger
It happens fast on filesystems that support sparse-files, since you're reading from /dev/zero and writing to a filesystem that supports sparse files.Korry
Doesn't work on FAT32 either, but dd does using /dev/zero.Elsie
Doesn't work on memfs either. # fallocate -l 1G testfile.dat fallocate: testfile.dat: fallocate failed: Operation not supported However, dd worked like a champ (and very fast): # time dd if=/dev/zero of=filename.dat bs=1G count=1 1+0 records in 1+0 records out 1073741824 bytes (1.1 GB) copied, 0.701103 s, 1.5 GB/s real 0m0.704s user 0m0.000s sys 0m0.701sPortsmouth
looks like support for fallocate was added to tmpfs in Linux kernel 3.5: kernelnewbies.org/Linux_3.5Kalgoorlie
noticed that fallocate doesn't work with exFAT as well.Bimetallic
@Scorn This happens because you're not writing directly to the laptop's hard drive, you're filling up your operating system's cache before hitting the disk. Try writing more data than you have RAM, also, try using status=progress to check how your speed varies as you run out of cache and oflag=direct or oflag=nocacheMerchandising
zfs and fallocate do cope with each other. other limitations mentioned by others should have been resolved as well. rationale: a state of 2013 is no longer valid in 2022. (just in case, please re-evaluate for your specific environment.) github.com/openzfs/zfs/pull/10408Joon

This is a common question -- especially in today's world of virtual machines. Unfortunately, the answer is not as straightforward as one might assume.

dd is the obvious first choice, but dd is essentially a copy and that forces you to write every block of data (thus, initializing the file contents)... And that initialization is what takes up so much I/O time. (Want to make it take even longer? Use /dev/random instead of /dev/zero! Then you'll use CPU as well as I/O time!) In the end though, dd is a poor choice (though essentially the default used by the VM "create" GUIs). E.g.:

dd if=/dev/zero of=./gentoo_root.img bs=4k iflag=fullblock,count_bytes count=10G

truncate is another choice -- and is likely the fastest... But that is because it creates a "sparse file". Essentially, a sparse file is a file whose unwritten regions are all zeros, and the underlying filesystem "cheats" by not really storing those regions, just "pretending" that they're there. Thus, when you use truncate to create a 20 GB drive for your VM, the filesystem doesn't actually allocate 20 GB, but it cheats and says that there are 20 GB of zeros there, even though as little as one track on the disk may actually (really) be in use. E.g.:

 truncate -s 10G gentoo_root.img

fallocate is the final -- and best -- choice for use with VM disk allocation, because it essentially "reserves" (or "allocates") all of the space you're seeking, but it doesn't bother to write anything. So, when you use fallocate to create a 20 GB virtual drive space, you really do get a 20 GB file (not a "sparse file"), and you won't have written anything to it -- the blocks are simply marked as allocated-but-unwritten, so reads come back as zeros, much like a brand new disk. E.g.:

fallocate -l 10G gentoo_root.img
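
If you want to verify which of the three you actually ended up with, compare the file's logical size with the space allocated to it; a sparse file reports far fewer blocks than its size would suggest. A small checking aid sketched with stat(2) (nothing here is specific to the commands above):

#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    struct stat st;
    if (stat(argv[1], &st) != 0) {
        perror("stat");
        return 1;
    }
    /* On Linux, st_blocks is counted in 512-byte units, independent of the fs block size. */
    long long logical = (long long)st.st_size;
    long long on_disk = (long long)st.st_blocks * 512LL;
    printf("logical size: %lld bytes, allocated on disk: %lld bytes\n", logical, on_disk);
    puts(on_disk < logical ? "looks sparse" : "fully allocated");
    return 0;
}

From the shell, comparing ls -lh (logical size) with du -h (allocated size) tells you the same thing.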
Incommodity answered 2/8, 2012 at 14:23 Comment(6)
+1 truncate is functional on JFS; fallocate, not so much. One point: you can't include a decimal in the number, I needed to specify 1536G, not 1.5T.Enrollment
According to my fallocate man page, this is only supported on btrfs, ext4, ocfs2, and xfs filesystemsPamper
Note swapon unfortunately doesn't work on pre-allocated extents, last I checked. There was some discussion on the XFS mailing list about having an fallocate option to expose the old freespace data instead and not have the extent marked as preallocated, so swapon would work. But I don't think anything was ever done.Miquelon
FYI, trying to read too much data from /dev/random can result in running out of random data, and "When the entropy pool is empty, reads from /dev/random will block until additional environmental noise is gathered" so it could take a very very very long timeItinerary
Thanks, i was reading this brought me here: brianschrader.com/archive/…Krusche
Oddly with WSL I couldn't persuade it to "actually use" space (like allocated it) in a self-expanding vhdx file, unless I used /dev/urandom. Weird.Fleshpots

Linux & all filesystems

xfs_mkfile 10240m 10Gigfile

Linux & some filesystems (ext4, xfs, btrfs and ocfs2)

fallocate -l 10G 10Gigfile

OS X, Solaris, SunOS and probably other UNIXes

mkfile 10240m 10Gigfile

HP-UX

prealloc 10Gigfile 10737418240

Explanation

Try mkfile <size> myfile as an alternative to dd. With the -n option the size is noted, but disk blocks aren't allocated until data is written to them. Without the -n option, the space is zero-filled, which means writing to the disk, which means taking time.

mkfile is derived from SunOS and is not available everywhere. Most Linux systems have xfs_mkfile, which works exactly the same way, and not just on XFS file systems despite the name. It's included in xfsprogs (for Debian/Ubuntu) or similarly named packages.

Most Linux systems also have fallocate, which only works on certain file systems (such as btrfs, ext4, ocfs2, and xfs), but is the fastest, as it allocates all the file space (creates non-holey files) but does not initialize any of it.
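
On Linux there is also the native fallocate(2) system call underneath the fallocate utility. Unlike glibc's posix_fallocate(3), which quietly falls back to writing every block on filesystems without preallocation support, fallocate(2) simply fails with EOPNOTSUPP there, so you notice. A hedged sketch, reusing the example name and size from above:

#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    const char *path = "10Gigfile";          /* borrowed from the examples above */
    off_t size = 10LL * 1024 * 1024 * 1024;  /* 10 GiB */

    int fd = open(path, O_CREAT | O_WRONLY, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* mode 0 means plain preallocation; unsupported filesystems return EOPNOTSUPP. */
    if (fallocate(fd, 0, 0, size) != 0) {
        perror("fallocate");
        close(fd);
        return 1;
    }
    close(fd);
    return 0;
}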

Vincennes answered 3/11, 2008 at 3:14 Comment(4)
Where is this mkfile of which you speak, stranger? It's not in the default RHEL install.Supervisory
It's a solaris utility. if you search for gpl mkfile you will find some source code examples.Bandwagon
Works as a charme on OS X: mkfile 1g DELETE_IF_LOW_ON_SSD_SPACE.imgAwfully
xfs_mkfile is included in xfsprogs on Ubuntu and works like a charm on my ext3 fs. :)Migration
truncate -s 10M output.file

will create a 10 M file instantaneously (M stands for 1024*1024 bytes, MB stands for 1000*1000 - same with K, KB, G, GB...)

EDIT: as many have pointed out, this will not physically allocate the file on your device. With this you could actually create an arbitrarily large file, regardless of the available space on the device, as it creates a "sparse" file.
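
For completeness, truncate(1) is essentially a thin wrapper around the truncate(2)/ftruncate(2) system calls, so you can get the same instant sparse file from C. A rough sketch, with the name and size mirroring the example above:

#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    const char *path = "output.file";   /* mirrors the example above */
    off_t size = 10LL * 1024 * 1024;    /* 10 MiB */

    int fd = open(path, O_CREAT | O_WRONLY, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Moves the end-of-file marker; the new range is a hole, so no blocks are allocated. */
    if (ftruncate(fd, size) != 0) { perror("ftruncate"); close(fd); return 1; }

    close(fd);
    return 0;
}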

For example, notice that no HDD space is consumed by this command:

### BEFORE
$ df -h | grep lvm
/dev/mapper/lvm--raid0-lvm0
                      7.2T  6.6T  232G  97% /export/lvm-raid0

$ truncate -s 500M 500MB.file

### AFTER
$ df -h | grep lvm
/dev/mapper/lvm--raid0-lvm0
                      7.2T  6.6T  232G  97% /export/lvm-raid0

So, when doing this, you will be deferring physical allocation until the file is accessed. If you're mapping this file to memory, you may not have the expected performance.

But this is still a useful command to know. For example, when benchmarking transfers using files, the specified size of the file will still get moved.

$ rsync -aHAxvP --numeric-ids --delete --info=progress2 \
       [email protected]:/export/lvm-raid0/500MB.file \
       /export/raid1/
receiving incremental file list
500MB.file
    524,288,000 100%   41.40MB/s    0:00:12 (xfr#1, to-chk=0/1)

sent 30 bytes  received 524,352,082 bytes  38,840,897.19 bytes/sec
total size is 524,288,000  speedup is 1.00
Sweatband answered 20/8, 2010 at 12:4 Comment(7)
Tried this, but it doesn't affect available disk space. Must because it is a sparse file as described previously.Uprear
This shouldn't be the top answer as it doesn't solve the problem, the fallocate answer below does.Uprear
@GringoSuave but this is still useful for some people that may have a similar-but-slightly-different problem.Narvik
@GringoSuave: It seems to create a large file as requested, why does it not solve the problem? Also there are notes under the fallocate answer that it doesn't even work in most cases.Emaemaciate
The file is not actually allocated, it is "sparse."Uprear
Why suggest making sparse files when he said that will not work?Synonymy
This works on an NTFS volume where fallocate doesn't.Tidings

Where seek is the size of the file you want in bytes - 1.

dd if=/dev/zero of=filename bs=1 count=1 seek=1048575
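
Under the hood this is just a seek past the current end of file followed by a single one-byte write; everything skipped becomes a hole, so (as the comments note) the result is still a sparse file. A C equivalent, sketched with an illustrative name and a 1 MiB size to match the command:

#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    const char *path = "filename";   /* illustrative */
    off_t size = 1024 * 1024;        /* 1 MiB: seek=1048575 plus the one written byte */

    int fd = open(path, O_CREAT | O_WRONLY, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Seek to size - 1 and write a single byte; the skipped range is a hole. */
    if (lseek(fd, size - 1, SEEK_SET) == (off_t)-1) { perror("lseek"); close(fd); return 1; }
    if (write(fd, "", 1) != 1) { perror("write"); close(fd); return 1; }

    close(fd);
    return 0;
}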
Forwhy answered 3/11, 2008 at 5:14 Comment(4)
I like this approach, but the commenter doesn't want a sparse file for some reason. :(Valentinevalentino
dd if=/dev/zero of=1GBfile bs=1000 count=1000000Ossicle
dd if=/dev/zero of=01GBfile bs=1024 count=$((1024 * 1024))Offutt
For sparse files, truncate seems to be much better.Emaemaciate

Examples where seek is the size of the file you want in bytes

#kilobytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200K

#megabytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200M

#gigabytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200G

#terabytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200T


From the dd manpage:

BLOCKS and BYTES may be followed by the following multiplicative suffixes: c=1, w=2, b=512, kB=1000, K=1024, MB=1000*1000, M=1024*1024, GB=1000*1000*1000, G=1024*1024*1024, and so on for T, P, E, Z, Y.

Gerstein answered 22/2, 2012 at 10:57 Comment(1)
This looks much better than the n-1 way, so it's basically equivalent to truncate.Emaemaciate

To make a 1 GB file:

dd if=/dev/zero of=filename bs=1G count=1
Rhinoplasty answered 27/9, 2015 at 1:12 Comment(3)
I believe count must be 1. (tested on centos)Upcoming
dd if=/dev/zero of=filename bs=20G count=1 will only create 2GB file! not 20GB.Scrape
@MaulikGangani What FS was that on? Looks like you're hitting file size limit on an old FS. Also, avoid using such large block sizes with dd, I believe it might try to allocate all that memory at once. Was this on a thumb drive? Consider formatting it with UDF if you need to store big files in it.Merchandising

I don't know a whole lot about Linux, but here's the C code I wrote to fake huge files on DC Share many years ago.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int i;
    FILE *fp;

    fp = fopen("bigfakefile.txt", "w");
    if (fp == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }

    /* Seek forward 1 MiB at a time and write a single byte there. */
    for (i = 0; i < (1024 * 1024); i++) {
        fseek(fp, (1024 * 1024), SEEK_CUR);
        fprintf(fp, "C");
    }

    fclose(fp);
    return 0;
}
Cowpea answered 25/4, 2012 at 13:54 Comment(1)
there must be better approaches in C. You also need to close the file. Iterating to a million writing 1 char at a time...Southeastwards

You can use "yes" command also. The syntax is fairly simple:

#yes >> myfile

Press "Ctrl + C" to stop this, else it will eat up all your space available.

To clean this file run:

#>myfile

will clean this file.

Heredes answered 12/12, 2013 at 10:32 Comment(0)

I don't think you're going to get much faster than dd. The bottleneck is the disk; writing hundreds of GB of data to it is going to take a long time no matter how you do it.

But here's a possibility that might work for your application. If you don't care about the contents of the file, how about creating a "virtual" file whose contents are the dynamic output of a program? Instead of open()ing the file, use popen() to open a pipe to an external program. The external program generates data whenever it's needed. Once the pipe is open, the program that opened it can read from it much as it would read from a regular file, although a pipe is not seekable, so fseek() and rewind() won't work on it. You'll need to use pclose() instead of fclose() when you're done with the pipe.

If your application needs the file to be a certain size, it will be up to the external program to keep track of where in the "file" it is and send an eof when the "end" has been reached.
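
A minimal sketch of the pipe idea; the generator command here (dd reading /dev/zero) is only an example, and because a pipe is not seekable the consumer has to read it sequentially:

#include <stdio.h>

int main(void) {
    /* Any program that writes the desired data to stdout can serve as the generator. */
    FILE *p = popen("dd if=/dev/zero bs=1M count=16 2>/dev/null", "r");
    if (p == NULL) { perror("popen"); return 1; }

    char buf[4096];
    size_t n, total = 0;
    while ((n = fread(buf, 1, sizeof buf, p)) > 0)
        total += n;                  /* consume the generated data sequentially */

    printf("read %zu bytes from the pipe\n", total);
    return pclose(p) == -1 ? 1 : 0;
}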

Fortuity answered 3/11, 2008 at 4:18 Comment(0)

The GPL mkfile is just a (ba)sh script wrapper around dd; BSD's mkfile just memsets a buffer with non-zero and writes it repeatedly. I would not expect the former to out-perform dd. The latter might edge out dd if=/dev/zero slightly since it omits the reads, but anything that does significantly better is probably just creating a sparse file.

Absent a system call that actually allocates space for a file without writing data (and Linux and BSD lack this, probably Solaris as well), you might get a small improvement in performance by using ftruncate(2)/truncate(1) to extend the file to the desired size, mmap the file into memory, then write non-zero data to the first byte of every disk block (use stat(2) to find the disk block size).
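
A sketch of that approach under stated assumptions (the path, the size, and the use of st_blksize as the block size are all illustrative): extend the file, map it, and dirty one byte per block so every block actually gets allocated.

#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    const char *path = "bigfile";            /* illustrative path */
    off_t size = 256LL * 1024 * 1024;        /* 256 MiB for the example */

    int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (ftruncate(fd, size) != 0) { perror("ftruncate"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }
    long blksize = st.st_blksize;            /* preferred I/O block size for this file */

    char *map = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    for (off_t off = 0; off < size; off += blksize)
        map[off] = 1;                        /* touch one byte per block to force allocation */

    msync(map, (size_t)size, MS_SYNC);
    munmap(map, (size_t)size);
    close(fd);
    return 0;
}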

Slobber answered 27/7, 2011 at 3:22 Comment(1)
BSD and Linux have fallocate actually (edit: it's now POSIX and widely available).Alika

One approach: if you can guarantee unrelated applications won't use the files in a conflicting manner, just create a pool of files of varying sizes in a specific directory, then create links to them when needed.

For example, have a pool of files called:

  • /home/bigfiles/512M-A
  • /home/bigfiles/512M-B
  • /home/bigfiles/1024M-A
  • /home/bigfiles/1024M-B

Then, if you have an application that needs a 1G file called /home/oracle/logfile, execute a "ln /home/bigfiles/1024M-A /home/oracle/logfile".

If it's on a separate filesystem, you will have to use a symbolic link.

The A/B/etc files can be used to ensure there's no conflicting use between unrelated applications.

The link operation is about as fast as you can get.
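
If you want to script that, a small sketch in C using link(2) with a symlink(2) fallback; the paths are just the illustrative ones from above:

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Paths are the illustrative ones from the answer. */
    const char *pool = "/home/bigfiles/1024M-A";
    const char *target = "/home/oracle/logfile";

    if (link(pool, target) == 0)
        return 0;                        /* hard link worked: same filesystem */

    if (errno == EXDEV && symlink(pool, target) == 0)
        return 0;                        /* different filesystem: fall back to a symlink */

    perror("link/symlink");
    return 1;
}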

Supervisory answered 3/11, 2008 at 3:27 Comment(1)
You can have a small pool or a large pool, it's your choice. You were going to need at least one file anyway, since that's what the questioner asked for. If your pool consists of one file, you lose nothing. If you have bucketloads of disk (and you should, given its low price), there's no issue.Supervisory

This is the fastest I could do (which is not fast) with the following constraints:

  • The goal of the large file is to fill a disk, so it can't be compressible.
  • Using an ext3 filesystem (fallocate is not available).

This is the gist of it...

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void) {
    int32_t buf[256];                /* one 1 KiB block */
    for (int i = 0; i < 256; ++i) {
        buf[i] = rand();             /* random data (intended to be non-compressible) */
    }

    FILE* file = fopen("/file/on/your/system", "wb");
    if (file == NULL) {
        perror("fopen");
        return 1;
    }
    int blocksToWrite = 1024 * 1024; /* 1M blocks of 1 KiB = 1 GiB */
    for (int i = 0; i < blocksToWrite; ++i) {
        fwrite(buf, sizeof(int32_t), 256, file);
    }

    fclose(file);
    return 0;
}

In our case this is for an embedded Linux system and it works well enough, but we would prefer something faster.

FYI: the command dd if=/dev/urandom of=outputfile bs=1024 count=XX was so slow as to be unusable.

Declination answered 31/12, 2014 at 0:10 Comment(3)
It's perfectly compressible right down to 1028 bytes since you're just writing the same block over and over.Slugabed
Why is dd unusable? On my server with an Intel 6246 processor, using a bs of 4096 (not 1024), I was able to create a completely random 1G file in 5 seconds. In this example, on my system (sizeof int32_t is 4 bytes) we're creating a 1k block of random data, repeated 1M times. You're comparing apples to oranges; the random file won't compress; yours will because it's repeated. A dd copy of a file created by this C code takes under a second. The fault is not dd's, it's in how you use it and what you want it to do. A small input file could be dumped repeatedly to an output file in similar time.Arabian
time { dd if=/dev/urandom of=tmprandfile bs=1024 count=1; yes < tmprandfile | head -c 1073741824 > tmpfile; } takes 1.5s on my server. The C code takes 0.7s.Arabian

Shameless plug: OTFFS provides a file system that serves arbitrarily large (well, almost; exabytes is the current limit) files of generated content. It is Linux-only, plain C, and in early alpha.

See https://github.com/s5k6/otffs.

Grimaud answered 30/1, 2018 at 17:7 Comment(0)

So I wanted to create a large file with repeated ascii strings. "Why?" you may ask. Because I need to use it for some NFS troubleshooting I'm doing. I need the file to be compressible because I'm sharing a tcpdump of a file copy with the vendor of our NAS. I had originally created a 1g file filled with random data from /dev/urandom, but of course since it's random, it means it won't compress at all and I need to send the full 1g of data to the vendor, which is difficult.

So I created a file with all the printable ascii characters, repeated over and over, to a limit of 1g in size. I was worried it would take a long time. It actually went amazingly quickly, IMHO:

cd /dev/shm
date
time yes $(for ((i=32;i<127;i++)) do printf "\\$(printf %03o "$i")"; done) | head -c 1073741824 > ascii1g_file.txt
date

Wed Apr 20 12:30:13 CDT 2022

real    0m0.773s
user    0m0.060s
sys     0m1.195s
Wed Apr 20 12:30:14 CDT 2022

Copying it from an nfs partition to /dev/shm took just as long as with the random file (which one would expect, I know, but I wanted to be sure):

cp ascii1gfile.txt /home/greygnome/
uptime; free -m; sync; echo 1 > /proc/sys/vm/drop_caches; free -m; date; dd if=/home/greygnome/ascii1gfile.txt of=/dev/shm/outfile bs=16384 2>&1; date; rm -f /dev/shm/outfile 

But while doing that I ran a simultaneous tcpdump:

tcpdump -i em1 -w /dev/shm/dump.pcap

I was able to compress the pcap file down to 12M in size! Awesomesauce!

Edit: Before you ding me because the OP said, "I don't care about the contents," know that I posted this answer because it's one of the first replies to "how to create a large file linux" in a Google search. And sometimes, disregarding the contents of a file can have unforeseen side effects. Edit 2: And fallocate seems to be unavailable on a number of filesystems, and creating a 1GB compressible file in 1.2s seems pretty decent to me (aka, "quickly").

Arabian answered 20/4, 2022 at 18:22 Comment(0)

You could use https://github.com/flew-software/trash-dump to create a file of any size filled with random data.

Here's a command you can run after installing trash-dump (it creates a 1 GB file):

$ trash-dump --filename="huge" --seed=1232 --noBytes=1000000000

BTW, I created it.

Montevideo answered 15/1, 2021 at 9:30 Comment(1)
The question is about creating the file "quickly". Creating a file with generated contents is unlikely to be quick.Laboratory
