Is it possible to prepend data to a file without rewriting it?
V

5

11

I deal with very large binary files (several GB to multiple TB per file). These files exist in a legacy format, and upgrading them requires writing a header to the FRONT of the file. I can create a new file and rewrite the data, but this can take a long time. I'm wondering if there is a faster way to accomplish this upgrade. The platform is limited to Linux, and I'm willing to use low-level functions (ASM, C, C++) or file system tricks to make this happen. The primary library is Java, and JNI is completely acceptable.

Veridical answered 30/1, 2011 at 17:18 Comment(1)
All of the answers confirmed what I already knew. I was just hoping there was some magic I wasn't aware of. Thanks for the extra brain power. - Veridical
P
9

There's no general way to do this natively.

Some file systems may provide functions to do this (I can't point to a specific one), but your code would then be file-system dependent.


One solution is to simulate a file system: store your data across a set of several files, and provide functions to open, read, and write the data as if it were a single file.
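A minimal sketch of such an abstraction, assuming read-only access is enough and the header is small enough to hold in memory. The names `PrependedFile` and `read_at` are hypothetical, chosen only for illustration; the legacy file itself is never modified.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: presents an in-memory header followed by the bytes of
// an unmodified legacy file as one logical, read-only file.
class PrependedFile {
public:
    PrependedFile(std::string header, const std::string& legacy_path)
        : header_(std::move(header)),
          fd_(::open(legacy_path.c_str(), O_RDONLY)) {
        if (fd_ < 0) throw std::runtime_error("cannot open " + legacy_path);
    }
    ~PrependedFile() { ::close(fd_); }
    PrependedFile(const PrependedFile&) = delete;
    PrependedFile& operator=(const PrependedFile&) = delete;

    // Reads up to len bytes starting at logical offset off.
    // Returns the number of bytes read (0 at end of file), or -1 on error.
    ssize_t read_at(void* buf, std::size_t len, std::uint64_t off) const {
        assert(buf != nullptr || len == 0);
        char* out = static_cast<char*>(buf);
        std::size_t done = 0;
        if (off < header_.size()) {
            // Part (or all) of the request falls inside the header.
            done = std::min<std::size_t>(len, header_.size() - off);
            std::memcpy(out, header_.data() + off, done);
            off += done;
        }
        while (done < len) {
            // Translate the logical offset into an offset in the legacy file.
            ssize_t n = ::pread(fd_, out + done, len - done,
                                static_cast<off_t>(off - header_.size()));
            if (n < 0) return -1;
            if (n == 0) break;  // end of the legacy file
            done += static_cast<std::size_t>(n);
            off += static_cast<std::uint64_t>(n);
        }
        return static_cast<ssize_t>(done);
    }

private:
    std::string header_;  // bytes that logically precede the legacy data
    int fd_;              // read-only descriptor for the legacy file
};
```

Readers that go through `read_at` see the header at logical offset 0 and the legacy data right after it, so the "upgrade" costs nothing per file. The trade-off is that every consumer of the file has to use the wrapper instead of opening the file directly.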

Phaedra answered 30/1, 2011 at 17:29 Comment(1)
+1. Using the proposed solution, the header could exist as a file beside the legacy file, with a different suffix or extension. If a file to be opened is detected as a legacy file, the file system abstraction would automatically access the header file. - Hyonhyoscine
Y
4

It sounds crazy, but if you can change the function that reads data from the file, you can store the file data in reverse order. The new header can then be appended (in reverse order) to the end of the file. This is just a general idea, so I can't recommend anything in particular. The code for reversing the current file could look like this:

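 // needs <algorithm>, <fstream>, <iterator>, <string>, <vector>
 std::vector<std::string> records;  // the existing records, in their original order
 std::ofstream out("reversed.bin", std::ios::binary);  // file name is a placeholder
 // Write the records last-to-first.
 std::copy(records.rbegin(), records.rend(), std::ostream_iterator<std::string>(out));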
Yodel answered 30/1, 2011 at 18:53 Comment(1)
Nice idea, but that would require reversing the initial file write (and also any appends). So the idea is only possible with existing files, and even then a complete rewrite (in reversed order) is required. - Kristlekristo
E
2

It depends on what you mean by "filesystem tricks". If you're willing to get down-and-dirty with the filesystem's on-disk format, and the size of the header you want to add is a multiple of the filesystem block size, then you could write a program to directly manipulate the filesystem's on-disk structures (with the filesystem unmounted).

This enterprise is about as hairy as it sounds though - it'd likely only be worth it if you had hundreds of these giant files to process.
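On newer Linux kernels (4.1 and later), a restricted form of this trick is available through a system call, so the file system can stay mounted: `fallocate()` with `FALLOC_FL_INSERT_RANGE` shifts the existing data back and opens a hole, provided the offset and length are multiples of the file system block size. To my knowledge only some file systems support it (ext4 with extent-based files, and XFS). The sketch below is an illustration under those assumptions, not a tested upgrade tool; `prepend_in_place` is a hypothetical name, and the file format must tolerate zero padding between the end of the header and the end of the inserted range.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes fallocate() in <fcntl.h> on glibc
#endif
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <fcntl.h>
#include <linux/falloc.h>  // FALLOC_FL_INSERT_RANGE
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: asks the kernel to insert hole_len bytes at offset 0
// of the file at path, then writes header into the start of that hole.
// hole_len must be a multiple of the file system block size (often 4096).
// Bytes between header.size() and hole_len read back as zeros.
// Returns true on success. Returns false with errno set otherwise, for
// example EOPNOTSUPP when the file system does not support the operation;
// if the insert itself fails, the file is left unchanged.
bool prepend_in_place(const std::string& path, const std::string& header,
                      off_t hole_len) {
    assert(hole_len > 0 &&
           header.size() <= static_cast<std::size_t>(hole_len));
    int fd = ::open(path.c_str(), O_RDWR);
    if (fd < 0) return false;

    // Shift the existing contents back by hole_len without copying them.
    bool ok = ::fallocate(fd, FALLOC_FL_INSERT_RANGE, 0, hole_len) == 0;
    if (ok) {
        // Fill the beginning of the new hole with the header bytes.
        ssize_t n = ::pwrite(fd, header.data(), header.size(), 0);
        ok = (n == static_cast<ssize_t>(header.size()));
    }

    int saved = errno;  // keep the original error across close()
    ::close(fd);
    errno = saved;
    return ok;
}
```

A call such as `prepend_in_place("legacy.bin", header, 4096)` (file name and block size are placeholders) either succeeds quickly regardless of file size or fails, in which case rewriting the file remains the fallback.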

Ere answered 31/1, 2011 at 0:39 Comment(0)
M
0

I would just use the standard Linux tools to do it.
Writing another application to do it seems like it would be sub-optimal.

cat headerFile oldFile > tmpFile && mv tmpFile oldFile
Mccarty answered 30/1, 2011 at 17:32 Comment(2)
I think the OP is looking for something that doesn't require rewriting the whole file (just as you wouldn't need to rewrite it in order to append some data). If headerFile is 1 byte and oldFile is 10 GB, your command will take a long time. - Phaedra
@peoro: I realize this is what the OP wants, but file systems don't work like that, so that they can be efficient in general. As a design trade-off, they are very efficient for common operations and, as a result, inefficient for less common ones. - Mccarty
M
0

I know this is an old question, but I hope this helps someone in the future. Similar to simulating a filesystem, you could simply use a named pipe:

mkfifo /path/to/file_to_be_read
{ echo "HEADER"; cat /path/to/source_file; } > /path/to/file_to_be_read

Then run your legacy program against /path/to/file_to_be_read from another shell (the redirect above blocks until a reader opens the pipe). The program's input would be:

HEADER
contents of /path/to/source_file
...

This will work as long as the program reads the file sequentially and doesn't do mmap() or rewind() past the buffer.

Massachusetts answered 15/8, 2017 at 20:34 Comment(0)
