How to insert characters to a file using C#
F

10

13

I have a huge file in which I need to insert certain characters at a specific location. What is the easiest way to do that in C# without rewriting the whole file?

Feces answered 19/9, 2008 at 0:56 Comment(1)
Good question. I needed to figure out how to do just what you're asking, and your question was the first thing returned by Google. Managerial
P
10

Filesystems do not support "inserting" data in the middle of a file. If you really need a file that can be written to in sorted order, I suggest you look into using an embedded database.

You might want to take a look at SQLite or BerkeleyDB.

Then again, you might be working with a text file or a legacy binary file. In that case your only option is to rewrite the file, at least from the insertion point up to the end.

I would look at the FileStream class to do random I/O in C#.
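To illustrate the "rewrite from the insertion point up to the end" idea with FileStream, here is a minimal sketch. The class and method names (`TailRewrite`, `InsertBytes`) are made up for this example, and it assumes the part of the file after the insertion point fits in memory; for a truly huge tail you would shift the data in chunks instead.

```csharp
using System;
using System.IO;

static class TailRewrite
{
    // Inserts `data` at byte offset `position` by rewriting everything after it.
    // Assumption: the tail (position..end of file) fits in memory.
    public static void InsertBytes(string path, long position, byte[] data)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            if (position < 0 || position > fs.Length)
                throw new ArgumentOutOfRangeException(nameof(position));

            // Read the tail that has to move; Read may return fewer bytes than requested.
            var tail = new byte[fs.Length - position];
            fs.Position = position;
            int read = 0;
            while (read < tail.Length)
            {
                int n = fs.Read(tail, read, tail.Length - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }

            // Write the new bytes at the insertion point, then the old tail after them.
            fs.Position = position;
            fs.Write(data, 0, data.Length);
            fs.Write(tail, 0, tail.Length);
        }
    }
}
```

Everything before `position` is left untouched on disk; only the inserted bytes and the tail are written.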

Pershing answered 19/9, 2008 at 1:10 Comment(2)
That is not exactly correct. You can use random access to arbitrarily read and write a file (in bytes) from any point. However, it's up to you to shift the file offsets when inserting something. In other words, it's simpler to just regenerate the file. Franz
I disagree. Of course, you can seek to any point in a file using random access. But if you write at that point, you'll overwrite what was previously at that location. So, if you had "abcde" in your file, and you seek to the 'c' and write "123", you end up with "ab123", not "ab123cde". Pershing
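The overwrite behaviour described in this comment can be reproduced with a few lines. This is only a demonstration sketch; `OverwriteDemo` and `WriteAt` are hypothetical names, and the file is assumed to contain plain ASCII.

```csharp
using System.IO;
using System.Text;

static class OverwriteDemo
{
    // Seeks to `offset`, writes `text` there without moving any existing bytes,
    // and returns the resulting file content.
    public static string WriteAt(string path, long offset, string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            fs.Seek(offset, SeekOrigin.Begin);
            fs.Write(bytes, 0, bytes.Length); // replaces the bytes already at this position
        }
        return File.ReadAllText(path, Encoding.ASCII);
    }
}
```

For a file containing "abcde", `WriteAt(path, 2, "123")` returns "ab123": the bytes for "cde" are replaced, not shifted.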
A
3

You will probably need to rewrite the file from the insertion point to the end. You might be better off always appending to the end of the file and using tools such as sort and grep to get the data out in the desired order. I am assuming you are talking about a text file here, not a binary file.

Army answered 19/9, 2008 at 0:58 Comment(1)
I was actually looking for some random access techniques using C#, even if I have to use unsafe code. Thanks for the suggestions anyway. Feces
C
2

There is no way to insert characters into a file without rewriting the data that follows them. In C#, this can be done with any of the Stream classes. If the files are huge, I would recommend calling the GNU Core Utilities from your C# code. In my experience, they are the fastest option. I used to handle very large text files (4 GB, 8 GB, or more) with them. Commands like head, tail, split, csplit, cat, shuf, shred, and uniq help a lot with text manipulation.

For example, if you need to insert some characters into a 2 GB file, you can use split -b BYTECOUNT to cut the file at the insertion point, write the first part to a new file, append the new text to it, and then append the rest of the original content. This should, in principle, be faster than the alternatives.

Hope it works. Give it a try.
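The same split / append / concatenate idea can also be sketched in plain C# without shelling out, for comparison. This is an illustrative sketch, not a benchmarked alternative: `TempFileInsert` and the `.tmp` suffix are assumptions made for the example, and the final delete-and-move step is not atomic.

```csharp
using System;
using System.IO;

static class TempFileInsert
{
    // Builds a new file as: head of original + new data + rest of original,
    // then swaps it in place of the original.
    public static void InsertBytes(string path, long position, byte[] data)
    {
        string tempPath = path + ".tmp"; // hypothetical temporary file name

        using (var input = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var output = new FileStream(tempPath, FileMode.Create, FileAccess.Write))
        {
            // Copy the first `position` bytes (the part before the insertion point).
            var buffer = new byte[81920];
            long remaining = position;
            while (remaining > 0)
            {
                int n = input.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
                if (n == 0) throw new EndOfStreamException();
                output.Write(buffer, 0, n);
                remaining -= n;
            }

            // Append the new data, then the rest of the original file.
            output.Write(data, 0, data.Length);
            input.CopyTo(output);
        }

        // Replace the original with the new file (not atomic).
        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```

Unlike an in-place shift, this needs free disk space for a second copy of the file, but the original stays intact until the final swap.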

Conchiolin answered 19/9, 2008 at 2:6 Comment(0)
F
1

You can use random access to write to specific locations in a file, but you won't be able to do it in text format; you'll have to work with bytes directly.
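One reason you have to think in bytes is that a character index and a byte offset are not the same thing in a multi-byte encoding. The sketch below (the helper name `ByteOffsetOf` is made up for this example) converts a character index into a byte offset, assuming you know the file's encoding, the text before the index is available as a string, the file has no byte-order mark, and the index does not split a surrogate pair.

```csharp
using System.Text;

static class ByteOffsets
{
    // Returns the byte offset at which character index `charIndex` of `text`
    // starts when the text is encoded with `encoding`.
    public static int ByteOffsetOf(string text, int charIndex, Encoding encoding)
    {
        return encoding.GetByteCount(text.Substring(0, charIndex));
    }
}
```

For example, `ByteOffsetOf("h\u00e9llo", 2, Encoding.UTF8)` is 3 rather than 2, because the second character takes two bytes in UTF-8.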

Franz answered 19/9, 2008 at 1:1 Comment(2)
Can you please point to some resources on the web? I thought random-access file handling was not possible in C#. Feces
I don't think that he wants to overwrite the old bytes. Minorca
M
1

If you know the specific location to which you want to write the new data, use the BinaryWriter class:

// requires: using System.IO; using System.Text;
using (BinaryWriter bw = new BinaryWriter(File.Open(strFile, FileMode.Open)))
{
    string strNewData = "this is some new data";

    // encode the string as bytes (ASCII here; use the file's actual encoding)
    byte[] byteNewData = Encoding.ASCII.GetBytes(strNewData);

    // write new data to file
    // NOTE: this overwrites the bytes at position 15; existing data is not shifted
    bw.Seek(15, SeekOrigin.Begin);  // seek to position 15
    bw.Write(byteNewData, 0, byteNewData.Length);
}
Managerial answered 22/1, 2009 at 13:46 Comment(1)
Mind that this code will OVERWRITE(!) the data at position 15. Cyrano
I
1

You may take a look at this project: Win Data Inspector

Basically, the code is the following:

// this.Stream is the stream in which you insert data
{
    long position = this.Stream.Position;
    long length = this.Stream.Length;
    MemoryStream ms = new MemoryStream();
    this.Stream.Position = 0;
    DIUtils.CopyStream(this.Stream, ms, position, progressCallback);
    ms.Write(data, 0, data.Length);
    this.Stream.Position = position;
    DIUtils.CopyStream(this.Stream, ms, this.Stream.Length - position, progressCallback);
    this.Stream = ms;
}

#region Delegates
public delegate void ProgressCallback(long position, long total);
#endregion

DIUtils.cs

public static void CopyStream(Stream input, Stream output, long length, DataInspector.ProgressCallback callback)
{
    long totalsize = input.Length;
    long byteswritten = 0;
    const int size = 32768;
    byte[] buffer = new byte[size];
    int read;
    int readlen = length < size ? (int)length : size;
    while (length > 0 && (read = input.Read(buffer, 0, readlen)) > 0)
    {
        output.Write(buffer, 0, read);
        byteswritten += read;
        length -= read;
        readlen = length < size ? (int)length : size;
        if (callback != null)
            callback(byteswritten, totalsize);
    }
}
Inceptive answered 20/4, 2016 at 15:17 Comment(2)
Please don't copy/paste the same answer to multiple questions. Also, be careful advertising your own work here - we have a rule against overt self-promotion. Sickening
In this context, perhaps you want to read How to offer personal open-source libraries? Mutter
R
0

Depending on the scope of your project, you may want to store each line of text from your file in a table data structure, somewhat like a database table. That way you can insert at a specific location at any moment without having to read in, modify, and write out the entire text file each time. This assumes your data is "huge", as you put it. You would still recreate the file, but at least this gives you a scalable solution.

Rodent answered 19/9, 2008 at 1:5 Comment(0)
W
0

Depending on how the filesystem stores files, it may be "possible" to quickly insert (i.e., add additional) bytes in the middle. If it is possible at all, it may only be feasible a full block at a time, and only by either modifying the filesystem at a low level or using a filesystem-specific interface.

Filesystems are not generally designed for this operation. If you need to do inserts quickly, you really need a more general database.

Depending on your application a middle ground would be to bunch your inserts together, so you only do one rewrite of the file rather than twenty.
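As a rough sketch of the batching idea: collect the pending inserts, sort them by offset, and splice them all in during a single copy pass. The names here (`BatchedInsert`, `Apply`) are hypothetical, and the offsets are assumed to refer to positions in the original, unmodified data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class BatchedInsert
{
    // Copies `input` to `output`, writing each pending insert at its offset.
    // Keys are byte offsets in the original data; values are the bytes to insert.
    public static void Apply(Stream input, Stream output,
                             IEnumerable<KeyValuePair<long, byte[]>> inserts)
    {
        var buffer = new byte[81920];
        long position = 0; // how many original bytes have been copied so far

        foreach (var insert in inserts.OrderBy(i => i.Key))
        {
            // Copy original bytes up to this insert's offset.
            long remaining = insert.Key - position;
            while (remaining > 0)
            {
                int n = input.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
                if (n == 0) throw new EndOfStreamException();
                output.Write(buffer, 0, n);
                remaining -= n;
            }
            position = insert.Key;

            // Splice in the new bytes.
            output.Write(insert.Value, 0, insert.Value.Length);
        }

        // Copy whatever is left after the last insert.
        input.CopyTo(output);
    }
}
```

With FileStreams for `input` and `output`, twenty queued inserts cost one pass over the file instead of twenty.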

Weekender answered 19/9, 2008 at 1:34 Comment(0)
A
0

You will always have to rewrite the remaining bytes from the insertion point. If this point is at 0, then you will rewrite the whole file. If it is 10 bytes before the last byte, then you will rewrite the last 10 bytes.

In any case, no function directly supports "insert to file", but the following code sketches how to do it.

// requires: using System; using System.Diagnostics; using System.IO; using System.Text;
var sw = new Stopwatch();
var ab = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ";

// create
var fs = new FileStream(@"d:\test.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite, 262144, FileOptions.None);
sw.Restart();
fs.Seek(0, SeekOrigin.Begin);
for (var i = 0; i < 40000000; i++) fs.Write(ASCIIEncoding.ASCII.GetBytes(ab), 0, ab.Length);
sw.Stop();
Console.WriteLine("{0} ms", sw.Elapsed.TotalMilliseconds);
fs.Dispose();

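// insert the bytes of `ab` at position `target`
var data = ASCIIEncoding.ASCII.GetBytes(ab);
fs = new FileStream(@"d:\test.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite, 262144, FileOptions.None);
sw.Restart();
byte[] b = new byte[262144];
long target = 10, end = fs.Length; // end: exclusive end of the region still to be shifted
while (end > target)
{
    // move the last unshifted chunk first so that no unread data is overwritten
    int count = (int)Math.Min(b.Length, end - target);
    long start = end - count;
    fs.Position = start;
    for (int read = 0; read < count; )
    {
        int n = fs.Read(b, read, count - read); // Read may return fewer bytes than requested
        if (n == 0) throw new EndOfStreamException();
        read += n;
    }
    fs.Position = start + data.Length; fs.Write(b, 0, count);
    end = start;
}
fs.Position = target; fs.Write(data, 0, data.Length);
sw.Stop();
Console.WriteLine("{0} ms", sw.Elapsed.TotalMilliseconds);
fs.Dispose();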

To get better file I/O performance, experiment with power-of-two buffer sizes like the one in the code above. The file creation uses a buffer of 262144 bytes (256 KB), which does not help at all. The same buffer size for the insertion does the "performance job", as you can see from the Stopwatch results if you run the code. A rough test on my PC gave the following results:

13628.8 ms for creation and 3597.0971 ms for insertion.

Note that the target byte for insertion is 10, meaning that almost the whole file was rewritten.

Arvonio answered 8/12, 2011 at 20:30 Comment(0)
B
0

Why not store a pointer at the insertion point and keep the inserted data at the end of the file? At the insertion point, overwrite four bytes with a pointer to the current end of the file; at the end of the file, write the length of the inserted data followed by the data itself. For example, if you have a string in the middle of the file and want to insert a few characters in the middle of that string, you can overwrite four characters of the string with the pointer, then append those four characters to the end of the file together with the characters you wanted to insert. It's all about ordering data. Of course, this only works if you write the whole file yourself, that is, no other program or codec needs to read it.

Bearwood answered 16/5, 2017 at 17:3 Comment(0)
