.NET C# - Random access in text files - no easy way?
I've got a text file that contains several 'records' inside of it. Each record contains a name and a collection of numbers as data.

I'm trying to build a class that will read through the file, present only the names of all the records, and then allow the user to select which record data he/she wants.

The first time I go through the file, I only read header names, but I can keep track of the 'position' in the file where the header is. I need random access to the text file to seek to the beginning of each record after a user asks for it.

I have to do it this way because the file is too large to be read in completely in memory (1GB+) with the other memory demands of the application.

I've tried using the .NET StreamReader class to accomplish this (it provides very easy-to-use 'ReadLine' functionality), but there is no way to capture the true position in the file: the position reported by the BaseStream property is skewed by the buffer the class uses.

Is there no easy way to do this in .NET?

Tim answered 5/11, 2008 at 16:10 Comment(0)
B
13

There are some good answers provided, but I couldn't find any source code that would work in my very simplistic case. Here it is, with the hope that it'll save someone else the hour that I spent searching around.

The "very simplistic case" that I refer to is: the text encoding is fixed-width (the code below counts one byte per character), and the line ending characters are the same throughout the file. This code works well in my case, where I'm parsing a log file and sometimes have to seek ahead in the file and then come back. I implemented just enough to do what I needed to do (e.g. only one constructor, and only ReadLine() is overridden), so most likely you'll need to add code, but I think it's a reasonable starting point.

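using System;
using System.IO;

public class PositionableStreamReader : StreamReader
{
    public PositionableStreamReader(string path)
        : base(path)
    {
    }

    // Length of each line terminator; defaults to the platform's newline.
    private int myLineEndingCharacterLength = Environment.NewLine.Length;
    public int LineEndingCharacterLength
    {
        get { return myLineEndingCharacterLength; }
        set { myLineEndingCharacterLength = value; }
    }

    // Tracks the position by hand, because BaseStream.Position runs ahead
    // of the lines actually returned (the reader buffers its input).
    public override string ReadLine()
    {
        string line = base.ReadLine();
        if (null != line)
            myStreamPosition += line.Length + myLineEndingCharacterLength;
        return line;
    }

    // Offset of the next unread line. line.Length counts characters, so this
    // equals a byte offset only for one-byte-per-character text without a
    // byte order mark, and only if every line ends with the same terminator.
    private long myStreamPosition = 0;
    public long Position
    {
        get { return myStreamPosition; }
        set
        {
            myStreamPosition = value;
            this.BaseStream.Position = value;
            // Drop stale buffered data so the next read starts at the new position.
            this.DiscardBufferedData();
        }
    }
}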

Here's an example of how to use the PositionableStreamReader:

PositionableStreamReader sr = new PositionableStreamReader("somepath.txt");

// read some lines
while (something)
    sr.ReadLine();

// bookmark the current position
long streamPosition = sr.Position;

// read some lines
while (something)
    sr.ReadLine();

// go back to the bookmarked position
sr.Position = streamPosition;

// read some lines
while (something)
    sr.ReadLine();
Byer answered 28/5, 2009 at 15:34 Comment(0)
S
8

FileStream has the Seek() method.

Soonsooner answered 5/11, 2008 at 16:15 Comment(6)
That's not useful when we don't know where to seek to. - Chlorenchyma
Maybe we're using different definitions of random access. I (as well as Jason apparently) take it to mean a file of records with a specific size in bytes, thus the start of a record is (recnum - 1) * recsize. - Endora
More importantly, the OP suggests that they can record the stream indices at which individual records begin, so knowing where to seek to is a solved problem in this instance. - Kuth
@Jon: "The first time I go through the file, I only read header names, but I can keep track of the 'position' in the file where the header is. I need random access to the text file to seek to the beginning of each record after a user asks for it." Sounds like we know where to seek to. - Soonsooner
"the position in the BaseStream property is skewed due to the buffer the class uses". Sounds like we don't know where to seek to. - Cramp
This is very old but you could use file stream to manually loop through lines, or even pass the file stream to a stream reader and use the reader to go through lines but use the file stream to get the offset (stream.Position). Then later you can use the file stream to seek and then use the reader to read those lines. The trick is to use both. You may need to create a new reader after seeking, not sure of the behavior. - Dalessandro
C
6

You can use a System.IO.FileStream instead of StreamReader. If you know exactly what the file contains (the encoding, for example), you can perform all the same operations you would with StreamReader.
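
As a rough sketch of that idea, the code below scans the raw bytes once with a FileStream, records the byte offset at which each line starts, and later seeks straight to a recorded offset. The class and method names (LineIndex, Build, ReadLineAt) are made up for illustration, and the sketch assumes UTF-8 or ASCII text in which the byte 0x0A always ends a line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the .NET class library.
public static class LineIndex
{
    // Scans the file once and returns the byte offset at which each line starts.
    // Assumes '\n' (0x0A) ends every line and never appears inside a multi-byte
    // character, which holds for ASCII and UTF-8 but not for UTF-16.
    public static List<long> Build(string path)
    {
        var offsets = new List<long>();
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            var buffer = new byte[64 * 1024];
            long chunkStart = 0;
            bool atLineStart = true;
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (atLineStart)
                    {
                        offsets.Add(chunkStart + i);
                        atLineStart = false;
                    }
                    if (buffer[i] == (byte)'\n')
                        atLineStart = true;
                }
                chunkStart += read;
            }
        }
        return offsets;
    }

    // Seeks to a recorded offset and reads a single line from there.
    public static string ReadLineAt(string path, long offset)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            fs.Seek(offset, SeekOrigin.Begin);
            // A freshly created reader has an empty buffer, so decoding starts at 'offset'.
            using (var reader = new StreamReader(fs, Encoding.UTF8))
                return reader.ReadLine();
        }
    }
}
```

Only the list of offsets stays in memory, so the index for a 1GB+ file costs eight bytes per line rather than the size of the file.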

Cassell answered 5/11, 2008 at 16:15 Comment(0)
G
5

If you're flexible with how the data file is written and don't mind it being a little less text editor-friendly, you could write your records with a BinaryWriter:

using (BinaryWriter writer = 
    new BinaryWriter(File.Open("data.txt", FileMode.Create)))
{
    writer.Write("one,1,1,1,1");
    writer.Write("two,2,2,2,2");
    writer.Write("three,3,3,3,3");
}

Then, initially reading each record is simple because you can use the BinaryReader's ReadString method:

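using (BinaryReader reader = new BinaryReader(File.OpenRead("data.txt")))
{
    string line = null;
    long position = reader.BaseStream.Position;
    // Compare Position with Length instead of calling PeekChar(): PeekChar()
    // decodes the next byte as text, and that byte is a length prefix here.
    while (reader.BaseStream.Position < reader.BaseStream.Length)
    {
        line = reader.ReadString();

        //parse the name out of the line here...

        Console.WriteLine("{0},{1}", position, line);
        position = reader.BaseStream.Position;
    }
}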

The BinaryReader isn't buffered so you get the proper position to store and use later. The only hassle is parsing the name out of the line, which you may have to do with a StreamReader anyway.
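
To round this out, here is a sketch of the second half of the job: keeping the positions from the first pass and seeking back to one of them later. The class and method names (BinaryRecordDemo, IndexNames, ReadAt) are invented for this example, and it assumes the record name is everything before the first comma.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the .NET class library.
public static class BinaryRecordDemo
{
    // Writes each record as a length-prefixed string.
    public static void Write(string path, IEnumerable<string> records)
    {
        using (var writer = new BinaryWriter(File.Open(path, FileMode.Create)))
        {
            foreach (string record in records)
                writer.Write(record);
        }
    }

    // First pass: maps each record name (assumed to be the text before the
    // first comma) to the stream position at which the record starts.
    public static Dictionary<string, long> IndexNames(string path)
    {
        var index = new Dictionary<string, long>();
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            Stream stream = reader.BaseStream;
            while (stream.Position < stream.Length)
            {
                long position = stream.Position;
                string line = reader.ReadString();
                index[line.Split(',')[0]] = position;
            }
        }
        return index;
    }

    // Later: jumps straight to a stored position and reads that one record.
    public static string ReadAt(string path, long position)
    {
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            reader.BaseStream.Seek(position, SeekOrigin.Begin);
            return reader.ReadString();
        }
    }
}
```

With the three sample records above, the user would pick a name from the index and the application would pass the matching position to ReadAt.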

Geotropism answered 5/11, 2008 at 22:58 Comment(1)
If you have complete control of the file you could write an index at the start of the file as well. - Dalessandro
C
2

Is the encoding a fixed-size one (e.g. ASCII or UCS-2)? If so, you could keep track of the character index (based on the number of characters you've seen) and find the binary index based on that.

Otherwise, no - you'd basically need to write your own StreamReader implementation which lets you peek at the binary index. It's a shame that StreamReader doesn't implement this, I agree.
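
For the fixed-size case, the conversion from character index to binary index could look something like the sketch below. The names (FixedWidthOffsets, ToByteOffset, ReadLineAtChar) and the hasPreamble flag are assumptions made for illustration; the caller has to know whether the file starts with a byte order mark.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the .NET class library.
public static class FixedWidthOffsets
{
    // Byte offset of the character at 'charIndex', valid only when every
    // character occupies exactly 'bytesPerChar' bytes.
    public static long ToByteOffset(long charIndex, int bytesPerChar, int preambleLength)
    {
        return preambleLength + charIndex * bytesPerChar;
    }

    // Seeks to the character at 'charIndex' and reads one line from there.
    public static string ReadLineAtChar(string path, Encoding encoding, long charIndex, bool hasPreamble)
    {
        // Width of one character: 1 for ASCII, 2 for UTF-16 text limited to UCS-2.
        int bytesPerChar = encoding.GetByteCount("a");
        // Assumption: the caller knows whether the file begins with a byte order mark.
        int preamble = hasPreamble ? encoding.GetPreamble().Length : 0;
        using (var fs = File.OpenRead(path))
        {
            fs.Seek(ToByteOffset(charIndex, bytesPerChar, preamble), SeekOrigin.Begin);
            using (var reader = new StreamReader(fs, encoding, false))
                return reader.ReadLine();
        }
    }
}
```

The character index itself can be accumulated during the first pass by adding each line's length plus the length of its line ending.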

Chlorenchyma answered 5/11, 2008 at 16:16 Comment(0)
R
2

Starting with .NET 6, the methods in the System.IO.RandomAccess class are the official and supported way to read from and write to a file at arbitrary offsets. These APIs work with a Microsoft.Win32.SafeHandles.SafeFileHandle, which can be obtained with the new System.IO.File.OpenHandle method, also introduced in .NET 6.
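
A minimal sketch of how those two APIs fit together, assuming .NET 6 or later and a UTF-8 file. The wrapper class and method (RandomAccessDemo, ReadText) are invented for this example; the offset and count would come from whatever index you built on the first pass.

```csharp
using System;
using System.IO;
using System.Text;
using Microsoft.Win32.SafeHandles;

// Hypothetical helper, not part of the .NET class library.
public static class RandomAccessDemo
{
    // Reads up to 'count' bytes starting at 'offset' and decodes them as UTF-8.
    public static string ReadText(string path, long offset, int count)
    {
        using (SafeFileHandle handle = File.OpenHandle(path, FileMode.Open, FileAccess.Read))
        {
            var buffer = new byte[count];
            int total = 0;
            while (total < count)
            {
                // Each call names its own file offset; the handle has no cursor to move.
                int read = RandomAccess.Read(handle, buffer.AsSpan(total), offset + total);
                if (read == 0)
                    break; // end of file
                total += read;
            }
            return Encoding.UTF8.GetString(buffer, 0, total);
        }
    }
}
```

Note that RandomAccess works on bytes only, so finding line boundaries and decoding text remain the caller's job.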

Roemer answered 7/7, 2022 at 12:58 Comment(0)
S
1

I think that the FileHelpers library's runtime records feature might help you. http://filehelpers.sourceforge.net/runtime_classes.html

Sauncho answered 5/11, 2008 at 17:4 Comment(0)
T
1

A couple of items that may be of interest.

1) If the lines are a fixed set of characters in length, that is not of necessity useful information if the character set has variable sizes (like UTF-8). So check your character set.

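2) You might hope to ascertain the exact position of the file cursor from StreamReader by flushing its buffers and then reading BaseStream.Position. Be aware, though, that StreamReader has no Flush() method; the nearest call, DiscardBufferedData(), throws away the buffered data without moving BaseStream.Position back, so the value you get is where the next buffer fill begins, not one byte after the last line you read.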

3) If you know in advance that the exact length of each record will be the same number of characters, and the character set uses fixed-width characters (so each line is the same number of bytes long), then you can use FileStream with a fixed buffer size to match the size of a line, and the position of the cursor at the end of each read will be, perforce, the beginning of the next line.

4) Is there any particular reason why, if the lines are the same length (assuming in bytes here) that you don't simply use line numbers and calculate the byte-offset in the file based on line size x line number?
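
Points 3 and 4 can be sketched roughly as follows. The names (FixedLengthRecords, ReadRecord) are hypothetical, line numbers are assumed to be zero-based, lineSizeInBytes is assumed to include the line terminator, and the file is assumed to have no byte order mark.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the .NET class library.
public static class FixedLengthRecords
{
    // Reads the line at zero-based 'lineNumber' from a file in which every line,
    // terminator included, occupies exactly 'lineSizeInBytes' bytes.
    public static string ReadRecord(string path, long lineNumber, int lineSizeInBytes, Encoding encoding)
    {
        var buffer = new byte[lineSizeInBytes];
        using (var fs = File.OpenRead(path))
        {
            // Byte offset = line size x line number.
            fs.Seek(lineNumber * lineSizeInBytes, SeekOrigin.Begin);
            int total = 0;
            while (total < buffer.Length)
            {
                int read = fs.Read(buffer, total, buffer.Length - total);
                if (read == 0)
                    break; // end of file
                total += read;
            }
            return encoding.GetString(buffer, 0, total).TrimEnd('\r', '\n');
        }
    }
}
```

No index is needed in this case, because the offset of any line follows directly from its number.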

Talc answered 29/3, 2010 at 13:57 Comment(0)
M
0

Are you sure that the file is "too large"? Have you tried it that way and has it caused a problem?

If you allocate a large amount of memory, and you aren't using it right now, Windows will just swap it out to disk. Hence, by accessing it from "memory", you will have accomplished what you want -- random access to the file on disk.

Monohydric answered 5/11, 2008 at 16:15 Comment(1)
If the file's more than 1GB in size, and you're running on 32-bit, you'll probably run out of address space, even if Windows swaps its little heart out. - Indictable
V
0

This exact question was asked in 2006 here: http://www.devnewsgroups.net/group/microsoft.public.dotnet.framework/topic40275.aspx

Summary:

"The problem is that the StreamReader buffers data, so the value returned in BaseStream.Position property is always ahead of the actual processed line."

However, "if the file is encoded in a text encoding which is fixed-width, you could keep track of how much text has been read and multiply that by the width"

and if not, you can just use the FileStream directly and read a byte at a time, in which case the FileStream's Position property should be correct.

Vicentevicepresident answered 5/11, 2008 at 17:44 Comment(0)
