Reading Large Files Line by Line
Original Comment (1/2021)
I was able to read a 4GB log file in about 50 seconds with the following. You may be able to make it faster by writing the loop in C# and loading it dynamically from PowerShell.
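# PowerShell converts the returned FileStream into a StreamReader through the type constraint.
[System.IO.StreamReader]$sr = [System.IO.File]::Open($file, [System.IO.FileMode]::Open)
try {
    while (-not $sr.EndOfStream) {
        $line = $sr.ReadLine()   # process $line here
    }
} finally {
    $sr.Close()                  # release the file handle even if processing throws
}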
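A related option, not part of the original comment, is the .NET method [System.IO.File]::ReadLines, which returns a lazily evaluated sequence of lines instead of requiring an explicit reader. The sketch below is only an illustration: the temporary file name is hypothetical, and the loop body just counts lines.

```powershell
# Hypothetical demo file; in practice $file would be the large log.
# Use a full path: .NET resolves relative paths against the process working
# directory, which can differ from the current PowerShell location.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'readlines-demo.log'
Set-Content -Path $file -Value 'first', 'second', 'third'

$count = 0
# ReadLines yields lines on demand as the loop advances.
foreach ($line in [System.IO.File]::ReadLines($file)) {
    $count++   # replace with real per-line processing
}
"Read $count lines"

# Clean up the demo file.
Remove-Item -Path $file
```

Whether this is faster than the StreamReader loop was not measured here; it mainly removes the need to close the reader by hand.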
Addendum (3/2022)
Processing the large file using C# embedded in PowerShell is even faster and has fewer "gotchas".
$code = @"
using System;
using System.IO;

namespace ProcessLargeFile
{
    public class Program
    {
        // Placeholder: put the per-line work here.
        static void ProcessLine(string line)
        {
            return;
        }

        public static void ProcessLogFile(string path) {
            var start_time = DateTime.Now;
            StreamReader sr = new StreamReader(File.Open(path, FileMode.Open));
            try {
                while (!sr.EndOfStream){
                    string line = sr.ReadLine();
                    ProcessLine(line);
                }
            } finally {
                // Always release the file handle.
                sr.Close();
            }
            var end_time = DateTime.Now;
            var run_time = end_time - start_time;
            string msg = "Completed in " + run_time.Minutes + ":" + run_time.Seconds + "." + run_time.Milliseconds;
            Console.WriteLine(msg);
        }

        // Only used if the code is compiled as a standalone program.
        static void Main(string[] args)
        {
            ProcessLogFile("c:\\users\\tasaif\\fake.log");
            Console.ReadLine();
        }
    }
}
"@
Add-Type -TypeDefinition $code -Language CSharp
PS C:\Users\tasaif> [ProcessLargeFile.Program]::ProcessLogFile("c:\users\tasaif\fake.log")
Completed in 0:17.109
Get-Content loads the entire file into memory at once, which will fail or freeze on large files. – Particiaparticipant
process block, and spits out another object per line into the pipeline, then that function is the problem. Any problems with loading the full content into memory are not the fault of Get-Content. – Whensoever
foreach($line in Get-Content .\file.txt) will load the entire file into memory before it begins iterating. If you don't believe me, go get a 1GB log file and try it. – Particiaparticipant
Get-Content .\file.txt | ForEach-Object -Process {} is pipeline aware, and will not load the entire file into memory. By default Get-Content will pass one line at a time through the pipeline. – Whensoever
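The two forms discussed in these comments can be put side by side. The sketch below is not from the thread; it uses a small hypothetical temporary file and only counts lines, so it shows the difference in shape rather than measuring memory. It assumes the commonly described behavior that the foreach statement evaluates its Get-Content expression to completion before the first iteration, while the pipeline form hands each line to ForEach-Object as it is read.

```powershell
# Hypothetical temp file used only for this demonstration.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'demo-lines.log'
1..1000 | ForEach-Object { "line $_" } | Set-Content -Path $path

# Pipeline form: Get-Content emits one line at a time by default,
# and ForEach-Object handles each line as it arrives.
$pipelineCount = 0
Get-Content -Path $path | ForEach-Object { $pipelineCount++ }

# Statement form: the Get-Content expression runs to completion first,
# so every line is collected before the loop body starts.
$statementCount = 0
foreach ($line in Get-Content -Path $path) { $statementCount++ }

"{0} lines via pipeline, {1} lines via foreach statement" -f $pipelineCount, $statementCount

# Clean up the demo file.
Remove-Item -Path $path
```

Both counts match on a small file; the difference only becomes visible in memory use on multi-gigabyte inputs, which is the case the commenters are arguing about.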