I recently had to parse several log files that were around 6 gigabytes each. Buffering was a problem, since Perl would happily attempt to read all 6 gigabytes into memory when I assigned STDIN to an array, and I simply didn't have the system resources available to do that. I came up with the following workaround, which reads the file line by line and thus avoids the massive memory-blackhole buffering vortex that would otherwise commandeer all my system resources.
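To make the contrast concrete, here is a minimal sketch (the filename huge.log is just a placeholder): the first form slurps every line into an array at once, while the second only ever holds one line in memory.

# Slurping: list context reads the ENTIRE file into memory at once
open my $fh, '<', 'huge.log' or die "Cannot open huge.log: $!";
my @all_lines = <$fh>;             # 6 GB of log ends up in this array
close $fh;

# Streaming: scalar context reads one line per iteration, memory use stays flat
open my $fh2, '<', 'huge.log' or die "Cannot open huge.log: $!";
while (my $line = <$fh2>) {
    # do something with $line here
}
close $fh2;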
Note: all this script does is split that 6 gigabyte file into several smaller ones (whose size is dictated by the number of lines each output file should contain). The interesting bit is the while loop and the assignment of a single line from the log file to a variable: the loop iterates through the entire file, reading a single line, doing something with it, and then repeating. The result: no massive buffering. I kept the entire script intact just to show a working example.
#!/usr/bin/perl -w
BEGIN{$ENV{'POSIXLY_CORRECT'} = 1;}
use v5.14;
use Getopt::Long qw(:config no_ignore_case);
my $input = '';
my $output = '';
my $lines = 0;
GetOptions('i=s' => \$input, 'o=s' => \$output, 'l=i' => \$lines);
die "Usage: $0 -i <input file> -o <output prefix> -l <lines per output file>\n"
    unless $input && $output && $lines > 0;    # -l 0 would silently produce nothing

open my $fi, '<', $input or die "Cannot open $input: $!";

my $count      = 0;
my $count_file = 1;
while ($count < $lines) {
    my $line = <$fi>;              # assign a single line of input to a variable
    last unless defined $line;     # stop at end of file
    open my $fo, '>>', "${output}_${count_file}.log"
        or die "Cannot open ${output}_${count_file}.log: $!";
    print $fo $line;
    close $fo;
    $count++;
    if ($count == $lines) {        # current output file is full, start the next one
        $count = 0;
        $count_file++;
    }
}
close $fi;
print " done\n";
The full script is invoked on the command line like this:
(name of script) -i (input file) -o (output file prefix) -l (size of each output file, i.e. number of lines)
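For example, assuming the script is saved as split_log.pl (the name is just a placeholder) and the input is a 6 gigabyte file called huge.log, splitting it into chunks of one million lines each would look like:

./split_log.pl -i huge.log -o chunk -l 1000000

which produces chunk_1.log, chunk_2.log, and so on.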
Even if it's not exactly what you're looking for, I hope it gives you some ideas. :)