I recently had to parse several log files that were around 6 gigabytes each. Buffering was a problem, since Perl would happily attempt to read all 6 gigabytes into memory when I assigned STDIN to an array, and I simply didn't have the system resources available to do that. I came up with the following workaround, which reads the file line by line and thus avoids the massive memory-blackhole buffering vortex that would otherwise commandeer all my system resources.
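To make the contrast concrete, here is a minimal sketch (the filename huge.log is just a placeholder): the first form slurps every line into an array at once, while the second only ever holds one line in memory.

# Slurping: list context reads the ENTIRE file into memory at once
open my $fh, '<', 'huge.log' or die "Cannot open huge.log: $!";
my @all_lines = <$fh>;             # 6 GB of log ends up in this array
close $fh;

# Streaming: scalar context reads one line per iteration, memory use stays flat
open my $fh2, '<', 'huge.log' or die "Cannot open huge.log: $!";
while (my $line = <$fh2>) {
    # do something with $line here
}
close $fh2;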
Note: all this script does is split that 6 gigabyte file into several smaller ones (whose size is dictated by the number of lines each output file should contain). The interesting bit is the while loop and the assignment of a single line from the log file to a variable: the loop iterates through the entire file, reading a single line, doing something with it, and then repeating. The result: no massive buffering. I kept the entire script intact just to show a working example.
#!/usr/bin/perl -w
BEGIN{$ENV{'POSIXLY_CORRECT'} = 1;}
use v5.14;
use Getopt::Long qw(:config no_ignore_case);
my $input = '';
my $output = '';
my $lines = 0;
GetOptions('i=s' => \$input, 'o=s' => \$output, 'l=i' => \$lines);
die "Usage: $0 -i <input file> -o <output prefix> -l <lines per output file>\n"
    unless $input && $output && $lines > 0;    # -l 0 would silently produce nothing

open my $fi, '<', $input or die "Cannot open $input: $!";

my $count      = 0;
my $count_file = 1;
while ($count < $lines) {
    my $line = <$fi>;              # assign a single line of input to a variable
    last unless defined $line;     # stop at end of file
    open my $fo, '>>', "${output}_${count_file}.log"
        or die "Cannot open ${output}_${count_file}.log: $!";
    print $fo $line;
    close $fo;
    $count++;
    if ($count == $lines) {        # current output file is full, start the next one
        $count = 0;
        $count_file++;
    }
}
close $fi;
print " done\n";
The full script is invoked on the command line like this:
(name of script) -i (input file) -o (output file prefix) -l (size of each output file, i.e. number of lines)
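For example, assuming the script is saved as split_log.pl (the name is just a placeholder) and the input is a 6 gigabyte file called huge.log, splitting it into chunks of one million lines each would look like:

./split_log.pl -i huge.log -o chunk -l 1000000

which produces chunk_1.log, chunk_2.log, and so on.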
Even if it's not exactly what you're looking for, I hope it gives you some ideas. :)