grep: write error: Broken pipe with subprocess

I get a couple of "grep: write error: Broken pipe" messages when I run this code. What am I missing?

This is only part of it:

    while d <= datetime.datetime(year, month, daysInMonth[month]):
        day = d.strftime("%Y%m%d")
        print day
        results = [day]
        first=subprocess.Popen("grep -Eliw 'Algeria|Bahrain' "+ monthDir +"/"+day+"*.txt | grep -Eliw 'Protest|protesters' "+ monthDir +"/"+day+"*.txt", shell=True, stdout=subprocess.PIPE, )
        output1=first.communicate()[0]
        d += delta
        day = d.strftime("%Y%m%d")
        second=subprocess.Popen("grep -Eliw 'Algeria|Bahrain' "+ monthDir +"/"+day+"*.txt | grep -Eliw 'Protest|protesters' "+ monthDir +"/"+day+"*.txt", shell=True,  stdout=subprocess.PIPE, )
        output2=second.communicate()[0]
        articleList = (output1.split('\n'))
        articleList2 = (output2.split('\n'))
        results.append( len(articleList)+len(articleList2))
        w.writerow(tuple(results))
        d += delta
Routinize answered 31/3, 2013 at 3:21 Comment(3)
I can't figure out what you're trying to do. When you give filename arguments to grep, it doesn't read from stdin, so why are you piping the output of one grep process to the second one? - Dryer
I am filtering for files that contain the keyword Algeria OR Bahrain AND the keyword Protest OR protesters. It's actually a little more complicated; I simplified it for this question. I want all the files that contain at least one keyword from list1 and at least one keyword from list2. - Routinize
Any particular reason for not using Python's regular expression library, re? It would save you calling out to grep. - Highly
J
12

When you do

A | B

in a shell, process A's output is piped into process B as input. If process B shuts down before reading all of process A's output (e.g. because it found what it was looking for, which is the function of the -l option), then process A may complain that its output pipe was prematurely closed.

These errors are basically harmless, and you can work around them by redirecting stderr in the subprocesses to /dev/null.
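
A minimal sketch of that workaround, assuming Python 3.3 or later (where subprocess.DEVNULL exists) and a POSIX shell. The helper name run_quiet is hypothetical and not part of the original script. Because the redirect applies to the shell that runs the pipeline, every command in the pipeline inherits it.

```python
import subprocess


def run_quiet(command):
    """Run a shell command, discard everything written to stderr,
    and return the lines written to stdout."""
    proc = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # Python 3.3+; inherited by the whole pipeline
        universal_newlines=True,    # decode stdout to str
    )
    out, _ = proc.communicate()
    # splitlines() does not leave an empty trailing entry the way split('\n') does
    return out.splitlines()
```

On Python 2, which has no subprocess.DEVNULL, passing stderr=open(os.devnull, "w") should have the same effect. Using splitlines() also avoids the extra empty string that split('\n') produces, which inflates each len() count in the question by one.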

A better approach, though, may simply be to use Python's powerful regex capabilities to read the files:

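import glob
import re

def fileContains(fn, pat):
    # Return True as soon as any line of the file matches the pattern.
    with open(fn) as f:
        for line in f:
            if re.search(pat, line):
                return True
    return False

# Collect the files that match both patterns.
# To mimic grep's -i and -w, pass re.IGNORECASE and wrap the pattern in \b...\b.
first = []
for fn in glob.glob(monthDir + "/" + day + "*.txt"):
    if fileContains(fn, 'Algeria|Bahrain') and fileContains(fn, 'Protest|protesters'):
        first.append(fn)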
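
For the more general case mentioned in the question's comments (several keyword lists, at least one hit from each), the same idea could be sketched as follows. This assumes Python 3, and the names file_matches and files_with_all are hypothetical. The \b anchors and re.IGNORECASE only roughly approximate grep's -w and -i options.

```python
import glob
import os
import re


def file_matches(path, words):
    """True if any line contains one of the words as a whole word, ignoring case."""
    alternatives = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    with open(path, errors="replace") as handle:
        return any(pattern.search(line) for line in handle)


def files_with_all(directory, prefix, word_lists):
    """Return the sorted prefix*.txt paths that match at least one word
    from every list in word_lists."""
    paths = sorted(glob.glob(os.path.join(directory, prefix + "*.txt")))
    return [p for p in paths
            if all(file_matches(p, words) for words in word_lists)]
```

With the question's variables this would be called as files_with_all(monthDir, day, [["Algeria", "Bahrain"], ["Protest", "protesters"]]), and the article count for the day is the length of the returned list.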
Jug answered 31/3, 2013 at 9:31 Comment(0)
D
2

To find the files matching two patterns, the command structure should be:

grep -l pattern1 $(grep -l pattern2 files)

$(command) substitutes the output of the command into the command line.

So your script should be:

first=subprocess.Popen("grep -Eliw 'Algeria|Bahrain' $(grep -Eliw 'Protest|protesters' "+ monthDir +"/"+day+"*.txt)", shell=True, stdout=subprocess.PIPE, )

and similarly for second
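
The same two-step filter can also be sketched without a shell, which sidesteps the quoting. This assumes Python 3.5 or later and a grep binary on the PATH; grep_l and files_matching_both are hypothetical helper names. One case worth guarding against: if the inner grep finds nothing, the $(...) form leaves the outer grep with no file arguments, so it reads standard input instead.

```python
import subprocess


def grep_l(pattern, paths):
    """Return the subset of paths that `grep -Eliw pattern` reports as matching."""
    paths = list(paths)
    if not paths:
        return []  # with no file arguments grep would read stdin instead
    proc = subprocess.run(
        ["grep", "-Eliw", "-e", pattern, "--"] + paths,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # grep exits with status 1 when nothing matches; that is not treated as an error here
    return proc.stdout.splitlines()


def files_matching_both(paths, pattern1, pattern2):
    """Files matching pattern2 are passed on as the file list for pattern1."""
    return grep_l(pattern1, grep_l(pattern2, paths))
```

Here paths would be the result of glob.glob(monthDir + "/" + day + "*.txt"). File names containing newlines are not handled by this sketch.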

Dryer answered 31/3, 2013 at 3:50 Comment(2)
It didn't work for me. Can you explain, though, why I get a broken pipe in some cases and not in others? What does the error mean? - Routinize
Broken pipe means that the command tried to write to the pipe, but the read end was closed. I don't think it should happen when you use first.communicate(), since that reads until EOF. - Dryer
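
The condition described in the last comment can be reproduced in a few lines. This is only an illustration, assuming Python 3 on a POSIX system; the function name is hypothetical. Python starts with SIGPIPE ignored, so the failed write surfaces as a BrokenPipeError (errno EPIPE) instead of terminating the process.

```python
import os


def write_to_closed_pipe():
    """Write to a pipe whose read end is already closed and
    return the errno of the resulting error (None if the write succeeds)."""
    read_end, write_end = os.pipe()
    os.close(read_end)  # the "reader" goes away before the writer is done
    try:
        os.write(write_end, b"data")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_end)
    return None
```

The returned value can be compared against errno.EPIPE, the error code that grep reports as "Broken pipe".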
H
1

If you are just looking for whole words, you could use the list's count() method:

# assuming names is a list of filenames
for fn in names:
    with open(fn) as infile:
        text = infile.read().lower()
    # remove punctuation
    text = text.replace(',', '')
    text = text.replace('.', '')
    words = text.split()
    print "Algeria:", words.count('algeria')
    print "Bahrain:", words.count('bahrain')
    print "protesters:", words.count('protesters')
    print "protest:", words.count('protest')

If you want more powerful filtering, use re.

Highly answered 31/3, 2013 at 9:22 Comment(0)
S
0

Pass a stderr argument to Popen. The appropriate value depends on the Python version; the form below also works on Python versions earlier than 3. Note that stderr=subprocess.STDOUT merges the error text into the captured output rather than discarding it.

first=subprocess.Popen("grep -Eliw 'Algeria|Bahrain' "+ monthDir +"/"+day+"*.txt | grep -Eliw 'Protest|protesters' "+ monthDir +"/"+day+"*.txt", shell=True, stdout=subprocess.PIPE, stderr = subprocess.STDOUT)

Sear answered 23/12, 2022 at 7:27 Comment(1)
This does not provide an answer to the question. Once you have sufficient reputation you will be able to comment on any post; instead, provide answers that don't require clarification from the asker. - From Review - Oilla
