Two processes write to one file, prevent mixing the output
I want to take the output of two processes and merge it into one file, like:

proc1 >> output &
proc2 >> output &

The problem is that the output may get mixed up in the final file. For example, if the first process writes:

hello

and the second process writes:

bye

the result may be something like:

hebylloe

but I expect them to be on separate lines, like this (order is not important):

bye

hello

So I used flock to synchronize writing to the file with the following script:

exec 200>>output                # fd 200 holds the lock on the output file
while read -r line; do
  flock -w 2 200                # wait up to 2 seconds for an exclusive lock
  echo "$line" >> output
  flock -u 200                  # release the lock
done

And I run the processes like this:

proc1 | script &
proc2 | script &

Now the problem is that performance drops significantly: without synchronization each process could write at 4 MB/s, but with the synchronization script the write speed is only 1 MB/s.
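
Most of that cost is the per-line flock call plus re-opening output for every echo; a batched variant of the same approach amortizes it. This is only a sketch, assuming bash 4+ for mapfile, and the 100-line batch size is an arbitrary choice:

exec 200>>output
# read up to 100 lines at a time; the array comes back empty at EOF
while mapfile -t -n 100 batch; ((${#batch[@]})); do
  flock -w 2 200                      # one lock per batch instead of per line
  printf '%s\n' "${batch[@]}" >&200   # write through the already-open locked fd
  flock -u 200
done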

Can anyone tell me how to merge the output of two processes while keeping their lines from getting mixed up?

Edit: I realized that there is a relation between line length and the stdio buffer size: if each line is shorter than the stdio buffer, everything works well and nothing gets mixed (at least in my tests). So I ran each process under the bufsize command:

bufsize -o10KB proc1 | script &
bufsize -o10KB proc2 | script &

Now I want to make sure that this solution is bulletproof. I cannot find any relation between the buffer size and what happens now!
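
GNU coreutils provides stdbuf, which adjusts a child's stdio buffering in the same spirit. A minimal sketch, assuming proc1 and proc2 are dynamically linked programs that use C stdio and do not override their own buffering:

stdbuf -oL proc1 >> output &   # -oL: line-buffered stdout, one write(2) per line
stdbuf -oL proc2 >> output &
wait                           # wait for both writers to finish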

Aerify answered 21/8, 2016 at 6:59 Comment(6)
If you only have two processes, why not write two output files and then merge them afterwards? If you need to scale that up, look into using an appender like log4j.Gurdwara
It is better (not solving your problem) to use echo "$line" >> output (with quotes).Recondite
What are you writing? For plain logfiles, the hero who will read that much data will only get confused when two processes write to the same file. Or are you writing something that will go to a database some day? Start now.Recondite
For some reason I have to write it as a bash script. I know that I could handle the situation in C++ easily, but I cannot use anything but a bash script...Aerify
What are you going to do with output at 4 MB/sec?Recondite
The process is creating output at a very high rate, and I don't want to lose it!Aerify
Now I want to make sure that this solution is bulletproof. I cannot find any relation between the buffer size and what happens now!

For a fully buffered output stream, the buffer size determines the amount of data written with a single write(2) call. For a line buffered output stream, a line is written with a single write(2) call as long as it doesn't exceed the buffer size.

If the file was open(2)ed with O_APPEND, the file offset is first set to the end of the file before writing. The adjustment of the file offset and the write operation are performed as an atomic step.
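
A quick end-to-end check of this behavior in bash (only a sketch: the writer function and the counts are made up for illustration, and it relies on each printf builtin issuing a single write(2)):

writer() {                          # hypothetical test producer
  for ((i = 0; i < $2; i++)); do printf '%s\n' "$1"; done
}
: > output
writer AAAAAAAA 10000 >> output &   # each line is one write(2) on an O_APPEND fd
writer BBBBBBBB 10000 >> output &
wait
grep -cvE '^(A+|B+)$' output        # expect 0: no line mixes As and Bs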


Prague answered 23/8, 2016 at 11:57 Comment(7)
Thank you Armali, but how can I make sure that my Linux shell's redirection (>>) implementation is based on write(2) rather than write(3)? write(3) does not guarantee such a thing!Aerify
@ayyoob imani: Do you mean write(3) from the POSIX Manual? On Linux of course the Linux implementation write(2) is in effect.Prague
Besides that, POSIX also says: "If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation" and "Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified". These "shall" requirements are to be met by the operating system kernel.Prague
May be helpful -- I've encountered issues with bash scripts that use append and multiple processes, so I did a little testing. This seems unreliable: ( proc1 & proc2 & ) >> outputfile. While this seems reliable: ( proc1 & proc2 & ) | cat >> outputfile. Presumably stdio has optimizations for file I/O that have contention issues.Acropolis
@Acropolis - You don't have by chance a reproducible example of such issues at hand, do you?Prague
@Prague Well, with ">>" it works. With ">" it behaves differently on different fs types. Try this: echo 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 | tr ' ' '\n' | while read i; do yes abcdefghijklmnopqrstuvwxyz | sed 1000q | sed -e "s/^/-$i- /" & done > tmpfile; wait; wc -l tmpfileAcropolis
@Acropolis - I wouldn't call this unreliable - The command line you used just introduces non-determinism by creating 50 processes which write to the >tmpfile potentially in parallel. Using >> changes just the open mode from O_TRUNC to O_APPEND, causing all 50 thousand lines to get into the file, but not necessarily in a specific order. By the way, the wait command is not effective, because the background pipeline contains children of a subshell.Prague
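
For completeness, a variant of that test in which wait is effective, sketched under the same setup: run the loop in the current shell instead of at the end of a pipeline, and let each job append with >>:

: > tmpfile
for i in $(seq 10 59); do
  yes abcdefghijklmnopqrstuvwxyz | sed 1000q | sed -e "s/^/-$i- /" >> tmpfile &
done
wait            # effective now: the writers are direct children of this shell
wc -l tmpfile   # expect 50000 lines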
