Ruby readpartial and read_nonblock not throwing EOFError

Asked 20/11, 2012 at 19:48 Answered 9/8, 2022 at 8:24

ruby unix nonblocking unicorn preforking

I'm trying to understand and recreate a simple preforking server along the lines of unicorn where the server on start forks 4 processes which all wait (to accept) on the controlling socket.

The controlling socket @control_socket binds to port 9799, and the server spawns 4 workers which wait to accept a connection. The work done in each worker is as follows:

def spawn_child
  fork do
    $STDOUT.puts "Forking child #{Process.pid}"
    loop do 
      @client = @control_socket.accept                                        
      loop do                     
        request = gets              

        if request
          respond(@inner_app.call(request))
        else
          $STDOUT.puts("No Request")
          @client.close
        end
      end
    end
  end
end

I've used a very simple rack app which simply returns a string with the status code 200 and a Content-Type of text/html.

The problem I face is that my server works as it should when I read incoming requests (by hitting the URL "http://localhost:9799") using gets, but not when I use something like read, readpartial or read_nonblock. When I use non-blocking reads it never seems to raise EOFError, which as I understand it means the socket never reaches the EOF state.

This causes the read loop to not complete. Here is the code snippet which does this bit of work.

# Reads a file using IO.read_nonblock
# Returns end of file when using get but doesn't seem to return 
# while using read_nonblock or readpartial
# The fact that the method is named gets is just bad naming, please ignore
def gets
  buffer = ""         
  i =0
  loop do
    puts "loop #{i}"
    i += 1
    begin
      buffer << @client.read_nonblock(READ_CHUNK)
      puts "buffer is #{buffer}"
    rescue  Errno::EAGAIN => e
      puts "#{e.message}"
      puts "#{e.backtrace}"
      IO.select([@client])
      retry
    rescue EOFError
      $STDOUT.puts "-" * 50
      puts "request data is #{buffer}"    
      $STDOUT.puts "-" * 50
      break           
    end
  end
  puts "returning buffer"
  buffer
end

However, the code works perfectly if I use a simple gets instead of read or read_nonblock, or if I replace the IO.select([@client]) with a break.

Here is the version that works and returns the response. The reason I want to use read_nonblock is that unicorn uses an equivalent from the kgio library, which implements a non-blocking read.

def gets
  @client.gets
end

The entire code is pasted next.

module Server   
  class Prefork
    # line break 
    CRLF  = "\r\n"
    # number of workers process to fork
    CONCURRENCY = 4
    # size of each non_blocking read
    READ_CHUNK = 1024

    $STDOUT = STDOUT
    $STDOUT.sync = true

    # creates a control socket which listens to port 9799
    def initialize(port = 21)
      @control_socket = TCPServer.new(9799)
      puts "Starting server..."
      trap(:INT) {
        exit
      }
    end

    # Reads a file using IO.read_nonblock
    # Returns end of file when using get but doesn't seem to return 
    # while using read_nonblock or readpartial
    def gets
      buffer = ""         
      i =0
      loop do
        puts "loop #{i}"
        i += 1
        begin
          buffer << @client.read_nonblock(READ_CHUNK)
          puts "buffer is #{buffer}"
        rescue  Errno::EAGAIN => e
          puts "#{e.message}"
          puts "#{e.backtrace}"
          IO.select([@client])
          retry
        rescue EOFError
          $STDOUT.puts "-" * 50
          puts "request data is #{buffer}"    
          $STDOUT.puts "-" * 50
          break           
        end
      end
      puts "returning buffer"
      buffer
    end

    # responds with the data and closes the connection
    def respond(data)
      puts "request 2 Data is #{data.inspect}"
      status, headers, body = data
      puts "message is #{body}"
      buffer = "HTTP/1.1 #{status}\r\n" \
               "Date: #{Time.now.utc}\r\n" \
               "Status: #{status}\r\n" \
               "Connection: close\r\n"            
      headers.each {|key, value| buffer << "#{key}: #{value}\r\n"}          
      @client.write(buffer << CRLF)
      body.each {|chunk| @client.write(chunk)}            
    ensure 
      $STDOUT.puts "*" * 50
      $STDOUT.puts "Closing..."
      @client.respond_to?(:close) and @client.close
    end

    # The main method which triggers the creation of workers processes
    # The workers processes all wait to accept the socket on the same
    # control socket allowing the kernel to do the load balancing.
    # 
    # Working with a dummy rack app which returns a simple text message
    # hence the config.ru file read.
    def run         
      # copied from unicorn-4.2.1
      # refer unicorn.rb and lib/unicorn/http_server.rb           
      raw_data = File.read("config.ru")           
      app = "::Rack::Builder.new {\n#{raw_data}\n}.to_app"
      @inner_app = eval(app, TOPLEVEL_BINDING)
      child_pids = []
      CONCURRENCY.times do
        child_pids << spawn_child
      end

      trap(:INT) {
        child_pids.each do |cpid|
          begin 
            Process.kill(:INT, cpid)
          rescue Errno::ESRCH
          end
        end

        exit
      }

      loop do
        pid = Process.wait
        puts "Process quit unexpectedly #{pid}"
        child_pids.delete(pid)
        child_pids << spawn_child
      end
    end

    # This is where the real work is done.
    def spawn_child
      fork do
        $STDOUT.puts "Forking child #{Process.pid}"
        loop do 
          @client = @control_socket.accept                                        
          loop do                     
            request = gets              

            if request                          
              respond(@inner_app.call(request))                           
            else
              $STDOUT.puts("No Request")
              @client.close                           
            end
          end
        end
      end
    end
  end
end

p = Server::Prefork.new(9799)
p.run

Could somebody explain to me why the reads fail with readpartial, read_nonblock or read? I would really appreciate some help on this.

Sexlimited answered 20/11, 2012 at 19:48 Comment(2)

The behavior you describe is the opposite of what the docs for EOFError, read_nonblock etc. say. gets should return nil, read_nonblock should raise EOFError. – Bacchanal 9/12, 2012 at 22:2

What happens if you only start up a single worker? It's odd to me that you assign an instance variable @client in the spawn_child method. Wouldn't each worker override that variable? Or does fork establish its own context? – Tempi 12/12, 2012 at 13:52

First, some background. EOF means end of file. It is the signal the caller gets when no more data can be read from the data source: for example, after you open a File and read it to the end, or when the other side of an IO stream is closed.

Then there are several differences between these four methods:

gets reads one line from the stream. Ruby uses $/ as the default line delimiter, but you can pass a different delimiter as an argument, which helps when client and server run on different operating systems and use different line endings. It is a blocking method: it blocks until it sees a line delimiter or EOF. At EOF it returns nil, so gets never raises EOFError.
read(length) reads length bytes from the stream. It is a blocking method. If length is omitted it blocks until EOF; if length is given it returns only once it has read that many bytes or reached EOF. At EOF it returns an empty string when length is omitted and nil when length is given, so read never raises EOFError.
readpartial(maxlen) reads at most maxlen bytes from the stream. It returns whatever data is already available, like an eager version of read, so you can use it instead of read when you do not want to wait for the full amount of data. It is still a blocking method: it blocks when no data is available at all. readpartial raises EOFError at EOF.
read_nonblock(maxlen) is like readpartial but, as the name says, non-blocking: when no data is available it immediately raises Errno::EAGAIN (extended with IO::WaitReadable), which only means there is no data right now. You have to handle this error. The rescue clause normally calls IO.select([conn]) first, which blocks until conn becomes readable, and then retries, so that you do not spin in a busy loop. read_nonblock raises EOFError at EOF.
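
The EOF behaviour of the four methods can be checked on a pipe whose write end has already been closed. This is only a small sketch to illustrate the list above; the helper name eof_behaviour is made up for the example and is not part of Ruby.

```ruby
# Hypothetical helper (not a Ruby API): reports what a read method does at EOF.
# Each call gets a fresh pipe whose write end is already closed,
# so the reader is at EOF from the very first read.
def eof_behaviour(method_name, *args)
  reader, writer = IO.pipe
  writer.close
  reader.public_send(method_name, *args).inspect
rescue EOFError
  "raises EOFError"
ensure
  reader.close if reader && !reader.closed?
end

puts "gets              -> #{eof_behaviour(:gets)}"
puts "read              -> #{eof_behaviour(:read)}"
puts "read(16)          -> #{eof_behaviour(:read, 16)}"
puts "readpartial(16)   -> #{eof_behaviour(:readpartial, 16)}"
puts "read_nonblock(16) -> #{eof_behaviour(:read_nonblock, 16)}"
```

Only readpartial and read_nonblock report "raises EOFError"; gets and read(16) give nil, and read without a length gives an empty string.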

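Now look at your example. You read the data produced by "hitting the URL", which is just an HTTP GET request: some text like "GET / HTTP/1.1\r\n". HTTP/1.1 connections are keep-alive by default, and the client keeps the socket open while it waits for your response, so readpartial or read_nonblock will never see an EOF while reading the request. Even with a Connection: close header in the request, the client closes the connection only after it has read the response. The end of the request headers is marked by a blank line, so change your gets method to stop there, as below: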

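def gets
  buffer = ""
  loop do
    line = @client.gets
    break if line.nil?          # EOF: the client closed the connection
    buffer << line
    break if line.strip.empty?  # a blank line ends the HTTP request headers
  end
  buffer
end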

You can't use read here, because you don't know the exact length of the request. Passing a large length, or omitting it, will make the call block.
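
If you still want non-blocking reads, the same idea applies: stop at the blank line that ends the request headers instead of waiting for EOF. The following is a rough sketch under that assumption, not unicorn's or kgio's actual implementation. The method name read_request_head and the constant HEADER_END are invented for this example, and request bodies are ignored.

```ruby
require "socket"

READ_CHUNK = 1024
HEADER_END = "\r\n\r\n" # blank line that terminates the HTTP request headers

# Hypothetical helper: reads from io with read_nonblock and returns as soon as
# the header terminator has arrived, or when the peer closes the connection.
def read_request_head(io)
  buffer = +""
  until buffer.include?(HEADER_END)
    begin
      buffer << io.read_nonblock(READ_CHUNK)
    rescue IO::WaitReadable
      IO.select([io]) # sleep until the socket is readable, then try again
      retry
    rescue EOFError
      break # peer closed the connection before finishing the request
    end
  end
  buffer
end

# A socket pair stands in for the browser/server connection.
client, server = UNIXSocket.pair
client.write("GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
# The client does NOT close here, just like a browser waiting for a response.
puts read_request_head(server)
client.close
server.close
```

The loop ends because the blank line has been seen, not because of EOF, so it returns even though the client keeps its end of the connection open.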

Inexplicable answered 27/12, 2012 at 15:7 Comment(0)

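EXIT_SYMBOL = 'q'
io = $stdin # the original opened IO.new(2); reading standard input is assumed to be the intent

loop do
  begin
    chunk = io.read_nonblock(256)
    break if chunk.include?(EXIT_SYMBOL) # stop once the exit symbol was typed
  rescue IO::WaitReadable
    IO.select([io]) # wait for input instead of spinning in a busy loop
    retry
  rescue EOFError
    break # input was closed
  end
end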

To exit, type 'q' and then press Enter.

Archaize answered 9/8, 2022 at 8:24 Comment(0)
