B

5

Update:

If you have the same problem, first read "Indicate to an ajax process that the delayed job has completed". Thanks, Gene.

I have a problem with concurrency. I have a controller scraping a few web sites, but each call to my controller needs about 4-5 seconds to respond.

So if I call it 2 (or more) times in a row, the second call has to wait for the first one to finish before it starts.

So how can I fix this problem in my controller? Maybe with something like EventMachine?

Update & Example:

application_controller.rb

def func1
    i=0
    while i<=2
        puts "func1 at: #{Time.now}"
        sleep(2)
        i=i+1
    end
end

def func2
    j=0
    while j<=2
        puts "func2 at: #{Time.now}"
        sleep(1)
        j=j+1
    end
end

whatever_controller.rb

puts ">>>>>>>> Started At #{Time.now}"
  func1()
  func2()
puts "End at #{Time.now}"

So now I need to request http://myawesome.app/whatever several times at the same time from the same user/browser/etc.

I tried Heroku (and local) with Unicorn, but without success. This is my setup:

unicorn.rb http://pastebin.com/QL0wdGx0
Procfile http://pastebin.com/RrTtNWJZ
Heroku setup https://www.dropbox.com/s/wxwr5v4p61524tv/Screenshot%202014-02-20%2010.33.16.png

Requirements:

I need a RESTful solution. This is an API, so I need to respond with JSON.

More info: I currently have 2 cloud servers running:

Heroku with Unicorn
Engineyard Cloud with Nginx + Passenger

Burkes answered 18/2, 2014 at 20:25 Comment(3)

When you say that Unicorn didn't work, can you explain in more detail what happened? How are you testing it? – Playboy 22/2, 2014 at 4:56

Which version of rails are you running? – Byword 25/2, 2014 at 22:24

Why are you initiating the screen scraping from a controller? Can this just be a batch job that is scheduled? If so, I'd recommend a clockwork + sidekiq setup - will provide detailed answer if this sounds applicable. – Sagittate 27/2, 2014 at 1:17

N

2

For any long response time controller function, the delayed job gem is a fine way to go. While it is often used for bulk mailing, it works as well for any long-running task.

Your controller starts the delayed job and responds immediately with a page that has a placeholder - usually a graphic with a progress indicator - and Ajax or a timed reload that updates the page with the full information when it's available. Some information on how to approach this is in this SO article.

Not mentioned in the article is that you can use redis or some other memory cache to store the results rather than the main database.
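
A rough sketch of that flow, using only the Ruby standard library. A plain Thread stands in for the delayed job worker and a Hash stands in for redis; `JobStore`, `start_scrape`, and `job_status` are hypothetical names for illustration, not part of the delayed_job API.

```ruby
require "securerandom"
require "json"

# In-memory stand-in for redis or a results table (assumption: a real app
# would use a store shared between the web process and the job workers).
class JobStore
  def initialize
    @results = {}
    @lock = Mutex.new
  end

  def write(id, value)
    @lock.synchronize { @results[id] = value }
  end

  def read(id)
    @lock.synchronize { @results[id] }
  end
end

STORE = JobStore.new

# What the "start" endpoint would do: hand off the slow work and answer at once.
# The Thread is a stand-in for a delayed job worker.
def start_scrape(url)
  id = SecureRandom.uuid
  STORE.write(id, { status: "pending" })
  worker = Thread.new do
    sleep 0.2 # placeholder for the 4-5 second scrape
    STORE.write(id, { status: "done", data: "content of #{url}" })
  end
  [id, worker]
end

# What the "status" endpoint polled by Ajax would return as JSON.
def job_status(id)
  (STORE.read(id) || { status: "unknown" }).to_json
end

id, worker = start_scrape("http://example.com")
puts job_status(id) # answered immediately, before the scrape finishes
worker.join
puts job_status(id) # the finished result is now available
```

The client keeps the returned id and polls the status endpoint until it reports "done", so no web request has to stay open for the whole scrape.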

Nogood answered 26/2, 2014 at 10:13 Comment(1)

This proposal is EXACTLY what I wanted. I'll test this and confirm if so. – Burkes 28/2, 2014 at 9:40

A

4

You're probably using webrick in development mode. Webrick only handles one request at a time.

You have several options: many Ruby web servers exist that can handle concurrency.

Here are a few of them.

Thin

Thin was originally based on mongrel and uses eventmachine for handling multiple concurrent connections.

Unicorn

Unicorn uses a master process that manages a pool of web workers. Each worker handles one request at a time, so 4 workers means up to 4 concurrent requests.
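
As a configuration sketch, the worker count is set in the Unicorn config file. The values below are assumptions for illustration, not the asker's actual settings:

```ruby
# config/unicorn.rb (sketch; values are assumptions, not the asker's settings)
worker_processes 4   # 4 workers, so up to 4 requests are served at once
timeout 30           # seconds before a stuck worker is killed
preload_app true     # load the app once in the master, then fork the workers
```

This fragment is read by Unicorn at startup and is not meant to be run on its own.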

Puma

Puma is a relatively new Ruby server. Its standout feature is that it handles concurrent requests in threads, so make sure your code is thread-safe!

Passenger

Passenger is a Ruby server bundled inside nginx or Apache. It's great for production and development.

Others

These are a few alternatives. Many others exist, but I think these are the most used today.

To use all these servers, please check their instructions. They are generally available on their github README.

Apery answered 18/2, 2014 at 20:37 Comment(3)

Thank you man. You are right, I'm using webrick. Now I'm testing Unicorn in development (localhost) but I can't get it to work with multiple workers. This is my config/unicorn.rb gist.github.com/skozz/1ed6a4c2514a62427856 – Burkes 18/2, 2014 at 20:58

Btw, I'm using Heroku with dropbox.com/s/rzgrczywb0pl5k9/… in production – Burkes 18/2, 2014 at 20:59

FYI, Puma is new, but it's basically a modern take on the Mongrel server from Zed Shaw, which is old. I like it and have used it successfully as my default for some time; I find it quicker to get going with than Unicorn. My 2 cents. – Nussbaum 27/2, 2014 at 1:8

D

2

Answers above are part of the solution: you need a server environment that can properly dispatch concurrent requests to separate workers; unicorn or passenger can both work by creating workers in separate processes or threads. This allows many workers to sit around waiting while not blocking other incoming requests.

If you are building a typical bot whose main job is to get content from other sources, these solutions may be OK. But if what you need is a simple controller that can accept hundreds of concurrent requests, all of which are sending independent requests to other servers, you will need to manage threads or processes yourself. Your goal is to have many workers waiting to do a simple job, and one or more masters whose job it is to send requests, then be there to receive the responses. Ruby's Thread class is simple, and works well for cases like this with Ruby 2.x or 1.9.3.
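
A minimal sketch of the Thread approach, reusing `func1` and `func2` from the question with shorter sleeps. `run_concurrently` is a hypothetical helper, and the sketch assumes the real work is mostly waiting on I/O, which is where threads help on MRI.

```ruby
# The two functions from the question, with shorter sleeps so the demo
# finishes quickly (assumption: the real work is I/O such as HTTP calls).
def func1
  3.times do
    puts "func1 at: #{Time.now}"
    sleep(0.2)
  end
  :func1_done
end

def func2
  3.times do
    puts "func2 at: #{Time.now}"
    sleep(0.1)
  end
  :func2_done
end

# Hypothetical helper: run each callable in its own thread and collect the
# return values in order. Thread#value joins the thread before returning.
def run_concurrently(*tasks)
  tasks.map { |task| Thread.new { task.call } }.map(&:value)
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = run_concurrently(method(:func1), method(:func2))
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts "results: #{results.inspect}"
puts "elapsed: #{elapsed.round(2)}s (run one after the other, about 0.9s)"
```

Because both threads spend their time sleeping, the total is close to the slower function alone rather than the sum of the two.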

You would need to provide more detail about what you need to do for help getting to any more specific solution.

Disinfest answered 26/2, 2014 at 5:52 Comment(0)

T

1

Try something like Unicorn, as it handles concurrency via workers. Something else to consider, if there's a lot of work to be done per request, is to spin up a delayed_job per request.

The one issue with delayed job is that the response won't be synchronous, meaning it won't return to the user's browser.

However, you could have the delayed job save its responses to a table in the DB. Then you can query that table for all requests and their related responses.

Towne answered 25/2, 2014 at 16:51 Comment(0)

N

1

What ruby version are you utilizing?

Ruby & Webserver

Ruby

If it's a simple application, I would recommend the following. Try to utilize rubinius (rbx) or jruby, as they are better at concurrency. They do have a drawback: they're not mainline Ruby, so some extensions won't work. But if it's a simple app you should be fine.

Webserver

Use Puma, or Unicorn if you have the patience to set it up.

If your app is hitting the API service

You indicate that the Global Lock is killing you when you are scraping other sites (presumably ones that allow scraping). If this is the case, something like sidekiq or delayed job should be utilized, but with caution. These jobs need to be idempotent, i.e. safe to run more than once, because they might be run multiple times. If you start hitting a website multiple times, you will hit its rate limit pretty quickly, e.g. twitter limits you to 150 requests per hour. So use background jobs with caution.
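
One way to keep a job safe when it runs more than once is to record which inputs were already handled and skip repeats. The sketch below uses an in-memory Set as a stand-in for a shared store; `ScrapeJob` and `fetch_page` are hypothetical names, and neither sidekiq nor delayed job is actually used here.

```ruby
require "set"

# Hypothetical job class; neither sidekiq nor delayed_job is used here.
class ScrapeJob
  attr_reader :fetch_count

  def initialize
    @done = Set.new # stand-in for a shared store such as redis
    @lock = Mutex.new
    @fetch_count = 0
  end

  # Safe to call repeatedly with the same URL: only the first call fetches.
  def perform(url)
    # Set#add? returns nil when the URL was already recorded.
    first_run = @lock.synchronize { @done.add?(url) }
    return :skipped unless first_run

    fetch_page(url)
    :fetched
  end

  private

  # Placeholder for the real HTTP request to the scraped site.
  def fetch_page(_url)
    @lock.synchronize { @fetch_count += 1 }
  end
end

job = ScrapeJob.new
p job.perform("http://example.com/a")
p job.perform("http://example.com/a") # a retry of the same job
p job.fetch_count
```

Note that this sketch marks a URL as done before fetching it, so a failed fetch would not be retried; a real job would need to decide how to handle that case.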

If you're the one serving the data

However, reading your question, it sounds like your controller is the API and the lock is caused by users hitting it.

If this is the case you should utilize dalli + memcached to serve your data. This way you won't be I/O bound by the SQL lookup as memcached is memory based. MEMORY SPEED > I/O SPEED
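
The caching idea can be sketched without a memcached server. Below, a Hash with expiry times stands in for memcached; `TinyCache`, `slow_lookup`, and `show` are hypothetical names for illustration and are not the Dalli API.

```ruby
# Hypothetical in-memory cache with expiry, standing in for memcached.
class TinyCache
  def initialize
    @entries = {}
    @lock = Mutex.new
  end

  # Return the cached value for key, or run the block, store its result
  # for ttl seconds, and return it.
  def fetch(key, ttl)
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @lock.synchronize do
      entry = @entries[key]
      return entry[:value] if entry && entry[:expires_at] > now

      value = yield
      @entries[key] = { value: value, expires_at: now + ttl }
      value
    end
  end
end

CACHE = TinyCache.new
LOOKUPS = []

# Placeholder for the slow SQL lookup or scrape.
def slow_lookup(id)
  LOOKUPS << id
  { id: id, name: "item #{id}" }
end

def show(id)
  CACHE.fetch("item:#{id}", 60) { slow_lookup(id) }
end

show(1)
show(1) # served from the cache, slow_lookup is not called again
show(2)
puts "slow lookups performed: #{LOOKUPS.inspect}"
```

For simplicity the lock is held while the block runs, which serializes cache misses; a real memcached setup would not have that limitation.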

Nussbaum answered 27/2, 2014 at 1:23 Comment(0)