How do I manage connection pooling to PostgreSQL from sidekiq?
The problem: I have a Rails application that runs a few hundred Sidekiq background processes. They all connect to a PostgreSQL database, which is not exactly happy about providing 250 connections - it can, but if all Sidekiq processes happen to send queries to the db at the same time, it crumbles.

Option 1: I have been thinking about putting pgBouncer in front of the db. However, I currently cannot use its transaction pooling mode, since I depend on setting the search_path at the beginning of each job to determine which "country" (PostgreSQL schema) to work on (apartment gem). That leaves session-based pooling, which would, as far as I know, require me to disconnect after each job in order to release the connection back into the pool - and that would be really costly performance-wise, wouldn't it? Am I missing something?
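For reference, a hypothetical pgbouncer.ini sketch of the session-pooling setup described above. Host, database name, and pool sizes are placeholders; the directives themselves are standard pgBouncer options:

```ini
; Session pooling preserves SET search_path for the life of a client
; connection, at the cost of tying up one server connection per connected
; client until it disconnects.
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp_production

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = session               ; transaction mode would lose search_path between queries
default_pool_size = 25            ; server connections per database/user pair
max_client_conn = 300             ; clients (sidekiq threads) allowed to connect at once
server_reset_query = DISCARD ALL  ; clears search_path when a client disconnects
```

This illustrates the trade-off from the question: session mode only frees a server connection when the client disconnects, so pooling gains depend on clients disconnecting between jobs.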

Option 2: Using application-layer connection pooling is of course also an option, but I'm not really sure how I would set that up for PostgreSQL with Sidekiq.
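For what application-layer pooling looks like in a Rails app: ActiveRecord already maintains a per-process connection pool, and sizing it to match Sidekiq's concurrency caps total connections at roughly processes × pool size. A hypothetical database.yml fragment (names and values are placeholders):

```yaml
# Each Sidekiq process checks out at most `pool` connections, one per
# worker thread, so 20 processes * pool 10 = 200 connections ceiling.
production:
  adapter: postgresql
  database: myapp_production
  pool: <%= ENV.fetch("RAILS_MAX_THREADS", 10) %>
```

The pool only limits connections within one process, though; it does not coordinate across processes the way pgBouncer does.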

Option 3: Something I have not thought of?

Swage answered 19/3, 2016 at 0:49 Comment(0)

Option 1: You're correct, session pooling would require you to disconnect and reconnect, and that adds overhead. How costly depends on your access pattern, i.e., what fraction the connection/TCP handshake makes up of the total work done, and what sort of latency you need. It is definitely worth benchmarking, but if the connections are short-lived, the overhead will be really noticeable.

Option 2/3: You could rate limit or throttle your Sidekiq jobs. There are a few projects tackling this...

Queue limits

  • Sidekiq Limit Fetch: Restricts the number of workers able to run specified queues simultaneously. You can pause queues and resize queue distribution dynamically. It also tracks the number of active workers per queue, supports global mode (multiple Sidekiq processes), and offers an additional blocking queue mode.
  • Sidekiq Throttler: Sidekiq::Throttler is a middleware for Sidekiq that adds the ability to rate limit job execution on a per-worker basis.
  • sidekiq-rate-limiter: Redis backed, per worker rate limits for job processing.
  • Sidekiq::Throttled: Concurrency and threshold throttling.

I got the above from the Sidekiq wiki's Related Projects page: https://github.com/mperham/sidekiq/wiki/Related-Projects
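To illustrate the queue-limit approach, here is a hypothetical sidekiq.yml fragment for sidekiq-limit_fetch. The :limits: key is from memory of that project's README, so verify it against the docs before relying on it:

```yaml
# Cap how many workers may run each queue at once. Fewer concurrent
# workers on a heavy queue means fewer simultaneous db connections
# hammering PostgreSQL.
:concurrency: 10
:queues:
  - default
  - imports
:limits:
  imports: 5   # at most 5 workers (hence ~5 busy db connections) for imports
```

The limit applies across processes (the gem coordinates through Redis), which is what makes it useful for protecting a shared database.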

If your application must have a connection per process and you cannot restructure it so that more threads can share a connection, then it's pgBouncer or application-based connection pooling. Either way, connection pooling will in effect throttle or limit your app in some fashion in order to save the DB.

Mita answered 20/4, 2016 at 8:13 Comment(1)
"if the connections are short lived then the overhead will be really noticeable" -> sounds like it would hurt, but we're talking milliseconds here (depending on where the pgBouncer instance is running - it may be able to run on the app server, skipping the network roundtrips). Take a look at this: depesz.com/2012/12/02/what-is-the-point-of-bouncing - Frecklefaced

Sidekiq should only require one connection per worker thread. If you set your concurrency to a reasonable value, say 10-25, you shouldn't need 250 simultaneous database connections. How many worker processes are you running, and what is their concurrency?

Also, you can see on that page that even with a high concurrency setting, you can still create a connection pool shared by the threads within that process.
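The idea of a pool shared by threads within one process can be sketched in plain Ruby. This is a toy stand-in: plain objects take the place of PG connections so the sketch runs anywhere, whereas a real app would pool PG::Connection objects via ActiveRecord's built-in pool or the connection_pool gem:

```ruby
require "thread" # Queue lives in the stdlib

# Toy application-level pool: a fixed number of "connections" shared by
# many worker threads.
class TinyPool
  def initialize(size:, &factory)
    @pool = Queue.new
    size.times { @pool << factory.call }
  end

  # Check a connection out, yield it, and always return it to the pool,
  # even if the block raises. Queue#pop blocks while the pool is empty,
  # which is exactly the throttling effect described above.
  def with_connection
    conn = @pool.pop
    yield conn
  ensure
    @pool << conn if conn
  end
end

# 25 "jobs" (threads) run over a pool of only 5 connections.
pool = TinyPool.new(size: 5) { Object.new }
ids  = Queue.new
25.times.map { Thread.new { pool.with_connection { |c| ids << c.object_id } } }
  .each(&:join)

results = []
results << ids.pop until ids.empty?
distinct = results.uniq.size
puts "#{results.size} jobs ran over #{distinct} pooled connections"
```

The blocking pop is the key property: when all 5 connections are busy, the 6th thread simply waits, so the database never sees more than 5 connections from this process.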

Allina answered 19/3, 2016 at 3:24 Comment(1)
This is not really relevant to my question. I need the 250 connections to be available, since I'm running 20 processes with 10 threads each. - Swage
