I have an app that was written with Flask+SQLALchemy+Celery, RabbitMQ as a broker, database is PostgreSQL (PostgreSQL 10.11 (Ubuntu 10.11-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, 64-bit). Database is hosting in DigitalOcean (1 CPU, 2Gb RAM). All app workers (flask or celery) are starting in Supervisor.
In my project for connection to DB I'm using flask_sqlalchemy package like this:
from flask_sqlalchemy import SQLAlchemy
from flask import Flask
# Init
app = Flask(__name__)
# Create the connection to database
db = SQLAlchemy(app)
I wrote some login in Flask app, tested it and then copied it to the Celery project (in which db connection is the same). So now my example celery task looked like this:
@celery.task(name='example_task', queue='default')
def example_task(payload):
""" Some logic here """
data = ExampleModel.query.filter(ExampleModel.id == payload["id"]).first()
""" Some another app logic """
db.session.add(SecondModel(payload))
db.session.commit()
return {"success": True}
And the problem is when I'm running my app on my laptop, it's OK, everything is working fine, no errors. When I uploaded my app on VPS and there was not so much users, everything was ok too. But some time later when there are 30+ users on the same time and they calling this example_task, at some time errors began to appear periodically on very simple queries for select some data from database:
File "/venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
context)
File "/venv/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
cursor.execute(statement, parameters)
psycopg2.OperationalError: SSL SYSCALL error: EOF detected
The above exception was the direct cause of the following exception:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) SSL SYSCALL error: EOF detected
[SQL: 'SELECT example_model.id AS example_model_id, example_model.key AS example_model_key
\nFROM example_model \nWHERE example_model.id = %(id_1)s \n LIMIT %(param_1)s'] [parameters: {'id_1': 2, 'param_1': 1}] (Background on this error at: http://sqlalche.me/e/e3q8)
Sometimes, but very-very rare I saw in logs this error:
psycopg2.OperationalError: SSL error: decryption failed or bad record mac
I wrote a example exception decorator that handle the errors (for example any errors, not only the SQLAlchemy errors), and after it catched the error I do a db.session.rollback()
def exception_log(func):
@wraps(func)
def wrapper(*args, **kwargs):
try:
return func(*args, **kwargs)
except Exception as err:
# Do the rollback
db.session.rollback()
# Call function again
return func(*args, **kwargs)
But it didn't helped me, because yes, it's reloads the connection for db, function worked fine then, but app starts to work slower and slower and at some point I should do a reload of workers in Supervisor. I saw a lot of idle connections in PostgreSQL, I set the idle transaction timeout for 5 mins, but it didn't helped.
SET SESSION idle_in_transaction_session_timeout = '5min';
I dont't know what to do next, because only solution that helps now is to reload app workers in supervisor any time I see that app is working slower and slower.
psycopg2
orpsycopg2-binary
? – Tarnopol