How to configure Celery Worker and Beat for Email Reporting in Apache Superset running on Docker?

I am running Superset via Docker. I enabled the Email Report feature and tried it out. However, I only receive the initial test email report; no emails arrive after that.

This is my CeleryConfig in superset_config.py:

from celery.schedules import crontab  # needed for the CELERYBEAT_SCHEDULE below

class CeleryConfig(object):
    BROKER_URL = 'sqla+postgresql://superset:superset@db:5432/superset'
    CELERY_IMPORTS = (
        'superset.sql_lab',
        'superset.tasks',
    )
    CELERY_RESULT_BACKEND = 'db+postgresql://superset:superset@db:5432/superset'
    CELERYD_LOG_LEVEL = 'DEBUG'
    CELERYD_PREFETCH_MULTIPLIER = 10
    CELERY_ACKS_LATE = True
    CELERY_ANNOTATIONS = {
        'sql_lab.get_sql_results': {
            'rate_limit': '100/s',
        },
        'email_reports.send': {
            'rate_limit': '1/s',
            'time_limit': 120,
            'soft_time_limit': 150,
            'ignore_result': True,
        },
    }
    CELERYBEAT_SCHEDULE = {
        'email_reports.schedule_hourly': {
            'task': 'email_reports.schedule_hourly',
            'schedule': crontab(minute=1, hour='*'),
        },
    }

The documentation says I need to run both the Celery worker and Celery beat:

celery worker --app=superset.tasks.celery_app:app --pool=prefork -O fair -c 4
celery beat --app=superset.tasks.celery_app:app

I added them to the 'docker-compose.yml':

superset-worker:
    build: *superset-build
    command: >
      sh -c "celery worker --app=superset.tasks.celery_app:app -Ofair -f /app/celery_worker.log &&
             celery beat --app=superset.tasks.celery_app:app -f /app/celery_beat.log"
    env_file: docker/.env
    restart: unless-stopped
    depends_on: *superset-depends-on
    volumes: *superset-volumes

The Celery worker does work: the first (test) email is sent and its log file is created. However, Celery beat does not appear to run at all, and no 'celery_beat.log' is ever created.

If you'd like deeper insight, here's the commit with the full implementation of the functionality.

How do I correctly configure celery beat? How can I debug this?
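
For debugging, one quick sanity check is to look at what is actually running inside the container (a sketch; it assumes ps is available in the image):

# Is a beat process running at all, or only the worker?
docker-compose exec superset-worker ps -ef | grep celery

# Which tasks has the worker registered?
docker-compose exec superset-worker celery inspect registered --app=superset.tasks.celery_app:app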

Kotz asked 20/4/2020 at 20:03

I managed to solve it by altering the CeleryConfig implementation and adding a separate beat service to 'docker-compose.yml'. (The original command chained worker and beat with '&&', but 'celery worker' runs in the foreground and never exits, so beat was never started.)

New CeleryConfig class in 'superset_config.py':

from celery.schedules import crontab

# get_env_variable is a small helper that reads settings from the environment;
# in Superset's Docker setup it is defined near the top of superset_config.py.
REDIS_HOST = get_env_variable("REDIS_HOST")
REDIS_PORT = get_env_variable("REDIS_PORT")

class CeleryConfig(object):
    BROKER_URL = "redis://%s:%s/0" % (REDIS_HOST, REDIS_PORT)
    CELERY_IMPORTS = (
        'superset.sql_lab',
        'superset.tasks',
    )
    CELERY_RESULT_BACKEND = "redis://%s:%s/1" % (REDIS_HOST, REDIS_PORT)
    CELERY_ANNOTATIONS = {
        'sql_lab.get_sql_results': {
            'rate_limit': '100/s',
        },
        'email_reports.send': {
            'rate_limit': '1/s',
            'time_limit': 120,
            'soft_time_limit': 150,
            'ignore_result': True,
        },
    }
    CELERY_TASK_PROTOCOL = 1
    CELERYBEAT_SCHEDULE = {
        'email_reports.schedule_hourly': {
            'task': 'email_reports.schedule_hourly',
            'schedule': crontab(minute='1', hour='*'),
        },
    }

Changes in 'docker-compose.yml':

  superset-worker:
    build: *superset-build
    command: ["celery", "worker", "--app=superset.tasks.celery_app:app", "-Ofair"]
    env_file: docker/.env
    restart: unless-stopped
    depends_on: *superset-depends-on
    volumes: *superset-volumes

  superset-beat:
    build: *superset-build
    command: ["celery", "beat", "--app=superset.tasks.celery_app:app", "--pidfile=", "-f", "/app/celery_beat.log"]
    env_file: docker/.env
    restart: unless-stopped
    depends_on: *superset-depends-on
    volumes: *superset-volumes
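
To verify that beat actually fires the schedule, tailing its log and pinging the worker is usually enough (a sketch using the service names and log path from the compose file above):

# Beat logs one line per dispatched task, e.g.
# "Scheduler: Sending due task email_reports.schedule_hourly (email_reports.schedule_hourly)"
docker-compose exec superset-beat tail -f /app/celery_beat.log

# Check that a worker is up and consuming from the same Redis broker
docker-compose exec superset-worker celery inspect ping --app=superset.tasks.celery_app:app
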
Kotz answered 12/5/2020 at 11:32

Comments:
- What is superset-build? Is the config outdated? – Greyso
- @Greyso this was pre-Superset 1.0; superset-build corresponds to the versions prior to 1.0. – Kotz

I believe Celery needs to run inside your Superset container, so you'll need to modify your Dockerfile and entrypoint.
But you should really daemonize Celery first, so you don't have to monitor and restart it yourself (see "how to detect failure and auto restart celery worker" and http://docs.celeryproject.org/en/latest/userguide/daemonizing.html).
For an example of how to run a daemonized Celery process in Docker, see: Docker - Celery as a daemon - no pidfiles found
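
For reference, the celery multi approach from the daemonizing guide looks roughly like this (a sketch; the pid and log directories are illustrative and must exist and be writable):

celery multi start w1 --app=superset.tasks.celery_app:app --pidfile=/var/run/celery/%n.pid --logfile=/var/log/celery/%n%I.log
celery multi restart w1 --app=superset.tasks.celery_app:app --pidfile=/var/run/celery/%n.pid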

Auntie answered 21/4/2020 at 14:06

You can also add the -B flag to the celery worker command to run beat embedded in the worker:

celery worker --app=superset.tasks.celery_app:app --pool=prefork -O fair -c 4 -B
Bonucci answered 12/5/2020 at 10:41

Comments:
- That's not recommended in production when you have multiple workers. – Kotz
- It is better than two separate commands/processes in Docker. I have had -B in prod for 1.5 years, all fine. – Bonucci
- How many workers do you have running? The docs say: "You can also embed beat inside the worker by enabling the workers -B option, this is convenient if you'll never run more than one worker node, but it's not commonly used and for that reason isn't recommended for production use." – Kotz
- If you have only one worker then fine, but you shouldn't use -B with multiple workers. – Kotz
- There were 6 containers with different params for the workers, one of them had the -B option, and everything worked fine. – Bonucci
- @RyabchenkoAlexander I need help with some mail setup, can you help? – Weatherly
