How can I make Ansible fail when the systemd service fails to start?

I have a systemd service that I deploy and want to be started by Ansible.

My systemd service unit file is this:

[Unit]
Description=Collector service
After=network.target mariadb.service
Requires=mariadb.service

[Service]
Type=simple
ExecStart=/opt/collector/app.py
WorkingDirectory=/opt/collector
Restart=on-abort
User=root

[Install]
WantedBy=multi-user.target

I am using Type=simple since this looks like the correct solution (also the preferred one in this Question).

I tried using Type=oneshot as well (as suggested by the user who initially marked this question as a duplicate of this question), but the problem is that the /opt/collector/app.py script is a long-running process:

while True:
    t = threading.Thread(...)
    t.start()
    t.join()
    time.sleep(15)

and with Type=oneshot, Ansible will block forever.

And my Ansible starting code is:

- name: start Collector service
  systemd:
    name: collector
    state: started
    enabled: yes

On the target system, systemctl will display:

[root@srv01 /]# systemctl
  UNIT                           LOAD   ACTIVE     SUB       DESCRIPTION
  dev-sda1.device                loaded activating tentative /dev/sda1
  -.mount                        loaded active     mounted   /
  dev-mqueue.mount               loaded active     mounted   POSIX Message Queue File System
  etc-hostname.mount             loaded active     mounted   /etc/hostname
  etc-hosts.mount                loaded active     mounted   /etc/hosts
  etc-resolv.conf.mount          loaded active     mounted   /etc/resolv.conf
  run-user-0.mount               loaded active     mounted   /run/user/0
  session-73.scope               loaded active     running   Session 73 of user root
  crond.service                  loaded active     running   Command Scheduler
  dbus.service                   loaded active     running   D-Bus System Message Bus
  haproxy.service                loaded active     running   HAProxy Load Balancer
● collector.service              loaded failed     failed    Collector service
....

The service fails because of an exception in the Python process (it uses an undefined variable).

But my Ansible playbook run does not fail:

TASK [inventory : start Collector service] *********************************
changed: [srv01]

I tried with both systemd and service Ansible modules and the behavior is the same.

How can I make Ansible:

  • fail when the systemd unit fails to start?
  • not block, while systemd still reaches the active/running state for a long-running while True process?
Calcariferous answered 29/7, 2018 at 19:6 Comment(6)
This question is not a duplicate of the mentioned question. See the last edits. oneshot is not a solution because of the long-running process that systemd needs to start. - Calcariferous
No one ever suggested oneshot was a solution for you; there are other options that you need to explore and choose from. I closed the question because you asked why it did not return an error, and the other answer explains that nicely. This is also not an Ansible problem, because systemctl start behaves exactly the same way, so you can delete all the Ansible talk from your question. You are left with an open and overly broad question: "how to write a daemon in Python". - Erle
But the problem is that systemd alone behaves correctly. The difficulty is getting this to run from Ansible (i.e. it either blocks forever or does not show the failures). If you have a solution for this with Ansible, you can post an answer. By acting as if you know everything, you might block others from giving or receiving answers. Cheers! - Calcariferous
Also, thinking in a puritan way, I have no idea why you pointed to another question that explicitly compares the simple and oneshot systemd options. For the "other options [...] to explore", I think you should have pointed me to freedesktop.org/software/systemd/man/systemd.service.html (which, by the way, I have read, but I have not yet found a solution to make it run from Ansible). - Calcariferous
@GabrielPetrovay I pointed you to that answer because it contains a description of Type=simple and why it will never work for you with that setting. ・ So you claim systemd is behaving differently than Ansible and differently than how it was designed (read the other answer), right? And you call that "correctly"? Behaving contrary to documentation? ・ Attributing a "thinking you know everything" attitude to someone who wanted to help is not only rude, but also counterproductive. - Erle
I am wondering if running with -vvvv (or more v's) would help show what is going on. This Ansible task is doing two things: starting and then enabling. Perhaps the first step fails but the second one succeeds, and that is the rc returned by Ansible. You could try breaking those two things into separate tasks. - Lustrate

I stumbled upon this when I had the same problem with silently failing services. I also found a bug report describing this issue, and after some research I found a workaround:

- name: start Collector service
  systemd:
    name: collector
    state: started
    enabled: yes

- name: make sure Collector service is really running
  command: systemctl is-active collector

Note that for Type=simple services this check will only fail if the service has already failed by the time the check runs, i.e. if it fails immediately after being started.
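
One way to shorten that blind spot is to wait briefly before checking, so that a service which crashes a few seconds after launch has time to reach the failed state. A minimal sketch, assuming a 5-second delay is long enough for app.py to hit its startup errors (the delay value is my assumption, not something taken from the bug report):

```yaml
- name: start Collector service
  systemd:
    name: collector
    state: started
    enabled: yes

# Assumption: 5 seconds is enough for the process to crash if it is going to.
- name: give Collector service time to fail
  pause:
    seconds: 5

# systemctl is-active exits non-zero unless the unit is active,
# so this task fails on its own when the service has died.
- name: make sure Collector service is still running
  command: systemctl is-active collector
  changed_when: false
```

A service that fails later than the delay will still go unnoticed, so this only narrows the window; it does not close it.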

Euphroe answered 5/8, 2019 at 18:20 Comment(0)

You can use failed_when, for example:

# "toto" is a placeholder for the name of the process to look for
- name: validating processes started correctly
  shell: pgrep toto | wc -l
  register: after_count
  failed_when: after_count.stdout_lines[0] != "1"

The failed_when condition fails the task if the number of processes returned is not 1.
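
The same failed_when idea can be pointed at systemd's own view of the unit rather than at a process count. A sketch, assuming the unit is named collector as in the question; the variable name collector_state is made up for illustration:

```yaml
- name: check Collector service state
  command: systemctl is-active collector
  register: collector_state
  changed_when: false
  # is-active prints the unit state (for example active, inactive or failed)
  failed_when: collector_state.stdout != "active"
```

Comparing against the printed state keeps the check independent of how many processes the service spawns.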

Advancement answered 6/8, 2019 at 16:56 Comment(0)
