Stop systemd from killing user slices on reboot

Our software is developed in-house and runs on SLES. It consists of Java, Oracle, a Tomcat web page for sysadmin, etc. We have a script that starts all these processes, and it had been working great until systemd.

The "env" script gathers info from config files and then calls other scripts to start Java, Oracle, etc. These other scripts "su" to a user such as "oracle".

I have a unit for this "env" script and start works. Stop works if I run "systemctl stop env".

My issue is that on reboot the first thing that happens is that ALL user processes are killed, including all the DBs, Java processes, etc. This basically crashes the DBs, since they aren't stopped cleanly. THEN the stop script tries to run and can't, because everything is already down.

I have tried KillUserProcesses=no, enable-linger, KillExcludeUsers=, and systemd-run --scope, and the processes still get killed.

Is there any way to have systemd NOT insta-kill user processes on reboot, or am I stuck having to figure out units for all the sub-scripts?

The stuff below is just to replicate the issue - not the actual scripts running.

I was able to replicate it with the below on SLES12SP2 (systemd 228). I built an Arch machine and it didn't do the kills.

One difference I noticed is that the sleep 600 ran in a user slice on SLES12 but in the system slice on Arch.

systemd-cgls on SLES12:

`-user.slice
  |-user-1000.slice
  | |-user@1000.service
  | | `-init.scope
  | |   |-1362 /usr/lib/systemd/systemd --user
  | |   `-1371 (sd-pam)                                                          
  | `-session-c1.scope
  |   `-1383 sleep 600

and on Arch:

└─system.slice
  ├─env.service
  │ └─276 sleep 600

A user slice and session aren't even created with the su on Arch.
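A quick way to confirm which slice a given process landed in, without reading the whole systemd-cgls tree, is to look at /proc/<pid>/cgroup. The sketch below is only an illustration: the helper names slice_from_path and cgroup_path_of are made up for this example, and it assumes the systemd hierarchy shows up either as a "name=systemd" line (cgroup v1) or as the "0::" line (cgroup v2).

```shell
#!/bin/bash
# Sketch: report the top-level slice (user.slice, system.slice, ...) of a PID.
# Helper names are hypothetical; nothing here is part of systemd itself.

# Print the first component of a cgroup path,
# e.g. /user.slice/user-1000.slice/session-c1.scope -> user.slice
slice_from_path() {
    local path="${1#/}"     # drop the leading slash
    echo "${path%%/*}"      # keep everything before the next slash
}

# Print the systemd cgroup path of a PID as recorded in /proc/<pid>/cgroup.
# cgroup v1 line: "1:name=systemd:/user.slice/..."
# cgroup v2 line: "0::/system.slice/..."
cgroup_path_of() {
    awk -F: '$2 == "name=systemd" || $1 == "0" { print $3; exit }' "/proc/$1/cgroup"
}

# Only do a lookup when a PID is passed on the command line.
if [ "$#" -gt 0 ]; then
    echo "PID $1 is in: $(slice_from_path "$(cgroup_path_of "$1")")"
fi
```

Run against the PID of the sleep 600 process, this should print user.slice on a host that behaves like the SLES12 box and system.slice on one that behaves like the Arch box.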

My service file:

[Unit]
Description=Starts and stops applications needed for an environment
Wants=network.target httpd.service
After=network.target httpd.service sshd.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/pro/bin/sys/services/envStart.sh start
ExecStop=/pro/bin/sys/services/envStart.sh stop
ExecReload=/pro/bin/sys/services/envStart.sh restart
TimeoutSec=3600

[Install]
WantedBy=multi-user.target

The envStart script:

#!/bin/bash
# Wrapper called by env.service; dispatches to the per-application scripts.

case "$1" in
    start)
        /pro/bin/sys/services/sleep.sh start
    ;;
    stop)
        /pro/bin/sys/services/sleep.sh stop
    ;;
esac

and the sleep script:

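#!/bin/bash
# Stand-in for a real application script.

case "$1" in
    start)
        echo "starting sleep"
        # On SLES12 this su call ends up in a session scope under user.slice
        su sleepuser -c "sleep 600 &"
    ;;
    stop)
        echo "stopping sleep"
        # Simulate a slow, clean shutdown
        sleep 300
    ;;
esac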
Sluggard answered 20/3, 2017 at 20:40 Comment(1)
Can you post your service file? Sounds to me like you're doing something quite weird there. - Marbling

I had the same or a very similar problem. For me the cause was the user switch, which made all processes start in user.slice instead of system.slice. Apparently nothing "important" is supposed to be running in user.slice, and systemd just kills all(?) processes there at shutdown/reboot. I solved it by removing all user switches (su/sudo) from my start scripts and using the User= directive in the unit file (User=xxx).
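As a rough illustration of that approach applied to the sleep example from the question: the unit name sleep-demo.service, the UNIT_DIR variable, and the /usr/bin/sleep path are assumptions made for this sketch, not anything from the original setup. The script writes the unit into a scratch directory by default so it can be inspected safely before anything is installed.

```shell
#!/bin/bash
# Sketch: generate a unit that lets systemd switch users itself (User=),
# so no su/sudo call opens a PAM session that lands in user.slice.
# "sleep-demo.service" and UNIT_DIR are hypothetical names for illustration.
set -euo pipefail

# Defaults to a scratch directory; a real install would use /etc/systemd/system.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"
unit_file="$UNIT_DIR/sleep-demo.service"

cat > "$unit_file" <<'EOF'
[Unit]
Description=Demo: run sleep as sleepuser without su

[Service]
Type=simple
User=sleepuser
ExecStart=/usr/bin/sleep 600

[Install]
WantedBy=multi-user.target
EOF

echo "wrote $unit_file"
# On a real host (as root), with UNIT_DIR=/etc/systemd/system:
#   systemctl daemon-reload && systemctl start sleep-demo.service
```

With a unit like this the process should show up under system.slice in systemd-cgls, because systemd starts it directly as sleepuser. The trade-off is one unit per user switch rather than a single wrapper script.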

Incommodity answered 25/9, 2019 at 8:28 Comment(0)

My solution (so far) was to comment out pam_systemd.so in common-session. Everything now runs in system.slice with no separate control groups. I am not sure of the impact of that yet, but at least things run, stay running, and get shut down cleanly.

Sluggard answered 24/3, 2017 at 13:57 Comment(0)

I'm still stuck with the same issue unfortunately.

My investigation reveals that as an alternative to using the User=xxx directive, the script could use "runuser" instead of "sudo" and "su", as this is an "su" implementation that bypasses PAM IIUC.

For most of the services I am managing, this does the trick.
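A minimal sketch of what that swap could look like for the su line in the question's sleep script. The start_as helper and the DRY_RUN switch are invented for this example. Whether runuser actually keeps the process out of user.slice depends on the PAM configuration the distribution ships for runuser, so it is worth checking with systemd-cgls afterwards.

```shell
#!/bin/bash
# Sketch: start a command as another user via runuser instead of su.
# start_as and DRY_RUN are hypothetical; runuser itself must be run as root.
start_as() {
    local user="$1"
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Show the command that would be executed
        echo "runuser -u $user -- $*"
    else
        # Background the process, as the original su -c "sleep 600 &" did
        runuser -u "$user" -- "$@" &
    fi
}

# Dry run of the replacement for: su sleepuser -c "sleep 600 &"
DRY_RUN=1 start_as sleepuser sleep 600
```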

I wish there was a way to tell systemd to ignore or delay killing user sessions somehow.

Aguila answered 31/1, 2020 at 21:15 Comment(1)
Unfortunately there is no difference for us. runuser results in the same kill as sudo. - Vc

I wish there was a way to tell systemd to ignore or delay killing user sessions somehow

Yes, by using SendSIGKILL=no. Here, though, the issue is not that systemd is killing the processes: during an OS restart the kernel kills all processes in the user slice. To avoid that, update your script to move all the processes from the user slice to the system slice before the reboot (/sys/fs/cgroup/systemd/system.slice//tasks>). Hope it helps. Thanks!
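A very rough sketch of the mechanics, assuming a cgroup v1 layout. The move_pids helper is invented for this example, and the target file is passed in as an argument rather than hard-coded, since the exact path depends on the service. Note that systemd does not expect other tools to rearrange its cgroup tree, so treat this as an experiment rather than a supported fix.

```shell
#!/bin/bash
# Sketch: write PIDs into a cgroup v1 membership file, one PID per write.
# move_pids is a hypothetical helper; the caller supplies the target file.
# Usage: move_pids TARGET_FILE PID...
move_pids() {
    local target="$1" pid
    shift
    for pid in "$@"; do
        # cgroup v1 accepts a single ID per write() call
        echo "$pid" > "$target" || return 1
    done
}

# Example (as root), with TARGET set to the service's cgroup file:
#   move_pids "$TARGET" $(pgrep -u oracle)
```

On cgroup v1, writing an ID to a "tasks" file moves only that one thread, while writing to the "cgroup.procs" file in the same directory moves the whole process, which matters for multi-threaded programs such as a JVM.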

Markley answered 30/6, 2022 at 6:0 Comment(0)
