How to debug a failed systemctl service (code=exited, status=217/USER)?

I'm trying to add my first service on RHEL 7 (running on AWS EC2), but the service is not configured correctly, as this output shows:

[ec2-user@ip-172-30-1-96 ~]$ systemctl status clouddirectd.service -l
● clouddirectd.service - CloudDirect Daemon
   Loaded: loaded (/usr/lib/systemd/system/clouddirectd.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Tue 2018-01-09 16:09:42 EST; 8s ago
 Main PID: 10064 (code=exited, status=217/USER)

Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: clouddirectd.service: main process exited, code=exited, status=217/USER
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: Unit clouddirectd.service entered failed state.
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: clouddirectd.service failed.

Also:

[ec2-user@ip-172-30-1-96 ~]$ systemctl is-active clouddirectd
activating
[ec2-user@ip-172-30-1-96 ~]$ sudo systemctl list-units --type service --all | grep clouddirectd
  clouddirectd.service                                  loaded    activating auto-restart CloudDirect Daemon

And my unit file is:

[ec2-user@ip-172-30-1-96 ~]$ cat /usr/lib/systemd/system/clouddirectd.service
[Unit]
Description=CloudDirect Daemon
After=network.target

[Service]
Environment=AWS_SHARED_CREDENTIALS_FILE=/etc/sonar/.aws/credentials
#ExecStart=/usr/lib/sonar/clouddirect/virtualenv/bin/python /usr/bin/sonar/clouddirectd -c /etc/sonar/clouddirect/clouddirectd.conf
ExecStart=/usr/lib/sonar/clouddirect/virtualenv/bin/python /usr/bin/clouddirect -c /etc/sonar/clouddirect.conf
# @PERM@ allow group write permission on newly created files
UMask=0007
#User=clouddirectd
User=clouddirect
Group=sonar
KillSignal=SIGINT
TimeoutStopSec=60min
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Can you suggest how to debug this systemctl service so it won't keep dying and auto restarting?

Decrescendo answered 9/1, 2018 at 20:32 Comment(3)
I know I didn't come across and answer this question until 7 months after you posted it, but could you let me know if this answer applied to your issue?Flatus
Your answer, @JoshMc, was in the right direction (i.e., a problem with the username).Decrescendo
boardrider, I noticed a number of upvotes on my answer. If it fits your question, would you mind accepting it? I do mention it could be a simple typo, which I think is what you are indicating in your comment.Flatus

Exit status 217/USER indicates that the user did not exist at the time the service tried to start. In your case the user specified in the unit file is clouddirect.

 Main PID: 10064 (code=exited, status=217/USER)

Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: clouddirectd.service: main process exited, code=exited, status=217/USER

One cause is that this is not the actual user name (for example, it contains a typo). Another is that the user lives in an external user store (e.g. LDAP or Active Directory) and the service that gives the Linux server access to that store is not up yet. For example, vasd.service starts a product that lets Linux authenticate against Active Directory. If vasd.service is not up and you have specified a user that exists only in Active Directory, you would want to add that service to your After= line. For example:

After=network.target vasd.service
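
To test the first possibility, you can ask the system whether the account resolves at all. This is only a minimal sketch: user_exists is a hypothetical helper name, and clouddirect is the User= value from the unit file in the question. The id command looks accounts up through the system's normal name service, so it should also see LDAP or Active Directory users once those backends are reachable.

```shell
# Hypothetical helper: succeeds only if the account name can be resolved.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# "clouddirect" is the User= value from the unit file in the question.
if user_exists clouddirect; then
    echo "clouddirect resolves; look elsewhere for the cause"
else
    echo "clouddirect does not resolve; check for a typo or a missing account"
fi
```

If the account only resolves some time after boot, that points to the ordering problem described above rather than a typo.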
Flatus answered 9/8, 2018 at 18:18 Comment(3)
Or maybe you copied the service file from another server and the user doesn't exist on this one.Griffiths
would having a hyphen like "/home/ec2-user/" cause this?Legitimize
I have not seen that be a cause. Can you try with a user without an embedded hyphen to see if it makes a difference?Flatus

There are two parts to the question: how to diagnose a 217/USER, and how to fix it. I'll focus on the former.

For the 217/USER there are some good pointers here:

https://www.reddit.com/r/linuxquestions/comments/oaya49/systemd_service_not_starting_with_status217/

A 217 doesn't always mean a user problem; strictly, it only means the process exited with status 217. It may or may not be user-related.

You can use journalctl (for example, journalctl -u clouddirectd.service) to read the service's log and to see which other services come up after it first tries to start.
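
As a rough sketch of that journalctl step: the two helper names below are hypothetical, and the unit name is the one from the question. The first helper prints the current boot's log for one unit; the second counts the distinct "status=NNN/NAME" results in whatever log text it reads, which makes it easy to see whether every restart fails the same way.

```shell
# Hypothetical helper: print this boot's journal entries for one unit.
show_unit_log() {
    journalctl -u "$1" -b --no-pager
}

# Hypothetical helper: count each "status=NNN/NAME" result found on stdin.
count_exit_statuses() {
    grep -oE 'status=[0-9]+/[A-Z_]+' | sort | uniq -c
}

# Only attempt the real query on a machine that has journalctl.
if command -v journalctl >/dev/null 2>&1; then
    show_unit_log clouddirectd.service | count_exit_statuses
fi
```

Reading the journal may require root or membership in a group that has journal access, depending on how the machine is set up.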

It's possible that network users aren't yet available when the service starts during boot. You can fix that by adding After=nss-user-lookup.target (see https://systemd.io/UIDS-GIDS/), though that's not the case here, since the service still fails on later restarts, well after boot. systemd expects the specified user to be available when the service starts. System users, which run early-starting processes, therefore need to exist on the local box; users for processes started later can be network users.
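
For cases where the boot-time ordering really is the problem, the After=nss-user-lookup.target line can go into a drop-in instead of the packaged unit file. The sketch below is illustrative only: the helper name and the file name wait-for-users.conf are made up, and the second argument exists just so the function can be tried against a scratch directory before touching /etc/systemd/system.

```shell
# Hypothetical helper: write a drop-in that orders a unit after user lookup.
# $1 = unit name, $2 = base directory (defaults to /etc/systemd/system).
write_user_lookup_dropin() {
    dropin_dir="${2:-/etc/systemd/system}/$1.d"
    mkdir -p "$dropin_dir" &&
        printf '[Unit]\nAfter=nss-user-lookup.target\n' \
            > "$dropin_dir/wait-for-users.conf"   # file name is an assumption
}

# Usage (as root), followed by a reload so systemd picks up the change:
#   write_user_lookup_dropin clouddirectd.service && systemctl daemon-reload
```

An After= line in a drop-in adds to the ordering already declared in the unit file, so After=network.target stays in effect.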

You could also switch to the user, group, and environment you think systemd is using and run the command manually to see what happens (see https://serverfault.com/questions/410577/execute-a-command-from-another-group). It would help if systemd printed more debug output, so you could tell more easily what it is running.

In certain bizarre cases you may need to specify both User= and Group= https://superuser.com/a/1452367/39364

In our case (Red Hat 8), running "vintela status" printed the message "SELinux may not be configured correctly", and sure enough, after disabling SELinux the service started working as expected, with no more 217.

Filippa answered 11/1, 2022 at 18:4 Comment(0)
