Proxmox Backup issues

Backups are like insurance. If something fails or someone screws up, we can count on them. Before running system updates on the production systems, I went to take a fresh backup, and then I saw it: the existing backups were really old and there were no recent ones. After finding out that backups were not being taken, I wanted to get to the bottom of it.

The issue

Looking into the issue, I found that backups were being taken on the other node (pve01), but not on the main node (pve02).

On the main node (pve02), I saw that pvescheduler was not running.
When I tried to start pvescheduler manually with systemctl, the command just hung.
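
To look at the state of the units involved without risking another hung terminal, the check can be wrapped in a timeout. This is a minimal Python sketch and not what I ran at the time; the helper name unit_state, the UNITS list and the 5 second timeout are assumptions made for illustration. It only calls `systemctl is-active`, which prints a state such as active, inactive, failed or activating.

```python
import subprocess

# Units discussed in this post; adjust the list for your own node.
UNITS = ["pve-guests.service", "pvescheduler.service"]


def unit_state(unit, timeout=5.0, run=subprocess.run):
    """Return the state systemctl reports for a unit.

    Returns "timeout" if systemctl does not answer in time and
    "unknown" if systemctl cannot be executed or prints nothing.
    The run parameter exists only so the call can be faked in tests.
    """
    try:
        result = run(
            ["systemctl", "is-active", unit],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    except OSError:
        return "unknown"
    return result.stdout.strip() or "unknown"


def report(units=UNITS):
    """Map each unit name to its reported state."""
    return {unit: unit_state(unit) for unit in units}
```

Calling report() on the affected node returns a dictionary of unit names and states; anything other than active for either unit is worth a closer look.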

The fix

After reading this forum thread, I was able to fix it: https://forum.proxmox.com/threads/pvescheduler-is-dead-and-wont-start.120794/

pvescheduler depends on pve-guests running. If pve-guests isn't able to run, pvescheduler doesn't start and the backups don't work.

The pve-guests.service was stuck with a zombie child process. Killing the parent with SIGKILL unstuck the service.
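
For reference, zombies and their parents can be listed by reading /proc directly. The sketch below is an illustration under assumptions, not the exact steps I took: the function names parse_stat and find_zombies are made up, and it relies on the Linux /proc/<pid>/stat layout, where the process name sits in parentheses and is followed by the state and the parent PID.

```python
import os


def parse_stat(line):
    """Return (pid, name, state, ppid) from one /proc/<pid>/stat line."""
    # The name may itself contain spaces or ")", so split on the LAST ")".
    left, right = line.rsplit(")", 1)
    pid_str, name = left.split("(", 1)
    fields = right.split()
    return int(pid_str), name, fields[0], int(fields[1])


def find_zombies(proc_root="/proc"):
    """Return a list of (pid, name, ppid) for processes in state Z."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, name, state, ppid = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished or line was unreadable
        if state == "Z":
            zombies.append((pid, name, ppid))
    return zombies
```

Each tuple from find_zombies() includes the parent PID, which is the process that would have to be killed (for example with os.kill(ppid, signal.SIGKILL)) for the zombie to be reaped. Check what the parent is before doing that on a production node.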

After unsticking pve-guests, pvescheduler.service was able to start, and backups started being taken again.

What caused it

I have no idea. Maybe it happened after a boot.

@TODO to fix future issues

Create a Zabbix trigger that fires when it sees zombie processes.
Make Zabbix watch the sensitive processes.
Set up automated emails.
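
As a starting point for the email part, here is a rough Python sketch of how the findings could be turned into an alert message. Nothing here is set up yet: the function name build_alert, the addresses and the subject format are placeholders, and the inputs (a dictionary of unit states and a zombie count) are assumed to come from whatever check ends up being used.

```python
from email.message import EmailMessage


def build_alert(unit_states, zombie_count, host="pve02",
                sender="root@pve02.example", recipient="admin@example.com"):
    """Return an EmailMessage describing the problems, or None if all is well.

    unit_states maps a unit name to its reported state, for example
    {"pvescheduler.service": "inactive"}. The addresses are placeholders.
    """
    problems = [
        f"{unit} is {state}"
        for unit, state in sorted(unit_states.items())
        if state != "active"
    ]
    if zombie_count > 0:
        problems.append(f"{zombie_count} zombie process(es) found")
    if not problems:
        return None

    message = EmailMessage()
    message["Subject"] = f"[{host}] backup scheduler check: {len(problems)} problem(s)"
    message["From"] = sender
    message["To"] = recipient
    message.set_content("\n".join(problems))
    return message
```

If a mail relay is listening on the node, the returned message could be handed to smtplib.SMTP("localhost").send_message(message), for example from a cron job or systemd timer.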

