logoalt Hacker News

gchamonlivelast Tuesday at 3:40 PM2 repliesview on HN

Just to be clear, this isn't default systemd timer behaviour, you need to opt in by setting Persistent=true. If you have hundreds of jobs like this you need a proper queue and neither cronie nor systemd is the right tool because at that scale you'd surely need better observability


Replies

gchamonlivelast Tuesday at 6:52 PM

You could implement this with a gitlab instance in a separate system, like two VMs in proxmox or two physical machines, and a shell executor running in them. Gitlab CI has a nice feature to limit concurrency by using resource groups. Say you have 500 jobs spread through the day and the system stays offline for a while, when it comes online it'll start processing the jobs, but will only run a single one at a time. You get visibility, logs, queue monitoring and an API to query data.

zbentleylast Tuesday at 8:35 PM

> If you have hundreds of jobs like this you need a proper queue and neither cronie nor systemd is the right tool

Eh sometimes, but you can get pretty far with one of two approaches:

1. Careful use of Requires= and Wants= to group your scripts into chains of jobs, which achieves fixed parallel (though at 100s of jobs, I hope you're generating those unit files with a tool like Puppet or https://github.com/karlicoss/dron or something and not doing this by hand).

2. Even better, just use a lockfile. `ExecStart="flock -F $TMPDIR/mylock <command>"` is pretty hard to beat. Use -F so as not to confuse KillMode and resource accounting and you're golden. Just don't use flock(1) timeouts; let systemd handle that. Heck, if you have that many cron jobs, you should be doing this even if you don't use systemd; otherwise job latency changes can cause reboot-style thundering herds out of the blue.

If you need semaphore behavior and still don't want a real job queue, waitlock (https://github.com/bigattichouse/waitlock) and many other CLIs have you covered.