Hacker News

Replacing cron jobs with a centralized task scheduler

174 points by tlf 07/28/2025 | 92 comments

Comments

gnat 08/01/2025

I find the best comments here to be ones where people use their knowledge and experience to discuss the relative strengths and weaknesses of the technology in the post. I see a bunch of short single-sentence comments here that add no value.

For my part, I see this pattern repeatedly at different places. The raw tools in the platforms are too codey and the third-party frameworks like Temporal seem overkill, so you build a scheduler and need to solve the problems OP did: only run once, know if it errored, etc.

But it's amazing how "it's firing off a basic action!" becomes a script, then becomes a script composed of reusable actions that can pick up where they left off in case of errors ... Over time your "it's just enough for us!" feature creeps towards the framework's functionality.

I'd be curious to know how long the OP's solution stays simple before it submits to the feature creep demands. (Long may complexity be fought off, though! Every day you can live without the complexity of full workflows is a blessing)

Felk 08/01/2025

I see that the author took a heuristic approach to retrying tasks (having a predetermined amount of time a task is expected to take, and considering it failed if it wasn't updated in time) and uses SQS. If the solution is homemade anyway, I can only recommend leveraging your database's transactionality for this, which is a common pattern I have often seen recommended and have also used successfully myself:

- At processing start, update the schedule entry to 'executing', then open a new transaction and lock it, while skipping already locked tasks (`SELECT ... FOR UPDATE SKIP LOCKED`).

- At the end of processing, set it to 'COMPLETED' and commit. This also releases the lock.

This has the following nice characteristics:

- You can have parallel processors polling tasks directly from the database without another queueing mechanism like SQS, and have no risk of them picking the same task.

- If you find an unlocked task in 'executing', you know the processor died for sure. No heuristic needed.
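The two steps above can be sketched roughly as follows. This is a minimal illustration, not the author's or the commenter's actual code: the `tasks` table, its `status` and `run_at` columns, the 'PENDING' state and the `run_one_task` helper are all assumptions, and `conn` is assumed to be a PostgreSQL DB-API connection that uses `%s` placeholders and is not in autocommit mode (as with psycopg2).

```python
# Sketch only: the "tasks" table and its columns are assumed for illustration.
CLAIM_SQL = """
    SELECT id FROM tasks
    WHERE status = 'PENDING' AND run_at <= now()
    ORDER BY run_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
"""
LOCK_SQL = """
    SELECT id FROM tasks
    WHERE id = %s AND status = 'EXECUTING'
    FOR UPDATE SKIP LOCKED
"""
MARK_SQL = "UPDATE tasks SET status = %s WHERE id = %s"


def run_one_task(conn, handler):
    """Claim one due task, run handler(task_id), and mark it COMPLETED.

    Returns the task id, or None if nothing could be claimed.
    """
    cur = conn.cursor()

    # Transaction 1: pick a due task nobody else holds and publish 'EXECUTING'.
    cur.execute(CLAIM_SQL)
    row = cur.fetchone()
    if row is None:
        conn.rollback()
        return None
    task_id = row[0]
    cur.execute(MARK_SQL, ("EXECUTING", task_id))
    conn.commit()

    # Transaction 2: hold a row lock for the whole duration of the work.
    cur.execute(LOCK_SQL, (task_id,))
    if cur.fetchone() is None:  # another worker locked it first
        conn.rollback()
        return None
    try:
        handler(task_id)
        cur.execute(MARK_SQL, ("COMPLETED", task_id))
        conn.commit()  # committing also releases the row lock
    except Exception:
        # Rollback releases the lock; the row stays 'EXECUTING' and unlocked,
        # which is the "processor died or failed" signal described above.
        conn.rollback()
        raise
    return task_id
```

A separate recovery query could then select rows with status 'EXECUTING' using the same `FOR UPDATE SKIP LOCKED` clause: anything it returns is not held by a live worker. Note that there is a short window between the first commit and the second lock in which a row also looks that way, so a real implementation would need to account for it.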

burnt-resistor 08/01/2025

Jobs that need retries, atomicity, monitoring, rescheduling, ad hoc scheduling, and flexibility probably aren't suited to most cron servers.

Beanstalkd, Cronicle, Agenda, Sidekiq, Faktory, Celery, etc. are the usual suspects.

What is often missing is HA of the controller service process.

sunshine-o 08/01/2025

Is there a cool lightweight alternative to cron for (at least) a single host?

To illustrate what I am looking for, I often end up using supervisord [0] (but I also like immortal [1]) for process control when not on a systemd enabled system. In my experience they are reliable, lightweight and a pleasure to work with.

I am looking for something similar for scheduled jobs.

- [0] https://supervisord.org/

- [1] https://immortal.run/

jiggunjer 08/01/2025

Aka workflow orchestrator, pipeline manager, process runner, automation tool.

It's not clear if they used a product or DIY solution. The nice thing many existing products offer is a web UI and a database.

chriscbr 08/02/2025

On my current team we run a centralized task scheduler used by other products in our company that manages on the order of 30M schedules. To that end, it's a home-grown distributed system that's built on top of Postgres and Cassandra with a whole control plane and data plane. It's been pretty fun to work on.

There are two main differences between our system and the one in the post:

- In our scheduler, the actual cron (aka recurrence rule) is stored along with the task information. That is, you specify a period (like "every 5 minutes" or "every second Tuesday at 2am") and the task will run according to that schedule. We try to support most of the RRule specification. [1] If you want a task to just run one time in the future, you can totally do that too, but that's not our most common use case internally.

- Our scheduler doesn't perform a wide variety of tasks. To maximize flexibility and system throughput, it does just one thing: when a schedule is "due", it puts a message onto a queue. (Internally we have two queueing systems it interops with: an older one built on top of Redis, and a newer one built on PG + S3.) Other teams consume from those queues and do real work (sending emails, generating reports, etc). The queueing systems offer a number of delivery options (delayed messages, TTLs, retries, dead-letter queues) so the scheduling system doesn't have to handle it.
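As a rough illustration of the "when a schedule is due, put a message onto a queue" design, here is a minimal sketch. It is not the system described above: the `Schedule` class, the `tick` function and the `enqueue` callback are hypothetical names, and a fixed interval stands in for a full recurrence rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Schedule:
    name: str
    interval: timedelta  # stand-in for a full recurrence rule (assumption)
    next_run: datetime
    queue: str           # name of the queue that receives the "due" message


def tick(schedules, now, enqueue):
    """Enqueue one message per due schedule and advance its next_run.

    `enqueue(queue_name, payload)` is a hypothetical callback standing in
    for whatever queueing system is actually used.
    """
    fired = 0
    for s in schedules:
        if s.next_run <= now:
            enqueue(s.queue, {"schedule": s.name, "due_at": s.next_run.isoformat()})
            # Skip any missed periods so a late tick fires once, not N times.
            while s.next_run <= now:
                s.next_run += s.interval
            fired += 1
    return fired
```

The scheduler never executes the work itself, so retries, TTLs and dead-lettering stay the queue's responsibility, which matches the division of labor described in the comment.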

Ironically, because supporting a high throughput of scheduled jobs has been our biggest priority, visibility into individual task executions is a bit limited in our system today. For example, our API doesn't expose data about when a schedule last ran, but it's something on our longer term roadmap.

[1] https://icalendar.org/iCalendar-RFC-5545/3-8-5-3-recurrence-...

jiehong 08/02/2025

I think BMW used to use a paid product named Control-M to handle this (from BMC, still exists).

It contained what people quickly need to reach for:

- schedule a job in UTC or local time zone for a particular place;

- schedule a job but only if another job ran beforehand;

- semaphore-like resource limits on jobs.

It did this with jobs generating resource tokens and other jobs stating a token as a condition for being scheduled.
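The token idea can be sketched in a few lines. This is only an illustration of the "run only if another job ran beforehand" condition, not Control-M's actual model or API; the `Job` class and `run_in_dependency_order` function are hypothetical, and resource limits are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    needs: set = field(default_factory=set)     # tokens required before scheduling
    produces: set = field(default_factory=set)  # tokens emitted on success


def run_in_dependency_order(jobs):
    """Run jobs whose required tokens exist until no more progress is possible.

    Returns the job names in the order they ran; jobs whose tokens never
    appear are left out of the result.
    """
    tokens, order, pending = set(), [], list(jobs)
    progressed = True
    while pending and progressed:
        progressed = False
        for job in list(pending):
            if job.needs <= tokens:
                order.append(job.name)  # a real system would execute the job here
                tokens |= job.produces
                pending.remove(job)
                progressed = True
    return order
```

Because dependencies are expressed only through tokens, a job that never runs shows up as a missing token rather than as an explicit edge, which may be part of why such systems are awkward to debug.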

It ended up being a not so nice system to debug to be honest, but worked fine.

For simple jobs, I’d reach for systemd timers on a single machine, a Kubernetes CronJob on a given platform, or something external altogether otherwise (for geo-distributed scheduled jobs).

sontek 08/01/2025

I love this solution, I've implemented a very similar task scheduler at many companies.

I do think the best solution for this is still RabbitMQ. It lets you push tasks onto the queue and tell it to deliver them at a very specific time, a feature called "Delayed Messages", and then it just processes them at that time.

UltraSane 08/01/2025

The Windows Task Scheduler is actually very nice and powerful. One cool trick is to have a task triggered by a Windows event.

dthedavid 08/01/2025

Great work. Did you consider buying instead of building? I’ve worked at organizations that built similar systems, but what was often lacking was developer experience, observability, and scalability, basically everything outside of core functionality; essentially the stuff that you're trying to tack on as you improve your system.

Now that I'm building on my own, I’ve thought about building as well, but I’ve found that off-the-shelf systems handle all of this far better (and they are open source too), e.g. trigger-dot-dev and many others.

dmitry-vsl 08/01/2025

> We had createScheduledPosts.ts that would run every 15 minutes, scan our table of scheduled posts and create any that needed to be published.

Why not set the publication_date when you create a post and have a function getPublishedPosts that fetches a list of posts, filtering out those whose publication_date is still in the future? With this approach, you don't need cron jobs at all.
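A minimal sketch of that read-time filter, assuming posts are plain dicts with a timezone-aware `publication_date` (the dict shape and the Python spelling `get_published_posts` are assumptions made for illustration):

```python
from datetime import datetime, timezone


def get_published_posts(posts, now=None):
    """Return only the posts whose publication_date has been reached.

    `posts` is an iterable of dicts with a timezone-aware `publication_date`.
    """
    now = now or datetime.now(timezone.utc)
    return [p for p in posts if p["publication_date"] <= now]
```

In a real application the same condition would more likely live in the query's WHERE clause than in application code.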

meatmanek 08/01/2025

Why use a 1 minute cron job to run the tasks, instead of a continuously-running queue worker (or several)?

shireboy 08/01/2025

One gotcha with a roll-your-own task scheduler is running it across multiple machines. If you need 5 machines running different scheduled tasks, you need a locking mechanism to ensure only one machine is processing a given task. In the author’s approach this is handled by the queue, but in my read the scheduler itself can only run on one machine or you get multiples of the same task in the queue. Retries can get more complicated: depending on the failure you may want an exponential backoff, retrying N times and waiting longer between attempts. A nice dashboard to see the status of everything is helpful also.

In .NET world I use Hangfire for this. In Node (I assume what this is) I tinkered with Bull, but not sure what best in class is there.
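The exponential backoff mentioned above can be sketched as follows. This is a generic illustration, not how Hangfire or Bull implement it; the function names and default values are assumptions.

```python
import random
import time


def backoff_delay(attempt, base=1.0, factor=2.0, cap=300.0, jitter=0.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    delay = min(cap, base * factor ** attempt)
    # Optional jitter spreads retries out so failed tasks don't retry in lockstep.
    return delay + random.uniform(0, jitter * delay)


def retry(task, max_retries=5, sleep=None, **backoff):
    """Call task() until it succeeds or max_retries retries are used up."""
    sleep = sleep or time.sleep
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(backoff_delay(attempt, **backoff))
```

With the defaults the waits grow as 1, 2, 4, 8 seconds and so on, capped at 300 seconds. A production version would usually retry only on errors known to be transient.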

jusonchan81 08/02/2025

Unmeshed.io is a newer startup in the space, and it works like a charm. Temporal seems more targeted at durable execution, but scheduling is a different game. It starts with crons, but soon you have to deal with holidays, ad hoc skips and holds, and more, especially during maintenance and upgrades.

Unmeshed has all of these, including managing holiday calendars, and makes it super easy. It even has agents for AS400 server commands if that is still a thing you need.

rashidae 08/01/2025

What happens when the DB gets large? How do you handle idempotency? (What if SQS delivers twice?) The cron job is still a single point of failure...

shawn_w 08/01/2025

Isn't a "centralized task scheduler" pretty much what cron is?

majkinetor 08/01/2025

I find Rundeck is great for this. I've been using it with hundreds of jobs for a decade, with a bunch of users accessing it and checking logs, and with retries, notifications and all the enterprise thingies for free. It also provides an easy way to have a GUI for scripts.

pjmlp 08/01/2025

If they are using AWS, why not use what AWS already has, battle-tested, for task scheduling?

83457 08/02/2025

I looked around years ago and found Rundeck to be a good system for scheduled tasks.

pinko 08/01/2025

HTCondor is always an option. Lacks shiny tinfoil, but works like a tank.

pokstad 08/01/2025

Temporal.io is made for this

d00mB0t 07/28/2025

You forgot D-Bus.

_wire_ 07/28/2025

Next thing you know you'll have systemd.

emchammer 08/01/2025

[flagged]
