Given how taxing the "thundering herd" effect can be on mirrors and websites (RSS readers!), you'd think this sort of thing should've been in cron since at least the mid-90s.
Once again, OpenBSD with the simple, obvious solution, that everyone else kinda overlooked. I hope every other cron out there copies and ships this as soon as possible.
On the other hand, I wouldn't be able to understand what the hell this cron string is. I actually have no idea about the cron format despite having used it multiple times. I have to read the man page every time I use it. Also, different software implements it differently.
[Timer]
OnCalendar=daily
RandomizedDelaySec=12h
might take a few more seconds to type, but it's definitely readable without any additional documentation.
> The arguments to the directives are time spans configured in seconds. Example: "OnBootSec=50" means 50s after boot-up. The argument may also include time units. Example: "OnBootSec=5h 30min" means 5 hours and 30 minutes after boot-up.
Sec is a standard suffix for time values: anything ending with Sec accepts a value in seconds. 12h is shorthand for 'the number of seconds in 12 hours'.
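To make the "12h is just a number of seconds" point concrete, here is a minimal sketch of that kind of conversion. It is not systemd's parser: the function name `parse_span` and the unit table are assumptions for illustration, and only a small subset of the units systemd accepts is covered.

```python
import re

# Seconds per unit. A deliberately small subset (assumption, not systemd's full list);
# an empty unit means the bare number is already in seconds.
UNIT_SECONDS = {"": 1, "s": 1, "sec": 1, "m": 60, "min": 60, "h": 3600, "d": 86400}

TOKEN = re.compile(r"\s*(\d+)\s*([a-z]*)")


def parse_span(text: str) -> int:
    """Convert a span such as "5h 30min" to seconds; bare numbers count as seconds."""
    text = text.strip()
    if not text:
        raise ValueError("empty time span")
    total = 0
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse {text!r} at position {pos}")
        number, unit = match.groups()
        if unit not in UNIT_SECONDS:
            raise ValueError(f"unknown unit {unit!r}")
        total += int(number) * UNIT_SECONDS[unit]
        pos = match.end()
    return total


print(parse_span("50"))        # bare number, taken as seconds
print(parse_span("5h 30min"))  # hours plus minutes
print(parse_span("12h"))       # the RandomizedDelaySec value from the example
```

So `RandomizedDelaySec=12h` and `RandomizedDelaySec=43200` would mean the same thing under this reading.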
In our code at work we have constants like HOUR=3600 and RESTART_TIME_SECS = 6 * HOUR. It makes sense to me. If it doesn't for you, feel free to use something else I guess.
That's not a hack though, a hack is gluing things together to fix a certain specific bug that is not easily solved because the bug is related to the design instead of a mere mistake.
Otherwise with your standard of hack anything beyond hello world and baby's first input are hacks because everything else requires boilerplate.
It's not a hack because of the boilerplate - it's a hack because it's functionality implemented in the wrong place. I think the boilerplate made you believe it's not a hack by making it look professional(ish), i.e. someone spent time putting a lot of lipstick on that pig, man page and all.
FreeBSD cron(8) has -j for the daemon to add a random sleep of up to 60 seconds on each task. This was added in FreeBSD 5.3, committed 19 years ago. (FreeBSD 5.3 was released November 6, 2004.)
A random sleep of up to 60 seconds doesn't really solve the problem the OpenBSD changes do, especially when your jobs take longer than 60s.
For example, instead of "0-59/10" in the minutes field, "0~59/10" can be used to run a command every 10 minutes where the first command starts at a random offset in the range [0,9]. The high and low numbers are optional, "~/10" can be used instead.
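Here is a rough sketch of what that description implies for the minutes field. It follows only the quoted text, not OpenBSD's source; the function name `expand_random_step` and the `rng` parameter are made up for illustration, and when the offset actually gets drawn is not something the quote says.

```python
import random


def expand_random_step(low=0, high=59, step=10, rng=random):
    """Minutes matched by "low~high/step": a random offset in [0, step-1], then every step."""
    offset = rng.randrange(step)  # e.g. 0..9 for a step of 10
    return list(range(low + offset, high + 1, step))


# "0~59/10": six runs per hour, ten minutes apart, starting somewhere in minutes 0-9.
print(expand_random_step(0, 59, 10))
```

Two machines with the same crontab line would usually end up with different offsets, which is the whole point.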
That's surprising. You'd think spreading the workload start over 10 seconds would lower the size of spikes (integrated over a second) by at most a factor of 10.
But the above point is still true: many jobs take a few minutes to run. 60s of dispersion in start time is better than nothing, but you really want more.
(In this case, things are still quantized to a minute boundary, so you'd really want both).
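The "at most a factor of 10" intuition is just pigeonhole: N starts spread over W one-second buckets leave at least N/W in the busiest bucket. A toy simulation, assuming uniformly random start offsets (the function name and the client count are invented for the example):

```python
import random
from collections import Counter


def peak_per_second(clients, jitter_seconds, rng):
    """Largest number of job starts that land in the same one-second bucket."""
    if jitter_seconds <= 0:
        return clients  # no jitter: everyone starts in the same second
    starts = Counter(rng.randrange(jitter_seconds) for _ in range(clients))
    return max(starts.values())


rng = random.Random(1)  # seeded only so repeated runs agree
for window in (0, 10, 60):
    print(window, peak_per_second(1000, window, rng))
```

The peak can never drop below clients / window, so a bigger improvement than that has to come from somewhere else (caching, queueing, and so on).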
> That's surprising. You'd think spreading the workload start over 10 seconds would lower the size of spikes (integrated over a second) by at most a factor of 10.
If the delay is on the reading side, away from Akamai and behind a cache, then 10 concurrent requests for X might cause ten lots of data transfer because X isn't in the cache yet, whereas even a short delay between them lets the first request prime the local cache before the rest start.
There are a number of reasons a sudden glut of activity could balloon bandwidth or CPU/memory costs more than you might expect.
Without a chunk more detail about the system in question, this is just random speculation of course.
I'm not disputing that it prevents a subset of the same class of problem. It's just a wholly incomplete solution compared to the OpenBSD implementation, to the degree that it's disingenuous to say NetBSD already implemented it.
That threw me for a loop when I realized the last time I used FreeBSD was in the 4.x days - on a desktop, no less. That was actually something of a glory period, at least for the hardware I had at the time... Soundblaster OSS drivers that actually did hardware mixing, the proprietary Nvidia driver that actually gave working 3D acceleration on the card I had at the time (Geforce 2 GTS maybe?) - this was at least a year or two before that driver was released for Linux. I think it even had working Java.
It was such a breath of fresh air compared to Linux at the time because it was a coherent, engineered, documented system. When you didn't always have reliable internet (and at least for me, even when I did have it, it was something like 128K DSL), it was a huge deal to have well-written man pages, whereas on Linux half the time the man page would just tell you to scream into the void, err, run GNU info.
This was still in the period when the GPL scared off corps.
I first ran into randomized delays in cronie (it also uses ~ for this, and there's a RANDOM_DELAY variable too), after Red Hat switched to it years ago. Personally, I've never really used it, but it's nice that it's there.
Intriguing! I had no idea! Again, I wish the concept was more popular back when cron and the Internet were younger, and having it built-in and readily available goes a very long way. If someone doesn't introduce you to the concept, you have little chance of knowing better until you find yourself at the receiving end of a spike, yelling at the clouds.
I believe the philosophy was to leave it up to the service to behave sensibly, including things like having a circuit breaker, using some kind of backoff/retry, and generally being robust in the face of resource contention.
It kind of feels like this is putting the policy of "don't all go at once" into the cron mechanism, which is just starting jobs at desired times.
It's a neat idea to be sure but I fail to see how this will have any material impact.
Firstly, OpenBSD is a niche OS, meaning the absolute magnitude of OpenBSD cron jobs out "in the wild" is relatively low.
Second, my understanding is that this is a client-side feature. I.e. if I run a service, this feature only benefits me if a significant portion of my users opt into it.
Third, I have an unsubstantiated suspicion that cron usage relative to systemd usage is also on the decline.
FreeBSD and NetBSD also implement "-n". There doesn't seem to be a cross-platform port of OpenBSD cron like there are of doas and OpenBSD ksh. (Anybody want to try making one?) Cross-platform fcron has "erroronlymail".
That's gross. If the operation shouldn't run concurrently it should use an exclusive flock or similar. It's not just cron that can cause concurrent execution, and if that matters you generally want to robustly prevent it - not just if cron is the executor.
> It’s not gross. It’s a simple solution to a simple and common cron problem
No, it's gross. By providing that facility in the wrong place it discourages implementing it in the right place to people who come at the problem from the cron perspective.
Wrap the command in a flock-running script. That script goes in the crontab entry. When you're inevitably debugging your cron-scheduled command - paydirt! The command serializes itself still while you're manually testing instead of shitting itself.
Isn’t that the same? Just because you check a file lock in your script doesn’t mean that other invocations of the program without the script will check the lock.
Ok, but your original criticism stated that solving this in cron is bad because the program may be run outside of cron:
> It's not just cron that can cause concurrent execution, and if that matters you generally want to robustly prevent it - not just if cron is the executor.
So you did not like that exclusion only worked when triggered from cron. But in your case it also only works when triggered from your script. So cron just made your script an integrated feature and you’re essentially criticizing your own solution.
If your job is a real script: sure, handle locking in it.
If it's a single command or pipeline... adding the layer of indirection to have a script that runs flock is more opaque. Might as well put it in cron and trust cron to only run it once.
But it's unnecessary when you can do the same thing in 2 characters.
I like flock(1) and have known how to use it for 15 years. But there's sharp edges.
- It's not standardized. In particular, this means the OpenBSD base system doesn't even include it. It's not like the underlying flock(2) is very well behaved or consistent.
- You need to ask for nonblocking behavior.
- If your command or script can ever result in a daemon launching, the daemon may keep holding the lock even though the part of your action that is supposed to be protected by the lock (the script/immediate subprocess) has finished. So e.g. 'flock -n /tmp/relaunch-apache /etc/init.d/apache2 restart' could be a really bad idea. -u can fix this... in some cases.
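For anyone without flock(1) in the base system, the same non-blocking idea can be sketched with flock(2) directly. This is only an illustration, not a recommendation over the built-in cron option: `try_lock` and the lock path are made-up names, and it assumes a Unix system where Python's `fcntl` module is available.

```python
import fcntl
import os


def try_lock(path):
    """Return an open fd holding an exclusive lock on path, or None if someone else has it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB is the "ask for nonblocking behavior" part: fail instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


lock_path = f"/tmp/example-job-{os.getpid()}.lock"  # hypothetical lock file
fd = try_lock(lock_path)
if fd is None:
    print("previous run still active, skipping")
else:
    print("got the lock, doing the work")
    os.close(fd)  # closing the descriptor releases the lock
```

File descriptors created by `os.open` are non-inheritable by default in Python 3.4 and later, so a child started via `subprocess` does not silently keep the lock, which sidesteps the daemon problem from the last bullet in this particular setup.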
A de facto core tenet of UNIX is composability of disparate programs that do one thing well?
cron goes completely against that principle - after all, you can schedule jobs with the 'at' command, and to make a repeating task, you just make it exec 'at' again each time it is called. cron is for the lazy, no real UNIX hacker would dream of using such an extravagant single-use program. /s
> It's the exact opposite of reinventing the wheel.
If I need to write a wrapper script each time I need to run a task on a timer - then it is the proverbial reinventing the wheel. It doesn't matter if you call it a tool, utility or a component. Especially if this was solved decades ago.
> Not understanding the operating system you're using is fine
Using obsolete utilities for the sake of 'doing the right UNIX way' instead of just ticking a checkbox/adding one line in the task configuration?
Come on, I'll repeat it again: this has been solved for decades. Why do you need to do things like it's 1976? Why do you insist everyone else should do it that way too and abandon the fruits of the digital age?
Why would you use something other than cron for many types of tasks? Not everyone needs a distributed system to run a script periodically, they want to actually get things done. :)
Even though a process can be run concurrently, it doesn’t mean you necessarily want to. Besides that, your cron itself may be something like “do-something | grep foo > /root/bar”. You obviously don’t want to run that concurrently. You could create a script, but that’s more cumbersome.
You are of course correct, but at the same time, you might (or probably not) be surprised at the number of "system administrators" that are thrown into the job without really having the capability to expand too far on their knowledge. Having the option in cron may help those administrators that specifically search Google for cron usage, and never come across flock.
EDIT: In addition, the Task Scheduler in Windows has this type of option, so it may help those sys admins coming from that environment, leveraging their existing knowledge
How would this work with a step that isn't divisible into the range for the field? Given minutes 0~59/25 with a random offset of 0 there will be an event at 0, 25, and 50 minutes past the hour. On the next hour does it start at 0, 15, or a new random offset? i.e. constant offset, constant step, or regenerated offset.
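To make the first two options concrete, here is a small worked sketch. It does not say which one OpenBSD picks; both function names are invented. The third option (a regenerated offset) would simply redraw `offset` every hour.

```python
def constant_offset(offset, step, hours):
    """Same minute list every hour, the way a fixed per-hour minute set behaves."""
    return [list(range(offset, 60, step)) for _ in range(hours)]


def constant_step(offset, step, hours):
    """Keep stepping across hour boundaries, so the minutes drift from hour to hour."""
    fired = [[] for _ in range(hours)]
    t = offset
    while t < hours * 60:
        fired[t // 60].append(t % 60)
        t += step
    return fired


print(constant_offset(0, 25, 3))  # [[0, 25, 50], [0, 25, 50], [0, 25, 50]]
print(constant_step(0, 25, 3))    # [[0, 25, 50], [15, 40], [5, 30, 55]]
```

A classic "0-59/25" field is matched against a fixed set of minutes each hour, which corresponds to the first function; whether the ~ form keeps that behavior is exactly the open question.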
This is a nice feature at the minute-resolution level. I think something at the second-resolution level would be helpful too. For example, I have a cronjob on my Raspberry Pi at home that runs every minute and does a simple check-in with Heii On-Call so I get alerted if there's a FIOS outage or the pi breaks. I ended up writing a little bash script like this:
#!/bin/bash
# Check in with Heii On-Call. Sleeps a random 1-55 seconds first unless --now is given.
set -e
HEIIONCALL_API_KEY="redacted_api_key_goes_here"
HEIIONCALL_TRIGGER_ID="redacted_trigger_id_goes_here"
AUTHORIZATION_HEADER="Authorization: Bearer ${HEIIONCALL_API_KEY}"
CHECKIN_URL="https://api.heiioncall.com./triggers/${HEIIONCALL_TRIGGER_ID}/checkin"
if [ "${1:-}" != "--now" ]; then
    # Pick a delay of 1-55 seconds so check-ins don't all land at :00.
    RANDOM_SLEEP=$(( (RANDOM % 55) + 1 ))
    echo "Sleeping ${RANDOM_SLEEP} seconds before checkin..."
    sleep "${RANDOM_SLEEP}"
fi
echo "Checking in..."
# exec replaces the shell, so the script's exit status is curl's.
exec curl \
    -X POST \
    --retry 5 --retry-connrefused --retry-max-time 15 --retry-delay 1 \
    -H "${AUTHORIZATION_HEADER}" \
    "${CHECKIN_URL}"
This script ~/bin/heiioncall-checkin.sh gets called by crond every minute at exactly :00 seconds, so my expected maximum timeout between check-ins is approximately 120 seconds. And I can skip the sleep with "--now" flag for testing. But I'd much rather have this random offset behavior be something optionally built-in to cron, I suppose.
Yes, I first learned this and the name "splay" from CFengine, back in the day.
I put together a small busybox-like collection of sysadmin tools, and one of the subcommands is "splay" to sleep for a random amount of time. It's one of those things that is useful surprisingly often, even outside cron.
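A "splay" helper along those lines is only a few lines. This is a guess at the general shape, not the tool described above: the function name, its arguments, and the `rng` parameter are all assumptions.

```python
import random
import subprocess
import sys
import time


def splay(max_seconds, command, rng=random):
    """Sleep a random 0..max_seconds, then run command and return its exit status."""
    time.sleep(rng.uniform(0, max_seconds))
    return subprocess.call(command)


# Example: wait up to 2 seconds, then run a trivial child process.
status = splay(2, [sys.executable, "-c", "print('checked in')"])
print("exit status:", status)
```

From a crontab it would be called as a wrapper around the real command, the same way the bash script above wraps curl.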
Doesn't make me very excited, since I strongly feel standard cron implementations should've been deprecated a long time ago anyway. I mean, consider dkron, for example. Forget k8s and web-UI and all that nonsense, its YAML configs are simply way more clear, readable and powerful than the usual crontab syntax. Why can't I have the same with plain simple non-distributed cron?!
Also, just as a sidenote I'm not willing to seriously discuss: I seriously doubt I'd personally ever use random ranges in production. I understand what problem it's supposed to solve, but generally I just really don't want anything random in my systems. If it conflicts with some other cronjobs or whatever, I'd like it to break down deterministically — preferably, all the time, so it's easier to spot, track down and fix it. If it causes any load spikes, I'd like these spikes to be regular, so that I can see that and manually tweak run times so that it'll be more even. If any problems arise, I'd prefer them to arise after somebody changed something, and not just magically one Saturday evening a couple of months later.
The only situation I can think of right away where this is acceptable is if I have a lot of nodes with the same cron config, so it's my attempt to spread out workers of the same type that I know would otherwise start at the same time. But then, why the fuck do I have such a degenerate architecture in the first place?! Maybe I should think about replacing that with something a little more sustainable, like, uh, a centralized scheduler? No, I mean, it's definitely a solution (a quick and easy one, at that), but even then it seems like a solution to a problem that shouldn't have existed in the first place.