This fits my use case so perfectly! I have a very small internal app taking care of organizing seminar talks, calendars, email announcements, recordings of the talks, and signups. It is a single python file of less than 1500 lines and an sqlite database. This library is so perfect for taking care of scheduled events. Everything else I have found is a ridiculously over-complicated solution.
I am a research scientist who knows how to write numerical code, but I do not trust myself to write secure web software, which is the reason I regrettably keep the app in question internal and the source unreleased.
To give a more useful answer though: it just uses cherrypy as a web framework, plus the zoom python bindings and sqlite. Nothing sophisticated, just a CRUD app that occasionally needs to download a large file and transcode it in the background (which is where this scheduler will be used).
I too first thought this had something to do with RED Engine 4 from CDPR.
Unfortunately CDPR's latest RED Engine 4 will not be released as open source in the foreseeable future; it's basically dead and locked away, and will probably never see the light of day. Maybe it will be open sourced, but in something like 20 years' time.
How does it handle state and restarts? What happens if a job is scheduled to run "before 10am", then the entire server restarts at 9:55am, will it try to run that same job again when it boots back up?
I think that regardless of the scheduling tool, it is very worthwhile putting in the effort to make your tasks idempotent. I've had tons of cases when I had to repeatedly rerun failed tasks "in anger", and was always grateful to know I made it safe to do so.
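To make that concrete, here is a minimal sketch of the two usual tricks, checking whether the work was already done and publishing results atomically. It isn't tied to any particular scheduler; the paths are placeholders and shutil.copy stands in for the real transcode step:

    import os
    import shutil
    import tempfile

    def transcode_recording(src_path, dst_path):
        """Safe to rerun: a crashed or duplicated run never leaves dst_path half-written."""
        if os.path.exists(dst_path):
            return  # work already done on a previous run; rerunning is a no-op

        # Do the expensive work against a temp file in the same directory, then
        # publish it with an atomic rename. A crash midway leaves only a stray
        # temp file behind, never a corrupt dst_path.
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(dst_path)))
        os.close(fd)
        try:
            shutil.copy(src_path, tmp_path)  # stand-in for the real transcode step
            os.replace(tmp_path, dst_path)   # atomic when both are on the same filesystem
        except BaseException:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise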
I think, on the contrary, that it's important to build software that's resilient to random reboots: you can't control things like losing power or your CPU failing, so the less toil you have dealing with that, the better.
From its docs, I understand that it has a task status logging layer[1] which can persist to SQL/Mongo (using another framework called Red Bird[2]). I haven't tried them out though.
Seconding this. I would love a middle tier job scheduler / manager for Python that has persistence. I feel like there's a missing middle between cron+scripts and the enterprise grade tooling built for ETL tasks.
I don’t see any mention of serialization, so I guess this is single-server and memory-only. I also couldn’t find any mention of error handling or retries.
Using Python decorators is a very strange choice for constructing computational graphs. Then again, I don't really use Python for this kind of thing; I roll my own solutions (just functions composed together).
One big issue I have with the proposed approach is that it's very difficult for me to see at a glance the actual compute graph. I suppose you can build some tools to visualize it from the DSL in the decorator call, but I'd much rather be able to see this directly in code, with no weird magic, so that I can very easily interpret and update it if need be.
Yes, 100% this. Too many frameworks claim that they are "easy" just to find out they left out things that other older frameworks have already solved.
While we are on this: Do you know of any task scheduler framework that is similar to celery, but has better guarantees around task execution than what acks_late=True gives you?
I always find myself building a system that stores the really important Tasks in Postgres so that I can recover from anything in the broker or celery crashing. What I use celery for is just scheduling these Tasks, by creating a celery job with the Postgres Job ID as the parameter.
Then, to detect if something went horribly wrong, I have a sweeping job that checks if any job in Postgres has not run in celery for some reason. If that is the case, we just re-queue the job.
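In rough outline the pattern looks like the sketch below. To be clear about what's assumed: the app_jobs table and its status/created_at/started_at columns, the broker URL, the DSN, and do_work are all hypothetical placeholders; only the celery and psycopg2 calls themselves are standard API.

    import psycopg2
    from celery import Celery

    celery_app = Celery("jobs", broker="redis://localhost:6379/0")  # placeholder broker URL
    PG_DSN = "dbname=app user=app"  # placeholder DSN

    def do_work(job_id):
        ...  # stand-in for the actual task body

    @celery_app.task(acks_late=True)
    def run_job(job_id):
        with psycopg2.connect(PG_DSN) as conn, conn.cursor() as cur:
            # Claim the Postgres row first, so a duplicate delivery becomes a no-op.
            cur.execute(
                "UPDATE app_jobs SET status = 'running', started_at = now() "
                "WHERE id = %s AND status = 'pending'",
                (job_id,),
            )
            if cur.rowcount == 0:
                return  # someone else already claimed or finished this job
            do_work(job_id)
            cur.execute("UPDATE app_jobs SET status = 'done' WHERE id = %s", (job_id,))
        # If do_work raises, the transaction rolls the row back to 'pending',
        # so the sweeper below will eventually re-queue it.

    @celery_app.task
    def sweep_stuck_jobs():
        # Run periodically (e.g. via celery beat): anything still 'pending' after a
        # grace period got lost in the broker or a crash, so just re-queue it.
        with psycopg2.connect(PG_DSN) as conn, conn.cursor() as cur:
            cur.execute(
                "SELECT id FROM app_jobs WHERE status = 'pending' "
                "AND created_at < now() - interval '15 minutes'"
            )
            for (job_id,) in cur.fetchall():
                run_job.delay(job_id)

The point is that Postgres stays the source of truth and celery is only the delivery mechanism, so a broker or worker crash at worst delays a job until the sweeper picks it up.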
Thus users could continue using RED, and if they want to scale to multiple machines or want resilience, you could allow them to switch out the backend to Flyte.
Disclosure: I am a maintainer of Flyte. This is just a suggestion. Great work!
> Red Engine provides more features than Crontab and APScheduler and it is much easier to use than Airflow.
Correct me if I'm wrong, but a framework that is more powerful than Crontab and easier to use than Airflow already exists: Celery. Yet Celery is not even mentioned here. Why?
I imagine it's to ensure the job still runs ASAP if the scheduler crashes, the power fails, or whatever. Systemd has a similar feature for timers (the Persistent= option), but there it's configurable.
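For reference, a minimal timer-unit sketch of that option (the unit name and schedule here are made up):

    # nightly-task.timer (hypothetical unit name)
    [Unit]
    Description=Nightly maintenance task

    [Timer]
    OnCalendar=daily
    # If the machine was off at the scheduled time, run once at the next boot
    Persistent=true

    [Install]
    WantedBy=timers.target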
As cool as it is, scheduled jobs (and jobs in general) should really be isolated from the rest of your system and from each other to limit the blast radius. Jobs are notorious for crashing and backing up, so you really don't want that impacting other systems on the same machine.
Cloud providers have managed services for scheduled jobs now (AWS EventBridge, GCP Cloud Scheduler) which handle this for you.
First of all, English itself is not clean; it's a messy amalgamation of special cases and inconsistent spelling rules.
Second, it isn't actually English anyway. It might look like English, but it's actually a DSL that happens to correspond to English a lot of the time. English text is meant to be interpreted by humans, who understand context & connotation, and can resolve ambiguities by making educated guesses or discussing the text with other humans. But your English-like DSL can only be interpreted by a computer program, which cannot (and arguably should not) do such things. Ergo, the benefits of using natural language are lost, and you are left with the same strict interpretation rules as any other programming language, but without any of the syntactic rigor that would normally help you construct programs/expressions that are both syntactically correct and also do what you intended them to do.

Finally, the passing similarity to another language is a newbie trap and it makes teaching more difficult. See also: SQL, Python.
Worse still, it's represented in code as a string literal.
It cannot be reasonably syntax-highlighted or otherwise statically analyzed, nor can it be easily constructed dynamically if needed, nor can scheduling primitives be combined or composed. It is the worst of all worlds, and you have no way to check if your program is valid other than to run it and see if it crashes. And you have to re-learn operator precedence / associativity rules, because they probably won't be identical to the rules in Python itself.
I'm sorry if there is a really high quality scheduling engine underneath this DSL, but I absolutely would never want to use something like this in production code.
(I'm sure you can guess how I feel about BDD frameworks and "expect.foo.to.be.equal.to" style test APIs).
Yeah, I did a hard turn towards "nope" right there.
Similar but not quite Python combined with similar but not quite English does not make a tasty dish. It makes yet another pointless one-off thing to learn and struggle with.
For some reason they chose to put the concept that is hardest to understand on the front page. The 'is foo' example shows how to define and use a custom condition; the conditions in the rest of the documentation are easy to understand and make sense to me.
The only way I could see myself struggling with this would be piecing together a large call graph in my head. I can understand why the author says that Airflow is better suited for this case, because you get a visualization.
It's also just annoyingly redundant, and breaks searching, completion, etc.:
    @app.cond('is foo')
    def is_foo():
        ...

IMO

    @app.cond
    def is_foo():
        ...

is much better, even if you 'lose' being able to omit the underscore so that it reads a bit like English. Plus then I can jump from its use back to this definition without any DSL-aware tooling (which probably doesn't exist).
The mistake of trying to make programming languages look like English seems destined to be repeated forever. English is not a good language for expressing things precisely to computers, or to other humans for that matter.
It’s plain English like SQL is plain English. It uses English words that kind of make sense to a novice, but that doesn’t stop it from being a new programming language.