Red Engine: modern scheduling framework for Python applications (red-engine.readthedocs.io)
213 points by jonbaer on July 3, 2022 | hide | past | favorite | 53 comments



This fits my use case so perfectly! I have a very small internal app taking care of organizing seminar talks, calendars, email announcements, recordings of the talks, and signups. It is a single Python file of less than 1500 lines and an SQLite database. This library is perfect for taking care of scheduled events. Everything else I have found is a ridiculously over-complicated solution.


Yeah, I'll definitely try this for little server-based tasks


Can you share a repo for your application?


I am a research scientist who knows how to write numerical code, but I do not trust myself to write secure web software, which is why I regrettably keep the app in question internal and the source unreleased.

To give a more useful answer though: it just uses CherryPy as a web framework, the Zoom Python bindings, and SQLite. Nothing sophisticated, just a CRUD app that occasionally needs to download a large file and transcode it in the background (which is where this scheduler will be used).


I assume that “internal” means it’s just that and not accessible.


Not to be confused with REDengine by CD Projekt Red, used to make the Witcher and Cyberpunk 2077 games [0]. They're moving to Unreal Engine 5 though.

[0] https://witcher.fandom.com/wiki/REDengine


Yeah, when I read the title I was like: "wtf, CDPR released Python version of their engine?"


I too first thought this had something to do with RED Engine 4 from CDPR.

Unfortunately CDPR's latest RED Engine 4 will not be released as open source in the foreseeable future; it's basically dead and locked away, and will probably never see the light of day. Maybe it will be open sourced, but in something like 20 years' time.


How does it handle state and restarts? What happens if a job is scheduled to run "before 10am", then the entire server restarts at 9:55am, will it try to run that same job again when it boots back up?


It will run it if the scheduler is called before 10am according to the docs - at runtime the conditions must be met.


Right but what if it already ran? Should the jobs be written such that they are tolerant to re-runs?


I think that regardless of the scheduling tool, it is very worthwhile putting in the effort to make your tasks idempotent. I've had tons of cases when I had to repeatedly rerun failed tasks "in anger", and was always grateful to know I made it safe to do so.
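To make that concrete, here is a minimal sketch of an idempotent task (the function name and file layout are hypothetical, nothing here is Red Engine API): the output path is derived from the input, finished work is detected and skipped, and the result is published atomically so a crash mid-run never leaves a partial file behind.

```python
from pathlib import Path

def process_report(day: str, outdir: Path) -> Path:
    """Idempotent task sketch: safe to re-run any number of times."""
    out = outdir / f"report-{day}.txt"
    if out.exists():                  # already done: a re-run is a no-op
        return out
    tmp = out.with_suffix(".tmp")
    tmp.write_text(f"report for {day}\n")   # the real work goes here
    tmp.rename(out)                   # atomic publish: no partial outputs
    return out
```

Re-running `process_report("2022-07-03", outdir)` after a crash or a scheduler restart just returns the existing file instead of redoing the work.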


If it took less than 5 minutes to boot then I can't see why it wouldn't work.

How would that work for other schedulers? Also, if a server reboots that's quite bad all round anyway. Hopefully you'd be notified directly.


I think on the contrary, it's important to build software that's resilient to random reboots: you can't control things like losing power or your CPU failing, so the less toil you have to manage around that the better.


Other schedulers have a durable database of attempted runs. This doesn't seem to have anything like that.


From its docs, I understand that it has a task status logging layer[1] which can persist to SQL/Mongo (using another framework called Red Bird[2]). I haven't tried them out though.

[1]: https://red-engine.readthedocs.io/en/stable/tutorial/basic.h...

[2]: https://red-bird.readthedocs.io/en/latest/


Would you have a recommendation for an easy to use Python scheduler with such a feature for a personal project?


I think you can create ephemeral timers with Systemd if you're on Linux.


APScheduler


It looks pretty good. Thanks!


Seconding this. I would love a middle tier job scheduler / manager for Python that has persistence. I feel like there's a missing middle between cron+scripts and the enterprise grade tooling built for ETL tasks.


https://www.prefect.io/ is also pretty easy.


Celery Beat is pretty good


Looks nice but very limited.

I don’t see any mention of serialization, so I guess this is single-server and memory-only. I also couldn’t find any mention of error handling or retries.


What makes this true?

“Red Engine is not meant to be the scheduler for enterprise pipelines, unlike Airflow, but it is fantastic to power your Python applications.”


It sounds like the author doesn't think enterprise apps can be Python applications


Using Python decorators is a very strange choice for constructing computational graphs. Then again, I don't really use Python for this kind of thing, I roll my own solutions (of just functions being composed together).

One big issue I have with the proposed approach is that it's very difficult for me to see at a glance the actual compute graph. I suppose you can build some tools to visualize it from the DSL in the decorator call, but I'd much rather be able to see this directly in code, with no weird magic, so that I can very easily interpret and update it if need be.
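For comparison, a hand-rolled graph of that kind can be as simple as plain function composition (all names here are made up for illustration); the structure of the pipeline is then visible directly in the code, with no decorator DSL to decode:

```python
def extract() -> list[int]:
    # pretend this pulls rows from somewhere
    return [1, 2, 3]

def transform(rows: list[int]) -> list[int]:
    return [r * 2 for r in rows]

def load(rows: list[int]) -> int:
    # pretend this writes somewhere; return a summary for inspection
    return sum(rows)

def pipeline() -> int:
    # The whole compute graph, readable top to bottom:
    return load(transform(extract()))
```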


  from redengine import RedEngine
  
  app = RedEngine()
  
  @app.task('daily')
  def do_things():
      ...
  
  if __name__ == "__main__":
      app.run()
> We initialized the RedEngine application, created one task which runs every 10 seconds and then we started the app.

https://red-engine.readthedocs.io/en/stable/tutorial/quick_s...

Might wanna fix that.


If you're going to build a framework like this, please please put your consistency behavior and retry behavior at the top of your docs.


Yes, 100% this. Too many frameworks claim they are "easy", only for you to find out they left out things that older frameworks solved long ago.

While we are on this: do you know of any task scheduler framework that is similar to Celery, but has better guarantees around task execution than what acks_late=True gives you?

I always find myself building a system that stores the really important tasks in Postgres so that I can recover from anything in the broker or Celery crashing. What I use Celery for is just scheduling these tasks, by creating a Celery job with the Postgres job ID as the parameter.

Then, to detect if something went horribly wrong, I have a sweeping job that checks whether any job in Postgres has not run in Celery for some reason. If that is the case, we just re-queue the job.


If your tasks are idempotent, Dramatiq is intended for your case.

https://dramatiq.io/


After reading the linked article in more detail, I might have misinterpreted things a bit, but my concerns are still related :)


This is cool. You could provide resilience to Red Engine by giving it a backend using Flyte.org. Check out this example of building a new API on top of Flyte with a similar feel: https://unionml.readthedocs.io/en/latest/index.html#quicksta....

Thus users could continue using RED, and if they want to scale to multiple machines or want resilience, you could allow them to switch out the backend to Flyte.

Disclosure: I am a maintainer of Flyte. This is just a suggestion. Great work!


> Red Engine provides more features than Crontab and APScheduler and it is much easier to use than Airflow.

Correct me if I'm wrong, but the framework more powerful than Crontab and easier to use than Airflow is Celery. But Celery is not even mentioned here, why?


I've never considered using Celery on its own - it's part of my typical web app stack. Do you mind sharing some examples of how you're using Celery?


Off topic but,

Many scheduling systems I've worked with have this weird tendency to run once immediately when deployed, and then again every time they're actually scheduled to run.

I’ve had this happen with Kubernetes, Scheduled Queries in GCP BigQuery, and a few other systems.

That seems like absolute madness. Why would anything do that?


I imagine it's to ensure the job runs ASAP in case the scheduler crashed, the power failed, or whatever. Systemd has a similar feature for timers, Persistent=, but it's configurable.
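For reference, a minimal systemd timer unit showing that option (unit and task names are illustrative). With `Persistent=true`, a run that was missed while the machine was off fires once the next time the timer starts, rather than on every deployment:

```ini
# mytask.timer -- pairs with a mytask.service that runs the actual job
[Unit]
Description=Run mytask daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```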


Airflow does this. Madness!


As cool as it is, scheduled jobs (and jobs in general) should really be isolated from the rest of your system and from each other to limit the blast radius. Jobs are notorious for crashing and backing up, so you really don't want that impacting other systems on the same machine.

Clouds have managed systems for scheduled jobs now (AWS Eventbridge, GCP Cloud Scheduler) which handle this for you.


I'm using APScheduler for most of my scheduling tasks.

Does Red Engine integrate with asyncio? When searching the docs for this keyword no hits showed up.


Going to shamelessly plug Temporal’s Python SDK which was designed for asyncio.

https://github.com/temporalio/sdk-python

Disclaimer: I work for Temporal


> Clean: Scheduling is just plain English

Ugh, no thanks.

First of all, English itself is not clean; it's a messy amalgamation of special cases and inconsistent spelling rules.

Second, it isn't actually English anyway. It might look like English, but it's actually a DSL that happens to correspond to English a lot of the time. English text is meant to be interpreted by humans, who understand context & connotation, and can resolve ambiguities by making educated guesses or discussing the text with other humans. But your English-like DSL can only be interpreted by a computer program, which cannot (and arguably should not) do such things. Ergo, the benefits of using natural language are lost, and you are left with the same strict interpretation rules as any other programming language, but without any of the syntactic rigor that would normally help you construct programs/expressions that are both syntactically correct and also do what you intended them to do. Finally, the passing similarity to another language is a newbie trap and it makes teaching more difficult. See also: SQL, Python.

Worse still, it's represented in code as a string literal. It cannot be reasonably syntax-highlighted or otherwise statically analyzed, nor can it be easily constructed dynamically if needed, nor can scheduling primitives be combined or composed. It is the worst of all worlds, and you have no way to check if your program is valid other than to run it and see if it crashes. And you have to re-learn operator precedence / associativity rules, because they probably won't be identical to the rules in Python itself.

I'm sorry if there is a really high quality scheduling engine underneath this DSL, but I absolutely would never want to use something like this in production code.

(I'm sure you can guess how I feel about BDD frameworks and "expect.foo.to.be.equal.to" style test APIs).


    @app.task('daily & is foo', execution="process")
followed by

    @app.task("after task 'do_daily'")
Yeah, I did a hard turn towards "nope" right there.

Similar but not quite Python combined with similar but not quite English does not make a tasty dish. It makes yet another pointless one-off thing to learn and struggle with.


These in particular remind me of "Baba Is You", which is probably not a good thing for an API.


For some reason they chose to put the hardest-to-understand concept on the front page. 'is foo' shows how to define and use a custom condition; the conditions in the rest of the documentation are easy to understand and make sense to me.

The only place I could see myself struggling with this would be piecing together a large call graph in my head. I can understand why the author says that Airflow is better suited for that case, because you get a visualization.


Just got hard AppleScript PTSD.


It's also just annoyingly redundant, and breaks searching, completion, etc.:

    @app.cond('is foo')
    def is_foo():
IMO

    @app.cond
    def is_foo():
'Losing' the ability to omit the underscore so it reads a bit like English is well worth it. Plus then I can jump from its use back to this definition without any DSL-aware tooling (which probably doesn't exist).


The mistake of trying to make programming languages look like English looks like it will be repeated forever. English is not a good language for expressing things specifically to computers or other humans for that matter.


100%. Also, why re-invent Python's already pretty decent datetime/timedelta?

And why not a proper Python DSL?

  from redengine import minute, hour
  
  @app.run_every(hour + 20*minute)
  def do_first(): [...]
  
  @app.run_after(do_first)
  def do_second1(): [...]
  
  @app.run_after(do_first)
  def do_second2(): [...]
  
  @app.run_after(do_second1, do_second2)
  def do_last(): [...]
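Such an API is straightforward to back with plain objects. A toy sketch (entirely hypothetical, not Red Engine's actual implementation): the decorators record a dependency graph, and a depth-first walk runs each task after its prerequisites.

```python
class App:
    """Toy scheduler: decorators build the graph, run_once() walks it."""

    def __init__(self):
        self.deps = {}                    # task -> prerequisite tasks

    def run_every(self, interval):
        # interval is ignored in this sketch; a real scheduler would use it
        def register(fn):
            self.deps.setdefault(fn, [])
            return fn
        return register

    def run_after(self, *prereqs):
        def register(fn):
            self.deps[fn] = list(prereqs)
            return fn
        return register

    def run_once(self):
        """Run every task after its prerequisites; return the run order."""
        done, order = set(), []
        def visit(task):
            if task in done:
                return
            done.add(task)
            for dep in self.deps.get(task, []):
                visit(dep)
            task()
            order.append(task.__name__)
        for task in list(self.deps):
            visit(task)
        return order
```

Because dependencies are ordinary function references, the editor can jump to them, and combining them needs no string parsing at all.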


It’s plain English like SQL is plain English. It uses English words that kind of make sense to a novice, but that doesn’t stop it being a new programming language.


> See also: SQL, Python

I do not see what is wrong with SQL and I definitely do not see what is wrong with Python.


You nailed it.


Is this meant for use instead of something like Luigi?



