Hi -- author here. I just presented this work at JupyterCon 2023 so I figured it was time to advertise more broadly. There are a few rough edges but my hope is that, by making all the reactive behavior opt-in and only enabled for in-order execution (i.e., cells above the one I execute will never reactively execute by default), it can be predictable enough to be useful in practice.
There's still a long way to go to get e.g. full dataflow understanding of all the common libraries, understanding file paths, autoreload integration, etc., but after nearly 3 years of on-and-off development I think it's finally useable-ish.
I'll give this a try. Managing "hidden state" in notebooks is a known flaw of Jupyter. If nothing else, an indicator that says, "this code is dirty" would be useful.
I have a long standing habit of doing "restart kernel and run all cells" before walking away from a session, to help avoid this. I'd rather see it break in front of me than have it break 6 months later or in someone else's use.
It looks like you are using a static approach for dependency inference. There are a lot of benefits to static approaches, but they can only get you so far. My JupyterCon presentation includes a bunch of examples where dynamic approaches are a must: https://t.ly/78rS
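One hypothetical illustration (not taken from the talk) of why purely static inference falls short in Python: whether a call mutates its argument can depend on runtime values, so only dynamic tracing can tell which downstream cells are actually stale.

    import random

    def maybe_mutate(d):
        # Mutates the caller's dict only some of the time.
        if random.random() < 0.5:
            d["x"] = 0
        return sum(d.values())

    data = {"x": 1, "y": 2}
    total = maybe_mutate(data)  # Did this cell change `data`? Statically unknowable;
                                # a dynamic tracer observes the actual write.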
Besides that, there are a bunch of interesting design decisions about when to add edges between cells, when to break them, what metadata to annotate edges with, etc.
I'm hoping to abstract away a bunch of the complexity by developing something like a runtime version of a language server protocol (working name "language kernel protocol") so that any editor that implements the protocol would get reactivity for free when running a kernel that likewise implements the protocol. I have an early version of this, which is how IPyflow works for both Jupyter and JupyterLab; VSCode would be a great editor to add support for next.
Kudos! When at Mozilla c.2016, I tried to work with the core Jupyter team on solving the stale-cell problem. I couldn't find a path forward that they would consider. Glad to see someone making progress.
I would be very surprised if something like this gets support in core Jupyter -- there's a lot of added complexity. Fortunately it is doable as extensions for Jupyter / JupyterLab.
Will this approach ever be usable with other Jupyter languages? Like, do you have an API for another language to tell you what the code dependency graph is? Or is Python a fundamental assumption here?
For this particular project, Python is a requirement.
For the general approach, the answer is more complicated. It depends on what hooks the language implementation exposes -- and even if it exposes enough to make this work in theory, tracking dataflow at the same level of accuracy and granularity as IPyflow does may not be possible without taking an unacceptable performance hit, or without sacrificing portability across language versions.
My hope is that the approach can scale to languages like Julia or R, but I'm not as familiar with those languages as I am with Python, and I kind of suspect each language may require its own bespoke tricks.
Regardless, for Python it was a journey roughly 3 years in the making (and still ongoing) -- other languages would be easier now that I've learned a fair amount, but the work to add this kind of support is by far the most complex I've ever done.
Looks like it solves a common problem, but the page is a bit confusing. It could make it clearer upfront what it does (I didn't know what reactive meant in this context) and how it relates to Jupyter (I thought it was official/core stuff at first, but I take it it's a third-party tool that integrates into Jupyter).
Stuff like "Trust me? Good." in the introduction doesn't really help me answer "wtf does this do" more quickly, and the first intro sentence is pretty long and convoluted.
On the other hand, the link description alone was enough to convince me that this is something I want.
Having a very specific target makes it easier to reach that target in writing, I guess, and harder for people outside the target to understand what it's about.
Well, other than Observable, reactive notebooks are not that common or well known (precisely because Jupyter, which is the most famous, didn't support that model before).
So maybe today is the first day that you are exposed to that model and you learn about it? There's always a first time.
Reactive notebooks are what change the workflow from command line-like to spreadsheet-like.
It may not matter much if you use the notebook as a glorified terminal, but it is a godsend if your workflow involves data analysis with heavy dependencies between filtered subsets.
I believe this is what Pluto sets out to do for Julia.
I used it as part of the “Computational Thinking” with Julia course a year or two back. Even then, the beta software was very good, and some of the demos the Pluto dev showed were nothing short of amazing.
Looks like by default you have to manually trigger reactivity in ipyflow, but there is a `%flow mode reactive` IPython magic that enables Pluto-style reactivity!
Yep. I think there is also a way to enable it by default in your IPython profile, which I'll document at some point, so that you don't have to run `%flow mode reactive`. I'm curious, though -- personally I much prefer the opt-in reactive execution mode with ctrl/cmd+shift+enter, and I'd love to understand your preferences better :)
An always-reactive notebook is essentially a "literate spreadsheet", where you have data cells in between multimedia descriptions. In this model, all computed data is always up to date with whatever changes you make to the input parameters, including things like graphics connected to interactive sliders and text boxes. You can prototype the logic of an application very fast with real data and interactions.
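As a plain-ipywidgets illustration (independent of ipyflow or any particular reactive kernel), a slider wired to a function re-runs it on every change, spreadsheet-style:

    import ipywidgets as widgets

    def show_square(n):
        # Re-runs (and redisplays) whenever the slider moves.
        print(n, "squared is", n * n)

    widgets.interact(show_square, n=widgets.IntSlider(value=3, min=0, max=10))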
Your ipython profile suggestion is good, I use that for `%autoreload` so I don’t see why it wouldn’t work for ipyflow
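Something like this in `~/.ipython/profile_default/ipython_config.py` is how I run `%autoreload` at startup; whether adding the `%flow` line the same way works for ipyflow is an untested guess:

    # ~/.ipython/profile_default/ipython_config.py
    c = get_config()  # provided by IPython when it loads the profile

    # Magics to run on every kernel start. The autoreload lines are what I
    # already use; the %flow line is an untested guess for ipyflow.
    c.InteractiveShellApp.exec_lines = [
        "%load_ext autoreload",
        "%autoreload 2",
        "%flow mode reactive",
    ]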
Interesting question -- I'm going to have to try the opt-in reactivity in ipyflow, because it's not an option in Pluto. Actually, that's kind of a strength; one point of frustration in Pluto is accidentally triggering reactive execution of an expensive cell before everything is ready.
I think the thing I like most about always-on reactivity is that the state of the REPL and outputs can never become stale. I used to run into that in jupyter a lot as a (physical sciences) student writing hacky prototype code with implicit control flow… nice for debugging but in the long run it’s quite painful.
The nearest thing I had found in python is streamlit, but it is not as smooth as Pluto IMO. Looking forward to trying ipyflow, honestly I have been hoping for something like this for a while because using Pluto+PyCall as a jupyter replacement is a bit too cumbersome for python-forward projects
After getting used to it with Julia I found it really jarring to go back to plain Jupyter (when I need python) where I have to keep re-executing the cells.
Nice! From what I've gathered this has been in the works for a while(?)
Thoughts ...
1. Yea, the Readme could do with a bit of polish. Your hero feature, AFAIU, is the automatic reactivity. This is in your second GIF; put it front and center and make it really clear what is happening. You (and I) know what reactivity looks like so we know what to look for, but someone new to the idea in notebooks could easily blink and miss this. I'd work on a nicer GIF and even a little YouTube video just to make it really clear what's going on here. Bostock and ObservableHQ advertised their reactivity a while ago; you might be able to get inspiration from how they demonstrated it.
2. The syntax extensions are cool! Integration with ipywidgets is Ace!!
3. Do you have any comments on how ObservableHQ (Javascript runtime by Bostock) and Pluto (inspired by previous) informed or inspired your choices and implementation here? Is this basically the same for python/jupyter as those are for JS/Julia?
4. Annoying Questions or feature requests ... Are there any overheads? Any timeout facilities for long running code? Can the full variable and/or cell dependency graph be surfaced and visualised (ObservableHQ put this into the UI a while back and it was kinda cool).
Otherwise ... awesome to see this land! Congratulations!!!
Thank you for all the feedback, positive and constructive! Yep docs definitely need a lot of polish :)
3. I actually started from scratch -- ipyflow's reactivity model is a bit different from these, since for Python, my experience is that static dependency inference is too unreliable to be useful. (Though after talking with the Pluto maintainer earlier today it sounds like Pluto may be reaching some of the same conclusions and also be moving toward a dynamic dependency inference strategy)
In the future, my hope is that as a community we will develop a live-coding analogue to LSPs, which one might call a "language kernel protocol", so that we can standardize some of these features across different languages / editors.
4. For top-level / module-level statements, yes there is lots of overhead (> 100x), but it's largely limited to those statements (i.e. external library calls, recursive function calls, etc have close to 0 overhead thanks to intelligent instrumentation disabling for these) and turns out to be OK in practice (more details in nbslicer paper https://smacke.net/papers/nbslicer.pdf). At some point I'll run it through a profiler and try to grab the low-hanging optimizations but it hasn't been noticeable so far.
Surfacing the DAG is definitely something I want to do at some point; we have all the information in the backend so we should try to surface it in the frontend.
I love Jupyter notebooks; I just wish they looked as good as Observable notebooks[1], not just in the overall layout but also in the charts/graphs you can make in general (plotly, matplotlib, etc. don't even come close to d3.js, Observable Plot, etc.)... I don't know why there seems to be a hole in the Python ecosystem for good designers or something.
This seems to be a step in the right direction with reactivity though. But it's not instant like Observable notebooks. But still good
IPyflow takes a round trip from client to kernel for each execution (including reactive executions) -- this approach is necessary to get the best possible accuracy when determining dataflow in a highly dynamic language like Python, but it is an architectural limitation that prevents the reactivity from feeling as instant as in Observable or Pluto.
Now I wonder why this isn't an option in plain Jupyter. Inconsistent cell states and having to re-execute all cells after a single line change slows me down a lot.
Like I get why this doesn't need to be default, but this seems crucial enough to warrant being included in the base package.
It's very new, and the current frontend implementations for Jupyter and JupyterLab include some workarounds for fundamental protocol-level limitations that probably make adding this kind of feature as part of the core package a no-go (without first addressing the core protocol limitations).
What's the correlation between people who don't use debuggers and people who use notebooks? I can't imagine writing code without a visual debugger, one at the level of PyCharm. I think people who use notebooks must either be very attentive to how they write code (not misremembering variables, the complexity of arrays, dicts, etc.) or have other means of debugging (print? JupyterLab debugger? pdb?).
I love notebooks for their ability to preload chunks of code/data and let me explore without delay. But having to put mental strain into keeping track of objects is too much for me. VSCode and PyCharm have made strides in unifying the experience, but it's still very much subpar, at least in my experience. The MATLAB-like style of executing code, with the possibility of reusing the same debugger setup, was perfect.
Personally, I use notebooks to do exploratory data analysis and to get model training configured. Any large-scale model training event is converted to a script, and nothing production-facing is in a notebook.
Not sure if correlated. In Ruby I do a mix of REPL/console, tests and step-through debugging. When using Python, I always use a notebook as a scratchpad - to me it's a REPL but easier to keep tidy. The notebook can be good docs of how things work too, a complement to tests-as-docs as it's easy to show in different (real) contexts.
I sorely miss being able to do this when working on frontend, have tried setting up node console to import files but React just makes it very easy to couple everything. This leaves me with tests as the easiest way to code outside of a view (which has too much friction for playing around). Hot reloading is great but iterating logic in isolation is way harder without a REPL.
I'd use Ruby more if it would work better in a notebook environment. It appears that iruby is in maintenance mode and falling behind Julia in usability.
Most people write notebooks that are ephemeral and meant for ad-hoc analysis. If a value needs to be inspected it can just be printed in a cell, or even better a fancy widget or graph can display it. You don't need breakpoints as much since you can just choose what cells to execute, or create a throw away cell to grab some values.
Once you need to turn an analysis into a business process or repeatable task it makes sense to move it into a proper python module and use any IDE, debugger, etc.
Now that VSCode has a notebook mode, you can execute a cell in debug mode and it will trigger break points you created in the referenced package, giving you the full debugger experience. I really like it, but do not know how it compares to pycharm. I just recently settled on doing this workflow (start notebook to test some new code) and find it to be super productive.
I spend a good deal of time teaching my students inside of Jupyter.
For Pandas, many problems can be solved by chaining (debugging as you go), converting the chain to a function, and placing the function at the top of the notebook after you load the raw data.
I get the problem this is solving, but adding some concepts and practical software engineering makes for much better notebook experiences.
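A hypothetical sketch of that chain-then-function pattern (the column names and CSV file are made up):

    import pandas as pd

    def tweak_sales(raw: pd.DataFrame) -> pd.DataFrame:
        # Each step was first prototyped interactively as part of a chain,
        # then the whole chain was wrapped in a function placed near the top
        # of the notebook, right after the raw data is loaded.
        return (
            raw
            .dropna(subset=["price"])
            .assign(revenue=lambda df: df.price * df.quantity)
            .query("revenue > 0")
        )

    raw = pd.read_csv("sales.csv")   # load the raw data once, at the top
    sales = tweak_sales(raw)         # the chained cleanup, now reusable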
I think it is a great idea, but it doesn't apply to 99% of my notebooks because they need up to 2 hours to execute completely, due to data-intensive tasks. I usually run all cells once per day, up to cell x (where I left off the day before), then continue working by adding cells and updating the state manually until I make progress (= no errors, or the output is as expected) and move on to the next cell -- without re-executing other cells, because that would be a productivity nightmare.
Many reactivity frameworks (e.g. ObservableHQ, Shiny) recompute intelligently -- they're aware of which parts of the calculation have changed and need recomputing. I haven't checked with ipyflow, but this idea would help mitigate some of your concerns.
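The core mechanism is small enough to sketch (hypothetical node names, not tied to any particular framework): keep a DAG of tasks, mark what changed as dirty, and re-run only its downstream nodes in topological order.

    from graphlib import TopologicalSorter

    # node -> (compute function, upstream dependencies); all names hypothetical
    graph = {
        "raw":     (lambda: [1, 2, 3], []),
        "cleaned": (lambda raw: [x for x in raw if x > 1], ["raw"]),
        "total":   (lambda cleaned: sum(cleaned), ["cleaned"]),
    }
    cache = {}

    def downstream(node):
        # every node reachable from `node`, including itself
        hit, changed = {node}, True
        while changed:
            changed = False
            for name, (_, deps) in graph.items():
                if name not in hit and hit.intersection(deps):
                    hit.add(name)
                    changed = True
        return hit

    def recompute(dirty):
        stale = set().union(*(downstream(d) for d in dirty))
        order = TopologicalSorter({n: deps for n, (_, deps) in graph.items()}).static_order()
        for name in order:
            if name in stale:
                fn, deps = graph[name]
                cache[name] = fn(*(cache[d] for d in deps))

    recompute({"raw"})                   # initial run: raw -> cleaned -> total
    graph["raw"] = (lambda: [5, 6], [])  # "edit" the raw node
    recompute({"raw"})                   # only the stale nodes re-run
    print(cache["total"])                # 11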
I kind of think Quarto is a much better solution to the problems that notebooks try to solve, plus you get the added bonus of having plain text as the file source.
Can you explain a little more about how it works? Does it handle cases like loops correctly (or self-referencing cells)?
Is this running a CPython fork, or how does the lineage tracking work? Are the values “x” and “y” in the quickstart example still simple Python int types, or are they a wrapped type?
The papers seem very interesting but even as an early adopter of tools like this I’d like to know what the limits and expectations are, and some docs would really help.
It's running on top of vanilla CPython, but with heavy instrumentation via sys.settrace as well as AST transformations. x and y are just normal Python ints. The downside is the overhead, but it's paid mainly for top-level / module-level statements, so I've found it to be acceptable in practice. The benefit is portability -- it works across all the major Python versions supported today (3.6 to 3.11), and even on some alternative Python runtimes such as Cinder.
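To give a rough flavor (this is not IPyflow's actual code, just a minimal sketch of the sys.settrace mechanism it builds on): a trace function can observe every line that executes in a traced frame, which is the raw signal you need for dynamic dataflow tracking.

    import sys

    executed = []

    def tracer(frame, event, arg):
        # Record each executed line as (function name, line number).
        if event == "line":
            executed.append((frame.f_code.co_name, frame.f_lineno))
        return tracer  # keep receiving line events for this frame

    def cell():
        # stand-in for a notebook cell's top-level statements
        x = 1
        y = x + 1
        return y

    sys.settrace(tracer)
    cell()
    sys.settrace(None)

    print(executed)  # [('cell', ...), ('cell', ...), ('cell', ...)]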
Congratulations! I starred it a long time ago but never used it (sorry). But I do think this IS the way to go for Jupyter. I don't know how I could contribute to this -- lack of time, but mostly of knowledge -- but I would love to find other ways to help.
Thank you! If you had tried to use it before, it probably would have broken pretty quickly. Now it will still break, but not so quickly as to not be useful, hopefully.
External contributions are mostly blocked on me improving both the user and developer docs right now (where "improve" means writing the first draft, in this case).
Reactivity is great. Is there any framework for using it without a REPL?
I.e. to define a DAG of tasks and have them executed as needed?
I know of existing workflow engines, but they are typically not reactive and instead work in batches.
From a nuts and bolts perspective, I've been thinking of building some reactivity on top of https://github.com/dagworks-inc/hamilton (author here) that could get at this. (If you have a use case that could be documented, I'd appreciate it.)
How closely tied is this to Python?
The need for reactivity is what drove the development for Pluto.jl, but it would be nice to have something like this for IJulia.jl as well.
For this project, Python is a hard requirement, though it's possible the approach may be applied (after significant effort -- Python took me ~3 years and counting) to other languages / runtimes as well.
That’s a pretty nice idea. The problem of knowing what state has been invalidated often drives me away from using a notebook. So it is nice to see this solved.
I often thought we would benefit from having some kind of shell, a mix between the IPython qtconsole and Jupyter.
Not an editor like Jupyter, but rather a shell with a REPL flow, where each prompt is like a Jupyter cell and the whole history is saved to a file.
But it should also work if you don't create a file. One of the annoying things about Jupyter is that you can't use it without a file on disk, unlike the IPython shell.
If you have four or five sklearn cells that take 20 minutes to an hour to execute, you might not want to do that too often.
I have not used Jupyter notebooks / JupyterLab much, but each time it was in the context of data science. The first was for an OCR project during my internship, the second for data exploration (a mix of quantitative/qualitative, but the project was scrapped after a week or two). In both cases, having to re-run everything each time the kernel was shut down was a real pain point.
I encountered this a lot in my previous work and found workarounds that I write about here: https://rachitsingh.com/collaborating-jupyter/. At a high level, I think being scared of rerunning your kernel is indicative of a code smell, and there are relatively easy ways to combat that.
Personally I don't like the "write to disk" approach; I think it kind of just punts the state problem somewhere else (i.e. from memory to disk). Writing to a database and adding versioning is better, but that's a lot of machinery to expect a notebook user to adopt (though maybe better tooling could help). Also a lot of Python objects are not out-of-the-box pickleable (e.g. generators). Also pickle is a mess.
I definitely agree (and I think given what you work on you would be horrified by how I define cached functions that capture locals), but I think in practice getting to a state where you can restart your kernel often makes it easier to reason about state. But you’re definitely right, it would be better to reason correctly here.
One thing I’ve toyed with is writing a Jupyter kernel extension that notes what new locals you’ve defined in a cell, figures out what locals are read, and creates a (cached) function from the cell. E.g. a cell that has `y = a @ x + b` becomes
    @cache_to_disk
    def compute_y(a, x, b):
        return a @ x + b

    y = compute_y(a, x, b)
I don’t worry much about serialization - 90% of the time what I need to cache is dataframes (write to parquet), and the rest is trained models (custom serializer). People rarely need to cache generators, in my opinion.
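For concreteness, a minimal pickle-based sketch of what that cache_to_disk decorator could look like (not an actual implementation, and it only handles picklable inputs and outputs):

    import functools
    import hashlib
    import pickle
    from pathlib import Path

    CACHE_DIR = Path(".nb_cache")  # hypothetical cache location

    def cache_to_disk(fn):
        """Disk cache keyed on the function name and its (picklable) inputs."""
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            CACHE_DIR.mkdir(exist_ok=True)
            key = hashlib.sha256(
                pickle.dumps((fn.__name__, args, sorted(kwargs.items())))
            ).hexdigest()
            path = CACHE_DIR / f"{key}.pkl"
            if path.exists():
                return pickle.loads(path.read_bytes())
            result = fn(*args, **kwargs)
            path.write_bytes(pickle.dumps(result))
            return result
        return wrapper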
Yes indeed -- not having to re-run expensive calculations is one of the major appeals of notebooks.
But still, I believe it should be done reasonably often - at least once, before committing for example (if you're using tools like jupytext to convert notebooks to .py)