Oh man. Lack of proper debugging and diffing is a _huge_ problem with jupyterlab notebooks. In several cases I've seen, data scientists don't even realise that these tools exist.
I know VS Code has been making progress recently but a good jupyter competitor with some halfway decent software development features would be a game changer.
Notebooks are the out-of-order log of a fancy terminal. That can be great! But trying to use them for anything more than interaction or the most basic scripting is a fool’s errand. Build software in a real environment, then use it in a notebook.
The advantage of notebooks is extremely fast iteration. They are amazing for very fast "trial-and-error". Building software in a "real" env has a slightly different use case than notebooks. If you are a data scientist or working with a lot of data, you definitely want to use a notebook over a normal environment.
Imagine you want to read a lot of data from a CSV file and plot a graph of it. But then you realize that the alpha value used for the graph should be 0.3 rather than 0.1. In a real env, you would need to read the huge amount of data again. In notebooks, you avoid that!
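A rough sketch of that workflow, using only the standard library. The names `RAW`, `load_points` and `plot_settings` are made up for illustration, and `plot_settings` is just a stand-in for a real plotting call (with matplotlib it would be something like `plt.scatter(xs, ys, alpha=0.3)`). The point is only that the expensive load runs once while the cheap step is rerun with a new alpha.

```python
import csv
import io

# Hypothetical stand-in for a huge CSV file; a real one would be read from disk.
RAW = "x,y\n1,2\n2,4\n3,6\n"

load_count = 0  # counts how often the expensive step actually runs


def load_points(text):
    """Parse the CSV; for a large file this is the slow part."""
    global load_count
    load_count += 1
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row["x"]), float(row["y"])) for row in reader]


def plot_settings(points, alpha):
    """Stand-in for the plotting call: returns what would be drawn."""
    return {"n_points": len(points), "alpha": alpha}


# "Cell 1": run once; the result stays in the kernel's memory.
points = load_points(RAW)

# "Cell 2": rerun as often as you like with a new alpha, without reloading.
first = plot_settings(points, alpha=0.1)
second = plot_settings(points, alpha=0.3)
```

In a plain script, changing the alpha and running the file again would execute `load_points` a second time; in a notebook only the second cell is rerun.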
Right, that’s interactive computing. Notebooks are a strict improvement over a standard REPL-in-terminal for those use cases.
The problems come when you want to make that process reusable for other CSVs, on other machines, in other environments, to repeat it in the future, to test components of it, to make and track changes over time, to share with a coworker, etc.
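As a minimal sketch of what "reusable and testable" could look like: the logic moves out of the cell into a plain function that takes its input as a parameter, with a test next to it. `column_mean` and `test_column_mean` are hypothetical names, not from any particular library.

```python
import csv
import io
from statistics import mean


def column_mean(csv_file, column):
    """Return the mean of a numeric column from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    values = [float(row[column]) for row in reader]
    if not values:
        raise ValueError(f"no rows found for column {column!r}")
    return mean(values)


def test_column_mean():
    # An in-memory sample keeps the test independent of any machine's files.
    sample = io.StringIO("x,y\n1,10\n2,20\n3,30\n")
    assert column_mean(sample, "y") == 20.0


test_column_mean()
```

A function like this can live in an ordinary module under version control, and the notebook then just imports and calls it.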
To put it in perspective, notebooks are what using the REPL on Lisp Machines and Xerox workstations (Interlisp-D, Mesa XDE, Mesa/Cedar, Smalltalk) feels like.
Those machines came and went a bit before my time, but I do recognize the sense of excitement and power I feel in notebooks in the recollections I’ve read.
Similarly, comments about ‘how do I distribute a Smalltalk program if it’s just the image of the entire VM’ rhyme just a bit to my ear with ‘how do I share my notebook with my coworker if it’s really the agglomerated state of my interpreter and language environment’
In terms of Smalltalk the answer is quite easy: you would trim down the image as a kind of "release build" and ship it alongside a tiny executable that would load it.
No big deal, other than people repeating rumors without reading the manuals.
I didn't mean to imply that the answer to either question was 'can't be done', but that the analogy might extend to the points of friction as well.
Sure, and my point was that the friction in Smalltalk's case mostly came from people not reading the documentation and having an opinion without Smalltalk experience.
RMarkdown has neither of these issues, and it supports Python. It is baffling to me that most data scientists use Jupyter, since its diffs are meaningless. Its export options are very underwhelming compared to Rmd as well. Notebooks [1] are simply a special case of R Markdown formats. Besides, Rmd files are literally text files that work with any text editor, including vim.
You can use Jupytext and basically get the best of both worlds (it hooks into jupyterlab to save/restore a markdown version of the notebook). A possible downside is that it doesn't store the outputs of the cells, though that is intended as a feature.
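To illustrate why a text version diffs so much better, here is a small sketch that turns a notebook's JSON into a script-like text form, dropping outputs and execution counts. It is not Jupytext's implementation, just the idea: `to_text` and the sample notebook are invented for this example, and the `# %%` cell markers mimic the "percent" script format.

```python
import json

# A minimal notebook in the nbformat 4 JSON layout (hypothetical contents).
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Load data"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 7,
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["3 rows\n"]}
            ],
            "source": ["rows = [1, 2, 3]\n", "print(len(rows), 'rows')"],
        },
    ],
})


def to_text(nb_json):
    """Keep only cell sources; outputs and execution counts are dropped."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb["cells"]:
        # "source" may be a list of lines or a single string; join handles both.
        source = "".join(cell["source"])
        if cell["cell_type"] == "code":
            chunks.append("# %%\n" + source)
        else:
            commented = "\n".join("# " + line for line in source.splitlines())
            chunks.append("# %% [markdown]\n" + commented)
    return "\n\n".join(chunks) + "\n"


text = to_text(notebook_json)
print(text)
```

Rerunning a cell changes `execution_count` and `outputs` in the .ipynb file but leaves this text unchanged, so a diff only shows actual edits to the code or prose.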
And since RMarkdown just uses pandoc under the hood, it's a bit unfair to say it has better export options than ipynb, which is also supported by pandoc.
I’ll take a moment to plug jupytext. It just takes an easy pre-commit hook to have auto-synchronized .ipynb and .md (and more!) versions of the notebook, which makes diffing and code review feasible.
Some things are less great in the FOSS world. I used MetroWerks tooling that was really, really nice. I even wrote code for them once. It has been a long while, and I think the real win with Jupyter is its massive popularity.