What’s wrong with computational notebooks? (utk.edu)
377 points by ashort11 on Jan 27, 2020 | 221 comments



Co-author of the study here. Let me know if you have any questions or how you overcome some of the problems we identified!


I just wanted to say thank you. Many of the points in your study strike a nerve. Part of my responsibility at my last job was to introduce good software engineering practices. What happens? The data scientists go rogue and start running notebooks left and right. How do they productionize their work? Well, they don't. They were academics. All they know is that the models ran fine in their notebooks on their laptops. Meanwhile, we didn't have anyone that was devoted full time to model productionization.

Sharing data? They had enough problems sharing their notebooks.


I just happened to be reading Peter Naur's "Programming as theory building" recently. It strikes me that taking its theme even a little seriously helps understand why notebooks are so popular. Notebooks happen to be convenient tools for exploring a new domain (interactively). Irrespective of how much software purists might complain, conventional software engineering provides very few tools/solutions/practices for that process. The wretched state of interactive debugging (in most languages) is a simple example.

As someone who spends a substantial amount of time working with both modes (writing research code in Jupyter notebooks, and writing production code as python modules), notebooks scratch certain itches that IDEs typically don't even come close to. (Some recent progress on add-ons in Javascript-based editors is potentially interesting, because that might help marry the strengths of the two)

In my experience, in the evolution of code from Jupyter notebooks to repositories of production code as part of any project, there comes a "right time" to switch from the former to the latter. And this can typically only be learned with experience.


I just refactor into a module that I import into my notebook as I go along. This lets me use the notebook for quick prototyping, but also productionize faster if need be.


That only works after the code in the module is largely "frozen". It doesn't work well if you're experimenting with ideas inside the module. OTOH, if the algorithm is largely frozen, and you're trying to experiment with its performance on a bunch of examples, the workflow of putting the algorithm in a module and using a notebook to interface with data and visualize results is quite useful.

That is basically what I meant by knowing when to transition from one mode to the other.

Here's a concrete example (maybe somebody considers this an inspiring challenge?), to illustrate how notebooks are infuriating in their primitiveness, but still better compared to using an editor on source files: Imagine a beginner trying to write/learn a sorting algorithm, and who would like to keep experimenting with their code and observing what happens on examples, possibly profiling space/time complexity along the way.
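Concretely, the kind of cell-by-cell loop I have in mind might look something like the sketch below in a Python notebook (insertion sort is just a stand-in example, and the timing is deliberately crude):

  import random
  import timeit

  def insertion_sort(xs):
      # work on a copy so repeated experiments always see the same input
      xs = list(xs)
      for i in range(1, len(xs)):
          key = xs[i]
          j = i - 1
          while j >= 0 and xs[j] > key:
              xs[j + 1] = xs[j]
              j -= 1
          xs[j + 1] = key
      return xs

  # in a notebook, each of the lines below would live in its own cell,
  # re-run after every tweak to the function above
  sample = random.sample(range(1000), 50)
  print(insertion_sort(sample)[:10])

  # crude timing across input sizes to eyeball the growth rate
  for n in (100, 1000, 5000):
      data = random.sample(range(n * 10), n)
      print(n, round(timeit.timeit(lambda: insertion_sort(data), number=5), 4))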

To expand on my point above, there are actually three distinct computational use cases, not just two: Interactive learning -> Sharing insights with others -> Productionizing code.


Why doesn't that work for experimenting with the module? In Jupyter, if you're using autoreload, then the module will refresh every time you use it.
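For anyone who hasn't tried it, the whole setup is two magics at the top of the notebook (the module and function names below are just placeholders for whatever you're developing):

  # first cell of the notebook
  %load_ext autoreload
  %autoreload 2   # re-import all modules before executing each cell

  # placeholder for your own work-in-progress module
  import my_analysis

  # edit my_analysis.py in a regular editor, then just re-run the cell that
  # calls it; the new code is picked up without restarting the kernel
  print(my_analysis.clean([1, 2, 3]))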


I guess the objection is that if what you are experimenting with is inside a module, you've moved the "active" code out of the notebook, and then given up the interactivity.


>to introduce good software engineering practices. What happens? The data scientists go rogue and start running notebooks left and right. How do they productionize their work? Well, they don't. They were academics.

My background is programming (instead of data analysis & modeling) so I'm sympathetic to your idealistic "software engineering" view... but I'm also sympathetic to the academics' side as explained by Yihui Xie's blog post:

https://yihui.org/en/2018/09/notebook-war/

He's convinced me that criticizing non-programmers for using (or over-using) computational notebooks when it should be a "proper" programming language and deployment is like criticizing financial analysts over-using Excel to learn how to program VB or Python and re-write their spreadsheets into a "proper database" like Oracle or MySQL. That's just not reality. This divide between "end user tools" and "proper programmer tools" will always exist because there is no perfect tool in existence that serves the needs of both skill sets. Therefore, the programmers will always be able to say the data scientists or financial analysts are "doing it wrong".


> He's convinced me that criticizing non-programmers for using (or over-using) computational notebooks when it should be a "proper" programming language and deployment is like criticizing financial analysts over-using Excel to learn how to program VB or Python and re-write their spreadsheets into a "proper database" like Oracle or MySQL.

I think this is very much off the mark. For sure plenty of scientists are poor programmers, but that isn't the reason they use notebooks. It is because:

They are not attempting to write something that will run everywhere, and often. They are either analyzing some data or doing rapid prototyping. For the latter, it's like criticizing someone who uses a REPL. It's just that the notebook is so much more powerful than a simple REPL that one can safely stick with it. Imagine you will do 40-50 prototypes and only one of those may end up worthy enough to make a product out of, and you don't know which one that will be. If you used a non-notebook environment, you'd give up in frustration by the time you hit the 15th one.

As you said: At the moment, there simply isn't an alternative that allows for rapid prototyping and is production ready. It's a hard problem to solve - there's a reason no one had solved it for decades (well before notebooks were a thing).

Had notebooks not been invented, you would have the same people handing you MATLAB code asking you to productize it.

Claiming they are beginners/novice programmers is off the mark. Peter Norvig started using notebooks for a reason, and no one would call him a novice. I do SW for a living, but when I need to analyze data and visualize it, I'll pick a notebook over "proper" SW tools any day.


We shouldn't assume it will always exist. It exists because programming languages and tools are not as usable as they can be. That is something we can and should expect to change.


Notebooks are like training wheels. They serve multiple purposes, one of the most important being signaling ineptitude to others. Code smells are useful signals, and a notebook is one too.


The tone of what you are saying strikes a nerve with me - we had exactly the same issues with Excel in the front office in investment banking.

Unknowable ad-hoc, unversioned spreadsheets running much of the capital of the company.


That’s a really good comparison. Excel is often used for storing data and doing analysis because it just plain works. And anyone can use it.

Notebooks tend to be the same way. It’s a simple GUI-ish way to do many complex analyses in a quick and dirty way.

And many of the arguments for not using Excel are the same as for not using notebooks. Each is good at the initial data exploration stage, but is often abused and used in production when everyone knows it is a bad idea. But it still “works” so it is unlikely to be replaced.

(Especially when those that are working with the data don’t always have the skill set to build out a full production workflow.)


Think of all the damage caused by excel. We replaced one set of avoidable catastrophes for another. But this time there’s no shame.


I'm a computational biologist and Excel has been the bane of my existence for 20 years. We've "known better" for all of that time, but I still deal with people passing around Excel files of data or having common spreadsheets on shared drives (or now Dropbox shared). We all "know better", but Excel is often the first thing that people try to keep track of data, and once a system works, there is just too much inertia to change.

(For what it's worth, I feel the same way about people who try to send me RDS files with dataframes stored as R objects).

However, I think that whoever decided to name genes "OCT4" and "SEPT7" has to share some of the blame here too...


Are you hiring for summer positions?


My last job I spent 80+% of my time productionizing models and notebooks. It was an absolute nightmare. Everyone had slightly different preprocessing hacks for different stages and things were always working fine locally, but I couldn't replicate the results in docker containers.

I am very happy to be out of that business.


are you me?


>Sharing data? They had enough problems sharing their notebooks.

We just store the data tables in the project's database on a Postgres server. Then it's just a matter of pd.read_sql_query()
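Roughly like this, for anyone who hasn't seen the pattern (the connection string and table name here are made up):

  import pandas as pd
  from sqlalchemy import create_engine

  # hypothetical connection string for the project's Postgres database
  engine = create_engine("postgresql://user:password@dbhost:5432/project_db")

  # everyone on the project reads from the same shared table instead of
  # e-mailing CSV files around
  df = pd.read_sql_query("SELECT * FROM cleaned_measurements", engine)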


Question for the author:

Have you looked into the domain of "research data management"? Concerns such as "archival", "security" or "share & collaborate" are core to this research domain:

In academics, there's a trend to prepare a "data management plan" up front that creates awareness about these concerns. They are even a requirement in order to get funding:

i.e. https://dmponline.dcc.ac.uk/

So, it's a bit odd to see a study that's focussed on a single technical tool yield the same concerns... but not making that jump to a larger, existing framework on information management.

Looking at the authors, it seems you are located at Oregon State University. A quick DuckDuckGo search yields this service from your colleagues at the University Library:

https://guides.library.oregonstate.edu/dmp

Within the context of notebooks themselves, I think the study reflects "if you have a hammer, every problem looks like a nail." Notebooks aren't the only powerful tool to work with data. I think many of the same concerns could be raised with Google Sheets or Excel with heavy VBA scripting. Like others said, this is not a new problem.

Notebooks do have a place in the bigger process of doing iterative research based on data mining techniques. They can help to formulate more accurate questions and perform quick tests without the friction of having to set up complex environments. Moving on from initial data exploration, it's up to the researcher to use a formal method and tools that do mitigate those concerns. RDM is all about providing tools and mitigating (legal) liabilities as far as "what do you do with your data?" is concerned.


In my experience, the best approach is to treat the notebook as the frontend. So widgets, graphs, annotations are generally ok. Anything compute intensive should be relegated to the backend.


I think adding feedback for marking cells as dependent on each other might be a good idea.

I'd also love code completion in notebooks.

I think the cleaning and code reuse problems can easily be mitigated by putting functions into libraries and using auto reload.

My normal workflow is hack something in a notebook until it runs, then refactor and put in a library I import with auto reload. I work on production ML and I use this for both software development and research.


And when initially prototyping new code I'd also normally have both code editor and notebook up at the same time.


> Co-author of the study here. Let me know if you have any questions or how you overcome some of the problems we identified!

It's not clear who the audience is. It sounds like most people who complain about them are software people and not researchers/scientists.

For someone like me, who once did computational research using MATLAB, and later analyzed data for my job, Jupyter is not worse, and is in most ways superior. Let's take your points one by one:

> Participants stated they often downloaded data outside of the notebook from various data sources since interfacing with them programmatically was too much hassle.

This was the norm with MATLAB, Excel and JMP as well, unless someone wrote code to autodownload (extremely rare - less than 1% of people did that). And if you are going to write code to get the data from somewhere, it's much nicer in Jupyter than in these other tools.

> Not only that, but notebooks often crash with large data sets (possibly due to the notebooks running in a web browser).

I honestly have not seen this, and the reason makes no sense. Your browser is not handling the data. The kernel is. I mean yes, if you try to load several GB of data in pandas, it's possible you will have problems if you run out of RAM, but this has nothing to do with notebooks.

> Once the data is loaded, it then has to be cleaned, which participants complained is a repetitive and time consuming task

This was as much a problem prior to notebooks as it is now. Notebooks did not make this any worse.

> Explore and analyze. Modeling and visualizing data are common tasks but can become frustrating. For example, we observed one participant tweak the parameters of a plot more than 20 times in less than 5 minutes.

It was even worse with MATLAB. Ditto for Excel. JMP is a bit nicer for visualization, though.

> Notebooks do not have all of the features of an IDE, like integrated documentation or sophisticated autocomplete, so participants often switch back and forth between an IDE (e.g., VS Code) and their notebook.

It may be better now, but this was a problem in MATLAB as well.

> While it is easy to share the notebook file, it is often not easy to share the data.

This is as true with MATLAB, JMP, etc. A lot of the complaints about it being hard to reuse notebooks is because notebooks at least attempt to be reproducible, and thus many more people attempt it. Prior to notebooks, I know almost no one who tried to share MATLAB analyses, because it was such a pain to do so.

> Notebooks as products. If a large data set is used, as one might expect in production, then the notebook will lose the interactivity while it is executing. Also, notebooks encourage "quick and dirty" code that may require rewriting before it is production quality.

I suppose some people are trying to make products out of notebooks, and this is where all the recent grief I see is coming from. I do not think it was the primary goal of notebooks, though. They were meant for data analyses and prototyping, not for production use.


Much of your comment could be summarised as “it’s no worse than prior tools”. That doesn’t invalidate the author’s points though: just because it’s better than previous tools, which have the same or worse problems, doesn’t mean notebooks don’t have problems that should perhaps be tackled or talked about. That it’s an improvement over what existed before doesn’t mean you can’t be critical about the flaws it still has, and a study like this (looking at real people) and a discussion like we’re having here are a necessary start to finding out how to improve on this.


> It's not clear who the audience is.

People who design and implement features in notebooks. The conclusions in the blog post and research paper are clear that improving these identified problems could improve user experience.


> I honestly have not seen this, and the reason makes no sense. Your browser is not handling the data. The kernel is. I mean yes, if you try to load several GB of data in pandas, it's possible you will have problems if you run out of RAM, but this has nothing to do with notebooks.

This is true nowadays with Jupyter, because it is smart about truncating output. But it used to be possible to OOM the browser by e.g. printing in a long-running loop or displaying too long of a list/table.


Mostly I used pandas Series/Dataframes, and they've always had built-in protection for not printing too much.

But I guess if you're manually printing in a for-loop, then I suppose you could make the browser crash - whereas tools like MATLAB wouldn't.


Why so defensive?


Maybe because of the negative sentiment/bad taste that statements like "What's wrong with..." leave. It could say "What can be improved with ... in 2020" instead. Probably not the intention of the author, but it comes across as non-constructive criticism/"not recommended to use" a bit too much. Some observations seem to not be directly related to notebooks per se. Others feel like they could be made as just entries in a FAQ/best practices section of the documentation.

Kudos for looking at real people's work and surveying it.

I wonder how much workflow could be improved if researchers would be temporarily paired with developers - who are generally better at modularising and removing friction in their work.

Personally I believe that a bit of clean-code discipline and following known best practices could solve a couple of those pain points.

It's also true some could be improved by rethinking how notebooks work; ie. being able to specify input/output of notebook so it can be used as a library; detaching runtime data from the code so it plays better with version control/publishing; maybe even more radical ideas like adding visual/flow view that helps with linking elements; adding built-in excel-like sheets that can be queried/manipulated could also be interesting; built-in, first class support for relational database (sqlite) could also be a big win.

There are many interesting developments happening in this space and there seem to be some unexplored ideas waiting to be tested out.


"What's wrong with ..." for me literally means "What should be done about ..." or "How to make ... better".

I found your defense, which basically just says "It wasn't better before - so, no critique allowed?", considerably less valuable than the article.


IMHO "What's wrong with ..." in colloquial use usually means "Here's a bunch of reasons why you shouldn't use ...".


Why did you not include Org Mode among the Computational Notebooks that you surveyed?

We use it exclusively for our Data Science and find that it ameliorates all of the pain points you highlight in your article.

https://orgmode.org/


Is there opportunity for cross-pollination of your research with the spreadsheet literature?


Yes! Not only that, but with all of the end-user programming research. I did some studies on LabVIEW programmers before and I noticed a lot of the same phenomena with data scientists. They have a lot of domain knowledge, some programming experience, but usually do not use software engineering best practices or tools (e.g., unit testing, code reviews, automated refactoring). All of this is very understandable but reveals a lot of potential for tools to better support them.

See Yestercode [1] and CodeDeviant [2], two tools that I specifically designed for LabVIEW programmers to refactor and test their code without expecting them to behave like traditional software engineers.

[1] http://web.eecs.utk.edu/~azh/pubs/Henley2016VLHCC_Yestercode...

[2] http://web.eecs.utk.edu/~azh/pubs/Henley2018VLHCC_CodeDevian...


This is very good. I work with a number of data scientist notebook users, and your findings closely echo the common complaints I hear.

What are your thoughts on the best way to address these things?


As a data scientist, I used all of the notebooks and didn't find any of the problems listed with databricks.

I don't get to use it in my current role, miss it a lot.


Interesting study! I'm curious what shadowing 15 R data scientists would look like, since it seems to resolve some of the pain points around caching results, debugging, and scaling.

This is a very minor question (and I am not concerned about risk to participants)--when you say they signed consent "in accordance with our institutional ethics board", are you talking about Microsoft, one of the two universities, or all?


Thanks! What are alternatives to computational notebooks (for the same tasks) and how do they compare?


I want a notebook where causality can only flow forward through the cells. I hate notebook time-loops where a variable from a deleted cell can still be in scope.

1. Checkpoint the interpreter state after every cell execution.

2. If I edit a cell, roll back to the previous checkpoint and let execution follow from there.

I can't tell you how many times I've seen accidental persistence of dead state waste hours of people's time.
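A crude approximation of (1) and (2) can be hacked together today with dill's session dumps, though it falls over on unpicklable state (open files, threads, many C-extension objects), which hints at why notebooks don't do this by default. A sketch:

  import dill

  # after cell N executes, snapshot the whole interpreter session
  dill.dump_session("checkpoint_after_cell_07.pkl")

  # ...later, after editing cell N+1, roll back to the state that existed
  # before it ran and let execution follow from there
  dill.load_session("checkpoint_after_cell_07.pkl")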


My problem with notebooks is that I feel like the natural mental model for them is a spreadsheet mental model, not a REPL mental model. Under that assumption, changing a calculation in the middle means that all of the cells that depend on that calculation would be updated, but instead you need to go and manually re-run the cells after it that depend on that calculation (or re-run the entire notebook) to see the effect on later things. Keeping track of the internal state of the REPL environment is tricky, and my notebooks have usually just ended up being convenient REPL blocks rather than a useful notebook since that's the workflow it emphasizes.


That's something that I think Observable [1], in my modest usage, seems to do well.

[1] https://observablehq.com/


Yep, the real complaint is “dead state”, not out of order execution. Worrying about linear flow per se turns out to be misguided based on lack of imagination for/experience with a better model: reactive re-rendering of dependent cells. Observable entirely solves the dead state problem, in a much more effective way than just guaranteeing linear flow would do.

* * *

More generally, Observable solves or at least ameliorates every item in the linked article’s list of complaints. (In 2020, any survey about modern notebook environments really should be discussing it.)

I found the article quite superficial. More like “water cooler gripes from notebook users we polled” than fundamental problems with or opportunities for notebooks as a cognitive tool. I think you could have learned more or less the same thing from going to whatever online forum Jupyter users write their complaints at and skimming the discussion for a couple weeks.

I guess this might be the best we can hope for from the results of a questionnaire like this. But it seems crazy having an article about notebook UI which makes no mention of spreadsheets, literate programming, Mathematica, REPLs, Bret Victor’s work, etc.

From the title I was hoping for something more thoughtful and insightful.


You can get a jupyter extension[1] that allows you to add tags and dependencies and this way construct the dependency graph as you go along. Of course, you have to do it manually and the interface is a bit clunky, but it does what it says.

In practice I think taking care not to accidentally shadow variables is much more important: this dependency business only makes sense once you have a clear idea of what you need and by that point you are mostly done anyway.

[1] https://jupyter-contrib-nbextensions.readthedocs.io/en/lates...


I don’t understand what you are trying to say in your second paragraph, but I highly recommend you spend a few weeks playing with http://observablehq.com instead of speculating about the differences.

In practice, I find it to be dramatically better than previous notebook environments for data analysis, exploratory programming / computational research, prototyping, data visualization, and writing/reading interactive documents (blog posts, software library documentation, expository papers ...). It has a lower barrier to starting new projects, a lower-friction flow throughout

I find it better at every stage of my thinking process from blank page up through final code/document, and would recommend it vs. Jupyter or Matlab or Mathematica in every case unless some specific software library is needed which is unavailable in Javascript. The only other tool I really need is pen and paper, though I also use http://desmos.com/calculator and Photoshop a fair bit.


This falls apart when computation is a factor, though. You can't recompute the whole notebook on every commit when there are 30 cells that each take 2-8 seconds to complete.


In Jupyter I approach this by structuring my exploratory analysis in sections, with the minimum of variables reused between sections.

Typically the time-intensive data prep stage is section 1.

The remaining sections are designed essentially like function blocks: data inputs listed in the first cell and data outputs/visualizations towards the end.

Once I decide the exploratory analysis in a section is more-or-less right, I bundle up the code cells into a standalone function, ready for reuse later in my analysis.

Jupyter notebooks can easily get disorganised with out-of-order state. However that is their strength too: exploratory analysis and trying different code approaches is inherently a creative rather than a linear activity.
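A toy illustration of that section-as-function-block pattern (the file and column names are invented):

  import pandas as pd

  # --- Section 1: time-intensive data prep, run once and reused ---------
  raw = pd.read_csv("measurements.csv")   # hypothetical input file

  # --- Section 2: exploratory cells, later bundled into one function ----
  def summarize_by_site(df):
      # inputs listed up front, outputs returned at the end, so the
      # section can be re-run or reused as a unit
      cleaned = df.dropna(subset=["site", "value"])
      return cleaned.groupby("site")["value"].agg(["mean", "std", "count"])

  summary = summarize_by_site(raw)
  summary.head()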


Maybe I'm missing a joke here, but if that's your workflow then there's absolutely no advantage to notebooks over something like Spyder or even VS Code.


No, that's not the workflow. You work in the notebook as normal but from time to time (say every two hours) rerun the whole thing.

One advantage of this is that it forces you to name your variables such that they don't overwrite each other. Further down the line this enables sophisticated comparisons of states (e.g. dataframes) before and after (something data scientists need)


If you have a few long data loading and preprocessing steps, it's a pain to wait for them to run again, so people try to avoid it.

When something odd begins to happen, they don't immediately consider the possibility that it's not their bug and waste time trying to 'debug' the problem instead of just rerunning the notebook.


Would it be a solution to store intermediate computations in an in-memory or on-disk database like Redis or SQLite? It is a matter of a few minutes to run a Docker instance and write simple read/write + serialize Python util functions.
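For the SQLite variant, which avoids even the Docker step, a sketch (file and table names are made up):

  import sqlite3
  import pandas as pd

  conn = sqlite3.connect("intermediates.db")   # hypothetical cache file

  # stand-in for the output of a slow preprocessing cell
  preprocessed = pd.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})

  # persist it once after the expensive step...
  preprocessed.to_sql("preprocessed", conn, if_exists="replace", index=False)

  # ...then in later sessions read it back instead of recomputing
  cached = pd.read_sql("SELECT * FROM preprocessed", conn)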


Surely, it would be a solution, but I don't think for an average data scientist it's a matter of a few minutes.


You don't reload every time you write a line of code. Nobody's insane like that. You reload every two hours or so. This is good enough for all but the most extreme data sets.


Well, if block 1 takes ages but everything after that is dependent (2 -> 3 -> 4, etc.), obviously it would be nice to just re-run block two and have those changes cascade.


I break long running data imports out into separate notebooks or .py files, and persist the results.

Always restart&re-run for usable results.


That’s what I’d always do. On more complex notebooks, though, is it possible that isn’t a solution? I wouldn’t think so but I am happy to be surprised. Then again I use notebooks only at the end of a project to present work in “executable presentation” style. Restart and Rerun All has always been sufficient for me. More generally, I took a look at notebooks, thought, “Why develop with all the extra baggage” and left it at that until ready to experiment with presentation methods for (tight) core ideas.


Why are you even using notebooks at all then?


> One advantage of this is that it forces you to name your variables such that they don't overwrite each other. Further down the line this enables sophisticated comparisons of states (e.g. dataframes) before and after (something data scientists need)

Also, not sure about you, but I like seeing all of my outputs on a single browser page without having to write any glue code whatsoever.


A couple years out of college we finally took a hard look at the credit cards and realized we had fucked up.

We were gonna buckle down, pay the cards down hard for a while, 'color' our money so we both had discretionary spending separate from, say, the power bill. She had much more Excel experience than I did so she worked up a spreadsheet.

It was bad. We had worked up some 'fair' notion of proportionality and she basically had no spending money and mine was pretty bleak. So I redid the numbers from scratch with a split that was better for her. In the new spreadsheet she has much more spending money and... hold on, I've got a bit more too? I looked at her spreadsheet repeatedly and I never did figure out where a couple hundred bucks got lost. I went back to sanity checking mine instead to make sure I wasn't wrong. It checked out.

I wonder sometimes how often small companies discover they've been running in the red instead of the black, because some cell got zeroed out, a sum didn't cover an entire column, or embezzlement is encoded straight into the spreadsheet without anyone noticing.

There's gotta be a better way.


The entire accounting department at any company exists to make sure their numbers are spot on. If your wife had an entire accounting department scrutinizing her numbers, they'd find the discrepancy. These are people who were willing to sacrifice their entire professional career and their lives during busy season at least to do nothing but tinker with excel for 40+ years; always trust a masochist verging on the insane.


> I wonder sometimes how often small companies discover they've been running in the red instead of the black, because some cell got zeroed out, a sum didn't cover an entire column

This is a really interesting insight (actually obvious when you think about it). I'm currently working on a spreadsheet app and these kinds of observations are very interesting to me. I guess things like named cells/variables will help (instead of using $A$4 etc.). Range selection could also be more intelligent (it could actively warn you if a range selection seems to be missing a few cells of the same data type). Do you have any other insights here?


One company I used to work for had this happen: there was a magic spreadsheet in the accounting system - one factor in the massive restructuring of the company - ICAN was the other.


Going back to even some of the earliest literate programming exercises by Knuth, there's a lot of demonstrable usefulness in being able to write the code "out of order", or to at least demonstrate it in such form. It's not entirely out of the question that setup requirements aren't interesting to the main narrative flow, and maybe even distract from it, such that the "natural" place to put stuff in say a textbook is in the back as an Appendix.

A good notebook (again, similar to early literate programming tools) should help you piece the final execution flow back into the procedural flow needed for the given compiler/interpreter, but it probably should still let you rearrange it to best fit your narrative/thesis/arc.


This is how runkit does it for nodejs and I think it’s working quite well for them.

We (at Nextjournal) tried doing the same for other languages (Python, Julia, R) and felt that it didn’t work nearly as well. You often want to change a cell at the top, e.g. to add a new import, and it can be quite annoying when long-running dependent cells re-execute automatically. I now think that automatic execution of dependent cells works great when your use case is fast-executing cells (see observablehq), but we need to figure out something else for longer running cells. One idea that I haven’t tried yet is to only automatically run cells whose previous execution finished within a given time threshold.

I hear a lot of complaints about hidden state but I think it’s less of a problem in reality. It’s just a lot faster than always rerunning things from a clean slate. Clojure's live programming model [1] works incredibly well by giving the user full control over what should be evaluated. But Clojure's focus on immutability also makes this work really well. I rarely run into issues where I'm still depending on a var that's been removed and then there's still the reloaded workflow [2].

Overall I think notebooks are currently a great improvement for people that would otherwise create plain scripts – working on it is a lot quicker when you have an easy way to just execute parts of it. Plus there's the obvious benefit of interleaving prose and results. That doesn't mean we should not be thinking about addressing the hidden state problem but I think notebooks do add a lot of value nevertheless.

[1] https://clojure.org/guides/repl/introduction

[2] http://thinkrelevance.com/blog/2013/06/04/clojure-workflow-r...


If people are wondering about cases that can cause this - a common one (for me) is a mis-spelled variable name. If you go back and change it, the old one is still there and if you make the same mistake twice you will have code that runs but doesn't work. It's then really not obvious why it doesn't work.
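For anyone who hasn't hit this, the failure mode looks roughly like this (names invented):

  scores = [0.2, 0.7, 0.9]

  # cell 1, first attempt: note the typo in the name
  thresold = 0.5

  # cell 2 happens to use the same typo, so everything "works"
  print([s for s in scores if s > thresold])

  # later the typo in cell 1 is fixed and that cell is re-run...
  threshold = 0.8

  # ...but the old `thresold` is still alive in the kernel, so re-running
  # cell 2 unchanged keeps using 0.5 and the output never changes
  print([s for s in scores if s > thresold])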


It's best to think of the notebook as a REPL. So you'd want to run `del foo` on the old name.

In fact, this is a good counterexample. Why should the notebook delete the old variable name? What if its value is a thread currently executing?

Notebooks are REPLs, and it's better to get used to that than to try to enforce some confusing time traveling.


But, strangely, Jupyter doesn't also give you a REPL (like, say, RStudio does). I'm always making new cells in the middle to output the column names of my spreadsheet, and then I have to delete them. I used to just always have an IPython REPL running and test things out in there as I write. You can start an IPython instance on the same kernel, but I found that messed up my plots when I did that IIRC.


You can get a REPL attached to a notebook in jupyter. When you open a console in jupyter-lab you have the option of attaching it to an already running kernel. Using the notebook interface you can connect a console using `jupyter console --existing`. By default this connects to the most recent session, but you can also specify a session by passing a token.


Yeah, but problems with that are what my last sentence referred to. I didn't try debugging it for long, though.


Just get the extension: https://jupyter-contrib-nbextensions.readthedocs.io/en/lates...

there are quite a few very useful ones, my favourite being collapsible headings: https://jupyter-contrib-nbextensions.readthedocs.io/en/lates...


Yes! I've wanted this too.

Colab has a nice feature that's close to this: Insert -> Scratch code cell


> It's best to think of the notebook as a REPL.

With sections of the history easily replaced.

> So you'd want to run `del foo` on the old name.

And then delete this line, because if you leave it in it'll break when you try and run the file all the way through.

> Notebooks are REPLs, and it's better to get used to that than to try to enforce some confusing time traveling.

Do you mean by treating them as append only and never rerunning any cells?

> In fact, this is a good counterexample. Why should the notebook delete the old variable name? What if its value is a thread currently executing?

The notebook has no idea whether it can or can't, but that doesn't mean that leaving it in is good; it's simply the only realistic option.


Agree. Get used to the habit of deleting old objects/names when you are replacing them, if you work in notebooks


It's an easy thing to miss though, because you also then need to delete the line of code you used to delete the old object/name so you have no record of cleaning up after yourself.


Be careful what you wish for.

Hot reloads can become very expensive. Especially when it comes to computationally heavy tasks that notebooks are built for.

If you decide you want hot reloads by default, it'd mean each time you click on a cell and then click on another you'd be restarting the whole notebook.

If you had massive datasets you were loading or other args that you were parsing manually or at prompt, you'd have to go back and do all that. Don't even get me started on the operations you'd have done with those dataframes prior.

I think it is a good thing that notebooks separate instructions and re-execute manually by default. The cost of the alternative is just too high


> I think it is a good thing that notebooks separate instructions and re-execute manually by default. The cost of the alternative is just too high.

Maybe add a "lock" toggle so a user can block a cell from being automatically executed? The heavy numeric setup tasks could then be gathered in a few cells and locked, leaving the lighter plotting & summary stats cells free to update reactively.


Toggle???

Toggle a whole environment and interpreter's behavior? Do you know how much architecture that would involve? That's like trying to tell IDLE to be able to both delete or keep your variables on exit, or the JVM to have a toggle switch for memory and garbage management.

Why doesn't the developer make themselves useful and simply write a save function that freezes their buffer variable values to a text, json or SQLite file that they can read from or stream rather than trying to set back a whole community years of progress in an effort to accommodate perhaps entitled or lazy devs.

Can you even imagine the architectural costs of trying to accommodate streaming data and timestamped data as opposed to you just writing your own stuff to file?


I think adding a toggle to run or not run a cell would be a trivial change, something like adding a property to the cell (https://raw.githubusercontent.com/jupyter/notebook/master/no...) and checking whether that is set or not before running the contents.
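You can even approximate it outside the UI today with nbformat/nbconvert; a rough sketch (the "locked" metadata key here is invented, and a real implementation would skip the flagged cells rather than drop them):

  import nbformat
  from nbconvert.preprocessors import ExecutePreprocessor

  nb = nbformat.read("analysis.ipynb", as_version=4)   # hypothetical notebook

  # crude version of the idea: remove cells flagged as locked before
  # executing, so expensive setup cells keep their previously saved outputs
  nb.cells = [c for c in nb.cells
              if not c.get("metadata", {}).get("locked", False)]

  ExecutePreprocessor(timeout=600).preprocess(nb, {"metadata": {"path": "."}})
  nbformat.write(nb, "analysis.executed.ipynb")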


I don't think re executing by default would be beneficial. I just don't want state present in the interpreter that isn't in any live cell, and I only want time to flow one direction. Other than that, I think the other ergonomics of notebooks are fine.


RunKit does this: https://runkit.com/home

> RunKit allows you to rewind your work to a previous point, even filesystem changes are rewound!


If the interpreter state contains large variables checkpointing might not be viable (eg I have dataframes that are 100s of GB/large fractions of total available memory, reading/writing from hard drive all the time would be relatively slow. If you can save deltas I guess it wouldn't be too space inefficient but I imagine still slow).

At the same time, I do like the idea of an append only notebook where you can:

1. Only run cells in sequential order

2. Only edit cells that are below the most recently run cell.

Thankfully you can enforce it through code practice and the notebook is relatively guaranteed to be "run all"-able. You will need to refactor it after the initial dirty run, but at least it's easy to reason about.


I want a notebook situation where the platform understand sampling, so that, while I'm doing my EDA and initial development and generally doing the kinds of work that are appropriate to do in notebook, I'm never working with 100GB data frames.

I suspect that a big part of my annoyance about the current state of the data space is that parts of the ecosystem were designed with the needs of data scientists in mind, and other parts of the ecosystem were designed with the needs of data engineers in mind, and it's all been jammed together in a way that makes sure nobody can ever be happy.


You can sample data if you want already (or sequentially load partial data, which is what I usually do if I just want to test basic transformations), but if you need to worry about rare occurrences (and don't know the rate) then sampling can be dangerous. For example, when validating data there are edge cases that are very rare (ie sometimes I catch issues that are less than one record per billion), it can be hard to catch them without looking at all of the data.


Assuming the data isn't changed, thanks to CoW forking wouldn't cause any extra memory usage. If only a subset of data is changed, same thing - only the changed cells will take extra space. The problem only occurs when the whole variable changes - in which case yeah, you're SOL. I wonder what the usage patterns are for such datasets?


Personal experience: when first looking at the data I often do lots of map /reduce style operations which might transform large portions of the dataframe.

Question, if you use CoW then presumably your variable blocks are no longer contiguous, wouldn't this really slow down vector operations?


> Question, if you use CoW then presumably your variable blocks are no longer contiguous, wouldn't this really slow down vector operations?

I don't think so. Vector operations require the data to be aligned to whatever the vector size is, no? E.G. 16-byte vector ops require the data to be aligned to 16-byte, etc... At least that's my understanding.


I've played with prototypes of this by calling fork on IPython to take snapshots of interpreter state https://github.com/thomasballinger/rlundo/blob/master/readme... but if you can't serialize state fully, rerunning from the top (bpython's approach) can work, or rerunning as a dependency dag shows is necessary (my current employer Observable's approach) works nicely.


Check out our reproducibility-focused notebook Vizier (https://vizierdb.info). In Vizier, inter-cell communication happens through Spark dataframes (and we're working on other datatypes too). This makes it possible for Vizier to track inter-cell dependencies and automatically reschedule execution of dependent cells. (It also makes Vizier a polyglot notebook :) )


Makefiles have this issue too, sometimes things have been incorrectly made, and the dependencies in the makefile are wrong.

Unless it takes more than a few seconds to run a notebook, rerun every cell up to the point you're editing, always.

And then if it does take minutes, and you find yourself in an unexplainable rut, then run the entire notebook, and get a cup of coffee.


That's part of what drove me to write TopShell, which is a notebook-like interface:

https://github.com/topshell-language/topshell

  * Information only flows downwards.
  * Computations are cached.
  * Things are automatically recomputed when the values they depend on change.
  * Things with effects are instead cleared and await confirmation before running.


I haven't dug into it myself, but Netflix makes something called Polynote that is supposed to add some awareness of the sequence of the cells to combat this


https://datalore.io does exactly this


Might be worth trying out nodebook [1], which at least enforces the forward directionality you mentioned.

Also polynote by Netflix, as a user below mentioned.

[1] https://github.com/stitchfix/nodebook


s/Netflix/Stitch Fix/

(don't reply, and I'll delete this)


I used Mathematica’s notebook interface quite heavily 15-20 years ago; Jupyter’s interface is a clone of that in many ways.

At the time, my workflow was to use two different notebooks for everything: foo.nb and foo-scratch.nb. I’d get things working a piece at a time in foo-scratch.nb, not caring at all how it looked, not having to worry about leaving extra output or dead ends of explorations lying around; then the refined cells would be copied over to foo.nb, which would get pristine presentation, and which I could run top-to-bottom.

This workflow worked pretty well for me: very clean reproducible output, with the ability to easily refer back to all the steps of how I’d derived something, along with copious detailed private notes.

I never had to use it but I’m pretty sure each cell even had its modification time stored in the metadata in case I wanted to view a chronological history.


I make a "scratch pad" section of my notebook and work on ideas there. Then once I've pieced together a function line by line and tested it a bit I move it up to where it should be in the chronological order of the notebook. Kind of like your two notebook system but makes copying easier in Jupyter.


I do the same, though it feels dangerous because both the good-copy and scratch sections share the same kernel. JupyterLab works on .ipynb files, and makes it way easier to copy (or drag and drop) cells between different notebooks. One of these days, I plan to switch to JupyterLab to get a sense of what else it offers above Jupyter Notebook.



Superb talk! It's worth noting that a lot of the issues he brings up, ultimately stem from the format in which Jupyter notebooks are stored. R notebooks, with their plain-text stored format as well as code-chunk parameters, solve some, but not all, of these problems.


https://github.com/mwouts/jupytext

doesn’t solve the state management, testing, or tooling issues though, but commits are slightly less awful.


Yea, when I first found that extension, I was pretty excited about it. But it ultimately is not a first class citizen in the way it is for R notebooks, so I simply don't feel as comfortable using this as I might otherwise have been.


I'm someone who has been programming for a very long time and has been using notebooks for a reasonably long time (and almost always starts projects with them), my feeling is that they are a bit like C in that they make it easy to accidentally shoot yourself in the foot if you aren't careful. I always strive to end up with a notebook that can be "Run All" from a fresh clone, and I'd say that I'm successful with that maybe 60-70% of the time, and am close enough that I can fix it in the remainder.

As the article (and the many others like it that have frequently cropped up as soon as IPython Notebooks first started ramping up in popularity) points out though, a lot of newer users don't have the discipline to ensure that they're not jumping around too much. It's not a problem for them in the immediate term since they know how the state ought to work, but then it becomes a mess when they try to share it with someone else (or to run it themselves again 3 months later).

The challenge though is that the data analysis workflows that it allows are unbeatable by any other tools I've tried. In the end, it may just be that it's the worst form of data programming except for all of the others that have been tried.


I interned @ Google AI last summer; used notebooks nearly everyday. Estimated productivity gain is 3-5x.

Biggest tip I have is to turn auto reload on, then write the bulk of your code as modular functions and call functions within your notebooks. Keeps the notebook tidy and it’s easier to push your code this way.

It’s also easier for sharing since most people viewing your notebooks (mentors, people outside your team) are interested in results/artifacts such as metrics, generated text, images, audio, which notebooks display well (not your code).


Beware perceived gains that (1) benefit you at the detriment of others or (2) have hidden costs exposed at a later stage.


Yes, notebooks are good for studying.


(A frequent Jupyter Notebook user here - for data exploration, and for teaching deep learning, where Colab is indispensable.)

The main question is: what are the alternatives, for data exploration (and sharing its results). Similarly, for data science tool demos, Notebooks shine.

IMHO the problem is not in the notebooks, but in how they are being used (i.e. the workflow). By writing scripts in py files, and using notebooks only to show their results (processed data, charts, etc) we get the best of both worlds.

The only built-in problem with Jupyter Notebooks is JSON, mixing input and output (and making it a pain to work with version control). But here RMarkdown (and a few other alternatives) work well.


Yes, the article mentions users copy pasting snippets from their personal "library". Well, that could just be made into an actual library of functions to call.

I'm currently at uni enrolled in an AI/ML degree, and there are a lot of people with no previous exposure to programming. It's just that most people don't know that these things are possible, don't want to learn another tool (IDE) and are not interested in longevity of the code, just in the results. This shouldn't sound like me complaining, I totally understand. I think a lot of the stuff could be solved with just better tooling, but a familiarity with software development is definitely helpful.

Also a while back streamlit (https://www.streamlit.io/) was here on HN and since then I've been meaning to try it. I think this could be a good approach to bring together the best of both worlds.


> most people don't know that these things are possible, don't want to learn another tool (IDE) and are not interested in longevity of the code, just in the results.

This approach is arguably more effective than wasting time trying to refactor everything into a library of functions.

Programming-as-crafting needs to be more of a thing. Not everything is written to be long-lasting. Even HN ushered the ugly code into hook functions that weren't shipped with the main codebase.


> The only build-in problem with Jupyter Notebooks is JSON, mixing input and output (and making it pain to work with version control)

Absolutely. JSON notebook format makes it very hard to do code reviews, merge in remote changes etc. After being frustrated with lack of solutions, I built ReviewNB[1] specifically to do notebook code reviews on GitHub. Alternatively, Nbdime[2] is also a nice open source library to see diffs locally & merge in changes.

[1] https://www.reviewnb.com/

[2] https://github.com/jupyter/nbdime


I recommend nbstripout https://github.com/kynan/nbstripout

It eases most of the pain regarding version control. You can use it as a 'git filter', so only inputs would be shown in diffs and committed (and also works with interactive adding!), while keeping outputs in your working tree.


Came here just to make sure this got mentioned!


I don't get why anyone who knows how to use an IDE would ever use a notebook; the coding experience is garbage in comparison. I understand they started as a way to get STEM kids coding quick, but now they are like a standard in data analysis and data science, with those people needing experienced devs to translate the notebook into production code. This just drives the silo walls up higher.


Doing data science in an IDE would be terrible. With a notebook, you get the chance to load the data, view it, clean it where needed, view it again, analyze it, model it and do anything else you need to it. An IDE means that you can't use the previous output to guide your next operation in a direct fashion like you can with a notebook.


> With a notebook, you get the chance to load the data, view it, clean it where needed, view it again, analyze it, model it and do anything else you need to it.

In a good data-oriented IDE like RStudio you get to do all of those things and write code which can be saved as plain text and can be version controlled well under git which you can't do well with Jupyter.

R folks have to be the best indicator in this case because they have access to a good IDE and they have good support for Jupyter. Their use is overwhelmingly in plain text files in RStudio, a small portion in rmarkdown notebooks, and pretty much no one uses R in Jupyter.


Yes! Rstudio is the one thing I miss most when doing datascience in python.

Notebooks give me some of the interactivity but the experience degrades significantly.

The Spyder IDE seems like an okayish replacement, but some of the libraries I use expect you to have HTML display (within a notebook) to give you full functionality, which is not yet available in Spyder.


Have you tried Orange? It has scripting capabilities.


No, but looking at some screenshots + descriptions, it seems to get me further from the code, which does not seem like what I am looking for.

Rstudio gives you the experience of a classical IDE + easy data exploration which I found to be productive from the exploratory stages (where I need to see my data and the effect of my code) to the clean-up phase (where I refactor my file).


as a counterpoint, plenty of R folks are pretty happy doing all of that in Rstudio


It is interesting to see this discussion about notebooks while I'm thinking about all the RStudio users who do all their work inside the IDE and are pretty happy. Notebooks seem like such an inferior tool to me. I'm also extremely biased.


Also lots of emacs users of org-mode as an awesome notebook.


I'm an R folk, and I'm even happy doing all of that in Emacs!


That kind of depends on your process. In many cases pdb (or the debugging interface in your IDE of choice) works just fine for that. It's certainly not "terrible".

After the exploration and preprocessing stage I personally don't see much benefit in the notebook model: training/evaluation and any meaningful visualization take forever anyway, which means I need to cache and persist intermediate results. With that it doesn't really matter all too much whether I work on it in vim & pdb, an IDE, or Jupyter.


Maybe thinking about the data and what you're trying to do before coding might be an idea as well.


'Thinking about the data' most often requires looking at the data from hundreds of different angles, quickly investigating its properties and statistics, maybe plotting or fitting a few things, checking some hypotheses etc (all of the above code you will most likely throw out after the initial stage).

Same with the results - once you've coded something (perhaps outside of a notebook environment) and obtained results, verifying that they are what you expect is much more efficient to do in a notebook.


Maybe you use a notebook I'm completely unfamiliar with, but my experience is that they allow you to write code, run it, and save the results in cells. My IDE does all of that except the saving of partial results part, but this can be done easily by just dumping your precomputed data to disk if you can't recompute it easily. In either case, an IDE gives you an actual debugger, plus IntelliJ has great data visualization plugins, a database viewer, great autocompletion, and it integrates with your VCS, etc. What do you do when you need an actual debugger, or need to profile your code? What about documentation for the function you are calling? In my IDE this is a popup; in every notebook I've used, this is a Google search.


I use both PyCharm and JupyterLab on daily basis, typically dealing with multi-gb datasets.

If I'm writing a library or adding new features to one, or writing tests I'll use PyCharm sure thing, otherwise the notebook is a quicker way to sketch prototypes and always have a kernel with preloaded datasets and pre-imported stuff ready at hand. I don't want to wait 10 minutes to just load the data every time I want to check if my new function works well on it at big scale. That's one of the most important bits.

PyCharm is a clear winner at actually writing code that you won't throw in the bin 10 min later, and once you know what to write.

Debugging? Don't remember ever using PyCharm despite the fact it exists... either pudb or python-devtools or something else. I'd just write tests and things start working in the process. And btw you have pdb debugger (some weak version of it) in jupyter if you really need it. Docstrings? Press tab twice in the notebook. Or keep PyCharm open on the side so you can cmd-b. Profiling? Never a pycharm builtin, maybe something like flamegraph but an external tool anyway.


> I don't get why anyone one who knows how to use an IDE would ever use a notebook,

The Python IDEs for data science are mostly garbage - if you have any recommendations, I'm all ears because I really don't like notebooks but still keep switching between jupyter and vscode depending on what I'm working on.


I use IntelliJ for all my work, data or normal dev stuff, and it works great (all is python). Maybe there is just a workflow issue here where people are used to saving their data as they go in cells. I just write my algorithms all the way through, get a subset of data to debug against, then use the debugger to help me see what mistakes I made. I always run my code all the way through and only stop at the step I'm debugging. I like this better than saving the data from previous computations because I tend to refactor a lot and would need to rerun most of the notebook anyway. Also, rerunning it all the way through a lot makes me notice slow spots more than if I only ran that area a few times and saved the results. For me, this has the effect that those areas get more attention and my code is closer to production grade than if I had used a notebook workflow. My two cents, but give IntelliJ a try if you want a good python IDE.


> I use IntelliJ for all my work, data or normal dev stuff, and it works great (all is python).

Can it show me inline plots and allow me to embed rendered formulae written in LaTex, images and video in between lines?

This is the reason people like notebooks.


I have found PyCharm to offer a good trade-off between data exploration and productionizing your code. It has the best Python debugger that I've used. You can also run Jupyter notebooks in PyCharm when that makes sense for you.


I don't know what this document is meant to do, but you will have to pry my JupyterLab instance from my cold, dead hands.

I love notebooks. I work fast: line by line I execute commands and immediately see the output (dataframes or graphs). For complex code I have an editor open (in JupyterLab or VS Code) for some functions and classes. But the main development is done in the notebook; anything that ends up in a module starts in my notebooks.

As a biologist who learned to program after 30, I just don't understand how you can develop data processing code without such a close handle on dataframes and without checking in graphs/visualizations whether your code does what you expect. I don't see how I would do that in pure VS Code or other IDEs.

I also don't understand this sentence: "Once the data is loaded, it then has to be cleaned, which participants complained is a repetitive and time consuming task that involves copying and pasting code from their personal "library" of commonly used functions." What is the alternative? Not cleaning the data? And why copy and paste when you can perfectly well have your own shareable module on the side? I guess most notebook users do some kind of hybrid development.


Good point on the last one; I think we have to 'educate' researchers on the fact that they can also write their own libraries and frameworks, and they should. Even basic data manipulation utilities can be made into Python modules and distributed with ease. If something is tedious, there definitely is a way to make it less so.
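For instance, a minimal sketch of what such a shared utility module might look like (the module name, function, and threshold are placeholders, not anything from the study):

    # my_utils/cleaning.py -- a tiny personal library imported from every notebook
    import pandas as pd

    def drop_mostly_empty_columns(df: pd.DataFrame, max_missing: float = 0.9) -> pd.DataFrame:
        """Drop columns where more than `max_missing` fraction of values are NaN."""
        return df.loc[:, df.isna().mean() <= max_missing]

A notebook cell then becomes a one-line `from my_utils.cleaning import drop_mostly_empty_columns` instead of a pasted block.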


The FastAI people have been working on a lot of these issues with their NBDev too: https://www.fast.ai/2019/12/02/nbdev/


Came here to post the same thing. Nbdev helps fill in the areas where IDEs are traditionally strong. Even if you don't use the full nbdev library and templates, the workflow makes sense. Write code in Jupyter, export to a Python library, and you can use it everywhere else after that.


nbdev deals with the version control and library of code issues pretty effectively.


I love jupyter notebooks. Without them, I wouldn’t have been half as productive as I was during my PhD.

Here’s a post I wrote just a few weeks ago describing some of the conventions that I established for myself over the course of 5 years:

https://jessimekirk.com/blog/notebook_rules/

I suspect that a lot of the conventions I describe help mitigate problems described here, some of which should be strictly or optionally enforced by the notebook instead of the user.

(The site’s very much a work in progress, so expect to see odd and broken things if you go poking around.)


Thanks for this! Didn't know about the watermark extension, that looks useful.

I just started working with the Guix kernel for more easily reproducible and reusable notebooks. I suppose that's an alternative to using a conda environment.

See here for an example: https://gist.github.com/jboynyc/5d0319f33e71427aa42a98c1a3a9...


No mention of https://observablehq.com notebooks? They’re the best I’ve found in the “Share and collaborate” and “As products” category. JupyterLab is still pretty great for exploratory stuff, but visualization possibilities in observable are incredible.


No one we interviewed or surveyed mentioned it.


The problem is that JavaScript doesn't have the scientific computing ecosystem that Python, R, and Julia have. Jupyter supports those languages and any others that people write kernels for. And you can also execute bash, JS, CSS, and HTML directly in Python notebooks with magic commands.
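For example (a minimal sketch; these are the standard IPython cell magics, each run in its own cell of a Python notebook):

    %%bash
    ls -l | head -n 3

    %%html
    <b>rendered directly below the cell</b>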


Agreed, Observable fixes a lot of the problems I've had with other notebooks. It can still be fiddly for code over a certain size/ complexity but the ability to import from npm modules goes a long way to fixing the problem. The user base seems to be predominantly drawn from the visualisation side of things + the fact that it's javascript may limit its uptake in science/maths areas. Aside: I've felt for a while that JS is really missing decent maths/stats libraries, any suggestions?


They're a walled garden and to be honest I kinda hate them.

Bostock had lots of accessible D3 examples before; now it's all on that "platform". Sure, it's slick, but it's an overall loss for the ecosystem IMO.


The problem of notebooks has been solved by the Python extension in Visual Studio Code (and some other editors too, although VS Code is the one I'm most familiar with).

Editing an ordinary Python file, if you insert the comment "# %%", you turn everything between that comment and the next "# %%" (or the end of the file) into a code cell that can be submitted to the ipython kernel, just as in a Jupyter notebook. The editor splits into two halves, the left half your Python file and the right half the Jupyter notebook window with submitted code and formatted output (e.g., DataFrames look pretty, plots display normally, etc.). When you're done running everything, you can export the result as a Jupyter notebook. Because you're editing an ordinary Python file, standard features like version control and importing the file you're editing into other files (you cannot normally import .ipynb files IIRC) work normally.
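A minimal sketch of what such a file looks like (the file name and DataFrame are placeholders):

    # %%
    import pandas as pd
    df = pd.read_csv("data.csv")   # hypothetical input file

    # %%
    df.describe()   # runs as its own cell; output appears in the interactive window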

And of course since VS Code is a real editor/IDE, you can double click a file and have it open right up (no resorting to a Terminal to start your Jupyter session) and you get syntax themes, a built in Terminal, a git UI, code snippets, documentation on hover, vim mode if that's your thing, etc.

The only downside I've found is that the Python extension doesn't incorporate ipython's autocomplete in its own autocompletion, but that's a small price to pay for getting to treat .py files as notebooks.


So literally all of these complaints are about their particular implementations of notebooks, not the concept of computational notebooks in general, or are all computational notebooks destined to have unstable kernels?

In my mind, notebooks should be married to a functional style of programming, where you use the notebook's markup to thoroughly explain and document your functions. Below your "function definition" section, you keep a "trying things out" section where you actually plug the data into your functions for debugging/visualizations. You can't shoot yourself in the foot with variables because all the work is done in your function's lexical scope. You can shoot yourself in the foot with stale function definitions, but a good notebook interface gives you the ability to clear function definitions and run groups of cells, so you can make sure you always run your functions in a group that starts with a "clear function definitions" cell.
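A minimal sketch of that layout, with a toy function standing in for real work:

    # --- function definitions (each documented in a markdown cell above it) ---
    def normalize(xs):
        """Scale a list of numbers into the 0-1 range."""
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) for x in xs]

    # --- trying things out ---
    normalize([3, 7, 11])   # all working state stays inside the function's scope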

When you are done, you just cut the "trying things out" section into a second notebook which references the functions in the first and voilà, you've got a very well documented library of functions, and a new work notebook where you can freely polish your visualizations/whatever.


This is a solid list. It will be even better if juxtaposed with current efforts to solve each of these problems - every DS I know is addressing at least 2-3 of these with some pet tools in their own environment. For example, we use Panel and Holoviews to make data exploration much easier. I have a feeling the ecosystem would improve faster if we had an index of (partial) solutions aligned with this problem set.

One category left out of the list: testing of data pipelines (cf. Great Expectations).


I use Jupyter Lab with Python every day. It's where I do my initial data exploration and cleaning. Jupyter Lab is not perfect, but most of these findings seem like they are more issues of inexperience with technology and programming, not computational notebooks.


I have been heads down in jupyter for the past couple of weeks and I finally realized I just DO NOT LIKE IT AT ALL! Cracks started appearing and then suddenly there was an avalanche of disappointment.

The first crack -- it's almost impossible to build a nice presentation in Jupyter, because you always have to show your code and its stderr. I imported all the TeX goodness, and it looked pretty nice, but I couldn't show the output without showing the TeX code. Importing the TeX interpreter is quite non-standard and means that my notebook doesn't play well with the public servers. I also got burned by some kind of permissions issue, so that all my charts ended up being invisible to read-only users.

The second crack -- I can only look at the code from within my own Jupyter server. The source is buried in a very noisy JSON format.

The third crack -- Who wants to write code in the impoverished browser-based editor provided? How many times have I deleted a closing brace that was automatically inserted incorrectly? How can I do a global search and replace?

The fourth crack -- I can't test my code unless I include all the tests in the notebook!

I'm complaining. I realize that I don't have anything constructive to offer, and I'm really a beginner. However, I think some of my disappointment is justified, as I think it was reasonable to assume that I could build my notebooks to be next level presentations.


So once your code gets large enough that it doesn't fit neatly within a jupyter notebook, it's time to split the code out into another package, and then import it into your notebook.

The benefit here is that now your code can be used inside the Jupyter notebook, and also inside, say, a web server.


I've started to use jupyter with kdb on the back end for analysis. For me I have some hope it will hit the sweet spot because:

1. kdb is "too obtuse" for many and python glue makes it more amenable

2. I can still have kdb functions in source code and call them from jupyter with pyq

3. I can do most of my "editing" in emacs to the kdb back-end, write python "libs" for parsing results, and just use jupyter as a fairly thin presentation layer

4. I can share notebooks with analysts who run the same jupyter server virtual env, so finally we can share notes

Will this add more value than just using kdb? Time will tell, hard to know right now.

I agree that the "ide" experience absolutely sucks.


This all sounds very familiar to me. I'm at a robotics company; we had some experimental infrastructure built up around processing ROS bag files via notebooks, and it just eventually became like pulling teeth. Stuff would get cut and pasted between notebooks, or moved out to helper modules which then had versioning and permissions chaos. Each bag needed its own notebook/interpreter instance because there's no way to rerun a notebook on new data, but then the server would explode because of these massive Python processes hanging around with half-processed data state still in them.

In the end we dumped it all and turned the good parts into a sane CLI tool which ingests data and dumps out Bokeh plots. At some point we'll throw a Jenkins front end on it, but the current approach seems to be working fine.


To offer some help for 2 and 4, you can get a script out of a notebook with jupyter nbconvert <notebook.ipynb> --to python, which you can even include as a cell that starts with ! since that will run shell commands.
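For example, as a final cell (the notebook name is a placeholder):

    !jupyter nbconvert my_notebook.ipynb --to python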

For part of 3, there is global search and replace accessible either via Edit -> Find and Replace or Esc-F, and it includes case sensitivity toggles, regex search, and the option to change the current cell or globally.

For 1, I think there are some plugins that can make that tidier, but I've generally just accepted that if the people I'm presenting to don't want to see code then I'll just need to make a set of slides out of the whole thing.


>it's almost impossible to build a nice presentation in Jupyter, because you always have to show your code and its stderr.

You should be able to see a blue vertical bar to the left of every code cell. Click that bar to collapse the cell. You can do this for both the code and the output. I know this works for Jupyter Lab, but I don't know about legacy notebooks.


I think Atom’s Hydrogen and VS Code’s Python extension are best-in-class Jupyter clients that achieve everything JupyterLab set out to do, with more and better features. I develop scripts that run top to bottom, with a notebook side by side that, on a keystroke, executes code blocks from my script in the notebook.


For those that do not know it, vscode is great for Jupyter Notebooks.

https://code.visualstudio.com/docs/python/jupyter-support


I think Computational Notebooks are a great idea, and yet I have the feeling that we are in the process of seeing them overapplied. They are wonderful for certain situations, and teaching or demonstrating code to others is right in its sweet spot.

I get the impression that people are creeping in the direction of trying to do everything with one tool, which sounds like it would end up in the same swamp that Eclipse went into. Sometimes, you need to use different tools for different tasks, and not everything should integrate. Just my opinion.


I think that all the pain points of the article are a result of not using notebooks for their purpose. In my opinion, notebooks are good for:

1. POC/MVP: showing that what you want to do will work before building the full structure.

2. Creating PDF/HTML documents with code and output.

3. Exploratory data analysis and visualization.

I think many of the data scientists in the article go well beyond what a notebook is for. A notebook is where you start, but it should never be a production tool.


Jupyter notebooks are great for many purposes. They have, however, two really tragic shortcomings:

1. They are stored by default in stupid json files instead of plain source code with comments.

2. The text editing interface inside the browser is horrific and very difficult to normalize (e.g., disable "smart" closing of parentheses, disable the capture of classic unix copy-pasting, etc).


This is a great list, and totally matches my experience. I also agree this is solvable with tooling.

A) VS Code / an IDE needs to be the primary editor.

B) Results are not stored with the source.

C) Export (build) allows packaging for whatever platform.

Python notebooks especially also use some crazy mutable APIs. In general notebooks align with other code written by people who aren’t usually software engineers building production systems. They’re much more about getting things done, APIs and tools are less questioned, a lot of pain is swallowed because PhDs have plenty of time to write a few lines of code. I don’t want to sound disparaging towards these people, it’s just a different set of tradeoffs from writing production grade software.


Logging, monitoring, security, versioning, etc. These are things that most often get ignored due to ignorance or inexperience, but are required for production grade software.


As a computer scientist/software engineer, please allow me the question: why would I prefer a notebook over, e.g., equivalent Python script(s) in a git repo?

I first saw Jupyter notebooks when my sister (physicist, non-programmer) used one for analyzing economic data with pandas. Run-time for the full data set was half a day (and IMHO for that analysis SQL would have been better suited). I understand that as a non-programmer it looks alluring, but once language proficiency is built up, why not use an IDE and run the code in a shell?


The key factor is iteration speed.

If step A takes 5 minutes (and 5 minutes is a very short time) and I want to experiment on step B, then I don't want to rerun step A each time while I'm writing and running code that helps me understand what step B is going to be; I'd want that to be interactive and immediate, not have each rerun take 5 minutes.

Storing/loading to disk is not a good option because all the data that needs to be stored is not yet determined until the exploration is finished; If I write code to save/load A, then I need to change (and test) it after I'm done with B and now want to experiment with C, and it all becomes even more complicated when I need to add an extra step and data field to step A and rerun everything. Deciding what data should be stored in what format is something that you can do in 'productionizing' the code after you've done the exploratory analysis.

REPL is not a good option because it's not convenient to save and replicate the code that got you to the current REPL state.

The other aspect is that visually 'debugging' intermediary data through various plots is not conveniently possible in IDEs. I could generate some picture files in a folder or possibly an HTML 'dashboard' to see the results of my most recent run but that takes extra code and effort, and the results aren't immediately in my face like in a notebook.


Ah, I think I have a hugely different approach to data processing: For my work I often have a very good idea what the output should look like, and what transformations are required on the input to get there. E.g. when processing log files to generate an overview page, or (as I'm doing right now) adding a target to binutils (assembler, linker,...). (Obviously I'm not a data scientist ;-)

With what you describe, intuitively I would use a library that allows me to store & load data per step (verifying that the structure matches), or pass it in memory. Think JSON (yeah, slow) or something like protocol buffers. That way I could do both

> store(A(read(input)) -> file); store(B(load(file)) -> file2)

during development (or in case B is in a different language than A), and in production just

> B(A(read(some_other_input)))

But yeah, that's just my intuition of course. Maybe I'd be a bad data scientist.

However, can't you just experiment with smaller data sets? That's what I usually do if processing is slow (e.g. instead of parsing 10GB of log files, I'll just do 50MB to verify that the processing pipeline works, and once it does, run it on the full 10GB and grab a coffee while it runs). Not an option for data science?


Sure, if you know what the output should look like and if it's possible to e.g. write tests to verify that it's correct, then jupyter notebooks would not be the proper tool to use.

The intended use case of these notebooks is in scenarios where the main output is not the code and not a particular set of transformed data, but the knowledge gained during a 'computational exploration' of that data. With that knowledge in hand, you can then build 'productionized' code with different methodologies (possibly but not necessarily using or adapting large parts of the code in your notebook), if that's needed -- and in such data analysis scenarios it often happens that it's never needed.

Sampling a subset of the data sometimes works. Sometimes it would alter the results substantially and drive the exploration in a wrong direction; questioning and verifying assumptions is important, and it can make a big difference whether all A's are also B or only 99% of them are.


https://joblib.readthedocs.io/en/latest/ (though it has a large overhead)
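e.g., a minimal sketch of caching an expensive parsing step on disk with joblib (the path and function are hypothetical):

    from joblib import Memory

    memory = Memory("./cache", verbose=0)

    @memory.cache
    def parse_logs(path):
        # expensive "step A"; the result is cached on disk, keyed by the arguments,
        # and recomputed automatically when the function's code changes
        ...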


The main reason is that one has to fiddle with the code a lot and re-running the whole thing is much too slow.

A common example is that a huge text file containing experimental data gets parsed in the beginning. Then you have to explore the data step-by-step using all kinds of visualization and analysis such as Fourier transforms, curve fitting, etc.

If you simply put everything into one giant python script, for every step you have to re-run the entire thing which takes forever. Of course you can speed things up by writing intermediate results to disk, but this adds tons of boilerplate code and is quite error prone.

One alternative would be to write individual scripts for each step and read them into an interactive REPL shell. However, then you still have to somehow record the proper execution order if you ever want to repeat the analysis.


https://datalore.io has:

(1) a reactive Datalore kernel that solves the reproducibility problem: it recalculates the code automatically when something is changed, and recalculates only the changed and the dependent code;

(2) good completion;

(3) online collaboration;

(4) read-only sharing;

(5) publishing;

(6) a .private directory where sensitive data can be saved, which is not exposed when the notebook is shared with read-only access.


It seems it's cloud-based. Fun for playing around, but not suitable for real work (at least for me).

I can't just upload random data to some cloud service to work with it, also I can't upload data if it's too big. Often the data that's valuable is very sensitive.


There are these and other problems with CNs:

0. They try to be "be-all, end-all" proprietary container documents, so they lack generality, compatibility and embeddability. It would be better if live code try-out snippets were self-contained and embeddable in other documents: HTML, other software, maybe PDF, LaTeX or literate programming formats. Maybe there should be standard, versioned interpreters for each kind of programming language in WebAssembly, cached for offline usage by the browser, for inclusion in documentation, papers, etc.?

1. For prototyping, it is better to have try-out live code (and/or REPLs with undo), like what Xcode/iOS Playgrounds are for Swift or what ReInteract was for Python.

2. The computational notebook software that I've seen is terrible: complex, fragile and messy to install. The ones I've seen make TeXLive look effortless by comparison.

3. Beyond replicability what goal(s) are CN really trying to solve?

3.0. For replicability itself, why not have a GitLab/BitBucket/GitHub repo for code and a Docker/Vagrant container one-liner that grabs the latest source when built? Without a clear, consistent and simple build process, there is no replicability, only wasted time, headaches and fragile/messy results.

3.1. Are CNs "hammers" for "nails" that don't exist?


> Maybe there should be standard, versioned interpreters for each kind of programming language in WebAssembly and cached for offline usage by the browser for inclusion in documentation, papers, etc.

This would be incredible. Even better, the output from the code (like graphs) should be able to be embedded in the paper. You have no idea how many papers have errors in the code that generated the graphs/statistics/etc. and nobody can tell because the authors rarely release the data, let alone the source


For WASM, there ought to be a package-management/registry mechanism for installation (unless there is one already? It might get complicated, but it would seem a good idea to reuse code/plugins)... or, as below, there ought to be some caching-priority mechanism.

Then for HTML assets (and CSS ones too), perhaps as a hint on asset-linking tags (a, script, link, img, audio, video, etc.), there ought to be an offline-priority attribute to help the browser decide what to throw away when clearing the cache the regular way or evicting items from it, while keeping things deemed vital when not nuking the entire cache. Yes, websites could be goofy and game caching mechanisms, marking everything "vital" as they do with 0-pixel image cookies, but I'm sure someone would make an "RBL" (real-time blackhole list) of which priorities on which websites to ignore.

Related aside: there are a lot of common frameworks, libraries and bits that could be cached user-side, with the trick being either a) herding web devs into de-fragmenting their CDNs, which could create SPoFs, or b) changing the standard to allow multiple SRCs or HREFs for high availability/less bitrot, to preserve choice and encourage de-duplication of common assets. [0]

0. https://html.spec.whatwg.org/multipage/links.html#attr-hyper...


Here's an idea: what if you could put a `hash` attribute on a script tag? After downloading whatever it links to, the browser checks that the hash matches the one you provided. Then it could also cache the result and reuse it whenever the hash matches, even if the link pointed elsewhere.


Good list.

Their observations bring to mind the benefits of watching people program on YouTube or video where you learn a style of working you may not even have considered.

However, there is one other issue that is not on the list: because a notebook is meant to be read or shared, I always feel like my work is public and feel less inclined to play around and just take a look at things. When I do "transfer" my work to a notebook, it's only the surprising or interesting things, which suppresses the discovery process.


One thing that I find to be incredibly useful is the keyboard shortcut `00` (press zero twice while focus is outside of a cell), which will restart the kernel, clear all output and re-run the whole notebook.

This way I'm sure that the "library code" I'm editing in parallel in a real text editor is up to date in the notebook, and it also limits the confusion due to out-of-order execution problems.

The overall workflow is something like this:

  1. explore using thing.<TAB>, thing?, and %psource thing
  2. edit draft code chunk or function
  3. when chunk 80% done; move it to a module
     and replace it with an import statement
  4. press 00 to re-run everything, then GOTO step 1
The key to preserving sanity is step 3—as soon as the exploration phase is done, move to a real text editor (and start adding tests). Don't try to do big chunks of software development in the notebook. You wouldn't write an entire program in the REPL, would you?

Sometimes I keep around the notebook as a record for failed explorations or as a "test harness" for the code, but most of the time it's throwoutable since all the useful bits have moved into a normal python module/script under version control.


Another useful tip which doesn't require always doing '00': when editing the library code, import things like this:

    import importlib, mylib; importlib.reload(mylib); from mylib import foo
Then in most cases, except some very entangled ones, you can simply rerun this cell without having to restart the kernel (which matters especially when a restart would mean reloading all the data).
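Another option that covers much of the same ground (assuming the standard IPython autoreload extension fits your workflow) is to put this near the top of the notebook:

    %load_ext autoreload
    %autoreload 2

    from mylib import foo   # edits to mylib are picked up on each subsequent cell run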


The reality is — notebooks are and need to be developed as an app platform ...

In order to do notebooks properly — you need:

1. Discovery (ideally static discovery) of all the state the notebook needs, and the bulk of the state the notebook will/could manipulate during its execution. Your container needs to intercept the filesystem and networking APIs that will be invoked, so that the state resulting from these operations can be observed by the runtime and shimmed appropriately for reproducibility and for performance optimization.

2. The notebook (and the runtime-inferred model of all the required inputs) needs to be repo-stable. I should be able to write a notebook app that reads from the file system on my development host, deploy it somewhere, and the runtime should take care that however that post-deployment file system read is implemented, it matches my local development semantics.

3. A platform-level dependency graph needs to exist to model re-execution requirements automatically, incorporating code changes and external state.

Apple could build this, and a "notebook OS" would be the correct conceptual framework for it... anything less is always going to leave us severely wanting.


For those asking “what’s the alternative”, RStudio and Matlab already solved the design problem (though they could be better executed).


My main use for notebooks is a simple way to constantly hold a whole large dataset in memory. That way if I want to try some feature reduction or remove some bad result, I can just do that and not wait 10 minutes for my slow PC to rerun my import code. I feel like an easy way to do that in base python would draw me away from notebooks.


Obviously it's pretty hard to make a general criticism of the notebook GUI, especially without comparing it to some other specific user interface for data scientists, such as a traditional REPL terminal or other command-line tools.

The Python world gives a good example of the sheer complexity of the notebook infrastructure. There is IPython, there is Jupyter Notebook, there is JupyterLab. There is even stuff like SageMathCloud (nowadays called CoCalc), which is basically a web GUI to a VPS combining command lines and various notebooks. And hell, most of these web-based interfaces try to make sharing easy.

Maybe we should start comparing these (mostly OSS) tools to the traditional notebook GUIs of Matlab and Mathematica, the things we used in the 90s and 2000s. My feeling is that they were more robust and could handle large data better, but they lack all the tooling we get for free on the web.


Jupyter Notebook/Jupyter Lab has replaced IPython as the notebook front end.

I suspect 90%+ of Python notebook work is done in Jupyter/JupyterLab (or things built on it like Google Colab/Kaggle Kernels).

> traditional notebook GUIs of Matlab and Mathematica, the things we used in the 90s and 2000s. My feeling is that they were more robust and could handle large data better

I've done tens-of-terabytes analyses on Jupyter (Spark backend) and I personally know people doing petabyte work on it, so this seems doubtful.


You probably were careful enough to understand the limits of the Jupyter server and client (frontend).

It's easy to screw up a terminal application in data science when dumping a large array. Many REPLs cannot handle this properly (and CTRL+C won't work). It's easy to test this: What does your favourite notebook do when you call some command such as (pseudocode/python here)

    print(list(range(int(1e7))))  # or 1e8; list() forces the full dump in Python 3
In this particular example, the python CLI seems to handle keyboard interrupts fine when the terminal (or RAM) is flooded.


I would love to see some gifted people using this info to further improve tools like nteract[0].

It already eases a lot of the pain you may otherwise have when setting up a Jupyter notebook without a software developer's knowledge.

[0]: https://nteract.io/


I'm not sure I understand the issue about the user repeatedly tweaking parameters for their data visualization. If anything, that is a reason notebooks are so nice. The repeated tweaks aren't due to the notebook format; they happen because tweaking is an inherent part of the data visualization process, where it's hard to predict how the result of a particular parameter choice will look with a given data set. So the same process would occur whether one was using a notebook or a script, but with a script it becomes much more cumbersome to actually see the result. In a notebook, the parameter tweaking for a data visualization is immediately followed by the result.

I definitely agree with most of the other points though.


I think it's easy to do notebooks wrong, but possible to do them right. I try and do quick prototyping in notebook cells before moving it off to a separate .py file, and avoid keeping any code that does anything other than visualisation or parameter setting inside a cell long-term. That way, if you need to run something "in production" (whatever that means in your context), you don't end up having to pick apart and re-write your code -- you just import the .py file you wrote along the way.

For me, notebooks are a super handy way of visualising and sharing results during meetings, and it's difficult to imagine a more convenient alternative.


I tried to encourage our team to use notebooks; however, everyone prefers using PyCharm and git for sharing code. We don't have much visualization, which might be the reason, but I was surprised by just how many people hated it.


Notebooks are not so much for writing programs or collections of functions... they are better suited to a style of "code plus explanation", plus flexible inline charting of the data itself.


Are you using OO? I'm still not sure how to "explain" an OO system once it's sophisticated enough. It's just better than go-tos everywhere, but not by much. Of course a trigger-based system (GUI, event-driven) has the same issue.

This code + explanation style would not work there, I guess.


> I'm still not sure how to "explain" an OO system once it's sophisticated enough.

With a couple of UML diagrams, still the best option.


Direct link to the preprint: http://web.eecs.utk.edu/~azh/pubs/Chattopadhyay2020CHI_Noteb...

Interesting study, I like the mixed-method approach. A quick glance at the industry of the participants suggests that there might be a bias towards structured data (which I think is actually acceptable, as that makes up a huge chunk of the non-academic ML notebook work). Edit: the authors acknowledge this in the "Limitations" section.


This is akin to reviewing how well a screwdriver drives nails. Yes, it has problems. That doesn't mean it's a bad tool - you're just not using it right. Does it require discipline? Yes, but so does the screwdriver. That being said, I think Jupyter specifically has some legacy issues around format, and I prefer R Markdown. As much as I love PyCharm, it's never going to do more than replicate the notebook experience. IMHO, the main author publishes on code UI/UX, so the title seems more like clickbait. Not sure why it's so upvoted.


I quite like the Spyder approach: pure Python code that is segmented into cells by inserting a special comment line (# %%).

The cells can then be individually executed in an ipython shell, or the entire script can be run with the regular python interpreter. This makes it easy to tweak the individual parts without having to re-run everything. In contrast to jupyter notebooks you still end up with a valid python script that can be easily version controlled.

I just wish that I could use vim instead of the Spyder IDE.


I wrote a plugin for ipython that some people might find useful: https://github.com/uiuc-sine/ipython-cells

It lets you do linear execution of blocks like in Jupyter, but in a normal .py file. Obviously more lightweight than Jupyter and you get to use your regular editor.


I work at https://www.deepnote.com/, we are trying to tackle some of the pains mentioned in the article (setup, collaboration, IDE features like auto-complete or linting).

We are still early access, but if you are interested in an invite just let me know. My email is filip at deepnote dot com.


Deepnote seems quite interesting, but as a cheapskate grad student, I'm compelled to ask.

If this information isn't private, what sort of business model do you use? I take it you'll have a SaaS subscription model? I see it's free to use now, but how does your company plan to make money (especially taking into account the cost of the cloud hosting Deepnote requires)?


Hey, thanks for the question.

Our goal right now is to build the most amazing data science notebook. We need a lot of feedback to get there, that's why we are keeping it free. But since the servers also cost us something, we haven't opened up Deepnote to the public just yet.

Once in GA, we know we can support students on a free tier almost indefinitely (it doesn't really cost that much) while offering more advanced features on a subscription model for teams and enterprises.


I'm curious: how do people keep a record of their experiments? When I started, I used to just keep the cells, but that led to very long and impossible-to-parse Jupyter notebooks. I have since opted for keeping a journal.txt file in Atom where I write down hyperparameter configurations, epochs run, and results (for ML). But that feels a bit awkward as well.


At Gigantum, we're trying to solve some of these issues too. A Gigantum Project lets you run Jupyter or RStudio in a container that is managed for you. Everything is automatically versioned so you can sort out exactly what was run, by who, and when.

https://gigantum.com


Small note: why post an image with the pain points if I need to check the list below to understand what's written?



People love jupyter notebooks for the same reason people love Excel.

This is intended to be a Zen-like Koan, so take it as you will.


One idea for a pain point not mentioned: better variable persistence. If I declare a variable, then delete the cell I declared it in, the variable persists. I've had this cause issues because if I use the deleted variable by accident, it will work fine right up until a kernel restart.
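A tiny illustration of the gotcha (the names are made up):

    # cell 1 -- later deleted from the notebook
    threshold = 0.5

    # cell 2 -- still works, because `threshold` lives on in the kernel,
    # but will raise NameError after the next kernel restart
    print(threshold * 2)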


I don’t get the popularity of these things - I think you have to have started out with them to like them.


I learned to like them in Mathematica; in many ways Jupyter is just a pale imitation of the excellent system built at Wolfram.


For emacs org-mode users this https://github.com/dzop/emacs-jupyter/blob/master/README.org is worth looking into.


I’m surprised no one has mentioned what I see as the biggest failings of notebooks: poor handling of connection loss / re-connection. The kernel will continue to run, but a connection hiccup will often make the notebook UI stop updating (and lose any kernel output).


Notebooks are bad and unreliable. You repeat your code all the time, and you are limited to working with smaller datasets. If you are into visual data analysis, use Orange or other similar data mining tools. We allow the use of notebooks only for presentation purposes.


It is interesting to me how this talks about "computational notebooks" but it seems to be about Jupyter and derivatives thereof -- RMarkdown notebooks run inside of the RStudio IDE, and they don't use the term 'kernels' like Jupyter does.


What are the currently available CI options for notebooks? You'd think this would be one of the first tools people would need to make sure notebooks are reproducible, but there seems to be little sign of CI usage.


Check out treon[1], an open source testing framework for Jupyter notebooks. Since it runs via the CLI you can hook it up to any CI platform of your choice.

Disclaimer: I wrote a large part of treon.

[1] https://github.com/reviewNB/treon


Streamlit is imo the best alternative. I was a beta tester and I found that it encouraged good coding practice without sacrificing too much functionality. I highly recommend that other data scientists check it out.

Streamlit.io


Trying my best to solve some of these: https://github.com/wrnrlr/foxtrot


Performance is another pain point, at least for jupyter


Why no Mathematica?


It's not even free software.


Developers need to pay bills somehow.


"Free software" is about freedom, not price. Putting the research that is your life's work inside of proprietary software that can be taken away from you, forever, at any time — that seems foolish.


Copyleft derived "free software" can also be taken away from me.


Only if you violate the license.

By the way, I'd be interested in your thoughts on https://news.ycombinator.com/item?id=22083468.


Mathematica is great for symbolic mathematics and terrible for anything else.

The awful control flow syntax makes reading longer scripts pretty much impossible. Plotting is very clunky and by default produces output files that are essentially unreadable.

Of course one can somehow work around these issues, but it is much easier (and free) to just use python.


Notebooks are sort of like democracy: it's the worst form of government except for all the others.

You need to pick the best tool for the job, and often times in machine learning that tool is a notebook.


Why not teach data scientists how to write software effectively? Those are smart people, it’s not like using version control, writing unit tests and extracting common code into libraries is rocket science.


What is wrong with life? Many things, but let us appreciate how to use it better instead of just criticising it.

The world is so much better with you alive, and the same goes for this tool, the computational notebook. I'm not sure whether the study covered R notebooks, which are really good for sharing information and analysis. I just wonder how to use them better.

Of course they can always improve on it. But I would promote more expansion: how about a Lisp notebook, a Clojure notebook, a JS notebook, and a Forth notebook?

The real problem is whether you can have an OO notebook... a notebook is more "serial", about graphics and data, not for "messy" classes or trigger-based systems. Hence, if I may, the real problem is scoping. It is so hard to visualise a live OO system, unlike a live functional or even a stack-based system.

It is not life that is the problem. Even useless life has its use, as long as it is alive. But if it is not getting us there, an alternative may have to be considered, just as, since we cannot go there ourselves, we send our Voyagers outside the solar system.

Live long and prosper.



