
This is really exciting for the team! However, I'm not particularly sold on the notebook style of coding. It's possible that I simply haven't found a good use case for it. Can anyone suggest an example where the notebook style outperforms a simple script-based style?

For reference, I use MATLAB and Mathematica pretty heavily, and Python in a text editor like Sublime alongside a terminal running an IPython shell.




I find notebooks to be great for prototyping longer pipelines or processes. Instead of having to constantly get fresh data, particularly if it's from an external API, the notebook can persist the data in memory and you can iterate on the next piece of the process right there.

I then take that and make it a more formal script/process w/ version control and all that fun stuff. They're also really great for learning. I just wouldn't put them in production :-)


I'm using Python for lab automation, so my data comes fresh from an experiment. I've found that keeping stuff in memory is extremely convenient until the kernel shuts down for some reason (e.g., I inadvertently kill it while forgetting that I've left a notebook open).

Still, I love having my data collection scripts documented right there with the subsequent analysis. So, I've disciplined myself to handle experimental data in one of two ways:

* For "small" data, format it as a Python thing (list, dict, whatever is appropriate), and paste it into the next cell as an input. I haven't found a way to do this automatically, and I'm careful not to make things too automatic lest I run a cell and over-write old data.

* For "big" data, dump it to a file. I just turn the system time into a filename, to avoid over-writing an old file.

I don't think I've come up with the last word on using Jupyter as a self-contained, data-collecting lab notebook, nor am I yet 100% certain that it's even a good idea. It's a work in progress, but much better than anything else I've ever tried. For complicated experiments, I still create standalone Python programs to control things.


This is the no. 1 reason I use notebooks. I recently worked on a Python library for an undocumented API that returned broken, non-semantic HTML. Counting spans, parsing inline styles - that kind of hell. I honestly don't think I could've done it without Jupyter.


I do the same thing with Emacs + Elpy, I love interactive programming!


I'm still not sold personally. It seems like the in-memory persistence is only useful in the intermediate case where my data is slow enough to generate or obtain that I don't want to re-run that code constantly, but fast enough that I don't mind re-running it every time I launch the editor. Most of the data I have that's worth caching for speed is worth caching to disk. Combined with the unpredictable side effects of variables persisting while I'm actively hacking on the code, the implicit in-memory persistence is pretty off-putting.

A recent workflow I've had for a data analysis project is to have each stage of data processing in a separate function, with all the functions called in order from an "if __name__ == '__main__'" block, and all but the function I'm presently working on commented out. Each function returns nothing, but saves its data to an HDF5 file. Other functions read the inputs they need from the HDF5 file and write their outputs to the same file, and if I want a fresh run I just delete the file, uncomment everything in the '__main__' block, and run again.

The functions also save output plots to subfolders.

This is compatible with version control, and it caches to disk rather than just in memory.
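As a rough sketch (the stage functions and the HDF5 layout here are made up, but it's the shape of the thing):

    import h5py
    import numpy as np

    CACHE = "analysis.h5"

    def load_raw():
        """Stage 1: acquire or parse the raw data, cache it in the HDF5 file."""
        raw = np.loadtxt("measurements.csv", delimiter=",")
        with h5py.File(CACHE, "a") as f:
            f.create_dataset("raw", data=raw)

    def reduce_data():
        """Stage 2: read stage 1's output from the cache, write its own output back."""
        with h5py.File(CACHE, "a") as f:
            raw = f["raw"][:]
            f.create_dataset("reduced", data=raw.mean(axis=0))

    if __name__ == '__main__':
        # comment out whichever stages are already cached; delete analysis.h5 for a fresh run
        load_raw()
        reduce_data()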

The biggest downsides compared to Jupyter notebooks are the lack of interactivity in the saved plots (I can make interactive plots pop up, of course, but they all open in separate windows at once, so it's less clear which part of the code each plot came from) and the lack of LaTeX in code comments; I still have to keep external LaTeX documents somewhere explaining which algorithm I'm using.

So for now, the downsides of notebooks with respect to version control, data caching, and the extra state I have to keep track of to avoid subtle bugs as I hack on my code seem to outweigh the upsides.

Maybe what I would like is an editor that renders LaTeX in comments and embeds arbitrary plot windows at given points in the code, but without any data persistence and without the embedded plots actually being saved anywhere: your file is still a normal Python file, and it's just the editor rendering things that way based on magic comments or something.

Or maybe I should just write a decorator that renders a function's docstring as LaTeX and embeds any matplotlib windows produced into one scrolling document, with the sections named after the decorated functions. The decorator could take an argument telling it whether to include the full source of the function, the comments of which it could also render as LaTeX. Then you have input code compatible with your favourite text editor and version control, and an output document which optionally includes the code.
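Very much a sketch of that idea (it just dumps docstrings and any figures a function creates into one HTML page; you'd still need MathJax or similar to actually render the LaTeX, and the source-inclusion option is left out):

    import base64
    import functools
    import html
    import io
    import matplotlib.pyplot as plt

    _sections = []  # one (name, docstring, figures) entry per decorated call

    def report(func):
        """Collect the docstring and any new matplotlib figures the function creates."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            before = set(plt.get_fignums())
            result = func(*args, **kwargs)
            pngs = []
            for num in plt.get_fignums():
                if num in before:
                    continue
                buf = io.BytesIO()
                plt.figure(num).savefig(buf, format="png")
                pngs.append(base64.b64encode(buf.getvalue()).decode())
            _sections.append((func.__name__, func.__doc__ or "", pngs))
            return result
        return wrapper

    def write_report(path="report.html"):
        """Write the collected sections out as one scrolling HTML document."""
        parts = ["<html><body>"]
        for name, doc, pngs in _sections:
            parts.append(f"<h2>{html.escape(name)}</h2>")
            parts.append(f"<p>{html.escape(doc)}</p>")  # raw LaTeX; MathJax would render it
            parts.extend(f'<img src="data:image/png;base64,{p}">' for p in pngs)
        parts.append("</body></html>")
        with open(path, "w") as f:
            f.write("\n".join(parts))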


Nice! That seems like a great way to go about using it. I'll have to give it a shot for my next project :)


Using Python in an editor next to an IPython console is exactly the sort of workflow that JupyterLab supports. See https://jupyterlab.readthedocs.io/en/stable/user/documents_k... for a walk-through of how this workflow can be used in JupyterLab.


I'm using RStudio notebooks heavily in my latest bioinformatics analysis pipeline. They're a great way to produce an HTML report containing code, exposition, results, and plots all in one place.

https://github.com/DarwinAwardWinner/CD4-csaw (look at scripts/*.Rmd)


I wrote an article a few months ago on the differences between R Notebooks and Jupyter Notebooks (and why, IMO, R Notebooks are better): http://minimaxir.com/2017/06/r-notebooks/


R notebooks are the bee's knees, and frankly I'm surprised Jupyter hasn't borrowed more from them. It's so much easier being in plain text until render time, and the output is easier to manage because you can trivially decide which chunks you want to echo, evaluate, plot at 2x size, etc., without any change to the interactive usage. Not to mention you get to retain your nice IDE features like good code completion, doc lookup, version control...

I am excited about JupyterLab, and it's a step in the right direction. But it feels a little bit like they're reinventing the wheel with some of this stuff. I would gladly pay money for a Python equivalent of the R ecosystem (RStudio, R Markdown, R Notebooks) where everything just works great by default.


Yup, revision control is the elephant in the room with Jupyter, and why I struggle to recommend it for reproducible research.

R Notebooks followed the org-mode model of keeping a simple, revisionable document with code interspersed.


Thanks for sharing... I'm always excited to see examples of RStudio notebooks in the wild.


Besides what the other replies mention, notebooks are also great for an extremely easy literate-programming style, i.e., when you want to explain in text or images as much as you want to run code. Not only is that easy to create as a notebook, it's easy to share.


This makes sense; it seems like one of its intended purposes is to serve as a pedagogical tool, which I think would work well.


It's hugely helpful in the consulting world. We use it all the time for proof-of-concept type work -- it's much easier to present a notebook to a CTO than a bunch of scripts.


Better than PowerPoint to management in some cases!


It makes sense for one-off, exploratory-idea-style scripts: exactly the sort of thing that scientists will do, which is probably why it's so popular with them.


If I'm writing code for data science purposes and I'm not planning on putting that code directly into production (i.e., exploratory analysis, general offline analysis, etc.)


Rapid experimentation, and embedding graphs/images. I use it whenever I want to try new things with CV/ML/etc.



