Jupyter Notebook 5.0 (jupyter.org)
588 points by sornars on April 4, 2017 | 110 comments



For those who use R, I strongly recommend looking into R Notebooks (http://rmarkdown.rstudio.com/r_notebooks.html), as they offer a lot more versatility, especially compared with the Jupyter/IRKernel approach. They're R-only, unfortunately (you can run Python code in them, but not the way you'd expect).

I'd like to see some things ported into Jupyter from R Notebooks, like JavaScript data tables and the separation of code and output, which makes it easy to version-control only the code. (At least this 5.0 release makes tables non-ugly.)


It would be great to sit down with the RStudio notebook developers and users and discuss how we could get the two platforms to converge and be more interoperable. I guess both teams have a lot on their plate, and we would need more manpower, but understanding and collaborating with other projects for the good of users is always something the Jupyter team is happy to do.


I'm happy to connect you anytime! Just come by the west wing of the office. :)


The Jupyter team has been working on JupyterLab, which is a more RStudio-style environment. It incorporates notebooks, terminals, file editing, the traditional IPython console, plus some other cool stuff in a web IDE. You should be able to use an R Jupyter kernel with it.

It's in alpha right now but they're making crazy fast progress on it. I've been using it and it's awesome


Yes, I know, I'm one of the Jupyter devs :-) There are still differences in notebook format and protocols. In that sense, JupyterLab is no more compatible with RStudio than the normal notebook.


Lol, sorry! Well, hopefully someone else reading this thread can find out about JupyterLab. It really is amazing.

Thanks so much for all your hard work!


Didn't Yhat already try this with Rodeo?


Something like Beaker? (www.beakernotebook.com)


What does "a lot more versatility" mean? Just wondering what the advantages are.

Or do you just mean that R Notebooks are specifically better than the R support in Jupyter?


In addition to what I've already said, R Notebooks have native support for widgets, autogenerated TOC, and themes (although all of those can be added to Jupyter I believe). There's also the meta-support from being embedded in an IDE. (package management/variable explorer/etc)

Here's an example of one of my notebooks with all 3 things: http://minimaxir.com/notebooks/breach-network/

It's definitely a better fit for R.


Thanks for the example! That definitely includes a bunch of things I think Jupyter should have, but "purists" might consider out of scope (theming, ToC, something as simple as being able to hide cells interactively).

Currently, Jupyter works well for development, but the result is often hard to read because code gets in the way. (unless you use nbconvert, but that sometimes defeats the purpose)


Jupyter notebooks have widgets too, created with a combination of Python and JavaScript.


Some widgets are pretty advanced, I saw a presentation by a developer from Bloomberg, in NYC, about their bqplot widget library.

https://github.com/bloomberg/bqplot

They even have gamepad/controller support, which allows dynamic interactions with widgets.

Edit: Live demo of the project from PyData London 2016: https://youtu.be/eVET9IYgbao?t=27m45s


And this depends on ipywidgets (or at least something similar; Sylvain Corlay has developed both).

These widgets support embedding in other contexts like static HTML pages or Sphinx docs [1]. An example can be seen in the docs [2].

[1]:http://ipywidgets.readthedocs.io/en/latest/embedding.html

[2]:http://ipywidgets.readthedocs.io/en/latest/examples/Widget%2...


I believe you can also embed one inside an IDE. I am pretty sure you can do this with PyCharm.


Well, for me the big one is that R Notebooks work cleanly with revision control systems like git.

That said, both Jupyter and R Markdown Notebooks are but a pale shadow of the support offered by Org-mode (seriously!).


> That said, both Jupyter and R Markdown Notebooks are but a pale shadow of the support offered by Org-mode (seriously!).

Yes, but the large number of excellent notebooks available for Jupyter (and R as well) all over the web, as well as support for CUDA and all kinds of extremely powerful libraries (such as TensorFlow), give those a serious edge over Org-mode, even though Org-mode is super powerful by itself.


I run my Clojure + OpenCL + CUDA code from Org-mode without problems.


On a remote machine or on your local machine?


Either. :)

If you set up a source block, you can use TRAMP to execute the command in the source block on a remote machine. So:

    #+BEGIN_SRC sh :dir /user@remotemachine.com:~/remotedir 
      ls
    #+END_SRC
The above will run ls on "remotemachine.com" and put the results in an output block below it.

Edit: Meant to add that you can do the same with docker. Just use "/docker:dockerId:" as the dir, and it will execute in a docker instance locally. Using multihop addresses, this can get extreme.


Oh, that's very cool. Almost QNX like!

Org mode is one of the few real literate programming tools that I'm aware of, the other one is 'Leo'.


That's it. I'm learning org-mode.


Please do a write-up.


If there are examples you'd like to see written up, I'd be interested in helping out.

I used to try complicated examples, which led to pages such as http://taeric.github.io/Sudoku.html. I'm now much more into writeups such as http://www.howardism.org/Technical/Emacs/literate-devops.htm.... (Note, I did not write the second one.)


I always feel guilty pointing out that org-mode can do this really well. :) Thanks for taking that hit!

Know of any good resources that show this for folks that don't know what we are talking about?


Are there any public examples of really good Jupyter-like org-mode projects with their source?

I'm really intrigued by it, but Jupyter is much more clearly documented (org docs are downright sprawling), so I've always gone that path.


I would look at Atom's Hydrogen: https://atom.io/packages/hydrogen


Looks like a good demo of Jupyter, but I was looking for demos like this for org-mode.



It would be great if Microsoft would open up a bit about what their plans are for this service, to the extent that it could be something I can rely on in a commercial context.


Hi, fair question. Lots of people are using the service in edu as well as commercial contexts. We haven't put any restrictions in that regard on it.

Other than that the plan is to keep this (or something similar) running as long as there is sufficient interest and so far there seems to be quite a bit!

Note that it does run on Docker, which means ultimately it's not fully "secure", but we hope to switch to hyperv-linux when it's available.

[smortaz at msft]


I hope you all can style the notebook to look as good as Kaggle's version does. I understand if you leave it alone too.


Just the other day I discovered that there exists a kernel (the SOS kernel) that lets you use Python and R simultaneously in the same notebook (on a per-cell basis) and even has primitives for data exchange between the two.

I'm definitely using this for my next project.


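You can also use R inside 'normal' Python notebooks via rpy2:

    %load_ext rpy2.ipython

    from rpy2 import robjects

    # some_method(), some_lib and doStuff() are placeholders for your own code
    my_dictionary_results = some_method()

    # Convert the Python dict into an R named list
    names_dict = robjects.ListVector(my_dictionary_results)

    %R library(some_lib)

    # -i pushes the Python variable into the R session before the R code runs
    %R -i names_dict doStuff(names_dict)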

Image display etc. works fine


In a similar vein: Beaker notebooks (http://beakernotebook.com/)


Does it work in the same client-server fashion? (With Jupyter you can run the notebook on a server or the head machine of a cluster and then access the notebook remotely from your laptop or desktop.)


No. All local, although in theory you could set up an RStudio Server and run the notebook there; I have less experience with that.

There are R packages which allow communicating with a server, mostly big data packages. (sparklyr allows you to connect to a remote Spark cluster)


It's very easy to set up an RStudio server. I used to run one on Azure. Took minutes to set up and you get the whole RStudio suite, including R notebooks.


All the data scientists I know, myself included, use Python, while the data analysts in my company use R. Does R have anything like Python's Natural Language Toolkit (NLTK)? Or spaCy for fast NLP? Also, between PYKE, ProbLog, and python-constraint, a lot of my logic programming needs are satisfied in Python. Does R have anything like that for logic programming?


Not as effectively. Which is why I use R for tabular data and Python for text/image/nontabular data.

And there's nothing wrong with knowing and using both languages.


I think that makes sense and is why our Data Analysts use R.


The only thing I want from Jupyter Notebook is an easy version-control workflow. I'm not sure what the best option is right now, but last time I checked it was 'delete all outputs before commit', which is great in many cases.


There's some good advice here: http://stackoverflow.com/questions/18734739/using-ipython-no.... Obviously imperfect, but it helps until it's implemented properly


Check out nbdime[0], it might be what you are looking for.

[0]https://nbdime.readthedocs.io/en/latest/vcs.html


This probably needs more explanation:

From the docs: "nbdime provides tools for diffing and merging Jupyter notebooks."

It includes both graphical diff and merge tools, command line diff and merge tools, VCS integration for git (so git uses the nbdime diff and merge for notebooks), etc.

Might be worth looking into if that is what you are looking for.


This is something I've been giving some thought to recently. I came to the conclusion that the best approach currently is git "clean" filters built using jq. It's still a bit hacky, though. Gentle write-up here: http://timstaley.co.uk/posts/making-git-and-jupyter-notebook...
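For anyone who'd rather not use jq, the same clean-filter idea can be sketched in Python. This is a minimal sketch, not the linked write-up's filter and not nbstripout. The file name `strip_outputs.py`, the filter name `nbclean` and the `--filter` flag are hypothetical. When you `git add` a file, git runs its clean filter, feeding the file on stdin and storing whatever the filter writes to stdout. The working copy keeps its outputs, but the committed version does not.

```python
#!/usr/bin/env python3
"""Hypothetical git clean filter that drops outputs from Jupyter notebooks.

Assumes nbformat 4 JSON (a top-level "cells" list). Possible registration:
    git config filter.nbclean.clean "python3 strip_outputs.py --filter"
    echo '*.ipynb filter=nbclean' >> .gitattributes
"""
import json
import sys


def strip_outputs(nb: dict) -> dict:
    """Clear outputs and execution counts of code cells (mutates and returns nb)."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def clean(text: str) -> str:
    """Notebook JSON in, cleaned notebook JSON out."""
    nb = strip_outputs(json.loads(text))
    # indent=1 and sort_keys roughly match how Jupyter writes .ipynb files,
    # which keeps diffs against the unfiltered file small.
    return json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + "\n"


if __name__ == "__main__" and "--filter" in sys.argv[1:]:
    # Only read stdin when git invokes the script as a filter.
    sys.stdout.write(clean(sys.stdin.read()))
```

Only code cells are touched, so markdown cells and notebook-level metadata pass through unchanged.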


Use nbstripout[0] as a git filter. Then you have seamless git control of notebooks. There's talk of automatically saving separate, code-only versions in future releases.

[0] https://github.com/kynan/nbstripout


This looks pretty good for handling diffing/merging of notebooks. I'm just getting started with jupyter, but it seems like somewhat of a pain to have to ssh into a server to manage versioning with git while I work on the code in the browser. I'd prefer if I could do all of my jupyter work directly in a notebook, and commit from the notebook webpage.

I'm not sure how others work with this stack, so there is likely tooling I don't know about. I'd love to hear suggestions.


Prepending a command with "!" runs it in the shell, assuming you are using the Python (IPython) kernel.

  !git add ./my_file.ipynb
  !git commit -m "Update notebook"
http://ipython.readthedocs.io/en/stable/interactive/python-i...


Well you can commit from the UI: https://gab41.lab41.org/commit-and-push-to-github-from-jupyt... you just need to install the extension.


This, very much. I'd like Jupyter to have an option to keep the cell code in individual Python files in the backend, so you can check just those into VCS. It would furthermore allow outside Python code to import code from notebooks if Jupyter just provided a simple __init__.py file together with the cell files.


So you are asking for https://github.com/takluyver/nbexplode and https://github.com/ipython/ipynb respectively. They're just not maintained enough to be in the core :-)
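Here is a rough sketch of the "explode" idea: write one .py file per code cell, plus an `__init__.py`. It is not nbexplode's actual code or file layout. The `cell_NNN.py` naming and the `plan_files`/`explode` helpers are hypothetical, and the sketch assumes nbformat 4 JSON.

```python
import json
from pathlib import Path


def plan_files(nb: dict) -> dict:
    """Map 'cell_NNN.py' -> source for each code cell, plus an empty __init__.py."""
    files = {}
    code_cells = [c for c in nb.get("cells", []) if c.get("cell_type") == "code"]
    for i, cell in enumerate(code_cells):
        src = cell.get("source", "")
        if isinstance(src, list):  # nbformat stores source as a string or a list of lines
            src = "".join(src)
        files[f"cell_{i:03d}.py"] = src.rstrip("\n") + "\n"
    files["__init__.py"] = ""  # makes the output directory an importable package
    return files


def explode(nb_path: str, out_dir: str) -> list:
    """Write plan_files() for the notebook at nb_path into out_dir."""
    nb = json.loads(Path(nb_path).read_text(encoding="utf-8"))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, text in plan_files(nb).items():
        path = out / name
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written
```

Each module sees only its own cell. A cell that uses a variable defined in an earlier cell will therefore fail to import on its own. Magics and `!` lines are copied verbatim and are not valid plain Python either.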


You can create a post save hook to output .py files - http://jupyter-notebook.readthedocs.io/en/latest/extending/s...

(You used to be able to run ipython notebook --script)
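A post-save hook is just a function the notebook server calls after each save. The sketch below assumes the classic notebook's `FileContentsManager.post_save_hook` signature `(model, os_path, contents_manager)`. Instead of calling `jupyter nbconvert --to script`, it writes the code cells out itself. That is a simplification: magics are not converted and the output is always named `.py`.

```python
import json
import os


def notebook_to_script(nb: dict) -> str:
    """Join the source of every code cell, separated by blank lines."""
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        if isinstance(src, list):  # source may be one string or a list of lines
            src = "".join(src)
        chunks.append(src.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


def post_save(model, os_path, contents_manager):
    """Write <notebook>.py next to every notebook the server saves."""
    if model["type"] != "notebook":
        return  # other saved files also trigger the hook; ignore them
    with open(os_path, encoding="utf-8") as f:
        nb = json.load(f)
    script_path = os.path.splitext(os_path)[0] + ".py"
    with open(script_path, "w", encoding="utf-8") as f:
        f.write(notebook_to_script(nb))


# In jupyter_notebook_config.py, where the config object `c` is predefined:
# c.FileContentsManager.post_save_hook = post_save
```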


I would even settle for an undo/redo (of more than one action)


Jupyter will become like the Google of software development then ;)

To be fair, wasn't Jupyter designed to be an IDE - experiment with code, make tweaks and get the final version out in a text editor - rather than a repository of production-ready code?


Not really; IDEs are rarely designed around live code. IDEs are meant more for developing full-featured software. We are moving in this direction (as many users request it), but we still want to stay focused on data exploration.


Smalltalk people have thought otherwise since the 70s. The live code part is a big reason to use Jupyter. It makes me think the ST (and Lisp) people were on to something.


IDE here meaning Interactive Development Environment, rather than Integrated Development Environment?


Did anybody succeed in deploying Jupyter into a multi-user environment? I know there is JupyterHub, and some people have deployed it for selected trustworthy users (e.g. in a university course context or a company's intranet), but AFAIK it hasn't been deployed in production for potentially untrustworthy users. Current approaches seem to rely on sandboxing each user in a Docker environment, but mostly these are work-in-progress solutions, and I am not sure how much more secure they are. Authentication and persistent storage also have to be solved.

The obvious problem is that accessing Jupyter is technically similar to allowing full shell access, and you have to deal with local privilege escalation, but I wonder if there has been any progress. I evaluated using it as a UI for domain-specific applications that give users a kind of graphical shell, but in the end I decided against it because of security concerns.


https://cloud.sagemath.com deploys Jupyter into a multi-user environment. It's used by hundreds of courses for teaching across the world. Our Jupyter deployment also supports realtime synchronization (multiple people editing at once like Google docs) and recording of the complete history of the document. I've also spent the last few weeks on a complete rewrite of Jupyter from scratch using React to provide more robust realtime sync support, faster startup, and better integration with the rest of our platform; this rewrite is of course not live yet.


This is incredible. I have been wondering what a React port of Jupyter would look like. Is this open source?

Any lessons learned from multi-user deployment? We are trying to do this internally inside our company, and JupyterHub is a little hard to grok.

I know of a lot of people who would pay for a faster jupyter that can also be run as a standalone dashboard/script - without the heavy duty interactive kernels. Basically reduce the prototype-deploy loop.

Check some of the comments here - https://news.ycombinator.com/item?id=14033129


Yes, that Jupyter+React implementation will be open source. It's not clear yet how close to feature parity it will be, but at least the document format is exactly the same.


I know sage math - in fact it was the first platform I found many years ago that provided an open alternative to Maple and Mathematica back in the days when Jupyter was still in its infancy.

It convinced me that there really exist viable open alternatives to the proprietary closed systems and I am extremely thankful for your efforts - I will definitely check out sagemathcloud.


http://mybinder.org/ does that, as well as https://notebooks.azure.com/. The latter is authenticated, but it's easy to get a Microsoft Live account, and it doesn't even need to be validated, so for me that counts as serving untrusted users. Technically, giving access to Jupyter does not give full shell access; it does only if the installed kernel gives full shell access (which the default Python one does). The terminal feature can be deactivated as well.


Have you ever tried bootstrapping an R kernel on mybinder?



The IPython Sandstorm app is a step in this direction: https://apps.sandstorm.io/app/rprqf3t2h3vd3swfkhwk076qrennh9...


You can use systemd or docker with JupyterHub, and make it as protected (or not) as you want. We could also write a spawner that spawns full fledged VMs, which would give you 'real' untrusted isolation - would that be something interesting to you?


Here's one I made earlier: https://rnotebook.io

Unauthenticated, sandboxed notebooks in Docker containers. There are various limits.

If you manage to break it, please let me know!

(Regret incoming in 3... 2...)


You should get in contact with the mybinder.org crowd, they are low on manpower and are still looking for improvements.


https://paws.wmflabs.org/paws/hub/login is available to anyone with a wikimedia account


> accessing Jupyter is technically similar to allowing full shell-access and you have to deal with local privilege escalation

Some form of isolation - be it containers, or jails, or VMs - is going to be part of any solution precisely because of this.

I would actually dare say that FreeBSD jails are probably the best (most stable and secure) candidate of those available currently.


try.jupyter.org is something like this, and I think it uses this: https://github.com/jupyter/tmpnb


You're right, it does use that. The security against untrusted users is whatever you get from docker - each user's code runs in an individual container. In the case of try.jupyter.org a breakout would not be catastrophic, though, because those servers shouldn't be holding any sensitive data.


Hands down the best and most used tool in my portfolio. Well done, guys!

Does anyone know if there's a plan to introduce multi-kernel support in single notebooks like what Zeppelin does? Not that I have a strong preference for it but it appears to hold a lot of appeal in Spark-like environments where not all packages are available in Pyspark and you need to move between native Scala/spark and Pyspark.


See projects like metakernel [1]: in the end, a kernel can be "just" a controlling process that manages multiple languages. So while you won't get "multiple kernels", you can get "one kernel with multiple languages". Asking for "multiple kernels" just isn't the best way to describe what you want to do. In the end it's the same, except you do the language dispatch in the backend instead of the frontend. It's also more efficient for data sharing.

[2] shows how Hydrogen (also based on Jupyter) does it.

And [3] (mine) shows how to use Python, R, C, Rust, Fortran, Cython and Julia in the same notebook, with data sharing and functions sent back and forth between languages. I was definitely lazy and did not include things like SQL, JavaScript and a few others. I haven't used Spark in a while, and never from Scala directly, but I doubt it would be much harder, as the C/Rust/Fortran/Cython part took me an afternoon to write.

[1]: https://github.com/Calysto/metakernel

[2]: https://www.google.com/url?hl=en&q=https://medium.com/nterac...

[3]: http://carreau.github.io/posts/23-Cross-Language-Integration...
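The "one kernel, many languages" idea can be shown with a toy dispatcher. The backend reads each cell's first line and routes the rest of the cell to a handler. Because every handler lives in one process, they can share state. This is only an illustration of the concept, not metakernel's API. The `%%lang` header convention and the `PolyglotKernel`/`run_cell` names are hypothetical, and an in-memory SQLite database stands in for a second language.

```python
import sqlite3
from typing import Callable, Dict


class PolyglotKernel:
    """Toy 'one kernel, many languages' backend; illustration only."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")   # state for the SQL "language"
        self.namespace = {"db": self.db}        # shared state for Python cells
        self.handlers: Dict[str, Callable[[str], object]] = {
            "python": self._run_python,
            "sql": self._run_sql,
        }

    def _run_python(self, code: str) -> None:
        exec(code, self.namespace)

    def _run_sql(self, code: str) -> list:
        rows = self.db.execute(code).fetchall()
        # Expose the result to later Python cells: the cross-language data sharing.
        self.namespace["_sql"] = rows
        return rows

    def run_cell(self, cell: str):
        """Dispatch on an optional '%%<lang>' first line; default to Python."""
        first, _, rest = cell.partition("\n")
        if first.startswith("%%"):
            lang, body = first[2:].strip(), rest
        else:
            lang, body = "python", cell
        if lang not in self.handlers:
            raise ValueError(f"no handler registered for {lang!r}")
        return self.handlers[lang](body)
```

Adding a language means registering one more handler. No new kernel process and no frontend changes are needed, which is the backend-dispatch point made above.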


We are actually working on improving the polyglot support in Jupyter by porting/merging work from Beaker Notebook (http://beakernotebook.com) into Jupyter and especially into Jupyter Lab. This new effort is called BeakerX: https://github.com/twosigma/beakerx. We have autotranslation working in one direction now, should have multi-way in a couple of weeks.


> Spark-like environments where not all packages are available in Pyspark and you need to move between native Scala/spark and Pyspark.

What exactly do you mean here? Are you referring to the parts of Spark which don't currently have a Python API? Because those are becoming smaller and smaller.

There are also Jupyter magics that let you switch languages within a notebook. See %Rpush and %Rpull from [1]. I'm not sure if there is a way to have a Scala kernel running and sharing the same Spark context, though.

I think IBM is working on something in this area.

[1] https://blog.dominodatalab.com/lesser-known-ways-of-using-no...


Yes, that's exactly what I meant. And you're right: that's a pretty small subset, and that's what I tell others too.

But, the ease with which you can load interpreters on Zeppelin (apart from the pre-loaded ones) is impressive. I imagine it comes at a cost of some instability because it hangs more often than Jupyter.


I'm curious, what do you use Notebook for on a frequent basis? Data analysis?


Primarily, yes. And also for building and trying out ML models.


Like Beaker Notebooks?


Jupyter is super useful; together with Anaconda it's a winning team. I love Anaconda because it takes the sting out of all the dependencies and cruft that stop Python packages from installing cleanly (besides the v3/v2 mess and installing as a regular user rather than root).


Also Anaconda tends to come with newer versions than those available in most Linux distributions.


I absolutely love Jupyter. Thank you guys and gals for your great efforts. I've had an idea about how it could become a very simple and powerful rapid application development tool based on the widgets you already have: what if you could preview an app in a mode with the code hidden and bundle it into standalone executables for all platforms? There are already many tools in Python to do this; all it would take is an export feature to tie it all together.


We're going to be exploring building dashboarding solutions more with JupyterLab. See https://github.com/jupyterlab/jupyterlab/issues/1640 for some of our current discussion. That said, the deployment issue you've talked about has been experimented with in the IBM dashboarding features (i.e., one-click to deploy an application-like interface).


Does anyone know how to set up a multi-user notebook? Because of the nature of the notebook and how it maintains state, a single instance seems usable by only one person.

We are building dashboards in Jupyter and would really love them to be multi-user... without getting into the hub and stuff (way too complex to set up).


Disclaimer: I work on the hub.

Sorry to hear you found the hub too complex. We're working on making easier-to-use hub setups that fit different use cases. Can you tell us a little more about what your use case was and (optionally) which parts of the hub setup you found too complex?

Thanks!


Thanks for replying.

So it's a bunch of different technologies: Node.js, etc. I'm kind of wondering if it could be built in Python itself and made part of a normal Jupyter install, so that just a "jupyter hub start" would work?

EDIT: adding to that, you have built a Node.js-based HTTP proxy. Could you not build it in Python (for uniformity) or on nginx (for performance as well as mind share)? Do you even need to mandate an HTTP proxy?

The second question is whether it can run in a multiprocess mode. I don't want to run it interactively, just straight from top to bottom. Perhaps there are huge memory savings there.


I think Yuvi wrote a version of CHP on nginx (https://github.com/yuvipanda/jupyterhub-nginx-chp). When we first wrote CHP, Node was the only viable solution for a dynamic WebSocket proxy; nowadays Go, or Python 3 with asyncio, may be potential contenders. It may be possible to rewrite it in Python, but time is limited. I'm unsure about your second question... run a notebook from top to bottom? `jupyter nbconvert --to notebook --execute --inplace yournotebook.ipynb`?


Well the second question was also related to multi-user deployment. From what I understand, jupyterhub will spawn multiple kernels every time someone logs in. But a lot of the time (most of the time?), you don't intend people logging into your jupyter notebook to be doing interactive stuff - maybe they just want to run the whole thing as a dashboard.

So it becomes a traditional web app use case. Do you need all the proxy/WebSocket stuff to do this? Your nbconvert command still needs every user to spawn their own kernel, right?

About the first part - it would be great to have a simpler jupyterhub. One of the steps is to have everything in Python.


JupyterHub isn't really set up to serve a 'dashboard'-style web application; it's purely intended for interactive use. The design choices made reflect this.

There's ongoing work on formalizing the proxy better (https://github.com/jupyterhub/jupyterhub/issues/848) - someone will probably write a pure python proxy when that gets merged :)


I just wanted to make sure you guys were aware that it is a large component of the use case. The very typical loop of prototyping in Jupyter and then rewriting as production code is shortened significantly by doing this.

All the tools already exist in jupyter - except one: lightweight multiuser. I would argue that building this is going to be a fairly trivial thing for you guys (as compared to other features you build), but the end user benefit is immense.

Jupyter becomes much more than an interactive scratchpad: it becomes a full-blown prototyping environment for data science and reporting. I would say you could even compete with Tableau in a lot of use cases.

Please do think about it. My company will be happy to contribute to a gofundme on this.


Reply to this plus 2 comments up. The hub spawns _servers_, not kernels. There are a lot of indirection layers, and indeed, being able to _view_ a notebook without starting a kernel is on the todo list. Multi-user collaboration is in progress; it's more complicated than it looks. One of the issues is that even if this is a "solved" [with many quotes] problem for static documents, as soon as you have code execution it becomes really tricky. The kernel needs to run as someone, but who? The owner of the document? Which permissions do you give to whom, and how? There are some cases where there are possible answers, but they are really hard to tackle in a generic way across programming languages and various kinds of deployments. Ian has an already well-advanced JupyterLab prototype that you can connect to Google Drive for live editing. If your company is interested in funding something like that, feel free to write to any of us privately (git log and grep to find emails), and we can likely set up a contract with NumFOCUS (the non-profit that handles our funds). The advantage is that it will be tax deductible for your company (unlike most GoFundMe campaigns).


I wish we had the kind of money to fund the full development. I mentioned GoFundMe because a small startup in India will not be able to do that, even though we want to. But I have a feeling that a lot of others will want to as well.

I'm talking specifically about the usecase of code-execution (especially dashboards).

Here's a small point from me: perhaps you are overcomplicating the use case for 90% of us. Give us a proper SSL/TLS + bcrypt-hashed password setup and two roles: Editor and User.

I don't think you need to worry here about people wanting to run a full-on SageMathCloud kind of thing.

If you can give me a low-resource way of letting 100 "Users" onto a dashboard form and one "Editor" (who can actually edit the underlying notebook), I'm golden. And I'm willing to bet that so will 90% of your audience.
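The proposed Editor/User split amounts to a two-role permission table plus salted password hashes. This is a minimal sketch with several assumptions. The role and action names are hypothetical. PBKDF2 from the standard library stands in for bcrypt, which needs a third-party package. TLS is assumed to be terminated elsewhere, for example in a reverse proxy.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Hypothetical role table: both roles can view and run the dashboard,
# only an editor may change the underlying notebook.
PERMISSIONS = {
    "user": {"view", "run"},
    "editor": {"view", "run", "edit"},
}

ITERATIONS = 200_000  # PBKDF2 work factor; tune for your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 as a stand-in for bcrypt."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, digest)


def authorize(role: str, action: str) -> bool:
    """Unknown roles get no permissions at all."""
    return action in PERMISSIONS.get(role, set())
```

This covers only who may do what. It does not address the harder question raised above of which OS user the shared kernel runs as.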


Check out http://gryd.us; it looks somewhat like what you're describing.


Still no collapsible hierarchy of cells. This is the one feature I miss most from Mathematica notebooks.



Also, Table of Contents 2 [1] is useful, since it can number headings and subheadings in a natural way.

Jupyter extensions are incredibly powerful, and I don't know how I used to do data analysis before finding them.

[1] http://jupyter-contrib-nbextensions.readthedocs.io/en/latest...
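"Natural" heading numbering boils down to one counter per heading level: a heading bumps its own level's counter and resets every deeper one. Here is a rough sketch over a list of markdown-cell strings. It is not the Table of Contents 2 extension's code, and `number_headings` is a hypothetical name. It only recognizes `#`-style headings and does not skip code fences. A `###` directly under a `#` gets a 0 in the skipped position (e.g. 2.0.1).

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def number_headings(markdown_cells):
    """Return labels like '1 Intro', '1.1 Data' for the '#' headings in the cells."""
    counters = [0] * 6
    numbered = []
    for cell in markdown_cells:
        for line in cell.splitlines():
            m = HEADING.match(line)
            if not m:
                continue
            level = len(m.group(1))          # '#' -> 1, '##' -> 2, ...
            counters[level - 1] += 1         # bump this level
            for deeper in range(level, 6):   # and reset every deeper level
                counters[deeper] = 0
            label = ".".join(str(n) for n in counters[:level])
            numbered.append(f"{label} {m.group(2)}")
    return numbered
```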


I didn't understand the purpose of cell-tags. Can anyone give an example of how they might be used?


An example among others: I tag certain cells as "initialization cells" and they run automatically when a kernel is restarted (it requires a plugin).
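Cell tags live in each cell's metadata (`metadata.tags` in nbformat 4.x), so a tool can select cells by tag without any custom markup. The sketch below runs only the cells carrying a hypothetical `init` tag. The plugin mentioned above may store its flag differently, so the tag name and the `tagged_sources`/`run_tagged` helpers are assumptions. A real implementation would send these sources to the kernel on restart rather than `exec` them locally.

```python
def tagged_sources(nb: dict, tag: str) -> list:
    """Source of every code cell whose metadata.tags contains `tag`, in order."""
    sources = []
    for cell in nb.get("cells", []):
        tags = cell.get("metadata", {}).get("tags", [])
        if cell.get("cell_type") == "code" and tag in tags:
            src = cell.get("source", "")
            sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def run_tagged(nb: dict, tag: str = "init") -> dict:
    """Execute the tagged cells in a fresh namespace and return that namespace."""
    namespace = {}
    for src in tagged_sources(nb, tag):
        exec(src, namespace)
    return namespace
```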


If you want to install this as part of a more integrated sciences environment check out vnode https://github.com/thomaswilley/vnode


Maybe it's just me, but the former table style seems much more readable and, well, compact, even if you claim otherwise. I hope this behavior is customizable?


I agree. It could be because our brains are used to the old style but there are some objective drawbacks.

The new style right aligns everything which is good for numbers but bad for text. Also, it might just be the screenshot, but the contrast is poorer and the column sizes aren't as fitted.

It's a pity, I would rather have a decent default than to customize every notebook.


You can inject custom css through a ~/.jupyter/custom/custom.css file


Looking forward to the next open-source RhodeCode release, which will support rendering Jupyter notebooks.

It'll be the second source code management tool, after GitHub, with Jupyter support.


One major issue for me is copying with mouse select + middle click, which is unfortunately not solved in this release. Ah well. :-/


Does this update break all the addons again? (Vim keybindings in edit mode, code folding etc.)


It seemed fine for me. All of my addons were working.


The computational dom



