A Visual Debugger for Jupyter (jupyter.org)
197 points by sandGorgon on March 28, 2020 | 66 comments



This animated GIF shows the new debugger in action, allowing point-and-click step-by-step execution at breakpoints and easy inspection of variables, call stack, and source code: https://miro.medium.com/max/1400/1*NP0bYBdrhwgpJpKDhPLWrQ.gi...

It looks to be a very welcome improvement for those of us who routinely use Jupyter notebooks for "REVL" (read-eval-visualize-loop) experimentation.

Has anyone here had a chance to test how well this works with objects from Python's scientific stack such as numpy arrays, pandas dataframes, matplotlib objects, PyTorch and TensorFlow CPU/GPU tensors, and so on?


Been wondering about this for a while... How does one "graduate" a notebook to an actual program on a server? Is there an actual standard procedure for that?

I'm pretty sure that if your notebook is big enough to need a debugger for what's in it, you'd be doing everyone a disservice by keeping it contained like that.


The fast.ai folks wrote nbdev to help with that. https://github.com/fastai/nbdev


I generally use the `jupyter nbconvert` command when I have to do something like this.

This can convert a Jupyter notebook into a Python script but unless the notebook has been written with some care (with the intention of converting it into a script later), a lot of manual twiddling will be required.
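The same conversion is also scriptable from Python if you want to automate it; a minimal sketch using nbconvert's exporter API (the notebook filename is just an example):

  from nbconvert import PythonExporter

  exporter = PythonExporter()
  # from_filename returns the generated .py source as a string plus a resources dict
  source, resources = exporter.from_filename("analysis.ipynb")
  with open("analysis.py", "w") as f:
      f.write(source)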

What I generally do is use Jupyter notebooks when I'm exploring a problem or dataset. When I'm done with the exploration phase, I immediately mark the notebook as deprecated (trying to keep it in sync with code is a nightmare) and link to the relevant source code elsewhere in the project that superseded it.

Databricks does a decent job of productionizing their notebooks, and I've seen people be quite productive on that platform.


Jupytext [1] has been a good option to keep scripts alongside notebooks. The advantage is that it also keeps both versions synced so you can edit whichever you want (just not at the same time, as the browser will not live update cell contents for you).

[1] https://github.com/mwouts/jupytext


I gradually graduate a notebook by extracting what I can into libraries, and importing those utilities into the notebook. Eventually the logic in the notebook becomes trivial enough to easily convert into a production program.
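To give a rough idea of the end state (the package and helper names here are just illustrative, not from any real project), the notebook eventually shrinks to something like:

  from myproject.cleaning import load_and_clean   # hypothetical helpers that now live in a package
  from myproject.plots import plot_summary

  df = load_and_clean("data/2020-03.csv")
  plot_summary(df)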


Someone at IBM turned a notebook into an RPC-callable micro-service. I've adapted that idea too, and put an HTTP (gcloud/AWS-compatible) wrapper around all my functions. It was written up at the link below, but that page is apparently gone. http://blog.ibmjstart.net/2016/01/28/jupyter-notebooks-as-re...


For visual output, voila can give you a dashboard look: https://github.com/voila-dashboards/voila/ ; see e.g. https://github.com/maartenbreddels/voila-demo for a mobile-ready, responsive rendering of a notebook.


https://papermill.readthedocs.io/en/latest/

Papermill and some light container tooling can take you a long way. It gets bonus points because data science folks are quicker to jump into a notebook and debug their own equations than they are a Python module.
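For the curious, the core of papermill is a one-call API for executing a parameterized notebook; a minimal sketch (notebook names and parameters here are made up):

  import papermill as pm

  # Run the input notebook with injected parameters and save the executed copy
  pm.execute_notebook(
      "train_model.ipynb",
      "train_model_run1.ipynb",
      parameters={"learning_rate": 0.01, "epochs": 10},
  )

The executed output notebook keeps all cell outputs, which is what makes it handy as a lightweight run log.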


Along those lines, what's wrong with using free and open source editors with great debug support such as VSCode? I don't understand the appeal of using Jupyter notebooks unless you're plotting or documenting.


Run JupyterHub (https://jupyter.org/hub) if you mean sharing the notebook with central administration. Deploying is a whole other deal.


I don't see how this debugger is visual, but anyway it's cool to see that jupyter finally has a debugger.

This is what I would call a visual debugging experience: https://github.com/hediet/vscode-debug-visualizer/blob/maste...

(Disclaimer: I'm the author of that extension)


The debugging experience in notebooks before this was through a fully text-based PDB input field that would let you control code flow by typing 'c' or other characters.

This new extension gives a graphical user interface that's similar to how debugging works in e.g. VS Code or PyCharm.

I guess that's what's visual about it.


Anyone here interested in helping me improve the Python experience of this VS Code extension?

I would be very happy if you could reach out on Twitter (@hediet_dev) ;)


As someone who saw the demo on the GitHub page and wanted to use it for Python, this comment raises some red flags for me.

What needs to be improved for Python use?


As of now, before you can visualize your data structure, you must convert it to a compatible JSON string (as demonstrated in [1]). This JSON must match the schema of the visualizer you want to use.

For TypeScript/JavaScript however, my extension injects some helpers that convert some common data structures (like number arrays) automatically to appropriate JSON. It requires a deep understanding of the target programming language to see how such helpers could be injected.

I don't have this understanding for Python and would be very happy if someone who really knows Python could help me out.

[1] https://github.com/hediet/vscode-debug-visualizer/blob/maste...


This is very cool and seems useful. I've been meaning to do something like this for years.

Thanks!


Looks great. I'm going to give it a try.


Cool extension, but literally every other use of the term "visual debugger" refers to a graphical interface over something like gdb (where you would otherwise have to manually type commands to set breakpoints, step, continue, etc.). So I don't think it makes sense to claim that this is not a visual debugger simply because you've decided to redefine the term to suit your needs (publicizing your extension).


I see two camps here. Microsoft has been calling anything with a GUI "visual" for a few decades (Basic, Studio, C++, FoxPro, ...), so people from industry often use that definition. For people from academia, "visual" has a somewhat different meaning.

Thus, people in one camp see this as the standard name for the feature, and people in the other camp view it as misleading, and a continued diluting of the word in a way which trivializes their research.

After a decade of hearing "oh, visual programming, you mean like Visual C++?", I've learned to avoid the word entirely. It's a loaded term.

Every common word that is used as a popular brand name has this problem. I'm getting flashbacks to the 80's/90's and trying to explain my home computer to IBM PC people. "Do you have Windows?" "Well, it's an Apple. There are windows on the screen but it's not Microsoft Windows." "If you've got windows on the screen you can drag around with a mouse, that's Microsoft Windows."


I wouldn't blame the person raising the bar for doing so.


I guess, growing up with Delphi IDE, Visual Studio, Browser Dev Tools and finally Visual Studio Code, all calling their "visual debugger" just a "debugger", I got used to debuggers being "visual" by default.

As Jupyter has very powerful visualizations, I initially thought they had somehow integrated their visualizations into the debugger, only to see that it's an ordinary debugger like every modern IDE has. I know a debugger and its UI are a crazy complicated beast to build, but I would expect any modern programming language to have such a debugger.

Just because there are text-based browsers like Lynx, modern browsers shouldn't start calling their products "visual browsers", but I get your point.

Sorry for publicizing my extension here. It's free and open source, works with Python and might help a lot of people. According to github insights, traffic decreases whenever I don't publicize it somewhere.


You're not wrong, but Jupyter is terrible for everything about programming except in-document HTML/SVG visualizations of expressions (no support for unit testing, laggy UI response to user input, broken Undo/Redo that throws away data, buggy browser-based text editing), yet that one thing is worth all the suffering, so we'll take any improvement we can get.

I don't understand why Visual Studio doesn't have a Light Table-style expression playground yet, but until it does, Jupyter is what we have: a fancy REPL and document publishing format being abused as an IDE.


I always thought a debugger was most useful when you have a bug buried in a long execution sequence, produced by unknown combinations of variables. Whereas usually my notebook cells are only a few lines long, with a very clear state going in. Can any of you speak to the practical need for a debugger in Jupyter?


> Whereas usually my notebook cells are only a few lines long, with a very clear state going in.

That is only true in one very specific case: When you are executing all code cells in order, from top to bottom, exactly once.

If you start executing cells multiple times, or out of order, or even worse, execute only parts of cells (which is possible in many Jupyter UIs), all bets are off. Anything can happen.
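A tiny illustration of the hidden state this creates (cell boundaries marked with comments):

  # Cell 1
  total = 0

  # Cell 2 -- suppose you re-run this cell three times while iterating
  total = total + 10

  # Cell 3
  print(total)   # prints 30, even though reading top to bottom suggests 10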


I haven't used Jupyter since I switched to web dev, but it's great to see Jupyter's debugging experience getting better.

BTW, it looks pretty much like Spyder. AFAIR Spyder is a project funded by Jupyter. What happened between the two projects?


Spyder was funded by Anaconda for a while, and is now funded by NumFocus and Quansight https://github.com/spyder-ide/spyder/wiki/Current-Funding-an...


Source: jupyterlab/debugger https://github.com/jupyterlab/debugger


When will jupyter have "highlight and execute" functionality? The cell concept is fine, but I'm constantly copy pasting snippets of code into new cells to get that "incremental" coding approach...


So, I went looking for the answer to this because in the past I've installed the scratchpad extension by installing jupyter_contrib_nbextensions, but those don't work with JupyterLab because there's a new extension model for JupyterLab that requires node and npm.

Turns out that with JupyterLab, all you have to do is right-click and select "New Console for Notebook" and it opens a console pane below the notebook, already attached to the notebook kernel. You can also instead do File > New > Console and select a kernel listed under "Use Kernel From Other Session".

The "New action runInConsole to allow line by line execution of cell content" "PR adds a notebook command `notebook:run-in-console`" but you have to add the associated keyboard shortcut to your config yourself; e.g. `Ctrl Shift Enter` or `Ctrl-G` that calls `notebook:run-in-console`. https://github.com/jupyterlab/jupyterlab/pull/4330

"In Jupyter Lab, execute editor code in Python console" describes how to add the associated keyboard shortcut to your config: https://stackoverflow.com/questions/38648286/in-jupyter-lab-...


It has already been in JupyterLab for more than a year. You just need to attach a kernel console to your text file.


I use Hydrogen for that: https://github.com/nteract/hydrogen


I'm a programmer, mostly not a data scientist nowadays, mostly working with Python. I have tried Jupyter Lab/Notebook on and off over the last 10 years, and I believe I have now firmly settled on my conclusion:

Everyone should aim to minimize the amount of work they do in Jupyter Lab / Notebook.

It shocks me a bit to find myself saying that, as it is such a beautiful piece of work. Furthermore the people who wrote it are better software engineers than I'll ever be: the frontend, the zeromq-mediated communication with the kernel, the fact that the architecture has generalized so successfully to other language kernels, its huge popularity and reach. Nevertheless, I believe I'm serious. It really comes down to just two related issues, but they're extremely important: debugging and version control.

If you're a software engineer, and not a data scientist, here's how you probably debug already, or if not then how you should debug:

- You identify (a) commit(s) on which the behavior is correct, and (a) commit(s) where it is not correct.

- You experiment with fixes. Perhaps you stash them, perhaps you create experimental commits.

The critical point is that you use your version control system (probably Git) to navigate between alternative versions of the code. With a single command, you can switch the version of your code base, and the subsequent process you invoke to test your code is a fresh process, unpolluted by any state from the version of your code that you were on 30 seconds ago.

In contrast, Jupyter notebook does not encourage this style of work at all. In practice, what you will do when trying to debug some code in Jupyter is comment out lines, temporarily delete code, add experimental lines, add experimental new cells, etc. All of this creates a working tree, and a collection of in-memory Python objects, that is a baffling mixture of changes related to the original feature development and changes related to experimental debugging. Debugging will wear you out, as the state of your notebook gradually approaches complete incomprehensibility.

If you're a software engineer, you'll already know the benefits of being able to make precise adjustments to the state of your code with git commands. You want to learn statistics and data analysis skills from data scientists, but in doing so you should not regress to a worse style of development by starting to write much of your code in Jupyter notebooks.

And if you're a data scientist, you will want to acquire the debugging skills of software engineers. If you are not using git, you want to start learning it now.

Crudely, we can imagine a 2-dimensional diagram with one axis for engineering skills and another for data science skills. Everyone wants to be in the top-right quadrant. In that quadrant, version control is used, and the version control system is used for debugging. Debugging is rather important in developing all software, whether scientific/numerical or not.

So both groups should be minimizing the amount of code written in the Jupyter notebook UI: instead, write code in a standard Python package, in a virtualenv, installed in editable mode with `pip install -e`. If you need to use a notebook for graphical display, or HTML display of Pandas dataframes, or display of an audio playing widget, or any of the other amazing things it does so well then fine: use importlib.reload in your notebook to load and reload the bulk of your code from your Python package. The notebook should just feature calls to plotting routines etc that you have implemented in standard code files using your text editor/IDE. You could even aim for your notebook to contain so few lines of code that in some projects you might not even bother committing it.


I don't work in data science but am a physicist working in an engineering field. I very much agree with you.

Many of our PhD students learned programming using Matlab and it's a mess: they never use version control, everything is in a single script, nobody properly debugs, etc. I believe this is because Matlab encourages that type of programming.

I decided very early on to use python instead of matlab, significantly before ipython notebooks became a thing. Because most of the python resources came from computer scientists, using software engineering methods was really "forced" onto me, and I really enjoy it now.

If I look at the people who have been converted to Python via Jupyter, I see a phenomenon very similar to what you describe. People create a huge mess; in ways it's even worse than the monolithic Matlab scripts, because the notebooks can't even be run as a whole unit: working and non-working cells are mixed, several cells define the same function but only one is the correct one (good luck remembering which one it is after a couple of months)...

I think Jupyter is a great tool for teaching, and for exchanging and presenting analyses. But it is a terrible tool for programming, and especially for learning to program, because it really encourages bad practices.


In my experience, there's a balance to be struck. I really like notebooks for documenting the algorithm development process. I used to do a ton of REPL-driven development, and Jupyter is a REPL that allows you to persist commands across sessions. It saves a ton of time that I used to spend scrolling through the ipython history after closing and restarting the session. Jupyter also allows you to manage different kernels in the same environment, so it makes tasks like testing code between py2 and py3 trivial. My final point here is that there's also excellent Cython integration, so you can do a lot of prototyping of Cython code without having to mess with configuration or multiple files. I will agree that there are tasks better suited to IDEs, but Jupyter is not just a plotting frontend; it can be used very effectively in algorithm development and in communicating/documenting the development thought process.
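For instance, the Cython integration amounts to a pair of cell magics; roughly, assuming the cython package is installed in the kernel's environment:

  %load_ext Cython

and then, in its own cell:

  %%cython
  # everything in this cell is compiled by Cython when the cell runs
  def fib(int n):
      cdef int i
      cdef int a = 0
      cdef int b = 1
      for i in range(n):
          a, b = b, a + b
      return a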

Typically, I have a git repo with the final code products, some of the more complex code gets written in notebooks, then transferred to git and thoroughly tested. I've been dreaming of this debugging experience in jupyter because that's still not a task that's suitable for notebooks, but I am hoping that it will come for vanilla python kernels before I can hope to adopt it.


Speaking as a data scientist and not a software engineer, I think Jupyter is incredibly valuable and I've been excited to see it develop pretty quickly in the few years I've been using it.

I agree with you that it's not a tool for writing software. It's probably best thought of as a really good REPL. And there are tons of use cases for just that (at least in my discipline): analyzing an experiment, pulling data from a DB and plotting it, sharing boilerplate code, sharing analyses.

Sometimes I use it to test something out in isolation—something that I want to see functioning outside of the larger context of a production system—or to run a local version of an application, but that's not my primary use case.

You correctly observe that it's not a good fit for the tasks you have at hand, but I hope the above illustrates that there are lots of tasks that it's a great tool for.


I understand what you're saying. I agree that there are situations where it's very useful, and I defer to you as having more recent experience regarding what those are. However, it isn't just the "tasks that I have at hand". Half of my point is that, even in data science contexts, one ought often to minimize the amount of work done in a notebook. To take your scenarios, I agree with "pulling data from a DB and plotting it". But I don't agree with "analyzing an experiment". I may not do data science now, but I did before. "Analyzing an experiment" involves debugging, and it is important that the analysis is correct, and repeatable for publication/distribution. So I do maintain that for any non-throwaway code, everyone is well-served by embracing traditional engineering discipline for the debugging and verification challenges that will inevitably crop up.


I'm a software engineer and I disagree:

I very often use Jupyter to "pop open a shell into a production service and start interactively debugging stuff live" (over an SSH tunnel, with no public open ports and other security considerations, mind you).

It's amazing the feeling you get the first time you open a notebook that acts like a live REPL into something like a Django app, and you start investigating and trying out stuff by just stitching snippets of code together in a notebook that imports your app and uses its db!

Now I code all API services, regardless of the tech they use, so that I can easily "pop open a Jupyter REPL into a running system, importing app code and running it against its db".
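For Django specifically, the bootstrap cell at the top of such a notebook is usually just a few lines (the project and app names below are made up):

  import os
  import django

  # Point Django at the project settings and initialize the app registry
  os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
  django.setup()

  # From here on, ORM queries run against the live database
  from orders.models import Order
  Order.objects.filter(status="failed").count()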

Interactivity and REPL-driven development and debugging are awesome if you have the discipline to contain the chaos and keep your notebooks aggressively short-lived (any useful code gets refactored and copied into its place in the regular codebase; most notebooks get deleted before merging a branch into dev/master).


Yes, absolutely. What you describe is `manage.py shell_plus` (from django-extensions), which does not use a notebook.

So not to be argumentative, but to be clear about this discussion, I'm going to say that your comment is 70% irrelevant, since using an interactive REPL is routine in python development.

However, it is 30% relevant, because retrieving and archiving the code you ran is going to be much more convenient in a notebook than by using %history or whatever in a shell-based ipython.


On reflection, I'm not even sure that the notebook is that much of an advantage over shell-based ipython for archiving the commands. One problem is that with a notebook, you have less idea of what you've actually executed: you just have a bunch of code sitting in cells in a web app. Whereas the shell UX is extremely simple and linear: if a command is in the ipython history, then you executed it.


Yeah, but sometimes you want to execute snippet 12 four times, and then, using a variable derived from that, go back to snippet 8, etc.

Notebooks are about non-linear execution.

Sure, there's enough rope in that to hang the whole neighborhood: you can totally f things up with no chance of recovery, or end up with data you have no idea how you arrived at and no way to retrace your steps 100% deterministically.

But if you have some discipline, stick to read-only access wrt the db, and just delete the whole notebook when things stop making any sense, it's... magical to have all that power and freedom at your fingertips, without having to keep much stuff in your working memory (since you can dump it into a var or a cell anytime), and mix your text notes in with the code too!

It's not for everyone, but I love this beautiful chaos :)


OK, fair enough!

All of which does seem to paint a picture of a programming environment which is handy for ad-hoc interventions and graphical/audio/video/HTML output but highly unsuitable for organized development of a code base (even a small one), highly unsuitable for systematic debugging, and highly unsuitable for beginners learning to program beyond their first baby steps (as @cycomanic points out elsewhere in this discussion).


As someone who flips back and forth between data science and software engineering quite a lot, I think they are highly complementary. The benefit of things like Jupyter is the ability to thoroughly explore a problem space without requiring the overhead of things like version control. The way I see it, every piece of code written actually has such an exploratory phase, where you try out a few different approaches, test your assumptions, often realise a mistake halfway through and rework your algorithm, etc.

The danger when that is done in an IDE setting is that because you build up commitment in a solution as you go, a reluctance sets in to rework it. So the final output is nothing like what you would write if you did it from scratch - it's littered with historical quirks of how you arrived at that implementation.

So I actually think that breaking out of the IDE and doing an exploratory phase in something like Jupyter is a really useful way to get your ideas into reasonable shape before you write your "real" code.


With the greatest possible respect, I suspect that you are still on the path towards getting really comfortable with git. I hope this doesn't come across as arrogant or presumptuous. What I suspect, is that you are at a stage which everyone passes through, where the act of "making a commit" feels permanent. You know that there are ways to change history, but it feels like they are going to be a huge distraction from getting work done. It really doesn't help that git uses the word "commit"! I sort of wish they'd used the word "snapshot" or something. In any case, if I am right, what you'll come to see soon is that it really isn't painful to rework commits and there is nothing constraining about git. Just create a copy of your current branch first if you're at all worried about messing something up (git checkout -b mybranch-snapshot-1), and then `git reset $commit_before_your_experimental_throwaway_commits`. There's no need to do anything more complicated than that when it's just your private work on your laptop, that you haven't pushed to a shared remote.

There were a couple of things you wrote that make me think that. Firstly: "without requiring the overhead of things like version control". There is a minuscule overhead to using git in the way I describe: `git init`, `git add`, `git commit`, `git reset` are the only commands you need, and they take a second to invoke. Secondly, "you build up commitment in a solution as you go": as I said above, I believe that as one gets more comfortable with git, it no longer feels like a constraint -- quite the opposite, you feel liberated to experiment because you always know you can get back to any state you wish.


Thanks ... I take your point ... but it's not really about being comfortable with git. It's more about what kind of activity you are doing. When I'm in an exploratory process with Jupyter, it's iterative, with a feedback cycle that is almost sub-second. I often have 3-4 versions of the algorithm I am exploring visible at the same time in different cells. I'm using autocomplete and interactive evaluation continuously to understand what state the algorithm is in, what attributes and methods are available to me, and how they behave.

No amount of git or anything else an IDE can do achieves those things.


Right, fair enough. Especially helpful to have that iterative feedback with plots. I guess I'm just saying, have a workflow for moving that code into a version-controlled Python package as you get happy with it. (I've given some instructions on how to work with your own Python packages in a notebook in another comment in this subthread.) Now that I've fully embraced my notebookless workflow, I've been using the following for plots in a traditional ipython shell:

  import matplotlib
  matplotlib.use("Qt5Agg")  # pick an interactive GUI backend before importing pyplot
But yes, even I might start up a notebook to iteratively refine a plot! And the HTML table output for dataframes is perfect also.


Thanks! I'm learning OOP and data analysis. For OOP I use PyCharm, and for data analysis I use Jupyter notebooks (in Lab or VS Code). Sometimes I write some reusable code in .py modules and call them from my notebooks. Anyway, I didn't fully understand your proposal to use importlib.reload, but I will try to research it. For exploratory data analysis, I guess notebooks are better than IDEs.


Hi, I'm happy to try to help. At its most basic: don't copy-paste your code into the notebook! And don't import your code like this: `from mymodule import myfunction`. Instead import it like this:

  import mymodule

  mymodule.myfunction()
That allows you to do this to reload your module and pick up the updates to myfunction that you've made:

  from importlib import reload
  reload(mymodule)
However, how do you ensure that python can find your code so that `import mymodule` even works? Don't mess about with PYTHONPATH and sys.path. What you really want to do is house your work in its own python package. So, the milestones you want to get to are (not implying you don't already do these things!):

- Always use a virtualenv when working with python

- Create a proper package structure for your Python project. This means your directory structure will look like this:

  myproject/myproject/__init__.py
  myproject/myproject/mymodule.py
  myproject/setup.py
- Google for how to create a minimal setup.py. Just put what you need in there, it's not much (there's a rough sketch of one after these steps).

- Now, with your virtualenv activated, so that `which pip` resolves to `myvirtualenv/bin/pip`, do this:

  cd myproject
  pip install -e .
- That pip command will execute your setup.py and "install" your library into the virtualenv. But it will install it in such a way that you can edit the code and the edits will be picked up by the installed version (it links to your source tree rather than copying it).

- Now install jupyter in that same virtualenv and start your notebook. You should now be able to do `from myproject import mymodule` and `reload(mymodule)`. And your project is now a real Python library, so you can create subpackages, etc., e.g. `from myproject.plots import create_boxplot`.
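As promised above, a rough sketch of what a minimal setup.py can look like (name and version are placeholders):

  # setup.py -- just enough for `pip install -e .` to work
  from setuptools import setup, find_packages

  setup(
      name="myproject",
      version="0.1.0",
      packages=find_packages(),
  )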


Are there projects to integrate notebooks with version control?


This is the most promising one I'm aware of. I tried it in the course of trying not to accept the conclusion I outlined above.

https://github.com/mwouts/jupytext


There are a few efforts on this front. Here are two that I know of for JupyterLab:

https://github.com/jupyterlab/jupyterlab-git

https://github.com/elyra-ai/elyra#notebook-versioning-based-...


Totally agree, as someone who's jumped back and forth between software engineering and data science. Data science programming already brings out my worst habits: mixing business logic everywhere, using libraries with odd APIs (pandas, matplotlib), trawling Stack Overflow for code snippets without really understanding them, ugly performance optimizations.

Add poor version control practices and the dawning realization that you don't know whether your code ever worked or just seemed to work because some variable was in scope that shouldn't have been, and it quickly becomes chaos.


As a data scientist who spends at least half my time in Jupyter, I wholeheartedly agree with this.


NICE!


I'll take this opportunity to get Jupyter advice: one thing I don't understand about Jupyter vs. JupyterLab is why notebooks in Jupyter classic have functioning vim bindings (i.e. in cells) through the CodeMirror extension, but JupyterLab notebooks do not. In JupyterLab you can have vim bindings in the text editing view but not in the notebook view. For the life of me I cannot understand this design decision. How hard can it be to just leave whatever makes it work in Jupyter classic alone so that it continues to work in JupyterLab?


There is an extension that provides vim bindings within cells: jupyterlab-vim.


Yes there is, but somehow it is not as good as the CodeMirror extension (IIRC it doesn't play well with cell navigation bindings).


Can you elaborate a little here? I made the switch from classic notebooks to JupyterLab recently and find the vim experience very similar (there are a few subtle differences that I can't remember offhand, but I don't recall having any problems with cell navigation).


They may welcome your contribution. Hop on gitter and run it by the dev crew directly!


[flagged]


Why?


[flagged]


You've been posting a shocking number of nasty swipes at people. If you keep doing this we are going to have to ban you again. Would you please just follow the site guidelines instead?

https://news.ycombinator.com/newsguidelines.html


>You've been posting a shocking number of nasty swipes at people.

I don't take swipes at people who don't post vile things to begin with. I will never understand the whole "you have to tolerate the intolerant" ethos. You can peruse my comments (as you have): I never incite, but censure is a public service.


This looks nice and all, but on the other hand it's kind of depressing how people are expending all this effort on Jupyter and surrounding tooling, just to make something that is almost, but not quite, as good as Smalltalk.


Would you expand on this? I don't know smalltalk, and I'm interested in what you see as better?


Smalltalk fuses the IDE and the program you develop. You have vast possibilities for object visualization (inside a debugger or in separate windows), you can select part of the code inside the debugger and run it in a separate debugger, etc. See https://pharo.org/features


A technology needs more than technical superiority in order to thrive.



