It's amazing to watch the use cases for notebooks and spreadsheets converge. I wonder what the killer feature will be that brings a bigger chunk of the Excel world into a programmatic mindset... or, alternatively, whether we will see notebook UIs embedded in Excel in place of e.g. VBA.
From my experience, I think that this is unlikely. A notebook abstracts a REPL and allows for literate programming. It's mostly used by people who are already used to writing code to carry out analyses (e.g., applied statisticians or numerical mathematicians). A notebook gives them the opportunity to explain each code block, include equations, etc.
A spreadsheet allows for semi-automation of data processing. Each cell can have a (rather simple) function defined, and its evaluation result is printed in that cell. You can actually build up pretty complex workflows just by chaining cell evaluations together.
To give a more concrete example, think about a loop. It is arguably the basic building block of any programming language and a necessary cornerstone in learning how to program, yet notebooks and spreadsheets handle it very differently. You can code a loop in a notebook, but the cell output will be difficult to interpret (think of a single cell that fits a linear model for 5 different outcomes). You would be better off splitting up the cell and running the models separately; that allows for commenting the code and explaining the results, just like you would when writing a paper. In a spreadsheet, you would define a formula and then copy/paste it into the cells you want it evaluated for. No programming required, just knowledge of how to reference cells from within a formula and how to copy/paste in the spreadsheet. That's why spreadsheets are widely used by non-technical people with little knowledge of computer programming.
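To make the notebook half of that concrete, here is a minimal sketch (hypothetical CSV file and column names `x`, `y1`..`y5`) of the kind of loop that dumps five regression summaries into a single cell's output; splitting it into one cell per outcome usually reads better:

```python
# Hedged sketch: hypothetical data file and column names, using pandas + statsmodels.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("data.csv")  # assumed to contain a predictor x and outcomes y1..y5

for outcome in ["y1", "y2", "y3", "y4", "y5"]:
    fit = smf.ols(f"{outcome} ~ x", data=df).fit()
    print(fit.summary())  # five long summaries stacked in one cell's output
```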
I've used a lot of both. Excel is much better for most tasks where you have < 1 million rows of data. It's easier to look at the data, easier for novices, and fast enough. Just being able to scroll through the data is very valuable for getting a feel for it. The biggest drawback is VBA; if you could write Excel macros in Python, it would be a hit.
If you have more data, notebooks can handle that better. However, I've noticed lots of colleagues skipping notebooks and using IDEs instead. Much easier to work with and better for source control. I'm not a huge fan of notebooks any more.
Now that's quite a generalization... most tasks? If all you're doing is `a = b + c`, then perhaps. I work in HFT, and even for trivial data exploration I would never consider touching Excel; why would I? Even if it's just 100 rows. No thanks. Once you're comfortable with Python and its scientific stack, the exploratory part of data analysis becomes fast and trivial.
What I would like to see is notebooks becoming more IDE-like. This is already happening gradually, e.g. with JupyterLab replacing Jupyter notebooks [1].
As seen in the post, finance has caught on to this idea. The Bloomberg Terminal now provides BQuant, an almost fully functional IPython notebook with built-in access to their financial datasets.
Analysts who used to work in Excel are moving their models into environments like these. Libraries for most common functionality are provided, which allows someone with only a bit of VBA knowledge to feel comfortable enough to start working with Python.
And when you browse places like r/financialcareers, it's filled with finance students wondering which programming languages they should learn. The answer is always to learn Python using Jupyter notebooks.
That's not a bad idea. Spreadsheets are pure functional languages that use literal space (the grid) instead of namespaces.
Notebooks are cells of logic. You could conceivably change the idea of notebook cells so that each cell is an instance of a function that takes raw data and returns raw data.
I'm picturing the ability to write a Python function with the parameters being just like the parameters in an Excel function. You can drag the cell and have it duplicated throughout a row, updating the parameters to correspond to the rows next to it.
It would exponentially expand the power of Excel. I wouldn't be limited to horribly unmaintainable little Excel formulas.
VBA can't be used to do that, can it? As far as I understand (and I haven't investigated VBA much), VBA works on entire spreadsheets.
Essentially, replace the Excel formula `=B3-B4` with a Python function `subtract(b3, b4)`, where `subtract` is defined somewhere more convenient (in a worksheet-wide function definition list?).
You can build user defined functions in Excel with VBA as well as with Python through something like xlwings. One of the issues that I ran into with xlwings (or any third party integration into the Office suite) is portability between users.
The ubiquity of Excel is both a blessing and a curse in that everyone has it, so everyone uses it, regardless of whether or not it is the best tool for the job.
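For what it's worth, here is a minimal sketch of the `subtract(b3, b4)` idea as an xlwings user-defined function (as I recall, UDFs require the xlwings Excel add-in and currently only work on Windows; treat the details as an approximation rather than a recipe):

```python
# Hedged sketch of a Python UDF exposed to Excel via xlwings.
# After importing it through the xlwings add-in, a cell can use =subtract(B3, B4).
import xlwings as xw

@xw.func
def subtract(a, b):
    """Return a - b, replacing the worksheet formula =B3-B4."""
    return a - b
```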
Google Colaboratory is now ubiquitous in the sense you use the term, as is Microsoft Azure Notebooks, so the ubiquity argument is no longer unique to Excel. The big argument in favor of notebooks is transparency and the breadth of tools they can make use of. Economists will increasingly move away from Excel, as the QuantEcon website demonstrates. Perhaps accountants will still use spreadsheets (after all, they invented them), but it's unclear why anyone else really needs them when there are better tools available.
This would require reactive recomputation of cells to be anything like a spreadsheet.
> Essentially, replace the excel formula `=B3-B4` with a Python function `subtract(b3, b4)`
As of now, Jupyter/IPython would not recompute `subtract(b3, b4)` if you change `b3` or `b4`. This has both positive and negative effects (reliance on hidden state and order of execution).
I too would really like something like this, but I think it is pretty far away from where Jupyter is now.
You can build something like this with Jupyter today.
> Traitlets is a framework that lets Python classes have attributes with type checking, dynamically calculated default values, and ‘on change’ callbacks.

https://traitlets.readthedocs.io/en/stable/
You can definitely build interactive notebooks with Jupyter Notebook and JupyterLab (plus ipywidgets, Altair, HoloViews, Bokeh, or Plotly for interactive data visualization).
> Qgrid is a Jupyter notebook widget which uses SlickGrid to render pandas DataFrames within a Jupyter notebook. This allows you to explore your DataFrames with intuitive scrolling, sorting, and filtering controls, as well as edit your DataFrames by double clicking cells.

https://github.com/quantopian/qgrid
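As a rough illustration of how traitlets' 'on change' callbacks could give you spreadsheet-style recomputation for the `=B3-B4` example above (hypothetical cell names; a real solution would need a full dependency graph):

```python
# Minimal sketch: recompute a derived value whenever either input trait changes.
from traitlets import HasTraits, Float, observe

class Cells(HasTraits):
    b3 = Float(0.0)
    b4 = Float(0.0)
    result = Float(0.0)

    @observe("b3", "b4")
    def _recompute(self, change):
        self.result = self.b3 - self.b4  # the =B3-B4 formula

cells = Cells(b3=10.0, b4=4.0)
cells.b3 = 12.0        # the observer fires and updates result
print(cells.result)    # 8.0
```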
Procedural scripts written in a general purpose language with named variables (with no UI input except for chart design and persisted parameter changes) are reproducible.
What's a good way to review all of the formulas and VBA and/or Python and data ETL in a spreadsheet?
Is there a way to record a reproducible data transformation script from a sequence of GUI interactions in e.g. OpenRefine or similar?
"Within the Python context, a Python OpenRefine client allows a user to script interactions within a Jupyter notebook against an OpenRefine application instance, essentially as a headless service (although workflows are possible where both notebook-scripted and live interactions take place.https://github.com/OpenRefine/OpenRefine/wiki/Jupyter
Are there data wrangling workflows that are supported by OpenRefine but not Pandas, Dask, or Vaex?
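For comparison, the kind of cleanup step I'd normally do interactively in OpenRefine can usually be written as a short, reproducible pandas script (hypothetical file and column names):

```python
# Hedged sketch of an OpenRefine-style cleanup expressed as a reproducible pandas script.
import pandas as pd

df = pd.read_csv("contacts.csv")                       # hypothetical input
df["company"] = df["company"].str.strip().str.title()  # trim whitespace, normalize case
df = df.drop_duplicates(subset=["company", "email"])   # collapse duplicate records
df.to_csv("contacts_clean.csv", index=False)
```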
This is interesting, I need to have a closer look; possibly Refine can be more efficient? But I haven't used it enough to know, I've just played around with it a bit. I didn't realise you could combine it with Jupyter.
Two types of cells: one type defines functions; the other type are instances of those functions that require inputs. Arrange them in a click-and-drag floaty space. Cells of type one exist in a namespace. Cells of type two exist in spacey space.
This kind of thing exists at a larger scale for pipeline visualization. I could see it working for notebooks.
The ease of the calculation tree in Excel, versus having to keep track of which cells in a notebook you have updated, was a large part of why we built and open-sourced Loman [1]. It's a computation graph that keeps track of state as you update data or computation functions for nodes. It also ends up being useful for real-time interfaces, where you can just drop what you need at the top of a computation graph and recalculate only what needs updating, and for batch processes, where you can serialize the entire graph for easy debugging of failures (there are always eventually failures). We also put together some examples relevant to finance [2].
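To illustrate the general idea (this is a toy sketch, not Loman's actual API): a graph of named nodes where changing an input marks downstream nodes stale, and only stale nodes get recomputed.

```python
# Toy illustration only (not Loman's API). Assumes an acyclic graph with all inputs present.
class Graph:
    def __init__(self):
        self.funcs, self.deps, self.values, self.stale = {}, {}, {}, set()

    def add_input(self, name, value):
        self.values[name] = value

    def add_node(self, name, func, deps):
        self.funcs[name], self.deps[name] = func, deps
        self.stale.add(name)

    def set_input(self, name, value):
        self.values[name] = value
        changed = True
        while changed:  # propagate staleness to all downstream nodes
            changed = False
            for node, deps in self.deps.items():
                if node not in self.stale and any(d == name or d in self.stale for d in deps):
                    self.stale.add(node)
                    changed = True

    def compute(self):
        while self.stale:  # recompute nodes whose dependencies are ready
            for node in list(self.stale):
                deps = self.deps[node]
                if all(d in self.values and d not in self.stale for d in deps):
                    self.values[node] = self.funcs[node](*(self.values[d] for d in deps))
                    self.stale.discard(node)

g = Graph()
g.add_input("price", 100.0)
g.add_node("vat", lambda p: 0.2 * p, ["price"])
g.add_node("total", lambda p, v: p + v, ["price", "vat"])
g.compute()
g.set_input("price", 120.0)  # only vat and total are marked stale and recomputed
g.compute()
print(g.values["total"])     # 144.0
```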
Hi, present-day (grad) student here who has been watching this change happen gradually over my academic career. Honestly, there have been times I wished technology had stayed out of education, because I feel like clear explanations have disappeared in exchange for cool graphics or videos (maybe a budget reallocation to the design/graphics team on the publisher's side?). Relatedly, I've noticed that in-class discussion happens less as slides have replaced the whiteboard/chalkboard and the class speeds through pre-written formulas or text. Overall, perhaps there's a case for a greater quantity of information being relayed thanks to tech, but I feel like quality has suffered as a result.
Undergraduate here, and I tend to agree.
I used to learn a lot of material online using all the sites, blog posts, videos, visualizations, etc. I could find... but as time has gone on I've realized that most flashy materials are less helpful than just spending some time with a single good book on a topic.
The problem is not the tech but the learning style. With interactive notebooks and a ton of other materials available, students can do proactive reading before the class, so the actual class time can be used more for discussion, not just lecturing. Sadly, most students and professors still default to the old way of passive listening.
I can't see how any student in certain subjects like engineering can do much reading ahead. I listened to the lecture, tried to study my notes, did insane amounts of homework that always took hours, worked on projects, and then had to study for quizzes and tests. I would've loved to check out a chapter before class, but what little time I had was for sleeping, eating, and a little socializing. Granted, some people are way smarter than me, and if you were in an easier program you might have been able to read ahead.
I like the concept of Notebooks a lot, but you have to be careful that students aren't getting slightly flashier presentations that come with confusing installation woes.
I teach econ, although I'm on a break from it at the moment, and I use Jupyter for this. The way to structure a class, I think, is to split it into a lecture part with work on a board and then a practical part where people work through exercises with Jupyter. Both sessions can incorporate discussion where necessary. Used appropriately, something like Jupyter can enhance learning and understanding as well as build skills.
Does anyone else find it strange that there is no real-world data in these notebooks? It's all simulations or abstract problems.
This gives me the sense, personally, that economists aren't interested in making accurate predictions about the world. Other fields would, I think, test their theories against observations.
It's an educational course in quantitative economic methods. Fitting real-world data is messy and would probably distract. There obviously is overlap with metrics, but as an undergrad course I'd separate that too. They do have ample links to scientific papers that do use real-world data; there's a pages-long list of references [1]. Do check them out if you're into economic science.
It is literally impossible to run these models on data without first understanding these methods deeply, because causal inference for these observational data IS extremely difficult.
If you do not understand your model deeply, or if you do not understand your data deeply, you are likely producing garbage.
This course relates to the first point.
There's a lot of structural econometric papers that do exactly what you ask, but you need graduate level statistics and a deep understanding of discrete choice, identification and simulation methods.
Structural econometrics is a field where PhD students, in their 5th year of study, usually produce only one complete study, if that.
I let my kids practice with hammers, nails and wood (tools / supplies) before I introduce building a piece of furniture (the educational end-goal). These models are the tools of the trade.
I agree with the sentiment that if the work is messy, teaching should be messy as well. But not when you're starting out with new tools.
+1 for the kids tools. To extend, I let my kids practice hammering first, then sawing, then some other skill. Learning to work with messy data can wait until you are used to the new tools.
There are serious critiques of economic theory out there, which tend to say that kind of thing.
But if you compared these notes to the notes for a college level physics course, you would find a similar level of abstraction, idealized models, and absence of real world data. Those things are not in themselves indicators that physicists (or economists) don't care about the real world. In any mature field, there is a body of knowledge and techniques to be learnt. There's a certain formalism to be picked up, rather than just staring at data.
There might be legitimate reasons for dismissing the general approach taken by mainstream economic theory, but what you seem to be saying ("hmmm, my intuition is that this stuff doesn't focus enough on accurately predicting the real world") is not a reasoned critique.
> This gives me the sense, personally, that economists aren't interested in making accurate predictions about the world. Other fields would, I think, test their theories against observations.
You say this as though using mock-up data to teach techniques isn't a universal practice in literally every other discipline.
>Does anyone else find it strange that there is no real-world data in these notebooks? It's all simulations or abstract problems.
Pretty much every course I took in undergrad physics had no real-world data. The intro-level courses were especially fun: we'd go into the lab and get such horrible data that we could never demonstrate what they were teaching in the theory classes. We wondered what the point of the lab even was.
The biggest offender is the friction model. Heck no, it's not proportional to the normal force; no one could successfully show that in the lab. A quick Google search turns up a trivial experiment where just changing the orientation, while keeping the normal force the same, leads to wildly different friction forces.
It's an academic course. You learn using models and basic concepts, then eventually apply it to real data.
Ever taken statistics courses? You're not doing multiple regression analysis on real world data on day 1. On day 1 you're learning odds using playing cards and coin flips.
>Ever taken statistics courses? You're not doing multiple regression analysis on real world data on day 1. On day 1 you're learning odds using playing cards and coin flips.
Curiously enough, my undergrad statistics textbook was loaded with problems where the data was taken straight from a journal paper. The book has poor reviews on Amazon, but I think it's the best I've seen.
You could test your own theory against the observation that calculations with real-world data are very much a part of economics; they're just not part of this particular course.
Of course, it depends on who they work for. Effectively, the American field of economics is an exercise in decoupling private reality from public theory.
I'm econ undergrad -> DS -> Machine learning. Econ is very useful for data science if you focus on the right subjects: statistics, math, and experimental design. You get all the hard skills you need to interact with data that a statistician or computer scientist gets, with the (significant, unique) benefit of learning how to ask the right question or design the right experiment given what is likely a messy, weird, social scientific question.
On the other hand, if you don't do any quantitative, empirical, or experimental economics -- i.e. you only do theory or political econ -- then you won't pick up these skills (as much).
Probability theory, optimization, statistics and so forth do not differ between economics and computer science, so it makes sense they are the same.
You would see a difference in that these sorts of models are used for causal inference and counterfactual analysis, whereas machine learning is mostly predictive.
That being said, machine learning is starting to adopt methods developed in econometrics and/or statistics, like GMM and time series methods.
For example, long-memory models are quite recent additions to machine learning, while the short-memory restriction of autoregressive processes has been worked on in econometrics since the early '80s.