After talking with Excel users for the past year to understand what makes it such a ubiquitous tool, I don't think that its the ability to write complex formulas, but instead the reactive, visual environment (read: IDE). Of course, for some users the ability to nest 100 VLookups or create UDF's is the important part, but for the vast majority of people, their workbooks aren't __that__ complex.
When it comes to data analytics IDEs there is a fundamental tradeoff between staring at the data or staring at the transformations. Excel makes sure you are brutally aware of each edit you make to your data at the expense of reproducibility and auditability of your transformations. Python takes the opposite end of the trade -- obscuring the underlying data, but bringing the transformations to the forefront. For non-programmers trying to learn Python (especially for data analytics), the biggest hurdle is losing touch with their data.
I've been building a Python package, Mito (https://trymito.io/hnc), to try to address this tradeoff for those who want to analyze data with the intuitiveness and data-first-ness of a spreadsheet, but with the power and traceability of Python. Mito is a Jupyter Lab extension which gives users an interactive spreadsheet that automatically converts your spreadsheet analysis to the equivalent pandas code. You can write Excel formulas, merge datasets, create pivot tables, etc.
I've seen Mito around HN and reddit before and I think it's an absolutely brilliant, but its applications may be more limited than the vast use cases covered by Excel. Maybe that's fine, as your goal may *be to carve out a niche rather than compete with Excel more broadly.
The reason I say that is I think the major pain point in its hypothetical widespread adoption would be getting users accustomed to the ergonomics of accessing attributes in a programming language when they only understand noob-friendly Excel formulas, which is more functional in nature.
Looking at the demo, I think some pseudo-code such as the following could be easier for e.g. an office worker than pure python:
ADDCOLUMN(Sheet='Train Stations', Name='Accepts Bags') # defaults to appending at the end
SETFORMULA(Column='Train Stations'!'Accepts Bags', As=IF('Train Stations'!'Checked Baggage' = "Y", 1, 0))
PIVOT(From='Train Stations', To='Pivot', Keys=('State'), Values=('Accepts Bags'), Formula=SUM)
SORT(Target='Pivot', By='Accepts Bags', Direction='Asc', NA='Hide') # defaults to ascending, NA first
Clearly, it's not like I've thought this through carefully and am not claiming this particular example is really ergonomic, but hopefully this illustrates the point I'm trying to make.
It would not need to look like Excel, but I think the jump from spreadsheet -> Python may be a step too far for the average user than, say, spreadsheet -> some functional approach.
Glad to hear you're a fan! You're right, Mito is not designed to cover all of Excel's use cases. Mito focuses on rectangular datasets that fit nicely into a pandas dataframe ... at least for now. So the types of use cases that we're looking for is data cleaning, pivoting, etc. If you're building an LBO, Mito is not the tool for you.
It seems like what your proposing is almost a wrapper around pandas functionality to make the language easier to read for Excel users. I think that's a super interesting approach which we honestly haven't thought that much about. As a rule of thumb for Mito right now, any spreadsheet formula gets generated as a Mito formula (ie: using an IF statement in the Mito spreadsheet generates the code IF(A > B, 1, 0) instead of the Pandas code) and anything else is raw pandas code (ie: pivot tables, merges, add column).
In general, we've been thinking about trying to move more of the code to the raw python approach since we've heard things like "not seeing the raw script makes the code unproductionizable" etc. But I also see your point that beginning Python users might prefer readable code over Python code. If we took that approach, users would still get the reproducibility, auditability, and ability to use a spreadsheet interface on large datasets, they'd just sacrifice any semblance of learning Python. That's great food for thought!
>When it comes to data analytics IDEs there is a fundamental tradeoff between staring at the data or staring at the transformations.
That is an interesting point regarding Excel vs Python. But no-code, flow-based data transformation tools such as Easy Data Transform (my own product), Alteryx and Tableau Prep offer a different approach, by having a canvas of transformations and allowing you to see the data after each transformation with a click. This loses some of the massive flexibility of Excel, but also has a lot of advantages, including: no syntax to remember and the transformations are much more visible and easy to reverse.
So I have been using Excel every day for 13 years as an actuary. I'm also involved in the management of software project in c++.
You are correct. It's a combination of REPL like behaviour and being able to see the entire state at once that does it. Excel is the most agile programming environment known to man.
Completely agree from my actuarial experience too. Do you find in practice though that often Excel is only part of the workflow and that a lot of the agility is lost because it's using output from a (often black box) modelling tool that is doing a lot of the computational heavy lifting?
The most important point about Excel in the work environment is that it's often the only tool people have so they make it do things way beyond what is sensible.
I'm a consultant and at every client (mostly public sector) I discover new ways of solving problems with Excel which would otherwise not get solved because the blockers to buying "the right tool" are simply too great.
At my latest client I've plotted their data on a map by colouring in cells as pixels using custom formatting and zooming right out so it isn't too blocky. It's too hard to get access to GIS software even though other departments in the same organisation have it.
My favorite example of this was a team that needed to process more data than would fit into a normal Excel worksheet, so they wrote their application in pure VBA, reading from and writing to a custom file format, thus bypassing the Worksheet object entirely and using Excel just as a VBA runtime.
Like all such bespoke creations, it worked fine, and then the only guy who understood it left.
Were you using excel to translate their data to the map, or some external script to create the file? If it's the latter, why not use a plotting library? If it's the former, how did you acquire this unholy power?
> Not sure what you mean by plotting library but I cannot install any software so all I could use was Excel
I believe you, but isn't it baffling that your superiors couldn't recognize that giving you access to some Python or whatever would vastly improve your productivity, what you produce, and your happiness, all for a grant total of €0 and some de-rigidifying of arbitrary rules?
Generally these rules are institutional policy which are there for legal or security reasons. In a 1000-person organization, preventing 990 of them from installing malware is more important than allowing 10 of them to be more productive. A middle-manager does not have the ability to change that policy.
> The most important point about Excel in the work environment is that it's often the only tool people have so they make it do things way beyond what is sensible.
Some see a problem: "Don't use Excel"
I see an opportunity: "Evolve Excel so it's fit for this purpose".
> The most important point about Excel in the work environment is that it's often the only tool people have so they make it do things way beyond what is sensible.
> I'm a consultant and at every client (mostly public sector) I discover new ways of solving problems with Excel which would otherwise not get solved because the blockers to buying "the right tool" are simply too great.
I just don't get this. Most of the tasks where I've seen incredibly creative/perverse Excel gymnastics used in real life could have been solved far better with a bit of Python (or whatever). What "tools need to be bought" in the vast majority of cases?
It took me a whole 5 seconds to google [1] up. As far as I can remember Anaconda can be installed without admin permissions on any PC and pulling the needed packages is a matter of minutes. Or a Tableau Online account with Tableau Creator desktop app having similar GIS functionality costs a whole 70 bucks a month, OMG. The public sector seems to excel in making up "blockers"
to justify why shit doesn't get done.
The “hardness” I’m talking about is not technical difficulty it’s the political difficulty. Sure I could have found other solutions but the acceptable solution for a tool that I can hand over to “the business” must cost $0 and only use authorised applications installed by IT. So that leaves Excel. That’s just the way it is unfortunately.
And yes they make up lots of shit. I don’t. I just make the best of a bad situation for people who need to get shit done despite the stupid policies.
I used to work in the public sector myself, so who am I to judge your way of making a living? But still, I don't like if public resources (work-time and tax money) are wasted just because of political reasons. But that is entirely my problem.
Depends on the org. Some firms block direct executable file downloads and scan for unrecognised code on the network, proxying them all through something like Nexus This is only going to become more common (and necessary) in the wake of things like the SolarWinds hack.
My Excel skills are a bit rusty but I remember the times when it was advised to not open .doc or .xls files from an unknown source because harmful VBA macros could be executed on opening the file. Did this changed lately? If not, how is opening an .xls file different from installing a non-tested python package from security perspective?
A while ago I was bored and asked myself if I can actually solve leetcode problems with excel as a joke, and hence leetcode-excel was born. XLSX and everything.
I got bored after a while and didn't try to solve more problems.
Here is a fun thought: Excel has potentials in teaching non-programmers to think about algorithms. Translating a problem or a function from Excel to a programming language has the potential to break the ice for people who just "can't write a program."
Excel is probably the single most used piece of business software in the world. People have been able to make quake. People have done 3d rendering without vba. People have done all kinds of stuff with Excel for a while now.
People have also made megadollar blunders. As programming environments go, Excel carefully maximises the fuckup surface, enabling all users from novice to expert to stumble unawares into hidden calculation bugs. Cell-reference slip-ups (via copypaste or otherwise) are de rigeur, of course, and they're aided and abetted by Excel's cockeyed "type" system (if we can even call it that), half-baked documentation, inconsistent function library, and mediocre/absent testing, integrity, versioning, or debugging tools.
Probably the worst example I've seen first-hand was an entire retail banking loan-approval process running off of a single, shared, gigantic spreadsheet, that hundreds had tinkered with, but no-one understood or took responsibility for, and where the accompanying Word document of "things not to do" was bigger than the workbook.
Even yesterday, a friend of mine discovered they'd underclaimed expenses for a total >$1,000 due to a dodgy spreadsheet. Something as simple as pasting a list of dollar amounts from a webpage into Excel can produce an incorrect SUM() if/when trailing spaces creep in, since the resulting values may be treated as strings and evaluate to zero - and so it had transpired. Not even "text to columns" could fix it; you have to a) know about this lurking monster, b) use formatting to make it casually evident, and c) use Replace to strip the whitespace. What a crock.
In some ways the argument against Excel is like the argument against Electron: if you're comparing it to an elegant, purpose-built and well-tested application, yes... it falls woefully short.
But is that really the right comparison? Lots of people just don't have the skills or time to build a specific applications. Without Excel, what would they do? There are some places where a 90% solution is worse than no solution at all (at least no solution is a forcing function for a "real" application), and in many ways Excel isn't the best expression of the _idea_ of an Excel-like, low-barrier-to-entry declarative programming environment with a built-in UI, but it's truly a wonderful tool. It makes a lot of automation possible for a lot of people.
While true, the net is a big gain when you consider the things that do work and the ease with which they do / or even the complex things that were done correctly. And, you consider if people had to hire a team of software engineers to accomplish those things then they simply would not have existed.
Indeed the team of software engineers, no matter how big and competent, is: 1) Difficult to manage because you have to convey what you want them to do, often across a gap of domain knowledge; 2) Hopelessly overbooked.
And often the easiest way to try a few things and communicate the idea is by prototyping it with a spreadsheet.
Exactly. Ad hoc work is excels true value in my opinion. And the network effect of having so many people at least casually exposed to it. Because nothing is ever done, it’s never as simple as do “this”, it’s do “this” first then let’s discuss changes to “that” then get so and so’s thoughts and revise “it” which probably has a file name ending with “Final V4” by the time the intended actually received it.
Haha, I discovered "always TRIM both leading and trailing
spaces on both lookup criteria cells and source tables" the hard way too. At 3 in the morning...All of the 20-hour days/nights I've done as a data analyst were due to Excel malfeasance.
Also - 15 digit NUMBER precision leading to truncated values that you only discover when it's too late; no Macro undo; no way I'm aware of to transform columns/rows from formulas to their values without intermediate cut/paste steps; lots of essential features (like "Copy only visible range containing hidden cells") buried in Find & Select > Go To Special menu without keyboard shortcuts; Excel's magic ability to change cell data types by merely touching a CSV (open/closing without saving changes)...
> Probably the worst example I've seen first-hand was an entire retail banking loan-approval process
There's some real doozies out there. The Reinhart-Rogoff error, which was used to justify imposing austerity on Greece [0]. The UK's COVID tracing fiasco [1]. The list goes on. People using Excel have no business making decisions that affect other people's lives.
There are. But it’s also not an exaggeration to say that entire industries run on Excel, especially the financial sector where it originated.
That’s not changing soon but it’s also worth bearing in mind that spreadsheets were the first killer apps starting way back with VisiCalc, and for good reason. Having worked with old school paper spreadsheets it’s not hard to understand why and with every update I do see these issues being addressed in really useful ways.
But why? OK, I do understand why people make Quake in Excel – it's the same reason people make clock-perfect emulators for the Apollo guidance computer and use it as a pocket calculator or whatever – it's pure human curiosity. But why is Excel shoe-horned into solving business-relevant necessary problems also when it's not a good fit all?
IIRC, there are an order of magnitude more Excel "programmers" than programmers in all other languages combined. Tbh, it is a much better source of business than most professional programmers give it credit for.
To add to you list, back in the 80s, I wrote a simple x86 assembler, and a simple neural net, in Lotus 123.
I'm not quite sure. Within my work team, Word is only used occasionally, specifically for anything document related. Excel, on the other hand, is used all the time from typical number crunching to keeping track of tasks (despite how much I despise using it for the latter). I also imagine that small businesses, like restaurants, don't require Word very often, but a lot of them use Excel for finances, inventory, etc.
In regards to PowerPoint, I imagine it lags pretty far behind in usage compared to both Excel and Word. Not many people are making presentations in the grand scheme of things. My intuition says that it's mostly upper management and maybe a single person in a group using PowerPoint "often".
My experiences in the business world is Word has been on a sharp decline for a long time. Excel is more likely to be used as Word. It’s very common to see a spreadsheet with no math and no tabular data but it’s print formatted.
For me at least, I try to avoid Word because it is so slow and cumbersome compared to notepad or notepad++ for the purpose of writing notes. It seems like most of my word processing is for my eyes only.
I’ll have to give pages a try. I think perhaps I favor the earlier versions of office products because I learned on them, they’re faster, and they weren’t as a service.
I’ll go for latex if something has to be published and certainly use power point a bit but am always on the lookout for new approaches. It’s great that power point can pull/receive graph data.
A word processor like Word is really aimed at larger, more complex documents with formatting.
Notes are usually smaller with simpler structure and formatting. With notes, it helps of the app gives you a way to organize the notes. This is really a different use case.
Powerpoint is powerful once you get deep into it. I know people who make some crazy visuals using powerpoint and you would thing it was done with some adobe software.
Not generally. Big firms it would be email 1st with browser for most line of business apps , excel close second, third PowerPoint, with Word behind by some distance and mainly for reading, rather then writing so Acrobat is high on the list as well.
Most of the excel workbooks would be lists and trackers of various kinds, many just (big) tables without any formulae at all.
Is there a real difference between the classic MS office applications, though? You can do spreadsheets in Word and Outlook, have styled text in your Excel document, add animations to your titles in Word, etc.
At this point they can just as wel get rid of the different GUIs facade and instead implement some contextual interaction model (and hopefully they'll include an API so we can generate these documents programatically and don't have to deal with this nth clippy generation :) )
> Is there a real difference between the classic MS office applications, though?
Yes, there absolutely is.
You can create a document with styled text in Excel instead of Word, just as you can edit a photo in Paint instead of Photoshop. You can do it, but the two tools have dramatically different capabilities.
It's important to think about developing for use-cases otherwise we will end up with overly complex software that aims to be all-things to all-people.
> overly complex software that aims to be all-things to all-people
This is exactly what I believe the current state of the MS Office applications to be.
The interface and/or implementation isn't always ideal for the purpose you would generally use a specific app for, but the functionality is there.
> dramatically different capabilities
Embedding a fully functional Excel spreadsheet is only a few clicks through the ribbon and some frustration away in Outlook, Word, and even PowerPoint.
> Embedding a fully functional Excel spreadsheet is only a few clicks through the ribbon and some frustration away in Outlook, Word, and even PowerPoint.
This is effectively opening a reduced version of excel in a very limited way, primarily for embedding one document in another and allowing limited editing. You don’t get the full functionality of the other application.
I can see why I would want to change the axis on a graph even after I have pasted it into my email, but why would I want one app to be both my spreadsheet and my email inbox?
Are you a user of office suite? I spend about 70% of my work life between excel, PowerPoint and Word and have never once wanted them to be one app, but quite often have wanted better integration.
Perversely you /can/ use simple formulas (sum,average, etc) natively in Word tables. Thankfully I’ve never come across a document where someone was mad enough to do anything much with it.
Excel is extremely accessible, which is both a strength and a weakness. People who are complete noobs are quite happy to open up Excel and pull together some logic and data with very little regard for the constraints of the tool. Then they can save it and email it to you and you can work with it. It is the epitomy of 'move fast and break things'. Try getting them to pull together a quick JavaScript file to show you what they are thinking... Whichever way Excel develops, I hope that it doesn't put off the noobs from their penchant to have a go because as much as there is the possibility for the tool to go horribly wrong, elevating it out of reach of the layman would be worse.
Working on a financial institution as a forecaster, I once inherited an excel spreadsheet that took 3 hours to open.
It was full of bugs. At one point it mixed the Swiss franc and the Argentinian peso and that was sold to clients.
That would be blatantly obvious if done in Python or R. It took me 2 years after inheriting to figure it out.
I use spreadsheets but oftentimes they are not the right tool for the job.
Coming at this from another angle, I worked on a (.NET) project for a large construction company where the client wanted a subset of Excel-like functionality in the UI we were creating, but was adamant that we not actually use Excel.
Ok then, we had a go, and it was actually quite fun coming up with a scheme for getting calculations to propagate successfully and without getting locked into infinite loops; inevitably though the result was a very poor man's copy of what Excel can actually do.
So we went back and said look, far and away the best way to do this is to either use Excel itself, or embed it, whereupon the actual reason for their aversion to Excel came out: they had had cases, they explained, where project managers had used white ink in white cells to hide data and thereby forge financial calculations to their benefit, and they just wanted to prevent that from happening! I wanted to politely suggest that they solve their "technical" problem by getting more honest employees, but in any case the project got canned shortly thereafter.
Scripting with Excel can be done in two ways: one is Excel as a host calling the scripts inside Excel, and the other is controlling Excel in another hosting language. The first can be done with VBA or Excel Javascript. The second can be done via the COM Automation interface supported by Excel. Any language that can talk to a COM Automation server can start Excel as an embedded COM object and call Excel's COM Automation API to control it. Visual Basic, JScript (Windows' Javascript), C++, C#, Java, Mathlab, Python, and others can talk to the COM Automation servers, and can script against Excel.
Yes. I believe the way Excel hosting other languages inside it as scripting addon is via the COM Automation Server interface again. The other language needs to implement as a COM Automaton Server. Excel then instantiates the other language's Automation Server as an embedded COM object and calls its interface, which can call the functions implemented with the language.
There is already VisualBasic for Applications which is just similar to Python, only with begin/end-blocks instead of indentation. And with a little fiddling you can also use JScript/WSH to remote-control some parts of Excel, as well as all the .Net languages.
The libraries in VisualBasic (before .net) do suck. But syntactically, Python and Basic are very close. I hate both very much, because all the nice features that make programming fun are lacking. Both feel very much like using a childs' toy instead of proper tools.
I code in VBA constantly as a Finance Manager. I also spend a lot of time learning Python and used it throughout a CS Degree. VBA and Python are not alike at all IMO.
You are right that you can use something like C# to manipulate Excel files, but there isn't a ton of tutorials on it, which makes it difficult and prohibitive to learn.
It is already; VBA for the desktop versions isn’t going away anytime soon but it also not being developed apart from updates to the object model to reflect added features.
You can generate Excel files in Python with XlsxWriter. It has an API for formatting, charts, merging cells and what not. If you want to automate an open Excel file, you can use xlwings. You can write data to Excel from pandas inside a Jupyter notebook.
I think part of the problem is that any scripting language rolled directly into Excel will be expected to keep backwards compatibility. Microsoft doesn't have full control over Python/JS, and they would like to avoid issues such as the changeover from Python 2 to 3.
Microsoft could implement its own fork of those languages, but is that what customers actually want?
It's probably difficult. I'm sure Microsoft has no great love for VBA, after all, they effectively orphaned the language when they came out with VB dot net. Adding the scripting language per se would be relatively easy. But a big part of VBA is being able to navigate the Excel / Office object hierarchy. I believe making that easy in Python would be hard.
Just yesterday I stumbled across https://pyspread.gitlab.io/ after finding that coding Excel is a complete desaster from a software engineering point of view - no testing, no debugging, no version control, no structure. It must have costed humanity many billions in vain.
Whaddya mean no version control? What about CTRL + SHIFT + S every few minutes with incremental cryptic filename additions to USE_THIS_ONE_CopyofCopyof_VERY_IMPORTANT_myLatest_Edits_to_Copy_of(3)_DONT_DELETE.xlsx? ;-)
But if you use Office 365 for file storage, there is actual version history for Office documents.
In some way sql programming like functions, cursor, store procedures is the Advance Excel programming. Why reinvent the wheel?
In regards to having JS/Python scripting it, if you using Postgres, just add v8 or python plugin, and there you have your advance excel programming in the language of your choice.
If Excel is as popular as everyone says it is, why don't we see more of the javascript addins? Excel as a platform makes sense to me. If I'm in Excel where all my data is, I want to import more external data or publish data externally etc.
When it comes to data analytics IDEs there is a fundamental tradeoff between staring at the data or staring at the transformations. Excel makes sure you are brutally aware of each edit you make to your data at the expense of reproducibility and auditability of your transformations. Python takes the opposite end of the trade -- obscuring the underlying data, but bringing the transformations to the forefront. For non-programmers trying to learn Python (especially for data analytics), the biggest hurdle is losing touch with their data.
I've been building a Python package, Mito (https://trymito.io/hnc), to try to address this tradeoff for those who want to analyze data with the intuitiveness and data-first-ness of a spreadsheet, but with the power and traceability of Python. Mito is a Jupyter Lab extension which gives users an interactive spreadsheet that automatically converts your spreadsheet analysis to the equivalent pandas code. You can write Excel formulas, merge datasets, create pivot tables, etc.