Hacker News
Excel and SQL (datanitro.com)
93 points by karamazov on May 23, 2013 | 65 comments



I might be laughed off of HN for even suggesting this, but why not use Access and VBA for something like this? Most computers that have Excel installed will already come with a copy of Access. Plus Microsoft built the three to work together pretty seamlessly. It might not provide all the same advantages of Python and other database software, but it is easier to pick up and is a huge improvement over the old Excel-as-a-database solution.


Access is for people who should be programmers but aren't yet. If you find yourself thinking about Access it's probably time to hire some proper programmers or get some proper programming training rather than a SAMS teach yourself VBA in 21 days book (OT, do they still have them?).

But most businesses won't realize that, or will be too cheap to spend the money now to save much more money later. Although it's much better these days than it was 5 years ago.

Though I know a lot of MI people whose jobs could probably mostly be automated.


Speaking as a Real Database(tm) bigot, Access has two key advantages over Real Databases.

1. It's there.

Most large organisations stump up for Office Pro, and Office Pro includes Access. The corporate policies prevent you from installing the Real Database of your choice -- you can only use what's already installed. Happily, that includes Access.

2. It's upgradeable to a Real Database.

Microsoft make transforming Access into a true multi-user SQL database fairly straightforward: install SQL Server and run the upgrade Wizard.

If SQL Server is not your personal favourite Real Database, then with a bit more work you can get Access to talk to something else via ODBC. Not as seamless, but still a clear upgrade pathway.

One of the projects that made me realise I wanted to be a developer and not a lawyer (long story) was an Access database I wrote for my part-time job. An errors-tracking system. I calculated that it saved the company 35 hours of manager time per month.

What did they have before that? A physical book, typed into an Excel spreadsheet once per month.

Access is a tool with unique bureaucracy-dodging properties. It's important not to discount those.


Seconded -- Access was my gateway drug into development, too. Well, not quite -- I knew I wanted to do it and had played around with PHP in my spare time, but after graduating college I found myself in a crappy administration job. They had a crappy Excel-based tracking system, I replaced it with Access. Then that grew into a SQL Server-backed VB.NET application. Then I left, went into another .NET job developing C#, then journeyed into the world of Node, Python and Ruby.

TL;DR: Access is a great place to get started.


> Access is for people who should be programmers but aren't yet. If you find yourself thinking about Access it's probably time to hire some proper programmers

I hear this all the time, I wonder if you could expand on your reasons for this belief. Is Access always the wrong tool for any job?

I know of a few smallish companies (5 to 25 or so employees) that have been running for well over a decade on quite large and complex custom-written Access applications that were developed for far, far less than it would have cost to have it done "properly".


There are hard limits to Access's internal engine (JET, IIRC) which you're likely to run into as time goes on. The unfortunate thing is that there isn't anything you can do to avoid the limits or recover once you've hit them.

On the other hand, you can install Postgres under Windows and then build an Access frontend over ODBC. This gets you the best of both worlds, IMO, at the cost of running a kind of odd stack. Cheaper than SQL Server though, and I like the maintenance story better.

TBH the last time I considered a solution like this I was consulting, so it was a few years ago, and all my clients either didn't sign up at all or went for a custom web solution instead, so I don't have a lot of war stories about this platform. I do think it would work though.


JET's days are long gone. The SQL Server Desktop engine is the underpinning for Access now. It essentially _is_ SQL Server. The biggest limitation? The concurrent transaction count is limited to 5. There are others, but it is a far, far cry from that corruption-prone, slow beast that was JET.


Actually, in the latest Office (2013), Access apps are now three-tier apps with SQL Server 2012 or SQL Azure backends.


Point of clarification, concurrent transaction count is limited to 255. http://office.microsoft.com/en-us/access-help/access-2010-sp...


JET isn't all that bad. It can be used to run SQL queries against CSV files. As an ETL tool it's pretty flexible.


It's amazing how much things change when you're not paying attention. Thanks for explaining this.


One of my clients has a poorly designed access system (using, now, mssql as the backend). It's not access's fault that the system is shit, it's the developer's fault.

Bad devs write bad code. Beginner bad devs tend to write their bad code in Access (or Excel). It gets the job done --- until it doesn't.


I agree with most of what you are saying, but I just don't see how this tool solves that problem. Both these options seem to be middle steps between the layman and full fledged software development. The Access/VBA route is slightly closer to the layman side while the DataNitro route is closer to the programming end (if you are the type of person who thinks Excel is a great front end for your database interactions, you probably aren't completely fluent with SQL). I don't see what advantages DataNitro provides that are worth jumping through the extra hoops.


Once you are going down the road of actually writing code you may as well just take the plunge and use a nicer language than VBA.

VBA isn't really going to be any easier unless you already have visual basic experience in some capacity.

If you use a "proper" database you're also going to get advantages in robustness and ability to easily deploy over a network. My last memories of using access for anything (admittedly about 10 years ago) was that it was prone to performance issues and data loss.


It also doesn't do some extremely basic things properly, e.g. foreign keys. Basically Excel -> Access is easy. Excel -> SQL is easy for a proper programmer. But Excel -> Access -> SQL is a complete nightmare.

In the end Excel -> Access is a short-term cheap, long-term very expensive decision. It means you invest a lot of money and effort in a tool that can't do databases properly and can't do forms properly, and when you want to move to something that can do either properly, you have to do both completely over again without any reusable code, since no one uses anything even vaguely resembling VBA any more, and Access as a DB positively encourages the inexperienced to do a lot of things wrong.

I am sometimes perplexed why MS hasn't released VBA# yet. It's like there's some petty war that has been going on for the last 7 or 8 years between the office team and the .Net language team.


Bringing proper development tools to Excel is the single biggest thing Microsoft could do to undo decades of damage Excel has caused.


Access can be a convenient way to distribute structured data that is relatively static to other people, such as colleagues in another office, when it is not worth the effort to get approval for and implement a longer term solution.

I'm surprised that Microsoft has stopped including it in the more basic versions of Office, since that pushes people to seek better solutions elsewhere, when they otherwise might have stayed completely in the Microsoft ecosystem and eventually moved on to MS SQL Server.


> Access can be a convenient way to distribute structured data that is relatively static to other people, such as colleagues in another office, when it is not worth the effort to get approval for and implement a longer term solution.

Right, but that's an artifact of broken bureaucracy -- it's not that Access is the right tool for the job, it's that it's the best tool left when bureaucratic controls are misapplied to prevent the use of the best tool for the job.


I'm not convinced that Access is automatically the wrong tool for every job. I worked with a team that had remote salespeople all over the place, with laptops and a custom VB configuration tool that would allow them to assemble a quote from among thousands of parts. I can't imagine any proper grown-up database would have been the appropriate solution for this kind of work, especially since the laptops weren't connected. It's the same reason someone might use sqlite3 in a desktop app today.

I also worked in an office full of scientists who were savvy enough with Excel but absolutely not programmers. One of them threw together an Access project to track a bunch of internal data about test results, with functional but non-fancy forms for data entry. It kept the data clean and portable, and dumped the data out into Excel as needed. If we really needed it, we could have gotten a database installed at some expense and gotten our outsourced IT manager to back it up. But why go through the hassle?


> I'm not convinced that Access is automatically the wrong tool for every job

Neither am I (though I tend to view the idea that it is the right tool for any job with skepticism); my point above was that the particular scenario pointed out in the post I was responding to indicated that it was the "right" tool, insofar as it was, because of bureaucratic barriers to selecting certain other tools rather than purely technical suitability.


If you have a single user problem, Access is fine.


Looks like this is aimed at development teams that have traders/Gov't Excel monkeys as their customer. This is really the less evil of the two options (other being VBA), and when dealing with those types of customers in organizations where developers typically are the least valued cost center... well, it's not a bad compromise.

If you have had traders ask you for 'some Excel program that has live streaming price updates along with live pricing model params from our internal database' or some crap like that, believe me, this would definitely be worth the cost over using/maintaining spaghetti code VBA.


Is Python/DataNitro really "One of the easiest ways to connect Excel to a database"?

I can go to the Data tab, click on the ribbon item "From other sources:From SQL server" or "From other sources:From Microsoft Query" in far less time than it takes me to read this blog post.

Am I missing something? (other the possibility for SQL injection attacks by users of that spreadsheet)


Also, one can do it the other way around: use SQL queries to read (and update!) Excel files. For examples, see http://support.microsoft.com/kb/306397 and http://sqlserverpedia.com/wiki/Access_Excel_Sheets_as_a_Tabl...


Can you use it with arbitrary sql (non MS) servers on other hosts?


Yes, the latter is an ODBC option, which gives you arbitrary connectivity.


I think an even easier way is to use .NET/Mono with the open source library called EPPlus which lets you read and write Excel 2007 (Office Open XML/xlsx) files.

http://epplus.codeplex.com/


Excel supports querying databases out of the box - Data > Get External Data, no coding besides writing the query required.


Yes, and with the new 2013, it has some good OData stuff too.

Whilst this is a blog post by the party who have created the library, I also disagree with some of their assertions.

Hosting it in a SharePoint-type setup with track changes on, multiple users can work together quite well indeed. Not to mention that if someone is doing modifies or deletes, I'd much, much rather have that kind of history (hell, even git/svn) than a database without an audit setup. Given the amount of work involved in setting up an audit system and merging it into the Excel UI they've just created, I really can't see the point he is making, or where he is coming from.

In fact, I wouldn't really suggest people move away from Excel for the volume of data he speaks of either; it is very easy to back up (host on SharePoint or similar) and incredibly easy to share with the people who work on it.

What I would say is: when you need relationships between your tables, then it's damn well time to move.


Pre-SQLite, I used the Excel OLEDB provider for a simple data store. Basically, each sheet is a table, and once you have a connection, you can run SQL against the workbook. "SELECT * FROM [People]" works mostly as expected, and ranges can be specified like "SELECT [People$A] FROM [People$A2:Y314]"


It's no substitute for a real DB, but if you are stuck in Excel, I find that Microsoft's Power Pivot is a good transitioning step. It comes free with most licenses and is pretty well supported.

It adds SQL Server functionality to Excel, speeding up large queries, adding SQL-like querying, and greatly extending the limits on data, such as the approximately one-million-row max.

http://www.microsoft.com/en-us/bi/PowerPivot.aspx


> ticker text, position integer, purchase_price real

Oops, bad example. Never store money using a floating-point data type! Store prices as integer cents instead; note that SQLite's NUMERIC affinity won't rescue you here, since non-integer numeric values are still stored as binary floats.
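A minimal sqlite3 sketch of the safer pattern, reusing the column names from the quoted schema but holding the price as integer cents (the table contents are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades (ticker TEXT, position INTEGER, purchase_price_cents INTEGER)"
)
# $102.37 is stored exactly as 10237 cents -- no floating-point rounding
conn.execute("INSERT INTO trades VALUES (?, ?, ?)", ("AAPL", 100, 10237))
(cents,) = conn.execute("SELECT purchase_price_cents FROM trades").fetchone()
print(cents / 100)  # convert to dollars only at the display layer
```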


Excel can easily export CSV files, which are easy to parse. However, standard Linux tools aren't great for working with CSVs, especially ones with multi-line content (e.g. exports from a CMS).

As an experiment, I wrote a quick Ruby script called csv2sqlite which parses one or more CSV files (and their headers), and automatically populates an SQLite database based on the CSV.

If you have a CSV and want to easily know how many records it has, or to filter or join those records, it can be just a matter of running something like the following:

  ruby ~/csv2sqlite/csv2sqlite.rb baby-names-10.csv --output babynames.db

  sqlite3 babynames.db "SELECT * FROM baby_names_10 WHERE percent > .05;"

Hope it helps you!
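The script above is Ruby; the same idea can be sketched in a few stdlib-only lines of Python (this is my own rough equivalent, not the csv2sqlite script itself):

```python
import csv
import sqlite3

def csv_to_sqlite(csv_path, db_path, table):
    """Load one CSV (header row = column names) into an SQLite table."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        headers = next(reader)
        # NUMERIC affinity makes SQLite store number-like strings as real
        # numbers, so a filter like "WHERE percent > .05" behaves as
        # expected; non-numeric text is kept as text.
        cols = ", ".join(f'"{h}" NUMERIC' for h in headers)
        marks = ", ".join("?" for _ in headers)
        conn = sqlite3.connect(db_path)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
        conn.commit()
        conn.close()
```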


"Excel can easily export CSV files, which are easy to parse"

'CSV' and 'easy to parse' do not go together that well http://en.wikipedia.org/wiki/Comma-separated_values#Toward_s... also is instructive:

Nevertheless, RFC 4180 is an effort to formalize CSV. It defines the MIME type "text/csv", and CSV files that follow its rules should be very widely portable.

[...]

Each record "should" contain the same number of comma-separated fields.

[...]

Fields containing a line-break, double-quote, and/or commas should be quoted.

[...]

The format is simple and can be processed by most programs that claim to read CSV files.


  $ irb
  >> require 'csv'
  # => true
  >> CSV.parse(file)
^ works every time I've tried it. Systems which export mangled data to CSV probably do so elsewhere, so CSV isn't special there, and CSV is really really simple to escape well enough that any decent parser won't have any problems at all.


Excel's CSV export leaves a lot to be desired. From what I've tried, it doesn't export data in Unicode and exports non-standard CSV that many parsers choke on.


Excel exports CSV in UTF-16 with byte order mark (BOM). Very annoying if you're sending the data to another application that is expecting UTF-8.
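If you're stuck with such an export, one workaround is to re-encode it before handing it to a UTF-8 consumer. A small Python sketch (the function and file-name parameters are my own, assuming the export really is UTF-16 with a BOM):

```python
def utf16_to_utf8(src_path, dst_path):
    # encoding="utf-16" consumes the BOM Excel writes at the start of the
    # file; newline="" preserves line endings exactly as exported.
    with open(src_path, encoding="utf-16", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```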


I've had pretty straightforward success with Python's built in CSV parsing. Not sure what you mean by "Linux tools aren't great for working with CSVs" exactly. I guess maybe Python's not a "Linux tool," but it's sure easy to use for quick stuff from the shell.
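A minimal demonstration of the stdlib csv module coping with exactly the quoting cases that trip up line-oriented tools:

```python
import csv
import io

# A quoted field containing an embedded newline and a comma -- the cases
# that break naive splitting with awk or sed.
data = 'name,notes\nAda,"line one\nline two, with a comma"\n'
rows = list(csv.reader(io.StringIO(data)))
print(rows)
```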


I imagine he means things like awk, sed, sort, etc., that are standard unix tools. The newer versions of gawk have extensions that are designed to help handle CSV files, but, unfortunately, Arnold Robbins opted to make it "some assembly required." Given the fiddliness of parsing CSV files and their ubiquity, hard-coding a CSV mode probably would have been a better solution.


Not to be too obvious, but what about plain old Microsoft Query? I pull data into Excel from time to time with that.


Not sure of the audience for this. If you understand SQL you'd already know most of it... The rest use Excel because that's what they understand.

In this case you have someone that understands SQL and Python, but still wants to use a spreadsheet. Applying the predicates at the start, this person should dump Excel entirely - they can get much better integrity using SQL and Python alone.


There are many, many organisations which use Excel sheets as de facto standard interfaces, or "applications" you have to use, because it's the only common tool the people with the domain knowledge have access to and are trained in, and sharing them is trivial - just email a .xlsx. I've seen government regulators supply template Excel sheets as the mandated standard by which private companies must, legally, submit data.

Being able to interact with those spreadsheets in a sensible language with a front-end that doesn't suck would be huge.


The person using Excel doesn't have to be the person writing the sql :)

This could be useful to integrate Excel sheets & data with other systems without too much manual intervention.


I don't understand the point about backups. I would have thought a database was more complicated to back up than an excel file.


I'm not saying it's the smartest thing to do, but Excel can most certainly handle data on the order of tens of thousands of rows. It's not blazing fast, but Pivoting that much data is not a problem at all.


As a heavy user of Excel+VBA I would love to be able to switch to coding in Python. I detest VBA with a passion. Plus the DB stuff is nice. It's much better able to return results as a formula array vs the native MS facilities which populate ranges with static data (and potentially clobber existing content). Plus it also means your data is refreshed with a simple Shift-F9.

I actually wrote a wrapper for the ADO/Jet DB engine in VBA which does exactly this [1]. However, doing it all in python would be a heck of a lot easier.

1) {=DB_QUERY("/path/to/.csv|.xls|.db","SELECT * FROM....")}


Out of curiosity, what do you hate so much about VBA?


I have had the unfortunate pleasure of having to integrate SQL and Excel for my start up. A provider's tool that we need only works in Excel, so I have to bring data from SQL into Excel, process it and then put it back into the DB. Then all of our real data processing is built with python. So far I have just used VBA/ODBC to handle everything, but for obvious reasons, I hate it.

I'll definitely give this plugin a try.


"DataNitro is embedded in Excel." So users need to install something? Yuck. That doesn't work for us.

We use handsontable (like Google Docs spreadsheet) with a MySQL back end. Users just need a web browser to edit data. I wouldn't call it a replacement for Excel, just a way for non-technical people to edit info in a database.


Just curious, why does installing software on end-user devices not work for you?


Oof, DataNitro being Windows-only is a major bummer. Something like this could be hugely useful for slightly less technical analysts on my team.

Anyone know of a Mac OS X friendly alternative?


It's not an exact alternative, but I'm using Google Spreadsheets with Python and SQLite. It's a web app, it's in the cloud, and it's always synced among the team. There are some API performance issues since you're working with a remote server, but it's OK for me.

Some API wrappers I've used: a slightly outdated official wrapper gdata-python-client [1] and a bit more convenient gspread [2].

[1]: https://code.google.com/p/gdata-python-client/

[2]: https://github.com/burnash/gspread/


That's really interesting. Is any of your GDocs/Python/SQLite setup publicly available?


As an alternative, RStudio [1] offers a decent GUI on top of R, which can load data via SQL. It doesn't give you WYSIWYG editing of data, but you can manipulate data frames with SQL via the sqldf package [2]. The learning curve is very steep though, so it's not a perfect alternative.

[1] : http://www.rstudio.com/

[2] : https://code.google.com/p/sqldf/


Thanks. I'm more comfortable with R, but I'm trying to wean Excel junkies off of Excel, and something like this (Python and SQL within Excel) seems like a good gateway drug. Sadly, RStudio is a bit too big of a leap for that.


What is the goal in trying to wean people off Excel?

Is it just to get them on an open source solution or is it to replicate the functionality of Excel in something you've written/control?

Business users already know how to use Excel. Spreadsheets were the original killer app. They have transformed business and I'm not convinced we've moved beyond their usefulness. It's the same argument as trying to reinvent SQL syntax for the NoSQL flavor of the month. Why try to change what your users are already proficient at? Why not instead try to feed data to that software in a more seamless way?

I may have gone off-topic from what you meant, but I'm interested in what you meant by the "wean Excel junkies off Excel" comment.


Really, the goal is to use the right tool for the data analysis they're doing.

90% of the time, that's Excel; they're looking at reports that can easily be pivot tabled or VLOOKUP'd to get what they need, or use the strengths of the UI and auto-updating to get the formulas they want working in an easily tweakable fashion. For those cases, things like R, SQL or Python can be overkill, especially if you spend more time preparing the data format than it would take to use Excel, let alone analyze the data.

However, 10% of the time, they're pushing Excel beyond the limits of what it can handle and wasting time as a result, spending an hour doing massive VLOOKUPs between two huge spreadsheets looking for an answer that SQL can answer in a heartbeat. Those are the use cases I want to solve for; where the advantages of statistical software, scripting languages and relational databases overtake the powerful and convenient simplicity of Excel.
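That VLOOKUP-between-two-huge-sheets job is just a relational join. A small sqlite3 sketch of the idea (the table names and data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, sku TEXT, qty INTEGER);
    CREATE TABLE products (sku TEXT, name TEXT, price_cents INTEGER);
    INSERT INTO orders VALUES (1, 'A-100', 3), (2, 'B-200', 1);
    INSERT INTO products VALUES ('A-100', 'Widget', 250), ('B-200', 'Gadget', 999);
""")
# One JOIN replaces a per-row VLOOKUP from orders into the products sheet.
rows = conn.execute("""
    SELECT o.order_id, p.name, o.qty * p.price_cents AS total_cents
    FROM orders o JOIN products p ON o.sku = p.sku
    ORDER BY o.order_id
""").fetchall()
```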


Agreed. Excel bashing often comes from web developers who lack real business experience in a gritty production environment.

Belittling Excel is an effective way to burnish one's programming credentials.

I know many languages (Flex, Html, PHP, JS, C# etc). Excel and VBA have their place, especially for very rapid app development.

Web apps are perfect for capturing data. However, output is best handled in Excel. The first thing people ask when getting a report is "How can I get this into Excel?". People like to play with their numbers.


Yes, DataNitro only works with Windows versions of MS Excel. Maybe a VM/Parallels with Windows + Excel would work for you?


Which database is best for stuff like this?

And works with Python.


Python has bindings for pretty much everything, so I'd imagine the best choice is the database you're most familiar with, or that there is the most information on (most likely MySQL, Postgres, or SQLite).


I'd suggest SQLite unless you had a ton of users interacting all at once (if it's something involving editing excel spreadsheets, I doubt you'd have a problem with this).


Slightly off topic, but has anyone built an Excel-esque program on top of SQL? I feel like something of the sort could really be an improvement, as now it'd be easy for end users (who normally use excel) and developers to collaborate on the same data.


Like the article says, SQLite is a good choice when you're starting out. It's included in Python and is great for the basics.

Once you have a lot of data, it really depends on your specific situation. If you don't have any preferences, you can try something new - MongoDB is interesting.


MongoDB/NoSQL is not for the type of analysis mentioned in the article and is NOT a replacement for a spreadsheet.

Out of all the NoSQL options, why pick the one with the most broken default configuration? People complain that MongoDB has a bad rep when it's used for things it's not meant for, and then it's constantly offered as a solution.



