Show HN: Query Your Sheets with SheetSQL

mritchie712 · 2024-03-13T14:42:20.000000Z

Very cool! I built a company around SQL and Google Sheets (https://www.seekwell.io/, acquired by ThoughtSpot in 2022). Plenty of opportunity there.

You get a lot of this for free in duckdb. e.g. if you run the query below from the duckdb CLI, it'll query the sheet[0]:

    SELECT *
    FROM read_csv_auto('https://docs.google.com/spreadsheets/export?format=csv&id=1GuEPkwjdICgJ31Ji3iUoarirZNDbPxQj_kf7fd4h4Ro', normalize_names=True);

We use duckdb under the hood at my new company (https://www.definite.app/), so we get this same sort of functionality for free.

0 - https://twitter.com/thisritchie/status/1767922982046015840

tarasyarema · 2024-03-13T14:58:17.000000Z

Yeah, that is actually super nice, the potential DuckDB offers to operate with data in such an easy way. With a bit of tuning on the auth side it's that easy indeed.

I think it is a pain point for some people, and it's great to see other teams working on similar products and being based on the same tech.

plaidfuji · 2024-03-14T00:48:00.000000Z

Don’t you also get this functionality from a combination of BigQuery, Connected Sheets and ImportRange nowadays?

tarasyarema · 2024-03-14T07:33:37.000000Z

Probably you could do something similar, the thing is that duckdb is much more lightweight. But at the same time if you end up having billions of data points BQ is there to scale for sure jaja

tobilg · 2024-03-13T16:15:26.000000Z

Nice! I built https://sql-workbench.com, which is based on DuckDB WASM as well, and can also query Google Sheet directly from the browser.

Here's an example for querying a Google Sheet:

https://sql-workbench.com/#queries=v0,SELECT-*-FROM-read_csv...

tarasyarema · 2024-03-13T17:06:28.000000Z

Indeed the tool is great, and the performance is amazing in the browser. I believe DuckDB has a ton of potential!

The same query in SheetSQL

https://ibb.co/dDdZ0vC

nathanwallace · 2024-03-13T14:38:36.000000Z

Readers may also enjoy Steampipe [1], an open source CLI to live query Google Sheets [2] and 140+ other services with SQL (e.g. AWS, GitHub, etc). It uses Postgres Foreign Data Wrappers under the hood and supports joins etc across the services. (Disclaimer - I'm a lead on the project.)

1 - https://github.com/turbot/steampipe 2 - https://github.com/turbot/steampipe-plugin-googlesheets

tarasyarema · 2024-03-13T14:50:10.000000Z

Oh wow, I did not know about this, thanks for adding. Curious, do you handle live updates of the data upstream, it would refetch the data each time?

nathanwallace · 2024-03-13T15:11:25.000000Z

Steampipe does live queries against the API endpoint and stores results in an in-memory cache. You can modify the cache expiration to adjust that behavior across queries [1]. If you want to do change data capture to act on results then check out the query trigger [2] in our Flowpipe open source project [3].

1 - https://steampipe.io/docs/guides/caching 2 - https://flowpipe.io/docs/flowpipe-hcl/trigger/query 3 - https://github.com/turbot/flowpipe

hashhar · 2024-03-13T11:07:46.000000Z

Very interesting. Seems conceptually (and implementation wise too) quite similar to https://trino.io/docs/current/connector/googlesheets.html.

Where are you storing the structure of the sheet though? Is it re-evaluated on each query? I am thinking about cases where a column header is renamed or the data types in a column are changed?

tarasyarema · 2024-03-13T11:32:54.000000Z

Oh, that's actually cool and thanks for the reference.

It's using a temporary cache that re-uses the latest data from the sheet (checking if it was updated) so that's why the subsequent queries should be much faster than the initial one.

Now about the headers, it's indeed an issue that I need to figure out, as there are some sheets that have weird structures and it would be nice to find an easy UXsy way to preview the sheet data and say "select from this range" or "these are the headers".

cldellow · 2024-03-13T14:25:55.000000Z

Google Sheets has an extensibility system: you write a React app that renders as a sidebar inside the sheets UI. This exposes a few additional API endpoints vs the vanilla sheets API. For example, you can determine the selected range, when a user has edited a value, changed sheets, etc. We use this in our add-on to help people select a range. Sounds like it might help for you, too!

The extensibility system also lets you show a modal dialog, which we use for previewing API calls to third party HTTP servers -- sounds like a similar thing could work well for you, too.

You could also look at supporting named ranges if you don't already, so people can refer to a range as `FinanceData` instead of `Sheet 1!A3:F90`

hashhar · 2024-03-13T11:52:53.000000Z

> it would be nice to find an easy UXsy way to preview the sheet data and say "select from this range" or "these are the headers".

Indeed. Defining the UX when the only UI you have is a SQL query is a bit challenging.

Very impressive project indeed and solves a real problem a lot of people and companies have. I wish you luck.

tarasyarema · 2024-03-13T12:39:30.000000Z

Will keep the site posted on this, but definetly is one of the things that worries me the most as of now.

Thanks for the comment!!

wanderingstan · 2024-03-13T11:04:19.000000Z

I’ve made it used an interface to Google sheets at every company I’ve worked at, and in every case it’s made me a hero to all the non-tech employees. Finally a UI they understand!

Often previous tech people had to built internal CRUD apps with Byzantine interfaces that trafficked in JSON or YAML files.

But being able to make changes to our app, or import/export data, and all via making changes to a google sheet—this feels like magic.

hobs · 2024-03-13T11:59:47.000000Z

How do you deal with the validation problems that crop up, the data entry problems that crop up, and the impedance mismatch between a traditional datastore and the sheet?

wanderingstan · 2024-03-13T12:05:44.000000Z

Google sheets actually has a pretty robust data validation setup (up to regular expressesions), and most business users are fairly good at handling detailed instructions.

But yes, for the inevitable times when they get it wrong, you just need some robust alerting system. Having it email/text me when anything goes wrong turned out to still be easier than writing our own CRUD app or fancy interfaces.

hobs · 2024-03-13T19:14:22.000000Z

I am really glad that worked! I am pretty paranoid about the downstream breakage of things but I probably just need to let go a little more.

tarasyarema · 2024-03-13T11:36:18.000000Z

Nice, actually I think the best target for this would be people that poweruse sheets and have some SQL knowledge. I see teams that eventually would develop tools or use data lakes etc instead of sheets, but as a first tool to quickly operate over the initial startup sheet drama I think it could be powerful :D

nunez · 2024-03-13T12:47:32.000000Z

it's almost always required. the API for Sheets and its documentation isn't great. when I dug into sheets to make my receipts tracker, I found myself having to transpile their HTTP REST API docs into Golang, which was difficult since Golang has special objects for many kinds of requests you'd make in Sheets.

tarasyarema · 2024-03-13T11:45:39.000000Z

If someone want to try the PRO version here's a promo code to use it for free the first month "PGRAHAM" it's limited to the first 50 uses :D

x2bool · 2024-03-13T15:45:55.000000Z

There is similar open-source project for Google Sheets: https://github.com/0x6b/libgsqlite (SQLite extension), which is a fork of https://github.com/x2bool/xlite (same idea but for Microsoft Excel spreadsheets)

Natfan · 2024-03-14T22:31:32.000000Z

very cool, here's a pattern i've used in the past for integrating with google sheets using rclone[0] and PowerShell[1]:

    rclone config # setup a new config with the "drive" configuration
    rclone copy file.xlsx . # copy the file to local storage
    $Data = Import-Excel file.xlsx # using the ImportExcel[2] module, we can read this file as if it were a csv
    $MyData = $Data | Where-Object { $_.ColumnName -eq "MyColumn" -and $_.MyOtherColumn -match "\d+" }

[0]: https://rclone.org

[1]: https://learn.microsoft.com/powershell

[2]: https://github.com/dfinke/ImportExcel

plaidfuji · 2024-03-14T00:56:18.000000Z

I’ve solved a similar set of problems in my current role by piping all of our important sheets into BigQuery, saving/executing the SQL logic there, then making the result available as views that can be sent back to Sheets through Connected Sheets (or practically anywhere else eg Tableau via native connectors to BQ). For small datasets that fit in spreadsheets, the storage and query usage in BQ is practically free.

The trick (and I assume you’ve found ways around this) is dealing with the flaky Sheets API that throws a 500 error what feels like 1% of the time, and also deciding when/where/how to enforce the SQL data types and column headers. I made a config layer for the latter.. the interface to which is also a Sheet, lol.

qixxiq · 2024-03-13T15:37:07.000000Z

I've been using Google Sheets as a data store for some of my latest projects, and it's really (mostly) delightful.

https://www.wheremoneygo.com barely has a database, and bedsides protecting the Google access keys has minimal privacy/security risk.

The main risk, and something I've only seen once, is a corrupted database on the Google Sheets side. Since I don't want to keep backups of peoples sheets, if something goes wrong on the Google-side the entire database could be lost.

nagstler · 2024-03-14T00:24:18.000000Z

Very interesting, good to see such use-cases around google sheets, much needed. I'm building an open-source Reverse ETL tool to run SQL query from data warehouse and send results into Google Sheets and many more. (https://github.com/Multiwoven/multiwoven)

All the best :)

numlocked · 2024-03-13T11:17:42.000000Z

Cool! How do you deal with irregularly shaped data? For instance if I have two levels of headers, the first of which spans multiple columns via merged cells?

tarasyarema · 2024-03-13T11:33:37.000000Z

As I replied to @hashhar, the header thing it's indeed an issue that I need to figure out, as there are some sheets that have weird structures and it would be nice to find an easy UXsy way to preview the sheet data and say "select from this range" or "these are the headers".

tarasyarema · 2024-03-13T11:34:09.000000Z

Note tho, that vertically merged cells I think should not be an issue probably, but would need to QA jaja

brap · 2024-03-13T11:49:22.000000Z

This would be interesting to use not as an external UI but rather completely inside of Sheets. Maybe as a function or something like that.

ramraj07 · 2024-03-13T12:17:12.000000Z

Google sheets already has an inbuilt QUERY function that'll do basic sql..

gpderetta · 2024-03-13T12:38:29.000000Z

it is better than messing around with vlookup (what isn't), but only a tiny subset of sql is supported (with dedicated syntax). Also it claims to support actual human readable column names but as far as I know it doesn't work and you have to use column labels. Also no joins.

Still, if it were to be extended to a full query language with good ergonomic it would be amazing.

tarasyarema · 2024-03-13T13:03:09.000000Z

Indeed, you can run queries in a sheet, but

1. It's writing a query in a single row cell which is hard from a UX perspective 2. SheetSQL offers you a way to cross join sheets as if those where tables, even between worksheets, which is something that is not supported natively 3. Full DuckDB SQL support, so you can use advanced functions like PIVOT or window functions

antender · 2024-03-15T11:37:57.000000Z

This is not exactly true, QUERY pseudoSQL is available, but as a separate API not listed in Sheets Docs: https://developers.google.com/chart/interactive/docs/queryla... I used it as a DB for Cloudflare Workers, the only hard part is Google Auth.

tarasyarema · 2024-03-13T11:50:59.000000Z

Yeah actually I'm thinking on making an extension that you could use from the sheet page. Do you think that would make sense?

eezing · 2024-03-13T20:54:13.000000Z

Can do this for free via GCP BigQuery console.

tarasyarema · 2024-03-13T22:37:26.000000Z

Is this something that can be done without being in Google? And it would not be free in the end right? As BQ is not free in the end either...

andybak · 2024-03-13T14:44:10.000000Z

Needs more docs and more examples. Either I'm dim or I'm getting the syntax on a simple SELECT..WHERE wrong.

tarasyarema · 2024-03-13T22:36:35.000000Z

Could you email me maybe some of the issues you encountered so I could help and find a good way to add the docs in the page?

andybak · 2024-03-13T22:51:47.000000Z

I'd rather do it here so everyone can join in.

I'm not getting a quota error despite not using it all day:

> Could not run query

> Error: Monthly free usage of 100 exceeded,

It's saying I've used 188 out of 100 which is odd. I haven't used 188 - more like 15.

Earlier I was trying a query like:

> SELECT * from <table> where Status="Complete";

And it was complaining that Complete wasn't a valid column... (rather than Status...) I also had weird issues trying to replace * with an actual column name. The autocomplete went crazy.

Not sure if it makes a difference but I'm a stubborn Firefox user.

tarasyarema · 2024-03-14T00:03:03.000000Z

Ah, I see, could you try now? The quota should be fixed, my bad there, had a bug.

Now about the query, DuckDB works in the following way, the query you sent should be written like

> SELECT * from <table> where "Status" = 'Complete';

Now this is because the `"` is used for the table and column names, and the `'` is used for raw string. See more here https://duckdb.org/docs/sql/introduction#querying-a-table which might be a bit different from other SQL database engines.

And finally, regarding the autocomplete, it indeeds tries to autocomplete in a too verbose way, which is something I'll try to improve.

Thanks for reaching out, and hopefully this can help you try it out!

andybak · 2024-03-14T08:08:48.000000Z

Yes. Obvious in hindsight but - this underlines my point that a few examples would make life a lot easier for people casually trying it out.

tarasyarema · 2024-03-14T08:14:41.000000Z

Totally, In fact I think it would be nice to have a sort of interactive example based on one of your sheets, definitely! Thanks for the feedback.

mdavid626 · 2024-03-13T11:23:46.000000Z

I get a certicicate error - invalid authority. When I proceed: FortiGuard Intrusion Prevention - Access Blocked

rad_gruchalski · 2024-03-13T11:52:48.000000Z

Talk to your corporate IT. They are not mitm-ing you transparently enough.

mdavid626 · 2024-03-17T08:25:00.000000Z

It's strange. Never had this before on any other site. It's not even corporate IT, it's just the building's internet connection.

tarasyarema · 2024-03-13T11:31:05.000000Z

Could you reach out to taras@sheetsql.io with a screenshot or something, so I could have a look at what could be going on?

mdavid626 · 2024-03-18T12:18:16.000000Z

Yes, I reached out.

mdavid626 · 2024-03-21T16:29:34.000000Z

Fixed

mikeyinternews · 2024-03-13T14:06:15.000000Z

That's pretty cool. So do they load your sheet into their database from which you write SQL?

tarasyarema · 2024-03-13T22:38:06.000000Z

No, it's done dynamically over the data, no loading nor anything. In fact it's streaming the queries with the power of DuckDB :D

throwaway295729 · 2024-03-13T11:56:39.000000Z

Looks great!

Is there a rate limit?

Will changed to the sheet instantly go live? Or is that a manual process to clear cache?

tarasyarema · 2024-03-13T12:07:14.000000Z

I have the GCP sheet basic read quota limit, which is something I need to investigate if I can make it higher.

Also, in theory right now once you use a sheet once it will cache the version so that subsequent queries are faster, and in theory if you change the sheet it will check for that and use the live data if it got updated in the meantime.

Nevertheless, it might take a minute or so sometimes.

replwoacause · 2024-03-13T13:22:31.000000Z

This looks awesome, nice work!

untech · 2024-03-13T11:20:05.000000Z

Very cool! I am often frustrated by the lack of SQL features in Google Sheets.

jimnotgym · 2024-03-13T20:28:43.000000Z

Can you tell me why this is better than using Power Query in Excel?

tarasyarema · 2024-03-13T22:35:58.000000Z

I guess this is more focused on Google sheets and the limitations of it, i.e. people that do not operate with Excel files basically

blondin · 2024-03-13T12:58:33.000000Z

great work on the first iteration!

tarasyarema · 2024-03-13T13:01:27.000000Z

Thanks! First time deploying a project like this on my own, so nervous to see how the users use it, or if they do it at all jeje

diordiderot · 2024-03-13T13:43:06.000000Z

Is the site MKDocs?

edit: looks like sphinx

tarasyarema · 2024-03-13T13:45:50.000000Z

:D Actually completely accidental, as I'm using the chakra ui for the frontend part

rvba · 2024-03-13T10:58:54.000000Z

I would make the mid tier limited to 10k (100k?) queries and add one more plan.

tarasyarema · 2024-03-13T11:30:06.000000Z

That is an interesting thing you are raising here. I think right now the point of the paid plan is the scheduled queries, which are the more compute expensive, that's why the pricing like this.

Now, I do believe that I may just increase the free allowance, and maybe if I see people that want to power use normal queries and have maybe a few scheduled ones have an mid tier plan (or even on demand) that allows more queries and a few schedules.

And eventually move the PRO to be a really PRO plan.

blowski · 2024-03-13T11:05:09.000000Z

Why's that? I'm terrible at making decisions on pricing, so I'm curious about people's thought processes.

itsoktocry · 2024-03-13T11:23:00.000000Z

There's some weird nudging psychology that says when when people see an expensive and free option, the mid-tier seems even more appealing, which makes it easier to get them paying.

Honestly, I wouldn't worry about it. If your tool is useful to people, I think your query limit on the free-tier will be enough.

rrr_oh_man · 2024-03-13T11:23:30.000000Z

Probably some kind of psychological pricing strategy.

However, charging for queries would be like ChatGPT charging in tokens — yes, it's understandable, but non-intuitive. And very product-, not user-driven.

I'd rather keep it simple in terms that a user can understand. Really like how you set it up right now — makes it easy to say yes.

The _enterprise_ game is the one where you will make the real money: Teams, Management, Integrations, Access Rights, Billing cycles, Onboarding, etc.

rvba · 2024-03-13T19:38:41.000000Z

There will be that one user who makes a gazillion queries just because they are unlimited.

Or the second tier should still be capped. But that looks wrong feo marketing point of view.

niutech · 2024-03-13T12:01:25.000000Z

You can just import your CSV file to SQLite without any limits, e.g.:

    sqlite3 :memory: -cmd '.import -csv file.csv table'\
    'SELECT col, COUNT(*), AVG(total) FROM table GROUP BY col'

More: https://www.sqlitetutorial.net/sqlite-import-csv/

shubhamjain · 2024-03-13T12:30:04.000000Z

The problem is you need to create a table with the correct data types first, which can be time-consuming. Additionally, using CLI can be a bit cumbersome to use, especially if it involves quering multiple tables, or using joins. I don't want to point to the infamous "dropbox" comment [1], but it's important to understand that many folks just want a friendly interface where they can put in a URL/file, and start querying.

This is the exact reason I built TextQuery [2], which solves a similar problem as OP's. I just wanted a simple app where I could drop a CSV and start SQL querying it, without creating schemas or writing code/commands.

[1]: https://news.ycombinator.com/item?id=8863

[2]: https://textquery.app/

niutech · 2024-03-13T12:53:58.000000Z

In the link I posted there is a GUI way of importing CSV files by using the open source SQLiteStudio: https://sqlitestudio.pl. Your TextQuery also looks nice, although it's a pity it isn't open source.

tarasyarema · 2024-03-13T12:28:46.000000Z

It is indeed an option, I guess the problem with this is that its static and and you need to re-create the tables each time.

For someone more techie probably could work, as you could dump the data to sqlite and then query it. But if you want to have it dynamically and from sheets in real-time probably it would be harder to handle this way.