Show HN: Google Drive to SQLite (simonwillison.net)
305 points by simonw on Feb 21, 2022 | hide | past | favorite | 32 comments
Here's the repo: https://github.com/simonw/google-drive-to-sqlite

The README is using a trick I'm increasingly leaning on: parts of that document - the --help output and the example database schema - are automatically generated using Cog: https://nedbatchelder.com/code/cog and https://til.simonwillison.net/python/cog-to-update-help-in-r...
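For anyone who hasn't used Cog: in Markdown the generator code lives inside HTML comments, so it is invisible in the rendered document. A rough sketch of the pattern (the exact generator code in the repo may differ):

```markdown
<!-- [[[cog
import cog
cog.out("(generated output is spliced in here)")
]]] -->
(generated output is spliced in here)
<!-- [[[end]]] -->
```

Running `cog -r README.md` re-runs the Python and rewrites everything between the `]]]` marker and `[[[end]]]` in place, so the generated sections stay in sync with the code.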




We discovered (in rclone) that Google are deprecating the auth method you are using - the OOB auth here: https://github.com/simonw/google-drive-to-sqlite/blob/121509...

Here is the google page: https://developers.googleblog.com/2022/02/making-oauth-flows...

Here is the discussion on the rclone issue - there are links to more stuff there: https://github.com/rclone/rclone/issues/6000

Luckily rclone already implements the oauth webserver for other backends, but for tools like this it has suddenly made authenticating with Google a whole lot harder.


Hah, wow thanks I hadn't seen that! Just my luck to implement that the same week they announce they are deprecating it.


It's pretty hard to understand what I'm meant to do for a CLI app instead here. Are they removing all versions of the "copy and paste this code" auth flow?

Do I need to spin up a localhost web server on a port instead? How am I supposed to do that sensibly on a remote server that I'm SSHing into?

Also that announcement seemed to indicate that localhost: redirects were going to be disabled too, wouldn't that make the local web server option not work either?

I swear, obtaining an OAuth token to interact with my Google account is the new hardest problem in computer science. Oh for a GitHub-style personal access token!


I'm collecting notes on how to address this here: https://github.com/simonw/google-drive-to-sqlite/issues/39

It looks like the localhost redirects are being disabled for "iOS, Android and Chrome app OAuth client types" so hopefully I can still use those by spinning up a localhost web server after all.


My reading is that a localhost web server on a port is the way to go.

This aligns with what nearly all the other cloud providers do with oauth.
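A minimal sketch of that loopback approach using only the Python standard library (the names are my own, and the actual exchange of the code for tokens at Google's token endpoint is omitted): the app starts a throwaway local server, points the browser at the consent URL with a redirect_uri of `http://127.0.0.1:<port>/`, and waits for the redirect back with `?code=...`:

```python
import http.server
import urllib.parse

class OAuthRedirectHandler(http.server.BaseHTTPRequestHandler):
    """Capture the ?code=... parameter from the OAuth redirect."""

    def do_GET(self):
        query = urllib.parse.urlparse(self.path).query
        params = urllib.parse.parse_qs(query)
        # Stash the authorization code on the server object for the caller
        self.server.auth_code = params.get("code", [None])[0]
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Authentication complete, you can close this tab.")

    def log_message(self, *args):
        pass  # keep the console quiet

def wait_for_auth_code():
    # Port 0 = let the OS pick a free port; the redirect_uri you send
    # to Google must then be http://127.0.0.1:<server_port>/
    server = http.server.HTTPServer(("127.0.0.1", 0), OAuthRedirectHandler)
    server.auth_code = None
    print(f"Redirect URI: http://127.0.0.1:{server.server_port}/")
    server.handle_request()  # block for exactly one request
    server.server_close()
    return server.auth_code
```

On a headless box you would still need something like rclone's port-forwarding trick to make the browser's redirect reach this server.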

> How am I supposed to do that sensibly on a remote server that I'm SSHing into?

If you want to see rclone's solution then check out

https://rclone.org/remote_setup/

It's a bit of a pain!

> Also that announcement seemed to indicate that localhost: redirects were going to be disabled too, wouldn't that make the local web server option not work either?

As you noted below, that is only for Android, iOS and ChromeOS. I hadn't noticed that before though...

> I swear, obtaining an OAuth token to interact with my Google account is the new hardest problem in computer science.

Ha ha!

Note that you can get "rclone authorize" to do it if you want, since you can set the scopes and the client ID / client secret.


Considering the gcloud SDK CLI auth uses the copy/paste-this-code flow for certain scenarios, this will be interesting.


They literally couldn't care less.

Either you're a sticky customer and you'll figure it out,

or you're a high value customer that they'll sic an engineer or two on for a week to re-tool.

They do. not. care.


The replacement flow requires gcloud on the machine using the browser to login. The command is 'gcloud auth login --no-browser'


There used to be one blog that explained how you could roll your own OAuth. Then v2 came out and they gave up.

It wasn't TJ Holowaychuk but it was from that era. Google doesn't give me shit now, but back in 2010 there was one post that everyone was working against.


gTakeout[0] also includes this file metadata and allows incremental downloads. Ingesting from a local Takeout archive as an additional source for your tool mitigates some of this auth wonkiness and the deprecations.

[0]https://takeout.google.com/


I dived into Datasette a bit last year. Simon has a whole set of tools around SQLite and data analysis with it. I haven't used any of it for anything major but I love to see what he's got going on. My Google Drive is out of control, I hope I can find the time to try this one out.


FWIW: If you find that you need something like this, consider using a different file storage product. Most likely, it's using SQLite and the schema is useful.

I used to be the lead on Syncplicity, a desktop file synchronization product. All of the local state was in SQLite. Early versions of the desktop client kept the complete path to every file in the associated row in SQLite. Later versions kept the filename and foreign key to the directory row.
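That later, normalized layout can be sketched like this (table and column names are hypothetical, not Syncplicity's actual schema). The win is that renaming a directory updates one row, and a full path is rebuilt on demand with a recursive CTE:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE directories (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES directories(id),  -- NULL for the root
    name TEXT NOT NULL
);
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    directory_id INTEGER NOT NULL REFERENCES directories(id),
    name TEXT NOT NULL
);
""")
db.executemany("INSERT INTO directories VALUES (?, ?, ?)",
               [(1, None, "root"), (2, 1, "photos")])
db.execute("INSERT INTO files VALUES (1, 2, 'cat.jpg')")

# Walk parent_id links from the file's directory up to the root,
# concatenating names into a path as we go.
row = db.execute("""
WITH RECURSIVE chain(id, parent_id, path) AS (
    SELECT d.id, d.parent_id, d.name
    FROM directories d
    JOIN files f ON f.directory_id = d.id AND f.id = 1
    UNION ALL
    SELECT d.id, d.parent_id, d.name || '/' || chain.path
    FROM directories d JOIN chain ON d.id = chain.parent_id
)
SELECT path || '/' || (SELECT name FROM files WHERE id = 1)
FROM chain WHERE parent_id IS NULL
""").fetchone()
print(row[0])  # root/photos/cat.jpg
```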

Bittorrent Sync (I forget its new name) also used to use a SQLite database for local metadata.

And, finally: It's not that hard to write a tool to recursively scan a folder and grab all the metadata of files. You can scan 1000s of files in a matter of seconds.
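A rough sketch of such a scanner in Python (my own naming, not any particular product's code):

```python
import os
import sqlite3

def index_folder(root, db_path="files.db"):
    """Recursively scan `root` and record per-file metadata in SQLite."""
    db = sqlite3.connect(db_path)
    db.execute("""
        CREATE TABLE IF NOT EXISTS files (
            path TEXT PRIMARY KEY,
            size INTEGER,
            mtime REAL
        )
    """)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # broken symlink, permission error, etc.
            db.execute(
                "INSERT OR REPLACE INTO files (path, size, mtime) VALUES (?, ?, ?)",
                (full, st.st_size, st.st_mtime),
            )
    db.commit()
    db.close()
```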


So why use something else? Just because it's not that hard to write it?


Datasette and the various companion tools have all been a pleasure to work with. I've recently been using it to build my personal info archive as well as building a data analysis and visualization dashboard with Datasette and Vega lite. I've been impressed with the architecture and philosophy, as well as the code quality and ease of use in simonw's projects.

Just want to say thanks for sharing these valuable projects and keep up the great work!

I hope to be able to contribute something back to Datasette in the future. (code is at https://gitlab.wikimedia.org/releng/ddd/ but probably not anything that could be merged back into the core, it's just custom extension code on top of datasette)


https://gitlab.wikimedia.org/repos/releng/ddd is amazing! This is exactly the kind of thing I've been hoping to see teams using Datasette for.

I love that this is a public repo - I'm definitely going to be pointing people to it as an example of the kind of thing you can build.

Thanks for sharing that!


You can see it in action here: https://data.releng.team/dev/-/ddd/dashboard/project-metrics...

A lot of inspiration was taken from https://github.com/rclement/datasette-dashboards. I started out by trying to extend that project. I ended up abandoning that idea and started fresh once I realized how much my requirements diverged from the capabilities and direction of rclement's plugin.

Thanks for making such a remarkably extensible architecture and especially for the great documentation. It's been a real pleasure working with datasette and I look forward to doing some more work on datasette plugins in the future.


> Pulling the metadata—file names, sizes, file types, file owners, creation dates—into a SQLite database felt like a great way to start understanding the size and scope of what had been collected so far.

I would:

- sync the GDrive locally

- set up the "Everything" search service from voidtools

Voila. A super fast programmable index of the files.

If you want to dump it to SQLite, go ahead.

P.S. Applicable to .NET.

Note: file owners wouldn't be known.


If you have a lot of files in Google Drive, this can take a significant amount of time, or you may not be able to sync it locally at all. I push incremental backups & disk images without a retention period to Google Drive (it's unlimited storage), so there's quite a bit of data shoved in there.

It's pretty easy to get rate limited so it could take days or weeks to build an index (so having a tool that talks to the API is generally more reliable)


In my case the Google Drive folder that was shared with me has about half a TB in it, so going via the API seemed like a better option!


Agreed :p


> The README is using a trick I'm increasingly leaning on: parts of that document - the --help output and the example database schema - are automatically generated using Cog

How does this interact with IDEs? Has anyone written a VSCode extension yet? I'd probably be reluctant to use something like this without IDE support. (Though I may well be in the minority, and I appreciate that for open source tools it's not necessarily reasonable to expect people to dedicate so much of their time to DX.)


To regenerate the markdown file I run this command in the console:

    cog -r README.md
Presumably most IDEs have a mechanism that can run that command automatically, either when a file changes or just intermittently over time.

I use VS Code's markdown preview panel, which updates automatically when I manually run the "cog -r" command.
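For VS Code specifically, one option (my own suggestion, not something the parent describes) is a shell task in .vscode/tasks.json that you can run from the command palette or bind to a key:

```json
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "cog: regenerate README",
      "type": "shell",
      "command": "cog -r README.md",
      "problemMatcher": []
    }
  ]
}
```

Combined with the markdown preview panel, that gets you a one-keystroke regenerate-and-preview loop.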


Nice! I didn't know about cog.

I've developed a simple "codeblocks" tool to keep README.md up to date. In your example, here is how to insert the latest `--help` output:

    codeblocks usage-auth README.md -- google-drive-to-sqlite auth --help
Where "usage-auth" is the block language in README.md:

    Full `--help`:
    ```usage-auth
    Usage: ...
    ```
Checking is also possible:

    codeblocks --check usage-auth README.md -- google-drive-to-sqlite auth --help
It's not as generic as cog, but it does not require writing codegen scripts inside the Markdown file.

https://github.com/shamrin/codeblocks


Notable alternative for Emacs users: GitHub supports rendering .org files, so you can have a README.org using Babel (just remember to execute all code blocks before pushing).


Hm, I'm guessing... run the cog command in the VS Code terminal?


An Alfred workflow that queries this index would be fantastic. Alternatively, feed it into Spotlight.


There's a Google Drive workflow for Alfred which I've used previously.

https://www.alfredforum.com/topic/17318-google-drive-%E2%80%...


The desktop Google Drive client actually stores the file and sync info in a SQLite file.


How does one post both a link and a description in an HN submission? Was this added by a mod?


I posted the link and a first comment with that description - I guess a moderator must have merged them together, unless that's a new undocumented feature of HN?


[deleted]


TELL HN: More things like this please -- though I have no real use for this tool directly, its paradigms are super helpful in thinking through other projects I'm working on.

Thanks simonw - neat stuff


Agreed. More uses of SQLite too!

I have been working with it to gather ecom sales data across platforms for a couple of months. Such a powerful tool! I want to do more with it!



