Neural Databases

wokwokwok · on Oct 17, 2020

I came hoping to find something like neural radiance fields [1] but for generic datasets (ie. latent representations of data that allow you to store the 'data' without storing it explicitly row by row).

...but, this is:

> What if, instead, data and queries can be represented as short natural language sentences, and queries can be answered from these sentences?

Useful for some things, but not useful for most things.

[1] - eg. https://www.matthewtancik.com/nerf

suifbwish · on Oct 17, 2020

Tensorflow 1.15 for python3.. good luck finding that anymore for a Linux repo.

Tensorflow needs to stop breaking their entire codebase every 10 or so minor versions and also make their archived versions for all OS readily available. Do a pip3 install tensorflow==1.15 just for an illustration of what I am talking about.

There are hundreds if not thousands of extensive TF projects on GitHub that are now almost completely useless because the libraries they require are not readily available. Sure, you can use them as a template to translate into TF 2, but as the keeper of an important open source library, the TensorFlow team has been far less than diligent at making sure projects containing their code do not break as TensorFlow pushes out updates.

unishark · on Oct 17, 2020

Have you tried installing it through anaconda? I was using tensorflow 1 (1.14 maybe?) just a few months ago (once I realized tensorflow 2 was running twice as slow for some reason). I think there are some really big projects that still can't use tensorflow 2.

But I certainly agree with the frustration with google breaking everything. If I wanted to use Chainer (or for that matter Keras) I would just use that. I don't need tensorflow to imitate it for me and break the alternative approach I was using.

MichaelBurge · on Oct 17, 2020

Try the "tensorflow/tensorflow:1.15.0-gpu-py3" Docker container. Most Linux distributions support Docker, and Nvidia's been working on supporting GPU access in them.

suifbwish · on Oct 21, 2020

Oh very nice! Thank you for this. Docker has never been my go-to, as I use virtualenv for most of my projects, but it’s things like this that make me start using new things.

ssivark · on Oct 17, 2020

Since there are sibling comments wondering what this means, here’s a perspective:

A database (simplest case is an array or a dict) isn’t fundamentally different from a function in its interface. To both of them you submit a query/argument and they return some value. A “function” might “compute” the value after the query is submitted, while the database might “store” a pre-computed value. Especially in physical contexts where there is a natural continuity in the thing being represented (think object density/radiance, as opposed to student names), it makes sense to also have the option to interpolate among records in a database.

Neural networks (being generic learnable function approximators) are a nice framework to straddle this boundary for generic cases.
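
To make the database-versus-function point concrete, here is a minimal sketch in plain Python. The `records` table and both function names are made up for illustration, and linear interpolation stands in for whatever a learned function would do between records; it is not how NeRF itself works.

```python
from bisect import bisect_left

# Hypothetical records: sample position -> measured density.
records = {0.0: 1.0, 1.0: 3.0, 2.0: 2.0}

def lookup(x):
    """Database-style access: returns a stored value, raises KeyError on unseen keys."""
    return records[x]

def interpolate(x):
    """Function-style access: linearly interpolates between neighbouring records."""
    keys = sorted(records)
    if x <= keys[0]:
        return records[keys[0]]
    if x >= keys[-1]:
        return records[keys[-1]]
    i = bisect_left(keys, x)          # index of the first key >= x
    lo, hi = keys[i - 1], keys[i]
    t = (x - lo) / (hi - lo)          # position of x between lo and hi, in (0, 1]
    return (1 - t) * records[lo] + t * records[hi]

print(lookup(1.0))       # 3.0, an exact stored record
print(interpolate(0.5))  # 2.0, halfway between the records at 0.0 and 1.0
```

Both calls have the same interface (query in, value out); only the second can answer for a key that was never stored.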

In the specific example of NeRF, we use a neural network rather than discretizing space and assigning a value per pixel/voxel. A consequent advantage is that we can have a data description that is adaptive and non-uniform in its information density, unlike a voxel grid.

There’s more to say, but I hope this gives a feel :-)

sixdimensional · on Oct 17, 2020

A shortcut to storing (and possibly also querying) data is to store (or "query" within) the function of that data? Seems to be in-line with what you wrote, and in essence I've always looked at neural networks as a kind of compression into complex functions of encoded data.

my999 · on Oct 17, 2020

> "I came hoping to find something like neural radiance fields [1] but for generic datasets (ie. latent representations of data that allow you to store the 'data' without storing it explicitly row by row)."

In that case the system doesn't have transparency and control. Imagine you are given the fact "Alice works at Google" and one month later you want to remove it and add "Alice works at Facebook". In a fully latent representation of the data, you cannot guarantee that the old fact is really gone.

For similar reasons, you cannot guarantee that implicit biases are kept out of the data. Pre-trained generative models tend to hallucinate facts. For example, say you have two facts: "Alice works at hospital" and "Bob works at hospital". If someone asks what Alice or Bob does, you don't want your model to hallucinate Nurse/Doctor.
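
As a rough illustration of the control being described, here is a sketch of the explicit alternative: facts kept as individual sentences, where an update is an exact, checkable operation. The `facts` set and `update_fact` helper are hypothetical and are not taken from the paper.

```python
# Hypothetical explicit store: one natural-language sentence per fact.
facts = {"Alice works at Google", "Bob works at Google"}

def update_fact(store, old, new):
    """Remove one stored sentence and add its replacement."""
    store.discard(old)  # no error if the old fact was never stored
    store.add(new)

update_fact(facts, "Alice works at Google", "Alice works at Facebook")

# With explicit storage the removal can be verified directly.
print("Alice works at Google" in facts)    # False
print("Alice works at Facebook" in facts)  # True
print("Bob works at Google" in facts)      # True, untouched by the update
```

A model whose weights encode all the facts jointly offers no equivalent of that membership check.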

jarym · on Oct 17, 2020

Thanks for the nerf link - nice to start a day with something this impressive!

pama · on Oct 17, 2020

Here is the previous discussion of nerf: https://news.ycombinator.com/item?id=22637721

alpineidyll3 · on Oct 17, 2020

Exactly my thinking. I think this is the year of compressing tabular data with differentiable programming tho, instead of the fun flashy incremental woo projects. Certainly it's what I'm up to...

foota · on Oct 17, 2020

Wow, that seems way beyond what I would have expected the state of the art to be. Iiuc they're generating arbitrary perspectives of geometry from one or two photos, right?

wokwokwok · on Oct 17, 2020

Have a look at the google drive files if you click on the data link; it's a lot more than 2 photos.

eg. That dinosaur skeleton is derived from 60 photos. The drumkit comes from ~100.

...so it's not magic, it's very close to what you get from standard photogrammetry. The big part of this is that it isn't representing the scene as a block of voxels like some other approaches.

> The biggest practical tradeoffs between these methods are time versus space.

> LLFF produces a large 3D voxel grid for every input image, resulting in enormous storage requirements (over 15GB for one “Realistic Synthetic” scene).

> Our method requires only 5 MB for the network weights (a relative compression of 3000× compared to LLFF), which is even less memory than the input images alone for a single scene from any of our datasets.

Anyway, so... if you could do the same sort of thing, with similar accuracy, for non-image data as a 'neural representation database', that'd be pretty neat.
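
The 3000× figure in the quoted passage is easy to sanity-check. This assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes) and treats "over 15GB" as exactly 15 GB, so it is a lower bound on the ratio.

```python
llff_bytes = 15e9  # "over 15GB" for one scene, per the quoted passage
nerf_bytes = 5e6   # "5 MB" of network weights, per the quoted passage

ratio = llff_bytes / nerf_bytes
print(ratio)  # 3000.0
```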

l33tman · on Oct 17, 2020

Also, I do think the 3D voxel reconstruction approach and the NeRF approach serve different goals. I didn't read the original NeRF paper thoroughly, but AFAIK the network learns to interpolate between the photos in a beautiful, smooth way, whereas the voxel representation would allow a lot of other reconstructions.

handol · on Oct 17, 2020

If the photos span the camera's full position-orientation vector space, I don't see why you can't put the camera anywhere in the scene.

BeatLeJuce · on Oct 17, 2020

Not one or two, more like one or two hundred. But otherwise: yeah.

dougabug · on Oct 17, 2020

One of the authors said in a talk that it worked with as few as 15 images.

BeatLeJuce · on Oct 17, 2020

So that's an order of magnitude more than what OP suggested and one less than what I suggested. But I still think that 100 is a more realistic estimate to get stellar results.

goodmachine · on Oct 17, 2020

Your generic dataset idea is either crazy-brilliant, or I'm stupid: conceivably both.

Either way, I'm struggling to understand how it might work. Care to say more?

woadwarrior01 · on Oct 17, 2020

The OP probably has something more in mind, but I'd imagine overfitting a small NN on a large dataset to a point of convergence and then shipping the small NN instead of the large dataset.
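
A toy version of that idea, as a sketch only: fit a very small network to a made-up table and keep the weights instead of the rows. Everything here is an assumption for illustration (the synthetic `rows`, the single tanh hidden layer, the step count and learning rate), and it trains only briefly rather than to the point of convergence described above.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical "dataset": 100 rows (x, y) where y varies smoothly with x.
rows = [(i / 99, math.sin(2 * math.pi * i / 99)) for i in range(100)]

H = 6  # hidden units
w1 = [random.uniform(-1, 1) for _ in range(H)]  # input -> hidden weights
b1 = [random.uniform(-1, 1) for _ in range(H)]  # hidden biases
w2 = [random.uniform(-1, 1) for _ in range(H)]  # hidden -> output weights
b2 = [0.0]                                      # output bias (a list so train() can mutate it)
n_params = 3 * H + 1                            # 19 numbers to ship instead of 100 rows

def predict(x):
    """Answer a 'query' x by evaluating the network instead of reading a row."""
    return sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(H)) + b2[0]

def mse():
    """Mean squared reconstruction error over the stored rows."""
    return sum((predict(x) - y) ** 2 for x, y in rows) / len(rows)

def train(steps=1000, lr=0.02):
    """Full-batch gradient descent on the mean squared error."""
    n = len(rows)
    for _ in range(steps):
        gw1, gb1, gw2, gb2 = [0.0] * H, [0.0] * H, [0.0] * H, 0.0
        for x, y in rows:
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(H)]
            err = sum(w2[j] * h[j] for j in range(H)) + b2[0] - y
            gb2 += 2 * err / n
            for j in range(H):
                gw2[j] += 2 * err * h[j] / n
                dpre = 2 * err * w2[j] * (1 - h[j] ** 2) / n  # gradient at the tanh input
                gw1[j] += dpre * x
                gb1[j] += dpre
        for j in range(H):
            w1[j] -= lr * gw1[j]
            b1[j] -= lr * gb1[j]
            w2[j] -= lr * gw2[j]
        b2[0] -= lr * gb2

loss_before = mse()
train()
loss_after = mse()
print(f"{n_params} parameters, MSE {loss_before:.4f} -> {loss_after:.4f}")
```

Whatever error remains after training is what you lose by shipping the network instead of the rows.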

throwaway9980 · on Oct 17, 2020

Sounds like lossy (acceptable) compression.

sixdimensional · on Oct 17, 2020

I've often wondered if one could store representations of data as regular expressions or something similar. Sounds like what you are looking for is along similar lines - data compression by effectively "compiling" the data into an expression or compact representation?

msamwald · on Oct 17, 2020

Meta-comment:

How is it possible that the original submission has been on the front page for 8+ hours, and all discussion is focused on this completely unrelated link?

Have people stopped reading original submission links in favor of comments so much that the discussion is no longer related to the original submission at all?

sixdimensional · on Oct 17, 2020

I have thought that cost-based optimizers in database query planning could definitely overlap with neural network approaches. I know there are some papers in the area, but I haven't seen any implementations.

chaoz_ · on Oct 17, 2020

I also wonder if neural approaches will be embeddable into non-experimental engines as it's hard to inspect their behaviour / worst-case scenarios.

daemonk · on Oct 17, 2020

As much as unstructured data makes me uncomfortable, I think something like this is probably the future of data keeping.

jw0335066 · on Oct 17, 2020

Wow, this is pioneering work!!

g0xA52A2A · on Oct 17, 2020

Surely a link to arXiv rather than a tweet would be better?

https://arxiv.org/abs/2010.06973

nikivi · on Oct 17, 2020

Tweets provide some nice context by the author as commentary.

mpfundstein · on Oct 17, 2020

but twitter should not be supported