VideoPrism: A foundational visual encoder for video understanding

adzm · 2024-05-09T14:18:39 1715264319

YouTube is going to be a gold mine for training data.

hrdwdmrbl · 2024-05-09T14:24:13 1715264653

is a gold mine*

present tense :)

mtillman · 2024-05-10T00:26:04 1715300764

Sold a company in 2014 that categorized 65M videos a month using a number of techniques such as object detection (soccer ball), scene classification (soccer pitch), and face recognition (Ronaldo). Used YouTube as a training ground. Marketed as ad words for video. Way ahead of its time. Did $60M in revenue the year after we sold it. Face. Palm.

bingleboy · 2024-05-10T18:07:02 1715364422

I built something that used surveillance cams in warehouses in 2018 and almost got 25m for a subset of what our company did. I took those same ideas to some long-term connects and we turned it into something worth much more that to us.

Be glad about where you are knowing that you won't have to do as much as others to pay down your sins in this industry.

hwbunny · 2024-05-09T19:54:40 1715284480

And since everyone turns their own profession to pennies by uploading the know how of their craft, it will properly wreck entire industries once the AI+Robotics marriage takes off.

devwastaken · 2024-05-09T20:07:49 1715285269

Video describing practices does not translate to AI performing those practices. Creating a community of knowledge progresses that knowledge. Sitting on it only benefits few and removes the market. Without people sharing their work on YouTube their own market would be far less.

Richyrichy3234 · 2024-05-10T08:49:13 1715330953

And?

This is only a problem in a dystopian shit future were we are too stupid to get around this.

Work doesn't make people free.

and yes i do absolutly believe that the pressure will be good for us as a society as we will have to rethink.

hwbunny · 2024-05-10T09:40:23 1715334023

Just watch. People without purpose are dead.

Richyrichy3234 · 2024-05-10T12:02:09 1715342529

People are forced to make work their purpose because we still need to do it as our primary activity.

Plenty of people spend their time different.

Plenty of people will learn how to do something else if they actually don't have to work.

vessenes · 2024-05-09T14:30:47 1715265047

So frustrating.

"We hope VideoPrism paves the way for future breakthroughs at the intersection of AI and video analysis, helping to realize the potential of ViFMs across domains such as scientific discovery, education, and healthcare."

... for those of us internal to Google Research, and possibly large corporate clients when we choose.

Custom internal dataset, no model weights, no API access, just "FYI, this works pretty well when you have our data."

I do appreciate publishing of successful architectures, genuinely, and VP looks like it has some nice ideas, and it is very useful to know that training such a thing is probably not a dead end, so for all that, thanks.

At the end of the day, with large competitors committed to open models/weights, I think momentum is on the side of needing to push out open weights at least, but my guess is GOOG will be the last to transition to this of the big tech cos. I could understand it as a business decision if they were quickly opening this stuff up on their cloud, and competing that way, but they still seem to be in this world where the DeepMind crew continues to push out novel and interesting research projects, and the cloud crew struggles to deliver a competitive LLM at the same time.

I wonder where they'll be in a year. I'd like to hope up to competitive standards so that we've got an additional player, but progress seems slow to me right now.

bingbingbing777 · 2024-05-09T14:41:11 1715265671

Why are you so frustrated that a company decided to not completely release something they have built internally?

yterdy · 2024-05-09T20:12:09 1715285529

Because, half a century ago, it would have been built by a government research agency or a designated monopoly that was obliged to share it with the public, instead of the quasi-monopoly - that can keep secret whatever it needs to wreck your sh*t - like today.

We know a better way to do this ("this" being "the development of foundational technology for the next century of human civilization"), so of course it's frustrating to see the way it's actually being done.

392 · 2024-05-10T03:44:47 1715312687

What's the better way?

GaggiX · 2024-05-09T15:07:21 1715267241

Because it's presented as research, but the results are not reproducible without at least information about the dataset.

fngjdflmdflg · 2024-05-09T16:30:51 1715272251

It is reproducible with the dataset though. (Or more accurately may be reproducible). It is important to distinguish between "reproducibility" meaning "the extent to which consistent results are obtained when an experiment is repeated" and "the ability to be reproduced or copied." Only the former is necessary for an experiment to be considered reproducible. It's just that it may be difficult to actually run the experiment although certainly not impossible, at least if we consider using a similar alternative dataset as its really the coed being tested here. But in any event I think it qualifies as publishing research all the same. Also note that research is still research even if it not published.

TeMPOraL · 2024-05-09T17:14:40 1715274880

FWIW, reproducibility is like backups: if you haven't tested restoring from a backup, you don't actually have one. Similarly, research on proprietary and non-disclosed data, using proprietary and non-disclosed techniques, that can only be reproduced if you manage to cut a deal with the company behind it (if it's even possible), should be called what it is: marketing.

fngjdflmdflg · 2024-05-09T21:26:57 1715290017

An experiment being reproducible just means that if you repeat the experiment you will get the same outcome. We don't know if this experiment is reproducible or not because nobody has tried to reproduce it. I think if this paper gets citations, that will show that other people have read the paper[0] and gained something from it. So we can just wait and see if this was a useful for other people.

[0] https://arxiv.org/abs/2402.13217 this, not the blog

mrbungie · 2024-05-09T23:15:29 1715296529

You are defining "reproducible" according to your own terms and I guess that's OK as it appears there is not a global definition right now (check [1][2][3]).

But consider that others really expect data and full transparency to call research "reproducible", so is not as simple as you make it sound. Plus, grandparent is not asking about the dataset perse, but rather the dataset summary statistics or at least metadata/information about the dataset, a bare minimum in my view.

Being this opaque only messes up research efforts from other institutions.

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3383002/?ref=da...

[2] https://researchguides.uic.edu/reproducibility

[3] https://open-science-training-handbook.github.io/Open-Scienc...

fngjdflmdflg · 2024-05-10T02:32:34 1715308354

>others really expect data and full transparency to call research "reproducible"

I truly don't understand why that is. Creating the dataset is part of the experiment. It is reproducible because if you follow all the steps in the experiment you will get the same outcome (or, pedantically, a similar outcome, which is true of any experiment unless literally every aspect of the experiment is the same). If you create a dataset of 36M video clips with high-quality manually labelled captions, 170M (video, ASR transcript) pairs from 44.6M YouTube videos and 71.5M (clip, machine-generated caption) pairs from 36.7M YouTube videos, you will get a similar outcome. If you don't, the experiment is not reproducible.

In fact, fully replicating the dataset is more conducive to proving the effectiveness of a given method to produce a specific outcome, as noted in your first link:

>The standard of reproducibility calls for the data and the computer code used to analyze the data be made available to others. This standard falls short of full replication because the same data are analyzed again, rather than analyzing independently collected data.

Ultimately, the question trying to be answered by the authors is not "does x code, when applied to y dataset, create z outcome?" but rather "how is z outcome created?" To prove that following the sequence of steps outlined in the article is sufficient to produce the claimed outcome it is required to recreate the dataset in any event. Releasing the dataset would help in that regard only inasmuch as it would help other people create their own datasets by way of example. If you think about non-coded experiments, it is clear to see that this is the case. Nobody would ask for someone else's reactants and products that they got at the end of an experiment. They would just ask for the procedure to generate the product. And in fact you will find that often times these procedures are not as exact as you might expect. For example it may say something like "this reactant was heated to x degrees" without mention of how fast to heat it up or how exactly that was done. And it isn't necessary because the goal is not "reproducible" as in a reproducible build artifact where every bit is the same but "reproducible" as in "the experiment once replicated will have the same outcome."

I will try to come back to this paper in a year and see how many citations it has and to what extent it was useful to others. Ie. will it be written something like "we follow the architecture outlined in [1]" or will it only be mentioned as previous work? If it is the former it is hard to say that it isn't considered publishing research if others are using it for their own research.

I don't really care what the definition of "reproducible" is. I have always understood it to mean in the context of experiments as “the extent to which consistent results are obtained when produced repeatedly.” When people talk about an experiment not being reproducible or of the "reproducibility crisis" they are using my definition.

TeMPOraL · 2024-05-10T11:28:16 1715340496

> Ultimately, the question trying to be answered by the authors is not "does x code, when applied to y dataset, create z outcome?" but rather "how is z outcome created?"

Maybe, but the question asked by the people in the field, and even general public, is different: we're asking, "is this even real, or are they exaggerating or plain bullshitting us?". It's a reasonable question to ask when the results seem groundbreaking, and there's lots of money or reputation at stake.

Consider if they came out and said, we've invented machine converting electricity to mechanical motion with near-zero energy loss at room temperature. The design is complex, and uses exotic materials no one outside few megacorps could reasonably afford to get, and we just plan to use it to sell mechanical work as a service, so you're never gonna see how the thing is built (unless you're really important and sign an NDA) - but trust us, it works. Wouldn't some questions about reproducibility be warranted in such a case?

nolok · 2024-05-09T15:49:05 1715269745

Well Google did that for years and where did that get them? To companies using their research to build something but not share to the same extent their research so they could keep a leg up.

So yeah, they learned their lesson. You want to complain, complain to those who did not play the game fair and square.

GaggiX · 2024-05-09T16:05:58 1715270758

I'm not complaining, Google can do whatever they want, it's just not proper research, even though they want it to feel like it is.

vessenes · 2024-05-09T16:15:36 1715271336

GaggiX has it mostly right below. But, I'm frustrated because I'd like to try this out. And, by try it out, any of these would be fine:

1. Download the datasets and train a small version to get a feel for it.

2. Download the models and deploy them to use it and get a feel for it.

3. Talk to it via a paid API

Why do I want to do that? I'd like to know how capable this architecture is, and get a feel for how capable it could be with different data / scale up / fine tuning, etc.