[flagged] Amazon's "Just Walk Out" checkout turns out to be 1000 workers watching you shop (boingboing.net)
66 points by clockworksoul 5 months ago | 38 comments




With the hype this got at release, the story should be pinned to #1 until further notice


1,000 HN users watching people repost.


This is a bad submission because Boing Boing heavily editorializes and is at least two levels removed from the actual reporting at this point: the editorialized Boing Boing excerpts, the editorialized Gizmodo article, The Information article, whatever the sources leaked to The Information as a friendly outlet, and then the truth. And there are people on Twitter disputing whether the human workers were watching you as you shop (which would bespeak a true level of failure) or just reviewing footage afterwards (which would surprise no one).


Presumably Amazon ended the program because it was not profitable and had no prospect of becoming profitable soon. If it was already fully automated and only employed humans for QC and/or AI training purposes, then it's hard to see why they would have killed it.


I suppose there is meant to be a tipping point where the humans can step back once the AI has learnt enough from watching them. Perhaps they were struggling to reach that point and the failure rate remained stubbornly high.

Outsourcing to India/Mechanical Turk is always going to be cheaper than creating a sophisticated AI model. I could see them dropping it if the "learning" had started to stall.

This whole thing seemed like a branding exercise anyway. Like how Apple stores are in every major city despite relatively few people buying their products in-store.


There are many reasons it might not be profitable even if the ML part were working well.


Also reminds me of the revelation that Expensify was mostly using manual transcription vs. "AI" to categorize and interpret receipts/invoices: Expensify sent images with personal data to Mechanical Turkers https://news.ycombinator.com/item?id=15796189


All these AI companies have a bunch of people who just do labeling to help train the model. It's like saying Google has millions of people powering Waymo when it's just reCAPTCHA having users help with labeling image captures: "select all the images with street signs or bicycles".


Securities Fraud?

(With apologies to Matt Levine)


These types of systems take a week to train on the obvious clean use case, and a decade to perform well with people intentionally or unintentionally messing with the system.

Just don’t act surprised when we “find out” the dash carts are also powered by humans


The title here suggests they're watching you shop and entering items into a list as you go (which would be ridiculous). It seems to actually just be low-paid workers spot checking to ensure the "algorithm" got it right?


I can't believe they're throwing in the towel right now. Sure, whatever pipeline they were using before was maybe a dead end but with technology like GPT/Gemini vision they couldn't make it work? Did they even try?


This is how a ton of AI is operating right now. Both to filter training data and also to filter the results before they go back to the users, there is very often human oversight. And by human oversight I mean the same sort of extremely cheap and exploitable overseas labor that Amazon was using for this.

Ultimately it might be as simple as this: even cheap-ass labor plus the infrastructure plus the AI itself still costs something, and that something is more than just paying some friggin' cashiers at the bloody store to do the job.


I don't get it though - what is wrong with trying different things and seeing if they work or not? If they don't, just pick what you had previously. What's the huge deal?


You forgot the /s. And if you are serious, I have a bridge you might be interested in buying.


Half serious. It's obviously too expensive to do today, but there are huge advancements being made right now. I can see a startup building the tech to make "Just Walk Out" stores work and getting bought out in the next 5 years.


So the bridge isn't the biggest one you'll find, but it gets the job done. Should suit your needs just fine


Got any real arguments for why it's such an intractable problem or do you just like to be an ass for fun?


Partially like being an ass for fun.

My first question is what does a GPT have to do with solving this problem? Honest question, because I don't see how a language model helps solve this, so if you do have a clear explanation it will be appreciated. I get that they parse image data, but have they demonstrated better accuracy for this use case (grocery items/food) than what was available when Amazon started their campaign?

There's also the fact that, yes, more data gives better results... to a point. At this stage, I'm inclined to think we've hit that point when it comes to deep learning models' ability at object recognition, and we're in a plateau until a radically new approach (i.e. not deep learning) allows for another leap.

My last point is more sociologically rooted. Namely, it's pretty clear that since sama said he'd need $7 trillion, AI peddlers have jumped the shark and we're seeing a slow-motion deflation of the major claims, even concerning LLMs (don't get me wrong, it's very impressive tech, but nowhere near the heights and capabilities claimed by the hucksters). For that reason I'm very inclined to lean towards skepticism when people claim incremental improvements have solved a problem that's proven to be very, very difficult for a computer to handle.

I think we're nearing a general plateau in deep learning and need the next foundational approach, or we could find ourselves in another deep AI winter.

Any reason you think my skepticism is unfounded and why 5 years?


My first question is what does a GPT have to do with solving this problem?

Have you seen how effective transformer-based networks are at image recognition? They are literally better at reading bad cursive handwriting than I am, at this point. So yes, they can be pressed into service recognizing products being taken off of shelves and placed into carts or bags.

"Just walk out" is a difficult problem but very much worth solving. We now have tools to attack it that nobody was even dreaming of when Amazon designed their current approach. Tools with near science-fiction levels of capability.


CV tools have been better than most people at reading bad cursive since at least 2016.

Is there a specific benchmark comparison between a GPT and any OCR app/algo? Any cases where you've seen a free OCR app fail spectacularly where a GPT succeeded without question?

Language models are just that - language models. They are good at predicting what word should come next. Because it's language, we impart capabilities way above and beyond their reality. Sure, they can incorporate CV data (among other inputs) that feeds them the resulting text. Maybe when we develop something with a functional world model, we might be able to tackle the real physical world. Until then, problems like this, self-driving, folding clothes, etc. will keep failing at the edge cases, rendering them more trouble than they are worth.

And "just walk out" is a fun, interesting problem only in that it might give rise to interesting solutions and approaches for more pressing or relevant problems


Tell me that the 'language model' behind this won't be capable of solving the just-walk-out problem before long: https://shot.3e.org/ss-20240310_145736.png

Go ahead, tell me. Life is short of opportunities for good sensible chuckles.


Go ahead, tell me this image and scenario wasn't already available in its training data (it was [1]). Tell me that if it got it wrong in the first pass (which it most likely did), the developers didn't explicitly tell it what the scenario was.

I'm not saying the feats of LLMs are not impressive; they certainly are. Just don't tell me they have developed a "world model" and display understanding, because they have not. They will always suffer the same issues that have plagued self-driving cars and autonomous robotics: they can only process what's in their training data, and therefore need to be trained on every scenario that will ever exist in order to function outside of well-curated, well-defined closed systems.

I would love a good chuckle too; unfortunately, the total lack of critical thinking and understanding when it comes to these stochastic correlative black boxes leaves me greatly disappointed.

[1] https://www.reddit.com/r/Wellthatsucks/comments/j67atm/1_sec...


A single good result does not prove accuracy.


You could say the same for the army of 1000 mechanical Turks they were using before.


Absolutely. And what does that tell you?

It tells me this is a solution without a problem.


I remember seeing a couple of blogs that said this last year.



In a tech-crazed era where the boundaries of reality blur with the audacity of innovation, there unfolds a tale so bewildering, it vaults past the edges of conventional disruption into the realm of the utterly surreal. This isn't just another yarn spun from the looms of Silicon Valley's grand dream factories. No, this saga pirouettes on the razor's edge of imagination and insanity, where the most cutting-edge "AI" in retail isn't AI at all—it's the keen eyes and quick wits of Pakistanis, stationed a world away, orchestrating the illusion of machine precision in real-time.

Let's zoom in on Lahore, pulsating with life and a fervor for innovation that makes Silicon Valley look like a retirement home for tired tech. Here, in a place that hums with the energy of endless possibility (and perhaps too much caffeine), sits the command center of the world's most "advanced" computer vision system, SeeAll. Only, SeeAll's vision is purely human, powered by a legion of sharp-eyed annotators who scrutinize live feeds from grocery store checkouts, identifying every item with the accuracy and flair only a human can muster.

The masterminds behind this grand ruse? VisioTech, a company shrouded in the mystique of technological advancement, promising a checkout experience free from human error, barcodes, and the tedium of waiting. Their secret sauce wasn't algorithmic; it was organic, brainpower fueled by chai and an undying spirit of camaraderie.

This narrative, though dripping with the trappings of high-tech, is really an ode to the human element in the digital age. Customers, awash in the glow of seamless transactions, never questioned the "how" of their flawless shopping experience, unaware that the real magic lay in the hands of the PaaS team. This crew, capable of identifying the most obscure products with a glance, operated in a symphony of clicks and keystrokes, a ballet of productivity that no machine could replicate.

The facade crumbled when a technical hiccup at an Ohio grocery store exposed the gears behind the magic. What followed was a tumultuous unraveling that laid bare VisioTech's elaborate scheme. Yet, the fallout was anything but predictable. The world, instead of recoiling in outrage, leaned in, captivated by the audacity and sheer inventiveness of it all.


Is this LLM generated?


Honestly I think it's cool that Amazon took that approach. Do things that don't scale to better understand the value/market fit, then work on the hard problem of building the complicated tech to enable it at scale. So often companies jump straight to the hard tech issues without even trying to learn from the real world.


I wonder if they used (Amazon) MTurk for this mechanical turk.


It is a good strategy. Just don't get caught lying about it.


What was the lie? Is there a source for the statement that was a lie?


"Automated checkout systems powered by computer vision can significantly reduce checkout times. For instance, Amazon Go stores use computer vision to automatically track items added to a shopping cart and charge customers accordingly as they leave the store" (1)

The only way this can be interpreted as "people manually look at the camera and tabulate the price" is disingenuously. And it's one example that I suspect will be scrubbed from the web, but from the moment this was launched, unless you were living under a rock, the message was "AI powers the store." So please stop with the hairsplitting and revisionist history.

(1) https://aws.amazon.com/blogs/industries/enhancing-the-retail...


As I understand it, they do use computer vision, but the 1,000 people are either reviewing its output or labelling data for it. That seems like a normal way to start with a system like this: you need to review it for a while to be certain of its quality and to understand the edge cases, of which I'm sure there were many.

E.g. the way I imagine it: the CV system attempts to identify the item and outputs a confidence that it is that item. If the confidence is too low, it triggers a signal for human feedback, and the human either confirms or rejects.
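Roughly, something like this sketch. To be clear, all the names, the threshold, and the stubbed-out model are my own guesses about how such a pipeline could work, not anything Amazon has described:

```python
# Hypothetical confidence-gated review flow. Everything here is an assumption
# for illustration, not Amazon's actual system.

CONFIDENCE_THRESHOLD = 0.90  # below this, defer to a human reviewer

def identify_item(frame):
    """Stand-in for the CV model: returns (predicted_item, confidence).
    A real system would run a trained classifier on the camera frame."""
    return ("oat milk 1L", 0.62)

def resolve_item(frame, review_queue):
    """Charge automatically when confident; otherwise queue for a human."""
    item, confidence = identify_item(frame)
    if confidence >= CONFIDENCE_THRESHOLD:
        return item  # high confidence: bill the customer directly
    # Low confidence: a human confirms or corrects the guess, and the
    # verdict doubles as a labeled training example for the model.
    review_queue.append((frame, item, confidence))
    return None  # billing waits for the human verdict
```

If most items clear the threshold, the humans only see the hard cases, and the queue shrinks as the model improves. The question is whether it ever shrinks enough.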


Then advertise it as Human-in-the-Loop. At no point did Amazon do this. In fact they did the opposite and led people to believe the tech was much further along than reality. Why the lie?



