Web AI Model Testing: WebGPU, WebGL, and Headless Chrome (chrome.com)
199 points by kaycebasques 11 months ago | 46 comments



Real, but naive, question: does TensorFlow have meaningful share outside Google? I've been in the HuggingFace ecosystem and it's overwhelmingly PyTorch, IIRC around 93% (I can't find the blog post that cited the figure, but I only gave it a couple of minutes of searching).


TF used to be the most popular framework by a large margin, so a lot of things that were started 5+ years ago are still on it. PyTorch is most popular in places that only started more recently or have the ability to switch easily, e.g. new startups, research, LLMs, education, and companies that have the resources to do a migration project.


A fun thing is that even within Google, JAX is now preferred among researchers and is slowly taking over TF's share.


Tbf JAX is super nice. Even easier than PyTorch in many ways and astonishingly fast. XLA is super powerful.


Best alternative for the web imo (perf generally beats ONNX on the web).


Great!

For the Burn project, we have a WebGPU example, and I was looking into how we could add automated tests in the browser. Now it seems possible.

Here is the image classification example if you'd like to check it out:

https://github.com/tracel-ai/burn/tree/main/examples/image-c...


Reminds me of “The Birth & Death of JavaScript”

https://www.destroyallsoftware.com/talks/the-birth-and-death...


Everything will run in a browser, eventually.


Since browsers offer a sort of unified experience (to some extent), that would be quite good. But sadly, I haven't seen wide adoption of PWAs or similar technology. Most companies just create their own app, which in many cases really isn't even needed, since the app is just a wrapped version of their website.


Hopefully this will solve some of the incompatibility with training models on AMD vs NVIDIA. Just use Google Chrome.


A question that comes to mind is: How significant is the performance difference between using CPUs and GPUs for these machine learning models in web applications, and are there specific types of applications where one would significantly outperform the other?


Very significant in the current paradigm.


This can also be done in Rust using the excellent `wasm_bindgen_test`!


AFAIK, there is still a memory limit in Chrome which is set to 4GB per tab.


Hello there, I am one of the authors of the piece. Fun fact: just for the lols we tried running a 1.3B parameter, unoptimized TensorFlow.js model in this system to see if it would work (it could be much more memory efficient with tweaks), and it does. It uses about 6GB of RAM and 14GB of VRAM on a V100 GPU on Colab (15GB VRAM limit), but runs pretty fast once the initial load is complete. Obviously there is plenty of room to make this use much less memory in the future; we just wanted to check that we could run such things as a test for now.
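
For reference, here is roughly what such a run looks like with TensorFlow.js on the WebGPU backend. This is just a minimal sketch rather than our actual harness, and the model URL and input shape are placeholders:

    import * as tf from '@tensorflow/tfjs';
    import '@tensorflow/tfjs-backend-webgpu';

    async function run() {
      await tf.setBackend('webgpu');   // backend registered by the import above
      await tf.ready();

      // Placeholder model URL and input shape -- substitute your own.
      const model = await tf.loadGraphModel('https://example.com/model/model.json');
      const input = tf.zeros([1, 224, 224, 3]);
      const output = model.predict(input) as tf.Tensor;
      await output.data();             // force execution to complete

      console.log(tf.memory());        // { numBytes, numTensors, ... }
      input.dispose();
      output.dispose();
    }

    run();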


At least on desktop you generally know where the line is. On mobile there's a mystery limit you're not allowed to cross, and you're also not allowed to know where the line is until you reach it, at which point you might get a graceful error or your tab might be force-killed, and you're not allowed to know which of those will happen either.


I'm building a list of "second class citizen" mobile web issues for Android and Apple. I wasn't aware of this one! Do you know of anything else like this?


https://github.com/WebAssembly/design/issues/1397

> Currently allocating more than ~300MB of memory is not reliable on Chrome on Android without resorting to Chrome-specific workarounds, nor in Safari on iOS.

That's about allocating CPU memory but the GPU memory situation is similar. The specs don't want to reveal information about how much memory you're allowed to use because it could be used for fingerprinting, but that means that it's practically impossible to build reliable applications which use (or can optionally use) a lot of memory. Every allocation you make past a few hundred MB risks blowing up the app immediately, or putting it into the danger zone where it's the first in line to get killed when running in the background, either way without any warning or last-chance opportunity to release memory to avert getting killed.
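
About the best you can do today is probe allocations and catch the failure yourself, something like this rough sketch, and even then on mobile the tab may simply be killed before anything throws:

    const PAGE = 64 * 1024; // WebAssembly page size in bytes

    // Start small, cap at 4GB (65536 pages), and grow on demand.
    const memory = new WebAssembly.Memory({ initial: 256, maximum: 65536 });

    function tryGrowBy(bytes: number): boolean {
      try {
        memory.grow(Math.ceil(bytes / PAGE)); // throws RangeError if the reservation fails
        return true;
      } catch {
        return false; // "graceful" OOM -- if the tab wasn't killed outright first
      }
    }

    if (!tryGrowBy(512 * 1024 * 1024)) {
      console.warn('Could not reserve 512MB; falling back to a smaller working set');
    }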


Could the solution be a user permission dialog? Similar to how browsers implement webcam/etc permissions: “Enable <website> full GPU access? (Default: Off)”


I’m not sure your average user would know what that means, let alone the implications.


Not sure if this is useful for you since it's not mobile specific, but these were issues I brought up with the W3C back in 2020, some of which may be of interest to you too:

https://www.w3.org/2020/06/machine-learning-workshop/talks/o...


Could you share the list? I may have things to add.


Mobile Safari just has a fixed limit of 500 tabs


This is a user-facing limit; jsheard is talking about how as an app developer you don't know whether your app is below the limit or whether the next allocation will kill the browser tab.


If you switch to private browsing mode you can get an extra 500 tabs. :)


What about other tab groups?


I hate it so much. So arbitrary and capricious. I would say this is currently the number one blocker for the web as a serious platform. And they're doing it on purpose.


I guess the policy is that tabs can use 100% of the available resources on low end devices, but only 10% of the available resources on high end devices.


I think the desktop policy might be better. On the tablets I've used, tabs sometimes get killed when I switch tabs and visit another website with a lot of ads. It's an annoying way to lose work in an unsubmitted form. It doesn't seem to happen on desktop.


That's because most (all?) phones don't swap to a pagefile whereas every desktop OS has swap enabled by default. The only practical solution is to buy a phone with more memory. IMO 6GB is the bare minimum in 2024


No, just use an ad blocker and save the environment from more waste.


Ad blockers certainly shave off a few MBs, but in my experience the vast majority of RAM usage is not caused by ads. Unlike first-party content, ads are automatically benchmarked by ad exchanges and penalized for using too many resources. I also don't think a 200gram phone is the kind of waste that we should be concerned about. Think bigger


> I also don't think a 200gram phone is the kind of waste that we should be concerned about. Think bigger

A lot of raw ore is processed to get those 200 grams in your hand.

That said, a quick search tells me the carbon footprint of producing a phone is around 55kg, which is about 320km of car travel; it's not trivial, but it's not as much of a bottleneck as I thought it might be.


It's not uncommon for me to drive 320km in a day and take flights that Google Flights claims emit 579kg per leg. So I'd place 55kg from phones in the "fuck all" category even if I upgraded every year.


... Or maybe you should avoid both?

Like, I don't know your life; maybe you fly to see family or for important business meetings or something, but in any case none of this is going to be "fuck all" if we want to have any chance of staying under the +1.5°C (or even +2°C) bar.


What apps can't run on 4GB?

Games?

3D?

Editing?

Have you tried forking Chrome and increasing this limit?


Video editors are a big one. I've heard of people crashing a browser tab with Figma as well.

For data exploration tools it's very easy to want to use 4GB+ of memory. I found the limit cumbersome while working on financial tools. It usually comes up in internal tools where you reliably have a fast internet connection; it's harder to reach the limit for public-facing tools because there the slowness of sending 4GB+ to the browser is the more limiting factor.

The annoying part isn't just that the limit is there, but that you can't really handle it gracefully as the developer -- when the browser decides you've hit the limit, it may just replace the page with an error message.


For a video editor, only a small portion of the video needs to be in memory at any given time. The rest can be squirreled away in an IndexedDB store, which has no hard size limits on most browsers.
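
Something along these lines (a hypothetical sketch; the store and key scheme are just illustrative):

    function openFrameStore(): Promise<IDBDatabase> {
      return new Promise((resolve, reject) => {
        const req = indexedDB.open('frames-db', 1);
        req.onupgradeneeded = () => req.result.createObjectStore('frames');
        req.onsuccess = () => resolve(req.result);
        req.onerror = () => reject(req.error);
      });
    }

    // Park an encoded frame/chunk on disk, keyed by its index, so only the
    // frames near the playhead have to live in memory.
    function putFrame(db: IDBDatabase, index: number, data: Blob): Promise<void> {
      return new Promise((resolve, reject) => {
        const tx = db.transaction('frames', 'readwrite');
        tx.objectStore('frames').put(data, index);
        tx.oncomplete = () => resolve();
        tx.onerror = () => reject(tx.error);
      });
    }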


It's one of our big barriers over at Figma. Creative tools in general hit this limit pretty quickly. For context, I was a very heavy user of Photoshop back in the day. Even a decade ago I remember hitting 20GB of active memory use for Photoshop.

Things get really big really quick, especially when you're storing uncompressed versions of raster elements in memory. To frame things in a different way, 4GB is 22 seconds of 1080p video if you're loading the raw frames into memory.
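
For the curious, the arithmetic behind that figure (assuming uncompressed RGBA frames at 24fps):

    const bytesPerFrame = 1920 * 1080 * 4;             // ~8.3 MB per RGBA frame
    const bytesPerSecond = bytesPerFrame * 24;         // ~199 MB/s at 24fps
    const seconds = (4 * 1024 ** 3) / bytesPerSecond;  // ~21.6 seconds fit in 4GB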


Some AI apps. You can't really load a capable LLM in 4 GB. Or does this limit not apply when dealing with WASM and WebGPU?


4GB ought to be enough for anybody.


This is a 7B parameter model at int4, lots to play with!
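
Back-of-the-envelope, assuming 4-bit weights and nothing else resident:

    const weightBytes = 7e9 * 0.5;                 // ~3.5 GB of int4 weights
    const headroom = 4 * 1024 ** 3 - weightBytes;  // ~0.8 GB left for KV cache, activations, etc.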


Isn't that exactly the modern, AI-based mouse-and-keyboard bot (trained with click farms)?


I think better SIMD support for WebAssembly is more inclusive than relying on / expecting WebGPU.


For this blog post we are using Chrome as the testing environment, which has WebGPU turned on by default now, and other common browsers should hopefully follow suit. Given we are using Chrome here, we know WebGPU will be available if the Web AI library uses it, which many people are turning to for diffusion models and LLMs since it's so much faster for those types of models.

But yes, I am all for better support for all the things too. We have many WASM users as well, and when anything new comes out there, this set of instructions can still be used to test that too, as it's essentially just Chrome running on Linux with the right flags set.
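
As a rough example, driving it from Puppeteer looks something like this; the exact flag set for WebGPU on Linux varies by Chrome version and GPU driver, so treat the flags below as a starting point rather than a recipe:

    import puppeteer from 'puppeteer';

    async function main() {
      const browser = await puppeteer.launch({
        headless: 'new',
        args: [
          '--enable-unsafe-webgpu',   // may be unnecessary on newer Chrome builds
          '--enable-features=Vulkan',
          '--use-angle=vulkan',
        ],
      });
      const page = await browser.newPage();
      await page.goto('http://localhost:8080/index.html'); // placeholder: your test page
      const hasWebGPU = await page.evaluate(() => 'gpu' in navigator);
      console.log('WebGPU available:', hasWebGPU);
      await browser.close();
    }

    main();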


CPU inference is 10x slower. Not good enough for most use cases



