Web AI Model Testing: WebGPU, WebGL, and Headless Chrome (chrome.com)
199 points by kaycebasques 11 months ago | 46 comments



Real, but naive, question: does TensorFlow have meaningful share outside Google? I've been in the HuggingFace ecosystem and it's overwhelmingly PyTorch, IIRC around 93% (I can't find the blog post that cited the figure, but I only gave it a couple of minutes of searching).


TF used to be the most popular framework by a large margin, so a lot of things that were started 5+ years ago are still on it. PyTorch is most popular in places that only started more recently or have the ability to switch easily, e.g. new startups, research, LLMs, education, and companies that have the resources to do a migration project.


A fun thing is that even within Google, JAX is now preferred among researchers and is slowly taking over TF's share.


Tbf JAX is super nice. Even easier than PyTorch in many ways and astonishingly fast. XLA is super powerful.


Best alternative for the web imo (perf generally beats ONNX on the web).


Great!

For the Burn project, we have a WebGPU example, and I was looking into how we could add automated tests in the browser. Now it seems possible.

Here is the image classification example if you'd like to check it out:

https://github.com/tracel-ai/burn/tree/main/examples/image-c...


Reminds me of “The Birth & Death of JavaScript”

https://www.destroyallsoftware.com/talks/the-birth-and-death...


Everything will run in a browser, eventually.


Since browsers offer a sort of unified experience (to some extent), that would be quite good. But sadly, I haven't seen wide adoption of PWAs or similar technology. Most companies just create their own app, which in many cases really isn't even needed, since the app is just a wrapped version of their website.


Hopefully this will solve some of the incompatibility with training models on AMD vs NVIDIA. Just use Google Chrome.


A question that comes to mind is: How significant is the performance difference between using CPUs and GPUs for these machine learning models in web applications, and are there specific types of applications where one would significantly outperform the other?


Very significant in the current paradigm.


This can also be done in Rust using the excellent `wasm_bindgen_test`!


AFAIK, there is still a memory limit in Chrome which is set to 4GB per tab.


Hello there, I am one of the authors of the piece. Fun fact: just for the lols we tried running a 1.3B parameter, unoptimized TensorFlow.js model in this system to see if it would work (it could be much more memory efficient with tweaks), and it does. It uses about 6GB of RAM and 14GB of VRAM on a V100 GPU on Colab (15GB VRAM limit), but runs pretty fast once the initial load is complete. Obviously there is plenty of room to make this use much less memory in the future; we just wanted to check that we could run such things as a test for now.
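
For reference, here is roughly what such a run looks like with TensorFlow.js on the WebGPU backend. This is just a minimal sketch rather than our actual harness, and the model URL and input shape are placeholders:

    import * as tf from '@tensorflow/tfjs';
    import '@tensorflow/tfjs-backend-webgpu';

    async function run() {
      await tf.setBackend('webgpu');   // backend registered by the import above
      await tf.ready();

      // Placeholder model URL and input shape -- substitute your own.
      const model = await tf.loadGraphModel('https://example.com/model/model.json');
      const input = tf.zeros([1, 224, 224, 3]);
      const output = model.predict(input) as tf.Tensor;
      await output.data();             // force execution to complete

      console.log(tf.memory());        // { numBytes, numTensors, ... }
      input.dispose();
      output.dispose();
    }

    run();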


At least on desktop you generally know where the line is. On mobile there's a mystery limit you're not allowed to cross, and you're also not allowed to know where the line is until you reach it, at which point you might get a graceful error or your tab might be force-killed, and you're not allowed to know which of those will happen either.


I'm building a list of "second class citizen" mobile web issues for Android and Apple. I wasn't aware of this one! Do you know of anything else like this?


https://github.com/WebAssembly/design/issues/1397

> Currently allocating more than ~300MB of memory is not reliable on Chrome on Android without resorting to Chrome-specific workarounds, nor in Safari on iOS.

That's about allocating CPU memory but the GPU memory situation is similar. The specs don't want to reveal information about how much memory you're allowed to use because it could be used for fingerprinting, but that means that it's practically impossible to build reliable applications which use (or can optionally use) a lot of memory. Every allocation you make past a few hundred MB risks blowing up the app immediately, or putting it into the danger zone where it's the first in line to get killed when running in the background, either way without any warning or last-chance opportunity to release memory to avert getting killed.
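
About the best you can do today is probe allocations and catch the failure yourself, something like this rough sketch, and even then on mobile the tab may simply be killed before anything throws:

    const PAGE = 64 * 1024; // WebAssembly page size in bytes

    // Start small, cap at 4GB (65536 pages), and grow on demand.
    const memory = new WebAssembly.Memory({ initial: 256, maximum: 65536 });

    function tryGrowBy(bytes: number): boolean {
      try {
        memory.grow(Math.ceil(bytes / PAGE)); // throws RangeError if the reservation fails
        return true;
      } catch {
        return false; // "graceful" OOM -- if the tab wasn't killed outright first
      }
    }

    if (!tryGrowBy(512 * 1024 * 1024)) {
      console.warn('Could not reserve 512MB; falling back to a smaller working set');
    }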


Could the solution be a user permission dialog? Similar to how browsers implement webcam/etc permissions: “Enable <website> full GPU access? (Default: Off)”


I’m not sure your average user would know what that means, let alone the implications.


Not sure if this is useful for you since it's not mobile specific, but these were issues I brought up with the W3C back in 2020, some of which may be of interest to you too:

https://www.w3.org/2020/06/machine-learning-workshop/talks/o...


Could you share the list? I may have things to add.


Mobile Safari just has a fixed limit of 500 tabs


This is a user-facing limit; jsheard is talking about how as an app developer you don't know whether your app is below the limit or whether the next allocation will kill the browser tab.


If you switch to private browsing mode you can get an extra 500 tabs. :)


What about other tab groups?


I hate it so much. So arbitrary and capricious. I would say this is currently the number one blocker for the web as a serious platform. And they're doing it on purpose.


I guess the policy is that tabs can use 100% of the available resources on low end devices, but only 10% of the available resources on high end devices.


I think the desktop policy might be better. On the tablets I've used, tabs sometimes get killed when I switch tabs and visit another website with a lot of ads. It's an annoying way to lose work in an unsubmitted form. It doesn't seem to happen on desktop.


That's because most (all?) phones don't swap to a pagefile whereas every desktop OS has swap enabled by default. The only practical solution is to buy a phone with more memory. IMO 6GB is the bare minimum in 2024


No, just use an ad blocker and save the environment from more waste.


Ad blockers certainly shave off a few MBs, but in my experience the vast majority of RAM usage is not caused by ads. Unlike first-party content, ads are automatically benchmarked by ad exchanges and penalized for using too many resources. I also don't think a 200gram phone is the kind of waste that we should be concerned about. Think bigger


> I also don't think a 200gram phone is the kind of waste that we should be concerned about. Think bigger

A lot of raw ore is processed to get those 200 grams in your hand.

That said, a quick search tells me the carbon footprint of producing a phone is around 55kg, which is about 320km of car travel; it's not trivial, but it's not as much of a bottleneck as I thought it might be.


It's not uncommon for me to drive 320km in a day and take flights that Google Flights claims emit 579kg per leg. So I'd place 55kg from phones in the "fuck all" category even if I upgraded every year.


... Or maybe you should avoid both?

Like, I don't know your life; maybe you fly to see family or for important business meetings or something, but in any case none of this is going to be "fuck all" if we want to have any chance of staying under the +1.5°C (or even +2°C) bar.


What apps can't run on 4GB?

Games?

3D?

Editing?

Have you tried forking Chrome and increasing this limit?


Video editors are a big one. I've heard of people crashing a browser tab with Figma as well.

For data exploration tools it's very easy to want to use 4GB+ of memory. I found the limit cumbersome while working on financial tools. It usually comes up in internal tools where you reliably have a fast internet connection; it's harder to reach the limit for public-facing tools because there the slowness of sending 4GB+ to the browser is the more limiting factor.

The annoying part isn't just that the limit is there, but that you can't really handle it gracefully as the developer -- when the browser decides you've hit the limit, it may just replace the page with an error message.


For a video editor, only a small portion of the video needs to be in memory at any given time. The rest can be squirreled away in an IndexedDB store, which has no hard size limits on most browsers.
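
Something along these lines (a hypothetical sketch; the store and key scheme are just illustrative):

    function openFrameStore(): Promise<IDBDatabase> {
      return new Promise((resolve, reject) => {
        const req = indexedDB.open('frames-db', 1);
        req.onupgradeneeded = () => req.result.createObjectStore('frames');
        req.onsuccess = () => resolve(req.result);
        req.onerror = () => reject(req.error);
      });
    }

    // Park an encoded frame/chunk on disk, keyed by its index, so only the
    // frames near the playhead have to live in memory.
    function putFrame(db: IDBDatabase, index: number, data: Blob): Promise<void> {
      return new Promise((resolve, reject) => {
        const tx = db.transaction('frames', 'readwrite');
        tx.objectStore('frames').put(data, index);
        tx.oncomplete = () => resolve();
        tx.onerror = () => reject(tx.error);
      });
    }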


It's one of our big barriers over at Figma. Creative tools in general hit this limit pretty quickly. For context, I was a very heavy user of Photoshop back in the day. Even a decade ago I remember hitting 20GB of active memory use for Photoshop.

Things get really big really quick, especially when you're storing uncompressed versions of raster elements in memory. To frame things in a different way, 4GB is 22 seconds of 1080p video if you're loading the raw frames into memory.
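
For the curious, the arithmetic behind that figure (assuming uncompressed RGBA frames at 24fps):

    const bytesPerFrame = 1920 * 1080 * 4;             // ~8.3 MB per RGBA frame
    const bytesPerSecond = bytesPerFrame * 24;         // ~199 MB/s at 24fps
    const seconds = (4 * 1024 ** 3) / bytesPerSecond;  // ~21.6 seconds fit in 4GB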


Some AI apps. You can't really load a capable LLM in 4 GB. Or does this limit not apply when dealing with WASM and WebGPU?


4GB ought to be enough for anybody.


This is a 7B parameter model at int4, lots to play with!
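
Back-of-the-envelope, assuming 4-bit weights and nothing else resident:

    const weightBytes = 7e9 * 0.5;                 // ~3.5 GB of int4 weights
    const headroom = 4 * 1024 ** 3 - weightBytes;  // ~0.8 GB left for KV cache, activations, etc.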


Isn't that exactly the modern, AI-based mouse-and-keyboard bot (trained with click farms)?


I think better SIMD support for WebAssembly is more inclusive than relying on / expecting WebGPU.


For this blog post we are using Chrome as the testing environment, which has WebGPU turned on by default now, and other common browsers should hopefully follow suit. Given we are using Chrome here, we know WebGPU will be available if the Web AI library uses it, which many people are turning to for diffusion models and LLMs since it's so much faster for those types of models.

But yes, I am all for better support for all the things too. We have many WASM users as well, and when anything new comes out there, this set of instructions can still be used to test that too, as it's essentially just Chrome running on Linux with the right flags set.
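
As a rough example, driving it from Puppeteer looks something like this; the exact flag set for WebGPU on Linux varies by Chrome version and GPU driver, so treat the flags below as a starting point rather than a recipe:

    import puppeteer from 'puppeteer';

    async function main() {
      const browser = await puppeteer.launch({
        headless: 'new',
        args: [
          '--enable-unsafe-webgpu',   // may be unnecessary on newer Chrome builds
          '--enable-features=Vulkan',
          '--use-angle=vulkan',
        ],
      });
      const page = await browser.newPage();
      await page.goto('http://localhost:8080/index.html'); // placeholder: your test page
      const hasWebGPU = await page.evaluate(() => 'gpu' in navigator);
      console.log('WebGPU available:', hasWebGPU);
      await browser.close();
    }

    main();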


CPU inference is 10x slower. Not good enough for most use cases



