I don't quite understand your criticism here. It's running stable diffusion on your computer via your browser. How would it do this without downloading it and then loading it into RAM?
It'll be the same size download and use about the same RAM if you download it and run it directly without using a browser.
It's not a criticism. I'm just pointing it out. For good or bad, it is what it is. There are two sides to it. For anyone familiar with systems theory, this was the inevitable end game of the web.
However, the web also has a terrible bloat/legacy issue it refuses to deal with. So sooner or later, a new minimal platform will grow from within it, the way the web started in the 90s as the then humble browser. And the web will be replaced.
I had great expectations of WASM, and maybe it can evolve into what we're discussing. But as it is, this system is too limited in two crucial ways.
First, it's explicitly designed to be easy to port existing software to. Like C libraries, say. Sounds good, right? Well, it's not designed as a platform that arose from the needs of users, like the web did, but from the needs of developers porting software, who previously compiled C libraries to JS gibberish and had to deal with garbage-collection bottlenecks, etc. That scope seems fairly narrow. WASM has almost no contact surface with the rest of the browser aside from raw compute. It can't access the DOM, the GPU, or anything else directly (last I checked).
Second, for safety reasons, they eliminated arbitrary jumps from the code. Instead it's structured, like a high-level language, with an explicit call stack and everything. Which is great, except this is a complete mismatch for new-generation languages like Go and Rust, which focus heavily on concurrency: coroutines, generators, async functions, etc. Translating code like that to WASM requires workarounds with a significant performance penalty.
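As a rough sketch of that mismatch (plain JS here, not WASM itself, and the names are made up): a generator suspends mid-function, which a structured target without arbitrary jumps has to emulate with explicit saved state instead of a paused stack.

```javascript
// A generator suspends in the middle of its body between next() calls...
function* counter() {
  let i = 0;
  while (true) yield i++;
}

// ...which a structured, single-stack target must emulate by hoisting the
// paused "stack frame" into explicit state that survives between calls.
function makeCounter() {
  let i = 0; // the saved local, kept in a closure instead of a paused stack
  return { next: () => ({ value: i++, done: false }) };
}

const gen = counter();
const sm = makeCounter();
console.log(gen.next().value, sm.next().value); // 0 0
console.log(gen.next().value, sm.next().value); // 1 1
```

Tools like Emscripten's Asyncify perform this kind of transformation automatically, but at the cost of extra code size and runtime overhead.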
WASM can access the GPU via middle layers like SDL. I.e. you can write a C program that uses OpenGL and compile it, and as long as you pass the right flags to `emcc`, you will barely need to touch or write any glue on the JS side at all.
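For example, a sketch of such a build (`main.c` is hypothetical; `USE_SDL` and `FULL_ES2` are standard Emscripten settings, but check your version's docs):

```sh
# Fetches the SDL2 port, maps OpenGL ES 2 calls to WebGL, and emits the
# .wasm binary plus all the JS glue as index.html/index.js/index.wasm.
emcc main.c -O2 -sUSE_SDL=2 -sFULL_ES2 -o index.html
```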
All of those go through JS, as far as I'm aware. Emscripten bridges everything for you, but technically it's JS underneath, so SDL's calls also go through JS.
You're correct, but I don't see why that matters. From your perspective as a C coder, you get WebGL access without having to write JS; that's what's important. Everything is "mixed together" into assembly in the end anyway, whether it's glue code in JS or browser-side glue code.
I for one am very happy about it. The promise of Java's “write once, run anywhere” is mostly realized now. All we need next is for the browser to ship with interpreted-language runtimes and language-native bindings to the DOM, GPU, etc.
But why is that a criticism? I tried running SD on my computer a few months ago. I spent several hours trying to install the dependencies and eventually gave up. I'm sure it wouldn't have been a big deal for someone familiar with python but for me it was a massive hassle and I eventually failed to make it run at all.
For this one, as long as my browser supports WebGPU (which will be widely supported soon) and I have the system resources, it will run. Barely any technical knowledge needed, doesn't matter what OS or brand of GPU I have. Isn't that really cool? It reduces both technical and knowledge based barriers to entry. Why do people criticize this so strongly?
After seeing the sort of argument you are replying to countless times on HN, I came to a simple conclusion. Some people, especially on HN, just have disdain for anyone who might not be willing to deal with the hassle of running a piece of software, because making it easy invites the "lowest common denominator", and they don't want to "taint" their hobby with the presence of "normies" in what used to be their exclusive domain.
In a similar vein, you can find plenty of comments on HN faulting the massive proliferation of smartphones among the general population throughout the 2010s for "ruining" the web, software ecosystems, application paradigms, etc. There are plenty of things one could potentially criticize smartphones for, and some of that criticism indeed has merit. But this specific point about "ruining" things feels like another version of the same argument above: niche things becoming widely adopted by the masses and "ruining" their "cool kids' club."
Another similar example from an entirely unrelated domain: comic books and their explosion in popularity after Marvel movies repeatedly killed it at the box office. I don't even like Marvel movies, and have barely watched any of them, but the elitism around hating things that become more popular is just silly.
The objection (more surprise than objection) is that web browsers are supposed to be sandboxed environments. They are not supposed to be able to do things that negatively impact system performance. It is surprising that you can do things involving multiple gigabytes of RAM in a web browser. It has nothing to do with what you are using that RAM for, or whether it's cool or not.
I don't think anybody objects to making it easier to run Stable Diffusion, and I think the only way you could come to that conclusion is by intentionally misinterpreting people's comments.
> The objection (more surprise than objection) is that web browsers are supposed to be sandboxed environments. They are not supposed to be able to do things that negatively impact system performance.
I agree with the sandboxing model, but it is orthogonal to WebGPU and impacting system performance. Sandboxing is about making the environment hermetic (for security purposes and such), not about full hardware bandwidth isolation.
First, there is no way for web browsers to have zero system performance impact. Browsers already use hardware acceleration (which you can disable, thus alleviating your WebGPU concerns as well), your RAM, and your CPU.
Second, afaik WebGPU has limits on how much of your GPU resources it is allowed to use (for the exact purpose of limiting system performance impact).
Java's real success was (and still is) on the server - powering a whole generation of internet applications, and creating a cross-vendor ecosystem that stopped MS from leveraging its client dominance to take over the server space as well.
I don't believe Unix/Linux would have survived the Windows server onslaught without Java on the backend and the web on the front.
It's true that the server is where Java has been most successful, by a large margin.
But it was never Java's "original premise", which is what the comment you are replying to was about. According to their (very heavy-handed) marketing at the time, Java was supposed to be for native desktop applications and for "applets". But yeah, in the many years it took for those promises to truly become hollow, Java carved out a surprisingly robust niche for itself on the enterprise server.
Also, I am skeptical of this last sentence of yours. The thing that resisted the Windows server onslaught, broadly, was the wide range of free-as-in-speech-and-as-in-beer backend technologies, like Perl, PHP, Python, Postgres, and some other things that start with "P", as well as, yeah, Java. Java played a role, but it was just one of many.
Java was created from the beginning for embedded devices. Most people don't realize that it has been there since the beginning, on each Nokia 3310 device all the way up to most Android apps on the newest smartphones.
On the desktop we had Swing, which was OK-ish for building GUI apps (albeit still behind Borland's tooling), and then they totally lost sight of the desktop with JavaFX, which was created without listening to the community and then abandoned, while also refusing to improve Swing. Quite a pity.
> Java was created from the beginning for embedded devices
This is technically not true, as far as I know. That whole idea of Java ME, the different "profiles", all that stuff happened around 1998, which is definitely not "from the beginning". Though, looking it up now, apparently the Java Card stuff got started a little earlier than that (which I didn't know/notice at the time, probably because it apparently wasn't initially a Sun initiative, so I'm guessing Sun's self-promotion didn't mention it in the really early days).
But depending on what your point is, maybe my first paragraph is merely a technical quibble, not a substantive disagreement. Maybe your point is that Java's success has been, in part, due to its ubiquity in small-but-not-tiny devices like "feature phones". Fair enough, I guess, and if that's your point then it doesn't really matter if it was truly "from the beginning", or just "one of the earliest pivots" (which I think is more accurate).
Myself, my point is that DrScientist's reply to quickthrower2 is, as a reply, just straight-up wrong wrong wrong. Java's original premise was twofold: web applets, and desktop apps that didn't need maximum performance (note that Swing was not the original Java GUI toolkit; I've forgotten the name of the thing that preceded it, but Swing was certainly much better). Building servers was NOT part of Java's original premise. And quickthrower2 is right: the web ate that original premise. Java had to pivot to live, and did.
I'm getting too pedantic here, but the historical revisionism is winding me up.
Other people have pointed to relevant pages where you can read: "In 1985, Sun Microsystems was attempting to develop a new technology for programming next generation smart appliances, which Sun expected to be a major new opportunity"
This was common knowledge in that decade.
From memory, I don't recall Java being focused on the server side until much later, in the 2000s, with Tomcat and JBoss making a lot of strides; I can't say I was a fan of either. Maybe that is when the person you mention first saw Java trying to compete for whatever space was left of the web to take. I never got the impression AWT was relevant; that's why it wasn't even mentioned, as everyone seemed to be using only Swing, except for some god-awful projects in the gov domain.
For embedded developers (phones, smartcards, electronic devices, ...) it was well-established since the early days because, IMHO, it was _easy_ to use/deploy/maintain compared to other options. Even looking at the options available today, it is still near the top, although C++ has made quite a fantastic comeback with Arduino, albeit while continuing to be a pain in the rear to debug.
Definitely agree. Java massively succeeded on the server. I admit I was a Java enthusiast in 2000, and then my jobs were all C#, which is approximately Java :-) better in some ways and not as good in others.
Or Java was too heavy for the computers at the time for people to use "applets" for everyday things (i.e., go to a new website and do a thing on it).
Flash et al. also failed to catch on for long.
The web browser's success might have something to do with never-ending feature creep, as opposed to "this can do everything, but as such it's broken and vulnerable".
Applet implementations were terrible - the problem wasn't so much Java (though early pre-JIT versions were slow), but the interface between the browser and the applet.
Memory leaks abounded in particular.
Life cycle management was difficult as well.
Note that the interface had to be implemented in each and every browser separately - compounding the problem - since for applets to be viable, they had to work well on all the major browsers.
Not blaming the people who worked on it - I suspect the original design was put together in a rush, the work was under-resourced, and it required coordination between multiple parties.
There's however the important detail that this company has been doggedly working toward that end: first co-opting a competitor's browser engine (Safari's WebKit), then forking it, then taking over the web standards process and putting every possible API into the web, including access to USB devices and so on, so they can make an OS around it.
Because if it's the web, Google sees it. And if everything is the web, then Google sees everything.
And Apple had in turn co-opted KDE’s KHTML and KJS projects to start WebKit. An illustrious lineage.
(I remember several awesome hobby OS projects ported KHTML to get a really good browser back in those days. It was a really solid and portable codebase and much tidier than Firefox.)
Google doesn't see anything just because it uses web technology. For instance, the payroll system I use is a web app, but Google doesn't see my company's payroll data. What Google sees is a marketing blurb about the payroll system.
Google sees everything that is public and everything that uses their ad network, including data from apps that don't use the web at all.
We still have that in the form of PWAs. I don't mean the web "runtime"/webview, but all the cruft that Windows ships with... the endless ads, multiple UI and config layers, Office trial, the stupid games, OneDrive, Skype/Teams, the endless notifications, Bing everywhere... it's an over-monetized in your face nightmare on every fresh boot.
If not for DirectX and Windows-only games, I'd totally ditch it. Maybe when Proton gets there.
As bandwidth increases and the web sandbox matures, it’s fascinating to watch the evolution towards apps you just use rather than download and install and maintain. This will bother some but for the masses it opens a lot of doors.
I love web apps because they mean I have to trust the developer a lot less than with native apps. Of course there are still things you'd have to monitor (e.g. network requests) to fully trust any web app. A good solution could be something like OpenBSD's pledge, to allow me to prove to the user nothing malicious is possible (e.g. by disabling fetch and any new requests, even from src attributes, altogether).
As a sandbox, I especially like that there's a dropdown with a huge list of things a website/app can do. Many of them on by default, but I have total control over that. And of course the API for asking.
"Hey this game wants to use your motion controls and USB gamepad." Okay sure.
Yeah, the sandbox is nice, but it doesn't go far enough. Let's say I build a JSON viewer. Why should the page have any ability to make network requests? What I'm asking for is the ability to pledge that I'm not going to make any network requests.
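Something in this spirit already exists as a rough approximation: a Content-Security-Policy can forbid the page from making requests, and (unlike a promise made in script) the page can't loosen it afterwards. A minimal sketch, assuming the viewer ships all of its assets inline:

```html
<!-- With default-src 'none', fetch/XHR/WebSockets and loads via src
     attributes are all blocked; only the inline script/style exceptions
     below are allowed. A CSP can only be tightened, never relaxed,
     by the page itself. -->
<meta http-equiv="Content-Security-Policy"
      content="default-src 'none'; script-src 'unsafe-inline'; style-src 'unsafe-inline'">
```

The gap relative to pledge is that the user still has to inspect the policy to trust it; there's no browser UI surfacing "this page has pledged away the network."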
Yes, and I am coming round to the idea that WebGPU is useful like this for cases other than realtime interactive WebXR pages with streaming multiplayer live state and loaded up with draw calls etc. There is a simplicity to curating the experience through the browser like this and there isn't any easier way to get SD up and running so I hope these kinds of projects keep getting support. Thanks for building this OP!
AI models can get quite memory intensive while running. I have seen a primitive image improvement AI eat up over 80 GB of RAM on high res images. The data of the model itself "only" used up 4 GB.
I enabled the requested chrome://flags in Brave, but it still doesn't work. I haven't downloaded Chrome on any of my M1 Macs, and I don't plan to start now.
I tried on latest (normal) Chrome, beta and canary/nightly and enabled both options one by one and let it relaunch but still wouldn't work at all. ¯\_(ツ)_/¯
It's running on only a single thread, so I think the specs are a little less relevant. It takes about 80 s per iteration, and I ran the 4 iterations set by default, so a bit over 5 minutes.
Hold on, to run your demo does one have to click the "Load Model" button before doing anything? 'cos what I see is a form that is greyed out with the error message still at the top:
> You need latest Chrome with "Experimental WebAssembly" and "Experimental WebAssembly JavaScript Promise Integration (JSPI)" flags enabled!
Now I'm wondering whether the top message goes away once the flags are enabled?
> Hold on, to run your demo does one have to click the "Load Model" button before doing anything?
Yes. I thought it wouldn't be good if it downloaded 3.5 GB as soon as you opened the page.
>Now I'm wondering whether the top message goes away once the flags are enabled?
No, I haven't added any checks for that (and I'm not sure how the first one can even be checked properly), so it's just an info bar. Which is, admittedly, misleading.
It works on canary on M1 mac and Windows w/ an NVIDIA RTX GPU. I believe there are custom command line options that have to be passed to make it work. The MLC site has the deets that work.
Nah, I don't use Chrome so I don't have it installed. I'm not a web developer, so testing across different platforms isn't useful to me. I've used StableDiffusion before, so hacking around to make this demo work in my browser isn't particularly interesting either.
I agree with the poster 100%. I'm convinced any Google application immediately sucks up every iota of data it possibly can at install time / first launch. It’s not worth it to me either.
Why is it that implementing something in wasm stalls for so long, while doing it as a JS feature is so fast? Anyone have insights? As an outsider, it feels like wasm is being developed impossibly slowly.
Implementing something new in JS can be done relatively easily using a slow path, where you just write some privileged JS or C++ and then wrap it, without doing any optimizations. Then if it gets popular the vendors can optimize it at their own pace.
Implementing a new feature in WebAssembly is a bit more complex due to its execution model and security constraints. I expect it's also just the case that a lot of these new WASM features are very complex - promise integration is super nontrivial to get right, so are WebAssembly GC and SIMD.
JS Promises in something like their modern form were first played around with in ~2010, and it was ~2016 before browsers were shipping them natively. Good standards can take a while!
Because it basically covers what PNaCl, the Java plugin, the Flash plugin, Silverlight, and asm.js were doing.
Anything beyond those use cases is really meh, especially given how clunky compiling and debugging WASM code tends to be.
Then we have all those startups trying to reinvent bytecode executable formats on the server, as if it weren't something that has been done every couple of years since the late 1950s.
> Because it basically covers what PNaCl, the Java plugin, the Flash plugin, Silverlight, and asm.js were doing.
Right, but it doesn't right now? Like, you can't just write arbitrary code as you would with a Java plugin or a PNaCl C++ plugin. Wasm is extremely difficult to use for those use cases.
> Then we have all those startups trying to reinvent bytecode executable formats on the server, as if it weren't something that has been done every couple of years since the late 1950s.
Yes, because people really want this and the solutions have all been fraught with security issues historically.
I didn't say WASM is without flaws, I said the predecessors had flaws but that the premise is valuable, which is why we keep trying it over and over again.
Notably, the first paper is about exploitation of WebAssembly processes. That's valuable, but the flaw of previous systems wasn't that the programs running in them were exploitable - it was that the virtual machines themselves were. Some of this was due to the fact that the underlying virtual machines, like the JVM, were de facto unconstrained, and the web use case attempted to bolt constraints on after the fact; obviously WebAssembly has been designed differently.
I hope wasm sees more mitigations, but I also expect that wasm is going to be a target primarily for memory-safe languages, where these problems are already far less of an issue. And to reiterate, the issue was not the exploitation of programs but the exploitation of the virtual machines' isolation mechanisms.
Out of curiosity, what are use cases/applications of this?
So what I know is that this generates images in the browser rather than on a server. The only thing I can think of is not having to refresh the page in order to change an image or generate a new one. Which... hmm, well, that could mean websites whose visual design changes in real time? And maybe changes in a way that would be functionally relevant/useful? That does seem pretty cool, although I'm not sure how useful Stable Diffusion is for generating UI components/visual aspects of a site.
Any hardware! As long as that hardware is overpowered for the job, so that the browser overhead is acceptable. Oh and it needs internet. Oh and it needs a reasonably large screen because padding and margins. Oh and it needs quite a bit of RAM to start. Maybe not any hardware.
UNET takes about 1:10 on WebGPU and around a minute on CPU in one thread. VAE takes 2 minutes on CPU and about 10 seconds on GPU. That's probably because most GPU ops for VAE are already implemented, but for UNET they are not, so in the latter case the browser is just tossing data back and forth between GPU and CPU on each step.
If this is fast enough, then you could use it to render images locally for personal use. Websites could deliver prompts only, perhaps rendering different images for different users. At that point, what does it mean for copyrights? Is the model itself copyrighted or does the system break down?
> If this is fast enough, then you could use it to render images locally for personal use. Websites could deliver prompts only, perhaps rendering different images for different users.
That's a fascinating possibility, but we're very far from that world right now: elsewhere in the thread it's mentioned that this actively uses 8 GB of RAM. And I doubt many web designers would accept the risk that a model misinterprets a prompt, produces distorted output (like the wrong number of fingers on someone's hands), or accidentally produces sexual or violent content in a context where it's not intended.
For many generative image models today, people often pick the best of a dozen or more images, and the others that they throw away may actually be quite bad.
The quality and predictability of the models would need to be significantly higher than they are now in order to routinely illustrate websites dynamically.
But I don't want to say that we'll never get there. All of the recent models are doing things now that would have been considered inconceivable just a few years ago. (Compare https://xkcd.com/1425/ where it may even be a challenge to explain the issue behind the joke to some younger readers!)
Browser dev tools (like most developer tools, really) hook into the application flow to do additional work compared to running without them. Depending on what the application does, that can mean it has to do a lot of extra work and needs a lot of extra memory, just to process and store all the extra information.
I don't know the specifics of why the slowdown is so extreme in this case, usually it has a negligible impact. But I'm guessing it's related to what I wrote above.
Another thing that slows a lot of things down is when the application uses the console. Before you open the inspector, those methods are essentially no-ops and just get skipped, but once it's open, all of those strings have to get copied, and that can slow things down quite a bit.
This isn't unique to the web, either: adding the verbose flag to most Linux file utilities and then operating on a large set of files will be slower than without the verbose flag, too, just because printing to stdout takes time.
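One common mitigation (a generic sketch, not something from this thread; the names are made up) is to hide expensive log-message construction behind a flag, so the cost disappears entirely when debugging is off:

```javascript
// Pass a closure instead of a pre-built string: when DEBUG is false,
// the closure is never called, so the big object is never stringified.
const DEBUG = false;

function debugLog(makeMessage) {
  if (DEBUG) console.log(makeMessage());
}

const bigObject = { pixels: new Array(10000).fill(0) };
debugLog(() => `state: ${JSON.stringify(bigObject)}`); // skipped: DEBUG is false
```

This way, the only per-call cost on the hot path is a branch, regardless of whether the inspector is open.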
Even more impressively, they followed up with support for several Large Language Models: https://webllm.mlc.ai/