CPython's main branch running in the browser with WebAssembly (twitter.com/ethanhs)
268 points by bobbiechen on Nov 29, 2021 | 95 comments



One of the painful aspects of WASM is that there are no blocking calls. You can't say "wait for the next event"; instead you must return to the outermost event loop and wait to be called back.

How does Python-in-WASM work around that? For example, how does `for line in sys.stdin:` work if you can't actually block on stdin?

Emscripten has some support for this via the "asyncify" transform, which layers on additional control flow so that code can return all the way up the call stack and then "rewind" back down into it. But this bloats the code (and is also buggy), so maybe it's not being used.
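As a loose analogy only (this is not what Emscripten emits), a Python generator shows the same "unwind to the event loop, then resume where you left off" shape. The names `count_lines` and `run` are made up for this sketch, and the list of lines stands in for input events arriving later.

```python
def count_lines():
    """Wants to 'block' on input, but yields to the outer loop instead."""
    total = 0
    while True:
        line = yield "need-input"      # unwind: hand control back to the loop
        if line is None:               # None marks end of input in this sketch
            return total
        total += 1


def run(coro, incoming):
    """Toy event loop: resumes the suspended function as input 'arrives'."""
    feed = iter(incoming)
    next(coro)                         # run until the first input request
    try:
        while True:
            coro.send(next(feed, None))    # rewind: re-enter the saved frame
    except StopIteration as stop:
        return stop.value


total = run(count_lines(), ["a\n", "b\n", "c\n"])
```

The difference is that a generator has to be written this way by hand, whereas asyncify rewrites ordinary blocking-style code after the fact, which is where the extra code size comes from.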


Yes, currently input goes into a prompt() and it doesn't output anything unless you hit "Cancel" on the prompt, definitely a bad time.

Python allows you to reach in and replace the core interpreter loop, so this may be an avenue to have our own asyncify-like function pop out to JS land and restore state correctly (which we can be smart about since we are the interpreter).

It may also be possible to write something that runs Python in a Web Worker and communicates with it over a SharedArrayBuffer, but that I'm a bit more hazy on. Pyodide has some discussion of this in https://github.com/pyodide/pyodide/issues/1219 and https://github.com/pyodide/pyodide/issues/1503.

This is definitely the hardest part of getting Python to work. Well, hardest after the hardest part of building a compiler toolchain like Emscripten :)


In WebAssembly.sh (https://github.com/wasmerio/webassembly.sh) they run WASM binaries in a Web Worker and then use `SharedArrayBuffer` to block the WebWorker while the main thread does some work (e.g. collect input). You could use a similar solution.

When building Runno (https://runno.dev) I forked off that project and did a bunch of other things on top to get blocking to work in Safari and non-cross-origin-isolated contexts.

Ultimately I think it's JavaScript's (or whichever host language) responsibility to block when the binary calls out (if that is the expected semantics).


Does the SharedArrayBuffer approach offer any way to tell the OS scheduler to wait? Because the only waiting method I know of is busy waiting, aka wasting cycles at 100% of a CPU core. In normal processes you can call a sleep function, which saves CPU cycles, but there is no synchronous method for that available in javascript.


It's part of the Atomics feature set (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...). You have a shared chunk of memory and you say "please sleep until this memory changes". Then it's up to the host to change that memory.
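A rough Python analogy, not the browser API: the sketch below imitates "sleep until this memory changes" with a `threading.Condition` guarding a shared `bytearray`. The helper names `wait_until_changed` and `store_and_notify` are invented for illustration; in the browser the corresponding calls are `Atomics.wait` and `Atomics.notify` on a `SharedArrayBuffer`.

```python
import threading

shared = bytearray(1)            # stands in for a one-byte shared buffer
cond = threading.Condition()     # stands in for the wait/notify machinery


def wait_until_changed(index, expected, timeout=None):
    """Sleep (no busy loop) while shared[index] still equals `expected`."""
    with cond:
        changed = cond.wait_for(lambda: shared[index] != expected, timeout)
        return shared[index] if changed else None


def store_and_notify(index, value):
    """Write the cell, then wake any sleeping waiters."""
    with cond:
        shared[index] = value
        cond.notify_all()


results = []
worker = threading.Thread(
    target=lambda: results.append(wait_until_changed(0, 0, timeout=5))
)
worker.start()
store_and_notify(0, 7)           # the "host" side supplies the input
worker.join()
```

The waiting thread is parked by the OS rather than spinning, which is the property the question above was asking about.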


Oh that's pretty neat, didn't know about that.


It's on the embedder (Wasm VM) to provide this functionality. I'm working on a Wasm runtime [0] that is written in Rust and uses stack switching to allow you to call Rust async functions as if they were blocking. This keeps the Wasm bytecode simple (blocking), but at the same time provides high performance i/o.

There is also a proposal to bring stack switching to the browser.

[0]: https://github.com/lunatic-solutions/lunatic


What is the difference between "stack switching" and a fiber based approach?


Fibers are usually implemented with stack switching. It should be the same approach.


I know Absurd SQL[0] uses SharedArrayBuffer and Atomics to turn the async IndexedDB into sync for use by Wasm. I wonder if it’s possible to use that here too, although it’s obviously a little different?

0: https://github.com/jlongster/absurd-sql

1: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

2: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...


For Pyodide I made a proof of concept where you can run async/await with promises.

If people actually develop stuff in Python for the web, they should do something like that.


In my experience, asyncify works pretty well everywhere but Safari on macOS/ARM or iOS/ARM, where the stack sizes are too small to be useful. You do want to be a bit careful about where you block, which can minimize the number of functions that need to be transformed.


> and is also buggy

If you've found any bugs in Asyncify please file them! There are no open issues atm about any general bugs, aside from some corner cases with features like dynamic linking.


You can use Atomics.wait() to block a non-main thread.


Although it's only Python 2.7, cpython-based games have been running in the web browser for years.

https://beuc.itch.io/the-question-web

(I'm the lead developer of Ren'Py, though Sylvain Beucler did most of the work. He also has a 3.8 port here: https://www.beuc.net/python-emscripten/python/dir?ci=tip )


The PyPy team demoed a networked multiplayer browser-compiled Python game at EuroPython in 2006. Anyone remember the details? Seems it has dropped off the face of google.


I don't remember that game specifically, but I assume it used PyPy.js,

https://github.com/pypyjs/pypyjs


Ren'Py is a joy to use and an excellent project.


Thank you.


The homepage that's linked from Github seems to be down? https://renpy.beuc.net/


Yep! I definitely want to build on and learn from existing patched versions of Python running in the web. Do you know what you folks do for synchronous I/O calls?


Of course. It will be nice to have the upcoming Python 3 version of Ren'Py based on the main branch, so we don't have to maintain patches.

Right now, most of the I/O is synchronous - the files are downloaded to the browser before a game starts, so all of the calls are fast, so far as I can tell, as they're happening within the browser.

Output is through SDL, and there's a call into a cython-defined function that calls __emscripten_sleep, with the path of calls to it listed in the ASYNCIFY_WHITELIST. That's the only place we block. (It's a bit late, so I might be misremembering the exact emscripten function.)


It's a pity WASM and all the tech around it wasn't available 20 years ago when Javascript was added to the browser, we'd all be using python now and node.js wouldn't exist at all ;)


It would be just like Javascript, only slower.


Perl missed this great opportunity to compile to JS even before Node appeared…


javascript was in a very sad state during the time Perl was popular


Why would you want such a fast language to be replaced by what is the slowest of the mainstream languages?


Javascript was not fast until Google and others poured money into it because it was the only choice for the browser. Presumably Python would have had the same experience.


It would be at least as fast if it wasn't burdened with decades of legacy API compatibility.


imagine having to lock down the entire python standard library for security concerns. Yeesh.


For Netscape in 1995, I believe Python would have been a better choice than JavaScript, but Lua would have been an even better choice, given how much smaller, simpler, and more efficient Lua is, and the eventual excellence of LuaJIT. (If only Lua indexed its arrays from 0 instead of 1...)

But Python and Lua didn't "look like Java" enough for Netscape.

https://news.ycombinator.com/item?id=17061967

>In 1990, Sun played with the idea of putting a PostScript interpreter in the SunOS kernel.

>Like NeWS was the Network extensible Window System, so NeFS was the Network extensible File System, or NFS 3.0.

>It was actually a great idea, just a wee bit before its time, and very poorly named and positioned!

>For example: If you want to make a copy of a file on the server, you can send a PostScript program that runs in the kernel and copies the file locally on the server in the kernel with ZERO context switches, instead of sending it over the net to the client, then back from the client to the server. Even if you rsh'ed the user command "cp" on the server, it would still incur context switching, but if your copy loop was running in the kernel then it didn't need to switch in and out and in and out for every block it copied.

>There are more examples of why it's a great idea in the paper.

>This comparison of NeWS to AJAX also applies NeFS, which is like kernel NeWS with file operations instead of a graphics library -- it also saves you lots of user/kernel context switches even if you're not doing any networking:

https://en.wikipedia.org/wiki/NeWS

>>NeWS was architecturally similar to what is now called AJAX, except that NeWS coherently:

>>- used PostScript code instead of JavaScript for programming.

>>- used PostScript graphics instead of DHTML and CSS for rendering.

>>- used PostScript data instead of XML and JSON for data representation.

>It didn't go over very well because the unenlightened philistines of the time couldn't get their head around an API to the file system that wasn't compatible with creat open close read write and ioctl.

http://donhopkins.com/home/nfs3_0.pdf

>Network Extensible File System Protocol Specification

>1.0 Introduction

>The Network Extensible File System protocol (NeFS) provides transparent remote access to shared file systems over networks. The NeFS protocol is designed to be machine, operating system, network architecture, and transport protocol independent. This document is the draft specification for the protocol. It will remain in draft form during a period of public review. Italicized comments in the document are intended to present the rationale behind elements of the design and to raise questions where there are doubts. Comments and suggestions on this draft specification are most welcome.

But if not PostScript, Python, or Lua, then at least Netscape didn't use TCL in the browser. Around 1994, long after NeWS and right before Java, Sun announced they were going to make TCL the official scripting language of the world wide web, which triggered RMS into kicking off the Great TCL War:

RMS's "Why you should not use Tcl" flame:

https://wiki.tcl.tk/16730

https://news.ycombinator.com/item?id=17061858

>And with that diplomatically worded message, RMS kicked off The Infamous TCL War. That was Stallman's response to Sun bombastically pushing TCL as the official scripting language of the web, BEFORE Live Oak / Java was a widely known (or evangelized) thing.

>At the point anybody started talking about a Java/TCL bridge, it was already all over for TCL becoming the "ubiquitous scripting language of the Internet".

>Sun's unilateral anointment of TCL as the official Internet scripting language triggered RMS's "Why you should not use Tcl" message, which triggered the TCL War, which triggered Sun to switch to Java.

>After the TCL war finally subsided, Sun quietly pushed TCL aside and loudly evangelized Java instead. The TCL community was quite flustered and disappointed after first winning the title "ubiquitous scripting language of the Internet" and then having the title yanked away and given to Java.

>Any talk of bridges were just table scraps for TCL, the redheaded bastard stepchild sitting outside on the back porch in the rain, smoking a cigarette and commiserating with NeWS and Self.

>Tom Lord's description of what happened is insightful and accurate:

https://web.archive.org/web/20110102015130/http://basiscraft...

>The Infamous Tcl War

>[...] Mr. Ousterhout had, a few years prior, developed Tcl while on the faculty of UC Berkeley - mainly, I think, to have a handy tool for other research and only secondarily as an experiment in language design. And he topped it off with Tk. Tcl/Tk took off in a huge way. It was easy to understand. The source code, written in Mr. Ousterhout's methodical and lucid style, was a joy to read. At the time, about the most convenient option for developing a GUI to run on a unix system was to write C code against the Motif toolkit - an ugly, expensive, and frequently disappointing process. With Tcl/Tk in hand, people started handing out new "mini-GUIs" for this and that, like candy. Tcl/Tk started to find application in some rather intense areas, like, for example, the "control station" software for some oil rigs. It was a smash hit.

>Meanwhile, I don't think I'm letting too many cats out of the bag here, the informal Silicon Valley social network of well placed hackers were quietly and unofficially circulating some very interesting confidential whitepapers from Sun Microsystems. One of their researchers, a fellow called Mr. Gosling, had dusted off a language he'd once led the design of called "Oak". Oak was originally intended for use in embedded systems. Its basic premise was that devices ought to be Turing complete and hackable, whenever possible. Oak's approach to statically verifiable byte-code comes from that origin. Mr. Gosling came out of Carnegie Mellon University and the attitude behind Oak was popular there. As one grad student had quipped a few years earlier: "If a light switch isn't Turing Complete I don't even want to touch it."

>In light of the rising star of web browsers, the folks at Sun conceived the notion of offering up a derivative of Oak to serve as the extension language for browsers. (It is probably worth mentioning here that Mr. Gosling was earlier well known for making one of the very first unix versions of Emacs.) Oak was re-named "Java" and the rest of its history is fairly well known.

>I've read, since then, that up to around that point Brendan Eich had been working on a Scheme-based extension language for Netscape Navigator. Such was the power of the hegemony of the high level folks at Sun that the word came down on Mr. Eich: "Get it done. And make it look like Java." Staying true to his sense of Self, he quickly knocked out the first implementation of Mocha, later renamed Javascript. This phenomenon of Sun's hegemony influencing other firms turns out to be a small pattern, as you'll see.

>Mr. Ousterhout was hired by Sun (later he would spin off a Tcl-centric start-up). The R&D team there developed a vision:

>Java would be the heavy-lifting extension language for browsers. The earliest notions of the "browser as platform" and "browser as Microsoft-killer" date back to this time. Tcl, Sun announced, was to become the "ubiquitous scripting language of the Internet". Yes, they really pimped that vision for a while. And it was "the buzz" in the Valley. It was that pronouncement from the then-intimidating Sun that led to the Tcl wars.

>Mr. Eich, bless his soul, brute-forced past them, abandoning Scheme and inventing Javascript. [...]


Here is Brendan Eich’s HackerNews comment on the topic:

https://news.ycombinator.com/item?id=1905155

(2010, he was still at Mozilla)


Ah yes, it's a shame every website doesn't have to download and execute a python interpreter so it can slowly read scripts. Having the browser freeze after clicking a button would be a way better experience.

WASM is meant to allow for fast/efficient programs written in low level languages like C/Rust to run in the browser, not subject the client with slow clunky experiences because the developer only learned Python.


Guido responded and wondered if this could be integrated into github.dev for Python work in the browser without remote compute. That’s a very cool idea, but I wonder if this would work any better with the usual suspects (pandas, numpy, matplotlib) than the other attempts like pyodide which make more modifications to CPython.


What is the issue with Pyodide?


What is the novelty here?

E.g. we have https://github.com/pyodide/pyodide and other examples.

The cool part here is Emscripten, which has been around for a long time.


It's pretty cool that this latest version uses the latest main branch of CPython directly, without any additional modifications.


Yes exactly - numpy, matplotlib etc available in the browser via sensible JS interfaces could significantly impact the front-end ecosystem IMO. Exciting times and great work from the community.


Alright. To be fair, I've reached the front page on HN by just compiling a C linear programming solver with Emscripten.


And before it we had Python on the browser via ActiveX and ActiveState's plugin.


I imagine the biggest downside of this approach is simply the size of the CPython implementation. Does anyone know how big it is when compiled to WASM?

I wonder if anyone has tried the same approach using MicroPython.


There's a version of CPython 3.6 compiled here (https://wapm.io/package/python) which is ~5mb gzipped.


I guess that would be prohibitively large for some uses.

I’d expect MicroPython (or Lua/mruby/etc) could be an order of magnitude smaller. Still larger (and slower) than just using JavaScript, though.


> I’d expect MicroPython (or Lua/mruby/etc) could be an order of magnitude smaller. Still larger (and slower) than just using JavaScript, though.

Fengari [0], a Lua interpreter written in JS, is a little over 200Kb. (And was intentionally written in JS [1] because of a variety of reasons that made WASM not work that well.)

200Kb isn't that bad of a price to pay to switch languages, on most websites. It'll be about the cost of a single image added to the page. And it's fairly performant.

For most sites, the costs in terms of requests and performance will be negligible compared to what you're trying to achieve.

And Fengari makes it nice and easy to interact with JS, too. Using React with Lua's syntax was what sold me on it. No ecosystem lockout, like I'd expect with most WASM ports.

[0] https://fengari.io/

[1] https://hackernoon.com/why-we-rewrote-lua-in-js-a66529a8278d


I once had to squeeze CPython down for embedding into a mobile app. I ran our workload under strace so I could include only the needed parts of the stdlib, and ended up with just under 3MB zipped. That's probably about the theoretical size limit.
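A different, pure-Python way to get a first approximation of "which modules does my workload need" (this is not the strace approach described above) is to inspect `sys.modules` after running it. The function and workload names below are hypothetical.

```python
import sys


def top_level_modules_after(workload):
    """Run `workload()` and list every top-level module now loaded."""
    workload()
    return sorted({name.partition(".")[0] for name in sys.modules})


def example_workload():
    # Hypothetical workload; a real one would exercise the whole app.
    import colorsys
    colorsys.rgb_to_hsv(0.2, 0.4, 0.6)


needed = top_level_modules_after(example_workload)
```

This only sees modules that were actually imported during the run, so code paths the workload never reaches are missed. It also says nothing about non-module files such as data files, which tracing file access would catch.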


This is it. This is the beginning of a revolution. Prepare your fork-picks.

Jokes aside, JS ecosystem really needs a competitor. Web developers have been cutting corner after corner for decades, with ever increasing disregard for performance and memory consumption.

Now with both Python and Rust in the browser, things may change for the better.


CPython is orders of magnitude slower than modern JavaScript engines. Especially so when the interpreter is compiled into WASM.

It is interesting from an interoperability point of view, but this is purely negative from a performance and maintainability point of view.


I’ve been primarily a Python web dev, focused on Django.

But in picking up modern frontend, even in the past two years, the frameworks and toolchains have matured or simplified a great deal.

The training options are plentiful. To some extent, Node has Deno “competing” to add features and provide performant backend.

Esbuild, for example, is built on rust and unlocked massive performance improvements in bundling.

It seems like JS is in a better position than it has ever been.


Esbuild is written in Go. You might have confused it with SWC, which is written in Rust and offers similar (or better) performance.


Thank you for the correction.


I really want you to be right, but what incentives does your run-of-the-mill web dev have for switching to WASM + (pick language here)?


Classic web-dev people have little incentive.

All others have lots. That's the point.



This looks exciting. If anyone knows how to, perhaps they can report how much space this uses in the browser? Some benchmarks? Someday soon I hope we can use this for developing code on the browser to aid with web applications.


Hi, author of the tweet here

It is definitely too early for benchmarks, this is a "I got it working!" update.

The original data file with all of the standard library was a bit over 200MB. Slashing what isn't going to be run in the browser (e.g. tkinter) and zipping the standard library got it down to about 20MB. There is probably more that could be removed, and there are modules we don't need to build that we currently do. There are other things we can do like set the less frequently used modules to be loaded asynchronously.

While I doubt this will be production ready "soon", I do hope to keep working on fixing bugs and such.
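For anyone wondering how a zipped standard library can still be imported: Python can import modules straight out of a zip archive placed on `sys.path`. Here is a minimal sketch of that mechanism using a throwaway archive; the module name `hn_demo_mod` and the file name `bundle.zip` are made up, and this is not the build setup used for the demo.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a tiny archive containing one module (stand-in for a zipped stdlib).
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "bundle.zip")
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("hn_demo_mod.py", "ANSWER = 6 * 7\n")

# A zip file on sys.path is searched much like a directory would be.
sys.path.insert(0, archive)
importlib.invalidate_caches()
demo = importlib.import_module("hn_demo_mod")
```

Only pure-Python modules can be loaded this way; compiled extension modules cannot be imported from inside a zip.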


How does that compare to Brython [1]?

[1]: https://brython.info/


Brython is a complete re-implementation of Python, thus it doesn't support some features/libraries (at least, it didn't when I tried it last), and is not compatible with C extensions.

The demo I put in the tweet is the same code as when you type `python3` in the terminal, just running in the browser. So it is much more compatible and is mostly [1] feature complete.

[1] minus whatever libraries are likely never to be used that we ripped out


See "Running Python in the Browser" [0] - I believe this approach is similar to Pyodide's.

[0] https://yasoob.me/2019/05/22/running-python-in-the-browser/


Very cool, I did some experiments with libxml on a WASI environment a few years ago (https://github.com/matiasinsaurralde/wasm-libxml2).


This is like a reverse cloud system. Don't run on the cloud, or in your OS, run in your browser.


So, Castle in the sky system?


How is it directly replacing the JS REPL in the console? I didn’t know that was possible.


It's not. The output appears in the console but input is provided in the page.


Is there DOM access?


Since WebAssembly doesn't have DOM access, the obvious guess would be no.


If I'm not mistaken, once interface types are added, the DOM can be made accessible to WASM.


WASM can already access the DOM by calling out into Javascript. AFAIK interface types wouldn't change that approach much, except that the mapping between numeric ids and Javascript object references doesn't need to be handled by the JS shim anymore.

So the answer to the question "Can WASM access the DOM?" is both yes and no, always has been, and probably always will be ;)
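The "mapping between numeric ids and JavaScript object references" can be pictured as a handle table. The sketch below models it in Python purely for illustration: `HandleTable`, `host_set_text`, and the dictionary standing in for a DOM node are all invented, and a real shim would be written in JavaScript.

```python
class HandleTable:
    """Maps small integers to host objects, as a JS shim would for Wasm."""

    def __init__(self):
        self._objects = {}
        self._next_id = 1

    def register(self, obj):
        handle = self._next_id
        self._next_id += 1
        self._objects[handle] = obj
        return handle

    def get(self, handle):
        return self._objects[handle]

    def release(self, handle):
        del self._objects[handle]


table = HandleTable()

# Pretend "DOM node" living on the host side; the module only sees the int.
node = {"tag": "div", "text": ""}
h = table.register(node)


def host_set_text(handle, text):
    """Stand-in for an imported function the module calls with an id."""
    table.get(handle)["text"] = text


host_set_text(h, "hello")
```

The module never holds the object itself, only the integer, which is why the answer is "both yes and no".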


Would it be possible to use Python directly in say Chrome, assuming someone pushed a PR for chromium to do this?


The beauty of WebAssembly is that you don't need Google's permission to add support. Just send your Wasm blob to the browser and Chrome's existing Wasm runtime will just run it.


My knowledge is years out of date, but does WASM still require the application to request its maximum memory footprint up-front? Granted, that's what Sun/Oracle's JVM has been doing to allocate its heap from the OS for well over a decade, but I'm also not aware if WASM is able to use the equivalent of madvise() to tell the browser/OS that it's fine to unmap a region of memory and map it back zeroed-out when it's next needed.


Yep, you need to specify the maximum memory amount up-front. It's defined in "WebAssembly memory pages"; each page is 64 KiB. You need to specify an initial and a maximum amount. The WebAssembly module can call memory.grow() to grow it by a page until it reaches the maximum. Though you can't "un-grow" or decrease the amount of allocated memory.


This is not correct; it is not necessary to specify a maximum memory size. See the WebAssembly specification https://webassembly.github.io/spec/core/syntax/modules.html#.... Due to the 32-bit address space, however, the maximum memory is limited to 4 GB.

(In asm.js, memory was provided by an ArrayBuffer of fixed size, so there memory could truly not grow at runtime.)
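To put numbers on the page size and the 32-bit limit, here is the arithmetic as a small sketch. The 20 MB input is just the rough standard-library size mentioned elsewhere in this thread, used as an example.

```python
PAGE_SIZE = 64 * 1024          # one WebAssembly page is 64 KiB
ADDRESS_SPACE = 2 ** 32        # 32-bit linear memory addresses (4 GiB)

max_pages = ADDRESS_SPACE // PAGE_SIZE


def pages_needed(num_bytes):
    """Smallest number of pages that can hold `num_bytes`."""
    return -(-num_bytes // PAGE_SIZE)   # ceiling division


stdlib_pages = pages_needed(20 * 1024 * 1024)
```

So a 32-bit memory tops out at 65536 pages, and holding 20 MiB of data takes 320 of them.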


You'd need a pretty solid reason to want users to download the Python engine to run your code in their browser every time they visit an updated version of your site though. "I like writing Python better than JS" would be a sucky excuse.

If anyone does choose to do this I hope they spend significant amount of effort making their caching and code splitting optimal.


Given that there were people paying ActiveState for their Python ActiveX, I assume there are enough people that care enough for such use cases.

Many devs already make me download their SPA to display static text and images.


Say what you will about Jupyter notebooks and all that, but the talent pool for Python is still at a higher level. Then, this could be the worse-is-better equilibrium, but there's also a market-for-lemons situation regarding web dev these days.

https://en.wikipedia.org/wiki/The_Market_for_Lemons


My impression was that top dollar was being paid for web devs, with competition from some of the biggest tech giants driving the trend. Not sure that's a market for lemons, unless you are talking about the lower end of web dev.


What's the caching story like? Is it possible to cache the Python interpreter in one (unchanging between apps) blob and then send another blob with your app-specific code? I’m imagining a world where lots of apps want to use WASM Python but don’t want to have to ship the whole interpreter with their page.


Cross-Origin resource caching has been disabled by all modern browsers by now. So your page could use the same cached Python interpreter again and again, but you and example.com would each have to download the interpreter, even if it comes from the same URL.


With normal Wasm blobs that’s more of an issue. It wouldn’t be an issue in this case because you would just pass your Python script in as text.


There is this: https://pyodide.org/en/stable/console.html Not sure what you mean by Python in Chromium, however; I assume Pyodide may already be doing this.


I want Python to be used the same way JS is used. As in a native 3.7 interpreter in Chrome


I don’t see how that could be done without nerfing a lot of the power of JS. JS was bred by the web and is entirely async. Python has no concept of callbacks or promises etc. JS is perfect for frontend design because that’s what it was created from the ground up for


Python has always had the ability to do callbacks (ie. first class functions), though admittedly it's not a common pattern and the stdlib doesn't support it for IO (also its anonymous function syntax is poor).

There have been several libraries that provide the concept of a promise, but with modern python these are built into the stdlib with native async/await support, though many interfaces are still sync-only (though you can work around that with things like gevent that monkey-patch those interfaces to work with an event loop under the hood).

I agree that python would be a poor fit for the kind of uses that JS in the browser usually serve, but it's really a matter of ecosystem, not core language design.
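A small sketch of both styles using only the standard library's `asyncio`: a done-callback attached to a task (roughly what `.then` does on a JS promise) and `await` on the same task. The names `fetch_value`, `on_done`, and `log` are made up for the example.

```python
import asyncio

log = []


def on_done(task):
    # Plain callback: functions are first-class values in Python.
    log.append(("callback", task.result()))


async def fetch_value():
    await asyncio.sleep(0)      # yield to the event loop once
    return 42


async def main():
    task = asyncio.create_task(fetch_value())
    task.add_done_callback(on_done)   # promise-style callback
    value = await task                # async/await style
    log.append(("await", value))
    await asyncio.sleep(0)            # give pending callbacks a turn


asyncio.run(main())
```

Both entries end up in `log` with the value 42, so the language-level machinery is there; what differs from the browser is mostly which libraries are written against it.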


Python would be optional here.

You could still use JS while Python support catches up.


They did wasm to not do it and have it at the same time.

What they can do is to provide GC to unbloat a lot of those runtimes I guess.


i don't know the story under wasm, but i looked into what it would take to embed python into a browser years ago.

the hard part at the time was obviously all the hooks between the dom and the javascript runtime as well as concurrency story. python 2 was not built to be driven by callbacks, which is how the whole browser/javascript ecosystem works.


web assembly is making leaps. looks great


Nah, just catching up with what was already possible with ActiveX, Applets, Flash and PNaCl, just with more cross-browser support and political acceptance across all parties.


That’s nice.


[flagged]


Careful, you should reread the HN posting guidelines - specifically "Please don't use HN primarily for promotion. It's ok to post your own stuff occasionally, but the primary use of the site should be for curiosity." This site is for discussion, not promoting your startup, and a large portion of your comments have been just that.


Thanks for the feedback, but I'm doing this to try to help game developers solve a major problem, half of the show HN and 100% of the launch HN are startups trying to make money anyways and the point you're trying to make never gets brought up.

Think about it this way - if it solves a very real pain point for developers, which is the primary demographics of this site, then why try to suppress that..?


Because if they didn’t ban ads like yours, nobody would use the site.


So do a Show HN? It seems totally off topic for this thread.


Tried many times and it just gets buried


Maybe take that to heart then, and figure out what's missing in your messaging (or audience? there's intersection with game development here, but this is not a forum for game development).



