Populating the page: how browsers work (2020) (developer.mozilla.org)
277 points by sigma5 on Oct 13, 2023 | 36 comments



That was really informative, but it raised a question for me:

It described the browser as single threaded, but then talked about multiple concurrent tasks. Aren't those threads?

One more question: are there any browsers that use multiple threads to lay out the various object models and render the page? If it's been found to be too difficult, what were the issues?


To paint in broad strokes: the layout phase (~= take the HTML, take the CSS, determine the position and size of boxes) is largely sequential in production browser engines today. Selector matching (~= which CSS applies to which element) is parallel in Firefox today, via the Stylo Rust crate originally developed in the research browser engine Servo. Servo can do parallel layout in some capacity (but doesn't implement everything); https://github.com/servo/servo/wiki/Servo-Layout-Engines-Rep... is an interesting and recent document on the matter.

Parallel layout is generally considered to be a complex engineering problem by domain experts.

Rendering the page, as in deciding the colour of each pixel and putting them on the screen, based on the layout, style, and various other things, can be done with lots of parallelism, on the CPU or on the GPU (that is preferred on most platforms in production browser engines, these days).

https://hacks.mozilla.org/2017/08/inside-a-super-fast-css-en... is a really cool related article; it's a few years old, but what it says is largely correct today.


Cool, thank you for the informative comment!


This 4-part series is related and excellent. https://developer.chrome.com/blog/inside-browser-part1/


MDN is just simplifying and describing the browser from the web developer's perspective.

Browsers are not literally implemented in a single thread.

Javascript execution is determined by an event loop. Events are queued up by whatever means the browser implementer wants as long as there is only a single thread actually consuming the events. This is where the notion of being "single-threaded" originates. The web developer assigns handler functions to event listeners which are called by this consuming thread later when the event occurs.

This kind of concurrency is cooperative multitasking. The code is executed asynchronously, but not in parallel.

The renderer is the entry point since the HTML contains the CSS and JS tags. Generally speaking, the HTML is rendered line by line, in the order the tags are written, but in practice some aspects deviate from this, such as the "async" and "defer" attributes on script tags, as well as any HTML or CSS requiring network requests that cannot block rendering the rest of the page (img tags, url() CSS values, etc.).

Naturally this ability to make network requests is implemented as a thread pool (at least in modern browsers), but any Javascript waiting on that would not execute until the event is consumed and its handler is called, which preserves the illusion of being "single-threaded". As for loading images, fonts, etc. from CSS/HTML, the developer cannot control when they are loaded and rendered. Anything that really does need threads is handled by the browser already.
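A minimal sketch of how that plays out in browser JavaScript (the URL is just a placeholder):

  console.log("1: the current task starts a request");

  fetch("/some-resource")            // the network I/O itself happens off the JS thread
    .then((response) => {
      // This handler is queued as an event; it only runs once the single
      // JS thread is free to pull it off the event loop.
      console.log("3: handler runs later, on the same thread");
      return response.text();
    });

  console.log("2: the rest of the current task runs to completion first");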


Concurrency can be done on a single thread


My understanding is that single-thread concurrency is essentially what Javascript does. It basically flickers between tasks very rapidly to simulate concurrency. Does that match your understanding or am I incorrect?


I don't think that "flickers between tasks very rapidly to simulate concurrency" is a good mental model for event loops. It's more like "runs one task at a time until it hits a suspension point," where a suspension point is something like an I/O operation. If you had an event loop that switched tasks between suspension points, then you'd still need locks for shared data.


It doesn’t simulate, and the “flicker” is named “event loop”, but otherwise you’ve got it right. The concurrency model is essentially cooperative, ie pending tasks wait for the current task on the event loop to unblock, and then they are each executed in turn (with several different scheduling priorities based on how they became pending, eg synchronous event callbacks, Promises, various timers).
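A rough sketch of those scheduling priorities, in ordinary browser JavaScript:

  console.log("A: synchronous code in the current task");

  setTimeout(() => console.log("D: timer callback, runs in a later task"), 0);

  Promise.resolve().then(() => console.log("C: promise reaction, runs before timers"));

  console.log("B: the current task always finishes first");
  // Logged order: A, B, C, D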


> It basically flickers between tasks very rapidly to simulate concurrency.

That's usually used as a description of preemptive multitasking, like what you get running a modern OS on a single core (on multiple cores it's the same thing, just on each core). Every so often, the current task is made to inhale chloroform, and another task is chosen to wake and take its place. Pro: no single task can hang the whole system; con: shared memory is very difficult, verging on impossible, as you never know when you'll be interrupted.

Browser JavaScript and UI interactions instead use cooperative multitasking, which you’ll also find in classic Mac OS, 16-bit Windows, basically every embedded system ever, and languages like Python and Lua. A task has to explicitly call out to the scheduler to hand off execution resources to the next one. Pro: as task switching only happens at clearly visible points, many of the horrors of shared memory are diminished; con: you can and will hang the UI (or other outside-world interactions) if you accidentally write an infinite loop or just do too much CPU-bound work without yielding (as any developer using one of the aforementioned systems knows).
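A small illustration of that trade-off in browser JavaScript (the amount of work per chunk is a placeholder):

  // A long synchronous loop never yields, so nothing else (input handling,
  // rendering, other callbacks) can run until it finishes: the page freezes.
  function blockForFiveSeconds() {
    const end = Date.now() + 5000;
    while (Date.now() < end) { /* busy-wait */ }
  }

  // Splitting the work into chunks and yielding between them via setTimeout
  // hands control back to the event loop so other tasks can run in between.
  function doWorkInChunks(remainingChunks) {
    // ... do one small slice of the work here ...
    if (remainingChunks > 0) {
      setTimeout(() => doWorkInChunks(remainingChunks - 1), 0);
    }
  }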

For how this works in the browser environment specifically, see Jake Archibald's superb talk[1].

[1] https://youtu.be/cCOL7MC4Pl0


You're mixing up concurrency with parallelism.



Neat. I'm trying to write a browser, of a kind, which is focused on only presenting content, navigation, and an index of content where appropriate. I'll try to incorporate things like this into my thinking.

(Browser is here, very early stages, not appropriate for any kind of use really while I move to fltk: https://github.com/jmthackett/freeflow )


Your project looks really neat. I wish for a browser that only connects to localhost and never makes network calls under any condition, so that it's purely the interface to a desktop app without compromising security or privacy. A less important wishlist item is for browsers to render a DOM from an extended markdown instead of HTML, so that accessibility and semantics are always correct by default.


I have been looking for exactly this sort of thing for ages, I'm so glad to have found your comment!


Thanks! It is still early days and I don't have much time but I'm telling myself I'd like to be able to daily drive my news sites with it eventually.

Annoyingly it segfaults a bit at the moment, but it'll get there with time.


You can’t change the internet. You could change the browser. Nice.


Neat! Sounds like the kind of thing that would pair well with RSS feeds. I'll definitely keep an eye on this!


Thanks! It uses RSS if it is present. Plus sitemap.xml if that's available, too.


Fun read that puts a lot of things together that I "sort of knew" but never really knew.


It's funny how most diagrams describing DNS or TCP look like they were made in the 2000s (and probably were)


The diagrams in this article are pretty bad no matter when they were made.


Hey web perf folks: the visa.com waterfall image on this page... what was used to generate it?

Also, I love mdn. To this day it's the best technical writing about web development. Every concept is so well explained.


The waterfalls match the ones generated by WebPageTest, though they could be using a third party tool themselves


It's a nice article.

I would like one that goes a little bit deeper into the initial part of the browser-server interaction (but is still readable in one sitting), touching on things like the headers sent by the browser.


Maybe not what you mean, but the "browser sends an initial HTTP GET request" step can be thought of as a text file that is sent. The first line is the request method as a word, along with the path and the HTTP version; the lines after that are the headers:

  GET / HTTP/1.1
  Host: www.example.com
  User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36
  Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
  Accept-Language: en-US,en;q=0.5
  Accept-Encoding: gzip, deflate, br
  Connection: keep-alive

The response is similar, but starts with the status line, then the headers, then the body (not the HTML body, but the payload of the response):

  HTTP/1.1 200 OK
  Date: Fri, 13 Oct 2023 15:04:05 GMT
  Server: Apache/2.4.41 (Unix)
  Last-Modified: Mon, 9 Oct 2023 16:30:00 GMT
  ETag: "2d-58e4-4d5f487a7b300"
  Accept-Ranges: bytes
  Content-Length: 1024
  Content-Type: text/html
  Connection: keep-alive

  <html>
  <head>
  <title>Example Page</title>
  </head>
  <body>
  <h1>Welcome to www.example.com!</h1>
  <p>This is a sample webpage.</p>
  </body>
  </html>
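If you want to poke at the same pieces from the browser side, fetch() exposes the status line and headers of a response. A quick sketch (the URL is a placeholder; top-level await works in module scripts and the devtools console, and the browser hides a few headers from scripts):

  const response = await fetch("/index.html");
  console.log(response.status, response.statusText);   // e.g. 200 "OK"
  for (const [name, value] of response.headers) {      // the header lines
    console.log(name + ": " + value);
  }
  const body = await response.text();                  // the payload that follows the headers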



Same. I remember first reading about all of this back around...2013 maybe? But it blew my mind at the time. I want to say that the article I read was that last one from Paul Irish, but I'm fairly certain the author of the article was a woman. She had dug deep into Chrome's internals to figure out the rendering pipeline and mostly focused on that. It was a really neat article though it might be slightly out of date at this point.


Probably this: http://taligarsiel.com/Projects/howbrowserswork1.htm

Paul's article references it


Ah yep, that's the one. Thanks!



Would be fun to read an article about how websites abuse browsers. I keep getting frustrated with https://www.macrumors.com and its insane amount of network calls, ads, bits of javascript, and other endless connections. Browser features started off so easy to understand, and now it's crazy. I had to write a webapp for the USPS in 1998 using IE 2, which had almost no features, and occasionally the features they had actually worked...


It'd be nice if browsers would show an objective, stats-based score when you visit a website on how much bloat there was.

Shame bad websites and make them chase "green" scores.


Speaking of green, if your browser could tell you “this webpage has consumed X watt-hours of electricity and has cost you $Y in electricity”, that'd be amazing. I think Apple devices could actually do it.


https://developer.chrome.com/docs/lighthouse/overview/

Lighthouse is already built into Chrome amongst many other tools. Lighthouse scores are already a factor for search result rankings. The fact of the matter is, those junk sites don't care.

Even if these scores were exposed to the user by default instead of hidden away in a dev tool, it's just not a strong enough incentive for those sites to change or users to care. Rankings are more strongly affected by adwords, time-to-first-byte, link juice, etc.


Field metrics as measured by the Core Web Vitals / CRUX are a factor in search ranking, not Lighthouse.



