We rendered a million web pages to find out what makes the web slow (catchjs.com)
182 points by simonpure on Jan 26, 2021 | 163 comments



This study is flawed for a few reasons that I'd mentioned to the author when it was first published, but they ignored my comments.

The main problem is that they confuse DOMInteractive with Time To Interactive. The two events are very different. DOMInteractive is a legacy event that refers to the time when JavaScript can safely interact with the DOM. This really has nothing (or very little) to do with user experience.

Time to Interactive, on the other hand, is the time when a user can expect the page to respond quickly to interactions (clicks, scrolls, keypresses), i.e., these user interactions should have no user-perceivable latency. This is not a metric that has a corresponding browser event. It requires a bunch of post-processing based on various events. The researchers have made no attempt to calculate TTI.
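
For concreteness, DOMInteractive is just a timestamp you can read straight off the Navigation Timing entry (rough sketch):

  // DOMInteractive is a single Navigation Timing timestamp:
  const [nav] = performance.getEntriesByType('navigation');
  console.log('domInteractive (ms):', nav.domInteractive);
  // TTI has no equivalent field; it has to be derived from paint and long-task data.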

Source: I've been working in this area of research for 15 years.


How would you suggest measuring this? The duration between TTFB and when user events first get pumped?


The simplest way to measure impact on users would be to just use boomerang, which gets you all of this out of the box: https://github.com/akamai/boomerang

To do it yourself, you need a combination of PaintTiming and measuring LongTasks, or to study the periodicity of setInterval/requestAnimationFrame.
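
A rough sketch of the DIY route, assuming a browser that exposes the paint and longtask entry types:

  // log paints and main-thread blockage; a TTI heuristic then looks for a quiet
  // window (e.g. 5s with no long tasks) after first contentful paint
  new PerformanceObserver(list => {
    for (const e of list.getEntries()) console.log(e.name, e.startTime);
  }).observe({ type: 'paint', buffered: true });
  new PerformanceObserver(list => {
    for (const e of list.getEntries()) console.log('long task', e.startTime, e.duration);
  }).observe({ type: 'longtask', buffered: true });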

The above works for synthetic as well as real users. If you're able to measure real users directly, then you'd also want to look at Input Delays, Rage Clicks, Dead Clicks, Missed Clicks, Scroll Jank.


> I've been working in this area of research for 15 years

Is the outcome of the research by CatchJS different from what you have learnt?



The article talks about a million web pages, but it's actually the top million domains. This is an important difference because the top sites could be completely different to the web pages that we view.

For example, Facebook is probably high up the list. But which Facebook page do they measure? The 'front page' of a non logged in user, presumably? That would be vastly different from the majority of FB users. Likewise, Wikipedia is also an extremely popular site. But surely most users are looking at specific pages of interest, and not just the front page?


This reminds me of an experiment [1] we ran a couple of months back. We crawled the top 100 Alexa websites and checked the bloat in the images served to billions of users.

[1]: https://optidash.ai/blog/optimizing-images-on-the-worlds-top...


That's really cool. It would be interesting to see what that 32% savings adds up to, both at typical market bandwidth rates and at AWS-like egress rates.

Curious, did your comparison ensure that none of the images lost any detail, etc? Or how much "lossiness" did you introduce to get the 32%?


Thanks for chiming in.

It's a perceptually lossless optimization and recompression.

We use saliency detection (trained on an eye-tracker) which tells us where the human vision system would look in the images and optimise those fragments (heatmaps) using our own comparison metrics.

If you're interested in the details shoot me an email to przemek [at] optidash [dot] ai


Thank you for sharing this.

It's interesting to see that even these big websites potentially still have a lot of room for improvement in their loading times.


The speed correlation with HTTP 0.9 and HTTP 1.0 is interesting. While it is probably more the case that newer protocols are serving newer content, which is slower for myriad reasons, I find myself wondering if there are interesting correlations in what's being served by the older protocols. Is it the case that content on the older protocols lives in an intersection of being mission critical to somebody (so it has never gone away through simple lack of maintenance) and being either sufficiently functional as-is that nobody has seen a need to upgrade its infrastructure, or too deeply tied into a consumer that assumed quirks of the protocol for upgrading to be tractable?

It would be interesting to get a drill down on what is being served on those protocols.


I'd be willing to bet (a very small sum) that the HTTP 0.9 and 1.0 servers encountered are app servers/frameworks. They're simpler to implement than HTTP 1.1 and don't set expectations on the client they can't meet.

When you've got a fleet of machines behind load balancers you don't need things like a Host field to support vhosts since it's one site to a host. You also don't need pipelining because each connection is a one and done operation.


I would read that followup article.


It's a blog article, yet it has no date on it. Why are these sites so obnoxious and secretive? Let us know when the blog was written.

34 days ago:

https://news.ycombinator.com/item?id=25517628 (115 comments)

https://itnext.io/we-rendered-a-million-web-pages-to-find-ou... (Dec 22, 2020 - repost from CatchJS)

WRITTEN BY: Lars Eidnes, Chief error catcher at https://catchjs.com


It really shows why big tech owns the keys to the internet. We hand them all our browsing history.


The sites we visit do. I didn't tell spammers my phone number either, but every org that requires that field to be filled thinks it has an obligation to sell it to them. Skipping a few steps, it is a trust problem. We don't trust each other, so money has to be spent on becoming the most trusted and most wanted option in open markets/SERPs. We have to trust big tech because staking money is the only way to that trust, and the biggest money is concentrated in big tech. What should be done?


> There's a handful of scripts that are linked on a large portion of web sites. This means we can expect these resources to be in cache, right? Not any more: Since Chrome 86, resources requested from different domains will not share a cache. Firefox is planning to implement the same. Safari has been splitting its cache like this for years.

[most of the top common resources are Google (and FB) ad tracking]

I read this as Google is willing to spend millions upon millions to move huge amounts of additional unnecessary network traffic to make sure only they can reliably track most people across the whole of the web.


Even if Google didn't have a stake in targeted advertising, shared caches lead to easy identification - browsers have a duty to close off any avenue by which users can be tracked.


I don't use any externally hosted scripts on sites I develop.

Security is part of the reason, but the bigger problem I find is tying my uptime to somebody else I have no control over.


Gotta spend money to make money


My guess would've been 1) ads / tracking and 2) bloated JS frameworks.


What is Amazon Publisher Services, is it a tracking script or something? I've never heard of it.


Basically their version of AdSense. Integrate Amazon shopping ads into your content.


https://ublockorigin.com

Install this wonderful extension to save hours of your time and gigabytes of your bandwidth.


Any good alternative for Safari?

I knew I would get alternative browser recommendations because of my phrasing... but I was also looking forward to them!


As far as Mac Safari recommendations, I've tried most of them. Better seems the leanest and most effective for me. I have no relationship with the developer.

https://better.fyi

From their website: "Better is hand-crafted by Small Technology Foundation, a tiny two-person-and-one-husky not-for-profit striving for social justice in the digital network age. We curate our own unique blocking rules for Better based on the principles of Ethical Design."

Nice app.

Be aware, Better actually allows most compact, low-compute, non-tracking ads. Anyone who wants to serve me respectful ads that don't abuse my privacy or my compute resources is absolutely welcome on my system. Happy to help. Non-respectful ads are not welcome.

While I'm promoting small Indie browser extension makers, I also like the StopTheMadness extension. This kills lots of rude click/function hijacking that is done by many obnoxious web pages. It also stops a lot of tracking code. Again, I have no relationship with the developer.

https://underpassapp.com/StopTheMadness/

Between the two of these, browsing becomes much less user hostile.


I find Firefox to be a great alternative for Safari.

I joke, but from what I know it is only available for Firefox, and it either will soon stop working on Chrome or does not work there already.


uBlock Origin works fine on Chrome today, as it has for years. With their new "no third party cookie" BS we might see it break, but until then I don't think they can be so user hostile.



Remember the wall of text and flame wars posted on Slashdot whenever the hosts file is mentioned.


Firefox. But kidding aside, from what I've found, the ad blockers for Safari require payment, and I believe they are quite inferior to uBlock in terms of blocking.


If you are in the "I wish Safari blocked ads" boat but don't know where to go, let me recommend an out…

There are a lot of reasonable choices; I've been using Wipr for years. It does a good job of blocking ads and costs about $2. There are macOS and iOS versions, they update their lists, and because of the architecture it has no access to your browsing, so it can never be tempted to start farming you.

(No affiliation, I haven't done an extensive comparison, but this one works and isn't expensive and you can stop worrying about what to do if you do this now.)


Have to agree here. I actually quite like Safari, but I can't use it due to the lack of uBlock Origin, which is unfortunate. uBlock and Facebook Container have made all the difference for me.


I use AdGuard, a content blocker that supports the major ABP lists like EasyList, Fanboy's, etc. I wish there were a content blocker that supported automatic translation of ABP lists into content blocker rules—I've been working on one but haven't found the time to finish it.


To expand on AdGuard: you can self-host their DNS blocker via Docker. It works really well and is, IMO, a much nicer experience than trying to get Pi-hole perfect.


For iOS: download Firefox Focus and set it as Safari's content blocker.

For desktop & mobile: use something like NextDNS/AdGuard or Pi-hole, and consider installing it on your home network's router.


I installed Magic Lasso and Firefox independently, and also together, and together they were basically perfect.

(It's been 2 years or so; for all I know they have both advanced to the point of being sufficient alone - but I did not experiment, since everything just works so well.)


Wow, never knew FF Focus could be a content blocker on iOS. I'll try it over Wipr.


I run Wipr on Safari. It's quite cheap, and since I don't use Safari as my main browser, the fact that there's nothing to configure is a bonus.

It was also easier to install it on my parents' computers than to convince them to change browsers.


I've developed my own blocker for Safari, mostly because I was tired of other blockers breaking too many sites for me. So, it is designed to be less aggressive in filtering.

Give it a try if you want. It's free. https://ads-free.app/Ads-Free!%20Desktop/


AdGuard is a free extension that, as far as I can tell, is on-par with uBlock Origin. It's been great so far.


I've flip-flopped between uBlock and AdGuard for a while now. Generally I find AdGuard to be slightly better, and it has a much nicer UI.

It also has a neat broken-site reporting system which automatically generates a GitHub issue, from a simplified form, to fix the filter lists. It automatically prioritises sites via an algorithm, probably their Alexa ranking or something similar.

https://github.com/AdguardTeam/AdguardFilters/issues

I've found the issues get fixed pretty quickly too.

They've got an iOS app as well which integrates with the Safari content blocking system.


uBlock or uBlock Origin? They are not the same thing.


Sorry, uBlock Origin


Not sure how applicable this is to Macs but I found that blocklists based on the hosts file cover almost all ads. The only exceptions I encountered were YouTube ads as well as ads hosted directly on the website, which is pretty rare nowadays.


https://github.com/notracking/hosts-blocklists

Use this for network wide blocking of all sorts of virtual garbage. Not only for safari, but all your locally connected devices.


Safari on iPhone uses a system-wide ad-block list, so no need for browser plugins.

'1block' [1] kills even the video ads on YouTube for free. 'Firefox Focus' has built-in ad blocking but does not get the YouTube ads, AFAIK.

[1] https://apps.apple.com/app/id1365531024


Just use NextDNS. DNS blocking is just so much more efficient, and it gets the job done 98% of the time.


AdGuard; it's also usable for free and doesn't have shady deals with ad companies.


It's 2021. Who isn't using an ad blocker?


I have been pressing this for around 15 years; meanwhile the percentage of ad-block users has actually gotten smaller.

I wonder if the fact that the most popular ad-tech company also produces the most popular mobile OS has something to do with it.


Does this number include mobile browsers? Given that the big mobile browsers do not have ad blockers (or even extensions altogether?), that would explain it.


Mobile Safari has had content blockers for years.


And yet, surprisingly, almost no one is aware of it.

I have informed many of my technophile, uBO/ABP friends and colleagues that it’s possible, and not one of them was aware before I told them.


Same here. Then again, a lot of "technophile" folks are in webdev and favor a vanilla browser experience.


Pretty sure that mobile and the inclusion of masses of tech-unsavvy users are responsible.


I don't use uBlock. But I learned how to write Chrome extensions, and it turned out to be extremely easy to insert my own CSS and JS snippets into selected pages. So I just added a few URL filters to remove the most obnoxious tracking and ads, a very few CSS edits to selected websites to remove popups, and some JS on YouTube to remove its ads. The web is pretty fast and usable for me. I did not cut every ad, but I don't often browse new websites and I'm okay with some ads as long as they're not very bad.
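
The moving parts are tiny; beyond the manifest's content_scripts matches, the whole thing is roughly a content script like this (the selectors here are made up):

  // content.js, declared under "content_scripts" for the URL patterns I care about
  const style = document.createElement('style');
  style.textContent = '.newsletter-popup, .cookie-banner { display: none !important; }';
  document.documentElement.appendChild(style);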

The reason I don't use uBlock is because I think that it's overkill for me to run thousands of filters for every website in the world. And I also like the fact that I'm in control of my user agent. For example, I recently turned off a feature on some website which paused the video when I switched to another tab. I did not like that feature, so I disabled the corresponding JS handler, simple as that.


> The reason I don't use uBlock is because I think that it's overkill for me to run thousands of filters for every website in the world.

In return you get thousands of lines of tracking and advertisement JavaScript running on your machine for almost every website in the world.

Is that better? ;)


I don't use an ad blocker - I feel bad since ad revenue is the only thing most of these sites have (OTOH, I don't run ads on my own blog because I don't like what the ad-supported internet has become).


You should reconsider. Facebook made more than $80 billion in revenue, a majority of it through ads.


75% of internet users


And what percent of those are real humans?


Almost all of them I assume? Browser ad blocking is much less widespread than people seem to think, otherwise there wouldn't be so many ~trillion dollar companies built around tracking and advertising on the internet.


Ads are usually the number one reason why simple "news" pages sometimes take dozens of seconds to finish loading (and in some egregiously engineered designs, the page is unusable until every last script, ad and font has loaded), spawn several processes, consume hundreds of megabytes of memory, and utilise 50% of your i7/i9 3000MHz CPU, all for displaying a news page or an article.

The JS for actual essential site functionality often pales in comparison to the assets and scripts activated by ads, which simultaneously track you.


It's not just the ads. Tag managers are the scourge of performance. Marketing folk just want to stuff them full of dozens of analytics services and third-party integrations. And they want to do that without involving pesky engineers.

Some, like Google Optimize, tell you to clobber performance by hiding all content on your page using CSS until their code has loaded and run.
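
The pattern is roughly this (a sketch of the idea, not the exact vendor snippet):

  // hide the whole page until the experiment script has loaded and run
  document.documentElement.classList.add('async-hide'); // CSS: .async-hide { opacity: 0 !important }
  // failsafe: un-hide after a few seconds even if that script never arrives
  setTimeout(() => document.documentElement.classList.remove('async-hide'), 4000);
  // the third-party script is expected to remove the class earlier, once its variants are applied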


Also the accursed videos every site wants to cram down your throat.

No no no.

Mostly text, MAYBE some images.


Anyone who directs their web team to implement a floating picture-in-picture autoplay video whose pause and close buttons either render late or don't work at all deserves a special place in hell.


There used to be an add-on (or was it even built-in?) to block loading videos and images unless explicitly requested by the user. I sometimes still miss that, especially on mobile.


In other words, ads and tracking are given priority over content. Welcome to Capitalism!


How about a standard, within the HTML spec, for declaring how heavy a site is (and other meta details) before it is loaded? That way, a user could opt out of continuing to browse based on details from, say, a short response header.

Is something like this already proposed anywhere, or is it not really a solution to the problem — thoughts?


JQuery being more prevalent than Google Analytics is a surprise to me.


This is a good page that shows why jQuery isn't going away anytime soon: http://youmightnotneedjquery.com/
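
A taste of it, in case you haven't clicked through (render, the selector and the URL are placeholders):

  // jQuery
  $('.menu-item').addClass('active');
  $.get('/api/items', data => render(data));
  // plain DOM / fetch equivalents
  document.querySelectorAll('.menu-item').forEach(el => el.classList.add('active'));
  fetch('/api/items').then(r => r.json()).then(render);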


jQuery is a default dependency for many WordPress blogs. Not all of those are run by folks who want/care about analytics.


Sure. I was assuming that culling to the "top million" would skew things in favor of GA. Clearly it didn't, but I was surprised. And you're probably right: all those WordPress instances do drive a lot of the stats.


I had hoped that at some point in the breakdown they would differentiate between WordPress and... everything else, I guess.

Wordpress itself probably skewed the numbers significantly.



Skimming through the article didn't give a meaningful tl;dr - it's complicated and there is no prevalent answer. Personally I got a few (naive) things:

  - latency is bad, do a single request at start
  - don’t use jquery
  - ui frameworks are not part of the issue(?)


They found jQuery is correlated with longer times to interactivity. It doesn't necessarily cause it.

The latest jQuery is 85 KB (~30 KB gzipped), and doesn't do anything time consuming on load. My guess is the types of sites that don't use jQuery happen to be the types of sites that are faster.


jQuery really isn't a problem - it's just a light-ish wrapper around the native JS APIs. The problem is JS itself or rather the fact that it's used where it really shouldn't be.


It's understandable that "keep it simple" is a good technical rule. But it is also interesting for app developers who have to use some kind of in-place DOM modification. Personally, I don't (yet) see how an in-browser HTML parser could be much faster than createElement calls from, e.g., hyperscript driven by POJOs. It is basically the same recursive process, with the exception that small updates generate small modifications in the latter.


With jQuery (as an example) when you change elements it does so on the live DOM. This often causes the browser to re-layout, re-render, and re-draw content. If you create a thousand elements in a loop that's tons of extra work for the browser and blocks interactivity. You can do batch updates or non-display elements but it's not (or at least wasn't) built into jQuery. It's faster but more complicated.

With React and the like they operate on a copy of the DOM and automatically batch updates to it. All of the changes are applied at once so there's fewer layout, render, and draw events.
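
A rough illustration of the batching idea, independent of any particular library (items and list here are made up):

  // appending to the live DOM in a loop; mix in any layout reads and the browser reflows repeatedly
  for (const item of items) {
    const li = document.createElement('li');
    li.textContent = item.title;
    list.appendChild(li);
  }
  // batched: build the subtree off-DOM and attach it once
  const frag = document.createDocumentFragment();
  for (const item of items) {
    const li = document.createElement('li');
    li.textContent = item.title;
    frag.appendChild(li);
  }
  list.appendChild(frag);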

Part of the problem with jQuery is there's a cohort of web "developers" that grew up with it. Instead of learning the actual web stack they learned jQuery. Everything is put together (slowly) with jQuery. Even if they've been promoted out of junior positions, they're now PMs or marketing drones requiring slow scripts because they're written in a style they understand.


jQuery isn't the problem as much as jQuery "needing" to exist is the problem.

Developers still have to make webpages that work with customers that are stuck on old phones which will never see another firmware update, corporate desktops that have some forsaken frozen copy of IE6 that gets used for some legacy corporate platform (and then end users either don't or actually aren't allowed by IT/security to use any other browser), people with PCs that 'still work' (and if you're lucky have Windows XP with whatever it came with)... etc.

Generally we can't have nice things because that legacy cruft is going to be around until all of that legacy falls over and dies so hard that even laypeople consider it laughable to assume it still works. It's got to be like those old blenders or fans... wait they still work; we're cursed forever.

Want the Internet to be fast? jQuery is a de facto standard that needs to be made native in browsers, so that when it's called for it isn't loaded as a script but already exists as native code and management models.


Are those legacy-supporting sites really the slow ones though? Targeting old software usually also means targeting old hardware, so those sites tend to perform reasonably well (and in general, old browsers didn't give developers many options to slow things down).

The problem really starts once developers can start ignoring old platforms and (ab)using the full power of a modern browser. The problem isn't any specific library, but the mindset developers use to develop web sites/apps.

Bundling 20 dependencies and 10k lines of "compiled" JSX into one giant 5MB uncachable blob that needs to be parsed in its entirety, which then shoots off some requests that need to complete and be parsed, then compiles some HTML and CSS that then need to be parsed again by the browser, and then finally the browser's rendering engine gets to start laying things out and drawing the first pixels - that is the problem.

The browser is a document viewer, not an application runtime, so you should be giving it documents, not programs. Yes, JavaScript is necessary, but just like you shouldn't be using HTML for layout and CSS for content and graphics (although CSS-only art is impressive!), you shouldn't be using JavaScript for content and styling.


dominteractive isn't a good metric for this. Many sites are perceivably slow because of rendering work they're doing after dominteractive. They should be looking at more user-centric metrics like Largest Contentful Paint: https://web.dev/lcp/
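
LCP is also easy to collect yourself (minimal sketch):

  // the last entry reported before user input or page hide is the page's LCP
  new PerformanceObserver(list => {
    const entries = list.getEntries();
    const last = entries[entries.length - 1];
    console.log('LCP candidate (ms):', last.startTime, last.element);
  }).observe({ type: 'largest-contentful-paint', buffered: true });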


And... surprise! It's analytics, tracking, and advertisements!


The irony is that Google has initiatives like AMP to speed up page loading -- supposedly. Check out the Texas antitrust filing for other possible reasons for AMP.


Could you please give a TLDR for those who don't have the time to read the anti-trust filing? Thanks!


And bloat. A few years ago a news website was 95% side content; the article in question was basically a tweet-length sentence. People want to attract attention and fill the page with more stuff than is "necessary".


Even with bloat you can trim 2-3 seconds off load times by removing trackers. Trackers that do dynamic swaps like CallRail along with LiveChat are the worst offenders.


Improve the web and the world: remove all garbage third party javascript from your site.


I did just that and it's surprisingly heart warming to know my site could run on a calculator if need be.

No JS, no tracking, no nonsense. 100% content.


It's more the copypasta of god knows how many JavaScript frameworks.


It's really not. I ran the website for a large news org. Even without ads third party JS was loading twice as many bytes as our React and first party code. The YouTube embed, alone, was larger than our entire first party JS.

What are you going to do as an engineer? Tell sales that the expensive marketing solution they bought isn't going onto the site because it loads too much JS?


There is that as well - you must have much better developers than most sites.

And sometimes you have to let things fail, then explain why it did, and then do it right.


Not really. It's blocking JavaScript/CSS, not preloading, and stupidly large images.

BTW, I do this for a living for some big brands; I just dropped FCP from 2.6 to 2.4 and CLS from 0.478 to 0.385 with some tweaks to the preloads.


That's the most common stuff but it doesn't seem to be the slowest.


Off topic: I haven’t visited one million websites (yet), but I can already tell you that your sticky header is annoying.


Sticky Ducky is a nice plugin for that.


It's in the domain name: catchjs. The reason that the web is slow now is javascript. That is, treating a webpage like an application instead of a document. This analysis just assumes all websites are JS-application monsters and only differentiates between them. It misses the point.

What makes it even slower is the practice of serially chain loading JS from multiple domains. Once you're 3 deep in that it doesn't matter how 'fast' your webserver is or how fast and clean your JS code is, the serial loading will slow things to a crawl.


I don't know what kind of web you are using. But the sites I'm visiting on a daily basis are, in fact, applications and not documents.

Slack, Asana, GitHub, Gravit Designer, Google Docs/Sheets/Mail...


Of those I only go to GitHub. And for now it still works without JavaScript execution.

Sites that want or have to serve everyone do not use the web as an application because they know it doesn't work. Gov.uk and amazon.com for example went out of their way to work for all people of the world. And Gov.uk has found approximately 1 in 96 actual people do not have JS turned on[1].

For fancy businesses with other businesses as their end users, you can get away with not supporting everyone. It doesn't affect their income, so it doesn't matter. But the reality is that JS-application sites fail 1% of users, and for actually serving everyone that 1% matters.

[1] https://gds.blog.gov.uk/2013/10/21/how-many-people-are-missi...


> And Gov.uk has found approximately 1 in 96 actual people do not have JS turned on[1].

It's challenging to convince a PM that people who block stats don't show up in their stats, but do still have money to spend. Often, a simple try/catch in the onclick handler that fires analytics events, and a quick happy-path test with uBlock on, is all it would take to fix a site. Well worth the 1% of extra revenue for a few minutes' effort.
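
Something as small as this keeps the click working when the tracker is blocked (the analytics call and handlers are placeholders):

  button.addEventListener('click', () => {
    try {
      analytics.track('signup_clicked'); // third-party tracker; may be blocked or undefined
    } catch (e) {
      // swallow it; losing one event beats losing the customer
    }
    startSignup();
  });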


I suspect that 1 in 96 figure is mostly people who have uMatrix or NoScript installed, not people who have disabled JavaScript entirely. A little judicious blocking can greatly improve browsing experience at the cost of having to set up rules for sites when you first visit them sometimes. It can be a real hassle when someone has embedded a video you want to watch and there are like 8 layers of embedded scripts from different domains necessary before you get to the actual video content.


The math is $500 expense today for the test code. $0.01 each time the test runs and +$ for each client acquired. At $20/mo MRR it's 3 clients/year. Easily pays for itself.


Gmail works without JS too. I use it routinely with w3m.


The sad reality is that many of these actual applications load faster than pages that could really just be pages but still load vast amounts of JS...


Static website generators in contexts where they make sense (heaps of the web) are still criminally underused. :-(


Yes.

Probably because people who create "real apps" and not just "docs on steroids" know what they're in for right from the start.

If you start with a doc, you can get the feeling that there is much more headroom than there actually is.


Everyone lives in their own bubble. Among the websites that I routinely visit, only YouTube could be counted as a web application, but I'd argue that its main function could be simplified to a <video> tag.


GitHub fits the document model pretty well in my opinion.


And the one that doesn't require JS in that list.

Bad example, I give you that.


Gmail also has a perfectly working non-JavaScript interface. Or you can just use POP3 or IMAP with a real client and avoid the web crap altogether.


I'd argue that gmail without js is an application made to act like a document when it just isn't.


> Or you can just use pop3 or imap

This is my choice, except that periodically I have to go back to the web interface because searching over IMAP barely works at all. Maybe it's not even implemented and I'm only really searching downloaded mail, I'm not sure.


The IMAP interface will also never deliver any of the mail that Gmail's mailservers accept but then tag as spam. That's pretty much mail from any independent mailserver that isn't some megacorp, or that isn't spamming at such volume that it can use Google's postmaster tools to get approved.


I miss the 90s, when HTML pages contained just super informative text and a bunch of links to other super informative text. Flash was fun for a while, with some nice animations and interactivity; then the JS themes took over. They were double-edged swords - they enabled us to do things in the browser that previously needed separate installable desktop applications (email, PowerPoint, etc.), but they also enabled every newbie hipster to push these tools for everyday pages. The value of informative plain text was eroded in just under a decade and replaced with subscription popups and shallow-titled articles like "4 ways you can do X... by the way, give me your email so I can spam you".

I miss the marquee era, truly.


Absolutely. Last night I had the misfortune of experiencing a comcast outage. Although I live in the city, my neighborhood has horrible cellular coverage. Comcast serves their outage data through their account management portal which happens to be a massive, bloated, slow javascript app. It took minutes to load on my phone to find out if there was an outage.

All of this for something that should be no more complicated than entering a zip code and getting back a small page with information about whether or not there is an outage.


I ran into the same problem last week. Customer service portals are rarely given priority when resources are allocated, but it's pretty astonishing just how slowly such a critical service portal loads over mobile tethering, given that that is going to be a primary use case. I'm practically counting down the days until this summer, when a new fiber ISP rollout is supposed to reach me.


I am sorry to say but this is an overly simplistic answer. We've had Javascript since 1995. We've been building great applications with it as early as the 2000s, Yahoo being on the forefront. We had complex charting applications and dashboards in an era where JavaScript was still interpreted by the latest cutting edge browser: Internet Explorer 6. This was also a time of much slower internet speeds across the board.

Bad software development slows the web down. The web, for what it is and does, is an incredibly fast network. The problem is people no longer care about performance, nor do you have the engineering talent to write performant web applications. Yes some of this can be boiled down to companies not prioritizing performance, and product managers pushing for more features and tracking, but we've always had these constraints, and tight deadlines, and seemed to deliver just fine. It was part of the expectations.

Our craft is degrading, I hate to say. We've allowed the vocal minority who were shouting "gatekeepers!" to water down the discipline of software development to the point where yes, you install a single npm dependency so you can reverse your array.


> Bad software development slows the web down

This is an empty statement and another overly simplistic answer. Of course bad software development slows the web down, because you're defining slow web sites as being badly developed.


Touché -- I am attempting to make a distinction between the general statement "JavaScript slows the web down" and a certain type of JavaScript slowing the web down, namely lazy and careless JavaScript. We reach for the dependency first instead of carefully analyzing our requirements and making an engineering decision. I've seen it countless times, and fallen victim to it a few times myself. It's really easy to do. No one stops to ask, "perhaps we do not need a hard dependency on that 15kb library, but rather just a few functions from it?" or "perhaps a simple algorithm would indeed suffice." We mindlessly reach for complex regex engines to solve simple parsing problems, or try to garner "functional programming cred" by reaching for a series of collection functions from our favorite 50kb utility package when a simple C-style for loop would have not only sufficed, but indeed got the job done much faster.
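
A toy example of the tradeoff (orders is made up; the chained version is the kind of thing I mean):

  // utility-package version: pulls a chunk of a library onto the critical path for one line
  // const total = _.chain(orders).filter(o => o.paid).map(o => o.amount).sum().value();
  // plain loop: zero dependencies, and typically faster
  let total = 0;
  for (let i = 0; i < orders.length; i++) {
    if (orders[i].paid) total += orders[i].amount;
  }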

All this results in bloated, unoptimized JavaScript bundles. To top it off, we under-cache, aggressively cache-bust, and ship unminified bundles with oodles of source maps because of "developer experience". I can't tell you how many times I see, to this day, a Fortune 500 company bundling an entire unminified JavaScript framework. Wirth's law indeed.

Hope this was a little less empty. These are my observations having done this for a few decades now and managing developers these days.


I agree with all you are saying, but I wonder how much this actually comes down to software developers. I suspect (but have absolutely no data!) that for many sites, the pressure to add extra stats, tracking, advertising, monitoring, etc etc, comes from 'above' and it's not the developers' choice to add these mountains of javascript libraries to their sites. Perhaps they should push back, but what leverage do the developers have?


Pretty much like enterprise software. The similar situation allows for a conclusion that is obvious to many - there is no money in performance, or more exactly, the opportunity cost of performance is higher than the benefit.


> because you're defining slow web sites as being badly developed

What's wrong with that?

Living through computers, the internet, and the web for 25 years, I can attest that there is truth in Wirth's law, if only because folks are lazy.


OP is pointing out that the argument is tautological and doesn't add anything insightful to the discussion. All tautological statements are true.

You could similarly write:

“Lazy software development is why websites are slow because slow websites are caused by lazy development.”

“Greedy business practices are why websites are slow because advertising slows down the web and advertising is the result of greedy business practices”.

These are feel good statements that don’t actually provide any insight on the market forces at play nor provide any path to improving things.


It's a misreading of the point I was making. I was refuting the claim "javascript slows the web down" by qualifying it with "bad javascript slows the web down"


Right, but then you conclude that it is just incompetent engineering. That may be true in some cases, but in a lot of cases it's because it's not worth it to the business in terms of time or resources to do good engineering. Great engineering is not just about having great engineers, but a huge budget because it's very, very expensive.


Whatever the causes may be, the end result is poor engineering. I didn't say incompetent, but rather lazy and lower quality. Reading further into my point, I mention that oftentimes this carelessness comes as a result of businesses pressing ahead without giving attention to proper resources. But nonetheless, all of a business's choices - hiring decisions, budget decisions, scheduling decisions, feature prioritizations - result in the same root cause of poor engineering.


> "Whatever the causes may be, the end result is poor engineering. I didn't say incompetent, but rather lazy and lower quality."

The engineering might be perfectly fine if you take into account the deadlines, budgets and requirements from the clients/managers.

If you pay someone to build a house on a short deadline with a small budget, then you'll obviously get a crappy house. But the skill that went into building that house in such a short time, and on such a low budget, might be extremely high.

Good engineering doesn't mean a perfect product. It just means you managed to deliver the best possible one out of the situation you were in. It's up to the owner of the product to decide if it's good enough.


No true scotsman would slow the Web down


argumentum ad logicam. And around and around we go.


Nope. From https://en.wikipedia.org/wiki/Argument_from_fallacy

> Argument from fallacy is the formal fallacy of analyzing an argument and inferring that, since it contains a fallacy, its conclusion must be false.

Nobody here is claiming that the parent's statement is false. We're saying it's true! However, we're also saying it's useless (i.e. "vacuously true" or "tautologically true"). The parent is saying that many Web sites contain JS; the slow ones are due to "bad JS" (and implicitly, that the non-slow ones have non-bad JS).

How can we tell whether some JS is "bad"? By checking if it's slow. Let's say we do that, and find that the site is slow, and hence our JS is "bad". How can we improve it, to get non-bad JS? No idea; maybe getting a bunch of developers to make random refactorings, measuring the speed of each, using that to tell whether the JS is still "bad", and stopping once it's no longer slow and hence no longer "bad"? In which case, the whole notion of "bad JS" is irrelevant; it's conceptually indistinguishable from "slow page", and hence adds nothing to the discussion (i.e. if we followed the above steps, we would get exactly the same result if we ignore all the stuff about "bad JS" and just use the timing measurements directly).

Note that the concepts of "slow page" and "page with JS" are not vacuous, since we can measure them (with a clock, and by parsing the page source, respectively) and hence make falsifiable predictions about correlations and causal relationships between them.


I see. Thanks for explaining, and sorry for the noise.


I'll try and offer the more controversial point.

Good software development can slow the web down. People are often optimizing for code reusability, code quality, ease of debugging and other advantages of some heavyweight frameworks rather than raw load time. Often this is the right call, as engineering is a cost centre. Where it goes wrong is when things get so slow and so bloated that people actually abandon the site before it loads, but most of the major frameworks, used properly, don't force that extreme a tradeoff. I also have no idea what % of these sites have reached that level.


> The problem is people no longer care about performance, nor do you have the engineering talent to write performant web applications.

I don't think that's the whole story. I don't have hard numbers but Electron-based applications seem to reliably be far more bloated than applications using ordinary GUI frameworks.

I agree that deprioritising performance is part of the problem, but as far as I can tell the web is a very poorly performing GUI framework, when it's used as such.


> the engineering talent to write performant web applications

Which, for the most part, is celebrated. Nearly nobody actually understands what they're doing anymore - I challenge anybody to point to the Javascript in their web app and explain what the purpose of each file is. Just the purpose, just the top level files. I doubt the majority of web devs can (if they could, they'd realize they don't need at least half of them).


If everybody would learn to voice-control their smart phones and receive 3x speed audio feedback, battery life would shoot through the roof, since illuminating the screen is so costly, and you'd never need to do it.

Blind people use smartphones this way, so it's not as though the device can't support it.

And it would encourage all websites to be accessible, content-only, no-image sites, and they'd be no-javascript as well. Problem solved.


How is this going to solve the chief bandwidth problem on the internet of watching funny cat videos and perusing dank memes?


Is there a good solution to rendering something like a react app to a static page? I feel like there are a lot of pages that don't really need client-side html rendering, but they have it because react is a good solution for modular web content.


react-static (https://github.com/react-static/react-static) is both good and enough. You don't need Gatsby/Next or anything else.


Probably not what you're after, but load the page in Chrome, open the Inspector, and copy the entire HTML out. (Which will be a different HTML than the empty stub that "View Source" gives you.)

You could automate it with Selenium (headless Chrome). I think Googlebot does something similar?
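
If you want to script that, something like this works (using Puppeteer here rather than Selenium, just as one option):

  // snapshot the rendered DOM of a client-side app as plain HTML
  const fs = require('fs');
  const puppeteer = require('puppeteer');
  (async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com/', { waitUntil: 'networkidle0' });
    fs.writeFileSync('snapshot.html', await page.content());
    await browser.close();
  })();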


There's the likes of Gatsby [0], which is generally well supported and pairs well with Netlify and a headless CMS such as contentful.

0: https://www.gatsbyjs.com/


You can look into the Gatsby and Next JS frameworks for this, among others


As far as I know, all major frontend frameworks can render to a static HTML document.
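
React's own server renderer covers the simple case; a minimal sketch, assuming react and react-dom are installed:

  // render a component tree to plain HTML with no client-side runtime attached
  const React = require('react');
  const { renderToStaticMarkup } = require('react-dom/server');
  const Page = () => React.createElement('main', null,
    React.createElement('h1', null, 'Hello'),
    React.createElement('p', null, 'Rendered to static HTML at build time.'));
  console.log(renderToStaticMarkup(React.createElement(Page)));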


How would you do that? I guess through webpack?



This outputs static HTML that gets hydrated on the client (as opposed to server rendering the HTML and then hydrating it on the client), which I don’t think what was being asked.

I think what the OP was asking was more along the lines of partial hydration (where only parts of the DOM are hydrated by React/other framework) or no hydration (no JavaScript is loaded at all).

11ty does the latter: https://www.11ty.dev

The React team are working on partial hydration and announced it in December. Vercel did a write up on it here: https://vercel.com/blog/everything-about-react-server-compon...


> It's in the domain name: catchjs. The reason that the web is slow now is javascript. That is, treating a webpage like an application instead of a document. This analysis just assumes all websites are JS-application monsters and only differentiates between them. It misses the point.

A large part of the web is static documents, and those should be developed as such.

But I 100% disagree that this is how every website should be. We're given these amazing technologies, the internet, computers, libraries, tools and creativity and now we should just stick with sending and looking at formatted text?

I don't think so!

https://ciechanow.ski/gears/

https://www.maria.cloud/

https://nextjournal.com/

> What makes it even slower is the practice of serially chain loading JS from multiple domains. Once you're 3 deep in that it doesn't matter how 'fast' your webserver is or how fast and clean your JS code is, the serial loading will slow things to a crawl.

Bundling, code splitting, tree shaking, asynchronous loading and pre-optimizations help here too.
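
On the code-splitting side, dynamic import() is what most bundlers key off (sketch; the module path and element are made up):

  // the chart code only downloads when the user actually opens the dashboard tab
  document.querySelector('#dashboard-tab').addEventListener('click', async () => {
    const { renderDashboard } = await import('./dashboard.js');
    renderDashboard();
  });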


> It's in the domain name: catchjs

I'm sceptical of this statement. I think catching application errors is a necessity to ensure your application runs smoothly and doesn't mess with customer/user experience. We're already tracking all the errors in the backend, and we should also be able to do the same with frontend. At the end of the day, we're all humans, we will introduce bugs. Even if you can guarantee 100% no one will deliberately introduce bugs, there's still the accidental introduction, or the third party library issues.

The issue comes with tracking users. The amount of code needed to track user interactions is insane, because of the number of possible interactions. Tracking application code generally does not need a lot of code.

Sadly, the user trackers have messed it up for application error trackers too. We're blocking everything regardless of context, and this doesn't allow websites to fix their application errors.


Every site is a JS-application monster because of the 3rd party add-ons people feel they need to have.

Running Google or Facebook ads? You need analytics, pixels and event trackers to know if your ads are working optimally.

reCAPTCHA v3 is a good way to slow down your site, as are hero banner sliders (5% of sites were running GreenSock + Waypoint) and some flavour of interactive chat. Some 3rd-party plugins simply don't work with async/defer, or still use document.write().
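
For comparison, the non-blocking way to pull in a third-party script (the URL is a placeholder), versus a parser-blocking document.write:

  // loads without blocking the HTML parser, unlike document.write('<script src=...>')
  const s = document.createElement('script');
  s.src = 'https://thirdparty.example/widget.js';
  s.async = true;
  document.head.appendChild(s);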


> That is, treating a webpage like an application instead of a document.

That’s not why. It’s ad networks and tracking scripts. Notice that single page application frameworks show up exactly zero times on the list of worst offenders.


James Mickens on Javascript:

https://vimeo.com/111122950

Here's a stupid, simple Vimeo downloader instead of using the massive, slow-starting youtube-dl:

  #!/bin/sh
  # usage: curl https://vimeo.com/123456789 | $0
  # pull the player URL from the piped page, fetch its /config, take the last mp4 URL
  x=$(curl -s `grep -m1 -o https://player.vimeo.com/video/[^\"?]*|sed 's>$>/config>'`|grep -o https://[^\"]*mp4|sed -n \$p)
  # derive a filename from the URL path and download
  y=${x%/*};y=${y##*/};exec curl -so $y.mp4 $x

More from Mickens:

https://www.usenix.org/legacy/events/webapps10/tech/full_pap...

https://www.usenix.org/system/files/1403_02-08_mickens.pdf

Unlike Mickens, I cannot save the world, and I am not telling anyone else what to do or not to do, but I made the web fast for myself. Hence I am very skeptical of claims that "the web is slow". Web servers, the network and computers are plenty fast and still getting faster. I do not define "the web" as certain popular browsers, CSS, Javascript, etc. or whatever web developers tell me it is. Those are someone else's follies. I define it as hyperlinks (thus, a "web") and backwards-compatible HTML. Stuff that is reliable and always works. To "make the web fast", I follow some simple rules. I only load resources from one domain, I forgo graphics, and I do not use big, complex, graphical, "modern" web browsers to make HTTP requests.

I do not even use wget or curl (only in examples on HN). I generate the HTTP myself using software I wrote in C for that purpose and send using TCP clients others have written over the years. There are so many of them. With a "modern" SSL-enabled forward proxy or stunnel, they all work with today's websites. "Small programs that do one thing well", as the meme goes.

Obviously, I still need the ever-changing, privacy-leaking, security risk-creating, eye-straining, power-consuming, time-wasting, bloated, omnibus browsers for any sort of serious transaction done with web, e.g., commerce, financial, etc. However that is a small fraction of web use in my case.

For me, using the web primarily comprises searching, reading and downloading. I never need Javascript for those tasks. I can do them faster without the popular browsers than I can with them. The less interaction the better. I use automation where I can, because IMO that is what computers were made for. "The right tool for the job", as the meme goes.

To think how much time and energy (kWh) has been devoted to trying to make Javascript faster as a way to make websites faster is, well, I won't think about it. Those working in the "software industry" and now "tech" are highly adept at creating the problems they are trying to solve. Unfortunately, today, as we try to rely on software and the web for important things, we all have to suffer through that process with them.

By not using the popular browsers for a majority of web use, I have minimised the suffering of one user: me. The web is fast.


Forgot about the most important way I make the web fast, rule #1: eliminate DNS lookups. I gather the DNS data in bulk before I start reading and add it to custom zone files that are served from localhost authoritative servers. For example, I gather all the DNS data for all sites posted to HN before I start reading. One way to do this, if encryption is desired, is to use DoH + HTTP/1.1 pipelining. I wrote some custom programs to speed up the Base64URL encoding.

This way, there is no DNS traffic leaving the network. The only lookups preceding a TCP connection to a website are to the loopback.

That is the number one way I "make the web fast". If retrieving a web page were a chemical reaction, DNS is the "slow step". In most cases, I eliminate it.


The title of this blog post refers to "the web" but it mainly discusses "rendering". IMHO, those are two different things. The latter is concerned with graphical, JavaScript- and CSS-enabled browsers. The former is concerned with web servers.


Agreed with this. I dabbled in React for a while, but then I realized how flawed the concept is (it's like downloading a full .dll every time you run a program).

I always ask myself if something can be static now, and if so, I make it static.

I recently worked on a side project to transition [0] to a fully static website. Needless to say, the speed went up.

[0]- https://beachboyslegacy.com/


Off-topic but I'd never heard of CatchJS before. As the founder of TrackJS[1] I can't help but feel they were heavily inspired by our product... almost too inspired considering their logo and marketing copy.

(We've been around since 2014, so we pre-date them by 4 years)

[1] https://trackjs.com


You both have an incredibly generic logo/header/copy template so I'm not really sure what you're trying to imply. Your site copies the format of Stripe and they were around in 2010, pre-dating you by 4 years.


Wait, a company logo about JavaScript that uses {}? Maybe thrown into a single-color circle? Mild shock.

Edit: I will give you that their web page header and pricing looks very similar.


Lars from CatchJS here.

Anyone can go on archive.org and verify that they changed their color scheme and header to ones very similar to CatchJS in Feb 2020, after CatchJS had used that look for 2 years. Thank you for also pointing out the pricing page, where the same is true.

I've now received a cease and desist from their attorney stating that the looks are too similar. It's nice that they've paid for an attorney to make my point for me, but they seem to have mixed up who did what here.


Is it “ads”?


It is!


A million? Is that a lot?


...and we discovered it's JavaScript. Most of it being just tracking code.



