Page Weight Matters (2012) (chriszacharias.com)
556 points by shubhamjain on Oct 15, 2015 | 165 comments



When I joined Google in 2009, we were on the tail-end of a latency optimization kick that Larry had started in 2007. At the time, we had a budget of 20K gzipped for the entire search results page. I remember working on the visual redesign of 2010, where we had increased the page weight from 16K to 19K and there was much handwringing at the higher levels about how we were going to blow our entire latency budget on one change.

We did some crazy stuff to squeeze everything in. We would literally count bytes on every change - one engineer wrote a tool that would run against your changelist demo server and output the difference in its gzipped size. We used 'for(var i=0,e;e=arr[i++];) { ... }' as our default foreach loop because it was one character shorter than explicitly incrementing the loop counter. All HTML tags that could be left unterminated were, and all attributes that could be unquoted were. CSS classnames were manually named with 1-3 character abbreviations, with a dictionary elsewhere, to save on bytesize. I ran an experiment to see if we could use jQuery on the SRP (everything was done in raw vanilla JS), and the results were that it doubled the byte size and latency of the SRP, so that was a complete non-starter. At one point I had to do a CSS transition on an element that didn't exist in the HTML - it was too heavy, so we had to pull it over via AJAX - which meant all sorts of crazy contortions to predict the height and position of revealed elements before the code for them actually existed on the client.
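For the curious, the two loop forms compare roughly like this (a sketch, not the actual Google code - and note the short form only works if the array contains no falsy elements):

    var arr = ['a', 'b', 'c'];

    // Explicit counter:
    for (var i = 0; i < arr.length; i++) {
      var e = arr[i];
      // use e
    }

    // The abbreviated form: assigns and tests in one expression. It stops at
    // the first falsy element, so the array must not contain 0, '', null,
    // undefined, false, or NaN.
    for (var j = 0, f; f = arr[j++];) {
      // use f
    }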

A lot of these convolutions should've been done by compiler, and indeed, a lot were moved to one when we got an HTML-aware templating language. But it gave me a real appreciation for how to write tight, efficient code under constraints - real engineering, not just slapping libraries together.

Alas, when I left the SRP was about 350K, which is atrocious. It looks like it's since been whittled down under 100K, but I still sometimes yearn for the era when Google loaded instantaneously.


Remember, http://google.com/custom still loads instantly ;)


Funny that they didn't even update the Google logo on that page.


Wow. What is this and why does it exist?


It is the old rendering path for Google Custom Search Engines:

https://developers.google.com/custom-search/docs/overview?hl...

Way back when Google first got VC, before they invented AdWords and got profitable, they explored a lot of custom business partner opportunities. So there were a number of specialized Google search endpoints like /linux, /bsd, /unclesam (search over Federal government websites). These were maintained until about 2013 - they go through a different rendering path than the one that serves mainstream Google Search. /linux, /bsd, /unclesam, and a few others were decommissioned in 2013, soon before I left Google, but it looks like /custom got a reprieve for whatever reason. It's actually been deprecated in favor of the Custom Search Engine functionality linked above, but nobody's gotten around to removing it.

Incidentally, you can also get a similar page (slightly different, its last update was around 2007 and it shows no ads) by setting your user agent to "Mozilla 3.0".


But, the advantage of /custom: It uses less personalized results, and it returns exactly 20KiB of data – which is why it still loads so fast.

One day I wrote a small script to open every site mentioned in Google's robots.txt (that did not return a 404), each in a separate tab. That's how I found it.

As far as I know, some partners actually still embed /custom (I’ve only seen it at a newspaper a few months ago), so removing it might be problematic.


    > As far as I know, some partners actually still embed
    > /custom (I’ve only seen it at a newspaper a few months
    > ago), so removing it might be problematic.
An ISP local to here does. [0]

http://centurylink.net/search/index.php?context=search&tab=W...

I see it every so often if a domain doesn't get resolved - they'll hijack the request and show you their own page. I was pretty surprised that that still happens.


I love you for using binary prefixes. <3


Ahh, heirloom. (That's the name of the system that renders Google for ancient user-agents). Takes me back. It actually got a small set of updates in 2010 when we wrote Heirloom+ and it appears that someone has collapsed the two systems into one since I left in 2011. I'm strangely gratified to see that no one has come up with a better layout solution than putting the results in a table.


Wow, THANK YOU!

It blows my mind this is not advertised and not the default.


I'm curious, why would this be the default? For most users (the people not on HN) personalized search has increased the relevancy of results vs this.


- It’s 20k vs 190k for the same results page (loads faster, especially useful when on throttled 3G),

- It’s non-personalized – meaning you get less filter-bubble effect, which is useful when researching topics where you want to see opinions that oppose your own

- It has less tracking applied.


Yeah and no TLS handshakes need to be performed either.


Why don't they provide an HTTPS version of this? It's a bit of a shame.


I'm surprised to learn that HSTS isn't set on google.com.


As does http://www.google.co.uk/custom

Thanks very much for the tip off.


For me, only the first page load is faster but the actual search is slower compared to the full version with AJAX for the instant search (around 400 ms for /custom vs. 80 ms ajax response).


> CSS classnames were manually named with 1-3 character abbreviations, with a dictionary elsewhere, to save on bytesize.

Not to take away from your effort, but that sounds like something a minifier should be able to do, right?

Edit: Yep, you said it below:

> A lot of these convolutions should've been done by compiler, and indeed, a lot were moved to one when we got an HTML-aware templating language.


Yeah. The issue was that there were two other constraints that had to be satisfied at the same time: CPU time to render the page couldn't go up appreciably (or we'd run out of machines), and there was a lot of legacy HTML and JS to update. Google actually had a CSS compiler at the time (since open-sourced as Closure Stylesheets), but it required that you run every JS reference of a classname through goog.getCssName (which was prohibitive because we couldn't use Closure in the JS for bytesize reasons), and that you post-process the HTML (which was prohibitive both for CPU reasons and because Closure Compiler is written in Java while the HTML for Google Search, at the time, was generated in C++).
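Roughly, the renaming contract works like this (illustrative names, written from memory rather than taken from the actual codebase; assumes Closure Library is loaded):

    // A renaming map generated by the CSS compiler is loaded before any
    // application code runs. (Hand-written mapping here, purely for illustration.)
    goog.setCssNameMapping({'search-result-title': 'a'}, 'BY_WHOLE');

    // Every classname reference in JS then goes through goog.getCssName, which
    // returns the abbreviated name that the compiled stylesheet actually uses.
    var el = document.createElement('div');
    el.className = goog.getCssName('search-result-title');  // -> 'a'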


It's somewhat tangential, but I'm curious what you mean by "HTML-aware templating language".


I'm not the OP, but I think this would be something like XHP or JSX. Really it's any templating language that works closer to the DOM level and can perform optimizations like understanding which close tags don't need to be output.



It wasn't Closure Templates - those are what's used on the Apps side of Google (GMail, Google+, Docs, etc.). The technology in question lies in a grey area as far as confidentiality goes...it's been mentioned once in a recruiting presentation and there are references to it in other Google open-source projects...but I figure I would be safer by not naming it.


[flagged]


The reason we make web pages is so they can be accessed, not so they fit some dubious image of technical purity. What they did seems fine to me.


Web standards are part of the reason why your web page can be accessed.


The HTML5 spec is very explicit about which tags are auto-closed, and it is neither a parse nor validation error if those tags are omitted.


I wish HN prevented new accounts from commenting for one day unless verified by a mod.


I think the community moderation has done its job here, and the negatives from preventing new accounts from commenting wouldn't be worth the advantage.


If you have an engineering mind and care about such things - you care about complexity. Even if you don't - user experience matters to everyone.

Have you ever seen something completely insane while everyone around you doesn't seem to recognize how awful it really is? That is the web of today. 60-80 requests? 1MB+ single pages?

Your functionality - I don't care if it's Facebook - does not need that much. It is not necessary. When broadband came on the scene, everyone started to ignore page weight, just like GBs of memory made people forget about conservation.

The fact that there isn't a daily drumbeat about how bloated, how needlessly complex, how ridiculous most of the world's web applications of today really are - baffles me.


Honestly, I think "a daily drumbeat about how bloated, how needlessly complex, how ridiculous most of the world's web applications really are" pretty much describes every HN conversation on any article with even a remote connection to web technologies.


Thankfully web developers are sharing the goodness and distributing 100MB+ executables using NodeJS & CEF so desktop applications can be insanely large too :^)


What would be super awesome would be a daily drumbeat about how to slim down and simplify applications, with working, open-sourced code.

Here, I'll beat a drum a little. Maybe it will inspire somebody.

I just wrote this tiny text-rendering engine, mostly yesterday at lunch. On one core of my laptop, it seems able to render 60 megabytes per second of text into pixels in a small proportional pixel font, with greedy-algorithm word wrap. That means it should be able to render all the comments on this Hacker News comment page in 500 microseconds. (I haven't written box-model layout for it yet, but I think that will take less time to run, just because there are so many fewer layout boxes than there are pixel slices of glyphs.) http://canonical.org/~kragen/sw/dev3/propfont.c

The executable, including the font, is a bit under 6 kilobytes, or 3 kilobytes gzipped.


Impressive!


Thank you, but I don't think it's impressive! It's still nearly an order of magnitude slower than memcpy(). But if we want simpler systems, I think doing experiments like this is a big part of how to get there.


Jeez, you aren't kidding. It's as though, if we bitch enough about the technical deficiencies, the economic and social circumstances that cause them will, I dunno, vanish into thin air, I guess. This topic is pretty much just red meat here.


Well, bitching isn't that great. But as a web developer myself, it has affected the way I work. Like, there was an article just the other day about how to achieve "lightning fast page loads". I shared it around the office and as a result, 4 people have added new steps to their workflow to help reduce page load latency on our client sites.

The author knew the article would get good reception here, because of how much people complain about page latency, and I knew that page latency is a thing that's important to more people than just me, for the same reason. (My clients mostly don't seem to care, for some reason, but I bet their customers do).

So yeah, having these public conversations can make a difference. We just have to stay positive and constructive.


This is a great point. As much as it can be a bit annoying to see the same conversations over and over again, and as much as we make fun of supposed groupthink, it's all a long slow process of consensus building and idea dissemination.

Thanks for reminding me of that; I was previously sitting here thinking, "oh geez, here we go again!".


> The fact that there isn't a daily drumbeat about how bloated, how needlessly complex, how ridiculous most of the world's web applications of today really are - baffles me

Well there is, or at least there used to be. The reason native apps on mobile became so popular is exactly because web sites are bloated. It might not matter on a PC, but on a device running on batteries it matters a lot.

The real problem of web development is JavaScript. It's a relic of the past that hasn't caught up with the times, and we end up with half-assed hacks that don't address the real problem. We need something faster and way more elegant.


JavaScript isn't the problem, but it seems to always take the blame. And it's not even slow.

Here are a couple of real culprits:

* Advertising / analytics / social sharing companies. They deliver a boatload of code that does very little for the end-user.

* REST and HTTP 1 in combination. A page needs many different types of data. REST makes us send multiple requests for different kinds of data, often resulting in those 60-80 requests mentioned above. We could be sending a single request for some of them (e.g. GraphQL - see the sketch after this list) or we could get HTTP 2 multiplexing and server push, which completely fix that problem.

* JSON. It's simple but woefully inadequate. It has no built-in mechanisms for encoding multiple references to the same object or for cyclic references. Want to download all the comments with the user info for every user? You have a choice between repeating the user info per comment, requesting the comments first then the users by id (two requests) or using a serialisation mechanism that supports object references (isn't pure JSON).

* The DOM. It's slow and full of old cruft that takes up memory and increases execution time.
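To sketch the single-request idea (hypothetical endpoint, schema, and field names):

    // One round trip instead of many: ask for the comments and their authors together.
    fetch('/graphql', {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify({
        query: '{ comments { text time user { id name image } } }'
      })
    })
      .then(function (res) { return res.json(); })
      .then(function (result) {
        console.log(result.data.comments);  // each comment arrives with its user embedded
      });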


> Want to download all the comments with the user info for every user? You have a choice between repeating the user info per comment, requesting the comments first then the users by id (two requests) or using a serialisation mechanism that supports object references (isn't pure JSON)

Or you send it like

    {
      "users": [
        {
          "id": "de305d54-75b4-431b-adb2-eb6b9e546014",
          "name": "Max Mustermann",
          "image": "https://news.ycombinator.com/y18.gif"
        },
        ...
      ],
      "comments": [
        {
          "user": "de305d54-75b4-431b-adb2-eb6b9e546014",
          "time": "2015-10-15T18:51:04.782Z",
          "text": "Or you can do this"
        },
        ...
      ]
    }
Which is a standard way to do it, and it works here, too. You can have references in JSON - you just have to do them yourself.


Alternatively, I would think that gzip does a good job of factoring out repeatedly embedded user objects.


Yes I agree that gzip will probably fix this single-handedly. The main cost here is very likely network transfer, so gzip will do great.


Don't forget it takes time to parse large JSON blobs.


And to serialize it from whatever the in-memory representation is.


Yes, obviously - gzip even does that explicitly, replacing repeated text with references.

But it still takes more power for your server to go through gzip every time. And it will take more RAM on the client to store those objects.


Something similar has been standardized as well with the JSON API specification (although it adds its own weight to the message as well, it does address this problem): http://jsonapi.org/format/#document-compound-documents


It's a great solution! I'd use dictionaries for faster lookups, though.

But it's not exactly pure JSON any more. Client-side, you need to attach to each comment methods (or getters) that fetch the user object. I suppose you could just attach a get(reference) that takes this[reference + '_id'] and looks it up inside `result[reference]`. m:n relations will be harder, though.

Otherwise you can't simply pass each comment to e.g. a React "Comment" component that renders it. You would also have to pass the user list (dictionary) to it.


Well, you could process the JSON on client side.

    response.comments.forEach(comment =>
      comment.user = response.users.find(user => user.id === comment.user));
Preferably, even, you could do it during rendering so it can be garbage collected afterwards.


Well, that counts as further deserialisation in my book - at least if you set up a convention to automate it for all kinds of objects. Otherwise you'd have to do it manually for every type of request.


> * The DOM. It's slow and full of old cruft that takes up memory and increases execution time.

I'm not going to say that the DOM is wonderful but … have you actually measured this to be a significant problem? Almost every time I see claims that the DOM is slow a benchmark shows that the actual problem is a framework which was marketed as having so much magic performance pixie dust that the developer never actually profiled it.


Good point. Let's see:

http://jsfiddle.net/55Le4ws0/

vs

http://jsfiddle.net/nem6tnv1/

Initialising about 300K invisible DOM nodes containing 3 elements, each with 3-character strings, is ~15 times slower than initialising an array of 300K sub-arrays containing 3 elements, each being a 3-character string.
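The shape of the comparison was roughly this (a rough re-creation, not the exact fiddles):

    var N = 300000;

    // Plain arrays:
    var t0 = performance.now();
    var rows = [];
    for (var i = 0; i < N; i++) {
      rows.push(['abc', 'def', 'ghi']);
    }
    console.log('arrays:', performance.now() - t0, 'ms');

    // Hidden DOM nodes holding the same data:
    var t1 = performance.now();
    var container = document.createElement('div');
    for (var j = 0; j < N; j++) {
      var row = document.createElement('div');
      row.style.display = 'none';  // hidden, so no layout work is required
      for (var k = 0; k < 3; k++) {
        var cell = document.createElement('span');
        cell.textContent = 'abc';
        row.appendChild(cell);
      }
      container.appendChild(row);
    }
    document.body.appendChild(container);
    console.log('DOM nodes:', performance.now() - t1, 'ms');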

Additionally, paging through the created nodes 2K at a time by updating their style is just as slow as recreating them (the console.time results say something different, but the repaint times are pretty much the same and this is noticeable on slower computers or mobile.)

That's a single benchmark that raises a few questions at best. But I think React put the final nail in the coffin: their VDOM is implemented entirely in JavaScript, and yet it's still often faster to throw away, recreate and diff entire VDOM trees and apply a minimal set of changes, rather than do those modifications directly on the DOM nodes...


It's not useful to compare the DOM to a simple array which doesn't do most of the real work which you need to do. Comparing rendering that text using canvas or WebGL would be closer if it also had a layout engine, dynamic sizing, full Unicode support, etc.

Similarly, React is significantly slower – usually integer multiples – than using the DOM. The only times where it's faster are cases where the non-React code is inefficiently updating a huge structure using something like innerHTML (which is much slower than using the DOM) and React's diff algorithm is instead updating elements directly. If you're using just the DOM (get*, appending/updating text nodes instead of using innerHTML, etc.) there's no way for React to be faster because it has to do that work as well and DOM + scripting overhead is always going to be slower than DOM (Amdahl's law).

The reason to use React is because in many cases the overhead is too low to matter and it helps you write that code much faster, avoid things like write/read layout thrashing, and do so in a way which is easier to maintain.


Most of the elements are hidden via display:none - there is no layout work to be done for them whatsoever. The rest of the overhead seems to lie entirely in allocating a lot of extra data per node.

Also, total time to repaint seems to be equally fast whether we're recreating the entire 2K rows from scratch or changing the display property of 4K rows. That seems concerning to me.

But yes, a fairer comparison would be to write a WebGL- or Canvas-based layout engine in JS. If it's less than 4 times slower (it's JS, after all), then the DOM is bloated.


Also, remember that my position is that most applications are limited by other factors before you hit the DOM.

I certainly would expect that you could beat the DOM by specializing – e.g. a high-speed table renderer could make aggressive optimizations about how cells are rendered, like table-layout:fixed but more so – but the more general-purpose it becomes the more likely you'd hit similar challenges with flexibility under-cutting performance.

The most interesting direction to me are attempts to radically rethink the DOM implementation – the performance characteristics of something like https://github.com/glennw/webrender should be different in very interesting ways.


> * Advertising / analytics companies. They deliver a boatload of code that does very little for the end-user.

That's not quite fair. They subsidize the content for the end-user. Perhaps that's a crappy status quo, but in many cases without the advertising and analytics the content wouldn't exist in the first place.


I'm increasingly of the mind that:

1. Advertising is the problem.

It's creating technical problems. It's creating UI/UX problems. It's creating gobs of crap content. It's creating massive privacy intrusions and security risks. And for what? Buzzfeed?

2. Ultimately, the problem is the business model for compensating informational goods. Absent some alternative mechanism (broadband tax, federal income tax applied to creative works), I don't see this changing much.

https://www.reddit.com/r/dredmorbius/search?q=broadband+tax&...

3. Micropayments aren't the solution.

http://szabo.best.vwh.net/micropayments.html


Is it not an option to just stop visiting these sites?


Think of the children. I mean authors.

I'd like a system under which creators of quality content would be equitably and fairly compensated. The present system fails this.


What if they already are? I can think of a handful of newsletters that command a three-figure annual subscription price. The people who buy them must regard them as useful and of sufficient quality. It could be that the rabble of news sites drowning in ads are mostly schlock, and that their business model is one of attempting to monetize some of your least-attentive moments.


Perhaps. But maybe, if you add googleads, amazon adsystem, moatads, rubicon project, taboola, scorecardresearch, krdx etc. [1], then you start wondering why users are using ad blockers, so you add pagefair, and at that point you want to find out what works better so you add optimizely... maybe, just maybe, at that point, you're actually losing money because your page is so damn slow, rather than getting more because of your efforts to perfectly monetise it.

[1]: Just copied them from coldpie's news site screenshot comment. That was about half of them; there were many more.


I remember when I was young, television was sponsored by advertisers: one of my favourite shows was made possible because of advertisers. To bemoan advertisers, in my mind, is to say that part of my childhood should never have existed, so I think greater, more focused criticism is required.

Do you ever wonder why the landscape became the way it did?


If a TV show were five minutes of content, and 55 minutes of advertising, I would stop watching it. And yet, that's the approximate breakdown for many websites between content and advertising. And they wonder why readers complain.


They know why readers complain. They wish you understood their perspective better.

A significant problem is that content costs a certain amount of money to be produced, and web content is unable to command those prices.

Ad fraud is a big part of it, and some of the companies in the best position to solve it (like Google) are benefitting so handsomely from ad fraud that I can't imagine them stopping it.

Ad blockers will hurt legitimate content producers, but because people defrauding advertisers don't use ad blockers, they'll continue to make money.


I'm sorry, but watching an ad isn't just a payment like one you make with money. It's a rape of your mind. It needs your personal data to rape your brain deeper. So I block ads, not to see content, but because ads don't deserve to exist.


A rape of your mind?

Wow. Well, I'm glad you have your feet anchored down here in reality and aren't exaggerating at all.


> It's a rape of your mind.

This is perhaps the most ridiculous comment I have read this afternoon.


Read more about ads, their perception by our brain, how they work and how they are designed to bypass our consciousness. Next, understand the concept of metaphor. And then, come back... or not.


There's actually a lot of momentum on this within the IAB (the standards body for online advertising). The goal is to separate content from analytics. The result would enable publishers to prioritize loading the content before loading analytics.


Hmm, cool, didn't know they were working on it. That should help.


If it degrades the user experience so much that the page is unusable, it's not serving the intended purpose. Users will block the ads or not bother waiting for the page to load. Some ads/buttons/analytics code is worth it, but most content sites don't seem to think about the tradeoffs at all.


> Well there is, or at least there used to be. The reason native apps on mobile became so popular is exactly because web sites are bloated. It might not matter on a PC, but on a device running on batteries it matters a lot.

I think you'd find a much easier case made that it's because of:

1) discoverability and distribution through app stores,

2) access to native features that took years to be enabled on the mobile web (for example, access to photos),

3) ability to make games (access to graphics, etc -- this also kind of hampers the case being made about "bloated" sites when people seem very willing to download 100MB games)

4) access to ease of payment (CC info stored on people's phones that you can use for in-app payments, vs. having to type info into a web form).

So we could try to focus on clear issues the web has vs native, or I guess we could design yet another language that is theoretically faster thanks to >insert pet feature here< and hope that fixes the web.


> 3) ability to make games (access to graphics, etc -- this also kind of hampers the case being made about "bloated" sites when people seem very willing to download 100MB games)

There's a huge difference between downloading a 100 MB app once (plus occasional background updates) and waiting to load a 100 MB web game every time it falls out of cache.

Plus the web loading delay happens constantly while you're trying to use it. You wait on an app download exactly one time: when you buy/install it.


Games also often have loading screens. I think the real counter there is that there is a big difference between waiting to play a 3d game vs waiting to read a news article.


There's also a big difference between loading something out of flash storage and loading it off the internet.

My 4G connection right now is speedtesting at 46.68 Mbps download.

An iPhone 6 benchmark I found on MacRumors can read at 6720 Mbps (840 MB/s) [1]

Given the choice, I'd take the option that's 140x faster.

[1] http://forums.macrumors.com/attachments/img_0005-png.514599/


> The real problem of web development is JavaScript. It's a relic of the past that hasn't caught up with the times, and we end up with half-assed hacks that don't address the real problem. We need something faster and way more elegant.

I absolutely disagree. JS is plenty fast for >99% of the things being done with it. The DOM, and manipulations of it, is the slow part.


We AABN tested injecting delay into rendering the MSN home page and found a direct correlation between amount of delay and the abandon rate. People are not tolerant of delay. Here is the team I worked on: http://www.exp-platform.com/Pages/default.aspx


"people"? What people? This OP's experience seems to indicate that there is a lot of patience in "Southeast Asia, South America, Africa, and even remote regions of Siberia".


It sounded more like patience was relative to expected wait time. If you expect something to take 20 minutes, then a 2 minute wait isn't much at all. If you expect something to take 2 seconds, then a 10 second wait is quite irritating.


I'm certain you're right but we need to test that :)

The way I read the OP, the abandon rate was ~100% previously in areas with high latency/low bandwidth, and his experiment had a non-zero abandon rate that was shrinking as word spread. Perhaps I need to read it more carefully.


I took this screenshot yesterday. This was a page showing a news article (text). http://i.imgur.com/hmFaW3M.png


Oh, why are you blocking them? They are trying to enhance your experience on that website. /s


Most of those are on my /etc/hosts blocklist. Or would be were I aware of them. Worthless crap.

BTW: /etc/hosts + dnsmasq, for Linux, is amazing. (dnsmasq reads /etc/hosts and will block entire domains listed the same way.)


For Firefox users, is this better than NoScript for any reason?


Putting hosts (and domains) under control of /etc/hosts and dnsmasq means that there's little likelihood of traffic reaching them from your browser (though Web hosts could provide back-end data transfers).

It's also possible to directly address hosts by IP, though that's unlikely (Web features such as virtual hosts would fail).

I'm strongly favouring uMatrix for now. It takes some tuning, but you have fine-grained control over CSS, images, scripts, XHR, frames, and other bits, by domain or host.

Aggregators and CDNs confound things a bit (Akamai, Amazon's cloudy thing.)


For Firefox, the real comparisons are Request Policy and ublock (origin) in advanced mode (for chrome, umatrix). These do full third-party host whitelisting per domain. So every time you visit a website on a new domain, by default all requests to third parties are blocked. Then you spend a few minutes working out which ones are required.

By comparison, NoScript simply blocks JavaScript from third parties. It does include a number of anti-XSS heuristics, though.


uBlock and uMatrix are two separate extensions. Both are available for both Chrome and Firefox.


I can't say whether it's better or worse. It's different. With NoScript you still hit "remote" servers when it comes to other resources, like images.


I think it could make pages load marginally faster, because NoScript only stops scripts from executing, where a hosts blocklist would stop the scripts from ever being downloaded.

I think, not totally sure how NoScript works.


Nah, NoScript blocks them at load-time, too.


This is the problem exactly - it doesn't have anything to do with the language ... too many damn bells and whistles and too many damn analytics / ad libraries. If you're delivering content, you should focus on delivering the content. My computer shouldn't nearly freeze trying to read a recipe, or the weather, or a simple news article, etc ...


> The fact that there isn't a daily drumbeat about how bloated, how needlessly complex, how ridiculous most of the world's web applications of today really are - baffles me

I feel Jakob Nielsen used to majorly play this role amongst Web developers back in the day, but for some reason I haven't seen him come across my radar for many years now. The Web performance experts now are better than ever but I don't get the sense of the majority of the industry hanging off their every word as seemed to happen with Jakob 10+ years ago.


I get weekly newsletters from him. Still good stuff.


Unfortunately, computing has always been that way. Remember the mantra "Intel giveth, Microsoft taketh away"? Developers quickly get spoiled with faster hardware, more bandwidth, more RAM, etc.

I'm not saying it's right. In fact, I would prefer if everyone took a look at their apps much like the author of the article. I'm just saying, this isn't a new phenomenon.


> baffles me

IMO it's pretty clear why: winner-takes-most economics emphasizes dev speed over quality, and then we need to maintain compatibility with it.


There are reasons for it: it takes more work to write an efficient, simpler application with the same functionality. Developer time is expensive.


Well damn.

I'm in Bangalore and being forced to use mobile 3G for now. The data charges are atrocious and each page load saps an MB.


Maybe you have taken these steps already, but uBlock Origin works great, and NoScript works okay, in Firefox for Android. They may get your page size down by significant fractions.


This is a fascinating example of Simpson's Paradox:

https://en.wikipedia.org/wiki/Simpson%27s_paradox

It also reminds me of the phenomenon, in customer service, whereby an increase in complaints can sometimes indicate success -- it means the product has gone from bad enough to be unnoticeable to good enough to be engaged with.


In WW1, helmets were introduced to protect soldiers. Surprisingly the frequency of head wounds went way up. It took a little while to realize soldiers were "just" being wounded, rather than outright killed if they had no helmet.


There's a similar story about putting armor on planes during WWII - http://www.johndcook.com/blog/2008/01/21/selection-bias-and-...


Pretty funny story considering YouTube is back to unusable on slow connections. They used to buffer the full video, so you could load up a page, let it sit until the video buffers, then watch it eventually, maybe after reading your social news sites. Nowadays the buffering feature has been removed and you'll just come back, hit play, get a second or two of video, then it has nothing again for a long time.

Feels bad for the engineer who spent all that time reducing the size and finding out it made YouTube much more usable across the globe. Amusingly, disabling buffering was probably some penny-wise, pound-foolish way to save bandwidth.


It's for the average user with a decent connection who skips parts of a video. This is tedious, but if you want to buffer a YouTube video, try youtube-dl [0] or use VLC.

[0] https://rg3.github.io/youtube-dl/


You can disable MediaSource Extensions in your browser to get the old download-it-all HTML5 video.


I went to work for a company that makes a travel product used by people in almost every country in the world, after trying to use it in southeastern Europe. I told them their page weight was killing the experience, and wanted to join the front end team to fix it.

After 6 months of banging my head against a wall, I realized the reason we weren't fixing page weight was because our product managers didn't care about the experience of users in poorer countries, because they didn't have any money to spend anyway. Even though we had lots of users in those countries, and even though we made a big show of how you could use this app to travel anywhere in the world.

If there's a lesson there, it's that as long as cold economic calculations drive product decisions, this stuff isn't going to get any better.


If you're on Windows you can use the Network Emulator for Windows Toolkit (NEWT): http://blogs.technet.com/b/juanand/archive/2010/03/05/standa...

I've used it to emulate what it's like on a high-latency or high-loss network. Relatively easy tool to use.


Chrome also has this feature if you switch to device mode (Ctrl+Shift+M).


Coming from a low bandwidth, high latency part of the world, I can't confirm this enough.

Today, I have 2 Mbit and can use Netflix or YouTube just fine, but a mere 4 years ago, I had 600k and, boy, that was hard. Hard as in loading a YouTube URL and going for a coffee.

UPDATE:

In case Duolingo developers are listening: please test your site in high-latency and very low-bandwidth scenarios. I just love your site, but lessons behave really strangely when the internet is bad here.


> Hard as in loading a YouTube URL and going for a coffee.

You can't even do that now. YouTube videos buffer about 1m30s of video and stop after that :(


I understand that Google is doing this to save an immense amount of wasted bandwidth. These days if I need to wait for something to load, I use youtube-dl.


At the time, I wrote a Chrome extension that would select the lowest resolution (240p at the time), but I can't remember if it was capable of autoplay.


My train commute into work is the perfect place to brush up with Duolingo. Shame it randomly goes through patches of no signal, as this makes Duolingo fairly unusable. I have to time my practice to fit into the areas I know it’ll work in.


If you're looking for an alternative to Duolingo, give Memrise a try. It allows you to download entire lessons for offline use.


Comments from the last time this was posted: https://news.ycombinator.com/item?id=4957992


Wow. The Verge was the poster child for the most bloated site even three years ago, long before the recent ruckus.


What was the recent ruckus?



It is insane. One of my favourites is the "about.me" site, which is meant to be a simple online business card. Picking a random page from https://about.me/people/featured, you get a page weighing over 3MB!

http://tools.pingdom.com/fpt/#!/F4VDN/https://about.me/penta...


Everyone's sometimes on spotty WiFi or expensive foreign 3G. I'm more inclined to trust fast-loading sites and apps.

I wonder what would happen if for example iOS decided to visually indicate page weight, kind of like how you can see which apps use the most energy.


I think that's a great idea. Make it more visible, and then you get normal people to care about it and put pressure on content owners, developers, etc. Like Google did with their "mobile-friendly" tag; I could yell at my managers all day about how we need to improve our site's mobile experience to no avail, but Google steps in and threatens lower rankings and suddenly it becomes top priority.

Perhaps the simplest solution is for Google to start penalizing heavy pages, but as far as I know, page weight isn't part of their mobile-friendly criteria.


As a web dev I always have this in mind, but the challenge is convincing your client who wants a video background. Maybe we need a media query that detects connection speed.


Even then, it's fairly easy to load the video asynchronously as one of the last assets and make its intro blend into a single-color background - voila, problem "solved".

I find video backgrounds ridiculous most of the time (though they can be done really well), but that's not the real problem here - the real problem is including 200-800kb of JavaScript code that does nothing but track your users, and often enough doesn't even do it for you! (Hi, Facebook!)

The real problem is using massive js frameworks for the sake of adding dynamic functionality to your site that, often enough, isn't actually worth it.

The real problem is that very often, these "features" are only as necessary as the marketing team says they are... the people who have the ability to ask "why?" and the ability to understand "why not" don't have the voice (or guts...) to do so.


"Massive" JS frameworks aren't the issue, not by themselves. AngularJS, for instance, is only 39.5kb. Bloat sneaks into web applications in other ways, but merely bringing in a framework isn't enough to add a noticeable load on a web page.


I’ve seen sites loading Angular, React, jQuery(+jQuery UI) and some custom frameworks. Different parts of the site rendering in different frameworks.


I would love this. My company just updated their public-facing website and it has a 28MB 1080p video playing in the page header. The site was outsourced, but I work as a web developer for some of our products and was asked by the marketing team to give the new site a look over.

The marketing team was surprised to learn how big the file was, since the outsourced team "assured" them that the new website would be lightweight. Oh well... Now the video has been reduced to 16MB at 720p, which is still ludicrous to me, but they just really like having the video on the home page more than they want a truly lightweight page.

And to be fair, the video does not load when viewed on a mobile device, so at least there's that.


Have you looked into applying a blur filter in CSS and just reducing the bitrate of the video to a very low value?


I'm looking at you Airbnb that I often need to access from bad hotel wifi in a far away country :(


Two Client Hints are in the works to provide exactly this. Check out Save-Data and Downlink: http://igrigorik.github.io/http-client-hints/
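If Save-Data ships as drafted, the server-side branch would look roughly like this (a sketch based on the draft above; the rendering helpers are made up):

    var http = require('http');

    function renderFullPage() { return '<html><!-- video background and all --></html>'; }
    function renderLitePage() { return '<html><!-- static header image --></html>'; }

    http.createServer(function (req, res) {
      // Node lowercases incoming header names; the draft's value is "on".
      var saveData = (req.headers['save-data'] || '').toLowerCase() === 'on';
      res.setHeader('Vary', 'Save-Data');
      res.setHeader('Content-Type', 'text/html');
      res.end(saveData ? renderLitePage() : renderFullPage());
    }).listen(8080);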


Isn't this phenomenon getting worse now that responsive design is in vogue? We've collectively decided to shoehorn a website designed for the connectivity and speed of a desktop browser into a lower powered device with slower/spotty connectivity.

Genuinely curious: Why is this better than a mobile-friendly site designed specifically with the constraints of a mobile device in mind?


Simple answer: writing a responsive site is much faster and easier to maintain than creating two versions of the same site. And it wouldn't be just two: you'd also have to support tablets, various desktop sizes, and various mobile sizes. Custom versions for each one aren't feasible.

Also, if you care about page weight and optimization, your site will be light everywhere, and the criticism of shoehorning a big bloated desktop site into a phone won't apply. This is not that difficult to achieve.


More than page weight, this article demonstrates that averages are dangerous, especially for performance metrics. All key metrics should be plotted in 50/90/95/99 percentiles, and for latency-sensitive ones, geographic breakouts can often reveal a serious delta from the mean.
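A quick sketch with made-up numbers, just to show how much a mean can hide:

    // Made-up latency samples (ms): most users are fast, one region is not.
    var samples = [80, 90, 95, 100, 110, 120, 130, 9000, 9500, 12000];

    // Nearest-rank percentile over a sorted array.
    function percentile(sorted, p) {
      var idx = Math.ceil((p / 100) * sorted.length) - 1;
      return sorted[Math.max(0, idx)];
    }

    var sorted = samples.slice().sort(function (a, b) { return a - b; });
    var mean = samples.reduce(function (a, b) { return a + b; }, 0) / samples.length;

    console.log('mean:', mean);                   // ~3122 ms - describes nobody's experience
    console.log('p50:', percentile(sorted, 50));  // 110 ms - the typical user is fine
    console.log('p99:', percentile(sorted, 99));  // 12000 ms - the group the mean was hiding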


Though certainly an interesting anecdote, I don't understand how a video streaming site like YouTube would be useful in a market where a 100K download takes 2+ minutes. You'd have to open a page, walk away for an hour, and hope everything was OK when you got back.


Well, think about it. Imagine you had no other way of getting any video on the internet. After all, if not YouTube, anything else was probably slower at the time. Obviously you wouldn't simply lose the desire to watch relevant videos; it'd be dimmed quite significantly, sure, but you'd still want to watch a video every now and then. Whether it's a music video, a chess match or a farming lecture, there's just a ridiculous amount of content available. I have plenty of videos I'd want to watch even if I had to wait hours. (Take torrenting for example, or pre-YouTube and pre-streaming file sharing: video and music on p2p networks were incredibly slow yet incredibly popular.)

But if you couldn't even load the pages to browse through the videos, that'd pretty significantly reduce your watching even further.

Liken it perhaps to a library where you could only get 1 book per week. Well, that's not much, but at least you can get 1 book. But if you go to the library you have to wait in line for an hour, and after examining one section of 10 books, you have to wait 5 minutes to examine the next one. Choosing the weekly book would become such a chore you'd probably not even bother to go anymore. But if you could suddenly examine the entire library with only 1 minute of total waiting, many more people would be inclined to do so, even if they could still only borrow 1 book per week.

Beyond that, in my experience, lots of requests on a slow connection fail before completing; I don't know why exactly, I've never been a network engineer. Streaming on a slow connection, on the other hand, eventually gets there. So it's entirely possible that even browsing normal pages was failing entirely for some people before, even though (if they ever did load the page and choose a video) the video download worked fine, albeit at a very slow pace.


> you'd still want to watch a video every now and then

My favorite example of this is YouTube repair tutorials. If I need to get something done, these can be irreplaceable. If I had a slow connection, but my car was up on the jack stand, I'd just have to wait.


Still, I imagine it would be better to forget the 2MB page load issue, and drop frames/colors/resolution to cut 50MB off the video size.


>You'd have to open a page, walk away for an hour, and hope everything was OK when you got back.

This was my first experience with the web in high school, with a modem, on AOL. I loved it.


I only ever watch youtube videos via youtube-dl, but I still use a browser to navigate the site to find videos to download.


I lived like that for some years.

My friends sent me e-mails with YouTube links and I literally waited an hour or so to view them. I just let them load while reading something else.

So the trouble here is not me having low bandwidth, but having contacts in places with broadband.

And even people with no connections in the first world want to see video clips online.


Page weight may matter, but I think amortized page weight matters most. It's like the marshmallow experiment for the web. If you can make one request at 10x the size, but it's only made 1/100th as often (presumably it spans multiple pages), then as long as people come back enough to justify that initial extra cost, you've effectively decreased the amortized weight to 1/10th.

That's why I think AJAX, web manifest [1], indexedDB, localStorage, etc. need to be leveraged much more. Imagine most of your app loading without making a single request, except for the latest front page JSON. You have a bunch already in indexedDB, so you just ask the server "hey, what's new after ID X or timestamp T?"

So your two minutes just became a couple milliseconds (or whatever your disk latency happens to be), and the data loads shortly thereafter, assuming there's not much new data to send back. And if you don't need any new resources, you only had to make a single request.

[1] https://github.com/w3c/manifest
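A minimal sketch of the "what's new after timestamp T" idea, using localStorage as the cache (the endpoint, storage key, and payload shape are all made up):

    function renderFrontPage(items) {
      console.log('rendering', items.length, 'items');  // stand-in for real rendering
    }

    // Keep the front-page items locally; on revisit, fetch only the delta.
    var cached = JSON.parse(
      localStorage.getItem('frontpage') || '{"items": [], "latest": 0}');

    fetch('/api/frontpage?after=' + cached.latest)
      .then(function (res) { return res.json(); })
      .then(function (delta) {
        if (delta.items.length) {
          cached.items = delta.items.concat(cached.items);
          cached.latest = delta.items[0].timestamp;  // assumes newest-first ordering
        }
        localStorage.setItem('frontpage', JSON.stringify(cached));
        renderFrontPage(cached.items);
      });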


The initial page load matters a lot. That's the one where the user is deciding whether he'll be coming back to your site. The unfortunate thing about many caching technologies is that they speed up subsequent page loads, but do nothing for the initial one. You still need to pay attention to clean-cache load times. Indeed, it can often be worth it to pay a cost on total page load time to get first-paint time down - that's the time at which the first visual representation of the webpage appears, even if it's just the header, layout, and a bunch of boxes.


This is a bad idea, because users may not give you a second chance. The research says over and over again, users respond negatively to small increases in latency.

In fact, if I knew of a way to decrease the size of the first load, at the expense of making the second load take longer, I'd probably do it.


The Ember FastBoot project has the goal to provide exactly this: lightweight HTML/CSS on the initial load and then hydrate the page/application with JS.

http://emberjs.com/blog/2014/12/22/inside-fastboot-the-road-...

https://github.com/tildeio/ember-cli-fastboot


For some reason this whole problem reminds me of early game developers dealing with small amounts of RAM, which clearly isn't a problem today. So would it be fair to say we should focus on increasing bandwidth to most of the world? I'm not saying page weight doesn't matter, but if you're just trying to get something off the ground, maybe you shouldn't worry about it so much. I mean, why worry about users with poor bandwidth if you don't even have users? If you already have a growing user base, then by all means refactor and reduce the footprint. But if you don't, code the damn bloated thing first.


Sure, most games these days are probably more CPU/GPU bound, but then again people usually don't have more than one game running at a time, while caching the assets of thousands more.

Also, isn't adding bloat "more work"? And just like in real life, I think losing bloat is often harder than gaining it. Why not worry about carousels or endless scrolling or video backgrounds when even just one person misses that stuff? Where are all the highly successful websites that started to reduce bloat after they got off the ground? It's not a rhetorical question or sarcasm - I am interested - but I honestly can't think of even one example; it doesn't really mesh with my own (admittedly rather pedestrian) experience. A site starts with a blank design doc, an empty file and a white screen, and adding something and later removing it isn't easier than simply not adding it in the first place.


Nice to see this again - I've told this story to many of my web dev students. :-)


I highly recommend turning on page throttling in the Chrome Dev Tools sometime. You'll be amazed at how slow even 4G seems.


Heh, I've been using Flask templating to make some HTML forms for exploring large datasets. Turns out when you have 6000 terms that show up in 6 different UI elements, putting those in as raw HTML results in a 13MB file that compresses down to 520KB. Pretty awful use case for forms. I'm pretty prejudiced against JavaScript, but having seen this I now deeply appreciate being able to send something other than raw HTML.
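For anyone hitting the same wall, the usual fix is to send the term list once and let the client populate all six controls; a sketch (the endpoint and element IDs are made up):

    // Fetch the ~6000 terms once and reuse them for every control, instead of
    // repeating them six times over in the generated HTML.
    fetch('/api/terms')
      .then(function (res) { return res.json(); })
      .then(function (terms) {
        ['f1', 'f2', 'f3', 'f4', 'f5', 'f6'].forEach(function (id) {
          var select = document.getElementById(id);
          terms.forEach(function (term) {
            var option = document.createElement('option');
            option.value = term;
            option.textContent = term;
            select.appendChild(option);
          });
        });
      });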


" I learned a valuable lesson about the state of the Internet throughout the rest of the world. Many of us are fortunate to live in high bandwidth regions, but there are still large portions of the world that do not. "

lol, I'm in China and everything you're discussing just does not exist here. What you'll get is just a "Connection Reset", no matter how compact the page is.


If it takes two minutes to load a 100kb page, does it take twenty minutes to watch a 1MB video? Over three hours to watch a 10MB video?


Probably not. Downloading the video is going to depend more on throughput than on latency. The initial connection & page load, as it hits all the domains and resources, is going to be much slower because it relies more on latency and roundtrips.


The article says that the page would have taken 20 minutes to load under previous circumstances.


The more things change, the more they stay the same.

I remember visiting microprose.com with my 14.4k modem in the mid-90s and being mad that they used so many images I had to wait for about 5-10 minutes or so. I couldn't effectively read it at home and usually ended up reading it at the library.


This is a bit tangential, but how did we get from "size" to "weight"? It seems a bit of an odd phrasing to me. With the exception of ESL mistakes, I don't know of anyone or any software which refers to "file weight", for example.


Because the weight is hidden behind the visible page content, which has a different apparent size. Page weight is the sum of the sizes of all the resources used to build the page.


If it took two minutes to load the framing page, how would they be able to stream the video?


It's not surprising large numbers of internet users in emerging markets skipped the web entirely. Delivering product via SMS and chat starts to make a lot of sense in context.


Great point, and very important at a time when we bundle who knows how many JS dependencies for client apps.


Fantastic


Ditto - Loved reading this. I work on similar projects but it's a tough battle to win against marketing these days.


WOW! Faster load times and lighter code makes for a better user experience? (mindblown)


whoosh! You missed the whole point.


That you can't always look at a single metric as a basis for success? This is rudimentary analytics here. Is there a single person who believes making client side code heavy is the way to go? Do people really think user experience should take a back seat to cool stuff on a site like YouTube?


Try "what makes a site merely inconvenient for some can render it completely unusable for others, and so one should not overlook minor issues because they don't seem worth fixing."


Excuse me. I didn't realize this wasn't obvious.


The problem of unknown unknowns.



