> The article somehow contrives to be 18 megabytes long, including (in the page view I measured) a 3 megabyte video for K-Y jelly, an "intimate lubricant".
(Warning. Swear words incoming, because the situation has grown far out of control)
Fuck websites with autoplay (or autoplay-on-hover) videos. Fuck them. Whoever has invented or implemented this crap, please resign from your post immediately.
Even in first-world countries, people work over 3G/4G connections with data caps, or find themselves in otherwise bandwidth-constrained environments (public hotspots, train/bus wifi, and so on). You are screwing over your users and don't even realize it.
Also, Spiegel Online in particular comes to mind: a 30-second video clip paired with a 25-second advertising clip. Fuck you.
> Why not just serve regular HTML without stuffing it full of useless crap? The question is left unanswered.
Easy, actually: because a well-defined, restricted subset of HTML can be machine-audited, and there is no way to abuse it. Also, Google can save resources when indexing.
> Easy, actually: because a well-defined, restricted subset of HTML can be machine-audited, and there is no way to abuse it
This is why I personally use NoScript.
People often ask me "how can you stand to use NoScript on the modern web? Isn't it a huge pain to whitelist scripts? Isn't everything broken?"
Nope. Most of the web works perfectly fine without loading Javascript from 35 different domains (as my local news site does). You whitelist a few webapps and you're pretty much good to go. The difference is incredible. Your browser uses a fraction of the memory. Pages load faster than you can blink. The browser never hangs or lags. Pages scroll as you expect. Audio and video never autoplay and distract you. When I briefly used NoScript on mobile, it was a miraculous life-saver that made my battery last forever.
In the past couple of years, however, I have noticed a new phenomenon. Remarkably - madly, in my view - there are webpages, webpages that should be simple, webpages by all appearances that consist of nothing more than a photo (maybe more than one), a byline, and a few hundred words of text, that require Javascript to load. As in, you will get a blank page or an error message if you don't have their scripts enabled.
I don't understand it. I don't want to understand it. I just want it to stop.
I understand that you need Javascript and so forth to run a webapp. I'm not even asking for your webapp to be less than 5MB. Hell, make it 50MB (I just won't ever use it on mobile.) Making applications can be a lot of work, maybe yours is super complicated and requires tons of libraries or some crazy media loading in the background and autoplaying videos and god knows what else.
But please, please, don't require Javascript to simply load an article and don't make a simple article 5MB. Why on Earth would you do that? How many things have to go wrong for that to happen? Who is writing these frameworks and the pages that use them?
"In the past couple years, however, I have noticed a new phenomenon. Remarkably - madly, in my view - there are webpages, webpages that should be simple, webpages by all appearances that consist of nothing more than a photo (maybe more than one), a byline, and a few hundred words of text, that require Javascript to load. As in, you will get a a blank page or an error message if you don't have their scripts enabled."
I use noscript as default and I'm noticing the same thing. I post them to twitter. Here's a sample:
- 'Here are the instructions how to enable #JavaScript in your #web #browser.'
- 'For full functionality of this site it is necessary to enable #JavaScript.'
- 'You must enable #javascript in order to use #Slack. You can do this in your #browser settings.'
- 'You appear to have #JavaScript disabled, or are running a non-JavaScript capable #web #browser.'
- 'Please note: Many features of this site require #JavaScript.'
- 'Tinkercad requires #HTML5/#WebGL to work properly. It looks like your #browser does not support WebGL.'
- 'Warning: The NCBI web site requires #JavaScript to function. more...'
- 'Whoops! Our site requires #JavaScript to operate. Please allow JavaScript in order to use our site.'
- 'The #media could not be played.'
- 'Notice: While #Javascript is not essential for this website, your interaction with the content will be limited.'
- 'Powered by #Discourse, best viewed with #JavaScript enabled'
Seriously, I remember using a web chat client (it might have been an old version of mibbit) which used "minimal" JS: basically server-side rendering of the incoming messages plus periodic page refreshes. It was a very poor user experience.
Back in the days of IE4/5 I discovered that turning JS off would very effectively stop pop-ups, pop-unders, pop-ins, slide-overs, and all other manner of irritating cruft, and it's been the default for me ever since. Conveniently, IE also provided (and AFAIK still provides) a way to whitelist sites on which to enable JS and other "high security risk" functionality, so I naturally made good use of that feature.
More recently, I remember being enticed by some site's "Enable JavaScript for a better experience" warning, and so I did, only to be immediately assaulted by a bunch of extra annoying (and extra-annoying) stuff that just made me disable it again and strengthened my position that it should remain off by default. That was certainly not what I considered "a better experience"... now I pay as little attention to those messages as I do ads, and if I don't see the content I'm looking for, I'll find a different site or use Google's cache instead.
Another point you may find shocking is that I used IE for many years with this configuration and never got infected with malware even once from casual browsing, despite frequently visiting the shadier areas of the Internet and despite IE's reputation for being one of the least secure browsers. It likely is the least secure in its default configuration with JS on, but turning JS off may put it ahead of other browsers with JS on, since the (few) exploits which technically don't require JS are going to use it anyway to encode/obfuscate themselves so as to avoid detection.
And on top of that, some sites redirect you to another page to display the no-JS warning. Then you enable JS for the site and reload, but you get the same warning, because you're no longer on the page you originally visited...
I think it is mod_pagespeed that's doing this, kinda silly.
NoScript, at least, has an option to catch and stop redirects of this kind. They can be quite funny when you can see a perfectly loaded page that wants to send you elsewhere.
I feel it's getting worse. A large % of the links (mostly start-up landing pages, not articles) I click on HN just present me with a completely blank white page. No <noscript>, nothing! If you're lucky, you see enough to realize that it's probably just a page that relies on JS. It's never a good first impression.
It's getting progressively more annoying to whitelist the site's own domain as well as the myriad CDNs that sites are using. Often it's a click gamble: 'Temporarily allow sketchy.domain.com', or throwing in the towel and saying 'Temporarily allow all this site'.
Site embeds a video? Good luck picking which domain to whitelist out of a list of 35 different domains ;) Temporarily allow one, reload, repeat a few times, close tab in anger and disappointment.
It's sorta-kinda a good thing, implemented poorly. What's actually happening is that more things that used to serve HTML directly to the browser are now just browser-oblivious HTTP REST API servers. Which is good! API servers are great!
The correct thing to do after implementing one, though, is to then build a "browser gateway" server that makes requests to your API server on one side and uses the responses to assemble HTML served to the browser on the other. (This is the way that e.g. WordPress blogs now work.)
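Roughly, such a gateway could look like the following minimal sketch (Node with Express and the built-in fetch; the API address and the JSON field names are invented for illustration):
    import express from "express";

    const app = express();
    const API = "http://localhost:4000"; // hypothetical internal API server

    // Ask the API server for the post as JSON, then hand the browser plain HTML.
    app.get("/posts/:id", async (req, res) => {
      const r = await fetch(`${API}/posts/${req.params.id}`); // global fetch (Node 18+)
      if (!r.ok) return res.status(r.status).send("Post not found");
      const post = await r.json(); // assumed shape: { title, author, body }
      // A real gateway would escape these values and wrap them in a proper template.
      res.type("html").send(`<h1>${post.title}</h1><p>by ${post.author}</p>${post.body}`);
    });

    app.listen(3000);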
What's happening instead is that site authors are realizing they can just get away with writing one static blob of Javascript to act as an in-browser API client for their API server, stuffing that Javascript in an S3 bucket, putting e.g. CloudFlare in front of it, and pointing the A record of their domain's apex at that CF-fronted S3 bucket. Now requesting any page from their "website" actually downloads what is effectively a viewer app (whose bandwidth costs they don't have to think about at all; they've pushed those entirely off to a third party). The viewer app then starts up, looks at the URL path/query/fragment, uses it to make an API request to their server, and the response from that becomes the rendered page.
Kind of ridiculous, no?
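Stripped of framework ceremony, that "viewer app" amounts to little more than this hedged browser-side sketch, where the API hostname and JSON fields are made up:
    // This script is effectively the entire "website"; it runs on every page load.
    async function render(): Promise<void> {
      // Turn whatever URL the browser was pointed at into an API request.
      // (CORS configuration on the API server is assumed.)
      const r = await fetch(`https://api.example.com${location.pathname}`, {
        headers: { Accept: "application/json" },
      });
      const post = await r.json(); // assumed shape: { title, body }
      document.body.innerHTML = `<h1>${post.title}</h1>${post.body}`;
    }

    render().catch(() => {
      document.body.textContent = "Something went wrong. (There is no HTML fallback.)";
    });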
It does sort of make sense to me for sites like e.g. Twitter, where their bread-and-butter is running API servers and writing API clients that interact with those servers. The "web client" is just, then, another API client, written in Javascript and "zero-installed" into web browsers whenever they request a page from the site.
But for, say, a newspaper site, or a blogging engine? Those sites should definitely care about serving HTML directly, even if only for Googlebot's sake.
I think you really don't get the point of the article. This kind of crazy overcomplexity seems like exactly the sort of thing the article's author would lump under complexity for the sake of complexity.
Yes, you can implement a REST API thingy and then an application server that templates it all. Maybe that logic is written in JavaScript and is the same code that executes in the browser, so you basically have a sort of headless quasi-web-browser assembling pages to serve, which lets you reuse the same code on the client side to reimplement the browser's navigation logic with pushState etc., in some dubious quest to outperform the browser's own navigation. I understand this sort of thing is actually done now.
Or you can just serve HTML.
And you miss the point of REST as well, I think. I'm increasingly convinced that nobody has the slightest clue what 'REST' actually means, but to the extent that I can determine it, it seems to essentially mean designing according to the original vision of HTTP (at least): using content negotiation, HTTP methods, and so on as they were originally intended, rather than arbitrarily picking whichever features of HTTP happen to work and bending them into some custom RPC system that you for some reason chose to implement over HTTP.
A consequence of this is that the same endpoint should be as happy to respond to Accept: text/html as to Accept: application/json, surely. (And obviously, not just with some bootstrap page for your JavaScript web-page-viewing web application.) It means your URLs represent resources and don't change. It means resources have canonical URLs. (If an API has a version prefix like "/v1/", it isn't RESTful no matter what its authors claim.)
I suppose you could proxy Accept: application/json requests without transformation to an API server, making the "browser gateway" server a conditional response transformation engine. In some ways that's kind of elegant, I think. But it also feels like overkill.
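The negotiation itself isn't much code. A sketch under the same assumptions as above (Express, with the resource data standing in for a real lookup), using req.accepts() to pick the representation:
    import express from "express";

    const app = express();

    // One canonical URL per resource; the representation depends on the Accept header.
    app.get("/posts/:id", (req, res) => {
      const post = { id: req.params.id, title: "Hello", body: "<p>Hi there.</p>" }; // stand-in for a DB lookup

      switch (req.accepts(["html", "json"])) {
        case "json":
          return res.json(post);
        case "html":
          return res.type("html").send(`<h1>${post.title}</h1>${post.body}`);
        default:
          return res.status(406).send("Not Acceptable");
      }
    });

    app.listen(3000);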
"Just serving HTML" assumes you have HTML to serve. If you're writing a CMS that consists of manipulating a bunch of /posts/[N] resources that are effectively un-rendered Markdown in the "body" field of JSON documents pushed into a document database/K-V store, though, then you've not got HTML. Coming from the perspective of writing the tools that allow you to manipulate the content, the simplest, cheapest extra thing you can do is to extend that tool into also being an API for getting the content. And nothing more. You don't want your flimsy document-management service to serve your content. You just want it to be spidered/scraped/polled through that API by whatever does serve your content. Exposing your flimsy hand-rolled CMS to the world, even with a caching reverse-proxy in front, would be a security nightmare.
And, even if you did want your CMS server serving up HTML to the world†, you don't want your CMS to serve up "pages", because deciding what combines to make a "page" is the job of your publishing and editorial staff, not the job of the people writing the content. This is an example of Conway's law: you want one component (the CMS) that your journalists interact with, that talks to another component (whatever themes and renders up the "website") that your line-of-business people work with, and you want them basically decoupled from one-another.
There's no reason, like you say, that the same server can't content-negotiate between the web version of a resource and the API version. I've implemented that sort of design myself, and a few years back, I thought it was the be-all and end-all of sensible web authoring.
These days, though... I've come to think that URLs are hateful beasties, and their embedding of an "origin" for their content (effectively making them {scheme, namespace, URN-identifier} tuples) has broken the idea of having "natural" web resources before it ever got off the ground. The CA system and CORS have come together to make "origins" a very important concept that we can't just throw out, either.
What I'm talking about: let's say that the journalists are actually journalists of a subsidiary of the publishing company, recently acquired, kept mostly separate. So the journalists' CMS system is run by the IT department of the subsidiary, and the news website is run by the IT department of the parent company. This means that the CMS probably lives at api.subsidiary.com, and the website is www.parent.com.
Now, there's actually no "idiomatic" way to have the parent website "encapsulate" the subsidiary's API into itself. You can allow www.parent.com to load-balance all the requests, frontload content negotiation, and then proxy some of the API requests over to api.subsidiary.com... but api.subsidiary.com still exists, and now its resources are appearing non-uniquely through two publicly-available URLs. You can further try to hide api.subsidiary.com behind firewall rules so only www.parent.com can talk to it, but, now—disregarding that you've just broken the subsidiary's CMS tooling and that all needs to be recoded to talk to their own API through the parent's website—what if there's more than one parent? What if, instead of a parent company, it's a number of partners re-selling a white-labelled multi-tenant API service through their own websites? api.subsidiary.com still needs to exist, because it's still a public service with an indefinite number of public partners—even though it's denormalizing URL-space by creating a second, redundant namespace that contains the same unique resources.
And all this still applies even if there's no separation of corporations, but just a simple separation of departments. The whole reason "www." was a thing to begin with—rather than whatever's on the domain's apex just setting up a forwarding rule for access on port 80—is that the web server was usually run by a different department than whatever was on the apex domain, and each department wanted the ability to manage their resources separately, and potentially outsource them separately (which still happens these days; "blog." and "status." subdomains will quite often be passed off to a third-party.)
In short: REST is awesome, and content negotiation makes perfect sense... until there are multiple organizational/political entities who need to divide control over what would originally be "your" REST resources. You can try to hide all that complexity after-the-fact, but you're just hiding it; that complexity is fundamental to the current {origin, identifier}-based addressing model used for the web. We would need to switch to something entirely different (like, say, Freenet's SubSpace Keying approach) to enable REST resources to have canonical, unique identities, such that you could really encapsulate multiple REST "representations" served by multiple different third-parties into the same conceptual "resource", and have that "resource" only have one name.
---
† On a complete tangent, HTML is way cooler as a format—and far less of a hassle to generate (no templates!)—when you just use it as a pure flat-chunked unstructured markup language, like so:
<html>hi <em>guys</em></html>
...instead of forcing it to be a container-format for metadata that should rightfully live in the HTTP layer. (Yes, the above is fully valid HTML. Validate it sometime!)
While it does make sense to request, say, the text/html representation of a BlogPost resource directly from the CMS server, what you should get in return is, literally, a direct HTML equivalent to whatever the JSON response you'd be getting is, instead of a "page"—that being an entirely different compound resource that you weren't talking about at all.
That's where people screw up REST—by thinking "send me text/html" means "send me a web page." Pages are a particular kind of "microformat" that the text/html representation is capable of! There are other kinds! Sometimes the unstructured HTML markup of a BlogPost-body might be all you want!
Now, that sounds all well and good as an argument for the sake of e.g. REST client libraries that want to get uncluttered HTML resources to chew on. But web browsers? It'd be crazy to feed unstructured markup directly to a web browser, right? Horribly ugly, for a start, and missing all sorts of important structural elements. It'd look like 1994.
Well, surprisingly, you can gussy it up a fair bit, without touching the Representation of the State itself. One thing people don't tend to realize is that the <link rel> tag is equivalent to sending a "Link" header in HTTP. This means, most importantly, that you can tell the browser what stylesheet it should use for a page as pure HTTP-response metadata, instead of embedding that information in the transferred representation.
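A hedged sketch of that idea, again assuming Express (the stylesheet path is invented); note that browser support for Link: rel="stylesheet" response headers is uneven, so treat this as an illustration of the principle rather than something to rely on:
    import express from "express";

    const app = express();

    app.get("/posts/:id/body", (req, res) => {
      // The styling hint travels as HTTP metadata, not inside the representation.
      res.set("Link", '</styles/post.css>; rel="stylesheet"');
      // The body itself stays pure, <head>-less markup.
      res.type("html").send("<html>hi <em>guys</em></html>");
    });

    app.listen(3000);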
With enough clever use of CSS :before/:after styles to insert whatever you like on the page, you can "build" a structure around some REST "State" that was "Represented" during the "Transfer" as completely unstructured markup. That's what you should be seeing as the result of letting a browser do content-negotiation directly with a RESTful API server.
(Now, you can't get any Javascript going that way, but shame on you for wanting to; the thing you're rendering isn't even a representation of a compound resource.)
Anyway, follow this kind of thinking far enough, and you realize that everything the "API CMS" server I just described does is something databases do (at least, the kind with constraints, views, triggers, stored procedures, etc.); and that the "website service" of the parent company is what's more traditionally thought of as a "webapp" server.
There's for sure a use case where what you're describing is 100% appropriate, and in those cases, yes, you need some of that complexity. But a good majority of blog-like things on the internet are by and large managed by a single person, and something like Jekyll, or even just writing HTML directly, is totally fine.
In fact, this is the exact point the author brought up - of course a lot of this complexity has its uses - complex systems aren't invented just for the sake of complicating things. But instead of assuming that every single thing you write needs to be built to the most complicated standards you can imagine, we should instead be focused on "What's the simplest way I can deliver this experience right now?".
Really, it's another way of expressing YAGNI, which apparently the development community has completely forgotten in their quest for the latest and greatest.
> With enough clever use of CSS :before/:after styles to insert whatever you like on the page, you can "build" a structure around some REST "State" that was "Represented" during the "Transfer" as completely unstructured markup. That's what you should be seeing as the result of letting a browser do content-negotiation directly with a RESTful API server.
CSS is likely to be too limited for that; often the layout you want means the DOM elements need to be ordered and nested in ways that don't actually make sense for your data. What I've seen actually working was XSLT stylesheets. Of course, that has its own issues with bloat, and forces XML for data transmission. (The context I saw it in was an RSS feed, a few years ago.)
Well, when I write PHP/MySQL apps without all the plugins and libraries and crud, everything they send to browsers is pure HTML. Manipulating the output can be done with server-side code. It's just not as flashy and shiny.
Ah, no, that's not what I meant with that second bit; I was referring mostly to not having a <head> in your HTML, thus making your HTML less of a "document" and more just a free stream of bits of text with an <html> tag there purely to indicate what you should be parsing these as. (You don't even need that if you've sent a Content-Type, but removing it means you can't save the document because OSes+browsers are dumb and make no place for media-type metadata on downloaded files.)
Things that go in the <body> like "top navs" and "sidebars" actually are part of the resource-representation; it's just a compound "Page" resource that they're properly a part of. /posts/3 should get you "a post"; /pages/posts/3 (or whatever else you like) should get you "a web page containing a post."
But when you do use the "<head>-less" style, and are requesting a non-compound resource more like /posts/3, then a wonderful thing happens where you get literally no indentation in the resulting linted document. No recursive "block-level" structure; just single top-level block elements (<h1>, <blockquote>, <p>, and <address> mostly) that transition to containing inline-reflowed text. It's quite aesthetically pleasing!
This seems pretty ridiculous. Why do you need a completely separate API?
The rendering is just business logic. Run a normal web app and it pulls the content from a database and does whatever transforms you need and still serves up HTML.
Tracking can absolutely be done without JS, although the data you get is much coarser; these have been around ever since the very beginning of the Web:
https://en.wikipedia.org/wiki/Web_beacon
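The classic approach is a one-pixel image whose request gets logged server-side; a minimal sketch (Express again; the route name and log format are arbitrary):
    import express from "express";

    const app = express();

    // The standard 1x1 transparent GIF, base64-encoded.
    const PIXEL = Buffer.from(
      "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7",
      "base64"
    );

    // Embedded in a page as <img src="/pixel.gif?page=/some/article">; no JS needed.
    app.get("/pixel.gif", (req, res) => {
      console.log(new Date().toISOString(), req.query.page, req.headers["user-agent"]);
      res.type("image/gif").send(PIXEL);
    });

    app.listen(3000);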
I've noticed quite a few simple article websites that have something like a class on the body that sets display: none - and then that class is removed only after all the JavaScript has loaded. It's very obnoxious.
Indeed. I have used Adblock and NoScript for years.
My browsing the web is not an invitation for websites to serve a webpage-viewing webapp so that they can (poorly, buggily, in a more error-prone manner) reimplement a browser's navigation logic. (Ever had to reload a website which uses the pushState API because you clicked a link and, for whatever reason, the XMLHttpRequest it made to fetch the page didn't work, so it just hung and ignored all future link clicks? Dear chickenshit webdevs, if you think you can implement navigation better than an actual web browser, you're probably wrong.)
The vast majority of the time when I come to an article which is a blank page without JavaScript, I don't enable JavaScript; I just make a mental note that the web developers are beyond incompetence and move on.
I'm starting to respond to this trend with a more aggressive refusenik approach. For example, CSS is now so powerful that you can cause excessive CPU load with it alone. So I now have a shortcut configured to disable CSS for a site. This also makes many sites readable which otherwise wouldn't be, because they're doing something insane like blanking out content with CSS under the expectation that it'll be shown using JavaScript. And of course all of these recent 'ad-blocker-blockers' (http://youtu.be/Iw3G80bplTg) seem to rely on JavaScript.
Sometimes the content is loaded via JavaScript, and so this won't work. Amazingly, there has been a Blogger template doing this for some years now, which demonstrates that this brain-damaged approach has spread even to Google. But the greatest irony is that you can quite often work around these sites using the Google cached copy. Googlebot has supported JavaScript for some time (actually sort of unfortunate, in the sense that it removes an incentive for webdevs to design sites sanely), and it appears that cached copies are now some sort of DOM dump. Which has the hilarious consequence that you can now use Google to fix broken Google web design. There are *.blogspot.com sites which are blank pages, but the cached version is readable.
My own weblog is very spartan, being rather of the motherfuckingwebsite.com school of non-design. bettermotherfuckingwebsite.com was linked below, but I don't think I agree with it. Ultimately, in terms of the original vision of the hypertext web, I'm not sure web designers should be dictating the rendering of their websites at all; that is, I'm not sure web designers should exist.
So basically, imagine surfing the web with site CSS disabled, except for your own user styles, which format absolutely all content the way you like it. Your own personal typographic practices are unilaterally adopted. bettermotherfuckingwebsite.com might be right as regards typographic excellence, but it's wrong about where those decisions should be made.
Unfortunately it's undeniable that this is a lost battle. Browsers used to let you choose default background, foreground and link colours, as well as fonts, font sizes, etc. I think you can still choose default fonts. But the idea of the web browser as providing a user-controlled rendering of semantic, designless text has long been abandoned. That idea died with XHTML2 - I think I'm about the only person who mourned its cancellation.
Off topic: Thanks for the video link; the source of it, The Big Hit, looks to be something I might rent and watch today, given that the clip and another from the movie were rather funny.
A lot of times, that's just more effort than I want to deal with. I have Privacy Badger on a PC or two, and have reported things broken... probably a dozen times because I forgot I was on a computer with it enabled.
We shouldn't try to fix badly bloated websites. We should RIDICULE badly bloated websites, and take our business elsewhere.
> Fuck websites with autoplay (or autoplay-on-hover) videos.
It's not just marketing. My personal favorite is how Twitter and (shudder) Facebook are doing this on their timelines now. At least they have the decency to mute the volume, but that doesn't help with data caps.
Both services offer a setting to disable the auto-play, but unfortunately it's global; you can't set it only on a certain device or have it depend on the network environment.
Hell, that'd be a nice thing to have in HTML5 and OSes: allow the user to classify networks as "unrestricted" (fat DSL line, fibre, ...) or "restricted" (mobile hotspots, tethering, metered hotspots), and expose this to websites so they can dynamically scale.
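Chromium-based browsers expose roughly this through the non-standard Network Information API; a hedged client-side sketch, feature-detecting because most platforms don't provide it (loadHeavyMedia is a hypothetical stand-in for the site's own loader):
    // Hypothetical stand-in for whatever loads the bandwidth-hungry extras.
    function loadHeavyMedia(): void {
      console.log("loading video / large images");
    }

    // Non-standard API: present in Chromium-based browsers, undefined elsewhere.
    const connection = (navigator as any).connection;

    const constrained =
      connection?.saveData === true || // user has asked for reduced data use
      ["slow-2g", "2g", "3g"].includes(connection?.effectiveType ?? ""); // rough link estimate

    // Only pull in the heavy assets on unconstrained connections.
    if (!constrained) {
      loadHeavyMedia();
    }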
Android already supports this, but no other platform. A shame.
Twitter specifically won't disable autoplay for everything:
The option text reads:
Videos will automatically play in timelines across the Twitter website. Regardless of your video autoplay setting, video, GIFs and Vines will always autoplay in Moments.
I removed Moments from the UI with Stylish the same day they launched it, and image/video previews entirely in the main timeline. The selectors you want to hide are ".expanding-stream-item:not(.open) .js-media-container", ".expanding-stream-item:not(.open) .js-adaptive-media-container", and ".moments".
PayPal frikkin' has a full-screen video autoplaying on page load... of course one can go directly to .com/login, but it's mind-boggling that a financial-services provider for the masses thinks this is acceptable.
I bet they did a study of 100 random users and got more engagement with the videos. Just because we hate them doesn't mean our moms and grandpas hate them.
But I thought NoScript breaks the web? What it seems like is that web devs are breaking the web. Somehow, they have turned browsing the web into a worse experience than watching cable television.
> Even in first-world countries, people work over 3G/4G connections with data caps...
_No_ mobile browser supports autoplay on videos in webpages. There used to be a hack on Android but it was closed in 5.1 (maybe earlier). Another reason to avoid native apps.
That sentence wasn't limited to mobile. People (me, for example) use 3G/4G with laptops too. I don't have a data cap, but in other countries those are more common.
> Easy, actually: because a well-defined, restricted subset of HTML can be machine-audited, and there is no way to abuse it. Also, Google can save resources when indexing.
But AMP isn't a subset of HTML, is it? They've replaced a bunch of HTML tags with their own amp-prefixed variants… IMHO it reeks of vendor lock-in, and could just as well have been made a proper HTML-subset.
A surprising number of videos are served through a small number of video service provider sites (there's a different acronym for this, I can never remember it).
A half-dozen or a dozen entries in your /etc/hosts file will block them quite effectively. I've posted this to HN in the past.
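Something along these lines, with placeholder hostnames standing in for the actual video hosts:
    # /etc/hosts additions - the domains below are placeholders, not the real providers
    0.0.0.0  video-cdn.example.com
    0.0.0.0  player.example.net
    0.0.0.0  autoplay.example.org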