I work in energy, and there's a saying: "The cheapest kWh is one you don't use."
Want to make your website faster? Don't send so much stuff. Removing unnecessary code or content from your site has always had the biggest impact on performance. Unfortunately, it seems that often the people in charge of increasing performance aren't the people who get to choose what has to be on the website.
I want to share with you my simple two-step secret to improving the performance of any website.
1. Make sure that the most important elements of the page download and render first.
2. Stop.
That being said, having a fast-loading site and being able to serve a spike in load are two different issues (even if having fast-loading pages in the first place obviously helps).
Also, don't forget that measuring (page size, response times, etc.) is the first step. You can't optimize what you don't measure!
Want to make your website faster? Use a 3G connection when developing or testing. Chrome DevTools makes this easy, and you can also adjust the latency as needed.
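If you want that throttling in an automated check rather than a manual DevTools setting, here is a minimal sketch assuming Node and the Puppeteer package; the latency and throughput numbers are illustrative 3G-ish values, not an official profile.

```typescript
// Load a page under emulated slow-network conditions via the Chrome DevTools Protocol.
import puppeteer from 'puppeteer';

async function loadOnSlowNetwork(url: string): Promise<void> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Raw CDP call; the values are rough 3G-like assumptions, tune as needed.
  const client = await page.target().createCDPSession();
  await client.send('Network.emulateNetworkConditions', {
    offline: false,
    latency: 300,                          // extra round-trip time in ms
    downloadThroughput: (750 * 1024) / 8,  // ~750 kbit/s down, in bytes/s
    uploadThroughput: (250 * 1024) / 8,    // ~250 kbit/s up, in bytes/s
  });

  const start = Date.now();
  await page.goto(url, { waitUntil: 'load' });
  console.log(`Loaded ${url} in ${Date.now() - start} ms under throttling`);

  await browser.close();
}

loadOnSlowNetwork('https://example.com').catch(console.error);
```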
That should be done a lot more than it usually is today. Part of "mobile first" should be testing mobile speed.
One problem is of course that mobile networks vary greatly in both latency and bandwidth. A great resource on performance considerations for mobile networks is Ilya Grigorik's open book [1].
More often than I'd like, I'm trying to read content "on the go," in places where the environment blocks the 3G signal and I'm left with the "E" on the iPhone. There are pages that will load (like HN) and pages that are just "forget it, it will never load."
That's the difference between "accessible" and "shiny but just forget it with the slow internet."
I don't use blockers. It seems this is why they're becoming necessary?
The UberEats homepage is 30 MB of crisp imagery and fonts. I understand their target market is people on retina screens sitting on the hefty-piped intranet of their trendy startup, but 30 MB is a complete egress from sensibility.
There isn't even that much content on the page, it barely goes past the fold.
Oh good god. I loaded the site with the network console open and found that the page loads 15MB of data before the first picture of food loads up on the page. Horrendous.
I concur. I did some cursory research the other day and noticed that most websites I visit (blogs, shopping sites, etc.) now have total asset download sizes several times bigger than major enterprise software applications that I used to install on my PC 20 years ago. And this is just the 'front end' stuff for the browser experience.
This is a strange metric, because your phone also has 50x4 the processing power and 20x the RAM, and even 2G is pretty close to the Internet connection you would have been lucky to have back then.
The disparity is that an ERP system that managed the customers, suppliers, production and employees of a multi-million-dollar business was a smaller executable than a single blog page that nowadays does nothing more than present information on a screen for someone to read.
The ratio of computing power & code size versus the meaningful work done _with_ it seems to be out of kilter somewhat.
It isn't strange, because what is being compared here is the utility of the website and the utility of the ERP: the amount of work (and, subjective though it is, the value of that work) done vis-à-vis the number of MBs each occupies.
"... this week on 'How to Do It' we're going to learn how to play the flute, how to split the atom, how to construct box girder bridges and how to irrigate the Sahara and make vast new areas cultivatable, but first, here's Jackie to tell you how to rid the world of all known diseases."
How to play the flute: "Now, how to play the flute. (picking up a flute) Well you blow in one end and move your fingers up and down the outside."
Great work. When I loaded the page I even thought to myself, oh shit this is fast.
One huge problem with most large e-commerce platforms: they are terrible at speed optimization!
For example, I have a store on Shopify and it's basically impossible to do most of what you explained. With Shopify, you can customize HTML/SCSS files with dynamic content, all of which gets compiled on their back end. You have no access to the final compiled HTML and markup. You're sadly limited in what you can do.
It seems you need to have a custom store to really be able to optimize this hard-core.
This is largely incorrect. All the front-end customizations that this article describes are possible with Shopify because you have complete control over the HTML, JS, and CSS that the platform spits out. While it's true that you can't "access" the final compiled HTML to edit it, the templates that power your theme are very close to that output, and not hard to change if you're familiar with HTML/CSS. The JavaScript optimizations are all completely possible as well, but you either have to pick a theme that has been optimized or implement them yourself in your own theme. Shopify also supports AMP and responsive themes (because it's just HTML).
It is true that with Shopify or similar platforms you aren't able to make backend choices or implement caching like the article has described, and yes it's true that all the themes in the theme store aren't perfectly optimized for page load time -- but the question really is do you want to do any of this stuff yourself? Shopify has invested years in their platform to make your website as fast as possible for as little money as possible. The themes that Shopify publishes in the theme store undergo huge amounts of performance testing to ensure that you aren't losing money to latency. The platform comprises a world-class CDN, and Shopify doesn't charge for bytes through it. The platform uses many layers of backend caching at the data access, logic, and rendering layers to make your shop as responsive as possible if anyone has visited it before. Shopify powers some of the biggest sales on the internet, ones that truly dwarf Good Morning America. The platform is also geographically redundant, such that an entire datacenter can fail with only a moment of visible impact to you and your customers. If what I'm saying is true and you don't actually sacrifice that much flexibility on the Shopify platform, why would you choose to have to implement all of the above?
Another angle is operations: how many people do you think they have monitoring it and ready to respond to any incidents? Shopify serves an order of magnitude more traffic every single minute of the day than Baqend saw at the very peak of this event, with a better response time on average.
The Shopify platform took an army of engineers 10 years to build, so it seems unlikely that Baqend has all the same features and stability. How much do you think Baqend charged Thinks to build this? Is it more than $29/month?
Source: work at Shopify and built many of these systems.
> It is true that with Shopify or similar platforms you aren't able to make backend choices or implement caching like the article has described, and yes it's true that all the themes in the theme store aren't perfectly optimized for page load time -- but the question really is do you want to do any of this stuff yourself?
You're right, that is surely a thing you do not want to do yourself. This is why Baqend does all the browser and CDN caching, TTL estimation, query invalidations and so on automatically.
In terms of the frontend, if you do not want to rely on a stock theme - and many larger e-commerce sites won't - then you will have to do these optimizations yourself, fortunately with the good tooling available, as outlined in the article.
I think Shopify is in fact a great improvement over systems like Magento. But if Shopify is well-engineered (which I'm convinced it is), how do you handle sudden load spikes in the database? Do you reshard and migrate data in real-time? How does Shopify scale the application tier? How can tenants run scalable custom backend code? What happens if a database or server node crashes?
The advantage of Baqend is that its inner workings are published in research papers and blog posts, so people can judge the performance and scalability not only through benchmarks and real-world examples like Thinks but also by looking at the design.
In contrast to Shopify, Baqend is backend-as-a-service, which means customers have full freedom in how they design their site or shop. They do not have to live with server-generated HTML that they cannot change - they can for example easily build React/Angular single-page applications, run server-side JavaScript code, do custom queries, build a native app around the same business logic and APIs, subscribe to websocket-based change streams etc.
While I respect that it took you 10 years to build Shopify, this is usually not an indicator for scalability and a low-latency design. Often such projects end up with solutions like MySQL sharded at application-level, intermingled technology stacks and legacy systems.
Disclaimer: I'm the Baqend CEO and its caching approach is my PhD thesis.
> then you will have to do these optimizations yourself
The optimizations you listed actually all still apply with a custom theme; the theme author doesn't have to do anything. Shopify caches all generated HTML, serves all assets from a CDN, and controls expiry without using TTLs because we know when anything changes. The stuff custom theme authors do have to do is things like bloat reduction, post-first-paint behaviour, CSS pruning, etc. We're considering shipping mod_pagespeed for all customers too, which would help automate a bunch of this.
> how do you handle sudden load spikes in the database
Most of our read load is served by a datablob caching layer[1], so the DB is mostly doing the stuff only it can do (writes), and we have a carefully tuned schema that allows for massive throughput. We also buy really expensive databases -- we just took shipment of over a ton of database hardware in one datacenter in preparation for Cyber Monday. If we do hit our limit, we're able to throttle the number of people in the checkout while maintaining a good experience for customers.
> Do you reshard and migrate data in real-time
Yes
> How does Shopify scale the application tier
We shard by shop and build failure domains called pods that are self contained versions of each piece of Shopify that can failover independently between infrastructures.
> How can tenants run scalable custom backend code
> What happens if a database or server node crashes?
We have automated failover in place at almost every level of the system, so unless we've screwed up somewhere, no one notices!
> they can for example easily build a React/Angular single page applications, run server-side JavaScript code, do custom queries, build a native app around the same business logic and APIs, subscribe to websocket-based change streams etc
Did the Thinks store end up using any of these features of the platform? I am sure lots of apps need these things, but for web storefronts a lot of this seems like overkill. Shopify also supercharges you when you want to move beyond just a web storefront: we have a mobile SDK for building native apps, connectors to let you sell on Amazon, Facebook, and anywhere you can inject HTML, a world class mobile administration app, a POS solution, a massive app store for other integrations, etc etc etc.
> this is usually not an indicator for scalability and a low-latency design
Again, sub 100ms response time and an order of magnitude more traffic every minute than the peak the Thinks sale saw... that's pretty good!
I'm trying to get at the fact that building a web store from scratch, be it with a handy backend-as-a-service or not, is going to be really hard compared to using Shopify. You sacrifice some small amount of flexibility, but you gain so, so much more by buying into the ecosystem. I still think it is unwise to try to compete with other platforms whose core competency is commerce.
I never said Shopify isn't good. I pay for it and use it. I'm just saying there are limitations by using a third party service and not having complete control of the stack.
Plus, you could always use Shopify's JavaScript SDK if you need more control. We use it with a static Middleman site hosted on Netlify, which is a great combination: You get a performant backend, easy to use e-commerce administration and all the freedom to create your own site.
Another option would be Snipcart, which looks very strong as well but lacked advanced inventory and product management when we had to decide between Shopify and Snipcart.
They could do even better.
I mean, something is really off here, since their uncached CSS from Frankfurt to my house should not take over 100ms.
And they also serve images, CSS and JS from the same domain.
And they download the texts as an extra request.
And somehow HTTP/2 doesn't kick in.
I checked out Baqend ("we" in the article) and the site prevents me from navigating back when I press "back" (I think by injecting itself as the previous page - I'm not really up to date on dark patterns).
This is the kind of behaviour that is an auto-no for me when buying B2B, not just by itself but because of what it signals about the company. I recommend you remove it.
On the article itself: you can do very fast pages off a relational DB. In one case, the round trip on a Postgres full-text search for autocomplete on millions of rows from multiple tables was so fast we had to add a 10ms delay - on a $5/month VM.
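For illustration only (this is not the commenter's actual schema), a full-text autocomplete like that might look roughly like this from Node with the pg driver, assuming a hypothetical products table with a GIN index on to_tsvector('english', name):

```typescript
// Postgres full-text autocomplete sketch. Assumed index:
//   CREATE INDEX products_name_fts ON products
//     USING GIN (to_tsvector('english', name));
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function autocomplete(prefix: string): Promise<string[]> {
  // ":*" makes the last word a prefix match, which is what autocomplete needs.
  const tsQuery = prefix.trim().split(/\s+/).join(' & ') + ':*';
  const { rows } = await pool.query(
    `SELECT name
       FROM products
      WHERE to_tsvector('english', name) @@ to_tsquery('english', $1)
      LIMIT 10`,
    [tsQuery]
  );
  return rows.map((r) => r.name);
}

autocomplete('blue lam').then(console.log).catch(console.error);
```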
We fixed that problem and deployed it. It turned out to be the iframe with the tutorial app: it sets a new hash with a random todo-list ID, and that captures the back button. It was not intended to keep people on our site.
Thanks for pointing the issue out!
BTW, I agree, RDBMSs are a great choice if the hot data set fits in main memory and queries are well-optimized through appropriate indexes. Issues usually only start to occur if 1) the data set is too large, 2) queries cannot be optimized by the query optimizer and involve full-table scans, or 3) interleaved transactions contend on a few rows.
People often point out web design flaws in pages submitted to HN, but so rarely do we see an author come back, accept the criticism, fix the problem, and report back. Thank you!
It looks like the dark pattern they used to get that "no back" behavior is: "open in a new tab".
Edit: I stand corrected; the link from the article opens in a new tab, so the back issue is moot there. Still, it's definitely red-flag behavior. Baqend folks, if you are listening, you should fix it, and if it is the default behavior of your framework, you should definitely fix it. No one likes hijacked back buttons.
However, in the basket, I clicked "Zur Kasse" (checkout) and the response took 11.08 seconds to get to the checkout page per Chrome's network tab. Repeated multiple times with the same result.
Also, I don't see any product detail pages, categories, search result pages, or anything other than 7 products on the homepage that add directly to cart. Does this strategy scale to regular ecommerce site levels?
You're raising an important point: third-party services have to be fast, too. The call that starts the checkout does a synchronous request to PayPal Plus, which has very ugly request latencies from time to time.
The shop is indeed quite minimal at this time. The optimizations scale very well to regular e-commerce sites, though. Detail pages and categories can be cached very well, as they only change through the actions of human editors. A notable exception is recommendations. If they are item-based (e.g. the Apriori algorithm) they can also be cached very well. If they are user-based (e.g. collaborative filtering), that part has to be fetched from the server-side recommendation model. The same is true for search results: the heavy hitters, i.e. the most-searched-for keywords, are cached, while the others are fetched from the database. That is the part where server-side scalability and latency optimization are of utmost importance. At Baqend we support both MongoDB full-text indexes and automatic separate indexing in an Elasticsearch cluster, with automatic CDN and browser caching for the most frequent search results.
Why do you need to call Paypal before the payment selection screen? Still happening for me, and there is no visible feedback when you click checkout. It just looks broken for 10 seconds.
The rest of it is so well done, but I have to wonder how many sales were lost because of this checkout issue.
You're absolutely right, latency in the checkout process is a conversion killer. Somehow PayPal currently seems broken; maybe it's still related to the Dyn DDoS attack which took PayPal down completely.
During our testing, PayPal had latencies of ~100-500ms, but these new latency outliers of 10 seconds are a no-go. We will definitely change the code to make that API call asynchronous. Generally, PayPal Plus is still quite buggy and not well documented.
We also realised that tackling web performance takes a lot of effort, so we have been building different tools which can automate front-end optimization techniques. Our recent tool (https://dexecure.com) tackles image optimization by generating different formats (WebP, JPEG XR) and different sizes of images, compressed to different qualities depending on the device and browser the user is using. All of the images are served over a fast HTTP/2 connection.
We have seen people integrate with it in under 5 minutes and cut down their page size by 40%!
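The underlying technique is plain content negotiation. A rough sketch of the general idea (not Dexecure's actual implementation), assuming pre-generated WebP and JPEG variants and an Express server:

```typescript
// Pick an image format based on the Accept header the browser sends.
// Assumes variants like images/hero.webp and images/hero.jpg already exist.
import express from 'express';
import path from 'path';

const app = express();

app.get('/img/:name', (req, res) => {
  const accept = req.headers.accept ?? '';
  const wantsWebP = accept.includes('image/webp');
  // Note: real code must sanitize req.params.name against path traversal.
  const file = `${req.params.name}.${wantsWebP ? 'webp' : 'jpg'}`;

  // Vary on Accept so caches keep separate entries per supported format.
  res.setHeader('Vary', 'Accept');
  res.sendFile(path.join(__dirname, 'images', file));
});

app.listen(3000, () => console.log('Image server on :3000'));
```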
> Furthermore, browser caching is highly effective and should always be used. It fits in all three optimization categories because cached resources do not have to be loaded from the server in the first place.
That is incorrect.
The browser issues requests for all cacheable content with a Last-Modified date [or similar metadata]; the web server replies with a 304 Not Modified and an empty body if the file was not updated.
There are still HTTP requests sent, there is still network latency, and there is still server processing involved. [Worst-case scenario: for tiny JS/CSS/icon files, the latency might dominate everything and nullify the benefits of caching.]
It gets more complex: a few years ago, Firefox stopped issuing HTTP requests ENTIRELY for some cached content. That speeds up load time, but it causes caching issues.
Last I checked, Chrome and IE hadn't adopted this behavior yet. That may or may not change in the (near?) future.
P.S. I felt like being pedantic about caching tonight. Otherwise, a nice article =)
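For reference, the revalidation flow described above boils down to something like this minimal Node sketch (hypothetical asset path, max-age=0 to force revalidation on every use):

```typescript
// Send Last-Modified, and answer conditional requests with 304 Not Modified.
import http from 'http';
import fs from 'fs';
import path from 'path';

const FILE = path.join(__dirname, 'app.js'); // hypothetical static asset

http.createServer((req, res) => {
  const stat = fs.statSync(FILE);
  const lastModified = stat.mtime.toUTCString();
  const ifModifiedSince = req.headers['if-modified-since'];

  res.setHeader('Last-Modified', lastModified);
  // max-age=0 makes the browser revalidate on every use (the case discussed here).
  res.setHeader('Cache-Control', 'max-age=0');

  if (ifModifiedSince === lastModified) {
    res.statusCode = 304; // empty body, but the round trip still happened
    res.end();
    return;
  }

  res.statusCode = 200;
  fs.createReadStream(FILE).pipe(res);
}).listen(8080);
```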
You're right that browser caching has a few tricky parts, which is why we do some things differently.
What you describe is an HTTP revalidation (If-None-Match, If-Modified-Since), which the browser triggers in two cases:
- if the TTL defined in the Cache-Control header has expired,
- or if the user hits F5.
In the normal case where a user revisits a site, the browser takes all the cached resources without doing a revalidation, and that is the case one should optimize for.
The cases you seem to have in mind are explicit refreshes (F5), expired cache entries, or servers that respond with a max-age=0 header (which is useful in many situations).
The most difficult part when caching dynamic data is purging stale content from browser caches, because if the TTL is too high and the content changes earlier, users will see stale cached data. At Baqend we solve this problem by piggybacking a Bloom filter of potentially stale URLs to the client [1]. The client checks the Bloom filter to determine whether to do a revalidation or use the cache. In order for the Bloom filter to remain small, we have a learning TTL Estimator that calculates TTLs by approximating the expected time to the next write.
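To make the idea concrete, here is a toy sketch of the client-side check; this is not Baqend's actual code or wire format, the filter bits would be piggybacked from the server, and a hit only means "possibly stale, better revalidate":

```typescript
// Toy Bloom filter: the client consults it before trusting the browser cache.
class BloomFilter {
  constructor(private bits: Uint8Array, private hashCount: number) {}

  private hash(value: string, seed: number): number {
    // Simple FNV-1a style hash with a seed; real implementations use better hashes.
    let h = 2166136261 ^ seed;
    for (let i = 0; i < value.length; i++) {
      h ^= value.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return Math.abs(h) % (this.bits.length * 8);
  }

  mightContain(value: string): boolean {
    for (let seed = 0; seed < this.hashCount; seed++) {
      const bit = this.hash(value, seed);
      if ((this.bits[bit >> 3] & (1 << (bit & 7))) === 0) return false;
    }
    return true; // possibly stale: false positives allowed, false negatives not
  }
}

// Hypothetical usage in the browser: revalidate only URLs the filter flags.
function fetchWithCacheHint(url: string, staleFilter: BloomFilter): Promise<Response> {
  const cacheMode: RequestCache = staleFilter.mightContain(url)
    ? 'no-cache'   // revalidate with the server (conditional request)
    : 'default';   // fresh enough: let the browser cache answer locally
  return fetch(url, { cache: cacheMode });
}
```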
> There are still requests sent, there is still network latency. [In the case of small JS/css/icon files, the latency can be as much as the transfer time].
What? When the "Expires" header indicates the item is still valid, the browser should use the cached object without making any requests.
Some servers (Rails, for example) just serve assets with "Expires" set to some distant future time. When they want to expire these files, they simply change the name (hence the long digest suffix appended to JS/CSS/etc. files).
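A minimal sketch of that fingerprinting setup, assuming an Express app serving pre-hashed asset filenames (e.g. /assets/app-3f9a1c7e.js):

```typescript
// Assets get a far-future, immutable cache lifetime because any change
// ships under a new (hashed) filename, so the old URL never needs to expire.
import express from 'express';

const app = express();

app.use(
  '/assets',
  express.static('public/assets', {
    maxAge: '1y',     // Cache-Control: max-age=31536000
    immutable: true,  // tells the browser it never needs to revalidate this URL
  })
);

app.listen(3000);
```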
It doesn't ignore it. In Chrome, if you keep the network tab in dev tools open, you will see "from disk/memory cache" in the size column with a status of 200 when it comes from the local cache.
> the browser should use the cached object without making any requests.
That's a common (mis)assumption, but that's not how it works in practice. Sorry ^^
I cannot comment on what the intended behavior was when HTTP caching and headers were first designed, a very long time ago.
It seems to me that it was "request every time + get 304 if not modified, thus saving the file transfer time." That's how most implementations work in my experience.
Historically, transfer duration was way more problematic than latency. It's only recently that there are pages with hundreds of dependencies.
---
Rails and webservers: They MUST change the filename or the application will be fucked because of caching issues, mostly because of Firefox :D
All major browsers do respect the TTL if one is set, so it does save many requests. If it's not possible to set a TTL because the content might change at any time, it is still useful to set a caching header with a TTL of 0. Otherwise the browser will perform heuristic caching and compute an (often imprecise) estimate based on the Last-Modified header.
We built Baqend around the caching algorithms that we developed at the university. Our goal was to make the performance optimizations applicable to as many contexts as possible. We went for the "serverless" paradigm and backend-as-a-service to accelerate any application (website or app) that uses our APIs and hosting.
Currently most of our customers are e-commerce, web and app agencies, startups and software companies that look for web performance in combination with scalability.
We're currently getting lots of requests from e-commerce, so we are building out our support for shop features (e.g. advanced search, payment, content management). We are developing a Magento plugin to make the transition to our platform and its performance benefits as easy as possible.
FWIW, there are some clever Magento shops running pretty stout at sub-second load times on AWS. Puts Magento back in the upper echelon for carts again, imo.
Personal story (P.S. I am not the author, I just worked in startups as well):
For me it didn't, especially when we were not prepared for it and knew that we couldn't take the traffic if it happened.
What really killed me instead, was when our site was slowish or half-fucked during some events. And people still managed to use it harder and harder for the entire day. And we made unusual loads of money and new customers, with a crap site xD
There is of course a feeling of "how much did we miss out on? That must be a lot, considering how much we did get." But it's less prominent than the feeling of "how the hell did all these people manage to pay us at all? And why the fuck did they keep using the site?" That does contradict every known blog and article out there about people just leaving.
And it doesn't only hurt because of revenue. Slow sites also lead users to perceive quality and credibility as lower (Fogg et al. 2001; Bouch, Kuchinsky, and Bhatti 2000). The startup/brand will end up being seen as less interesting and attractive (Ramsay, Barbesi, and Preece 1998; Skadberg and Kimmel 2004). So being slow or offline can even have a long-lasting negative impact on the coolness of your startup.
Do you have any references dating after mobile browsing became ubiquitous? People have definitely adjusted to shitty cellular connections. I have severe doubts about whether these studies from the pre-iPhone era are of any modern value.
Sure. There is an abundance of studies. My favourites are [1]:
- 100ms of additional page load time cost 1% of revenue (Amazon)
- 400ms of additional page load time means -9% visitors (Yahoo)
- Showing 30 search results instead of 10, and therefore accepting 500ms of additional page load time, made traffic drop by 20% (Google)
- Every second of page load time reduces conversions by 7%
There are many more. There is a nice infographic summarizing a few more studies in a nice way [2].
In fact, I think the trend is quite the opposite: as devices and connections get better, people start to expect that websites are very responsive and load fast.
The first and third are from 2006 and the second is from 2008 or earlier, so they don't seem to fit with stonogo's request for references after mobile browsing became ubiquitous (the iPhone only came out in 2007, and the first Android only in late 2008).
How about Google's DoubleClick (2016): 53% of mobile site visits are abandoned if pages take longer than 3 seconds to load [1].
Mobify (2016): for every 100ms decrease in homepage load speed, Mobify's customer base saw a 1.11% lift in session-based conversion, amounting to an average annual revenue increase of $376,789. Similarly, for every 100ms decrease in checkout page load speed, Mobify's customers saw a 1.55% lift in session-based conversion, amounting to an average annual revenue increase of $526,147 [2].
Staples (2014): conversion increase per second [3].
Or the Obama campaign (2012): making the page 60% faster increased donations by 14% [4].
More new studies from recent years can be found at WPO stats [5].
> 53% of mobile site visits are abandoned if pages take longer than 3 seconds to load
Well, 100% of web pages need more than 3 seconds to load on my mobile connection.
We'll both agree that I ain't a statistically fair sample. I'm with @icebraining on that: we need information that is relevant to the mobile web.
[2] is behind a paywall.
[3] is quoting all the common worthless old studies. There are some graphs and such, but it lacks test methodologies and information. The graphs show gains of a few percent for latencies going from 1s to 10s, which is a much smaller effect than in other studies.
[4] is off topic and gives no data. (x% of what? Absolute latency? Before? After? Well, they didn't test for latency and its impact on revenue.)
Long story short: Google makes money on ads, which are only shown at the top of the page.
Showing 1 page with 30 results is only 1 batch of ads. Showing 3 pages with 10 results each is 3 batches of ads (well, for the people who go past the first page).
Of course loading 30 results is slower than loading 10. Some articles quickly turned the study into "500ms more will destroy 20% of your revenue," but the Google study was never about that.
Oh the irony. The Medium page this blog post was hosted on is ~3.5 MB in size and took ~5 seconds (on my 20Mbps cable internet connection with 11ms ping to CloudFront) to reach a first readable render.
It's interesting to me that the website "thinks.com" is entirely in German. I'm used to seeing sites on .com in English, and indeed I can't even think of another site right now that isn't.
Why didn't you use the .de ccTLD? Given that most German websites are not using .com domains, isn't this potentially confusing, and causing you to lose traffic? You don't even own thinks.de; it's blank.
You're mistaken. A lot of businesses in many countries use .com addresses even for their local shops.
For one thing, .com stands for "commercial"; .us is for the USA.
Whether it's right to do so or not is another question. For starters, it depends on the regulations of each country, as in some of them you simply are not allowed to operate under a different TLD. But most countries are not that restrictive.
One of the many things that were supposed to "make sense" at the dawn of the web, and that were quickly abused and evolved into more than what anyone could possibly have thought of at first. (No, no, I'm not thinking of HTTP at all while writing this. Why would I?)
> I'm used to seeing sites on .com in English, and indeed I can't even think of another site right now that isn't.
If you mostly frequent English sites, that's not surprising, but in reality many .com sites are not in English. Some reasons include restrictive ccTLD rules (for example, until 2012 you couldn't register a .pt domain, only .com.pt/.org.pt/etc), better recognition of the TLD by users and SEO.
.com domains are usually cheaper and usually much easier to register than ccTLD's.
In my country, the ccTLD costs more than double the .com equivalent, and requires manual intervention to set up. I have to wait for a guy (and it's just one guy who's been handling it for over a decade) to show up for work on Monday, go through his emails, finish his coffee, and then manually set up my DNS.
Given the above, .com's are always the first choice.
The JMeter load tests were conducted to test the throughput of the CDN and backend setup (browser caching was disabled). The load-generating servers were hosted in Frankfurt (Germany), where the CDN (Fastly) also has an edge location. This led to an average response time of 5ms. We experienced some hiccups within JMeter causing single requests to stall for up to 5 seconds. For the 10 million requests we got a 99.9th percentile of below 10ms. We couldn't quite figure out why JMeter had these hiccups, maybe those were GC pauses, maybe something else, but it was consistent across different setups and servers.
One wonders about the irony of carefully avoiding bad stuff (like GC) when serving pages, and then using a test rig that has potential GC delays to test performance.
You have a point there. I found JMeter, however, really easy to use. I could simply let it monitor my browser (via a proxy) while I clicked through the website and the checkout process to record the requests of an average user. Then I configured the checkout process to only be executed in 20% of the cases to simulate the conversion rate. Even executing the test distributed over 20 servers wasn't that hard.
Which tools would you use to generate this amount of traffic?
In a previous life we started with ApacheBench, and then wrote our own async perl benchmarker because we wanted to generate steady hits-per-second instead of ApacheBench's N concurrent requests at a time. By now there's probably a tool out there that does what our custom tool did.
That looks pretty good -- not only does it have a "requests per second" generator, which I wanted, but it also presents 50/90/99/99.9 percentile results, which are a must. (Website latencies are not a normal distribution so it's inappropriate to compute standard deviation.)
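For the curious, the "steady requests per second" (open-model) approach plus percentile reporting can be sketched in a few lines; this is a toy, not the tool discussed above, and assumes Node 18+ for the global fetch:

```typescript
// Fire requests on a fixed schedule regardless of how long earlier ones take,
// then report latency percentiles over the collected samples.
async function loadTest(url: string, rps: number, durationSec: number): Promise<void> {
  const latencies: number[] = [];
  const inFlight: Promise<void>[] = [];
  const intervalMs = 1000 / rps;
  const total = rps * durationSec;

  for (let i = 0; i < total; i++) {
    const scheduledAt = Date.now();
    inFlight.push(
      fetch(url)
        .then((res) => res.arrayBuffer()) // drain the body so timing includes transfer
        .then(() => { latencies.push(Date.now() - scheduledAt); })
        .catch(() => { /* a real tool would count errors separately */ })
    );
    await new Promise((r) => setTimeout(r, intervalMs)); // keep the request rate steady
  }
  await Promise.all(inFlight);

  latencies.sort((a, b) => a - b);
  const pct = (p: number) =>
    latencies[Math.min(latencies.length - 1, Math.floor((p / 100) * latencies.length))];
  console.log(`p50=${pct(50)}ms p90=${pct(90)}ms p99=${pct(99)}ms p99.9=${pct(99.9)}ms`);
}

loadTest('https://example.com/', 50, 10).catch(console.error);
```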
I feel concerned by this new trend of using slow languages/stacks and the "we'll fix it later" mantra. I can't believe there are web pages such as Reddit with such horribly awful generation times; do their tech guys sleep well at night?
This is not a new trend. This is a trend since the dawn of, well, as long as I've been programming, ~20 years now. Turns out that most of the time there are many more important things than your stack or language speed, and yes, often times "fix it later" is the right tradeoff.
On the other hand, there are things where "fix it later" just won't work. Especially when it comes to scalability, which has to be considered right from the start to get it right.
Yes and no. Using Facebook-scale technologies when you have 30 requests per minute is a horrible use of your time and money. The goal is to build for the scale you need now so you can fund your next level of growth. I'm working on a system that needs to handle a few thousand simultaneous users in bursts, and our growth rate looks like it might grow to a couple of tens of thousands of simultaneous users in bursts. We can horizontally scale for "hot" periods (like Black Friday weekend).
We're working on designing our next iteration so we can handle tens to hundreds of thousands of constant users, and a couple of million in bursts, with the capability to horizontally scale or partition or some other mechanism to buy us more time when we need to look at the next scaling level (which will probably be some time away). It would be irresponsible of me to design for 10MM constant users now.
Considering scalability from the start does not just mean optimizing for millions of concurrent users, but choosing your software stack or your platform with scalability in mind. I get that it's important to take the next step and that premature optimization can stand in your way, but there are easy-to-use technologies (like NoSQL and caching) and (cloud) platforms with low overhead that let you scale with your customers and work whether you're big or small. This can be far superior to fixing throughput and performance iteration by iteration.
I chose my software stack with scalability in mind: team scalability and rapid iteration (we started with Rails, displacing a Node.js implementation that was poorly designed and messily implemented). Because of that previous proof-of-concept implementation we needed to replace, we were forced into a design (multiple services with SSO) that complicated our development and deployment, but has given us the room to manoeuvre while we work out the next growth phase (which will be a combination of several technologies, including Elixir, Go, and more Rails).
One thing we didn’t choose up front, because it’s generally a false optimization (it looks like it will save you time, but in reality it hurts you unless you really know you need it), is anything NoSQL as a primary data store. We use Redis heavily, but only for ephemeral or reconstructable data.
The reality is, though, you have to know and learn the scalability that your system needs and you can only do that properly by growing it and not making the wrong assumptions up front, and not trying to take on more than you are ready for. (For me, my benchmark was a fairly small 100 req/sec, which was an order of magnitude or two larger than the Node.js system we replaced, and we doubled the benchmark on our first performance test. We also reimplemented everything we needed to reimplement in about six months, which was the other important thing. My team was awesome, and most of them had never used Rails before but were pleased with how much more it gave them.)
You're right, NoSQL systems tend to be more complex, and especially their failure scenarios are hard to comprehend. In most cases, however, this is due to being a distributed datastore, where trade-offs, administration and failure scenarios are simply much more complex. I think some NoSQL systems do an outstanding job of hiding nasty details from their users.
If you compare using a distributed database to building sharding yourself for say your MySQL backed architecture, NoSQL will most certainly be the better choice.
> which has to be considered right from the start to get it right.
I totally disagree. Twitter did not consider getting scalability right from the start, nor did Amazon, nor did Uber. But when scalability of their systems was key to scaling their businesses, they found a way. Premature optimization can kill companies, because running out of money kills businesses.
It's all about the acceptable trade-offs. There are certain things where I am not willing to compromise on security; there are other things where I'm not as concerned. We currently don't use HTTPS inside our firewall; once you've passed our SSL termination, we don't use SSL again until outbound requests happen that require SSL.
Should we? Well, it depends. There are things that I’m concerned about which would recommend it to us, but it’s not part of our current threat model because there are more important problems to solve (within security as well as without).
I don't disagree. There are always engineering trade-offs. What I have issue with is sites that do not bother to even think about security. They operate under the false sense of security that no one will bother them.
I'm surprised you would mention Reddit in this context. It seems to me they're one of the lightest websites out there, which is not really surprising seeing as they're mostly serving text. Reddit and HN are pretty much the only two websites which are usable on my Raspberry Pi 2s.
> Want to make your website faster? Don't send so much stuff. Removing unnecessary code or content from your site has always had the biggest impact on performance.
The Website Obesity Crisis: https://vimeo.com/147806338