Having never used a static site generator in anger, can someone explain to me like I'm five what's going on here?
My understanding is that Gatsby is a tool that converts a bunch of markdown files into a static HTML website. Why are slow builds a problem for any static site generator? Why does it need a cloud?
In other words, what problem am I supposed to be having that any of this solves?
Note, I'm trying not to be skeptical here - my company's website is hand-maintained HTML with a bunch of PHP mixed in so I can totally imagine that things may be better. But I don't understand the kinds of situations where using a 3rd party cloud to generate some static HTML solves a problem.
Gatsby is a fairly complex static site generator. At the highest level, it provides an ingest layer that can take any data sources (CMS, markdown, json, images, or anything that a plugin supports) and bring them into a single centralized GraphQL data source. Pages (which are built using React) can query this graph for the data they need to render. Gatsby then renders the React pages to static HTML and converts the queries to JSON (so there's no actual GraphQL in production).
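Roughly, a page in that model looks like the sketch below. This is a minimal example, assuming gatsby-transformer-remark is sourcing local markdown and that the frontmatter has a `slug` field (both just for illustration):

```jsx
// src/pages/example.js -- minimal sketch; assumes gatsby-transformer-remark
// and a made-up `slug` frontmatter field.
import React from "react"
import { graphql } from "gatsby"

export default function ExamplePage({ data }) {
  const post = data.markdownRemark
  return (
    <article>
      <h1>{post.frontmatter.title}</h1>
      <div dangerouslySetInnerHTML={{ __html: post.html }} />
    </article>
  )
}

// Runs once at build time; the result ships as static JSON alongside
// the rendered HTML, so no GraphQL server exists in production.
export const query = graphql`
  query {
    markdownRemark(frontmatter: { slug: { eq: "example" } }) {
      frontmatter {
        title
      }
      html
    }
  }
`
```

The component only ever sees plain props; the query is compiled away during the build.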
This process is fairly fast on small/simple sites. Gatsby is overall very efficient and can render out thousands of pages drawing from large data sources rather quickly. The issue is that Gatsby isn't just used for personal blogs. As you can imagine, a site with thousands of pages of content that is processing thousands of images for optimization starts taking a long time (and a lot of resources) to build. For example, I'm building a Gatsby site for a photographer that includes 16,000+ photos totaling a few hundred GB. Without incremental builds, any change (e.g. fixing a typo) means every single page needs to be rebuilt.
Incremental builds mean you don't have to rebuild everything. Because the data all comes from the GraphQL layer (which Gatsby pre-processes and converts to static JSON), it is possible to diff the graphs (i.e. determine what data a commit has changed) and determine which pages that change affects (i.e. which pages include queries that access that field). From there, Gatsby can rebuild only the changed pages.
This not only means faster build times, it also means that only the changed pages and assets have to be re-pushed to your CDN. This way, content that hasn't changed will remain cached and only modified pages will have to be sent down to your site's users.
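Conceptually the diff looks something like the sketch below. This is not Gatsby's actual internal code, just the idea: hash every data node, compare against the previous build, and map changed nodes back to the pages whose queries touched them (the `pageDependencies` structure is hypothetical):

```js
// Conceptual sketch only -- not Gatsby's actual internals.
const crypto = require("crypto")

const hashNode = (node) =>
  crypto.createHash("sha1").update(JSON.stringify(node)).digest("hex")

// pageDependencies: Map<nodeId, Set<pagePath>>, recorded while running
// each page's query on the previous build (hypothetical structure).
function pagesToRebuild(prevHashes, nodes, pageDependencies) {
  const dirty = new Set()
  for (const node of nodes) {
    if (prevHashes.get(node.id) !== hashNode(node)) {
      for (const page of pageDependencies.get(node.id) || []) {
        dirty.add(page)
      }
    }
  }
  return dirty // only these pages get re-rendered and re-pushed to the CDN
}
```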
But if you have 16,000 of anything, why are you using a static site? Surely the access patterns are long tail and you need to build more often than most pages are even accessed.
Surely there's a CPU/disk trade-off at some point. Static pages are much larger (less likely to be in memory) and would cause disk reads much sooner than the same files being generated dynamically. Of course WordPress isn't known for its efficiency, so static pages probably still come out well ahead.
There is a big difference between the cost of static hosting plus a CDN (think CloudFront / S3) and the cost of keeping an active piece of hardware running to serve stuff that doesn't change. Like orders of magnitude. Sure, for small sites it's not that much in absolute terms, but it's still orders of magnitude.
Also, the answer to a large number of my interview questions ends up being figuring out how you can just efficiently serve the stuff from a CDN / blob storage. You can scale the crap out of that for quite cheap.
If the final result is 500k of HTML, a dynamic website is doing a LOT more work to return that 500k than a static website - assuming you get more requests than you have pages to generate.
Admittedly in this case I'm mostly just trying to push Gatsby to its limits. For a photography site, there ends up being very little overhead with a static site (if you can do incremental builds). I also explored NextJS (SSR) and just making a good old SPA, but decided to go with Gatsby because at the end of the day, a clear majority of the storage cost is just the raw images. I think Gatsby ends up making the most sense because you get to take advantage of a CDN for caching (most CDNs don't like being used as just an asset cache) and I can just leave it there without worrying about a server.
Hi, have you thought about hosting for this yet? I've got a similar site which I originally tried on AWS Amplify, but it got too big for the artifact size limit, so I opted for S3/Cloudflare instead; however, build times are slow and the deploy is more of a manual process currently.
I'm yet to figure that out! I'm procrastinating on that until I have everything else figured out (it's for a family member, so there's no strict timeline). I'm thinking in the end the setup will be something with a CMS for editing the photo metadata, a file storage system for the images, and everything else in git. The build would pull from all three, run, and then the processed images would be pulled out and hosted on their own. I'm planning that it'll just run and take a long time on a droplet unless I figure out something better.
Server-side rendering (like Wordpress) generates HTML in response to a URL. Static site generators just visit every possible URL at build time and save the final HTML as files. This makes it easy to deploy and scale when your site is static and doesn't need any features of dynamic server-side rendering.
Gatsby (and other frameworks) automate this process by going through whatever data sources you have (directory of markdown files, databases, etc) and producing the HTML. Gatsby uses React for the templating logic and any client-side interactivity on the pages. Build times scale with the size of your content and number of pages to generate so that's the reason for the cloud.
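Stripped of the React/GraphQL machinery, the core loop of any static site generator is tiny. Here's a toy sketch in Node, assuming a ./content folder of markdown files and the `marked` package (folder names are just illustrative):

```js
// Toy static site generator: every possible URL becomes a file on disk.
// Assumes a ./content folder of markdown files and the `marked` package.
const fs = require("fs")
const path = require("path")
const { marked } = require("marked")

fs.mkdirSync("./public", { recursive: true })

for (const file of fs.readdirSync("./content")) {
  if (!file.endsWith(".md")) continue
  const md = fs.readFileSync(path.join("./content", file), "utf8")
  const html = `<!doctype html><html><body>${marked.parse(md)}</body></html>`
  fs.writeFileSync(path.join("./public", file.replace(/\.md$/, ".html")), html)
}
```

Everything a real generator adds on top (templating, data sourcing, asset pipelines) is what makes builds slow as the site grows.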
Overall, static sites are in the hype phase of the software cycle. Most sites are just fine using Wordpress or some other CMS and putting a CDN in front to cache every pageview. Removing that server completely is nice, but most static sites end up using some hosted CMS anyway, and at that point you've just replaced one component with another. There are also advantages to completely separating the frontend code from the backend system for fancy designs or large teams.
Wordpress in particular is not “just fine”. During this crisis, every government site based on WordPress ends up crashing under the load. Yes, you can avoid this with a good cache plugin and a CDN. But you can also avoid it by using a tool that is designed not to crash under load in the first place.
Serving up a HTML file from disk or generating some dynamic HTML are both trivial to do once. After that, a CDN layer is used to cache the HTTP response, regardless of how it was generated.
Sure, if you just want to serve HTML files on every request with a simple file server, then static files will be faster, but it'll eventually get overloaded too. The CDN is where the real scaling happens. And using a CDN is far easier than changing the entire backend to a static site.
Also most of the sites that crashed were dynamic applications, not just static pages. Using a static site generator wouldn't solve that problem.
I am specifically complaining about WordPress. Anyone could make a server-side dynamic application that applies proper caching headers to work with a CDN. Anyone could, but WordPress specifically did not. It's an uncacheable mess by default, and caching plugins just barely make it usable.
Using a static site generator is basically just a way of ensuring that the pages are properly cached by the brute fact that they've been dumped out on disk. It's not strictly necessary for a well designed system, but it raises the floor because even in the worst case, the pages are static files.
For many public facing sites, dynamic applications aren't strictly necessary. If you're just hosting a PDF of an Excel spreadsheet of permitted job categories, you don't need a dynamic application. Again, a well designed app would already be hosting this through S3, but you can't trust things to be well designed when made by a contractor with no technical oversight.
Static files don't automatically set any headers either; you still need a web server to serve them. And you can override those headers in the server or in the CDN, so there's no reason to switch out the entire backend for it.
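For what it's worth, setting those headers is a few lines wherever the files are served from. A hedged Express sketch (directory name and max-age values are just illustrative):

```js
// Explicit Cache-Control headers on a folder of static files.
const express = require("express")
const app = express()

app.use(
  express.static("public", {
    setHeaders: (res, filePath) => {
      if (filePath.endsWith(".html")) {
        // HTML changes on every deploy, so let the CDN revalidate it.
        res.setHeader("Cache-Control", "public, max-age=0, must-revalidate")
      } else {
        // Fingerprinted assets can be cached essentially forever.
        res.setHeader("Cache-Control", "public, max-age=31536000, immutable")
      }
    },
  })
)

app.listen(8080)
```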
CDNs handle scaling of static assets. That's their entire purpose, with features like request coalescing and origin shielding to help ensure unique URLs are never requested more than once. Optimizing for static files at the origin is just not worth the trouble when Wordpress and other frameworks are far more productive and provide CMS functionality which is usually needed anyway.
We're talking in circles. No one disputes that CDNs are good and expert users of WordPress are capable of making it not shit the bed. The point is that WP cannot be left unmanaged by novices, which means it should not be used in many situations in which it is currently used. Static sites have a higher floor and so are better suited to non-expert use.
My point is that tradeoff isn't worth it. It's far easier to tweak security settings and configure a CDN than to completely change the backend with a complex build process requiring more technical knowledge, deploy it to a host which you still need, configure a CDN which you still need, and wire it up to read from a CMS which you still need.
And then you can regenerate all your files every time you make a change - the tradeoff isn't worth it for one-off sites that generally sit at zero traffic.
Cheaping out on things that need actual support won't be fixed by making them static HTML pages.
I'll take a stab at this, Kyle just shoot me if I get something wrong below :D
1) There's a server-centric approach and a client-centric approach:
--a) hand-maintained HTML + php falls into the first camp
--b) React (/Angular/Vue) fall into the second
2) If you go with the second camp (b), you end up having a higher initial page load time (due to pulling in the whole "single page app" experience), but a great time transitioning to "other pages" (really just showing different DIVs in the DOM)
3) Gatsby does some very clever things under the hood to make it so that you get all the benefits of the second camp, with virtually none of the downsides.
4) There are of course all kinds of clever code-splitting, routing & pre-loading things Gatsby does, but I hope I got the general gist right.
If not, Kyle, get the nerf gun out! -- how would you describe the Gatsby (& static sitegen) benefits? :)
Gatsby can also let you use react components to do some pretty clever things around image resizing, effects, etc that you might expect from a static site generator but couldn’t achieve with just a frontend framework.
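For example, with gatsby-plugin-image you get something like the sketch below - the file name and query shape are illustrative, and it assumes gatsby-source-filesystem plus the sharp plugins are set up:

```jsx
// Sketch using gatsby-plugin-image; assumes gatsby-plugin-sharp and
// gatsby-transformer-sharp are installed, and "example.jpg" is made up.
import React from "react"
import { graphql } from "gatsby"
import { GatsbyImage, getImage } from "gatsby-plugin-image"

export default function Photo({ data }) {
  // Resized variants, the placeholder, and lazy loading are all produced
  // at build time rather than in the browser.
  return <GatsbyImage image={getImage(data.file.childImageSharp)} alt="Example photo" />
}

export const query = graphql`
  query {
    file(relativePath: { eq: "example.jpg" }) {
      childImageSharp {
        gatsbyImageData(width: 1200, placeholder: BLURRED)
      }
    }
  }
`
```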
(3) is incorrect, Gatsby initial page load times are mostly really bad.
(2) is both overstated and overvalued. It's overstated because loading a static HTML page from a CDN is extremely fast. Too many people who point at this advantage for SPAs are thinking back to pre-CDN usage with slow origin servers. Of course there are still use-cases where going to network is not wanted, but these aren't the primary use-cases that Gatsby covers.
It's also overvalued in that most users are not getting to a page by navigating in a loaded site, they are coming from a social or search link (again, for the sort of use-cases that Gatsby pages are built for).
> (3) is incorrect, Gatsby initial page load times are mostly really bad.
This has not been my experience, considering all the HTML is ready to go from the last byte, so other than blocking CSS, rendering can begin ASAP. At that point no JS is required to interact with the page, so things are generally pretty snappy while we wait for React hydration to kick in.
camp 1) at best, a TCP connection is re-used, and the HTML for "page 2" is fetched over the network, parsed, the CSS OM is applied, and then the whole caboodle* is "painted on screen".
camp 2) the CSS OM is applied and "page 2" is painted on-screen (possibly even faster if the browser cached "page 2" in a texture on the GPU, so the CSS OM application step may be optimized away)
So I genuinely don't understand how fetching "page 2" from a CDN (we use CloudFront & GCP's CDN at https://mintdata.com, so I'm basing my experience on this) is faster than the SPA approach?
I am genuinely curious on the above -- not trying to start a flame war :D
* Yes, apparently caboodle is a word?! I had to Google it just like you to make sure :)
I've only done simple stuff with Gatsby, but it fully supports generating static HTML from dynamic data sources. The difference between that and traditional JS frameworks is the generation for Gatsby happens at build time instead of runtime.
I love it because I vastly prefer serving static assets over server-side rendering for the numerous simplicities it provides (aggressive caching, predictable latency, etc). In most cases you get to have the cake of complex sites generated from templates and eat the cake of static asset serving.
It doesn't have to be markdown files. Gatsby supports a wide range of data sources, which are available to use in your templates via GraphQL. If your website is big and gets frequently updated with data from the backend triggering the builds, new content on the site can take a few minutes to appear as the generator needs to build the static site (HTML/CSS/JS files), which I assume is a problem for big publication sites.
For a hundred markdown files, no big deal. But if your site has tens of thousands of pages, those build times become a real pain point. Why should every single page rebuild if you only changed one of them?
A makefile works great when you have a pile of source files and you want to make a parallel pile of output files, and each source file is compiled individually. It's not so great when you have a compilation process where you have a folder of entrypoint source files that each need their own output artifacts produced but they happen to share many dependencies, and you want to automatically create common output chunks if there's enough overlap between them, etc. I'm sure you could find a way to involve Make by automatically generating makefiles, but at that point, Make is only handling the really easy part and isn't worth it.
Think of how many makefiles just end with one big linker call. Most web toolchains (which crunch a lot of source files into a few artifacts) have more in common with that linker call than the rest of the stuff that happens in a makefile. You have to have a system that's more integrated with what's being built to make that step meaningfully incremental.
Instead of a Markdown file, imagine your data is somewhere in a REST API, or many REST APIs. Gatsby (and Next.js, which I vastly prefer) will query these APIs during the BUILD process to generate your static site - and that can be slow. Imagine you have a site that lists the top 1,000 IMDB movies with details. To generate your static site, you need to make 1,000 REST calls to the IMDB API at build time to get the necessary data. Parallelizing and caching it makes it faster.
If it were just Markdown files you probably wouldn't need this, since parsing and transforming local Markdown files is fast. But this is Javascript, so nothing is truly fast.
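A hedged sketch of what that build-time fetching looks like in Gatsby's gatsby-node.js - the API URL and fields are made up, and it assumes Node 18+ for global fetch:

```js
// gatsby-node.js -- hedged sketch of build-time API fetching.
const path = require("path")

exports.createPages = async ({ actions }) => {
  const { createPage } = actions

  // One round of API calls per build, not per visitor.
  const res = await fetch("https://example.com/api/top-movies")
  const movies = await res.json()

  for (const movie of movies) {
    createPage({
      path: `/movies/${movie.id}/`,
      component: path.resolve("./src/templates/movie.js"),
      context: { movie }, // handed to the template as pageContext
    })
  }
}
```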
Static site generators output static HTML files instead of running a server that renders every request, it doesn't say anything about what language they consume. Gatsby is built around Javascript and React, not Markdown.
>> Static site generators output static HTML files
This is not really true; they often generate a static client-side web application vs. a dynamic first-time (or every time) app based on server-side processing. This provides a highly optimized, largely self-contained application that avoids a lot of the runtime dependencies and complexity we typically get (ex: web servers and databases). They are still highly dynamic through the use of APIs and such.
Gatsby has an extensive build pipeline and can query almost any data source during the build, but the original base source is markdown, and React is the JavaScript side.
None of the answers/comments below even come close to answering the simple question that started this thread. This looks like an overly complex solution looking for a problem to me.
True. The article has a decent answer though. Incremental builds are necessary because:
> [Slow builds] can be annoying if your site has 1,000 pages and one content editor. But if you have say 100,000 pages and a dozen content contributors constantly triggering new builds it becomes just straight up impossible.
Gatsby needs a cloud to host this build server. They also apparently host a nice content editing UI.
If you don't need a content editing UI, and/or are fine maintaining your own static builds, you presumably wouldn't subscribe to the cloud service.
They don't host a content editing UI, only a "dynamic" version of the site that you can embed or link in a CMS for draft previews etc.
I use and like gatsby a lot and don't think it is generally overcomplicated for what it does. They are really pushing static at all costs though, and these cloud solutions are needed because of that.
When seriously evaluating a 100,000-pages / dozens-of-editors project, if you ask what the benefits and the costs of static really are, I think you might come up with a different answer than Gatsby Inc. I think Zeit+Next actually has a better story there, because it's not "static at all costs".