I started my blog about Python/Django (https://simpleisbetterthancomplex.com) using Jekyll, hosted on GitHub Pages. That was a good way to get started, because back then I wasn't sure whether I would keep it up or not.
After a year or so I migrated to a 5 USD droplet on Digital Ocean (back then GH Pages didn't offer https for custom domains) and integrated with Github webhooks to automate the deployment when pushing new markdown to the main branch.
Over time it did start to degrade. Builds now take almost a minute, although once built the website is just a bunch of static HTML pages.
Nowadays it is annoying to write new posts, because I like to write locally and refresh the browser to check whether it looks good or not. So I would say it degraded for me, but for the reader it's still as fast as it was when there were just a couple of posts.
I thought about migrating the blog to something else, but because I used some custom markdown extensions for code highlight and other things, it would be painful to migrate all the blog posts. So I've been postponing it since 2019.
Something similar happened to me when I was using static site generators. In fact one that I was really enjoying even switched programming languages between 1.x and 2.0.
Since that time I look for the people and community behind the project, and try to find signs of stability and long-term care. After that I look at open formats rather than open and flexible architecture-chains. For example, I'd rather use my LibreOffice HTML template and simple PHP controller on a more monolithic (but open) platform than connect a bunch of technologies together to create a build process with a bunch of moving, quickly developing, interdependent parts.
Not sure it's the best answer, but it has worked better for me to use more monolithic software, even blogging software that's been in steady, if slow, development since the early 2000s...
It's a simple injection controller for adding links and markup. Very rough. It doesn't do much about the existing HTML, which I felt was mostly workable, or workaround-able. :-) You can see some output here:
If you use "bundle exec jekyll serve" you shouldn't have too much problems locally as it just rebuilds the pages that change on every save. A minute to deploy the finished version is not terrible by any stretch for a blog IMO.
I wrote my own blogging software, and went through a similar journey. Initially I wrote a simple Perl script which would read all the posts I'd written (as markdown), insert them into a temporary SQLite database, and then using the database I'd generate output.
Having the SQLite database made cooking up views really trivial (so I wrote a plugin to generate /archive, another to write /tags, along with /tags/foo, /tags/bar, etc). But the process was very inefficient.
Towards the end of its life it was taking me 30+ seconds to rebuild from an empty starting point. I dropped the database, rewrote the generator in golang, and made it process as many things in parallel as possible. Now I get my blog rebuilt in a second, or less.
I guess these days most blogs have a standard set of pages, a date-based archive, a tag-cloud, and per-tag indexes, along with RSS feeds for them all. I was over-engineering making it possible to use SQL to make random views across all the posts, but it was still a fun learning experience!
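For anyone curious what that looks like, here is a rough Python sketch of the same pattern (the original was Perl; the file layout and front-matter format below are made up): load every post into a throwaway SQLite database, then treat each "view" as a query.

```python
# Rough sketch of the same pattern: posts go into a throwaway SQLite database,
# and every "view" (archive, tag pages, ...) is just a query against it.
# The file layout and front-matter format here are made up.
import sqlite3
from pathlib import Path

def load_posts(post_dir="posts"):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (slug TEXT, title TEXT, date TEXT, tags TEXT, body TEXT)")
    for path in Path(post_dir).glob("*.md"):
        meta, _, body = path.read_text().partition("\n\n")
        fields = dict(line.split(": ", 1) for line in meta.splitlines() if ": " in line)
        db.execute(
            "INSERT INTO posts VALUES (?, ?, ?, ?, ?)",
            (path.stem, fields.get("title", path.stem), fields.get("date", ""),
             fields.get("tags", ""), body),
        )
    return db

def tag_index(db, tag):
    # One "view": every post mentioning a tag, newest first.
    return db.execute(
        "SELECT slug, title, date FROM posts WHERE tags LIKE ? ORDER BY date DESC",
        (f"%{tag}%",),
    ).fetchall()

if __name__ == "__main__":
    db = load_posts()
    for slug, title, date in tag_index(db, "django"):
        print(f"/archive/{slug}  {date}  {title}")
```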
Don’t know about Jekyll, but with Hugo you just run `hugo server` in the Git repo and it will give you a live preview that you can view in the browser, served locally.
It’s very fast.
I have a mockup blog with ~90 pages and it takes 190 ms to generate the whole site.
Jekyll would probably handle that in a few seconds.
But once you approach two hundred posts, let alone four or five hundred, it can quickly add up to a minute or two, making it impractical to say the least.
One solution is to run `jekyll build`, which just builds the HTML into a directory, then remove the old Markdown files and serve the generated HTML directly via nginx or something.
I've honestly given up and switched to Ghost, where I don't have to worry about that sort of stuff.
There was a company on here the other day talking about their product, built on top of Docker. I wish I'd bookmarked it.
Their secret sauce is, effectively, partial evaluation in Docker images. They run the code to detect whether any of the changes in a layer have side effects that require that layer to be rebuilt (which invariably causes every layer after it to be rebuilt).
I mention this because if I'm editing a single page, I would like to be able to test that edit in log(n) time worst case. I can justify that desire. If I'm editing a cross-cutting concern, I'm altering the very fabric of the site and now nlogn seems unavoidable. Also less problematic because hopefully I've learned what works and what doesn't before the cost of failure gets too large. It would be good if these publishing tools had a cause-and-effect map of my code that can avoid boiling the ocean every time.
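To make the idea concrete, here's a toy Python sketch of that kind of cause-and-effect map (not what the Docker product does; every file name below is hypothetical): each output records the inputs it was rendered from, and only outputs whose input hashes changed get rebuilt.

```python
# Toy cause-and-effect map for incremental rebuilds: each output lists the
# inputs it was rendered from; only outputs whose input hashes changed are
# rebuilt. Every file name here is hypothetical.
import hashlib
import json
from pathlib import Path

MANIFEST = Path(".build-manifest.json")

def digest(paths):
    h = hashlib.sha256()
    for p in sorted(paths):
        p = Path(p)
        if p.exists():
            h.update(p.read_bytes())
    return h.hexdigest()

def stale_outputs(dependency_map):
    """dependency_map: {output page: [input files it depends on]}"""
    old = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
    new = {out: digest(deps) for out, deps in dependency_map.items()}
    MANIFEST.write_text(json.dumps(new, indent=2))
    return [out for out in new if old.get(out) != new[out]]

# Editing one post dirties one page; touching a shared template dirties many.
print(stale_outputs({
    "public/hello.html": ["posts/hello.md", "templates/post.html"],
    "public/index.html": ["posts/hello.md", "posts/other.md", "templates/index.html"],
}))
```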
Correct me if I'm wrong, but don't you have to formally map out all of your inputs and outputs for partial recompilation to work?
I believe the last time I touched make was to fix an exceedingly badly mapped out chain of cause and effect that would consistently compile too much and yet occasionally miss the one thing you actually needed.
Partial evaluation works out the dependencies by evaluating all of the conditional logic and seeing which inputs interact with which outputs.
> Correct me if I'm wrong, but don't you have to formally map out all of your inputs and outputs for partial recompilation to work?
Make needs that information, but that doesn't mean you need to be the one to catalog it. I found this article pretty useful [1]; in a nutshell, many compilation tools (including gcc, clang, and erlc) are happy to write a file that lists the compile time resources they used, and you can use that file to tell Make what the dependencies are. It's a bit meta, so it made my brain hurt a bit, but it can work pretty well, and you can use it with templated rules to really slim down your Makefile.
(I've tested this with GNU Make; don't know if it's workable with BSD Make)
yes, because of a bad design decision I made to build a custom "related posts" feature, which is code that searches all my markdown files to pick the best pages and add them to the current page. so even with `--incremental` it takes quite some time to regenerate
to be honest jekyll still serves the purpose really well (especially for the reader)
the site is still sitting on an inexpensive entry-level cloud server, it's relatively fast, serves 300k+ page views every month and it gives me some passive income
i'm pretty sure that if one uses the framework properly it can still hold on very well
i think what left me with mixed feelings about static site generators is that they constantly hold you back whenever you think about growing your website. i ended up building a django api that runs on the same vpc as the blog itself, and i used it to expand some features, like reading from google analytics the total page views for a given blog post, or consuming the disqus api to list the latest comments on the home page. this kind of thing.
managing the posts in static files is quite challenging as the number of posts grows. at some point you will want to change some info, or add certain metadata, and you will need to write a script to walk the _posts dir and edit the files (maybe there's a better way of doing that :P)
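for what it's worth, that one-off script tends to be tiny. a hedged python sketch that walks a jekyll-style _posts directory and adds a front-matter key to every post missing it (the key name is made up):

```python
# hedged sketch of that kind of one-off script: walk a jekyll-style _posts
# directory and add a front-matter key to every post that is missing it.
# the key name ("lang") is made up.
from pathlib import Path

def add_front_matter(posts_dir="_posts", key="lang", value="en"):
    for path in Path(posts_dir).glob("*.md"):
        text = path.read_text()
        if not text.startswith("---\n"):
            continue  # no front matter at all; leave the file alone
        head, sep, body = text[4:].partition("\n---\n")
        if not sep:
            continue  # front matter never closed; skip rather than guess
        if any(line.startswith(f"{key}:") for line in head.splitlines()):
            continue  # field already present
        path.write_text(f"---\n{head}\n{key}: {value}{sep}{body}")

if __name__ == "__main__":
    add_front_matter()
```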
I wonder if you could save some time with some caching?
Find all posts related to "django", select the top 3, pin them as the related posts for all django entries? You'd trade out some uniqueness and slightly increased memory usage to gain speed.
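A rough Python sketch of that caching idea (the post structure and the ranking signal are made up): rank once per tag, then every post carrying that tag reuses the same pinned list.

```python
# Rough sketch of the caching idea: rank "related posts" once per tag and let
# every post carrying that tag reuse the same pinned list. The post structure
# and ranking signal (views) are made up.
from collections import defaultdict

def related_by_tag(posts, top_n=3):
    by_tag = defaultdict(list)
    for post in posts:
        for tag in post["tags"]:
            by_tag[tag].append(post)
    # One sort per tag instead of one scan per page.
    return {tag: sorted(ps, key=lambda p: p["views"], reverse=True)[:top_n]
            for tag, ps in by_tag.items()}

posts = [
    {"slug": "django-forms", "tags": ["django"], "views": 900},
    {"slug": "django-signals", "tags": ["django"], "views": 1200},
    {"slug": "flask-intro", "tags": ["flask"], "views": 300},
]
related = related_by_tag(posts)
print(related["django"])  # the same top-3 list is pinned on every django post
```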
(I just got started blogging and found https://github.com/sunainapai/makesite python static site generator worked well for me. Obviously you're probably a bit far in to switch horses by now.)
20 years ago that was my first useful custom CGI script: it took the metadata of the current page and ran a grep for pages with similar tags. Running that was really fast and flexible, until I published it and got my first ten simultaneous visitors.
Then I learned about how hard it is to use a cache.
I've had good experiences with Lektor (a static site generator). If you run it locally using "lektor server" you see your page plus an admin backend. Once you click (or type) deploy, the page gets statically built and copied somewhere via rsync.
You can still have the sources in a git for quick rollbacks.
What I like about Lektor over most CMS solutions is that it is more easily adjustable. Basically Jinja2 and Python held together with glue.
Some static generators like Eleventy and Gatsby offer partial builds, only building the new posts etc., which should be considerably faster. Another thing would be to run an empty version of your site with only the text you're working on and, when it's done, move it to the proper site and build it there.
Thanks for your Blog! It's a great resource - I love the generic form template and signal blog post.
It is really great and I recommend it to anyone who uses django
There is a problem in the CMS industry. Everyone wants to build a headless CMS to power Jamstack sites, but the content itself lives in databases and the CMS is usually a proprietary SaaS. It doesn't leverage Git for content in any way.
This is deeply problematic. We should have dozens, if not hundreds of contenders for "the next WordPress" that leverage Git as a foundational aspect of content management. Instead we have a bunch of Contentful clones (no disrespect to Contentful) with REST and/or GraphQL APIs.
It's bananas that if you search for Git-based CMSes you have NetlifyCMS, and…wait what? Is that all??? Forestry gets mentioned a lot because it's Git-based but that's also a proprietary SaaS. I just don't understand it. Is this a VC problem or a real blind spot for CMS entrepreneurs?
> We should have dozens, if not hundreds of contenders for "the next WordPress" that leverage Git as a foundational aspect of content management.
If you want a new CMS, give it a shot. But nobody's made a better free version of Jenkins either. It's hard to do and completely unrewarding/unmonetizable (as FOSS), which is probably why nobody has done it.
However, insisting it leverage a difficult source code version control system is artificially restricting. Start with an MVP of flat files for content and add a plugin system. Somebody will write Git support, but they'll also add other content backends. I'll bet you a million dollars that a custom SQL plugin that does version control will be preferred over Git by 99% of users. But they may wise up and use an S3 plugin instead. Or maybe all three. They'll have choices, and your project will become more useful.
Don't start your project with an artificial restriction and it will be better for it.
One explanation comes from just looking from the content creator's point of view: they kinda don't care what previous iterations of the work really look like (draft-final-v5-oct27-FINAL.doc anyone?)
The things a content creator might be more interested in aren't really core git strengths: SEO, publishing schedules, editorial back-and-forth, social media, etc.
> if you search for Git-based CMSes you have NetlifyCMS, and…wait what? Is that all???
There are some more. I know of Grav which I intend to use migrating from NetlifyCMS. It has flat-file storage and Git connectivity. I think it can be installed on my shared hosting provider by just unpacking the thing (I don't have Composer / shell access).
I’m under the impression there is still plenty more you can do with a database-backed CMS than with markdown files. For instance, marking pages as related to one another is difficult. You could use the slug, but if you later change the slug, you must track down all of the pages that relate to it and update the slug there too. Aka no normalisation.
I agree, I had this debate recently for what to use on a static site and we went with a Git-based CMS (https://content.nuxtjs.org/) over Strapi. I use nuxt/content for my personal site as well and really enjoy working with it.
Maybe FrontAid CMS might be interesting to you, even though it is also proprietary. FrontAid CMS is similar to Forestry, but stores all your content in JSON instead of Markdown. It is therefore better applicable to (web) applications in addition to simple blogs. https://frontaid.io/
For all the (deserved) complaints people have about WordPress, it's been around since before Git or Markdown was even a thing. For all its faults, it's been pretty resilient to time.
The real reason most websites disappear is a much more human one.
Yes, if you constantly update your installation, which is the exact opposite of what this blog post is about. Otherwise you run a pretty good risk of your installation being hijacked, overtaken by spam bots or similar after a few years.
Yep. I run a few WordPress sites and the amount of maintenance with them is crazy. The plugin model makes maintenance brutal too since it's all independent devs with different release cycles, testing practices, etc.
Following the philosophy of simple hosting + MD outlined in the OP then the WP equivalent would be something like a simple (official twenty*, or your own custom) WP theme with zero (or minimal amount of) plugins. Enable auto-updates on WP and you're done.
The more plugins you use, the faster entropy kicks in without maintenance.
Agreed, if you stick with the twenty* themes and keep it simple, you can get pretty far with very little maintenance.
For simple sites, especially where we don't do comments, I've started using Simply Static though to just generate pre-rendered pages. All the benefits of static site builder with all the conveniences of a CMS.
I've had it on my TODO-list to try out that plugin, thanks for reminding me!
How many sites have you tested it on, have you had any issues? I'm gonna play around with it now but what I'm mostly concerned about now is how you manage the site if you point the web server to the static directory.
I made a prototype at one point where you have your actual WP site in a subdirectory, like example.com/wordpress, which is IP restricted (or behind HTTP auth). Then you crawl that site with wget (or curl) to generate static HTML and finish it off by search/replacing the HTML files (removing the /wordpress portion). Then you serve the static files to general users.
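The search/replace step of that prototype is small enough to script. A hedged Python sketch (the output directory and domain are placeholders):

```python
# Hedged sketch of the search/replace step: strip the /wordpress prefix from
# the crawled HTML. The output directory and domain are placeholders.
from pathlib import Path

def strip_prefix(static_dir="static_site", domain="https://example.com", prefix="/wordpress"):
    for path in Path(static_dir).rglob("*.html"):
        html = path.read_text(encoding="utf-8", errors="ignore")
        # Handle both absolute URLs and root-relative links to the hidden install.
        html = html.replace(f"{domain}{prefix}/", f"{domain}/")
        html = html.replace(f'"{prefix}/', '"/')
        path.write_text(html, encoding="utf-8")

if __name__ == "__main__":
    strip_prefix()
```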
If you happen to ever read this, I ended up with a setup similar to the hacky one I did before:
- WP moved to subdirectory (WP home and siteurl changed in DB)
- Static site served normally from (and generated to) webroot
Just needed to add sitemaps and robots.txt to the config to be included. Workflow is super smooth. Login to example.com/wordpress/wp-admin/, view your changes at example.com/wordpress and when you're happy, publish the site through the plugin.
Curious to see what kinds of issues I might be running into later on (this was just a quick test).
Same story with Jenkins, the plugin ecosystem is just ridiculous. If Linux distros did the same thing with packages, Linux would be broken literally all the time.
WordPress core yes, and it's pretty safe, but if you use many plugins (which most people do) it gets a lot riskier. Premium plugins are pretty safe, but I've still had stuff break.
The kind of person whose needs would be satisfied by a static site generator doesn't seem likely to need 20 plugins, so it doesn't sound like an apples-to-apples comparison.
I don't base it on the number of plugins, I base it on complexity, personally.
I mostly agree with you, but in my experience most (not all) people who are capable of using a static site generator will do so. Those who aren't comfortable with it use WordPress. I'm not slamming WordPress, just making an observation.
> The real reason most websites disappear is a much more human one.
So true.
I have had a blog running since 2006; it was all PHP-based back then, with (HTML) posts inside a database. The tooling shouldn't be a problem; I've switched systems several times (once every few years).
Currently on Hugo with markdown posts. As long as you treat your migration carefully and take some time to migrate, every tool should suffice. It's mostly about human effort and human error when things get lost.
I wish people would stop using GitHub as a Swiss army knife CDN (free hosting, hooks, history through the git repo, etc.) and build higher-altitude solutions that could leverage git/MD/whatever but with free/opensource/selfhosted/alternative tools (like Gitea instead of GitHub and MinIO instead of S3, for instance).
> build higher altitude solution that could leverage git/MD/whatever but with free/opensource/selfhosted/alternative tools
Not that I disagree with this, but that goes almost directly against "Mak[ing] that source public via GitHub or another favorite long-lived, erosion-resistant host. Git’s portable, so copy or move repositories as you go."
Any self-hosted service or solution is not going to be erosion-resistant, by virtue of not being for-profit.
> Any self-hosted service or solution is going to not be erosion-resistant by virtue of not being for-profit.
And for-profit services are? Often enough it's the VCs which ruin a service by pressing out money before it inevitably dies because the experience has been degraded to an extreme degree.
Setting aside how weary I am of "M$" (it's just about tied with using "sheep" to describe Apple users), what on God's green earth does Microsoft have to do with the OP's point? GitHub itself is not and was never open source. It's not like Microsoft bought it and closed off the source. Do you think using Microsoft tools demonically injects anti-FSF sentiment into your work? ("I started using VS Code, and now whenever I try to work on GPL 3 software, my keyboard becomes too hot to touch until I close the window!")
Personally, I have a different take than the OP, although I'm not sure this is really a disagreement. I would argue open data formats are more important in this context. If I write text in Markdown, it doesn't matter whether I'm using Emacs or a closed-source, proprietary commercial editor; the text isn't tied in any meaningful way to the editor. Likewise, a Git repository hosted on GitHub isn't tied in any meaningful way to GitHub. It's... a git repo.
As long as GitHub isn't doing anything that I particularly object to -- and "oh woe, it is owned by Microsoft" is not an objection I share -- there's a pretty good case for continuing to use it. I'm taking a calculated risk that it's both unlikely to go away anytime soon and to markedly change business direction in an unfriendly way, but if it does? As long as my local copy is up to date, I can re-publish it anywhere and just, well, stop using GitHub.
Not really, "free/opensource/selfhosted/alternative" [0] as in "here this minio bucket that behaves like S3 but you can play with it without an amazon bill of death that may or may not be coming".
[0] Also, thanksnotreally for cherry-picking and putting words in my mouth with your use of `!M$`, screw you too.
IMO the biggest barrier to blogging (and the cause of most blogs dying) is inconvenience, and minimizing that is the biggest advantage of Markdown + Git. If there's any inconvenience at all, it naturally drags on the process of writing, and writing takes enough time and focus that if there's any friction it's too easy to push things off to the next day.
My co-author and I use Markdown and Git as the author suggests, and one of the best things is that between simple CI/CD pipelines and effortless scaling of a static site, we don't need to do any technical work so there's no friction on my lifestyle. We've been writing for almost a year now, 4+ posts a month, and 99% of that "work" has just been writing.
On the inconvenience front, I think that also makes it clear why so much stuff that would have been blogs, say prior to 2013 to pick a pseudo-random date [1], ended up elsewhere: the various walled gardens got very convenient. It's really easy to post an update to the walled garden social media site of your preference (Twitter, Facebook, TikTok, Tumblr, whatever), and with network effects it's really easy to have some sense of readership (even if it's just Likes or Faves or whatever).
There are some blogs that I realize will never "come back" so long as "everyone is on Twitter these days", because Twitter is still so much more convenient than blogs (even ones in Markdown + Git).
[1] Okay, not actually random, it was the Google Reader shutdown year. Google Reader provided a lot of convenience to RSS, including social media-like network effects, that almost brought blogs mainstream.
Any suggestions for static site generators / where to host? I'm thinking about starting a new blog with Hugo / Digital Ocean.
I have a popular Spark blog on Wordpress (https://mungingdata.com/) and can relate to your sentiment that any inconvenience can hold up the writing process. Your post is motivating me to streamline my publishing process.
Hugo is great, and they have a section with tutorials on how to host on a bunch of different platforms, so pick the one that you're most comfortable with or seems best/cheapest:
Gitlab and Github both have static site hosting options. Netlify works great as well.
We've gotten it to the point where we just push new pages via git and it all works. Anything more and you'll find yourself procrastinating ever so little at a time...
The longest-living websites are forgotten accidents. It's always some ancient webserver using maybe a single RAID-1 array of two tiny spinning disks, running forgotten software that is never ever updated. The uptime on those boxes is not uncommonly measured in decades. Somebody's credit card just keeps getting charged $40 a year ($10 for the domain and $30 for the website hosting), and the machine never gets touched.
Typically it sits in the back of a dusty rack at a website hosting vendor (or, rarely, a colo provider), and the gear has long since paid for itself, but it is also unmaintainable, having not even the remotest semblance of a service warranty or service parts (other than what somebody might have ordered as spares years before). If it ever loses power, everything starts back up at boot time and it keeps on chugging, defying the usual laws of computer entropy.
The only middle-way sort of solution I found, a long time ago (even before joining the company) while looking for somewhere to host https://jdsalaro.com, was to use GitLab, since they are open-source and quite public about their, now our, values.
Nowadays I'm planning on writing again and have tried a setup that seems to be alright:
- Migrate from Pelican to Hugo, since the community is more vibrant and responsive and the themes are much more mature and well maintained.
- Use a mature, well-maintained documentation-oriented template as base (I based https://jdsalaro.com off Pelican's bootstrap3)
- Create an Obsidian vault out of the posts/ directory to improve note-taking and editing capabilities.
- Host same as before on GitLab
- Deploy via CI/CD copying the posts/ directory to a public repository
- Probably mirroring to GitHub
I'll be able to report back once I've gotten the above fleshed-out :)
How does having the "BSD Zero Clause" with your content work? I open sourced my blog's software, but I still keep all the actual content in a separate private repo, since I don't want my written content distributed under the same license. Also works out well, since I have the github action automation in the private content repo.
I can still easily change the private flag down the road if I were to decommission it (though, unlikely -- I'd just archive it somewhere).
I have a similar setup for my blog and I'm using Netlify for both CI and hosting (free). It's been great! GitHub Actions could probably work well for you if you already have decent hosting.
Any particular guide or set of docs you recommend for that? I've heard a lot of good things about Netlify as far as hosting, and I've been wanting to set up a blog; seems like as good a time as ever to start learning.
Netlify seems to lack basic testing of their client. For example, I found that if you have a repo not hosted on GitHub, trying to publish it will cause the client to dump a stack trace and crash. That, the general whizbang design of the UI, and the state of the docs left me with a poor impression about the quality of Netlify's offerings. "Good" for a free service, maybe, but all their attempts at fancy integrations makes the whole thing less attractive, since it actually creates hassle (for a person needing to dodge it all just to make the most basic, vanilla use out of it).
I've evaluated Netlify for my blog at https://jdsalaro.com and I'd advise against that unless you're missing a CMS on your static site generator. Other than that I personally felt it didn't bring much to the table. I do admit that their CMS is slick, but it wasn't really a feature I needed.
Of course I'm not sure what other people are using them for, but in my case GitLab/GitHub covered all my bases pretty well since I wasn't looking for a pelican/hugo CMS and you still host somewhere else (not Netlify).
GitHub's main business is not GitHub Pages or being a CDN.
Netlify's main business is.
Netlify also provides deployment previews for commits and PRs, and a complete CI/CD pipeline that you would have to implement yourself with Github Actions or something else.
The CMS isn't what I need either, but there is so much more than just hosting.
I used to have my blog source on GitHub, but then it turned out I didn't want my half-finished works-in-progress public. To use a private repository would rather defeat the point; using a private repo and a public fork is inviting confusion. Now I just use a private repo on my own server, cloned to my dev machine. Does anyone have a usable solution for that problem?
I have a branch and remote called "publish", and a branch and remote called "draft". I suppose if I ever accidentally pushed the draft branch to the publish remote, I'd just delete the branch from the remote.
I knew someone would say that. It's not about imperfections per se; it's more that I sometimes write stuff that I then reconsider and delete before it ever sees print. Admittedly nowadays I mostly self-censor that stuff before even starting to write it, but a few years ago I wrote a lot more stuff that, after reflection, I decided would be unwise to go ahead and publish.
I've been thinking about this a lot. I created Ponder so that I could quickly create drafts in my browser. Then I planned to move them to my blog (Markdown on GitHub) when I'm done.
I need more than 10 drafts (the current limit in Ponder), but I feel like I'm getting close to a solution that works better for me. I also want to be able to edit across machines (Mac, Chromebook, Phone), so I need to make the app work online.
Using git submodules might work, but it's not very convenient in my experience. I'm now thinking about having a GitHub Action in my private repo that pushes all public posts into a separate public repository automatically.
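The filtering step of such an Action could be tiny. A hedged Python sketch, assuming a `draft: true` front-matter flag and a local checkout of the public repo (both assumptions); the git add/commit/push would be separate workflow steps.

```python
# Hedged sketch of the sync step: copy every non-draft post from the private
# repo into a checkout of the public one. The `draft:` flag and the paths are
# assumptions; git add/commit/push would be separate workflow steps.
import shutil
from pathlib import Path

def sync_public_posts(private_posts="posts", public_checkout="public-repo/posts"):
    dest = Path(public_checkout)
    dest.mkdir(parents=True, exist_ok=True)
    for path in Path(private_posts).glob("*.md"):
        front_matter = path.read_text().split("\n---\n", 1)[0]
        if "draft: true" in front_matter:
            continue  # keep drafts private
        shutil.copy2(path, dest / path.name)

if __name__ == "__main__":
    sync_public_posts()
```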
I personally don't move anything into git until it's close to ready. WIP writing is kept on Bear (Markdown editor). But there are tons of other apps, including ones that sync to a cloud.
I only keep the output on github. So the generator folder with my drafts, posts etc is on my local machine, and whatever output it generates is in its own folder that is a git repo.
This reminds me once more of a piece of the puzzle which we're still missing. Markdown + git is great, but leaving your blog up on GitHub is just another central point of failure; GitHub feels like a fact of nature right now, but it's just a website at the end of the day.
Most broadly, it's called content-centric networking. Bittorrent is a piece of that puzzle, but too static, with no obvious way to connect disparate hashes together into a single entity. IPFS and Secure Scuttlebutt are groping in the right direction.
There was a project called gittorrent which, as you might guess, was trying to be 'bittorrent for git'. It never really went anywhere, but the crew at https://radicle.xyz are looking to revive it, and I wish them the best of luck.
What I want is a single handle, such that, if there are still copies of the data I'm looking for, out there on the network, I can retrieve them with that handle. And also, any forks, extensions, and so on, of that root data, with some tools to try and reconcile them, even though that may not always be possible.
That would be really powerful. It would make information more durable and resilient, and it has the potential to change the way we interact with it. For example, I find typos in documents sometimes; it would be nice to be able to generate a patch, sign it so it has a provenance, and release it out into the world.
When I browse old blogs which still exist, I routinely hit links to other blogs, video, and the like, which are just gone. Sometimes Wayback Machine helps, often it doesn't. This problem can't be fixed completely, when data is gone, it's gone, but we could do a lot more to mitigate it than what we're doing now.
A general solution for Wayback-style long-term distribution and archiving has been spec'd out (as part of the Memento Project), but nobody seems to be adopting it.
I currently think that the best option for a personal website / blog is a statically generated site. It’s just the most robust way to build things for the long haul. Minimal maintenance, and easy to move all your files to a new hosting provider. Having static html files is very robust.
I think for business websites it starts to make sense to use something like WordPress. The fact that it's open source is amazing, so you can always self-host. It's more effort, and more complicated, but you get lots of neat plugins and templates.
Both are great solutions, but my current thinking is SSG for personal, and maybe WordPress for business.
The toolchain has changed significantly in 4+ years. It started as literally a shell script invoking Gruber's original markdown.pl. Then I switched to CommonMark, etc.
But the core data hasn't "rotted" at all, which is good. Unix is data-centric, not code-centric.
Previous thread that mentions that Spolsky's blog (one of my favorites) "rotted" after 10+ years, even though it was built on his own CityDesk product (which was built on VB6 and Windows). He switched to WordPress. Not saying this is bad but just interesting. https://news.ycombinator.com/item?id=25675869
> The toolchain has changed significantly in 4+ years. It started as literally a shell script invoking Gruber's original markdown.pl. Then I switched to CommonMark, etc.
Wouldn't switching formats like that break people's pages? That's one concern I have with using markdown for a site like this. For anything non-trivial, you either end up using HTML (in which case, what's the point?) or some dialect-specific feature.
At the bottom of this article I note one incompatibility I encountered with the reference CommonMark vs. markdown.pl. But I just fixed those and have been using it for 3 years, and it's been great.
----
The key point is to have a diversity of implementations, so you don't get locked into one, which may go away in 10-20 years. In 10-20 years, I'm very confident that there will be CommonMark renderers available.
I think this is an important “feature” but find that people either get it or don't. When I describe how cool it is that the version and history of a post are included in git and therefore reliable, I get sort of pleasant nods. But then people will fixate on having bullet lists in tables or something and give up on markdown.
Not that blog posts are life changing or anything, but having a format that’s durable and reliable seems important.
I lost my blog from college back in 95 when I got my first job and didn’t think to archive my shell account. I wish I had, even as a memento.
One thing I think factors into it is that age is kind of nice for slowly removing stuff that's not used from existence. Posts from 20 years ago might not be good to keep around if no one is reading them. So the gradual degradation that comes naturally from new phones, computers, hosts, and jobs might be a feature for some people who don't necessarily want everything around forever.
The thing about the net is that it's free to go viewing it. It costs money to put something up _that you control_. For example you have to buy the DNS name. You have to get a host, etc.. I know there are IPFS folks out there but I just don't know if there's anything there yet.
Normal people need a way to have a permanent place that can't be taken down and doesn't auto-expire with your credit card.
I run my blog using perhaps the most boring option - Wordpress, with close-but-not-quite the default theme (different fonts mostly). Outside of adding a cover image on every post and occasional footnotes, I don’t really need much.
However, that’s pretty lightweight on decisions I had to actually make to publish, and all the alternatives seem to be more involved. I wouldn’t mind migrating off WordPress, but just on the theming side that has a decent chance of involving a non-free theme, making the idea of hosting it on a public repo somewhat of a non starter...
I disagree with the core tenet of the article. Does everything really have to live forever? Most blogs are probably not worth preserving, just like most speeches in history are not worth preserving. Most books are never reprinted.
Time is a great filter. Yes important stuff gets lost but not each thought or word is worth preserving and not every moment in life needs to be captured and kept for posterity.
I would prefer if more SO answers, more old tweets, more outdated tech blogs, more old unflattering photographs, ... disappear in the void. The death of geocities meant some valuable stuff was lost, but it also meant that much much more crap was lost.
The world does not gain from keeping every scrap, rather the filter of 'did someone care enough to preserve this' adds value as the signal to noise ratio improves over time.
Ancient Roman graffiti is seen as being valuable enough that significant time and resource has been spent on studying it. The anthropological value of it seems obvious to me. Perhaps low effort thought-leadership or social media drivel has little contemporary value, but it could be a fantastic historical resource. The selection bias of history that was chosen to be preserved (and the people who were doing the choosing) puts a rather firm set of constraints on our ability to understand the past.
Absolutely agree. At the same time, this Roman graffiti is valuable because there's not that much of it and not many other records of the time. We have more from Rome than most other times and places in history, but overall not much. There are probably more words from LinkedIn 'thought leaders' published per hour than all the text that has reached us from Rome.
I would still argue that we live in a time where many things are needlessly preserved (not to mention the privacy implications...).
I cut my teeth on Hugo and mdbook, and mkdocs was a breath of fresh air. No crazy templating, no pseudo-markdown index files, and lots of flexibility in the site and page structure.
I'm using pandoc for my site (https://www.dharmab.com). It's a bio/profile, not a blog, but I think it's OK (other than I'm lazy and haven't auto-generated smaller resolution images for mobile).
I feel like the author makes a good point, but it does not solve the original problem he was describing. The only reason why those websites disappeared is because the owner was no longer interested in maintaining it. It's a human factor, not a technological one. Even if they had written their files in markdown and git, the website would have disappeared anyway if they stopped paying the domain/hosting fees.
Now if you're arguing for putting the website source on GitHub, that's an entirely different matter. GitHub addresses the human factor by being entirely free with no effort required for site upkeep. That's why it's durable, it's not about keeping it as git and markdown.
I think the author's point is to highlight the relationship between technological and human factors. Git+markdown has a better user experience if your user is a developer who uses git every day.
Some 15 years ago or so I made a PHP script that took a Word document, saved in XML format, and turned it into a static HTML page as part of my blog. It would rely on the paragraph style, turning "Header 1" into <h1> etc., as well as extract images and such.
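The same trick is easy to sketch today against a modern .docx (which is just a zip of XML). This is only a rough Python approximation of that old script, with image extraction and HTML escaping left out:

```python
# Rough approximation of that old script, assuming a modern .docx (a zip of
# XML): map paragraph styles like "Heading1" to HTML tags. Image extraction
# and HTML escaping are left out.
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
STYLE_TO_TAG = {"Heading1": "h1", "Heading2": "h2"}  # everything else becomes <p>

def docx_to_html(path):
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    out = []
    for para in root.iter(f"{W}p"):
        style_el = para.find(f"{W}pPr/{W}pStyle")
        style = style_el.get(f"{W}val") if style_el is not None else ""
        tag = STYLE_TO_TAG.get(style, "p")
        text = "".join(t.text or "" for t in para.iter(f"{W}t"))
        if text.strip():
            out.append(f"<{tag}>{text}</{tag}>")
    return "\n".join(out)

if __name__ == "__main__":
    print(docx_to_html("post.docx"))
```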
Lately I've been seriously thinking of reimplementing that, as I just can't seem to find a decent static site generator. They all seem to require a bunch of plugins and configuration just to do the bare minimum a blog requires.
I just want to write, add some code or images, have it look somewhat decent and publish it without worrying about getting hacked due to some WordPress security issue.
Depending on what you think of as "the bare minimum", I'd think Jekyll or Hugo would work.
E.g. the Hugo quick start guide is pretty much just "install Hugo, pick a theme, add your content" and gets you a barebones sequence-of-posts blog: https://gohugo.io/getting-started/quick-start/
The issue with the quick start is that it's a bit too bare-bones by the look of it, and the documentation for anything more complicated gets very confusing for someone new to it.
> I just can't seem to find a decent static site generator.
Maybe you and I have very different ideas of what is required to write a blog, but it sounds like you should be able to use Hugo + the theme of your choice and just go with it. It's /very/ easy, and posts are all written in Markdown.
Thanks for the input. For me a bare-bones blog needs to support simple-to-use images (that is, auto-resizing images with a link to the full size, etc., without a ton of hoopla), a syntax highlighter for code, and some TeX-ish thing for math.
From what I can see Hugo does seem to have a lot of that these days, so I'll try that.
One immediate issue is the highlighter not supporting my language at work (Delphi/Pascal) but I assume it being Pygments-compatible means I should be able to add that without much fuss.
The images are going to be the hard part. Nothing out there suited my needs/wants in that regard, so I've been working on my own Jekyll plugin that is still nowhere near production ready https://github.com/okeeblow/DistorteD
Those features in Hugo are mostly reliant on themes. Hugo has a concept called "shortcodes" which allow you to add capabilities that rely on adding additional JS, or HTML blocks in at specific points in your Markdown.
The theme I use on my website for instance has a shortcode called "fluid_imgs" which handles images pretty well, and I also wrote my own shortcode addition to support embedding images from Flickr more easily. Additionally the theme I'm using integrates MathJax and HighlightJS, which it utilizes in the standard way for Markdown (e.g. in most Markdown implementations you can specify the language after triple backticks to start a code block) by invoking highlighting on code blocks.
Getting what you're asking for should be pretty simple in Hugo with the appropriate theme or minimal effort anyway.
> Lately I've been seriously thinking of reimplementing that, as I just can't seem to find a decent static site generator.
If that's the case, go with that instead of trying to assimilate into workflows around existing static site generators. The world benefits from diversity, even diversity in software. Consider pouring your effort into making OpenLiveWriter work the way you want, if it doesn't already.
I'd like to suggest Fossil SCM[1]. Your content (text, images etc.) is stored in a SQLite database, which is great! SQLite is a recommended storage format[2] for datasets according to the US Library of Congress. Also, the SQLite developers pledge[3] to support it at least until 2050.
What if you started blogging in 2000? Or even in 2005, when git and Markdown existed, but were still not popular? Would you say... HTML and subversion/CVS is appropriate for long-term blog storage?
A bit morbid I know, but I've been thinking I would like my blog to outlive me. I have my blog on blogger.com, and while I've been tempted to switch to something more modern and fun in a container on a VPS, I've resisted, because I think blogger.com is not going away in my lifetime. A VPS and a personal domain on the other hand is going to get shutdown when someday I die and my credit card gets closed. At this stage Github Pages is probably also as likely to stick around as Blogger though.
My blog is Django and PostgreSQL on Heroku, but last year I decided I wanted a reliable long-term public backup... so I set up a scheduled GitHub Actions workflow to back it up to a git repository.
Bonus feature: since it runs nightly it gives me diffs of changes I make to my content, including edits to old posts.
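The dump step of that kind of nightly backup can be a short script. A hedged Python sketch with made-up table and column names; writing one markdown file per post is what makes the nightly git diffs meaningful, and the workflow itself handles the commit:

```python
# Hedged sketch of the nightly dump: write one markdown file per post so the
# scheduled workflow has a meaningful diff to commit. The table and column
# names are made up; any DB driver would do.
import os
import psycopg2

def dump_posts(out_dir="backup/posts"):
    os.makedirs(out_dir, exist_ok=True)
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    with conn, conn.cursor() as cur:
        cur.execute("SELECT slug, title, created, body FROM blog_entry ORDER BY created")
        for slug, title, created, body in cur.fetchall():
            with open(os.path.join(out_dir, f"{slug}.md"), "w") as f:
                f.write(f"---\ntitle: {title}\ndate: {created}\n---\n\n{body}\n")

if __name__ == "__main__":
    dump_posts()
```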
I've found much the same. I hadn't touched my site in 2 years, and just started working on it again and was able to nearly seamlessly pick up where I left off thanks to it all being done in Markdown. The flip-side is that many of the resources I'd previously linked to are gone. I'm starting to take a more archival approach to source materials and references as I write about them in the future.
If you want something to live on the web long after you've lost interest in it static HTML/GitHub pages seems like the way to go. The chess board I built in high school to learn JavaScript is still going strong: https://github.com/PJ-Finlay/JavaScript-Chess-Board
Per the timestamp, that's ~5 years. I've got a personal domain I've had for 20 years now. GitHub wasn't there, and I wouldn't bet a whole lot it will be there in another 20. I think it's likely, but nothing close to guaranteed. Git didn't exist then, either.
It's pretty easy to mirror your own git repos, so that's a nice option, and to have stuff hosted on GitHub, your own, and maybe BitBucket as well. But nothing is going to beat having your own backups--which I DO still have, from as early as 2000.
Any kind of source that requires rendering is inherently fragile. We’ve all seen old posts messed up by a template change. There is no reason to think that GitHub will live (and continue to render markdown) much longer than, say, GeoCities.
The simplest subset of HTML you can manage coupled with an uncomplicated layout and no external assets is better in terms of longevity.
To go one step further, you can add an HTTP redirect for /info/refs pointing to wherever the Git repo is hosted. For example, if hosting with GitLab Pages or on Netlify, you can put in a file called _redirects with a rule that forwards /info/refs (and its query string) to the repository's own /info/refs endpoint.
Then if your website is at https://example.com someone can just 'git clone https://example.com'. They can periodically do 'git pull origin master' and you can also push changes that way. There's no problem moving to a different git repo provider (github/gitlab/whatever).
Well, this works if publishing via Netlify anyway. GitLab doesn't like the query parameters, and it also errors with 'no domain-level redirects to outside sites'.
I agree with Markdown + Git. I recently migrated my personal tech blog from Jekyll to Nuxt with static generation and the @nuxt/content API. I had some difficulties preserving the /YYYY/MM/DD/slug.html routes that I had used for my markdown files. It is very nice to be able to mix markdown and Vue components, but this adds additional work for cross-posting and future site migrations.
For example, I recently wrote an article about YC's Work at a Startup that contains some interactive visualizations embedded in the markdown file I used to draft the content as Vue components. If I want to cross post, I'll probably need to maintain another version of the markdown that either contains static images of the interactive elements and/or links back to my site (on GitHub pages).
This is pretty much how I do it. I built my SSG using Hakyll and host the resulting binary on my own Debian repo, that way I can write a post in any flavour of text that Pandoc supports (though I go with Markdown for simplicity) and within about 20 seconds my CI pipeline has built and pushed a new version of my site (using scp). Easily done by myself too but at least I know with the CI approach that I'm writing to git first and then publishing.
I'm part way through the implementation of showing the commit history for each post and linking it to the commit hosted on sourcehut. It provides another view on the content the same as hosting a gopher or gemini version would: clone it and read the things in the posts folder however you want.
My web server hosts that directory with indexing enabled, but I don't use apache for it like most examples do. There's nothing special about it, it's just a directory tree built in a way that apt likes. (https://pkg.kamelasa.dev/). In fact, the entire configuration of the repo is visible there.
There's a step in the middle where I sign the packages with my GPG key, and the public key is available on Ubuntu's keyserver (http://keyserver.ubuntu.com/).
I don't need to run this workflow very often, as it'll take about 40 minutes to rebuild and push. But if I do update my SSG I know it'll end up in my debian repo with a version bump, so I'm happy.
On a second pipeline I can just do a simple 'add-apt-repository' and 'apt install'.
Your favorite static site generator can be used with free, no-ads hosting at indie https://neocities.org. Their $5/mo plan adds custom domains and higher disk/traffic quotas.
I hope with the IPFS and nix stuff I've worked on, we could someday have a world where you can distribute and archive the source of a static site, and the thing can be built and rendered on demand without tons of security and reproducibility issues.
This is true. I started my blog https://codevscolor.com in wordpress. 5 years later, last year, I moved it to markdown + git with gatsby. I am never going back.
This is the kind of thing pocket/instapaper do to extract the main content from a page in a format that's easier to read (and also probably to programmatically modify)
I use GitHub/GitLab pages and build my site using Pelican, so I'm pretty much already there.
Coincidentally, just two days ago I too went through my RSS subscriptions and, fortunately, most of them were still online, but sadly quite a few hadn't been updated in years, so it's likely that this will happen to them. Safe to say, I agree wholeheartedly with the author.
This grabbed my attention more: the Internet Archive's web page archive is not easily discoverable, and it's hard to crawl and download content from.
I noticed that too. I would be happy to find content directly on the archive (which is a complex search problem) and be able to download a snapshot of a website (and all its assets) from a certain point of time. I would even pay for both services.
I worked on a wiki that fell off the web; I've thought about recovering it from scrapes, and rewriting entries so that search wasn't needed, so that the Wayback Machine mirror worked better. Maybe using Kiwix tools to make a self-contained, downloadable backup, too.
that's exactly what I did for my blog, and I achieved 90+ on the Lighthouse audit. Google is considering Core Web Vitals to rank pages from this year (May '21). I don't want to rely on an external hosting provider who usually loads tons of tracking code, analytics scripts, and irrelevant scripts into hosted sites (looking at you, Wix).
"or fun, I tried “de-archiving” an old colleague’s blog from The Wayback Machine. Getting the first few pages was easy, but getting the whole thing, and with quality/precision, was very hard."
Any good tool to extract a website from archive.org?
Something that I feel is missing here is the problem with dead links. Even if the same content is still available on the very same blog, links might change when you decide to change your stack.
I might still be able to get around it, but it's still a pain.
Anyone have any experience with Hugo? It started out as a fast single binary static site generator but it seems it is now dependent on Go being installed on the host...
It's not dependent on Go being installed on the host... I'm not sure why you think this is the case. Like most Go applications, it's compiled into a single static binary.
Markdown does not even gracefully degrade to github because you are not using markdown, you are using markdown with some arbitrary extensions, and you are doing this because markdown is useless without extensions.
When I last had mine out I went a step further. Just a single static HTML page, with a little CSS in the HEAD. A single request for a single file, easily saved. No build or deploy process. No javascript, no stack of dependencies, no docker containers, no dependence on github or any other third-party site.
For a full blog, I would use a static site generator, but not for something well-served by a single static page.