
I do wonder if the power of a distributed database is really needed here; it gets ~1 update a day, so there's no need to have clever consistency stuff. Most of the queries relate to either today's data (you open the map and zoom in to see how doomed your area is today), or the graphs showing a standard set of history (e.g. cases over the last year). You'd think you could extract that data to be static and not require database queries, and only fire up the database for the tiny proportion that go digging in history.
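
To make the "extract that data to be static" idea concrete, here is a minimal sketch in Python. It assumes the daily update arrives as simple (area, ISO date, cases) rows; the function names, file names, and row shape are all hypothetical and are not taken from the actual dashboard.

```python
import json
from datetime import date, timedelta
from pathlib import Path


def build_snapshots(rows, today, history_days=365):
    """Split (area, ISO date, cases) rows into a 'today' view and a fixed history window."""
    cutoff = today - timedelta(days=history_days)
    latest, history = {}, {}
    for area, day_str, cases in rows:
        day = date.fromisoformat(day_str)
        if day == today:
            latest[area] = cases
        if cutoff < day <= today:
            history.setdefault(area, []).append({"date": day_str, "cases": cases})
    for series in history.values():
        series.sort(key=lambda point: point["date"])  # ISO dates sort chronologically
    return {"latest": latest, "history": history}


def write_snapshots(snapshots, out_dir):
    """Write one JSON file per view so a plain web server or CDN can serve them."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, payload in snapshots.items():
        (out / f"{name}.json").write_text(json.dumps(payload, sort_keys=True))
```

Run once after each daily update, something like this would cover the map and the standard graphs with no database involved at request time; only the "digging in history" queries would still need a live backend.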



I agree. For example, the Dutch corona dashboard (coronadashboard.rijksoverheid.nl) is a statically rendered dashboard using Next that gets updated daily. No backend and it's super fast.

Maybe I'm not objective because I'm Dutch myself, but from both a user-facing and technical perspective I think the Dutch dashboard is by far the best corona dashboard in the world. It's very fast, has a lot of detailed visualizations, provides a lot of context and has a fair amount of accessibility features.


Looking at the Dutch site on my phone (Samsung S10) I noticed it took a little while to load compared to the nigh instant loading of the UK Gov variant. Page Insights [0] [1] tells a similar story: desktop time to interactive of 0.4s (UK) vs 3.5s (NL), and mobile time to interactive of 4.5s (UK) vs 13.1s (NL).

The Dutch website seems to spend a lot of that time running the Next JS framework stuff, which the Gov.uk variant does not. It might work quickly on fast computers, but even on modern phones it seems to visibly pause.


On my iPhone Xr, which is older than the S10, it loads very fast. Performance also depends on which page you benchmark. The landing page is faster for the UK, but the cases page is about twice as fast on the NL dashboard (time to interactive 2.4s for NL vs 4.4s for UK). First meaningful paint is also faster (0.5s vs 0.8s). This shows that you can get decent performance without an overly bloated, costly architecture.


It looks really nice. Unfortunately it turns out the main feature, the data, is phony.

https://dvhn.nl/groningen/Meer-ziekenhuispati%C3%ABnten-blij...

When the hospitals feel like it, they test patients who are already in the hospital for something else for COVID. And when they don’t feel like it, they don’t. Any patient found to have COVID is added to the graph. So these numbers, and also derived numbers such as the R value, are statistically useless and vulnerable to manipulation.


I'd say that using Next for a static site is just as over engineered, personally.


I disagree. Expressing your frontend layout as code is not over engineering at all imo. It makes it easier to re-use code and is great for testability. Next is perfect for this use case. I actually think the code is quite elegant too. It's open source, so don't take my word for it, but have a look for yourself: https://github.com/minvws/nl-covid19-data-dashboard/tree/dev....


> Expressing your frontend layout as code is not over engineering at all imo

What do you mean by this? Isn't all frontend layout expressed as code?


HTML is not code imo. With React (and thus Next) you can treat your frontend code as full-fledged functions and objects.


Is there a dashboard of dashboards somewhere?


The dashboard has to deal with a complex data integration problem: multiple sources that differ in completeness, accuracy, age, and granularity (at many levels), daily corrections to past data, changes in data structure and semantics over time, large data volume, and 4pm traffic spikes. On top of that, there is an API that lets you select different metrics for different areas. Being able to simply write a SQL query or update and have it be fast regardless of volume is quite a life-saver if you have a tiny (mostly 1 person) team and development speed/adaptability is essential.

Some example queries issued by the dashboard: https://github.com/publichealthengland/coronavirus-dashboard... https://github.com/publichealthengland/coronavirus-dashboard... https://github.com/publichealthengland/coronavirus-dashboard...


I think the point was that the analyst needs to do that, but the frontend doesn't really need to - it could render a bunch of statically aggregated data and the spikes become another CDN problem.


See the other comment about the Dutch dashboard. Covid data isn’t changing that quickly. Having the frontend render something more static simplifies the design. No sql queries are even needed and you don’t need to scale out your database.


Well the Dutch site takes much longer to load. All these comments are (rightfully) discussing the back end being incredibly over-engineered, but 99% of people do not care about that. They care about how quickly a page loads, which the gov.uk site does much better.

I guess that implies that using Next for a "static site" is not a great idea.


https://coronadashboard.rijksoverheid.nl loads in less than a second on my Pixel 6.


I had the same thoughts and then it was confirmed how insane this setup is part way through:

“At the time of writing, the Citus distributed database cluster adopted by the team on Azure is HA-enabled for high availability and has 12 worker nodes with a combined total of 192 vCores, ~1.5 TB of memory, and 24 TB of storage. (The Citus coordinator node has 64 vCores, 256 GB of memory, and 1 TB of storage.)”

That’s beyond overkill for something that as you say could be generated statically a couple of times a day.


It's probably overkill, but not really enough overkill to be worth spending much time on.

E.g. 12 worker nodes and 192 vCores means they've picked 16 core nodes. 1.5TB of memory across 12 nodes means 128GB per node. 24TB of storage is just 2TB per node.

So it's 12 relatively mid sized servers/VMs.
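
The per-node figures above can be checked with a few lines of Python. The only assumption is reading the quoted "~1.5 TB" as 1.5 * 1024 = 1536 GB; the totals are the ones quoted from the article.

```python
workers = 12
total_vcores = 192
total_memory_gb = 1536  # assumption: "~1.5 TB" read as 1.5 * 1024 GB
total_storage_tb = 24

# Divide the cluster totals evenly across the worker nodes.
per_node = {
    "vcores": total_vcores // workers,
    "memory_gb": total_memory_gb // workers,
    "storage_tb": total_storage_tb // workers,
}
print(per_node)  # {'vcores': 16, 'memory_gb': 128, 'storage_tb': 2}
```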

They could certainly do it with much less, and I have no interest in looking up what 12 nodes of that spec would cost on Azure, but at Hetzner it'd cost less than 1500 GBP/month including substantial egress. At most cloud providers the bandwidth bill for this likely swamps the instance cost, and the developer cost to develop this is likely many times the lifetime projected hosting cost even with that much overkill.

If they happen to have someone familiar with query caching and CDNs, I'm sure they could cut it significantly very quickly, and even an entirely average developer could figure out how to trim that significantly over time. But even at (low) UK government contract rates it's not worth much time to try to trim a bill like that much vs. just picking whatever the developers who worked on it preferred.


> generated statically a couple of times a day.

That would require actual work instead of selling an overpriced generic solution.


Did you look at the 3 different (non-trivial) APIs they are offering on top of the dashboard? Though I have a hard time understanding why use PostgreSQL instead of ClickHouse, for example.


No I didn’t tbh, I didn’t read much further. Notice how one sentence says Postgres was chosen because it was somebody’s preference.


You will always be faster with worse tools you know than with better tools you don't know.


True but why does it also need terabytes of storage and 12 worker nodes?


I imagine getting something up quickly was a priority, rather than spending longer architecting and optimising.


My suspicion is that since this has to do with COVID, there is no real limit on what the cost should really be.

As for using the setup for other things, that seems less likely given this expensive setup.


> could be generated statically a couple of times a day

Hell, let's do some partial evaluation: just bake the computed HTML into the source code and recompile that a few times a day. No need to even read from a file when you can fetch it from rodata.

As for the reason why they did it this way, I assume it's a combination of CV-driven development along with the hackernoon-reading-junior-engineer-meets-cunning-salesperson effect which others have noted.


Yes, the static render option seems optimal. However, if an API is being offered, then something dynamic is mandated, which forces scaling of the data tier. Even so, it seems like a basic app cache would suffice.
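
As a rough illustration of what a "basic app cache" in front of the API could look like, here is a small time-to-live cache sketch in Python. The class name, the key format, and the idea of caching whole query results are assumptions for the example, not a description of how the dashboard's API works.

```python
import time


class TTLCache:
    """Cache computed values for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries = {}   # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute() only on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value
```

With data that changes about once a day, even a TTL of a few minutes would mean most API requests for the same metric and area never reach the database. Usage might look like `cache.get_or_compute(("cases", "Leeds"), run_query)`, where `run_query` is a hypothetical function that executes the SQL.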

Alternatively, we're building https://www.polyscale.ai/, which is a good fit for this type of use case. It's a global database cache and integrates with Postgres/MySQL etc. We host PoPs globally so the database reads are offloaded and local to users.

Agree with the other comments in that this feels like a shiny use case to quote to other prospects, but all good :)


My guess is that this is a sales pitch. It will be rolled out to business customers to say "look at our shiny bells and whistles", and contracts will be signed.


I played with the website and it feels really nice.

My guess is that this was web people who were contracted to build a read-only, daily-updated dashboard rather than an interactive web app, so they treated it as just another web app, scaled up.


To add to this, the scale of the data is presumably quite small as well. The geographical resolution is probably not super fine, there's only a handful of different kinds of data (deaths, vaccinations and whatnot) and the time resolution doesn't have to be too fine either (a day?). Even if you wanted to query it in very sophisticated ways you wouldn't need a database.


In fact, the UK dashboard had a suspicious outage when total case numbers exceeded the 1 million row limit of Excel... I suspect Excel is used in the data prep stage, if not in serving the dashboard.


It’s entirely unnecessary. The data is updated too infrequently to justify anything like this.

I built a one-pager vanilla JS site that polls the official Johns Hopkins aggregated data daily, and displays dynamically generated smoothed moving average charts, performs curve similarity analysis to identify similar patterns in different countries, and performs logarithmic regression to depict current doubling/halving times.

This happens entirely on the client side, with no server side component whatsoever (other than the http server to deliver the static HTML&JS that does all the work). See https://covid-19-charts.net/
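
For readers curious what the smoothing and doubling-time steps might involve, here is a minimal sketch in Python (the site itself is vanilla JS, and this is not its code). It assumes a trailing moving average and a least-squares fit of log2(cases) against the day index; both function names are made up for the example.

```python
import math


def moving_average(values, window=7):
    """Trailing moving average; the first points average over what is available."""
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / min(i + 1, window))
    return out


def doubling_time(values):
    """Days per doubling from a least-squares line through log2(value) vs day index.

    A negative result is a halving time. Needs at least two strictly positive values.
    """
    n = len(values)
    xs = range(n)
    ys = [math.log2(v) for v in values]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # doublings per day
    return math.inf if slope == 0 else 1 / slope
```

A series that doubles every day, such as 100, 200, 400, 800, gives a doubling time of about one day, and the reversed series gives about minus one, i.e. a halving time of one day.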


I came here to say exactly this. Is there a reason why they didn’t do it? I couldn’t figure it out from the article.



