Log-log plot of new vs. total Covid-19 cases by country (aatishb.com)
279 points by IndrekR on March 29, 2020 | 156 comments



Log-log of daily growth vs total cases is a neat graphics hack, one I'd not seen before, let alone thought of.

FT have been headlining a semi-log plot of deaths per country, normalised to days after the first ten deaths were reported.

https://www.ft.com/coronavirus-latest

Because dead bodies tend to behave characteristically, are inconvenient both directly and via surviving relations, and present a smaller testing target (about 1% of total cases) as well as representing a full course-of-illness endpoint, these data should be generally more reliable and cross-regionally consistent than confirmed cases. Deaths are, however, lagged by about two weeks.

FT also provide numerous other graphical representations, including an excellent small-multiples (Tufte fans) matrix of multiple countries' case trajectories.

My view is that all serious reporting should lead with similar visualisations.

Wikipedia's COVID-19 pages have similarly featured semi-log plots from early on, as does Worldometers.

https://en.wikipedia.org/wiki/2019–20_coronavirus_pandemic#D...

https://www.worldometers.info/coronavirus/

(Numerous additional pages with regional and specific behavioural characteristics within both sites.)

Another data visualiser, allowing arbitrary multi-country comparisons:

https://rys.io/covid/


I’m not sure what the situation is elsewhere, but I don’t trust the death statistics here in Poland.

Authorities here will be quick to remove a person from the death toll if they can attribute the death to something other than COVID-19, even when the person in question tested positive [1].

People who have died in quarantine, having had symptoms but never tested, are also not included [2]. For that matter, testing patients with fully developed symptoms is officially outright discouraged [3].

Press sources in Polish:

[1]: https://mobile.twitter.com/MZ_GOV_PL/status/1243090118270947... [2]: https://poznan.wyborcza.pl/poznan/7,36001,25818730,koronawir... [3]: https://dentonet.pl/wytyczne-gis-nie-wystarcza/#gref


There are three classes of countries.

Conscientious testers like South Korea and Germany. These countries have low death/confirmed_case rate and cases are only a small factor from reality.

Disinterested testers such as UK which only tests hospital admissions (and royalty). The case numbers for these countries are completely unreliable and useless. Deaths are probably quite accurate.

Lastly countries that are trying to perform a coverup, or developing countries where there is poor access to healthcare. Both cases and deaths are unreliable and there's nothing you can do about that.


It's the dark zones I really worry about.

A useful heuristic I've formed for anomalous situations (and especially disasters / emergencies / catastrophes) is that it's where there's no signal at all, or where the signal makes no sense, that you should be most concerned.

After major natural disasters, regions in which there are _no_ reports of damage frequently represent the worst-hit areas -- all comms and monitoring are knocked out.

In the case of pandemic, it's the capacity (and often willingness) to sense which is lacking: medical infrastructure, personnel, diagnostics, reporting standards and organisations, etc.

First signs will come a month or two in as 1,000s are dropping suddenly dead. Or before, as "patriotic" and tourist cases emerge -- nationals being diagnosed after leaving the country (this occurred in China with SARS giving rise to the sardonic term), or travellers transiting or arriving from the country testing positive, as happened with Italy and Iran recently.


The flip side of that is over counting. I have read (but do not have source handy) that Italy may be over stating the death toll by lumping any death of a covid positive patient into 'covid deaths'. So if you die of a heart attack and are covid positive, it's a covid death, even if you were not seriously ill with significant covid type issues (ie, respiratory).


I'm no kind of expert on medical decisions/statistics, but I can see a possible justification for this. The premise of "flatten the curve" is that there's only so much capacity in the medical system, and once it's used up, not just COVID-19 patients, but anyone who needs urgent hospital care, is going to be worse off.

If the situation is bad enough in a place (like Italy) that a heart attack isn't getting its usual level of care due to the COVID epidemic...it seems reasonable to count that as at least a "COVID-related" death.


Exactly. It's a very bad time to get cancer, an organ transplant, a heart problem or trauma needing ICU. The base rate of deaths from those conditions is well known, so in time we'll be able to see which patients died as a side effect of corona (due to being more vulnerable or due to a shortage of medical treatment). Obviously complicated by the lack of road trauma and industrial accidents, and increase in in-home causation due to being off work (and we shouldn't exclude domestic violence, alcoholism, anxiety, suicide, robberies).


> Because dead bodies tend to behave characteristically, are inconvenient both directly and via surviving relations, and present a smaller testing target (about 1% of total cases) as well as representing a full course-of-illness endpoint, these data should be generally more reliable and cross-regionally consistent than confirmed cases. Deaths are, however, lagged by about two weeks.

Completely agree. Just to add another source of differences between countries for this metric: Some countries test post-mortem (Italy) others don't (Germany).


I’m not going to believe that without a good source. Testing in Germany is at 500,000/week now, so anyone admitted to hospital would definitely be tested. A friend of mine here in Berlin had symptoms. They called and a team came to their place the next morning. One day later, they were called and got the test results (negative).

The explanation for the relatively low fatality rate I’ve heard is simply widespread testing catching many mild cases and relatively young patients because a lot of initial cases at least were linked to ski holidays and carnival events. Personally, I would add the hypothesis that Germans have far fewer interactions with family members across generations than especially Italy, but also the US. No one I know lived with their parents after finishing school. In the US, when Harvard shut down for the semester, the undergrads all went home to family. Here, students tend to live in regular apartments. And even those in student housing live there year-round, instead of vacating them during breaks and heading home.

I believe Austria and some other smaller European countries are also seeing at least similar CFRs, making Germany somewhat less exceptional.


We had a suspicious surprise death in the family 2w ago that the German hospital did not test (early outbreak + sudden death) nor autopsy and we suspect real chance COVID was a secondary factor that merited at least a check. Internal medicine md phd in our family, so no joke. Ok they didn't test, but we were surprised they did not autopsy. Sad way to confirm SOP.

For others: In the US, you can ~always request an autopsy.


See my answer to mns in this thread for the sources.


Didn't realize that - that is also a good explanation for why Germany has fewer deaths per million in published numbers: they just aren't testing everyone that dies.


It's not completely impossible, but it's considered unlikely; people are sometimes tested post mortem (it's simply not a general policy), and people are usually tested before it gets that far.

cite: https://www.theguardian.com/world/2020/mar/22/germany-low-co...


Do you have a source for that? Asking as a person living in Germany, I never heard something like that.


I have this info from one of the Robert Koch Institute's press conferences from a few days ago. If I remember correctly it was claimed by a journalist in the Q and A section and not objected to in the answer from the Robert Koch Institute's president Lothar H. Wieler.

EDIT: Here is another source in cold print (search for "posthum"): https://www.welt.de/politik/ausland/article206741617/Coronav...


>Some countries test post-mortem (Italy) others don't (Germany).

It's true and misleading at the same time. True: Germany doesn't do post-mortem tests on people not tested before. But at the same time, everyone who dies and has a SARS2-infection is counted as a COVID-death, whether or not they died from the infection.


> True: Germany doesn't do post-mortem tests on people not tested before. But at the same time, everyone who dies and has a SARS2-infection is counted as a COVID-death, whether or not they died from the infection.

But how do you know that they had a SARS-2 infection if you're not testing?


You test them ante-mortem.


Oooh....

Source on the DE lack of PM Dx?


See my answer to mns in this thread for sources.


Seen, as well as substantive corroboration elsewhere. Very concerning.

Thanks.


> dead bodies [...] data should be generally more reliable

Not to be too heartless towards sick people, but I think deaths is the only data that's really useful. In addition to reliability as you mention, I have no idea if "infected" means "light cough with a fever and the sniffles" or if it means "on a ventilator in a hospital". Deaths and being maimed are permanent and way more significant to me.

(Hopefully you'll forgive my overarching editing in quoting you)

Also note that you can set the plot in the linked page to show deaths. It's just not the default.


There is a similar confounder for deaths: old and multi diseased. Yes you may know the body count with corona but you do not know the body count due to corona.


At population scales, anomalous excess mortality can be detected and shown actuarially, based on expected mortality rates.
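A minimal sketch of that actuarial idea, using made-up weekly all-cause death counts (none of these numbers are real data): the expected value for each week is taken as the average of the same week in prior years, and the excess is whatever the current year adds on top.

```python
from statistics import mean

# Hypothetical weekly all-cause deaths for one region (not real data).
prior_years = {
    2017: [1000, 1010, 990, 1005],
    2018: [980, 1000, 1020, 995],
    2019: [1010, 990, 1000, 1000],
}
observed_2020 = [1005, 1100, 1400, 1900]

# Expected deaths per week: the average of the same week in prior years.
expected = [mean(week) for week in zip(*prior_years.values())]

# Excess mortality: observed minus expected, week by week.
excess = [obs - exp for obs, exp in zip(observed_2020, expected)]

for week, value in enumerate(excess, start=1):
    print(f"week {week}: excess deaths ~ {value:.0f}")
```

A real analysis would also need seasonality, population changes and reporting delays; this only shows the basic subtraction.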

Which raises another key point: in an epidemic, it's the population at large which is affected and diseased. Individual assessments (or treatment) are often of limited use or applicability.


I'm not sure if we will be able to separate the confounding factors such as economic impact at the population level. Do you have an idea how this can be done?

I have seen a few studies using this method to look at excess mortality during the great recession [1],[2], but there wasn't a pandemic at the same time.

https://www.thelancet.com/journals/lancet/article/PIIS0140-6...

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3070776/


To a certain extent the distinction is moot.

Deaths directly from an infectious agent, or due to society-wide institutional impacts ... are still deaths, and would, at least in a statistical sense, have been avoided without the epidemic. Specific mechanism differs, but the ultimate avoidance measure is the same: avoid pandemic situations, as a society.


Don't forget the other direction: the people who die due to the shortage of ICU beds. There will be uninfected people who die that in normal times would have survived.

Only the excess deaths will allow us to have a true comparison, but sadly we only get those long after.


Those are confounded too by the reduction in other deaths such as traffic related ones.


Why is it a confounder? If people have fewer traffic accidents due to staying at home, that makes the virus a little less destructive in terms of lives (just remember to account for the economic cost).


Because you won't be able to break out Covid-19 caused deaths from the aggregate.


I disagree, as noted elsewhere in this thread. Pandemic effects should be considered society-wide. Specific causal and medical or other mechanisms will differ, but it's the net effect which matters.


Nonetheless, dead + coronavirus symptoms + positive coronavirus test is a pretty objective way of measuring things. People who will end up dead are more likely to be tested, so it’s also a more complete data set.


Some people would have died anyway of the existing condition they are having. An interesting question is how much their life expectancy had been affected by having also caught the coronavirus.


Here is one calculation that estimates the Equal Value Life Year Gained as 15 years. Then, if we do nothing to mitigate the pandemic and 1 out of 200 people die, then the average person loses 27 days. https://www.forbes.com/sites/theapothecary/2020/03/27/how-ec...

This article says it's 7.8 years on average and up to 13.26 million QALYs are at stake in the US. So it's expected that life expectancy for the whole population will decline by at most 14 days due to Covid 19. https://www.nationalreview.com/corner/another-covid-cost-ben...
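The arithmetic behind both figures can be checked with a short sketch. The US population of roughly 330 million is my assumption, not a number from either article; the other inputs are the figures quoted above.

```python
DAYS_PER_YEAR = 365.25

def days_lost_per_person(years_lost_per_death, deaths_per_person):
    """Average days of life lost, spread over the whole population."""
    return years_lost_per_death * deaths_per_person * DAYS_PER_YEAR

# First estimate: 15 life-years per death, 1 death per 200 people.
first_estimate_days = days_lost_per_person(15, 1 / 200)

# Second estimate: 13.26 million QALYs spread over the US population.
US_POPULATION = 330e6  # assumption, not taken from the article
second_estimate_days = 13.26e6 / US_POPULATION * DAYS_PER_YEAR

print(f"first estimate:  {first_estimate_days:.1f} days per person")
print(f"second estimate: {second_estimate_days:.1f} days per person")
```

Under those inputs the first comes out at a bit over 27 days and the second at roughly two weeks.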


"if we do nothing to mitigate the pandemic and 1 out of 200 people die,"

Where does that figure come from?

Edit: also, I think 1 in 200 is very, very roughly 5-10 times all other infectious diseases in a normal year. So maybe that would provide perspective.


Let me rephrase it: If governments do nothing, 1 out of 200 people will die.

This study expects 510,000 people to die in the UK (depending on the R0 value) and 2,200,000 in the US. That's 1 in 135. https://www.imperial.ac.uk/media/imperial-college/medicine/s...

It will be between 1 in 200 and 1 in 100.
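As a rough check on the "1 in N" figures, here is a small sketch. The population numbers (about 66.5 million for the UK and 328 million for the US) are my own approximate assumptions and are not taken from the study; the death projections are the ones quoted above.

```python
# Projected deaths quoted from the study (unmitigated scenario).
projected_deaths = {"UK": 510_000, "US": 2_200_000}

# Approximate populations: assumptions, not from the study.
population = {"UK": 66.5e6, "US": 328e6}

# "1 in N" means N = population / deaths.
one_in = {c: population[c] / projected_deaths[c] for c in projected_deaths}

for country, n in one_in.items():
    print(f"{country}: about 1 in {n:.0f}")
```

With these assumed populations the UK lands near 1 in 130 and the US near 1 in 150, the same range as the 1 in 135 quoted.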


"Infected" means exposed to and replicating the SARS-COV-2 virus. Antigen or antibody tests should be able to detect this (subject to false positive/negative rates), if systematically and comprehensively administered.

The tests are not being administered, for numerous reasons, chiefly availability.


Infected is obviously "tested positive". I'm not sure how that's any different whether they have a light cough or they're dead? They still tested positive for covid 19


what about “number of people hospitalised with flu-like symptoms” possibly adding “not provably unrelated to coronavirus” (i.e. untested people get included, even those with flu get tested included unless it's proved they don't have both)

not every country has comparable death handling, but you can't hide a person on a respirator.

(alas, i guess it's not widely reported)


Flu, and numerous other infectious diseases, are independently testable. Within reasonable error bounds, COVID-19 is "respiratory disease, not otherwise indicated", or "idiopathic pneumonia". Diagnosis by exclusion is nonideal, but often accepted, at least for other conditions.


I am not a fan of these semi-log plots of total cases. They don't really tell much.

If the question is "how bad is it?", then the linear plot makes more sense. Going from 1000 deaths to 10000 is 10 times as bad as going from 100 to 1000, it shouldn't be shown in the same interval if that's the info you are conveying.

If the question is "are things improving?" it is not that helpful either. Here, what we want to know is if the growth is exponential or not, and there is no obvious feature in the plot telling us that. The plots of South Korea and Italy have similar shapes, one just looks more stretched out than the other, and yet, they describe completely different situations.

The only case when a semi-log plot can be useful is when it shows a straight line, meaning an uncontrolled exponential growth. But such a phase never lasts, it can't, even in a worst case scenario.


The point is to see if it's flattening out or not over time. Clearly for the USA it isn't. Exponential curves start getting useless after a while from a trendline perspective.


>Because dead bodies tend to behave characteristically, are inconvenient both directly and via surviving relations, and present a smaller testing target (about 1% of total cases) as well as representing a full course-of-illness endpoint, these data should be generally more reliable and cross-regionally consistent than confirmed cases.

Are you accounting for healthcare system overload? The case fatality rate in northern Italy is probably going to be very different than in Japan for example.


First, the relevant comparison is the relative reliability of deaths vs. confirmed cases data.

That clarified: overloaded healthcare would probably exacerbate this. Dead bodies don't require medical treatment, though registration as a COVID casualty requires post mortem testing. Live infecteds might well be turned away from facilities, choose not to seek treatment, or expire before receiving it. All of which contributes to case undercounts.

The difference in overall CFR would likely be at best a modest multiple of best-case treatment -- admissions are still only a fairly small fraction (4-20% figures that I've seen, lowest in Iceland) of total infecteds.

Italy's nominal CFR is still 55%, and has ranged as high as 75%. (https://www.worldometers.info/coronavirus/country/italy/) That suggests to me (and far more qualified commentators) a huge undercount of actual total cases.

At 1% mortality with 10,000 dead, Italy would have had 1 million cases two weeks ago. The current confirmed total is 92,000, an undercount by a factor of 10, even discounting the time lag. We might assume Italy's CFR is higher than elsewhere, but pick even fairly high values and testing still seems highly inadequate. Two weeks ago, total confirmed cases were 21,000. A fifty-percent mortality is not credible.
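The back-of-envelope numbers above can be reproduced in a few lines. This is only a sketch of the reasoning: the 1% fatality rate is the assumption stated in the comment, not a measured value, and the counts are the rounded figures quoted there.

```python
deaths = 10_000
assumed_fatality_rate = 0.01      # the 1% mortality assumed above
confirmed_now = 92_000
confirmed_two_weeks_ago = 21_000

# Cases that must have existed (about two weeks earlier) to yield these deaths.
implied_cases = deaths / assumed_fatality_rate

# How far short the current confirmed count falls of that implied figure.
undercount_factor = implied_cases / confirmed_now

# Fatality rate implied if the confirmed count from two weeks ago were complete.
naive_cfr = deaths / confirmed_two_weeks_ago

print(f"implied cases:     {implied_cases:,.0f}")
print(f"undercount factor: {undercount_factor:.1f}")
print(f"naive CFR:         {naive_cfr:.0%}")
```

Changing `assumed_fatality_rate` to 2% or even 5% still leaves the implied case count well above the confirmed totals.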

TL;DR: Deaths are a vastly more reliable, though still approximate and undercounted metric, than confirmed cases.


Something feels wrong here. But I am just a clueless programmer and armchair expert. A thing I see is that you shouldn't use CFR this way because the CFR is only known after the end of the pandemic. However I agree with the conclusion that the number of cases is not reliable, but not with the way you deduce this.


If I'm understanding you, that's largely the point I'm making. Italy's apparent anomalously high CFR is all but certainly a measurement artefact and not a clinical ground truth.

The deviation itself, though, points to insufficient monitoring rather than exceptional lethality. You cannot just take the apparent CFR at face value without question.


One valuable property of plotting deaths (total) vs. deaths (last week), or cases vs. cases, is that the shape of the plot is immune to consistent distortions in reporting; if country A is consistently down-playing the death rate by x%, the slope will still be the same, although the position along the line will be x% smaller on both axes. If the distortion is changed, to y%, then there will be a departure from the common slope, but if the real rate has not changed the graph will return to a positive (but shallower) slope after a couple of weeks.
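A quick way to convince yourself of the invariance is to fit the log-log slope for a series and for the same series under-reported by a constant factor. The numbers below are hypothetical, and the 60% reporting rate is an arbitrary choice for illustration.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Hypothetical true series: total deaths and deaths in the last week.
total = [100, 200, 400, 800, 1600]
weekly = [50, 100, 200, 400, 800]

# The same country consistently reporting only 60% of its deaths.
reported_total = [0.6 * t for t in total]
reported_weekly = [0.6 * w for w in weekly]

true_slope = loglog_slope(total, weekly)
reported_slope = loglog_slope(reported_total, reported_weekly)
print(true_slope, reported_slope)
```

Both slopes agree up to floating-point error; the under-reported points simply sit further down the same line.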

And indeed, every country but South Korea and Japan shows pretty much the same pattern. South Korea had a pretty vigorous response, but Japan? Perhaps reporting standards in Japan keep changing?

Hm, Iran deaths look funny, too. They suddenly leveled off at 900/wk (linear chart). Ah... cases log chart shows a short deviation, then resuming slope (even steeper!) So, reporting change?

[Edit: SK also outlier, Iran, added cases vs. cases]


This looks like an actual log-log plot, but of new cases vs existing cases. Not a super useful view except to point out that yes, growth rate tends to be proportional to infection count (pure exponential growth) until something majorly changes.

A linear plot would tell exactly the same story, just be compressed too far towards the origin to be very readable.


Yes, I realised that and edited my comment, apparently before you'd completed yours.

The view is useful in that it makes departure from the exponential growth trend blindingly clear. A bit fiddly trying to work out timelines however.


These are fascinating. But all of this data seems to indicate to me that this virus has a far longer incubation period than what I was led to believe. Right, how can it be that it continues to spread at the same rate weeks into quarantine?


What I heard was an average of 5 day incubation period (rarely up to 2 weeks). But early symptoms are often things like slight sore throat or mild fever or a slight cough, which might be easily confused for other respiratory viruses or seasonal allergies. People with minor symptoms are told to stay home and not tested in many places.

People often start experiencing more serious symptoms 1 or 1.5 weeks after symptom onset. Then getting a test result can take another 0.5–1.5 weeks.

Deaths are typically not occurring until something like 3–4 weeks after initial infection. Then in some cases data about deaths can take at least a few days to be aggregated in regional/national statistics.

So any public health measure instituted today will take at least 2 weeks to show up in the data about number of cases, and at least 3 weeks to show up in the data about deaths. Full effect of public health interventions is probably not seen in the data until a month or more later.

And keep in mind that public health interventions are only as effective as the public’s willingness to follow them. If someone is getting their news from a source which repeatedly claims that this is all overhyped and there’s nothing to worry about, they might not take appropriate personal action.


Most mainstream news has moved away from their earlier “we should be more worried about the flu” consensus they were pushing during the Democratic primaries.


Among information I've heard:

- Testing ramp catching up with community spread. A US congressional source was saying a week ago that the disease was spreading faster than testing. Likely still the case.

- Entrenched community outbreak and transmission.

- High-density regions with rapid transmission. The US east coast has leapfrogged early outbreaks in WA and CA (though these also likely have undertesting). New York State (overwhelmingly NYC) alone would be the world's 6th largest outbreak.

- Many regions remain in denial, have grossly inadequate response, or are mishandling the outbreak. In the US, Florida, Alabama, and Mississippi are saying and doing inane things, even by standards set at the national level. Mexico is in full denial. Brazil's mafia are imposing quarantines where the government won't. And rumblings (see Tyler Cowen) are that Japan may come unglued again.

The virus does spread asymptomatically, and many (mostly younger) patients can carry the infection with mild or no symptoms, but infecting others. See: https://www.worldometers.info/coronavirus/coronavirus-incuba...


A possible (I think probable) explanation is that we're seeing in this graph a ramp in testing, not a ramp in infection.

If we were to just randomly select individuals for infection and antibody screening, we could generate a clearer picture of this. Absent that, look at total death figures 14 days later for a decent proxy...


It’s likely due to a lack of testing.


Deaths lag by three weeks.


...the China National Health Commission reported the details of the first 17 deaths up to 24 pm 22 Jan 2020. A study of these cases found that the median days from first symptom to death were 14 (range 6-41) days, and tended to be shorter among people of 70 year old or above (11.5 [range 6-19] days) than those with ages below 70 year old (20 [range 10-41] days).[6]

https://www.worldometers.info/coronavirus/coronavirus-death-...


You are missing the 6 day incubation period.


Fair point.

Not to dispute, but to note the critical importance of definitions and measurement ability, this raises a few other points:

- Are we comparing and rating deaths vs. cases on time of viable exposure or first clinically detectable presentation?

- What is the earliest an infection is reliably detectable?

- At what point during incubation is a person themselves infectious?

Probably others.

I'm pretty certain based on past experiences that these discussions are being had among medical and epidemiological circles.

At what point though is the question one of failing to achieve potentially complete coverage versus fundamental limits of detection?

That said, deaths lagging exposures by ~20 days is valid. Thank you.


This kind of analysis is also called a phase space plot: the function is plotted against its derivative. When the function is exponential growth, the derivative is proportional to the function itself, which gives a similar plot for all the different countries. When the function deviates from the exponential, as in China, you can spot the difference very early in this kind of plot.

https://en.wikipedia.org/wiki/Phase_space

or for an example:

https://en.wikipedia.org/wiki/Duffing_equation

or in math notation: the governing equation for exponential growth is :

y' = ay

which is a linear function where the slope is the growth rate. The plot shows y' vs y. This is the straight line in the plot. Any deviations from exponential growth can be easily spotted now.
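Since the linked chart uses logarithmic axes, it may help to write the same relation in log form (a short derivation, assuming a > 0 and y > 0):

```latex
y' = a\,y
\quad\Longrightarrow\quad
\log y' = \log a + \log y
```

So on log-log axes every exponential outbreak lies on a line of slope 1, and the growth rate a only shifts that line up or down.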



For the logistic function the relation is

y' = y - y^2

This is why the linear relation later on bends back to lower values until y' becomes 0 for the end of the infection.
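To see the bend-back numerically, here is a minimal sketch that integrates y' = y - y^2 with small Euler steps and records the (y, y') pairs that the phase plot would show. The starting value and step size are arbitrary choices of mine.

```python
def logistic_rate(y):
    """Right-hand side of y' = y - y^2 (normalised logistic growth)."""
    return y - y * y

# Integrate with small Euler steps from a small initial infected fraction.
y, dt = 0.001, 0.01
trajectory = []
while y < 0.999:
    trajectory.append((y, logistic_rate(y)))
    y += logistic_rate(y) * dt

# The point on the phase plot where the growth rate is largest.
peak_y, peak_rate = max(trajectory, key=lambda p: p[1])
final_y, final_rate = trajectory[-1]

print(f"peak rate {peak_rate:.3f} at y = {peak_y:.3f}")
print(f"final rate {final_rate:.4f} at y = {final_y:.3f}")
```

Early on y^2 is negligible and the points follow the straight line y' = y; the rate then peaks near y = 0.5 and falls back towards 0 as y approaches 1.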


There's a nice video about this on the minutephysics channel: https://www.youtube.com/watch?v=54XLXg4fYsc


Yes! That’s where I saw it, so I started making my own graphs like that for the US states that I and my relatives live in. In excel you have to use a scatter plot and then set both axes to logarithmic. But it doesn’t let you use more than one data series on the same graph. (At least, not in the Mac version which tends to be inferior.) And it’s not animated so I was thinking of throwing something together in Python.
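For anyone wanting to try the same thing in Python, here is a rough starting point for the data side only. The function name, the 7-day window and the sample series are all my own illustrative choices, not anything taken from the video or the linked page.

```python
def new_vs_total(cumulative, window=7):
    """Pair each day's total with the cases added over the previous `window` days."""
    points = []
    for i in range(window, len(cumulative)):
        total = cumulative[i]
        new = total - cumulative[i - window]
        if total > 0 and new > 0:  # log axes cannot show zero or negative values
            points.append((total, new))
    return points

# Hypothetical cumulative counts, doubling every day (not real data).
series = {"State A": [2 ** day for day in range(12)]}

# One list of (total, new-in-window) points per series.
points = {name: new_vs_total(counts) for name, counts in series.items()}

for name, pts in points.items():
    print(name, pts)
```

Each list of points can then be drawn as its own line on shared log-log axes, for example with matplotlib's `plt.loglog`, which gets around the one-series limit.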


Title should read 'confirmed Covid-19 cases by country', that makes a very large difference. Those figures are not to be trusted to begin with so any kind of processing you apply to them does not result in graphs that output a picture that you can then draw conclusions from.

Each country has their own standards in what is a confirmed case and what isn't and some countries actively discourage accurate reporting.


It makes a difference in the absolute numbers, but it doesn't really matter for the trends, since each country's testing policy is relatively consistent with itself. In other words, the graph shapes & trends are still comparable, whether a country's testing captures 1%, 10% or 100% of its cases.


In some countries it will even matter for the trends. There are some countries that actively cook the numbers to make their politicians look good, and where there are very sudden kinks in the graphs you can be sure that the whole story hasn't been told. As for the testing capacity, that's a big factor too and it is non-linear in many places: testing criteria are changed based on how much stock there is of test materials and how much capacity on the machinery.

The closer to capacity, the stricter the criteria.


The testing policy in most countries is to increase the number of tests over time. How is that consistent? Consistent would be testing a random sample of size n every day without changing n.


In addition, some places like the US are really a collection of states. The federal government has little to do with testing policy, so it varies considerably by the state. Producing a total for the aggregate United States really blurs things.


That accounts for the left-right gap between countries as they follow the same trajectory upwards. That is to say, it is almost totally irrelevant when the data is displayed this way.


The number of confirmed cases is not really a great value to track, because countries test differently and change tactic after a while.

Look at this graph for instance, number of tests per million people vs number of confirmed cases per million people. They're highly correlated, which means the more you test the more confirmed cases you'll have. In some countries it's the opposite, the more cases you'll have seeking medical assistance, the more tests you do.

https://ourworldindata.org/grapher/tests-vs-confirmed-cases-...


You missed the point, that this sampling bias applies equally to the x axis and the y axis in this plot. So this plot allows you to fairly compare growth across different countries, regardless of their testing regime.


No, there are two different biases at work and they might not cancel out each other.


Unless I'm missing something, the data for that seems pretty dubious. It says New Zealand has done 120.8 tests per million people, which would be ~4.97*120.8 = 600 tests in total, but NZ has been doing more than double that per day. This data is also 9 days out of date.


This is missing one of the most affected countries: Switzerland.


According to [1] the numbers for Switzerland are about 13k tests and 1.7k cases per million, which would be somewhere below the Faroe Islands.

[1] https://www.bag.admin.ch/dam/bag/de/dokumente/mt/k-und-i/akt...


Does anyone know of a source for hospitalisation numbers by country? I know some states in the USA provide this (https://covidtracking.com/data/) and some countries in Europe do the same, but I can't find a site that collects all this data.

It seems to me that if you want to eliminate the effect of the totally different testing strategies (which moreover vary substantially over time), then hospitalisation numbers are far more indicative of the spread than positive test results. At least in the sense that you can compare them on different days.


I think "influenza-like illness (ILI) and severe acute respiratory infections (SARI)"* numbers would be the best, it would remove testing differences and could be compared to expected numbers from prior years.

In the US states send the data to the National Syndromic Surveillance Program (NSSP) but I can't find a public source for the numbers that IL sends. Here are some plots that I make for IL (a state that does not report hospitalizations yet but will likely do so starting some point this week):

https://msliczniak.github.io/COVID19IL/plots/index.html

* https://www.who.int/influenza/surveillance_monitoring/ili_sa...


The most transparent number is a direct comparison between deaths last year and deaths this year, in the same timespan (e.g. Jan 2019 vs Jan 2020, and so on). But there is a lot of gamesmanship at play among nations, for both internal security and external geopolitical reasons, so these numbers are far too often false, whether through convenient miscalculation or plain manipulation.


I'm not sure if we will be able to separate the confounding factors such as economic impact at the population level. Do you have an idea how this can be done? I have seen a few studies using this method to look at excess mortality during the great recession [1],[2], but there wasn't a pandemic at the same time.

https://www.thelancet.com/journals/lancet/article/PIIS0140-6...

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3070776/


> but there wasn't a pandemic at the same time.

There certainly was a pandemic, the 2009 H1N1 one. [1] In the last few months I've started to suspect that that was what killed my grandma in July 2010, she died very suddenly because of some respiratory issues (I live in Eastern Europe).

[1] https://en.wikipedia.org/wiki/2009_flu_pandemic


That is an interesting connection I will have to keep in mind. I don't think that would be an issue for the first link at least, which only looked at additional cancer deaths in the US and OECD due to the recession.


Economic impact at this early stage of the pandemic is pretty non-existent though? Italy is just having a very few people not paying for basic food at supermarkets in the Southern part of the country right now... that is where jobs are underpaid, hidden in the black market or just semi- daily- tricks to survive more or less legally, and the first to evaporate when there are no people in the street to fool. So the second stage of a pandemic is social discontent and revolts if people can’t eat... but we are not there yet? That’s why I really think all statistically surplus deaths right now all over the globe are just Covid deaths, but you will not have the exact number... only undertaker businesses may have a proper idea, maybe, if not just the Army in non-democratic countries.


I'm not sure if you looked at the papers I linked, but the deaths are not from starvation or violence. Examples would be much more mundane, like a skipped doctors visit which could have detected an early cancer.


You are right indeed, but the Italian local press under the https://www.today.it/ local reporters umbrella (at top left, just click citynews for all the cities covered) is reporting some difficulties right now, not collateral deaths... i.e. the health system still works, apart from ICUs and ventilators... we are being subjected to strong, patriotic rhetoric against Covid over here, but the only regional health system that has gone tits up is Lombardy's right now... and even there, normal service for the chronically ill still goes on as usual, with some delay.


And deaths from covid in Lombardy are already in the tens of thousands, so it's not like a few collateral deaths are going to make a statistically significant difference


Deaths are hardest to hide, so I'd always go with that over the other options - but Intensive Care beds seem to be the most critical stat now.


They are harder to hide but will only be known later. Some countries report only hospital deaths for now, but many die at home and in retirement communities. Still, the figure can track growth if the data stay consistent.


I’m not sure what to make of this. “Lots of distributions give you straight-ish lines on a log-log plot” (http://bactra.org/weblog/491.html) so it isn’t surprising that the slopes of the lines are somewhat constant over time.

Because taking the logarithm is such an equalizing operator, I also doubt whether it is surprising that lines seem to overlap for each country. Zooming in, there still is a difference of about 20% in new cases/total reported cases between countries, even in the range of 5k-10k total confirmed cases. Taken over the course of multiple days, that can make quite a difference.


The graph doesn't make much sense without a bit of explanation. (It certainly didn't to me, anyway.) The minutephysics video linked to from cptroot's comment, and also here [1] for your convenience, does a great job of that.

In short, you're right that it's not surprising that the lines are log-log linear for uncontrolled growth of the virus, and that it's similar for a lot of countries. What's interesting is the few (so far) cases where it drops below that log-log linear line, which indicates a containment strategy that's starting to work.

[1] https://www.youtube.com/watch?v=54XLXg4fYsc
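
To make the "drops below the line" point concrete, here is a minimal Python sketch on made-up numbers. It assumes "new cases" are counted over a trailing 7-day window (an assumption), and the series, the growth rates, and the helper name new_vs_total are all hypothetical.

```python
def new_vs_total(cumulative, window=7):
    """Return (total, new_in_window) pairs from a cumulative case series."""
    pairs = []
    for i in range(window, len(cumulative)):
        total = cumulative[i]
        new = cumulative[i] - cumulative[i - window]
        pairs.append((total, new))
    return pairs

# Hypothetical outbreak: 25% daily growth for 30 days, then 2% daily growth.
cases = [100.0]
for day in range(1, 45):
    rate = 0.25 if day <= 30 else 0.02
    cases.append(cases[-1] * (1 + rate))

pairs = new_vs_total(cases)

# During uncontrolled growth new/total is constant, so the points sit on a
# straight line of slope 1 on log-log axes. Once growth slows, the ratio
# falls and the curve drops below that line.
ratios = [new / total for total, new in pairs]
print(round(ratios[0], 3), round(ratios[-1], 3))
```

The first ratio belongs to the uncontrolled phase and the last one to the slowed phase, so the gap between them is the "drop" you would see on the chart.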


But that containment strategy isn't working for Italy at all.


It takes at least 2 weeks until the number of deaths starts to decrease. Look at the graph in the weeks to come and you will notice the difference.


They started containment on 9th March; how is that not two weeks ago?


> “Lots of distributions give you straight-ish lines on a log-log plot”

I agree. I don't remember the last time I've seen a log-log plot that doesn't look linear. I've never found a lot of use in them for actually illuminating much.


I strongly distrust any figures from pretty much any country without a complete and transparent testing regimen - (Hi South Korea, you know what you're doing!). The wide variance of testing protocols, even within countries - is going to kill our ability to really do much with these numbers.

I'd be far more interested in _death_ rates. I.e., what was the normal death rate, and what is it now? It's not sexy, it needs to be seasonally adjusted, and it's subject to noise, but it's a much better heuristic than "covid case", because the base number isn't as gameable.
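
A rough sketch of that comparison in Python, assuming you can get all-cause death counts per month for a baseline year and the current year. The numbers below are invented purely for illustration and the function name excess_deaths is hypothetical. Comparing the same calendar month is only a crude form of seasonal adjustment.

```python
def excess_deaths(baseline_by_month, current_by_month):
    """All-cause deaths this year minus deaths in the same month of the baseline year."""
    return {
        month: current_by_month[month] - baseline_by_month[month]
        for month in current_by_month
        if month in baseline_by_month
    }

# Made-up counts, not real statistics.
baseline = {"Jan": 1000, "Feb": 950, "Mar": 980}
current = {"Jan": 1010, "Feb": 960, "Mar": 1400}

excess = excess_deaths(baseline, current)
print(excess)
```

A real analysis would average several baseline years and account for noise, but the base quantity is the same subtraction.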


But are all these graphs comparable? Testing and reporting is not comparable in these countries.


A takeaway is that the underlying epidemiological behaviour is consistent despite significant regional differences in management and monitoring.

The accompanying video makes this point explicitly. Do watch it if you've not.

https://invidio.us/watch?v=54XLXg4fYsc

I'd also suggest extreme attention be paid to locations with improbably low case and death reports, or where severity mix and/or mortality are strongly out of line with expectations.


The video shows a plot of "new cases", which might be measuring the rollout of testing, not the actual increase in number of people with the virus. It is also not showing data per-capita. In other words, this is not terribly useful IMO


> The video shows a plot of "new cases", which might be measuring the rollout of testing

Maybe, but probably not given how consistent the trend is over time, response, and as the disease progresses.

> It is also not showing data per-capita.

Per-capita would be much LESS representative. The virus spreads locally, at a scale far smaller than country borders. The point is to show the progression of an outbreak. If you show per-capita, you would be showing the number of outbreaks per country and minimizing the growth of any individual outbreak. For instance China, with 4x the population of the US, would be moved much farther down the graph than the US. That would only make the data look fuzzier, and convey zero useful information.

In the end it would probably be a very minimal difference given the logarithmic scale.


Uniformity across regimes suggests a strong underlying correspondence.


> or where severity mix and/or mortality are strongly out of line with expectations.

Yes, that was the only conclusion I was able to make based on Germany's data being so different in the proportion of known cases against reported deaths. Some arguments from there like to explain their results as "superior" hospitals and the "lack of hygiene" as the cause of the results of Italy, Spain, France etc. and I just don't buy it.

It seems to me rather a result of a combination of more factors: my explanation at the moment is that they already had more old people in hospitals, and that they had for a lot of such people already "diagnosed" what brought them there, not changing it now when they die. As we see from the user nathell reporting from Poland, they are not the only land with such an idea.

It earns them short-term prestige for a "superior" health system, but it obscures what's going on. Whether they manage to keep doing this with the newly admitted cases remains unknown at the moment.

The real story behind all the graphs, anyway, is not in the numbers which we can see from the statistics of "reported cases" or "deaths", but in those that can only be read between the lines, and which are the actual causes for all the measures introduced by all the countries. For the countries at the start, that is:

- how far they are from the health system being overwhelmed

and for the countries where it progressed:

- how many people are without necessary medical care because the health system cannot cope, and how many people die because of that.

Both of those are something that the countries would rather not report directly. That's why, from some point on, the only reliable conclusions will come from the actual death statistics (including the cases not claimed to be Covid-19 cases), compared with the statistics from "normal times." Any surges are then surely caused by Covid-19, either directly or indirectly (through the disruption of all the country's systems), even if they aren't reported as such.

Back to the data we have now, I also find the ft.com graphics the best on the web at the moment.

https://www.ft.com/coronavirus-latest

The log-log graph from this title is only useful post-factum and I don't agree with its current advantages claimed in the video at the moment. It is obscuring the current severe issues in most of the countries:

Contrary to the claims of the video, at the moment, most of the countries indeed should look at the charts of log of whatever on the y axis against the linear time on the x axis. With the doubling time of around 3 days, that means that whichever capacities you manage to provide, like hospital beds, or more health workers, whenever you double them, that "advantage" disappears in just three more days.

Linear time on the x axis is the only sensible choice. The charts should allow us to have some idea of what the future brings.
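
The doubling-time arithmetic above can be sketched in a few lines of Python. This assumes demand keeps growing exponentially with a fixed doubling time (3 days, as stated above); the function name days_bought is hypothetical.

```python
import math

def days_bought(capacity_factor, doubling_time_days=3.0):
    """Days until exponentially growing demand catches up with capacity
    that was multiplied by capacity_factor."""
    return doubling_time_days * math.log2(capacity_factor)

print(days_bought(2))    # doubling capacity
print(days_bought(10))   # a tenfold increase
```

Doubling capacity buys exactly one doubling time, and even a tenfold increase buys only about ten days under this assumption.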


> Some arguments from there like to explain their results as "superior" hospitals and the "lack of hygiene" as the cause of the results of Italy, Spain, France etc. and I just don't buy it.

That sounds pretty wrong, and it's not what I've been hearing here in Germany. The "official" story is that testing ramped up earlier relative to when the outbreak started, and that through luck so far the virus hasn't infected many retirement homes and old people in general. [0]

If you break down cases / deaths in Germany by state and by district, you find that the districts that had big outbreaks the earliest also have the highest death rate now.

Heinsberg, the original hotspot, is now at something around 2.5% [1]. So I think it really is consistent with the story that testing started relatively earlier so fewer cases were missed, at least initially (there are reports now that non-symptomatic contact persons of confirmed cases are not getting tested anymore due to test shortages). This would imply that the death rate would converge to the world-wide average over time, which seems to be happening.

[0] https://www.deutschlandfunk.de/covid-19-warum-die-todesrate-...

[1] https://interaktiv.tagesspiegel.de/lab/karte-sars-cov-2-in-d...


> That sounds pretty wrong, and it's not what I've been hearing here in Germany.

Depends on who passes you the news. My quotes are from what "Dr. Stefan Hockertz", an "immunologist", directly claimed, and there are some other German-speaking claimed experts who wrote or made similar statements, including "it's not worse than flu" even as there were so many dead in Italy and Spain -- then their argument for why it's not happening in Germany is the said "superiority", and the reason for it happening in Italy their "lack of hygiene". Some of these were then repeated in some articles written in French, which I happened to read. The French were actually interested in finding out what Germany is doing "better." What can be seen, however, is German "doctors" criticizing Italians for "reporting those who died with coronavirus as dying from coronavirus", totally ignoring the scale and speed at which that happened in Italy, Spain and France.

The said distinction was apparently often used in Germany (or not? maybe you can investigate more? I'd be very interested in what you find!) but I do believe it's misleading, I think the sudden order of magnitude surges can have only a single cause, or some equivalent. If they claim it's not this coronavirus, and the surge factually exists, then there must be some additional new illness that otherwise wasn't recognized.


I think there are some people everywhere who like to find confirmation for their nationalist prejudice in this crisis. It's wrong, but it's also just way too early to draw any conclusions. You shouldn't be proud of your achievement when you don't know what's yet to come.

For what it's worth, I just searched for "why is the mortality rate lower in germany" in french on Google [0] and I've briefly looked at most of the hits on the first page, many of them cite German experts noting exactly the points that I mentioned.

[0] https://www.google.com/search?q=pourquoi+le+taux+de+mortalit...

Edit: Hockertz specifically seems to be something of a Coronavirus "denialist" [1], I don't think he's expressing a view that many share.

[1] https://www.br.de/nachrichten/wissen/faktenfuchs-aussagen-de...


> I don't think he's expressing a view that many share.

"Many" is hard to estimate, but my impression is that there was, at least at some points in time, obvious political motivation to downplay the seriousness of the epidemic. Apparently, at the moment when even the Netherlands, which is even more "pro market" and had a publicly stated "herd immunity" goal (by its prime minister), had already closed its restaurants, there were still open places across the border in Germany, and at least some parts of Germany were still reluctant to admit the seriousness of the issue. By the way, I still can't find a useful timeline of the measures introduced in Germany in the Wikipedia article:

https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_G...

I am aware that many decisions are up to the regional institutions there, and that's why generalizing and equalizing any statement to whole Germany is by definition wrong. But we can recognize, at least, different "interests" influencing what is happening and how it is covered in the media.

In that sense, as the impact of the epidemic on Germany potentially increases, I expect the climate to become increasingly similar to the one in the countries which are already more seriously affected.

However, blaming southern people first for being "inferior" to Germans and having "less hygiene" falls on fertile ground there. One other relatively recent reaction was the "Spanish cucumbers are guilty" affair, where the cucumbers from Spain were destroyed but the cause eventually turned out to originate entirely in Germany:

https://en.wikipedia.org/wiki/2011_Germany_E._coli_O104:H4_o...

"Spain consequently expressed anger about having its produce linked with the deadly E. coli outbreak, which cost Spanish exporters US$200 million per week"

Eventually it was established that the origin was "an organic farm[1] in Bienenbüttel, Lower Saxony, Germany."


> However blaming first southern people for being "inferior" to Germans and having "less hygiene" fails to the fertile ground there

I understand that you find that offensive, I do as well. But please don't generalize from statements made by fringe conspiracy theorists. That's almost as divisive as the original statement.


> please don't generalize

I don't, I specifically point exactly against the generalization:

> I am aware that many decisions are up to the regional institutions there, and that's why generalizing and equalizing any statement to whole Germany is by definition wrong.

and additionally give an example of one older affair where exactly the same prejudices that I mentioned resulted in measurable consequences.

> "Spain consequently expressed anger about having its produce linked with the deadly E. coli outbreak, which cost Spanish exporters US$200 million per week"

Is there anything that you dispute in what I've written?

If you want me to additionally support my claim that there was downplaying of the seriousness of the coronavirus epidemic, I also have:

"Of course, people will still die, but I lean out of the window and say: It could well be that in 2020 we won't have more deaths than in any other year."

https://www.faz.net/aktuell/gesellschaft/gesundheit/coronavi...

"Virologe Hendrik Streeck : „Wir haben neue Symptome entdeckt“ 16.03.2020"

A virologist, less than 2 weeks ago. Two doctors openly downplaying is already a symptom to me. I can imagine that slipping to the educated politicians, but the doctors...

Regarding which deaths are reported as Covid-19 in Germany (my initial observation), please see other comments in this thread, e.g.:

https://news.ycombinator.com/item?id=22720011

or

https://news.ycombinator.com/item?id=22719448

linking to the Guardian explaining:

"Unlike in Italy, there is currently no widespread postmortem testing for the novel coronavirus in Germany. The RKI says those who were not tested for Covid-19 in their lifetime but are suspected to have been infected with the virus “can” be tested after death, but in Germany’s decentralised health system this is not yet a routine practice."

As I also already mentioned, Germany is big, with a lot of local policies which aren't unified across the federation, so let me stress it again: all that should be taken with a grain of salt.

EDIT: it also appears to me that even the "debunking" of that first doctor guy didn't even try to address that he's misusing prejudices against "the southern people" etc. I haven't tried to analyze the "debunking" too carefully (not enough time) but my general impression is "oh, it's too early to tell if he is right."


What I'm saying is that this particular explanation for the apparent difference in fatality ratios of Coronavirus infections is a fringe one, and it's not right to point to that as an example of sentiment in Germany in general.

I'm not disputing any of the other things that you wrote. Distancing started in Germany on March 12, at that point it was just a legally non-binding recommendation, it became legally binding on March 16. The timing is similar to what happened in many other European countries with the exception of Italy.

Based on that, it's fair to say that the seriousness was downplayed initially, but that is pretty much universal. We knew this was going to be a global problem by mid February at the very latest.


Germany tested more and tested earlier. This allowed Germany to look into the future, nothing more.

If you look at the increase in deaths it’s pretty obvious that Germany is now catching up fast.

Testing alone obviously doesn’t do anything against the epidemic so it still remains to be seen whether everything that happened besides testing helped and whether the time won through being able to look into the future was used well.


I like this chart because it shows when various countries jumped off the trend, presumably heading towards recovery. But I agree, to be useful as a comparison it should be normalized by total population per country in addition to the testing/reporting part that you mentioned.

Additionally, it'll also show a second wave, if that starts happening anywhere.


Switch the view so it shows reported deaths (drop-down in top right). Those numbers should be more reliably comparable.


Even within the US, Washington (my state) has way lower numbers than New York even though it was hit first. The first reported case and death weren’t far from where I live.

So either Washington has flattened the curve or we’re doing a lot fewer tests than New York. I know a couple of friends who have covid-19 symptoms but haven’t been tested since there aren’t enough kits and they are quarantining themselves at home.

So my guess is Washington’s case numbers are at least 2x higher than what’s reported.

Overall, at this rate the US will hit a million reported cases in a couple of weeks. It seems we are the country doing a great job at testing and reporting, but the virus is spreading like wildfire in metros.


Well, consider the population difference too — NY State has 20M residents to Washington’s 7.5M, which is less than the population of NYC alone. Be wary of comparing raw counts when the underlying population sizes are so different!


I've been working on this which is confirmed cases per state population https://us-covid19-per-capita.net

It might make more sense if it tracked deaths per state population instead of confirmed cases b/c of the different testing rates.


One plot I was hoping for by now is a back-in-time plot: something that assumes each person who tests positive today really caught it 2 weeks ago and has been spreading covid-19 since, in an effort to track events and spread based on today's data.

The idea is that even if we started social distancing about 7 days ago, we won't see any benefits from that for another 7 days (since the average symptom time is about 2 weeks). And so any spikes you see in these graphs are all of the people that got infected 2 weeks ago, and aren't really sick until right now.


Wouldn't it make more sense to compare new cases vs active cases? At the end of this plot you can see China's new cases increasing again, but the number of total cases is so large that the x axis doesn't move.


Hard to use active cases since it's not monotonic. If you switch to linear the resurgence becomes more visible.


If only it was true.

All of these articles need to say ‘tested’ and ‘published’ cases. If you aren’t testing randomly, or if you aren’t publishing (China), then the data is really showing the rate of testing of sick people.


When comparing countries, only relative numbers (cases per capita) should be used. Everything else is misleading.


lg(cases/pop) = lg(cases) - lg(pop) = lg(cases) + C


Excellent math. For those who aren't so comfortable with the mathematical notation, the point of this comment was that the log of the cases per capita is the same as the log of just the cases, plus a constant offset. This constant offset is present in the linked chart, but is visually compressed - you don't really notice it that much. So if you look at the chart and see how the lines do/don't line up on top of one another, that's partly due to the constant offset.
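
A quick numeric check of the identity, as a hedged Python sketch. The case counts and the population figure are hypothetical, and log_slope is just an illustrative helper name.

```python
import math

def log_slope(series):
    """Step-to-step differences of log10 values (the slope seen on a log axis)."""
    return [math.log10(b) - math.log10(a) for a, b in zip(series, series[1:])]

cases = [100, 200, 400, 800]   # hypothetical cumulative counts
population = 50_000_000        # hypothetical population
per_capita = [c / population for c in cases]

# The slopes are identical; only a constant offset of log10(population) differs.
offset = math.log10(cases[0]) - math.log10(per_capita[0])
print(log_slope(cases))
print(log_slope(per_capita))
print(offset, math.log10(population))
```

So dividing by population moves each country's line by its own constant but leaves its shape untouched.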


You are right if you are just interested in the slope. But most people don't just look at the slope; they look at the numbers and want to know which country is doing better or worse. And that is when the constant (which is different for each country) matters.


lg(1000 * 1000 * 1000) - lg(1000) = 6


Complete opposite. Arguably better would be to look at local population density, but even that is just a proxy for how easily the virus should spread.

Total number is much better than per-capita, because the point is to see how quickly the disease spreads, which it does from a single point. No country is even close to majority infected so weighting by the size of the arbitrary borders enclosing the outbreak is only misleading.

What useful information would it convey, when moving the US 4x farther up the line than China? Or Italy? The point is to show how quickly the virus spreads through a population, which it does with great consistency. Per capita doesn't even tell you anything about how effective the response has been, it only tells you how big the country is. Per-capita would de-normalize this data!


Can zoom on mobile to tick more countries, but can't unzoom if my viewpoint is on the plot:(


There is also an explanatory video about that graph on the minutephysics YouTube channel: https://youtu.be/54XLXg4fYsc


Wait, but if you plot the total number of cases with respect to time on a log-linear (semi-log) scale, you're still going to get a straight line, right? So why is plotting against time a bad idea?


The plot is of a function against its derivative. This is also called a phase space plot, which is used quite often in dynamical systems theory.

https://en.wikipedia.org/wiki/Phase_space


That doesn't answer my question.


You can't easily compare countries if you plot against time. China's outbreak was in February, USA's is now. As pointed out in the video (https://invidio.us/watch?v=54XLXg4fYsc), viruses don't care if it's March 29th or February 2nd. They are a "function" of the number of infected (and other parameters, but not time).
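
Here is a small Python sketch of that point, under stated assumptions: two hypothetical outbreaks with identical dynamics, one starting 40 days after the other, and "new cases" taken over a trailing 7-day window. All names and numbers are made up for illustration.

```python
def outbreak(length, initial=10.0, growth=1.2):
    """Cumulative cases for a hypothetical outbreak with fixed daily growth."""
    return [initial * growth ** d for d in range(length)]

def phase_points(cumulative, window=7):
    """(total, new-in-window) points; note that time does not appear."""
    return [(cumulative[i], cumulative[i] - cumulative[i - window])
            for i in range(window, len(cumulative))]

# Same outbreak keyed by calendar day, with the second one shifted by 40 days.
early = {day: total for day, total in enumerate(outbreak(30))}
late = {day + 40: total for day, total in enumerate(outbreak(30))}

# Against time the two curves share no dates at all...
shared_days = set(early) & set(late)

# ...but in (total, new) space they trace exactly the same points.
same_path = phase_points(list(early.values())) == phase_points(list(late.values()))
print(shared_days, same_path)
```

On a time axis you would have to pick an alignment rule (for example "days since the 100th case") to compare the two; dropping time from the axes sidesteps that choice.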


It depends on how you are modeling the data. In a log-y (semi-log) plot, an exponential trend y=a * exp(b * x) appears linear, and is easiest to visualize/extrapolate. In a log-log plot, a power law y=a * x^b appears linear instead.
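
As a sanity check, here is a short Python sketch that tests both claims numerically. The constants a=3, b=0.5 and b=2.5 are arbitrary choices, and is_straight is a hypothetical helper that calls a curve straight when the slopes between consecutive points agree.

```python
import math

def slopes(xs, ys):
    """Slopes between consecutive points."""
    return [(y1 - y0) / (x1 - x0)
            for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:]))]

def is_straight(xs, ys, tol=1e-9):
    """True if all consecutive slopes agree, i.e. the points lie on one line."""
    s = slopes(xs, ys)
    return max(s) - min(s) < tol

x = [1, 2, 4, 8, 16]
exponential = [3.0 * math.exp(0.5 * v) for v in x]   # y = a * exp(b * x)
power_law = [3.0 * v ** 2.5 for v in x]              # y = a * x^b

log_x = [math.log(v) for v in x]
log_exp = [math.log(v) for v in exponential]
log_pow = [math.log(v) for v in power_law]

semi_log_exp = is_straight(x, log_exp)      # exponential on semi-log axes
log_log_exp = is_straight(log_x, log_exp)   # exponential on log-log axes
log_log_pow = is_straight(log_x, log_pow)   # power law on log-log axes
semi_log_pow = is_straight(x, log_pow)      # power law on semi-log axes
print(semi_log_exp, log_log_exp, log_log_pow, semi_log_pow)
```

Only the matching pairs come out straight: the exponential on semi-log axes and the power law on log-log axes.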


But since virus spread is exponential, why is plotting the number of cases vs time a bad idea? It would also give us a straight line.


It depends partly on your assumptions, and how you are modeling it. An exponential curve assumes that all members of a population have equal contact with all other members of the population. A power law assumes that there is some social graph that maps into N dimensions, and that infections occur along an N-1 dimensional surface in that space. Either is a valid way to model the infection, and result in different preferred ways to plot the data.


One thing that I don’t see discussed is that, imho, analysis should not rely on a single plot. Different plots describe different things.

Let’s use “all the plots”


What happened to Qatar? This one country's plot looks weird after clicking "select all".


It's crazy how correlated the outbreaks are, sort of regardless of policy differences outside of China.

So correlated that it also makes you wonder if Japan hasn't been sweeping a lot of cases under the rug trying to salvage the Olympics. It will be interesting from here to see what happens with their cases and if they magically "rejoin the line."


Is there a good reason, that I am not seeing, why the total population is not relevant?


As far as we know, no country is approaching saturation yet - e.g. if 10% of population has been infected, then that would slow the infection speed just by 10% which would not be noticeable in these scales. The total population will become obviously relevant if, say, 30% or 50% of the country is infected, but that's kind of the worst case scenario that we'd like to avoid.
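
A back-of-the-envelope version of that argument in Python. It assumes a simple homogeneous-mixing model in which infection speed is proportional to the fraction of people still susceptible, and that infected people do not get reinfected; both are simplifying assumptions, and the function names are hypothetical.

```python
import math

def growth_multiplier(infected_fraction):
    """Relative infection speed once this fraction is no longer susceptible."""
    return 1.0 - infected_fraction

def log_shift(infected_fraction):
    """Vertical shift, in decades, that this slowdown causes on a log10 axis."""
    return math.log10(growth_multiplier(infected_fraction))

for fraction in (0.01, 0.10, 0.50):
    print(fraction, growth_multiplier(fraction), round(log_shift(fraction), 3))
```

At 10% infected the shift is under a twentieth of a decade, which is hard to see on axes spanning several decades; at 50% it is about a third of a decade and becomes noticeable.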


Tell me why I'm wrong, usually the more a chart is 'perfect' the more the data is 'averaged' and so the less you can actually see what's going on

Also, this chart is about the number of cases, which is not a good way to compare evolution between countries, as all countries have different testing methods.


it's the ratio of new cases to total, which accounts for the comparability. most countries remain comparable with themselves, and those that do change method, since the growth is exponential, it tends to drown out any incomparabilities quickly enough.

anyway, the proof is in the pudding. the data isn't smushed to the point of making individual variability invisible
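
a minimal Python sketch of why the ratio helps, assuming (hypothetically) that a country detects a constant fraction of its true cases. The points, the 25% detection rate, and the helper name reported are all made up for illustration.

```python
import math

def reported(points, detection_rate):
    """Scale true (total, new) points by a constant detection rate."""
    return [(total * detection_rate, new * detection_rate) for total, new in points]

true_points = [(1000.0, 400.0), (2000.0, 800.0), (4000.0, 1600.0)]  # hypothetical
seen = reported(true_points, 0.25)

# The new/total ratio is untouched by constant under-reporting.
true_ratios = [new / total for total, new in true_points]
seen_ratios = [new / total for total, new in seen]

# On log-log axes each point moves by the same amount on both axes,
# i.e. it slides along the diagonal and stays on the same line.
shift_total = math.log10(seen[0][0]) - math.log10(true_points[0][0])
shift_new = math.log10(seen[0][1]) - math.log10(true_points[0][1])
print(true_ratios, seen_ratios, shift_total, shift_new)
```

if the detection rate changes over time this no longer holds exactly, which is the "change of method" case mentioned above.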


Why is the growth rate with time faster in every other country except China ?


China had longer for its quarantine to show up in the data. Remember, time is not an axis of this graph, just # of new infections vs total infections.


Software question - what library did you use for the chart ?


https://github.com/aatishb/covidtrends (linked at bottom of page)


Looks like Plotly for the chart and Vue.js for the slider.


Please add per-capita options!


On a plot of log(cases), per capita is just an offset. It shifts the line up or down on the plot, and doesn't make a difference. In a post upthread, mzs reminds us of the math:

log(cases/pop) = log(cases) - log(pop) = log(cases) + C

Log plots tend to visually compress relatively small effects such as these, so you don't really notice it, but it is much of why the lines aren't completely on top of one another.

In other words - a per-capita plot should look the same, but the lines would be harder to distinguish, because they'd be mostly right on top of one another.


You are assuming people are only interested in the slope. Then the constant doesn't matter. But it does matter when you want to know which country is an infection hot spot or not. A thousand cases in the US is something completely different from a thousand cases in Vatican city.


Did anyone else notice how worldometers.info, which maintains a table listing number of cases, recoveries, deaths, etc. for each country, ordered by number of cases, had China listed first even after the USA surpassed it in number of cases? No explanations were given. Then they removed China from the table altogether. What is going on behind the scenes at that website?


China appears to be there now for me, and I hadn't noticed it missing previous days.


Also Italy was gone at some point and re-appeared. I guess it’s just a glitch.


I see now China is at the very bottom, below Timor-Leste.


The "basic" list seems to be properly sorted: https://www.worldometers.info/coronavirus/countries-where-co...


This means absolutely nothing. Please stop making dramatic charts out of pure sensationalism.


Plotting log(dX) against log(X) shows how the outbreak behaves compared to pure exponential growth (a straight line with slope 1, like y = x), so any deviation shows how effective a country is at preventing exponential growth. It's super useful.


OK, I may be completely off here, but focusing on counting Covid-19 test positives seems rather useless. What we should be focusing on is context. Lots of work to be done here. Start by comparing deaths with a similar cause (flu) from the same timespan last year. Or even simpler: look at recovered cases. Also: the effects caused by actions by governments will be barely visible.



