The problem I'm thinking about is that you can end up with as many people having...

dredmorbius · 2024-08-12T03:11:33 1723432293

So, first, that's a fair point.

It's also a challenge that public-health epidemiologists have been dealing with for a long time, for which there's been a tremendous recent explosion in both data and research methods. And there are ways to test for this, which I'm not fully aware of, though I've some basic familiarity.

I've already addressed some of this above, so with some repetition:

- People simply don't move that much, have been moving less over time within the U.S.,[1] and moreover don't move consistently. So wherever there's an initial strong cause, you'll have a fairly large cohort remaining on that site and showing impacts over time, particularly those who are most susceptible to such influences. Again: neonates, infants, children. Many cancer / disease clusters are found by such mechanisms.

- Where people do move, the end result is something of a "blurring of the signal". You'll get a blob at the origin, and maybe scattered points elsewhere. Those will tend to be at likely migration destinations: nearby neighbourhoods and towns, nearby large cities, regional/national cities of prominence, and (sometimes) locations with established immigrant communities (whether intranational or international). These are ... somewhat ... predictable patterns. The signal will tend to be strongest at or near the source.

- Deeper and extended data. Where topical data (e.g., diagnosis and current residence) don't seem to correlate with a known possible cause, or show a rare-but-below-threshold cluster, epidemiologists will dig for further information. Possibly with patient surveys, possibly other methods. What they're looking for in that case will be recent, or non-recent, movement patterns. Once a probable cluster source is identified, that can be used as a specific clue for further research. This is of more use to an epidemiologist who can conduct such further research than to a data scientist who's working off extant databases (partial, limited data capture, etc., etc.), but it is possible. And yes, this is one of the fundamental limitations of research that relies strictly on broadly-captured data.
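
A rough back-of-envelope sketch of the first two points, in Python. Everything here is an illustrative assumption rather than an epidemiological method: the function name `fraction_remaining`, the idea that each person moves independently with a fixed yearly probability, and the two-group split are all made up for the example. The only figure taken from the notes below is the roughly 8% overall annual move rate. It compares how much of an exposed cohort is still at the origin if moves are spread evenly versus concentrated in a small, mobile minority.

```python
from typing import Iterable, Tuple


def fraction_remaining(groups: Iterable[Tuple[float, float]], years: int) -> float:
    """Share of an original cohort still living at the origin after `years`.

    `groups` is a list of (population_share, annual_move_rate) pairs.
    Simplifying assumption: within a group, each person moves away
    independently with the same probability every year, and nobody returns.
    """
    groups = list(groups)
    total_share = sum(share for share, _ in groups)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    # Each group decays geometrically; weight by its share of the cohort.
    return sum(share * (1.0 - rate) ** years for share, rate in groups)


# Case 1: everyone moves at the same 8% annual rate.
uniform = fraction_remaining([(1.0, 0.08)], years=10)

# Case 2 (hypothetical split): same 8% average, but moves are concentrated
# in a mobile 20% of people (32%/yr) while the other 80% rarely move (2%/yr).
# 0.8 * 0.02 + 0.2 * 0.32 = 0.08
concentrated = fraction_remaining([(0.8, 0.02), (0.2, 0.32)], years=10)

print(f"uniform movers:      {uniform:.2f} of cohort still at origin")
print(f"concentrated movers: {concentrated:.2f} of cohort still at origin")
```

Under these assumptions, the same average move rate leaves a noticeably larger share of the cohort at the origin when moves are concentrated in a mobile minority, which is the sense in which "don't move consistently" helps keep the signal strongest near the source. Real analyses would use actual residential-history data rather than a toy decay model.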

There's a lot of medical research, even within healthcare and governmental organisations, which relies on fairly low-quality and easily-collected data. The reason is that those data exist and are cheap. The questions are how to maximise the utility of such sources, and how to know when to dig deeper.

Again: people moving really isn't the major problem you're making it out to be. Yes, it makes the job somewhat more challenging. But it's still generally tractable.

________________________________

Notes:

1. See "Despite the pandemic narrative, Americans are moving at historically low rates" (2021) <https://www.brookings.edu/articles/despite-the-pandemic-narr...>, "Americans are moving at historically low rates, in part because Millennials are staying put" (2017) <https://www.pewresearch.org/short-reads/2017/02/13/americans...>, and "Americans no longer want to move for work. Here's why." (2023) <https://www.cbsnews.com/news/moving-for-work-mobilty-record-...>. In the 1950s and 60s, as much as 20% of Americans moved every year. By 2021 that had fallen to 8%.

littlestymaar · 2024-08-14T08:57:02 1723625822

I'm not saying that people moving is a complete showstopper for epidemiologists. What I'm saying is that it makes the map visualization a poor fit for the task, because it will lead the casual reader with access to only the map (that is, not epidemiologists with access to the full data) into drawing wrong conclusions.