The problem I'm thinking about is that you can end up with as many people who grew up in, say, a pesticide-polluted rural environment later living in big cities as you have people remaining where the problem comes from. The map can then seem to show literally the opposite of the phenomenon occurring.
You can correct for these things, but it's going to require more data, and a more complex causal model describing the phenomenon if you don't want your controls to introduce bias themselves.
It's also a challenge that public-health epidemiologists have been dealing with for a long time, and one for which there's been a tremendous recent explosion in both data and research methods. There are ways to test for this. I don't know all of them, though I've some basic familiarity.
I've already addressed some of this above, so with some repetition:
- People simply don't move that much, have been moving less over time within the U.S.,[1] and moreover don't move consistently. So wherever there's an initial strong cause, you'll have a fairly large cohort remaining on that site and showing impacts over time, particularly those who are most susceptible to such influences. Again: neonates, infants, children. Many cancer / disease clusters are found by such mechanisms.
- Where people do move, the end result is something of a "blurring of the signal". You'll get a blob at the origin, and maybe scattered points elsewhere. Those will tend to be at likely points of migration: nearby neighbourhoods and towns, nearby large cities, regional/national cities of prominence, and (sometimes) locations with established immigrant communities (whether intranational or international). These are ... somewhat ... predictable patterns. The signal will tend to be strongest at or near the source.
- Deeper and extended data. Where topical data (e.g., diagnosis and current residence) don't seem to correlate with a known possible cause, or show a rare-but-below-threshold cluster, epidemiologists will dig for further information. Possibly with patient surveys, possibly other methods. What they're looking for in that case will be recent, or non-recent, movement patterns. Once a probable cluster source is identified, that can be used as a specific clue for further research. This is of more use to an epidemiologist who can conduct such further research than to a data scientist who's working off extant databases (partial, limited data capture, etc., etc.), but it is possible. And yes, this is one of the fundamental limitations of research that relies strictly on broadly captured data.
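To put rough numbers on the "blurring" described in the points above, here is a toy calculation in Python. Everything in it is an assumption made up for illustration: the region names, the populations, the 70% stay fraction, the even split of movers, and the two risk levels. None of it comes from real data, and `expected_burden` is a hypothetical helper, not an epidemiological method.

```python
def expected_burden(populations, exposed_cohort, stay_fraction, mover_shares,
                    baseline_risk, exposed_risk, source="source"):
    """Expected cases and cases per 100,000 current residents, by region.

    populations    -- current residents per region (hypothetical figures)
    exposed_cohort -- number of people exposed at the source, before any moves
    stay_fraction  -- share of the exposed cohort still living at the source
    mover_shares   -- how the movers are split among destination regions
    """
    movers = exposed_cohort * (1 - stay_fraction)
    result = {}
    for region, population in populations.items():
        if region == source:
            exposed = exposed_cohort * stay_fraction
        else:
            exposed = movers * mover_shares.get(region, 0.0)
        unexposed = population - exposed
        cases = exposed * exposed_risk + unexposed * baseline_risk
        result[region] = {
            "cases": cases,
            "rate_per_100k": 100_000 * cases / population,
        }
    return result


# Made-up scenario: a small rural source, a nearby town, one big city.
result = expected_burden(
    populations={"source": 8_000, "nearby_town": 20_000, "big_city": 1_000_000},
    exposed_cohort=10_000,
    stay_fraction=0.7,
    mover_shares={"nearby_town": 0.5, "big_city": 0.5},
    baseline_risk=0.001,   # assumed risk without exposure
    exposed_risk=0.01,     # assumed risk with exposure
)

for region, stats in result.items():
    print(f"{region:12s} cases={stats['cases']:8.1f} "
          f"rate per 100k={stats['rate_per_100k']:7.1f}")
```

Under these assumptions the rate is far higher at the source (roughly 888 per 100,000) than in the big city (roughly 101 per 100,000), even though the city has many more cases in absolute terms (about 1,014 against 71). So whether the signal looks strongest at the source depends a lot on whether you are looking at rates or raw counts.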
There's a lot of medical research, even within healthcare and governmental organisations, which relies on fairly low-quality and easily-collected data. The reason is that those data exist and are cheap. The questions are how to maximise the utility of such sources, and how to know when to dig deeper.
Again: people moving really isn't the major problem you're making it out to be. Yes, it makes the job somewhat more challenging. But it's still generally tractable.
I'm not saying that people moving is a complete showstopper for epidemiologists. What I'm saying is that it makes the map visualization a poor fit for the task, because it will lead the casual reader with access to only the map (that is, not epidemiologists with access to the full data) to draw wrong conclusions.
You can correct for these things, but it's going to require more data, and a more complex causal model describing the phenomenon if you don't want your controls to introduce bias themselves.