Zooming out from this specific town and article, it's kind of interesting to look at regional differences in cancer incidence. Like, I get that people live very different lives in different places, and that states have different resources to bring to public health problems, but the areas of red in the midwest, northeast and gulf coast are quite striking.
Beware of the sample size issue. The Midwest has lots of low-population counties, for which a single cancer case plus or minus makes a great difference in the incidence rate.
See also: Bayesian Data Analysis, Gelman et al., 2nd edition, section 2.8.
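To put rough numbers on the sample size issue, here is a minimal sketch of how noisy an observed rate is at different county sizes. The true rate of 450 cases per 100k per year and the three populations are assumptions picked for illustration, not values read off the map, and `rate_sd_per_100k` is a hypothetical helper name.

```python
from math import sqrt

def rate_sd_per_100k(population, true_rate_per_100k):
    """Standard deviation of a county's observed yearly rate (per 100k),
    treating each resident as an independent Bernoulli trial."""
    p = true_rate_per_100k / 100_000
    return sqrt(p * (1 - p) / population) * 100_000

# Assumed values, for illustration only.
TRUE_RATE = 450  # cases per 100k per year
for pop in (2_000, 50_000, 1_000_000):
    print(pop, round(rate_sd_per_100k(pop, TRUE_RATE), 1))
```

Under these assumptions a 2,000-person county has a standard deviation of roughly a third of the rate itself, while a million-person county barely moves. That is the sense in which a single case more or less can recolor a small county.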
But we should expect the plus or minus to be fairly evenly distributed if there weren't an underlying reason besides randomness and small sample sizes. That would lead to regions with lots of blue/red/yellow speckling. Instead we see fairly homogeneous regions of red and blue with transition zones between them. This definitely doesn't look like a noisy undersampled data set.
I've done a lot of cross-country exploring of the USA by motorcycle in the past, and it always shocked me how obviously harmfully polluted the air can be in residential areas neighboring industrial activity.
I don't understand how it continues unchecked. Sometimes I'd spend as little as an hour breathing the stuff and my nose and throat would be sore for days afterwards.
My assumption is the residents are too poor so they lack representation and likely are employed by the same companies behind the pollution. The government should be more actively monitoring these industrial activities and protecting its citizens from their negligence.
> The government should be more actively monitoring these industrial activities and protecting its citizens from their negligence.
Local and state politicians don’t want to be seen as killing industrial jobs. The Federal government really needs to take action here, but this is obviously a non-starter in the current administration.
Pittsburgh’s air (while still f’ed up) is infinitely better than it was 30 or 40 years ago, largely because of EPA regulations.
I would be curious to see this alongside / with some relative derived metric for life expectancy. To my knowledge cancer can definitely be influenced by environmental/lifestyle factors but its incidence would also be increased by the relative age of the people in an area. For example if everyone dies by 50/60 of alcoholism, addiction, diabetes, or work-related injuries the cancer incidence would be lower than expected even though the people there are definitely unhealthier than the general populace
Actually, I see that the stat is age adjusted. So not sure what effect that would have on the comparison that I described
The vast differences between the SW corner of Virginia and the surrounding states suggest there's some sort of difference in reporting or screening going on.
I feel like the coloring might make it appear more striking than it is? The difference in cancer rate thresholds between the blue and red colors is just over 20%. We don’t know the actual rates, of course, just noting the coloring range is surprisingly narrow.
The mortality data are possibly more telling. High incidence may be a matter of screening and early detection. Dead bodies have stories to tell and are harder to hide.
The outline of the Ohio-Mississippi river valleys is particularly clear.
The mortality map shows equivalent death rates across state borders: Indiana, Kentucky, Illinois, Arkansas, Mississippi and Louisiana. Mortality is markedly higher in non-urban (and desperately poor) counties. By contrast, incidence rates (clinical detection preceding mortality) cluster more toward urban and wealthier counties.
As always, it's important to remember that even in the absence of environmental factors, the probability of a particular town of this size having cancer rates this high might be very low, but the probability of there existing such a town in the U.S. might actually be high. This is simply due to variance.
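The gap between "this particular town" and "some town somewhere" can be sketched with the complement rule. The per-town probability (1 in 10,000) and the count of comparable towns (20,000) are made-up numbers for illustration, and `prob_at_least_one` is a hypothetical helper; it also assumes towns are independent.

```python
def prob_at_least_one(p_single, n_units):
    """P(at least one of n independent units shows the event)."""
    return 1 - (1 - p_single) ** n_units

# Assumed: a cluster this extreme has a 1-in-10,000 chance in any one town,
# and there are 20,000 comparable towns.
print(prob_at_least_one(1e-4, 1))       # one town picked in advance
print(prob_at_least_one(1e-4, 20_000))  # some town, somewhere
```

With these assumed inputs the second probability is well above one half even though the first is tiny, which is the whole argument in two lines.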
Four one-in-a-million events in 15k inhabitants. Yes, in any ranked list of stochastic outcomes there must be a highest scorer and a lowest scorer, but in the case of Waycross, GA it's still off the charts. With these diseases, for any given town in the US you'd expect 0 or 1, hardly ever 2, if independently distributed.
Your conclusion is probably correct, but you also need to know how many classes of events you looked at. The particular form of cancer wasn't set in advance, it arose out of the data, so you need to multiply the number of towns by the number of types "rare cancer".
My programming is better than my math[1], so I brute forced it as a simulation. Using 2000 towns, each with a population of 10000, and testing for 200 different types of cancer with a prevalence of 1 in 100000, I got this distribution:
1: 36393
2: 1831
3: 63
4: 1
That is to say, in the single simulation I ran[2], with a higher prevalence of 1 in 100000 vs 1 in 1000000, but a lower population of 10000 rather than 15000, I found that 1 in 2000 towns had 4 incidences of one of the 200 types of cancer I considered. This proves nothing directly, and environment probably played a role in the town in the story, but it does imply to me that it's not absurd that a small town might have 4 incidences of a rare cancer merely by chance.
[1] What's the right way to calculate this mathematically?
[2] My slow Perl code took about 10 minutes to run while I was writing this, and I committed to showing the numbers generated regardless of what they came out as.
[3] Oops, reading more closely I now see that the linked article is distinguishing between "rare" and "very rare".
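For anyone who wants to rerun this, here is a rough Python sketch of the same experiment. It is not the Perl code described above: instead of simulating every person, it assumes each (town, cancer type) case count is a binomial draw and samples it from the cumulative distribution, which keeps the run to a few hundred thousand draws. The function names, the `max_k` cutoff and the seed are illustrative choices.

```python
import random
from collections import Counter
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k cases among n people, each affected with probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def simulate(n_towns=2000, population=10_000, n_types=200,
             prevalence=1e-5, max_k=10, seed=0):
    """Tally how many (town, cancer type) pairs end up with 1, 2, 3, ... cases."""
    rng = random.Random(seed)
    # Cumulative probabilities for 0..max_k cases; larger counts are negligible here.
    cdf, total = [], 0.0
    for k in range(max_k + 1):
        total += binomial_pmf(k, population, prevalence)
        cdf.append(total)
    tally = Counter()
    for _ in range(n_towns * n_types):
        u = rng.random()
        k = 0
        while k < max_k and u > cdf[k]:
            k += 1
        if k:
            tally[k] += 1
    return tally

if __name__ == "__main__":
    for cases, pairs in sorted(simulate().items()):
        print(f"{cases}: {pairs}")
```

The exact tallies depend on the seed, but the counts for 1 and 2 cases should land near the figures quoted above, and pairs with 4 cases should show up only occasionally.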
Since you put in all the work! It's the binomial distribution [1]. What you've done is to simulate the distribution of a sum of binomial distributions.
Your point of there being more than two types of rare cancer and that any selection of outliers over all the types would give rise to the same sentiment is totally correct.
I did the quick mental calculation with one in a million. As you see in your simulation the difference between 1e-5 and 1e-6 makes all the difference here.
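The binomial calculation can be written out directly. This is a sketch under the same independence assumptions as the simulation; `binom_tail` is a hypothetical helper that sums the upper tail term by term (truncated after 20 terms, which is more than enough at these rates).

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k cases among n people)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_tail(k_min, n, p, terms=20):
    """P(at least k_min cases), summing the first `terms` tail terms."""
    return sum(binom_pmf(k, n, p) for k in range(k_min, k_min + terms))

# Chance of 4 or more cases of one specific cancer in one town of 15,000.
p4_very_rare = binom_tail(4, 15_000, 1e-6)  # prevalence 1 in a million
p4_rare = binom_tail(4, 15_000, 1e-5)       # prevalence 1 in 100,000

# Expected number of (town, type) pairs with 4+ cases in the simulated setup:
# 2000 towns x 200 cancer types, population 10,000, prevalence 1e-5.
expected_pairs = 2000 * 200 * binom_tail(4, 10_000, 1e-5)

print(p4_very_rare, p4_rare, expected_pairs)
```

The factor of ten in prevalence changes the tail probability by several thousand times, and the expected number of pairs with four or more cases in the simulated setup comes out between one and two, consistent with the single "4" in the simulation output.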
We know this must be a statistical artefact because of the way cancer arises in humans when they are exposed to carcinogens. Take tobacco smoke for example: it does not cause just one type of cancer to rise in frequency (say lung cancer), but a whole range of cancers [0]. You just can’t expose people to a general carcinogen like “pollution” and end up with one kind of rare cancer. Cancer just doesn’t work this way.
What I think you mean to say is that if these cancers were caused by environmental contamination, we would expect to see elevated levels of other cancers as well.
First, I'm not sure this is true for all forms of cancer and all types of carcinogens. While it's true that tobacco smoke contains a wide variety of carcinogens and causes a wide variety of cancers, it doesn't seem impossible that some particular chemical might cause an increase in rhabdomyosarcomas without producing a general increase. For example, I don't think that asbestos exposure leads to an across the board increase in all cancers, just an increase in lung cancer and mesothelioma. What makes you so sure that there cannot exist a more focused carcinogen that causes just rhabdomyosarcoma?
Second, you seem to be assuming that just because the article focuses on one particular type of cancer, the town does not have elevated levels of other cancers. Do you have outside evidence that makes you so sure of this that you conclude that the increased rate of one type of cancer "must be a statistical artifact" rather than evidence of a greater underlying problem that merits further research?
Yes it is possible for some specific rare cancers to have some single specific cause (mesothelioma and asbestos are probably the best known example), but this is not what is seen with nearly all other carcinogens.
The reason is twofold:
1) Almost all carcinogens are not localised to a precise body location (asbestos is a rare example). Once inside a person the carcinogen is distributed everywhere. This is why exposure to tobacco smoke for example gives rise to cancers in distant locations to the lung.
2) Carcinogenic pollutants are almost never just a single chemical, but a wide range of compounds.
Cancer is not like infectious disease, where one causative agent gives rise to a distinct set of symptoms. The way cancer arises (loss of cellular replication control) rarely proceeds down the same path: expose one person to a specific carcinogen and they might get liver cancer, while a different person gets lung cancer.
Thanks. So you think that apart from very special cases like asbestos, any chemical cause would cause other cancers as well, and there is no reasonable chance that a particular chemical could cause only rhabdomyosarcoma. I'll defer to your expertise here.
I was partly confused by one of the links from the article[1], which says "These cancers are derived from similar cellular lineage; therefore, these sarcomas are closely related in cellular composition. It is biologically plausible that a common exposure exists." Is there a way in which this might make sense? Are they suggesting that it might be biologically contagious rather than chemically induced?
And the second question? When you conclude that the prevalence of this particular cancer must be a statistical artifact rather than a reproducible effect, do you have reason to believe that there are not elevated levels of other cancers in the area? Or is this not actually necessary to show?
I don't think it is biologically plausible at all that a common chemical exposure exists. It is possible, just unlikely. It also doesn't match up with what we know about pollutant chemicals being a wide range of compounds, nor the broad range of cancers seen with other known pollutant chemicals.
I would be far more likely to believe that a large number of unrelated cancers in an area were caused by pollutant carcinogens, than a small number of rare cancers.
I don't know if there is an elevated rate of cancer in this area, but I assume not since it would be the first thing that anyone looking at this rare cancer cluster would look for.
"Variance" does not give people cancer. But pollution does. The nuance is lost in your comment, and what you might more accurately say is "statistics/variance predicts that there is a high probability that somewhere in the US there is a town that has enough pollution (or other cause) to create cancer rates this high. It happens to be Waycross, GA"
No, that's not what "statistics/variance predicts". What statistics predicts is that even if there was no pollution whatsoever in the entire country we would still see some towns with increased cancer rates.
That said, I didn't do the math and it's entirely possible that the rates in Waycross are way above what one might expect by chance.
Your original comment was still valuable though because it makes people consider another, and important angle. Someone will probably do the math and show that even after considering statistical variance, this is still an outlier, but I feel like I have a more complete view with your comment.
Genetics also gives people a chance. There is some probability that somewhere in the US there is a town that has enough people with cancer-prone genetics to create cancer rates this high.
The probability of you randomly selecting a sample of people who happen to have a high rate of cancer is very low.
The probability of you EVENTUALLY selecting a sample of people who have a high rate of cancer after selecting many samples is actually very high.
The probability of you eventually selecting a sample of people who happen to have a high rate of cancer AND happen to live in the same town AND happen to have cancers that are classified as rare is very very low.
There is no doubt this is an anomaly by every logical measure and not a statistical quirk.
This is an example of where your gut notion of a cancer cluster actually trumps the statistical logic. While logic is always correct and the logic you were using was technically wrong, this is a good example of how your intuitive gut feeling of what a cluster is, trumps your logical mind. Don't ignore your gut feelings, don't ignore your intuitions. The logical part of the human mind is just as flawed as your intuitions and it can build an incorrect scaffold of compositions that support an incorrect conclusion.
When your logic collides with your intuitions, do not automatically assume that logic trumps all. Evaluate both.
Your reliance on "logic" and not statistical training (it appears you don't have much) has led you astray. Read the other replies to my comment to understand why you haven't understood.
I read it and it appears to me you are still wrong. You would be better served explaining to me explicitly why you think you are not wrong than expecting me to read your mind.
There are events that contain so much information that the probability of their occurring due to entropy is basically zero. The event fitting those 3 conditions I mentioned is one example.
Let me add one more conditional to the cancer-cluster event: Happening within the same time-frame.
The answer to the clickbait headline is "pollution," and the town is Waycross, GA. Here's your "money" paragraph:
> In late 2015, the Georgia Department of Public Health said it could find no link among the children’s cancer cases. Then it backtracked and said more investigation was needed. In December, the federal government stepped in. The Agency for Toxic Substances and Disease Registry, an arm of the Centers for Disease Control and Prevention, said it would work alongside state officials in evaluating contamination at the railroad yard as well as an Atlanta Gas Light property that once held a power plant, which was torn down after closing 60 years ago.
Added nuance: A government-sponsored investigation using a narrow data set generated by a company responsible for the cleanup of a single site, of which Waycross has multiple under investigation, decided there was no conclusive evidence of Waycross having abnormally high levels of cancer, excluding diagnoses of cancer that occurred outside of the town even if the diagnoses were of town residents.
Also: A town newspaper did not cover this, and the head editor as well as the mayor believe this is a case of "Facebook moms" and overly concerned parents using personal griefs to wreck Waycross's tenuous economic bounce back.
>excluding diagnoses of cancer that occurred outside of the town even if the diagnoses were of town residents.
I would be floored if they did this. I'm a statistician who reports on cancer rates for another state, and the universal standard for assigning locations to cases is the patient's residential address. Looking at Georgia's 2016 report[1], they mention using the SEER*Stat software provided by SEER[2]. I have the same software, and I've never seen it offer any information about the reporting facility beyond a general category ("hospital", "lab", etc). They would've had to work outside the well-worn paths using a custom extract from the cancer registry. Not impossible, but also not reasonable.
It's true that epidemiologists, or whoever looked into the clusters, could've been a separate unit from the statisticians behind the annual reports. And maybe they're not used to cancer epidemiology and didn't ask about best practices. But I'd be surprised if they messed up this basic thing.
Also, I've heard of the study mentioned in the article where only one out of 428 cluster investigations proved true. But the way it was portrayed at a cancer registry conference, the low rate is a combination of small sample sizes and over-reporting of "clusters." Cancer's a rare and frightening event, which are two things people have trouble seeing rationally.
it's also clearly irrelevant from a medical standpoint where the diagnosis was made... unless there's local bribing of doctors going on!
so why insist on only counting local diagnoses?
EDIT: perhaps there are regulations that mandate doctors to diagnose similar percentages, to prevent overdiagnosis... and perhaps this is being weaponized against discovering localized exposure to carcinogens?
> "On the cancer cluster question, ATSDR cites data from the Georgia Comprehensive Cancer Registry to make the case that there isn’t one. GCCR data from the past decade shows childhood cancer incidence has been below the state average here. GCCR’s Waycross-area data, however, "doesn’t include people who are diagnosed at hospitals in other cities", as Lexi, Harris, Gage, and Raylee all were."
Charitably, I guess the researchers just don't have the resources to locate patients more accurately than by their diagnosing hospital's location. That adds a huge amount of noise on top of patients' actual location history, which would seem to make their survey capable of detecting only persistent hotspots that span the locations of many hospitals.
The point is it's an investigation not a conclusion.
The author heavily focuses on pollution in the area being a possible cause, and points out inadequacy in investigation performed on and arguments against pollution being the cause. But all her evidence is anecdotal.
The point isn't to assert that she is certain of the cause, it's to provide a spotlight on evidence which she has seen which has not been given adequate examination and the resulting suffering from what could possibly be preventable.
While that's a solid point, I think a critical part of why people look down on "clickbait" is the additional ingredient of "withholds key information to force readers to click ... and that key information makes the content substantially less interesting than the headline implied." The reason we call it clickbait is because it implies that it would clearly not be something you'd want to click, were the hook left unbaited.
Otherwise, it's not all that bad - just clever headline choice. (The point of headlines has always been to intrigue prospective readers, after all.)
Since this article is a substantial piece of journalism that remains interesting if you know the tl;dr, rather than an insubstantial chunk of #content, I don't think it really counts as clickbait.
When I was small we learned the inverted pyramid model of news, that the most important information should be at the top (the headline) and succeeding content should be ordered primarily in descending order of information significance.
Did this change at some point? Or is this just tabloid practices infecting journalism in the era of maximizing clicks? I find the old model infinitely better especially as people tend not to read past headlines in most cases.
The key information does reduce the expected level of interest. Controversy about cancer clusters and pollution is nothing new. But there are other possibilities, e.g. it might be caused by some genetic disorder that's found in a family who happens to live there, and the article might be about identifying and diagnosing that. It might be about some infectious disease that causes cancer like HPV. It might be about a cultural practice, e.g. eating carcinogenic foods like hijiki seaweed. Unless you actually live there, or know somebody who does, all these are more interesting to read about than the real "residents say it's pollution, industry says it isn't, everybody argues but nothing is really decided" story that's happened so many times before.
The article seems well-researched and informative. It made me think about the balance between economic growth's positive and negative outcomes. If it takes a modicum of editorial trickery to get more eyes on such a piece, I think we're fine.
> A non-clickbait title would be: "Pollution suspected of causing rare cancers in Waycross, GA"
That's arguably worse than clickbait. It actually discourages people from reading the article who might find it interesting.
It suffers from being both too specific and too general at the same time.
It's too specific because Waycross, GA, is a town of under 14k people and as far as I can see there is no reason to expect that many people outside of Georgia to have ever heard of it. Naming it in the headline suggests that the story is aimed at those people who do recognize it.
It's too general because of course it is pollution. The other things that cause cancer don't tend to cause cancer spikes in small areas.
Both the town and the cause are actually incidental to this story. The interesting aspects of the story are how the various parties reacted when some started noticing there might be a cancer cluster.
The headline mirrors the article's portrayal of the quest to find out the truth. If the article was just a few paragraphs long, I would have felt slighted. But as such, it read more like a mini-novel/documentary.
And of course, it's pure coincidence that the headline is perfectly tuned to cause fear to small-town Georgia residents and to their friends and family elsewhere, and the extra page views and ad revenue from people worried if it's their town are of no relevance whatsoever.
If there were a viable news business that allowed local journalists to investigate the issue, perhaps this wouldn't have happened, or have been uncovered earlier.
"The power plant helped light Waycross from 1916 to 1953. Along the way, it discharged coal tar—a thick, black byproduct of converting coal to fuel—into the canal system. Prolonged exposure to coal tar has been linked to various types of cancer...."
The first half of the article also calls out multiple toxic organic chemical contaminants from several operations including a railroad and a waste-processing plant ... severe enough to require Superfund remediation.
In that era ('Better things for better living through chemistry') most people were unaware of the dangers of industrial chemicals in the air, water and soil. They got little press, at least until 'Silent Spring'.
For the same reason "all the best schools are small" but aren't really [1]. It's the "law of small numbers", that is, the scale-dependency of variance: it's much easier to get a large variance if you sample in small sets.
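A quick way to see the school effect is to give every school the same true average and rank them anyway. This is a toy sketch: the school sizes, the mean of 100, the per-student standard deviation of 15 and the helper name `simulate_school_means` are all assumptions, and each school's observed mean is drawn directly from its sampling distribution rather than student by student.

```python
import random

def simulate_school_means(sizes, true_mean=100.0, student_sd=15.0, seed=0):
    """Observed mean score per school when every school has the same true mean.
    The mean of n students has standard deviation student_sd / sqrt(n)."""
    rng = random.Random(seed)
    return [(rng.gauss(true_mean, student_sd / n ** 0.5), n) for n in sizes]

# Assumed mix: 500 small schools (50 students) and 500 large ones (2000 students).
sizes = [50] * 500 + [2000] * 500
ranked = sorted(simulate_school_means(sizes), reverse=True)

top = [n for _, n in ranked[:20]]
bottom = [n for _, n in ranked[-20:]]
print("small schools in top 20:", top.count(50))
print("small schools in bottom 20:", bottom.count(50))
```

In this setup the small schools dominate both ends of the ranking, so "the best schools are small" and "the worst schools are small" are equally true and equally uninformative.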
This is a reminder that coal, operating under regulation, releases more radioactive material over a few decades than a nuclear power plant's worst failure mode.
Yep, the most recent power plant disaster, at Fukushima (2011), didn't have a single death due to radiation (the fatalities have been attributed to non-radiation causes such as building collapse, smoke from fires, etc.).
True. On the down side, it’s hard to see a failure mode for a coal plant that leaves such a massive area devastated for so long, or one that’s so costly to manage.
The coal/natural gas plant near me was burning what appears to be some nasty coal just yesterday (bright yellow smoke)... is the smoke color a good indicator of its toxicity? (Most of the time the smoke is white; not sure if that is because they were burning natural gas at the time.)
“Coal” plants burn carbon, making carbon dioxide. It’s colorless and pretty much harmless, excepting its impact as a GHG.
However, coal contains (depending on the source), mercury, sulfur, radioactive elements, tars, rare earth elements.
Some of these are volatile when burned. Some become ash.
A plant running anthracite is naturally very clean.
The plant has to have scrubbers to remove the volatile ones.
Especially the yellow smoke (likely sulfur)
However, a one time emission of sulfur is likely insignificant.
Finally you mentioned it could be natural gas. Natural gas, mostly methane, is also colorless except it releases water vapor. Given the right conditions, you might see plumes of white smoke coming out - that’s harmless water vapor.
Due to regulations, most of the time the smoke is relatively free of particulates. The massive amount of carbon released is another issue, but when it comes to health concerns the major one should be the coal ash. It is radioactive and packed with heavy metals and carcinogens.
The CDC ruled that there isn't a cancer cluster in the town.
> On the cancer cluster question, ATSDR cites data from the Georgia Comprehensive Cancer Registry to make the case that there isn’t one. GCCR data from the past decade shows childhood cancer incidence has been below the state average here. GCCR’s Waycross-area data, however, doesn’t include people who are diagnosed at hospitals in other cities, as Lexi, Harris, Gage, and Raylee all were.
I’m a Georgia native but from Atlanta. I’ve never been to Waycross. I lived in South Georgia for about a year a couple hours away.
For anyone who read the Vidalia Onion article on the front page it’s about 1.5 hours north of here. There’s a beautiful college in Valdosta an hour away. The region is nice because it transitions into more of a Florida type of landscape.
It probably goes the other way: the responsible company's legal team determined they can't avoid liability any more, so they get some favorable studies to mitigate the consequences/reduce the size of the potential class action.
When we lived there, a very large drainage canal ran along our property. Certainly needed it for summer rainstorms. Never occurred to me that it could be carrying dangerous toxins too.
https://statecancerprofiles.cancer.gov/map/map.withimage.php...