Er, so all graphs of temperature should start at absolute zero? It's absurd to expect all graphs to start at zero -- the point of a bar graph is to show relative comparisons, and the whole point of the labels on the axes is to relate the graph to an absolute scale. You even admit this, "[a] graph is supposed to convey the relative magnitudes of the data points" -- Exactly, and so the graph should be scaled in a way that efficiently conveys the relative information. I apologize for the ad hominem, but this is just serious PEBKAC: you need to learn to read graphs. The mistaken conclusions you arrived at could have been avoided if you paid attention to the labels.

Yes, a poorly made graph can make a tiny effect look huge, e.g. if it was the case that 99% of people drank, and the graph was just plotting variation within the remaining 1%, that could be pretty misleading. This is yet another reason that the consumers of information should pay attention to the labels and understand what the real message is from the data. Don't expect the creators of the graph to present the data in the "fairest" way (whatever that is); instead it's our responsibility to think critically and make sure we understand what's really going on. The creator of the graph will naturally act to advance his own interests.

tl;dr: It's a mistake to consider the labels to be "fine print".




You are both missing the forest for the trees. Graph axes should scale in proportion to observed variance of the statistic being shown, automatically creating a "natural scale". The magnitude of the scale is of course proportional to the error in the measurement(s) you are showing, it has nothing to do with absolute numbers.

"The creator of the graph will naturally act to advance his own interests" -> "The biased creator of a graph with too much riding on his hypothesis will surreptitiously act to advance his own interests". People who do this are not worth listening to.


Fully agreed/conceded. I was addressing the graph consumer's point of view. There are definitely good practices the graph creator can follow to convey the information contained in the data as effectively and honestly as possible.


When I see a graph that starts at some random point on the Y axis, I assume the person is extremely biased and instantly ignore what they are saying. If I don't see error bars on data points, I assume they have zero idea how accurate their information is.

Unfortunately, I suspect this approach is less common among the undereducated, which matters because attempts to create misleading charts are very common.


You make a good point about temperature, but I stand by my critique because temperature is a quantity and these graphs are all plotting percentages, which should always be graphed from 0% to 100%. If your percentages are all within the range of 99% to 100% then you shouldn't be plotting them as percentages, and if you did then your graph would effectively convey the point that there really isn't a lot of variation among the data.

I'll reiterate my opinion that the purpose of a bar chart is not to give relative orderings (if you want that just list them in order) but to illustrate the magnitude of the differences between data points with respect to the natural range of the data. Percentages have a natural range of values: 0-100. Temperature does not have a natural fixed range, unless we're talking about the weather which is incidentally why I like Fahrenheit.


> with respect to the natural range of the data.

There's hardly ever such a thing as a "natural" range. Even if there is, as in the case of percentages, what's the point of wasting half of the graph on variation that has a mundane explanation? Like if you're plotting popularity of religions in the USA, do you really want to spend about 90% of the vertical space of your graph on Christianity? The point of a graph is to shed light on the part of the variation that doesn't have a mundane explanation, i.e., the interesting part.

What you should be complaining about are graphs that don't have labels, as used by some drug companies in advertisements[1]. Maybe that "graphs without labels are bad" meme has been misinterpreted here.

1. In the USA drug companies are permitted to advertise prescription medications.


> There's hardly ever such a thing as a "natural" range.

Another good point, but if bar A is twice as tall as bar B then variable A's value had better be twice that of B's. Anything else is precisely the kind of shady drug-company data manipulation that you're so concerned about. If vertical space is at an absolute premium (not the case here IMO) and you have to cut the bars then draw little cut marks on them to let the reader know that the bottoms aren't at zero.

(Of course I'm talking about cases when there is a familiar concept of zero, like percentages or money or population, not temperature. And if the value goes negative then the bar grows down, cf. any recent chart of US economic indicators.)

If you don't base your bars at zero then your bar chart exaggerates differences between data because the reader looks at two bars and compares their relative sizes the same way he looks at the levels in two glasses of beer to compare their volumes.
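A minimal Python sketch of the beer-glass effect, using made-up percentages: `apparent_ratio` (a hypothetical helper, not from any charting library) computes the ratio of the drawn bar heights when both bars start at a given baseline instead of at zero.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn heights of bars for a and b when both start at `baseline`."""
    return (a - baseline) / (b - baseline)

# Made-up values: two percentages, 52% and 48%.
true_ratio = apparent_ratio(52, 48)        # bars start at 0%
drawn_ratio = apparent_ratio(52, 48, 45)   # bars start at 45%
print(f"true ratio {true_ratio:.2f}, drawn ratio {drawn_ratio:.2f}")  # true ratio 1.08, drawn ratio 2.33
```

A gap of about 8% in the data is drawn as one bar more than twice as tall as the other once the bottom 45 points are cut off.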


You're contradicting yourself again. If there's no "natural" scale, how do we decide when something is "twice" as big as another? How do you decide which scale is "natural" anyway? And it's still pointless to spend most of the vertical space on redundant mundane information when it could be used to actually show the relationships between the groups being graphed.

Overall you're introducing ad hoc and nebulous special cases that are poorly justified, like the notion of a 'familiar zero'. The real solution is to learn to properly read graphs, and to ignore graphs that omit labels.

> like percentages or money or population

If you're plotting the net worth of the top 10 richest people on Earth, do you really want to use a miles-long graph or drown out all the actual information by just showing that the top 10 richest are within one pixel-worth-of-dollars of each other?


We decide X is twice as big as Y when X = 2Y. This is not up for debate.

And re. net worth: That's exactly how I would plot it if I wanted to make a point about income inequality. Carlos Slim is worth US$53 billion. The per capita income in Mexico is US$13,500 and in the US it's $46,000. (Yes, I know this is comparing yearly income to lifetime accumulated wealth, but ignore that for a moment.) If I give the average Mexican 1 pixel and the average American 3 or 4 pixels, then Mr. Slim gets almost 4 million pixels. At 96 dpi that's roughly 3400 feet. Your monitor would have to be (much) taller than the Burj Khalifa (2717 ft) to accurately show this bar chart. This is the picture I would draw if I wanted to make a point about income inequality: a person sitting next to the Burj Khalifa with a ridiculous laptop in his lap that towers over it.
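The arithmetic above can be checked with a short Python sketch. The dollar figures and the Burj Khalifa height are the ones quoted in the comment; the 96 dpi resolution and the scale of one pixel per average Mexican income are the comment's own assumptions.

```python
# Figures quoted in the comment above (US$).
SLIM_NET_WORTH = 53e9
MEXICO_PER_CAPITA = 13_500
US_PER_CAPITA = 46_000
BURJ_KHALIFA_FT = 2717
DPI = 96  # screen resolution assumed in the comment

# Scale: the average Mexican income is drawn as a 1-pixel bar.
dollars_per_pixel = MEXICO_PER_CAPITA
us_pixels = US_PER_CAPITA / dollars_per_pixel
slim_pixels = SLIM_NET_WORTH / dollars_per_pixel
slim_feet = slim_pixels / DPI / 12  # pixels -> inches -> feet

print(f"US bar: {us_pixels:.1f} px")
print(f"Slim bar: {slim_pixels:,.0f} px, about {slim_feet:,.0f} ft tall")
print(f"Taller than the Burj Khalifa: {slim_feet > BURJ_KHALIFA_FT}")
```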

If I just wanted to rank the top 10, I would just list them in order like they do in Forbes.


> We decide X is twice as big as Y when X = 2Y. This is not up for debate.

Yes, but doubled relative to what? e.g. what's twice as big as 50F? Is it 100F or 559.67F? The first one is relative to 0F and the second is relative to absolute zero (50F is 283.15K).
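A small Python sketch of the two answers, assuming the standard conversions (°R = °F + 459.67, K = °R × 5/9): doubling measured from 0 °F versus doubling measured from absolute zero. The helper names are illustrative only.

```python
def f_to_rankine(f: float) -> float:
    """Fahrenheit -> Rankine: same degree size, zero at absolute zero."""
    return f + 459.67

def rankine_to_f(r: float) -> float:
    return r - 459.67

def f_to_kelvin(f: float) -> float:
    return f_to_rankine(f) * 5 / 9

t = 50.0
naive_double = 2 * t                                  # "twice" measured from 0 °F
physical_double = rankine_to_f(2 * f_to_rankine(t))  # "twice" measured from absolute zero
print(naive_double, round(physical_double, 2), round(f_to_kelvin(t), 2))  # 100.0 559.67 283.15
```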

Your point about how to illustrate income inequality is fine, but that's not at all what I was talking about: I was talking about a graph that depicts the distribution of net worth amongst the top 10 richest people. No, a rank order doesn't fully convey that. A table of numbers is hard to understand in a holistic way.


There's a popular, if a little contested, notion of scales that I won't source right now that divides scales into ordinal, interval, ratio and absolute depending on things like whether there is such a thing as "twice as big". Fahrenheit is an interval scale, which is why we have Rankine (ratio).

Twice as hot would, in a physics sense, be 1019 °R (560 °F) because SI temperature uses the ratio scale. Fahrenheit is then just syntactic sugar for people who don't care about the physical temperature as much as they care about the range of their comfort zone. When measuring deviation from 0 °F, Fahrenheit also becomes a ratio scale, but not a very useful one (deviation from 50 °F would be better, or you could just use Celsius).

So the real question is, what do you mean "twice as big"? Twice as much warmer than "pretty cold", or twice as energetic?

As for the 10 richest people, I'm not sure they're that near each other, but assuming they are, that's going to be the important bit. If you don't want to focus on that, you have a few options. You could draw the cut marks. You could make it a zoom lens picture. Or you could establish a baseline, such as the highest income tax bracket or the median top 10 income, offset all incomes appropriately, and draw the bar graph from that. In the latter case, you should draw some bars as negative, and label the baseline appropriately (not 0). I don't think it would make for a good graph.

However, for a distribution, absolutely no offsets allowed.


No, you would use a logarithmic graph for income.


You always chart money starting from zero.

In the case of billionaires you are looking at an estimate of their net worth, so while you can chart Carlos Slim Helu & family at 53.5B and William Gates III at 53.0B, if you start the chart at 50 billion your error bars would take up 50% of the chart's vertical space.

And that's the major reason why trying to create a chart that exaggerates differences is such a bad idea. If your error bars are 5% and the gap is 1% then pretending there is a significant gap is stupid.
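A rough Python sketch of the Slim/Gates comparison, assuming each net-worth estimate carries a symmetric ±5% error (the 5% figure from the comment above). Interval overlap is only a crude heuristic, not a formal significance test; `interval` and `intervals_overlap` are hypothetical helpers.

```python
def interval(value: float, rel_error: float) -> tuple[float, float]:
    """(low, high) bounds for an estimate with a symmetric relative error."""
    return value * (1 - rel_error), value * (1 + rel_error)

def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

REL_ERROR = 0.05                  # assumed ±5% uncertainty on each estimate
slim = interval(53.5, REL_ERROR)  # US$ billions
gates = interval(53.0, REL_ERROR)
gap = (53.5 - 53.0) / 53.0        # relative gap between the point estimates

print(f"Slim {slim[0]:.1f}-{slim[1]:.1f}B, Gates {gates[0]:.1f}-{gates[1]:.1f}B")
print(f"gap {gap:.1%}, error bars overlap: {intervals_overlap(slim, gates)}")
```

With a gap of under 1% and error bars of ±5%, the two ranges overlap almost entirely, so a chart that makes the gap look large is showing noise.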


Ok, let's plot uptime on a scale of 0-100%. 99% uptime, 99.9% uptime, 99.9999% uptime, it's all just short of 100%.

Similarly, I'll go to my boss and plot success rates of a certain operation (can't say what due to confidentiality) which usually fails, but occasionally gives us a big win (success rate 0-3%). I'll be sure to plot on a range of 0-100% so that he knows we almost always fail. I certainly don't want to cheat and make him think my improvements of 1% -> 1.5% are a big deal.


In the case of uptime I'd use an inverse logarithmic plot of the amount of downtime.
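One common reading of an "inverse logarithmic" downtime axis is the count of nines, -log10(1 - uptime). Assuming that is what is meant, a Python sketch (`nines` is a hypothetical helper):

```python
import math

def nines(uptime_fraction: float) -> float:
    """Count of nines: -log10 of the downtime fraction (1 - uptime)."""
    return -math.log10(1.0 - uptime_fraction)

for uptime in (0.99, 0.999, 0.999999):
    print(f"{uptime:.4%} uptime -> {nines(uptime):.1f} nines")
```

On this axis 99%, 99.9% and 99.9999% land at 2, 3 and 6, so differences that vanish on a 0-100% uptime scale become clearly separated steps.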


You chart downtime as a percentage, because you don't actually care about uptime.


Ok, I'll chart downtime as a percentage on a scale of 0-100%. Same problem.


Don't play dumb. Nobody is saying you should plot ROI on a 0% to 100% scale when you have a 730% ROI.

Also, if you are charting rates that go from 11/1000 to 15/1000 then building a chart that goes from 0% to 2% is reasonable, but 1% to 1.6% is not.


> Another good point, but if bar A is twice as tall as bar B then variable A's value had better be twice that of B's.

That's ridiculous. What if the things that you're comparing have an exponential relationship? You should probably use a logarithmic scale; that's what it's designed for, after all.


So then A's logarithm is twice as big as B's logarithm. I don't see the problem here, as long as you put your zero at zero and label your axes.

A grid would help too, because log scales are rarely read intuitively and the information consumer needs the warning.
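A minimal Python sketch of the log-bar idea, with hypothetical values: on a log10 axis whose baseline sits at the value 1 (log10 1 = 0), a bar for 10,000 is exactly twice as tall as a bar for 100, and gridlines at each power of ten give the reader the warning. Values below 1 would produce negative bars.

```python
import math

# Hypothetical values; any positive numbers work.
a, b = 10_000, 100

# Bar heights on a log10 axis whose baseline ("zero") sits at the value 1.
log_a, log_b = math.log10(a), math.log10(b)
print(log_a, log_b, log_a / log_b)  # 4.0 2.0 2.0

# Gridlines at each power of ten from 1 up to the largest value.
grid = [10 ** k for k in range(int(log_a) + 1)]
print(grid)  # [1, 10, 100, 1000, 10000]
```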



