For all of you out there who, like me, think of yourselves as visual learners, graphs and diagrams can be very helpful when trying to understand data. They are concise representations of what are usually complicated statistical analyses, but you have to be careful when you try to figure out exactly what the visuals are presenting. At first glance, there is often a clear trend, relationship, or comparison but, if you look closer, the data could be showing something different. Graphs are easily misinterpreted and, I hate to say this, sometimes that is the reason they are used. My goal is to let you all know some of the basic things to look at when trying to figure out the figures. [Disclaimer: I have a bit of experience creating graphs and diagrams (I did win a research poster competition with them – toot toot! *my own horn*) and am learning to use SAS for statistical analysis (the very basics), but I have had a lot of guidance from actual statisticians and am by no means an expert!]
We’ll start by breaking down one of the commonly seen (and adapted) climate change graphs.
Axes: Look at the x- (horizontal) and y- (vertical) axes (plural for axis)
- What variables are being measured?
- What are the units? (inches, hours, ppm – parts per million)
- What is the scale for each axis? (0-100, 1800-2000)
- What are the increments? (5’s, 10’s, 100’s)
Text: The title, legend, and captions provide additional information about the data
- What does the title tell you about the graph? *Of course this version of the graph does not have a title. If it did, it would most likely read ‘millennial temperature reconstruction’ (from the original paper). Raise your hand if you know which famous graph we’re looking at!
- Is there a legend? What do the symbols/colors represent (what are the different groups)?
- Does the graph have a caption? What additional information does it provide?
Data: Look at the overall pattern of the plotted data
- What is the general shape? *cough* hockey stick
- What is the general trend? Is it positive (increasing) or negative (decreasing)?
- Is there anything that stands out? (a random point, a larger or smaller bar, more than one ‘hump’)
- Is the data too ‘pretty’ (does it fit the ‘ideal’ too well)? *There is often an ‘ideal distribution’ of data (ex: the bell curve), and if a graph fits it perfectly (no skewing, no data points that are a bit “out there”, etc.), it may mean that the graph was cleaned up a little too much and may not be the best representation of the data.
So, we were taking a closer look at Mann et al.’s notorious ‘hockey stick’ graph, which presents past temperature anomalies (how far temperatures depart from an average). On the x-axis is the year (1000-2000) in increments of 50 years (each tick mark = 50 years, and every 200 years is labeled: 1200, 1400, etc.). Temperature variations, in degrees Celsius, from the 1961-1990 average (represented as 0.0) are on the y-axis in increments of 0.1 degrees (with labels at every 0.5°). The caption tells you where the data came from (thermometers, tree rings, corals, ice cores, and historical records) and what the colors indicate. It also notes that this data is for the Northern Hemisphere.
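If the idea of an ‘anomaly’ feels abstract, here is a minimal Python sketch of the arithmetic: take the average over a baseline period and subtract it from every value. The function name `anomalies` and the temperatures are made up for illustration; they are not the data behind the graph.

```python
def anomalies(temps_by_year, base_start, base_end):
    """Return {year: temperature minus the baseline-period average}."""
    # Average only the years that fall inside the baseline period.
    baseline = [t for y, t in temps_by_year.items() if base_start <= y <= base_end]
    base_avg = sum(baseline) / len(baseline)
    # Every year, inside or outside the baseline, is compared to that average.
    return {y: round(t - base_avg, 2) for y, t in temps_by_year.items()}


# Made-up yearly temperatures in degrees Celsius, for illustration only.
temps = {1961: 14.0, 1975: 14.1, 1990: 14.2, 1998: 14.6}
result = anomalies(temps, 1961, 1990)
print(result)
```

A year that matches the baseline average lands on the 0.0 line, warmer years come out positive, and cooler years come out negative, which is how the y-axis of the graph should be read.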
Familiarity with scientific graphs allows us to make a couple of assumptions:
1. The gray ‘background noise’ is data error. Notice: error decreases as time approaches the year 2000, when we have actual thermometer data (makes sense, right?).
2. The black line is a ‘smoothed out’ representation of the data. Notice: the line follows the general fluctuations in the other data, and makes it a bit easier to see the dips and peaks in temperature.
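To get a feel for what ‘smoothing’ does, here is a rough Python sketch using a simple moving average, which is one common way to smooth a wiggly line. I am not claiming this is the method used for the black line in the graph; the function name `moving_average`, the window size, and the numbers are all made up for illustration.

```python
def moving_average(values, window):
    """Centered moving average; the window shrinks near the two ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)                # do not run off the start
        hi = min(len(values), i + half + 1)  # do not run off the end
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up, jumpy anomaly values, for illustration only.
noisy = [0.0, 0.4, -0.2, 0.3, -0.1, 0.5, 0.1]
smooth = moving_average(noisy, 3)
print(smooth)
```

Each smoothed point is the average of its neighbors, so the sharp jumps shrink while the overall rise and fall stays visible. A bigger window gives a smoother line but hides more of the short-term wiggles.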
There are many types of graphs and charts. For a quick quiz on the basics of reading graphs click here. For a more in-depth look at scientific graphs, take a look at Vanderbilt’s handout (for biostatistics) or Vision Learning (more general science).
Update: The Washington Post’s article “Why this National Review temperature graph is so misleading” clearly illustrates why paying attention to scale matters. Simply by expanding the scale, the National Review presented a radically different graph. The same change in scale was used on graphs showing the National Debt and the Dow Jones Industrial Average, with equally significant results.
The tweeted graph from the National Review: the temperature scale is from -10°F to 110°F.
A graph of the same data using a scale from 56.5°F to 58.5°F.
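A quick bit of arithmetic shows why the two scales above look so different. The Python sketch below works out how much of the vertical axis a given change takes up under each scale. The 1.5°F change is a hypothetical number picked for illustration, not a value read off either graph, and `share_of_axis` is just a name I made up.

```python
def share_of_axis(change, axis_min, axis_max):
    """Fraction of the axis height that a change of this size occupies."""
    return change / (axis_max - axis_min)


change_f = 1.5  # hypothetical temperature change in degrees Fahrenheit

wide = share_of_axis(change_f, -10, 110)      # 1.5 out of a 120-degree axis
narrow = share_of_axis(change_f, 56.5, 58.5)  # 1.5 out of a 2-degree axis

print(f"wide scale: {wide:.2%} of the axis")
print(f"narrow scale: {narrow:.2%} of the axis")
```

On the wide scale the change takes up barely more than 1% of the axis, so the line looks flat. On the narrow scale the same change fills three quarters of it. The data is identical; only the axis limits changed.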