The dirty little secret that data journalists aren’t telling you
Take a look at the map above. It tells the story of a year's population change in the United States, according to the latest census data. It shows where the population is growing, like the coasts, the Sun Belt and the oil fields of western North Dakota.
It shows, too, where numbers are in decline -- along the Mississippi River and in much of the rural Northeast, from northern Maine down through New York, western Pennsylvania and into the heart of the Appalachians.
This is the story that I believe the good folks at the Pew Charitable Trusts wanted to tell last week when they dug into these numbers. But using the exact same data set, they generated a map that looks like this:
"Population growth slowed last year in some of the nation's most expensive counties, like those in California's Silicon Valley, and picked up in more affordable counties in the Sun Belt," according to the text above the map. But it's next to impossible to discern that just from looking at the map. In fact, it's hard to discern just about anything.
The difference between my map and Pew's -- again, they both use the exact same data set -- underscores a bit of a dirty little secret in data journalism: Visualizing data is as much an art as a science. And seemingly tiny design decisions -- where to set a color threshold, how many thresholds to set, etc. -- can radically alter how numbers are displayed and perceived by readers.
Pew treated the data in a seemingly sensible and rational way. It sliced up the full range of data -- from minus-6.3 percent (Terrell County, Tex.) to plus-28.7 percent (Loving County, Tex.) -- into five buckets of equal width: -6.3 to 0.7, 0.7 to 7.7, 7.7 to 14.7, etc. Assign a color to each bucket, color each county according to which bucket its population change falls into, and voila! A map. Right?
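For the curious, the arithmetic behind that five-bucket scheme can be sketched in a few lines of Python. This is only an illustration, not Pew's actual code: the function names are made up for this example, and the only real numbers in it are the minimum and maximum cited above.

```python
def equal_interval_breaks(lo, hi, n_buckets):
    """Return the n_buckets + 1 edges that split [lo, hi] into equal-width buckets."""
    width = (hi - lo) / n_buckets
    # Round to one decimal place to match the precision of the published figures.
    return [round(lo + i * width, 1) for i in range(n_buckets + 1)]


def bucket_index(value, breaks):
    """Return the 0-based bucket a value falls into.

    A value equal to the top edge is placed in the last bucket.
    """
    for i in range(len(breaks) - 2):
        if value < breaks[i + 1]:
            return i
    return len(breaks) - 2


# Range cited in the text: minus-6.3 percent to plus-28.7 percent, five buckets.
breaks = equal_interval_breaks(-6.3, 28.7, 5)
print(breaks)

# Two hypothetical counties (not real data): one shrinking, one growing slightly.
print(bucket_index(-1.0, breaks), bucket_index(0.5, breaks))
```

Each bucket is 7 percentage points wide, so in this sketch a hypothetical county that shrank by 1 percent and one that grew by half a percent get the same bucket index, and therefore the same color.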
The problem is that while Pew's buckets are nice and evenly spaced, the numbers are not. There are 3,141 counties in the Census Bureau's data set, and 3,138 of them fall into either the first or second bucket. Only three counties -- extreme outliers, all of them -- posted a population gain greater than 7.7 percent last year.
So those three darker shades of the scale are essentially unused, and the entire map gets washed out into two similar colors.
But there's a potentially bigger problem, too. Some counties lost population, while others gained. That's a pretty big distinction. Mapmakers often respect big distinctions like that by using a diverging color scale -- say, one set of colors for positive values (like blues), and another set of colors for negative ones (like reds).
That's what I ended up doing in the map at the top of this page. But that crucial positive-negative threshold gets lost in Pew's map -- the lightest shade on its map encompasses all negative values and some positive ones as well. The map becomes blind to perhaps the most significant dividing line in the data -- the border between growth and decline.
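One way to keep that dividing line visible is to make zero itself a class break, with a separate color ramp on either side. Here is a minimal Python sketch of the idea. The break values, color names and function name are illustrative assumptions, not the ones used for the map at the top of this page.

```python
import bisect

# Hypothetical class breaks, in percent change. Zero is deliberately one of them.
BREAKS = [-2.0, -1.0, 0.0, 1.0, 2.0]

# One placeholder color name per class: reds for loss, blues for growth.
COLORS = [
    "dark red", "medium red", "light red",
    "light blue", "medium blue", "dark blue",
]


def diverging_color(pct_change):
    """Map a percent change to a color class.

    Anything below zero lands in a red class; zero and above lands in a blue class.
    """
    return COLORS[bisect.bisect_right(BREAKS, pct_change)]


# Two hypothetical counties on opposite sides of zero get visibly different colors.
print(diverging_color(-0.5), "|", diverging_color(0.5))
```

Because zero is a break, a small loss and a small gain can never share a color, whatever the other thresholds turn out to be.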
I don't write any of this to pick on the Pew Charitable Trusts (full disclosure: I used to work at the Pew Research Center, a project it funds). Its overall analysis of the census data in its story is 100 percent sound, and like anyone else who does this for a living I've made my share of clunker maps over the years.
But each time I put numbers on a map, I'm struck by how it's possible to radically alter the appearance of a visualization just by tweaking a couple of basic parameters. And with the proliferation of maps like these, as well as tools that make it easy for just about anyone to make them, it's helpful to understand just how much these decisions can affect what you see on the printed (or digital) page.
Numbers carry a veneer of authority and objectivity that words can seem to lack. But communicating with numbers is, in many ways, just like communicating with words. You make decisions about what to emphasize and what to downplay, and about how to convey a full understanding of the subject at hand.
Ideally, those decisions lead to presenting the numbers in the clearest possible light. But as with words, inarticulate framing can lead you to muddle rather than clarify.
Src: washingtonpost.com
