Tuesday, March 31, 2020

Morning Assortment

     I've got to get to the grocery this morning and see about assembling a week's worth of groceries, so today's post is a little hasty.  Along with that, Holden the cat is still learning that cats are not allowed on my desk at breakfast time, a rule about which he expresses great doubt.
 *  *  *
     But let's talk about a few things, data and ways to present it among them.

    One of the best books I can't remember the title of, a book ostensibly about commercial art, had a very good section on how to avoid telling lies with charts and graphs.  It is staggeringly easy to do so, intentionally or not, because of a few factors.

     The first is that we love a pretty picture.  If scales and hues need to be adjusted to get an eye-catching presentation (or just to fit the page or screen), we will do so.  You end up doing things like over-emphasizing small variations between very large numbers (commonly done by rescaling or trimming bar charts or graphs to remove "all that wasted space").
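A little arithmetic shows how much a trimmed axis can exaggerate things.  This is just a sketch with made-up numbers (a 2% difference between two values), not anything from a real chart:

```python
# Two invented values that differ by 2%.
a, b = 100.0, 102.0

# With the axis starting at zero, the bars are nearly the same height.
full_scale_ratio = b / a  # 1.02 -- a 2% difference in bar height

# Trim the "wasted space" by starting the axis at 99, and the visible
# bar heights become (100 - 99) and (102 - 99).
trimmed_ratio = (b - 99) / (a - 99)  # 3.0 -- one bar now looks 3x taller

print(full_scale_ratio, trimmed_ratio)
```

Same numbers, same chart type; the only thing that changed is where the axis starts, and the apparent difference went from 2% to 200%.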

     Another is that while most of the growth (and decay) processes in the world proceed exponentially, our perceptions and expectations are linear.  Even our senses respond logarithmically rather than linearly.  In the very short term, the straight line and the swooping curve track closely enough to get by -- but in the long term, they diverge rapidly.  Once a process gets started, it ends up going like a rocket!  That's why, outside of a few hard-hit and early-onset areas, you're probably looking around and thinking, "Hunh.  Not much of a pandemic."  In NYC, ERs are packed, gurneys in the hallways, and they're nearly out of ICU beds.  Even here in Indianapolis, the biggest hospitals are starting to feel the pinch -- and we've got a couple weeks to go before the peak, if present predictions hold.
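You can see the divergence with a quick sketch.  The numbers here are purely hypothetical -- a process that doubles every three days, which is the right general shape even if it's nobody's actual case count:

```python
# Hypothetical: something that doubles every 3 days, starting at 100.
doubling_time = 3.0  # days

def exponential(day):
    return 100 * 2 ** (day / doubling_time)

def linear(day):
    # A straight line fitted through day 0 and day 3 of the same process.
    slope = (exponential(3) - exponential(0)) / 3
    return exponential(0) + slope * day

# Early on, the straight line and the swooping curve agree...
print(round(exponential(3)), round(linear(3)))    # both 200
# ...but a month out, they're not even in the same universe.
print(round(exponential(30)), round(linear(30)))  # 102400 vs. 1100
```

That's the trap: eyeballing the first week of an exponential process and extending it with a ruler will leave you off by a factor of nearly a hundred within a month.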

     A third is confirmation bias: we're good at cherry-picking what we see or read to confirm what we already expect.  This is the bane of experimental work, and why in things like drug trials, there has to be a "control" group, who do everything your test group does -- except use the drug under test.

     A fourth is "granularity."  For the United States, the Johns Hopkins coronavirus map only goes to the county level; for Canada, case data is per Province or Territory, and for most of Europe, it's per country.  These are not sections of equal population; they're just handy chunks that probably reflect how the data comes in to JHU.  The IHME data and predictions, on the other hand, are state-by-state at their narrowest; you're not going to find anything about measures taken by cities and counties on their pages, though it may affect their predictive models.  You can't read this data any deeper than it goes.
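To see what coarse granularity can hide, here's a toy roll-up with invented numbers (these are not anyone's real counties or case counts):

```python
# Invented numbers: two counties rolled up into one "state" total.
counties = {"County A": {"pop": 900_000, "cases": 9_000},
            "County B": {"pop": 100_000, "cases": 4_000}}

state_pop = sum(c["pop"] for c in counties.values())
state_cases = sum(c["cases"] for c in counties.values())

# The state-level rate looks moderate...
state_rate = state_cases / state_pop * 100_000  # 1,300 per 100k

# ...but the small county is actually four times worse off than the big one.
rates = {name: c["cases"] / c["pop"] * 100_000
         for name, c in counties.items()}      # A: 1,000   B: 4,000
```

A map or table that only shows the state number will never tell you County B is in trouble -- the detail simply isn't in the data at that level, and no amount of squinting will put it back.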

     Fifth and last, our good old friend, the Dunning-Kruger effect: we don't know what we don't know.  Heck, I can do math, I can read a study written in plain English -- why shouldn't I make my own predictions?  One reason would be that I don't know how good a model a locked-down cruise ship full of the kinds of people who can afford to go on a cruise might be for a large American city, full of a wide assortment of people doing a wide assortment of things.  YMMV, but remember: there are folks who make a living doing this sort of thing, and the reputable ones are extremely cautious about inferring too much.

     Please, let's just do what we can to get through this.


Anonymous said...

“The Visual Display of Quantitative Information” by Edward R. Tufte

Douglas2 said...

Darrell Huff's "How to Lie with Statistics"?