Statistics, or Stories?

Politicians make me queasy. So do self-help books, Twitter wars, linguistics professors, even TED talks. It’s not that they lack value—Twitter excepted, of course—it’s that they rely too heavily on examples and stories to persuade.

Why? It’s effective. As Chip and Dan Heath put it in Made to Stick (one of those self-help books), anecdotes—simple, concrete, and relatable—“have the amazing dual power to simulate and to inspire.” But such a quote isn’t as persuasive as the real examples they offer in the book, like how Steve Denning managed to convince the World Bank senior leadership to restructure the organization by sharing the story of a single Zambian health worker searching for malaria treatment info. Stories are more convincing than statistics.

But with billions of people and anecdotes in the world, you can cherry-pick one for almost any argument you’d like. So for every story Elizabeth Warren tells about an immigrant who lost everything to Trumpian cruelty, I’m sure Donald Trump could find a loyal storyteller who lost everything to immigrants. It breaks my heart that stories outpersuade statistics: in other words, that people are human.

The issue, to me, is that people (a) don’t remember numbers, and (b) distrust statistics. Point (a) is probably an irreversible pillar of human nature that will always sway the persuasion pendulum in favor of stories. For point (b), anecdotes often seem more trustworthy because they happened to real, imaginable people, while we generally don’t know where most of the stats we consume come from. The Heath brothers put it better than I can: “tinkering with statistics provides lucrative employment for untold numbers of issue advocates. Ethically challenged people with lots of analytical smarts can, with enough contortions, make almost any case from a given set of statistics.” But the same thing happens with stories! It seems that the problem with statistics is their deceptive pretense to be fact, while apparently stories don’t start off with this claim in the first place.

Of course, it would be impossible and unproductive to qualify every argument with accurate statistics. I didn’t begin this post with, “Seventy-two percent of politicians make me queasy (standard error of 8.2 percentage points).” But education in statistics and statistical ethics can make us better listeners and communicators. Our statistics should be prospective and inclusive, and we must have the flexibility to accept and share statistics that don’t support our arguments.

If you’re trying to convince someone, stories are your best bet. But to understand an issue, statistics can be skyscrapers built of story-bricks. For some final advice, I’ll turn to the Heath brothers one last time: “when it comes to statistics, our best advice is to use them as input, not output. Use them to make up your mind on an issue. Don’t make up your mind and then go looking for the numbers to support yourself—that’s asking for temptation and trouble.” But don’t do that with stories, either.

Quotes are cherry-picked from the popular book by Chip and Dan Heath, Made to Stick: Why some ideas take hold and others come unstuck (Penguin Random House, 2007), pages 237 and 147.

Deconstructing Humor using Text Mining on /r/Jokes

I recently turned in a project for an elective class I’m taking on humor where I analyzed a bunch of jokes posted to the /r/Jokes subreddit. I thought that some of the results were applicable to this blog, so I’ll summarize the interesting ones here:

1) Donald Trump jokes are funnier than average

Jokes that included the word “Trump” received an average of 141 more net upvotes than jokes that didn’t involve the name (mean score 258 vs. 117, p=0.0016). Moreover, jokes that used both the word “Trump” and the word “orange” in the same post scored on average 837 more net upvotes than jokes with just “Trump,” but this result wasn’t statistically significant because only 57 jokes used both words (mean score 1074 vs. 237, p=0.27).

For comparison, Obama jokes are not significantly more upvoted than average (mean score 149 vs. 118, p=0.45, n=588).
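If you want to try this kind of keyword comparison yourself, here is a minimal sketch. It is not the code behind the numbers above: the jokes and scores are made up, the helper names (`split_scores`, `permutation_p_value`) are hypothetical, and it uses a permutation test on the difference in means, which may not be the test that produced the p-values reported here.

```python
import random
import re
from statistics import mean


def split_scores(jokes, keyword):
    """Split (text, score) pairs by whether the text contains the keyword as a whole word."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    with_kw, without_kw = [], []
    for text, score in jokes:
        (with_kw if pattern.search(text) else without_kw).append(score)
    return with_kw, without_kw


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the absolute difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign scores to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up data for illustration only; not taken from /r/Jokes.
toy_jokes = [
    ("Trump walks into a bar.", 300),
    ("Why did the chicken cross the road?", 120),
    ("A trumpet player tells a joke.", 90),
    ("Trump orders an orange juice.", 250),
    ("Knock knock.", 110),
    ("A deer, a cow, and a duck walk into a bar.", 100),
]

with_kw, without_kw = split_scores(toy_jokes, "trump")
print("mean with keyword:", mean(with_kw))
print("mean without keyword:", mean(without_kw))
print("permutation p-value:", permutation_p_value(with_kw, without_kw, n_permutations=2000))
```

The whole-word match keeps "trumpet" from counting as a "Trump" joke. With only a handful of jokes in a group, as with the 57 "Trump" plus "orange" posts, even a large gap in means can easily fail to reach significance.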

2) Chickens, deer, turkeys, cows, and elephants are the funniest animals

These were chosen by comparing each animal’s frequency in jokes against its frequency in everyday English. Frogs, monkeys, and ducks tend to appear in jokes pretty often as well.
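The frequency comparison can be sketched roughly as follows. This is an illustration under assumptions rather than my actual pipeline: the baseline frequencies are placeholder numbers, not real estimates of everyday English, the function names are hypothetical, and the tokenizer ignores plurals (so "chickens" and "chicken" count as different words).

```python
import re
from collections import Counter


def relative_frequencies(texts):
    """Return each word's share of all word tokens across the given texts."""
    words = [w for t in texts for w in re.findall(r"[a-z']+", t.lower())]
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def overrepresentation(joke_texts, baseline_freq, candidates):
    """Rank candidate words by (frequency in jokes) / (frequency in the baseline)."""
    joke_freq = relative_frequencies(joke_texts)
    ratios = {}
    for word in candidates:
        base = baseline_freq.get(word, 0.0)
        if base > 0:  # skip words with no baseline estimate
            ratios[word] = joke_freq.get(word, 0.0) / base
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Made-up jokes and placeholder baseline frequencies, for illustration only.
toy_texts = [
    "Why did the chicken cross the road",
    "A chicken and a dog walk into a bar",
]
placeholder_baseline = {"chicken": 0.00001, "dog": 0.0001, "cat": 0.0001}

for word, ratio in overrepresentation(toy_texts, placeholder_baseline, ["chicken", "dog", "cat"]):
    print(word, round(ratio, 1))
```

A ratio well above 1 means the word shows up in jokes more often than its everyday frequency would suggest, which is the sense of "funniest" used here.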

3) “He” and “man” are substantially more common than “she” and “woman,” but “wife” and “girlfriend” appear more than “husband” and “boyfriend”

So men are the subject of more jokes, unless the joke happens to be about a man-woman relationship?

Also, jokes using female-gendered words tend to repeat them more times than jokes using male-gendered words.
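One simple way to measure that kind of repetition is to count how many times a joke uses any word from a list, then average over only the jokes that use one at all. The sketch below assumes that definition; the word lists, the example texts, and the `mean_repeats` helper are all made up for illustration.

```python
import re
from statistics import mean


def mean_repeats(texts, words):
    """Average number of whole-word matches per text, over texts with at least one match."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    counts = [len(pattern.findall(text)) for text in texts]
    used = [c for c in counts if c > 0]
    return mean(used) if used else 0.0


# Made-up example texts, for illustration only.
toy_texts = [
    "She said she would call.",
    "He left.",
    "No pronouns here.",
]

print("female-gendered:", mean_repeats(toy_texts, ["she", "woman"]))
print("male-gendered:", mean_repeats(toy_texts, ["he", "man"]))
```

The word boundaries matter: without them, "he" would also match inside "she" and "here" and inflate the male-gendered counts.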

The data for this analysis came from Taivo Pungas’ public GitHub repository. My code is available here. You can download the report as a PDF too, if you’re interested in other findings or methodological details.

Why Data?

Welcome to my new blog!

Beginning today I will be posting short, highly approachable stories about how I use large datasets in different ways to approach my everyday activities. Starting out, my hopes for this blog are threefold: (1) to share creative ways to look at publicly available data, (2) to improve my own understanding of techniques and of the world, and (3) to make data manipulation and visualization more accessible to everyone.

I work in a public health sciences lab, and spend most of my time there looking at large datasets. But why bring that home? In short, because “big data” isn’t just available in pre-canned fashion to scientists, and it is capable of revealing interesting things without needing PhD-trained researchers and rigorous statistics. While I don’t condone bad statistics, my mission is to show that interesting data lies all around us, and that you can learn a lot simply by asking engaging questions and picking up a few handy skills at the keyboard.

Finally, please do contact me at any time with questions, insights, suggestions, errors with my work, etc. My email is, and be sure to check out the rest of my website as well! I look forward to hearing from you!