On Meatless Meat

I’ve got to admit—from keeping ourselves fed to revamping our crossword site, full-blown adult life is busier than I expected!

In keeping with the newfound freedom of adulthood, this post marks a bit of a topical shift, or expansion, for the blog. I’ll keep posting about stats and data when I find them interesting, but I’d also like to start sharing some thoughts more directly relevant to our new life here in Berkeley.

Of course, the biggest change to my daily routine has been my job here at Memphis Meats, a really awesome startup in the lab-grown meat industry. As it stands, conventional meat companies in this country produce around 100 billion pounds of meat a year. (Yes, there are only about 50 billion pounds of human in America.) To hit that ridiculous mark, the American slaughterhouse processes 30 million cattle, 120 million hogs, 240 million turkeys, and 9 billion chickens. (Yes, that means 27 entire chickens for each of us.) This all goes to say that meat alternatives only need to attract 1%—one percent!—of the American meat market to become an instant billion-dollar industry.

In the world of science it’s universally accepted now that we eat far too much real meat. (For a place to start, I’d highly recommend the Netflix documentary Game Changers!) Most of us are vaguely concerned with deforestation, carbon emissions, animal living conditions, animal slaughter, meat-packing conditions, or heart disease, but those are things that seem nebulous to us, while cutting meat from our diet is just too real. I grew up eating cereal for breakfast, meat for lunch, and meat for dinner. In Chicago I spent most of my shopping time deliberating in the butcher aisle, with two-dollar racks of ribs and seven-dollar pork butts up for grabs. I guess I get it from my family, who gets it from society. When May met my grandparents in Alabama the first time, they eagerly asked her, “So, what’s your absolute favorite food?” Then, to present the natural choices, “Is it beef, pork, or seafood?”

But there’s an alternative solution to conventional vegetarianism, one that we discovered not long after moving to Berkeley. Unlike our Chicago grocery stores, which had a few Beyond Burgers and gross vegan sausages hiding near the butcher aisle, here we’ve got entire aisles dedicated to MorningStar, Quorn, Gardein, Tofurky, Beyond, Impossible, and a host of other up-and-coming “fake” meat brands vying for early industry dominance. There are nuggets, sausages, and burgers, but there are also hams, roasts, sliced bacon, duck meat, jerky, beef wellington, you name it.

So as I’ve realized that I should convert my cereal–meat–meat routine to an oatmeal–less meat–no meat regimen, I’ve started leaning into these alternative brands. (After all, most of the recipes I know are heavily meat-centric.) Several of the products May and I have tried have fallen really flat (the bacon we tried was literally flat, which is a bit eyebrow-raising for crisped bacon). But other products would certainly have fooled me in a blind tasting.

Bottom line, we’re in the middle of one of the century’s defining engineering stories—and as someone who is now occupationally and personally invested, I’m going to start documenting my experiences as a consumer. Beginning just below and in the future, in addition to any other updates or little analyses on the blog, I’ll be posting our stories and reviews of these non-meat products. I’ll keep a running, unified table here on this page, with more detailed reviews in the blog posts. Hope you enjoy!


The Alpha Patty/Crumble: Meatless Sausage

Form    | Overall enjoyment | Meat similarity | Price per pound | Frozen? | Date tried
Patty   | 8/10              | 7/10            | $7.98           | Yes     | Jan ’21
Crumble | 7/10              | 6/10            | $7.98           | Yes     | Jan ’21

Kicking us off, we have a frozen, breakfast-style sausage by Alpha Patty. I dig the branding, but for half a pound of “meat” in a comically large bag, it’s probably not for the faint-of-freezer-space.

It’s available as a patty or as a crumble, so we decided to get both:

I cooked the patties straight (per package directions) for breakfast, along with some canned biscuits and an oat-milk gravy (May liked the oat-gravy; I thought it was a mistake).

If you served the sausage as conventional meat to me, I probably would’ve been fooled for two or three bites, placing Alpha Patty at a 7/10 meat-similarity. There’s a noticeable soy flavor, almost shiitake-like, that amplifies with each bite, but the texture is good and it’s certainly passable as an unassuming breakfast sausage when smothered in gravy.

For the crumble, I cooked it into a pasta with olives, onions, some (real) heavy cream, and other things. The “meat” was again fairly unassuming and didn’t contribute too much flavor in comparison with the other ingredients, but again, fairly passable. I’d probably expect real ground sausage to be a bit firmer, less homogeneous, and more flavorful, but next time I’d probably go for a more spiced-up variety like an Italian sausage if available.

Would I buy again? No, probably not, at least not while there are other options to try. I think this is a very safe alternative sausage, but it doesn’t knock any doors down for me. It does an okay job of hiding the soy flavor (better than Beyond, I’ll claim), but ultimately I’d like it to be a little more sausagey.

Ballots & Bookies

Swing states, mail-in votes, litigation threats… there was so much to keep track of as the election news rolled in this week, and it was hard to translate any of it into likely results. It looked like Trump’s victory before it looked like Biden’s, but how convincing were those hints? It seemed like none of the news channels would give us a firm number until the very end. But in the thick of it, that was the only question any of us wanted answered: just how likely was either candidate to win?

When it comes to pasting a number on complex probabilities, I tend to trust those who make the most money off of getting the odds right. Here’s what the betting world said over the last few days; I’ve marked some key states and the times the Associated Press called them for Trump or Biden.

So, if you genuinely expected Trump to win by the end of Tuesday, you weren’t alone!

Quick thought: Letter frequencies in crossword puzzles

Just a quick follow-up on my last post about crossword puzzles and our new website, CrossWorthy.net. (New puzzles there every Sunday! Check it out if you haven’t had a chance.)

Our crossword-filling algorithm works in such a way that it maximizes the remaining possibilities left by each word, based on the other letters in that word. For example, the three-letter “APE” would be favored over “AXE”, given the choice, because P is a more versatile letter than X. Thus, common letters would be favored, while uncommon letters—and therefore, the words that contain them, like JACK or OOZE—would be left out.

My thoughts are: (1) does this really happen in my algorithm? and (2) do professional crosswords (e.g. NY Times crosswords) suffer from this bias too? Here’s a bar chart:

The red is a corpus of the 20,000 most-used English words, a “reference point” as if all those words were used uniformly in crossword generation. The green is from 34,505 letters from complete, nonsense-free crosswords from our CrossWorthy algorithm. The blue is from 848 NY Times crosswords (183k letters) since 2018.

Apparently, the CrossWorthy algorithm heavily over-favors S and A. (It would be interesting to come up with a “versatility” metric for letters, with a different implication than “commonness”. A and S must be quite versatile. Compare A to I, for instance, which is just as common in the 20k-word corpus. My gut says this has to do with the positions in a word where the letter can occur: A and I are both common as second or third letters, but A is an easier first letter than I. Even the blue NY Times bars seem affected by this letter-versatility thing, at first glance, though not as badly.) As expected, our algorithm also under-utilizes rare letters: Z, Q, J, X, V, W, K, F, Y. The interesting part is that the NY Times crosswords don’t seem to underuse rare letters compared to the corpus… I guess the pros just come up with more interesting crossword words!
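For anyone who wants to try a comparison like this at home, here is a minimal sketch of how per-letter shares could be computed and compared. The word lists are tiny made-up stand-ins (not the 20k-word corpus or any real puzzle data), and `letter_shares` is a hypothetical helper name rather than anything from the CrossWorthy code.

```python
from collections import Counter
import string

def letter_shares(words):
    """Return each letter's share of all A-Z letters found in `words`."""
    counts = Counter(
        ch for word in words for ch in word.upper() if ch in string.ascii_uppercase
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no letters found")
    return {letter: counts[letter] / total for letter in string.ascii_uppercase}

# Toy stand-ins for the reference corpus and for a set of filled grids.
reference_words = ["jack", "ooze", "ape", "axe", "seas"]
puzzle_words = ["ape", "seas", "area", "ass"]

ref = letter_shares(reference_words)
puz = letter_shares(puzzle_words)

# Positive values: the puzzles use the letter more than the reference does.
over_use = {letter: puz[letter] - ref[letter] for letter in string.ascii_uppercase}
```

Sorting `over_use` by value would put the most over-favored letters at one end and the neglected ones at the other, which is roughly what the bar chart shows visually.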

Crossworthy Combinations

Wow, it’s been busy lately! May and I are in the middle of moving across the country to Berkeley, California, where I’ll start a new job in an entirely new industry. We’re also finalizing May’s immigration paperwork, while I’m trying to get a last paper out the door to close out my public health research. Amid everything, May and I have decided to become our own cruciverbalists of sorts, launching a new website – crossworthy.net – where anyone can play our original crossword puzzles!

We’ve been interested in crosswords for a while, now, and May even produced a new mini (5×5) crossword for every day in the month of May. I became invested in the project when I realized how fascinatingly difficult it was to create crossword boards. Minis were hard enough, but “Midis” (7×7) and full-size boards (15×15) became nigh impossible.

Take this average, empty crossword grid, for instance:

There are a couple 13-letter words to fill and a couple 12-letter words to fill, so this definitely isn’t a trivial board. And from a large corpus of words I got from several sources (dictionaries, phrases, celebrity names, etc.), there are 1223 three-letter words, 2043 four-letter words, 2734 five-letter words, and so on… meaning there are about 1.7 × 10^100 possible ways to arrange the horizontal words only, or more than the number of atoms in the universe (around 10^80, apparently).
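The counting behind a number like that is just a product: if each horizontal slot is filled independently, the total is the corpus count for each slot's length, multiplied together. Here is a rough sketch, working in log10 so the numbers stay manageable. The three word counts come from the paragraph above; the list of slot lengths is a made-up example, not the actual grid.

```python
import math

# Word counts by length, as quoted above; longer lengths are omitted here.
words_by_length = {3: 1223, 4: 2043, 5: 2734}

def log10_arrangements(slot_lengths, counts):
    """log10 of the number of ways to fill the slots independently."""
    return sum(math.log10(counts[length]) for length in slot_lengths)

# Hypothetical set of horizontal slots, NOT the grid from the post.
example_slots = [3, 4, 5, 3, 4, 5]
exponent = log10_arrangements(example_slots, words_by_length)
print(f"about 10^{exponent:.1f} ways to fill {len(example_slots)} slots")
```

Even six short slots land near twenty orders of magnitude, so a full 15×15 grid with dozens of slots climbing past 10^100 is not surprising.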

Given that only a relative handful of these would also give sensical vertical words, the chances of filling a proper crossword board seem pretty slim. It makes sense to start by inserting words in the hardest spots – otherwise, by the time we get to them, we may be completely out of luck. Most professionals will fill in the longest words first, then build around them. But, as it happens, there are fewer 3-letter words in the corpus than there are words long enough for those big slots, so the first (and “hardest”, in a sense) word I’d choose to fill is just:

CSS, short for “cascading style sheets,” a ubiquitous web design language.

Why did I choose CSS as opposed to HAM, PJS, or another of the 1,223 three-letter words available? As it turns out, using CSS means that the three vertical, intersecting columns can still be filled by 823 different words in the database, more than it’d be if any of those other 1,222 entries were picked.

Now, the word-slot with fewest remaining possibilities is 5-Down, or “C _ _ _”: only 147 words match this pattern. So the process repeats, and we write in:

because this affords the 3 empty, horizontal “cross”-words 509 total possibilities, more than any other option.
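The two choices described so far (pick the slot with the fewest matching words, then pick the word that leaves the crossing slots the most options) can be sketched in a few lines. This is a simplified illustration under my own assumptions, not the real CrossWorthy code: the word list is a toy, and `matches`, `most_constrained`, and `best_word` are hypothetical names.

```python
import re

def matches(pattern, words):
    """Words fitting a pattern such as 'C___' ('_' marks an empty cell)."""
    regex = re.compile(pattern.replace("_", "."))
    return [w for w in words if regex.fullmatch(w)]

def most_constrained(slot_patterns, words):
    """The slot pattern with the fewest matching words."""
    return min(slot_patterns, key=lambda p: len(matches(p, words)))

def best_word(slot_pattern, crossings, words):
    """Pick the candidate that leaves the most options in the crossing slots.

    `crossings` lists (index_in_slot, crossing_pattern, index_in_crossing).
    """
    best, best_score = None, -1
    for candidate in matches(slot_pattern, words):
        score = 0
        for i, cross_pattern, j in crossings:
            # Write the candidate's letter into the crossing slot, then count fits.
            filled = cross_pattern[:j] + candidate[i] + cross_pattern[j + 1:]
            score += len(matches(filled, words))
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Toy word list; each letter of the across word starts a 3-letter down slot.
toy_words = ["CSS", "CAT", "COT", "SEA", "SUN", "SOS", "HAM", "PJS", "MOM", "APE"]
crossings = [(0, "___", 0), (1, "___", 0), (2, "___", 0)]
choice, options_left = best_word("___", crossings, toy_words)
```

In this toy setup the winner is whichever word's letters start the most other words, which mirrors the reasoning for CSS above. A real filler would also need to avoid reusing words and to back out of dead ends.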

Continuing in this vein, we can automate this algorithm pretty easily – with a few nuances, like giving more weight to longer words, thematic words, etc. Unfortunately, even with some extra optimizations after finishing, the whole process only works about 2% of the time. The other 98% of attempts get stuck with a few word-slots that just can’t be filled (see the histogram below). The whole process, coded up in Python with some additional optimizations tacked onto the end, takes about 10 seconds per attempted board, working out to an average of one good board every 8 minutes of running time or so.
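The "one good board every 8 minutes" figure follows from the two numbers above, assuming each attempt succeeds independently with the same probability (so the expected number of attempts per success is one over the success rate). A quick check:

```python
seconds_per_attempt = 10   # from the paragraph above
success_rate = 0.02        # about 2% of attempts produce a full board

# Expected attempts per success under the independence assumption.
expected_attempts = 1 / success_rate
minutes_per_good_board = expected_attempts * seconds_per_attempt / 60
print(f"{expected_attempts:.0f} attempts, {minutes_per_good_board:.1f} minutes per good board")
```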

Then, of course, comes the clue-writing, and then, of course, the playing!


Aside from our website crossworthy.net, we’re also @CrossWorthy on Twitter, and you can also sign up to get an email for every new crossword (every Sunday).

We would love to hear feedback on any of the boards or clues we generate, suggestions, and anything else! We’ve got a special email, crossworthypuzzles@gmail.com, where you can send any thoughts, or, as always, you can get in touch by any other method.

Slow and steady… gets published eventually?

Lately, I feel like I’ve been talking about fracking nonstop. I’ve been researching fracking as a public health scientist since January 2018, but in this transitional time for me—graduation looming, the job search ever continuing—it seems that the last two and a half years of fracking research are all flashing by one last (?) time. In job interviews, I frequently find myself pontificating about fracking, sometimes formalized in slide decks. Then there’s my undergraduate honors thesis, which I just defended this morning in front of a faculty committee, where I describe my newest evidence that fracking activity may increase hospitalization rates in local communities.

So there’s a good chance I was speaking about fracking during an interview yesterday when an email popped into my inbox, declaring that my first paper—on fracking, of course—was finally published. (Even better, it’s open access, meaning you can read it without fancy university credentials; that privilege only cost our research fund $5000.) It mostly follows from the realization that fracking companies need not disclose the chemicals they inject into the earth to stimulate greater oil production. My co-authors and I investigated a prominent effort to improve public disclosure of said chemicals, but we concluded that it ultimately fails.

The funny thing is that the work described in the paper is mostly stuff I did two years ago or more. The paper itself was then written, rejected, reworked, rejected, entirely rewritten, rejected, reformatted, rejected, revised, then ultimately accepted. It went through five different journals, the last one requiring three rounds of back-and-forth peer review. Along the way there were such frustrations as:

  • having to redo all the analyses because the submission & revision process had outlasted an entire year of new data (for a while, I was afraid I’d have to do this a second time)
  • duking it out with adamant reviewers who had unbridgeable differences with us about our methodology
  • completely forgetting why I’d done some small detail in a particular manner, despite my best efforts to keep good notes (this one’s a real head-banger!)

The process was so slow and excruciating that, by the final few steps, it was the part of my job I dreaded the most. There I was, having to dredge up two-year-old code, data, and thought processes, just to satisfy some reviewer’s particular inquiry. Now that the paper’s out I do feel some pride, but even that feels rather muted because it’s no longer super relevant to my current research. By the time there was any light on this paper’s publication horizon, I had long since moved on to far more interesting projects. I wonder when those will be published?

I suppose it’s the old 80/20 rule – final delivery is the hardest part! But academia is a particularly slow-paced environment. That’s its great value: researchers should be able to afford a careful, focused, methodical approach with thoughtful feedback cycles, away from the pressures and influences of the corporate world. But to a college senior eager to launch a career, it all just feels so tortoiselike sometimes!

Some AI transcriptions of popular song lyrics

Amazon Transcribe is a Siri-like tool that can write down text from an audio recording. It’s probably useful for closed captioning, etc.

I thought I’d have some fun by feeding it a few famous songs:

Shake It Off (Taylor Swift)

“people, people, but people, people. Hey, just think you’ve been getting down and out about the liars and the dirty, dirty cheats of the world. You could have been getting down to this sick beat his new girlfriend. She’s like, Oh my God, I’m just shaking to go there with you. Come on over, baby with shake, shake, shake”

Piano Man (Billy Joel)

“It’s nine o’clock on Saturday. Oh, regular crowd shuffles. There’s an old man sitting next to me making love to his tonic and gin, he says. You play me a memory. I’m not really sure, but it’s sad and it’s sweet. And I knew it. I wouldn’t worry. Sing us, Sing us well around in the booth. You got a spill. It all right now, John at the bar is a friend of mine. It gets me my drinks for free, and it’s quick with a joke on light Up Your Smoke. But there’s someplace that he’d rather bill. I believe this is killing me. Smile. Run away from his face. My place. Now Paul is a real estate novelist who never had time for a while, and he’s talking with David, was still in the Navy and probably will be practicing politics. Businessmen. So get stuff, Yes, but it’s drinking. It’s a pretty good crowd for a Saturday, and the manager gives me a smile because he knows that it’s me they’ve been coming to see, to forget about life for a while. Sales like, What are you doing”

Lose Yourself (Eminem)

“one moment, sweaty in these weak on heavy. There’s vomit on his sweater already. Mom’s spaghetti. He’s nervous, but on the surface he looks calm and ready to drop bombs. But he keeps on forgetting what he wrote down. The whole crowd goes so loud he opens his mouth. But the words won’t come out jumping. How everybody’s token Down with reality. Oh, there goes, gravity goes so you won’t give up that easy. No backsies. It don’t matter. He knows that he’s so sad that he knows when he goes back to this mobile phone booth. Amusing. Wait, escaping through this hole that is gaping. This world is mine for the taking. Make me King as we move toward a new world order. A normal life is for the post mortem. It only grows. Homie grows hotter, blows no, goes home and barely knows his own nose. See, it goes cold. Cold products booth. The music. Mr No way game’s changed with rage. Cage. I was playing in the beginning, the mood on changed to spit out stays, but I can’t prime step right in the next life. I believe somebody’s paying right people. Life for my family. This man stands for And, you know, I think my life in these times. So it’s getting even. Wanna see being a father and a prima baby Mama Drama. Damn him like a snail of guns Formulated. This has got to go. I cannot wait. And the music”

Thriller (Michael Jackson)

“it’s me. Wait, Thio, You wait Creature froth without the solos All getting down Stand with inside of a corpse Shell way, Way with 40,000 years and ghouls from every tomb you fight to stay alive for yeah”

Where Is the Love? (The Black Eyed Peas)

“What’s wrong with the world, Mama? People living like Thank God. Oh Mama, I think the whole world’s addicted to the drama only attracted to things and bring the trauma overseas trying to stop terrorism. But we still gotta tell risk here, living in the U. S. Saying a pigsty, I pleasant quips and K k k. But if you have love for your own race, you’re only space ruminate and discriminate only generates hate on when your hate and your band against Ray what you demonstrate. And that’s exactly angle works. Operates. You gotta have loved to set a straight ticket told mind. Meditate so gravitates in love way Same always changed New days of strangers in love and peace is strong while pieces a pump that don’t belong nations dropping bombs, chemical gases, filling lungs of little ones with ongoing suffering as the young. So ask yourself is a loving, really, really what is going wrong in this world that we live in. People keep on making wrong decisions. Only visions of the Nativity respecting each other wars going on. But the reasons under cover the truth is kept secret, swept a little drunk and you never know you never know. I was waiting weight of the world on my shoulder. As I’m getting older, your people get colder. Most of us only care about money making and selfishness. Gotta follow with the wrong direction. Wrong information always shown by the media Negative images is the main criteria. Infecting the young man’s passage from bacteria gets one act like what they see. Whatever happened to the values of humanity? Whatever happened to the fairness and equality instead of spreading them, were spreading animosity, lack of understanding, leading with community. The reason why? Sometimes I really wonder if that’s the reason why. Sometimes I’m feeling down. It’s no wonder why. Sometimes I feel it under cash Way only got wait.”

Carrot Stix (Yours Truly)

“Karen Beans. I turn it green light Brooklyn since my teens, but I ain’t got no means. I ain’t got no money to buy all of my 13. How much was to eat a balanced style with the greens? 40 years of beaten and not a single salad. They see my face is pallet. Well, I say they’re facing invalid but three cardiac procedures and triple bypass since go go. Way to make a man convinced in the garden I’m a king and you can’t take that away And I got no job. Does that prove my food all day and getting me a thing? But it’s on my resume. I don’t get it. I got my environment. So I want to eat some veggies. But I can report that when they tell me. Take it easy, but they know they don’t go slows. Grow him on my own. So I pull a garden. You pull up with hope hope the Garden of Eden in spring, every seating and summer. I’m sweating, but I’m out here with my parents are here needing my breeding, my feeding. So I’m proceeding to 20 me bleed and I’m exceeding a speeding, deceiving and pee in the weeds and at a receding. And nothing can stop me from succeeding because I’ve been reading it. I put in my globe picking my spade. People responds better. Those parts put me up every like a piece of paper on my part was the prince. But I’ll unpopular properly planting my property Probably pretty soon, pulling the piece in the pods he’s produced pie. What are the odds? The bees, I’m afraid, Work to the 10 then in the stands, planting peas like Gregor and Garden King. And you can take that away. I got no because I grew my food all day and getting me a thing. But it’s on my resume. I don’t get a dollar, but I got mine of Ivan. I got the horses in the bag and the horses get me back. Three tons of solid horse manure. I put it on my back. I drag it to the garden for the plants. It’s like the armor. Yeah, just a garden. And I’m a full on farmer. I put in my glove. Oh, this was my labor of love. Shit like heaven above. Don’t eat from the stone. I don’t eat that achieve what I grow. Don’t make me. 
But you don’t. If you don’t like my garden, Brody And you were straight A meaning. They tell me rice, but I eat.”

SEAGULLS! (Stop It Now) (Bad Lip Reading)

“a penny for your thoughts. I hate Brenda and a bad guy hit me in the shit and I peed on my pants. Uh, it’s nothing a little music can now down to the beach. I’m strong in what she goes poking. My son said she goes, Stop it now. Everyone not to stroll on that beach said, You guys gonna come in and wait? When these Persian When I tried to run this way, Way back by you proven Booth. Hey, show you some dance moves night. Want to Joe Dante? There on the beach. Run those birds. Your psycho wiener. Let me grab my Peter, please. Come on, man. Quick that that’s bank. Put a fish in our basket. You owe me an apology. Just hold your breath, T One time Frank back, you’ll be back by train from Hera. I got your back. Quiet. I understand. On one candidate box. Your special gift. That’s good. Uh huh. One day I was walking into, found the lock and I rolled the log over underneath. It was a time stick, and I was like, Someday when you are older, you duties. Stop it now. Yeah, whatever. You’re sort of pitchy. Do you like? Listen, man, I’m not your friend, Loom, Don’t fall asleep. Stop it now.”

Coronavirus, Disease Burden, and the New York Times

So, UChicago just canceled their entire spring term, migrating all classes to online platforms and sending students home after we take winter quarter finals next week. Suddenly, I have one week to say bye to all the friends and acquaintances I’ve made over the last four years. And graduation? Senior week? The concerts and shows we’ve put together? Friends with on-campus jobs? Friends who can’t fly home? The coronavirus is deadly, but right now it’s hard for students to look past their own uncertain situation.

The most difficult part of comprehension is contextualization. You can’t open the news without being blasted by the hundreds of thousands of cases, and thousands of deaths. I think we tend to judge a particular threat based on how frequently we see it in the news. Globally, diabetes causes over 1.5 million deaths yearly, road traffic deaths exceed 1.2 million, and suicides number nearly 800,000. Coronavirus is at 5,000 now, and growing. Based on exposure in the news, would you have been able to rank these by their burden?

The university and global communities should be laser-focused on containing COVID-19, but I’m disappointed we don’t see more widely publicized efforts to improve road safety or mental health during “normal times.” A fraction of the bandwidth that the coronavirus receives could revolutionize our awareness of issues that burden our world. A fraction of the containment efforts would save thousands of lives.


I used the awesome New York Times Article Search API to search for different keywords relevant to major causes of death. The number of results among NYT articles (since 1851) is plotted against estimates of global deaths, classified by the keyword.

There is absolutely nothing scientific about this. I only wanted to gain an intuition for the scale of disease deaths vs. news presence. (Remember that the coronavirus has only been on the scene for months, while other keywords have had decades to build up news hits.)
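For a sense of what one of those keyword lookups involves, here is a rough Python sketch (the original analysis was done in R, so this is not the code behind the plot). The endpoint and the `q` and `api-key` parameters follow the public Article Search API; the location of the hit count inside the response (`response` → `meta` → `hits`) is my assumption and should be checked against the current API documentation. `build_search_url` and `hit_count` are hypothetical helper names.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"

def build_search_url(keyword, api_key):
    """URL for an Article Search query on a single keyword."""
    return BASE_URL + "?" + urlencode({"q": keyword, "api-key": api_key})

def hit_count(payload):
    """Total number of matching articles in a decoded JSON response.

    Assumes the count sits at response -> meta -> hits; returns None if absent.
    """
    return payload.get("response", {}).get("meta", {}).get("hits")

# "YOUR_KEY" is a placeholder; a real key comes from the NYT developer portal.
url = build_search_url("road traffic", "YOUR_KEY")
```

Fetching `url` for each keyword and passing the decoded JSON to `hit_count` would give one number per cause of death to set against the death estimates.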


Data and R code are here.

Are We Good at Being Random?

One day in high school, my statistics teacher announced an activity: flip a mental coin 25 times (I don’t recall the actual number) and write down the results. Be as random as you can. Then, flip an actual coin 25 times and write down those results too.

Then she walked out of the room. She clearly thought we’d be bad at being random, and so we set about proving her wrong. When she re-entered, every student presented their two sequences of ‘heads’ and ‘tails’:

HTTHHTHHHTHHTTTHTHTH…
HHTHHTTHHHHHTTTHHTTT…

The teacher then strolled around the classroom, stopping at every student’s desk and identifying which sequence was from the true coin and which was from the student’s mind. She did this without fail.

Four years on, this feat still impresses me. How hard is it, really, for humans to be random? And how easily can we tell if randomness is fake? So I sent out this form to some contacts, asking them to submit a sequence of mental coin flips. The theory to test: randomness is more “streaky” than we think. For instance, “HHHHH” is more likely to appear in a truly random sequence of coin flips than in a mental-random sequence.

If this theory is true, the probability of switching between Heads and Tails is higher for humans than for random coins. For the 11 responses I received, the red points below mark each respondent’s probability of switching between Heads and Tails from one mental flip to the next. The white blobs are null distributions: indications of where these dots tend to land after simulating 10,000 random “coins” on the computer. As you can tell, humans tend to have substantially higher rates of H/T switching than random generators.
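Here is a minimal Python sketch of the switching statistic and its simulated null distribution. It is an illustration of the idea rather than the code behind the plot; `switch_rate` and `simulate_null` are hypothetical names, and the example string is simply the first of the two classroom sequences quoted earlier (the post doesn't say which one came from a real coin).

```python
import random

def switch_rate(flips):
    """Fraction of consecutive pairs where the outcome changes (H->T or T->H)."""
    switches = sum(a != b for a, b in zip(flips, flips[1:]))
    return switches / (len(flips) - 1)

def simulate_null(n_flips, n_sims=10_000, seed=0):
    """Switch rates of `n_sims` simulated fair-coin sequences of length `n_flips`."""
    rng = random.Random(seed)
    return [
        switch_rate([rng.choice("HT") for _ in range(n_flips)])
        for _ in range(n_sims)
    ]

example = "HTTHHTHHHTHHTTTHTHTH"
observed = switch_rate(example)
null = simulate_null(len(example))
```

For a fair coin the switch rate hovers around one half, so a sequence sitting well above that is a hint that a human was trying a little too hard to alternate.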

Second, the longest runs of continuous Heads or continuous Tails might be shorter in our sequences than in truly random sequences. Blue dots show the longest continuous run for each of the respondents, while the null distributions are again generated by 10,000 simulated random flips:

Hmm, seems substantially lower. We tend to not report any continuous runs longer than “HHHHH” or “TTTTT”, but real random generators tend to spit out runs of up to 7 or 10 continuous Heads. (And the more total flips you do, the longer your longest sequence might be.)
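The longest-run statistic can be sketched the same way. Again, this is an illustration with hypothetical function names, not the original analysis code; the example string is the second classroom sequence quoted earlier.

```python
from itertools import groupby
import random

def longest_run(flips):
    """Length of the longest streak of identical outcomes."""
    return max(len(list(group)) for _, group in groupby(flips))

def simulate_longest_runs(n_flips, n_sims=10_000, seed=0):
    """Longest runs of `n_sims` simulated fair-coin sequences of length `n_flips`."""
    rng = random.Random(seed)
    return [
        longest_run([rng.choice("HT") for _ in range(n_flips)])
        for _ in range(n_sims)
    ]

example = "HHTHHTTHHHHHTTTHHTTT"
observed = longest_run(example)
null_runs = simulate_longest_runs(len(example))
```

Because the longest run grows with the number of flips, each respondent needs a null distribution simulated at their own sequence length.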

So, next time you’re trying to be random, don’t be afraid of streaks!


Here’s a table of unadjusted one-sided p-values obtained from the simulated null distributions for each respondent:

Respondent Number | # Flips Given | P-Value for Pr(Switch H/T) | P-Value for Longest Run
1                 | 76            | 0.0120                     | 0.0700
2                 | 370           | 0                          | 0
3                 | 424           | 0                          | 0
4                 | 102           | 0.0571                     | 0.1846
5                 | 20            | 0.1816                     | 0.2359
6                 | 65            | 0.0160                     | 0.3565
7                 | 55            | 0                          | 0.1566
8                 | 115           | 0.0061                     | 0.1520
9                 | 62            | 0.7021                     | 0.6307
10                | 68            | 0.0005                     | 0.0895
11                | 196           | 0.0134                     | 0.9787
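A one-sided p-value from a simulated null is just the share of simulated values at least as extreme as the observed one. The sketch below assumes "extreme" means a higher switch rate or a shorter longest run, which matches the theory being tested but is my reading rather than something the table states; `one_sided_p` is a hypothetical name and the null values are toy numbers.

```python
def one_sided_p(observed, null_values, direction):
    """Share of simulated values at least as extreme as `observed`.

    direction="greater" counts values >= observed (e.g. switch rate);
    direction="less" counts values <= observed (e.g. longest run).
    """
    if direction == "greater":
        extreme = sum(v >= observed for v in null_values)
    elif direction == "less":
        extreme = sum(v <= observed for v in null_values)
    else:
        raise ValueError("direction must be 'greater' or 'less'")
    return extreme / len(null_values)

# Toy null distribution, only to show the mechanics.
toy_null = [0.4, 0.5, 0.5, 0.6, 0.7]
p_switch = one_sided_p(0.6, toy_null, "greater")
p_run = one_sided_p(0.4, toy_null, "less")
```

Under this definition, a p-value of 0 only means that none of the simulated sequences was as extreme as the respondent's; with 10,000 simulations, that reads as "smaller than about 1 in 10,000."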

The data and code for this post are available here. If you’d like to add your own data to the pile, you can submit your mental-random flips here.


To see how streaky random coins are, here are the first 5 random sequences I generated:

HHHTHTTTTHHHTHHHTHTH
TTTHHHTHHHTTHHTTTHHT
TTHHHTTHHHTTHHHTTTTH
THHHHHHTTTTTTTTHTHTT
TTTHTHHTHHHHTHTHTTTH

Statistics, or Stories?

Politicians make me queasy. So do self-help books, Twitter wars, linguistics professors, even TED talks. It’s not that they lack value—Twitter excepted, of course—it’s that they rely too heavily on examples and stories to persuade.

Why? It’s effective. As Chip and Dan Heath put it in Made to Stick (one of those self-help books), anecdotes (simple, concrete, and relatable) “have the amazing dual power to simulate and to inspire.” But such a quote isn’t as persuasive as the real examples they offer in the book, like how Steve Denning managed to convince the World Bank senior leadership to restructure the organization by sharing the story of a single Zambian health worker searching for malaria treatment info. Stories are more convincing than statistics.

But with billions of people and anecdotes in the world, you can cherry-pick one for almost any argument you’d like. So for every story Elizabeth Warren tells about an immigrant who lost everything to Trumpian cruelty, I’m sure Donald Trump could find a loyal storyteller who lost everything to immigrants. It breaks my heart that stories outpersuade statistics: in other words, that people are human.

The issue, to me, is that people (a) don’t remember numbers, and (b) distrust statistics. Point (a) is probably an irreversible pillar of human nature that will always sway the persuasion pendulum in favor of stories. For point (b), anecdotes often seem more trustworthy because they happened to real, imaginable people, while we generally don’t know where most of the stats we consume come from. The Heath brothers put it better than I can: “tinkering with statistics provides lucrative employment for untold numbers of issue advocates. Ethically challenged people with lots of analytical smarts can, with enough contortions, make almost any case from a given set of statistics.” But the same thing happens with stories! It seems that the problem with statistics is their deceptive pretense to be fact, while apparently stories don’t start off with this claim in the first place.

Of course, it would be impossible and non-productive to qualify every argument with accurate statistics. I didn’t begin this post with, “Seventy-two percent of politicians make me queasy (standard error of 8.2 percentage-points).” But education in statistics and statistical ethics can make us better listeners and communicators. Our statistics should be prospective and inclusive, and we must have the flexibility to accept and share statistics that don’t support our arguments.

If you’re trying to convince someone, stories are your best bet. But to understand an issue, statistics can be skyscrapers built of story-bricks. For some final advice, I’ll turn to the Heath brothers one last time: “when it comes to statistics, our best advice is to use them as input, not output. Use them to make up your mind on an issue. Don’t make up your mind and then go looking for the numbers to support yourself—that’s asking for temptation and trouble.” But don’t do that with stories, either.


Quotes are cherry-picked from the popular book by Chip and Dan Heath, Made to Stick: Why some ideas take hold and others come unstuck (Penguin Random House, 2007), pages 237 and 147.

Deconstructing Humor using Text Mining on /r/Jokes

I recently turned in a project for an elective class I’m taking on humor where I analyzed a bunch of jokes posted to the /r/Jokes subreddit. I thought that some of the results were applicable to this blog, so I’ll summarize the interesting ones here:

1) Donald Trump jokes are funnier than average

Jokes that included the word “Trump” received an average of 141 more net upvotes than jokes that didn’t involve the name (mean score 258 vs. 117, p=0.0016). Moreover, jokes that used both the word “Trump” and the word “orange” in the same post scored on average 837 more net upvotes than jokes with just “Trump,” but this result wasn’t statistically significant because only 57 jokes used both words (mean score 1074 vs. 237, p=0.27).

For comparison, Obama jokes are not significantly more upvoted than average (mean score 149 vs. 118, p=0.45, n=588).
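The comparison behind numbers like these boils down to splitting jokes by whether they mention a keyword and comparing mean scores. Here is a small sketch of that step only; the jokes and scores are invented placeholders (not rows from the dataset), `mean_score_gap` is a hypothetical name, and the significance test behind the p-values is not reproduced here.

```python
from statistics import mean

def mean_score_gap(jokes, keyword):
    """Mean score of jokes containing `keyword` minus the mean score of the rest.

    `jokes` is a list of (text, score) pairs; matching is a case-insensitive
    substring check, so "trump" would also match "trumpet".
    """
    key = keyword.lower()
    with_kw = [score for text, score in jokes if key in text.lower()]
    without = [score for text, score in jokes if key not in text.lower()]
    return mean(with_kw) - mean(without)

# Invented placeholder data, only to show the mechanics.
toy_jokes = [
    ("Trump walks into a bar", 300),
    ("A chicken crosses the road", 100),
    ("Why did Trump cross the road?", 200),
    ("Knock knock", 50),
]
gap = mean_score_gap(toy_jokes, "Trump")
```

A whole-word match (for example with a regular expression) would be a safer choice on real data than the substring check used here.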

2) Chickens, deer, turkeys, cows, and elephants are the funniest animals

These are chosen by comparing their frequency in jokes against their frequencies in everyday English. Frogs, monkeys, and ducks tend to appear in jokes pretty often as well.

3) “He” and “man” are substantially more common than “she” and “woman,” but “wife” and “girlfriend” appear more than “husband” and “boyfriend”

So men are the subject of more jokes, unless the joke happens to be about a man-woman relationship?

Also, jokes using female-gendered words tend to repeat them more times than jokes using male-gendered words.


The data for this analysis came from Taivo Pungas’ public GitHub repository at https://github.com/taivop/joke-dataset. My code is available here. You can download the report as a PDF too, if you’re interested in other findings or methodological details.