Deconstructing Humor using Text Mining on /r/Jokes

I recently turned in a project for an elective class on humor, in which I analyzed a bunch of jokes posted to the /r/Jokes subreddit. Some of the results seemed relevant to this blog, so I’ll summarize the most interesting ones here:

1) Donald Trump jokes are funnier than average

Jokes that included the word “Trump” received an average of 141 more net upvotes than jokes that didn’t mention the name (mean score 258 vs. 117, p=0.0016). Moreover, jokes that used both “Trump” and “orange” in the same post scored on average 837 more net upvotes than jokes with just “Trump” (mean score 1074 vs. 237, p=0.27). However, this difference wasn’t statistically significant, likely because only 57 jokes used both words.

For comparison, Obama jokes are not significantly more upvoted than average (mean score 149 vs. 118, p=0.45, n=588).
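The post doesn’t say which significance test produced these p-values. As one hedged illustration, the comparison could be sketched as a two-sided permutation test on the difference in mean score between jokes that mention a word and jokes that don’t. The helper names, the whole-word regex matching, and the toy jokes below are all assumptions for illustration. They are not the original analysis code or data.

```python
import random
import re
from statistics import mean

def contains_word(text, word):
    """True if `word` appears as a whole word in `text` (case-insensitive)."""
    return re.search(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE) is not None

def split_by_word(jokes, word):
    """Split (text, score) pairs into scores with and without `word`."""
    with_word = [s for t, s in jokes if contains_word(t, word)]
    without = [s for t, s in jokes if not contains_word(t, word)]
    return with_word, without

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference of means.

    Returns (observed difference in means, approximate p-value).
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign scores to the two groups
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (extreme + 1) / (n_permutations + 1)

# Hypothetical toy data: (joke text, net upvotes).
jokes = [
    ("Why did Trump cross the road?", 300),
    ("Trump walks into a bar...", 250),
    ("A man walks into a bar...", 100),
    ("Knock knock. Who's there?", 120),
    ("What do you call a fish with no eyes?", 90),
]
trump, other = split_by_word(jokes, "Trump")
diff, p = permutation_test(trump, other)
print(f"mean {mean(trump):.0f} vs. {mean(other):.0f}, diff {diff:.0f}, p={p:.3f}")
```

A Welch’s t-test (for example, `scipy.stats.ttest_ind` with `equal_var=False`) would be a common alternative. A permutation test avoids assuming normally distributed scores, which matters because upvote counts tend to be heavily skewed.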

2) Chickens, deer, turkeys, cows, and elephants are the funniest animals

These were chosen by comparing each animal’s frequency in the jokes against its frequency in everyday English. Frogs, monkeys, and ducks also appear in jokes fairly often.
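The post doesn’t specify the exact metric or reference corpus. A minimal sketch, assuming the comparison is a ratio of relative frequencies (share of all words in the jokes divided by share of all words in a reference corpus), could look like the code below. The reference counts are made-up toy numbers standing in for an everyday-English corpus. The tokenizer is a simple regex with no lemmatization, so “chickens” would not be merged with “chicken.”

```python
import re
from collections import Counter

def word_counts(texts):
    """Count lowercase word tokens across a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def overrepresentation(words, corpus_counts, reference_counts):
    """Ratio of each word's relative frequency in the corpus to its relative
    frequency in a reference corpus (>1 means over-represented), sorted
    from most to least over-represented."""
    corpus_total = sum(corpus_counts.values())
    reference_total = sum(reference_counts.values())
    ratios = {}
    for w in words:
        ref = reference_counts.get(w, 0)
        if ref == 0:
            continue  # ratio is undefined without a reference frequency
        ratios[w] = (corpus_counts[w] / corpus_total) / (ref / reference_total)
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical toy jokes and toy reference counts.
jokes = [
    "Why did the chicken cross the road?",
    "A chicken and a duck walk into a bar.",
    "What do you call a deer with no eyes? No idea.",
]
reference = Counter({"the": 5000, "a": 4000, "chicken": 2, "duck": 2, "deer": 3, "dog": 10})
animals = ["chicken", "duck", "deer", "dog"]
result = overrepresentation(animals, word_counts(jokes), reference)
print(result)
```

With a real corpus, very rare words can produce inflated ratios from a handful of occurrences. A minimum-count cutoff or a smoothed estimate is a common safeguard.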

3) “He” and “man” are substantially more common than “she” and “woman,” but “wife” and “girlfriend” appear more often than “husband” and “boyfriend”

So men are the subject of more jokes, unless the joke happens to be about a man-woman relationship?

Also, jokes using female-gendered words tend to repeat them more times than jokes using male-gendered words.
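This section involves two kinds of numbers: how many jokes use a given word, and how many times that word repeats within the jokes that use it. A sketch of how both might be computed is below. The regex tokenizer, the `usage_stats` helper, and the toy jokes are assumptions, since the post doesn’t describe how words were counted. Matching whole tokens keeps “he” from being counted inside “the” or “she.”

```python
import re
from statistics import mean

def tokens(text):
    """Lowercase word tokens; whole-token matching avoids substring hits."""
    return re.findall(r"[a-z']+", text.lower())

def usage_stats(jokes, word):
    """Return (number of jokes containing `word`,
    average occurrences of `word` among those jokes)."""
    per_joke = [tokens(j).count(word) for j in jokes]
    using = [c for c in per_joke if c > 0]
    return len(using), (mean(using) if using else 0.0)

# Hypothetical toy jokes.
jokes = [
    "A man walks into a bar. The man orders a drink.",
    "My wife said she wanted space, so she moved to Mars.",
    "He said nothing.",
]
for w in ["he", "she", "man", "wife", "husband"]:
    print(w, usage_stats(jokes, w))
```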

The data for this analysis came from Taivo Pungas’ public GitHub repository. My code is available here, and you can also download the report as a PDF if you’re interested in other findings or methodological details.