Wow, it’s been busy lately! May and I are in the middle of moving across the country to Berkeley, California, where I’ll start a new job in an entirely new industry. We’re also finalizing May’s immigration paperwork, while I’m trying to get a last paper out the door to close out my public health research. Amid everything, May and I have decided to become our own cruciverbalists of sorts, launching a new website – crossworthy.net – where anyone can play our original crossword puzzles!
We’ve been interested in crosswords for a while now, and May even produced a new mini (5×5) crossword for every day in the month of May. I became invested in the project when I realized how fascinatingly difficult it was to create crossword boards. Minis were hard enough, but “Midis” (7×7) and full-size boards (15×15) seemed nigh impossible.
Take this average, empty crossword grid, for instance:
There are a couple of 13-letter words and a couple of 12-letter words to fill, so this definitely isn’t a trivial board. And in the large corpus of words I gathered from several sources (dictionaries, phrases, celebrity names, etc.), there are 1,223 three-letter words, 2,043 four-letter words, 2,734 five-letter words, and so on… meaning there are about 1.7 × 10¹⁰⁰ possible ways to arrange the horizontal words alone, or more than the number of atoms in the universe (around 10⁸⁰, apparently).
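To make that counting concrete: if each horizontal slot is treated as an independent pick from the words of the right length, the number of arrangements is just the product of the per-slot word counts. Here is a minimal sketch of that arithmetic. The three word counts are the ones quoted above, but the slot lengths are made up for illustration (they are not the grid pictured), and the function name is hypothetical.

```python
import math

# Word counts by length, taken from the corpus figures quoted above.
WORDS_BY_LENGTH = {3: 1223, 4: 2043, 5: 2734}


def horizontal_fill_count(slot_lengths, words_by_length=WORDS_BY_LENGTH):
    """Multiply together the number of candidate words for each across slot."""
    return math.prod(words_by_length[n] for n in slot_lengths)


# Hypothetical across slots (NOT the real grid): one 3-, one 4-, one 5-letter slot.
count = horizontal_fill_count([3, 4, 5])
print(f"{count:,} arrangements, or roughly 10^{math.log10(count):.1f}")
```

Even three short slots already give billions of combinations, which is why a full 15×15 grid lands at such an absurd number.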
Given that only a relative handful of these would also give sensical vertical words, the chances of filling a proper crossword board seem pretty slim. It makes sense to start by inserting words in the hardest spots – otherwise, by the time we get to them, we may be completely out of luck. Most professionals will fill in the longest words first, then build around them. But, as it happens, my corpus has fewer 3-letter words than words of those longer lengths, so the first (and “hardest”, in a sense) word I’d choose to fill is just:
short for “cascading style sheets,” a ubiquitous web design language.
Why did I choose CSS as opposed to HAM, PJS, or another of the 1,223 three-letter words available? As it turns out, using CSS means that the three vertical, intersecting columns can still be filled by 823 different words in the database, more than any of the other 1,222 entries would leave.
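For the curious, here is roughly what that scoring step could look like in Python. This is a minimal sketch, not the actual code behind the site: the tiny word list, the `fits`, `crossing_score`, and `best_word` names, and the way crossings are represented are all assumptions made up for illustration.

```python
# Toy word list (an assumption; the real corpus is far larger).
WORDS = ["CSS", "HAM", "CAT", "COD", "SEA", "SUN", "SKI", "HEN", "ANT", "MUD"]


def fits(word, pattern):
    """True if word matches a pattern like 'C__' ('_' marks an empty square)."""
    return len(word) == len(pattern) and all(
        p in ("_", c) for p, c in zip(pattern, word)
    )


def crossing_score(word, crossings, words):
    """Total number of words that could still fill the crossing slots.

    crossings[i] is (pattern, pos): the slot crossing letter i of `word`,
    and the position inside that slot where the two intersect.
    """
    total = 0
    for letter, (pattern, pos) in zip(word, crossings):
        filled = pattern[:pos] + letter + pattern[pos + 1:]
        total += sum(fits(w, filled) for w in words)
    return total


def best_word(slot_pattern, crossings, words):
    """Pick the candidate that leaves the most options for its crossings."""
    candidates = [w for w in words if fits(w, slot_pattern)]
    return max(candidates, key=lambda w: crossing_score(w, crossings, words))


# A 3-letter across slot whose letters each start an empty 3-letter down slot.
CROSSINGS = [("___", 0), ("___", 0), ("___", 0)]
print(best_word("___", CROSSINGS, WORDS))  # CSS
```

In this toy list, CSS leaves 9 options for the down slots while HAM leaves only 4, so CSS wins. One thing a plain total hides is a crossing with zero matches, which is a dead end no matter how healthy the other crossings look.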
Now, the word-slot with the fewest remaining possibilities is 5-Down, or “C _ _ _”: only 147 words match this pattern. So the process repeats, and we write in:
because this affords the 3 empty, horizontal “cross”-words 509 total possibilities, more than any other option.
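The “fewest remaining possibilities” rule for choosing which slot to fill next can be sketched the same way. Again, this is only an illustration under assumptions: the slot names, the toy word list, and the `most_constrained` helper are hypothetical, and a partially filled slot is written as a pattern with `_` for each empty square.

```python
# Toy word list (an assumption, standing in for the real corpus).
WORDS = ["CHAT", "CODE", "SOAP", "SALT", "STEM", "SPIN"]


def fits(word, pattern):
    """True if word matches a pattern like 'C___' ('_' marks an empty square)."""
    return len(word) == len(pattern) and all(
        p in ("_", c) for p, c in zip(pattern, word)
    )


def most_constrained(slots, words):
    """Return the name of the unfilled slot with the fewest matching words."""
    open_slots = {name: pat for name, pat in slots.items() if "_" in pat}
    return min(
        open_slots,
        key=lambda name: sum(fits(w, open_slots[name]) for w in words),
    )


# State of a toy board after CSS has been written across the top.
slots = {
    "1-Across": "CSS",   # already filled, so it is skipped
    "5-Down": "C___",
    "6-Down": "S___",
    "7-Down": "S___",
}
print(most_constrained(slots, WORDS))  # 5-Down
```

Alternating between the two steps (pick the tightest slot, then pick the word that keeps its crossings most open) gives the greedy loop described here.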
Continuing in this vein, we can automate this algorithm pretty easily – with a few nuances, like giving more weight to longer words, thematic words, etc. Unfortunately, even with some extra optimizations after the fill finishes, the whole process only works about 2% of the time. The other 98% of attempts get stuck with a few word-slots that just can’t be filled (see the histogram below). The whole process, coded up in Python with those additional optimizations tacked onto the end, takes about 10 seconds per attempted board, which works out to an average of one good board every 8 minutes of running time or so.
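That “8 minutes” figure is just the time per attempt divided by the success rate. A quick check, assuming every attempt is independent and has the same odds of succeeding (the function name is made up for this sketch):

```python
def minutes_per_good_board(seconds_per_attempt, success_rate):
    """Average running time per successful board.

    With independent attempts, the mean number of tries before a success
    is 1 / success_rate, so the mean time is seconds_per_attempt / success_rate.
    """
    return seconds_per_attempt / success_rate / 60


print(round(minutes_per_good_board(10, 0.02), 1))  # 8.3
```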
Then, of course, comes the clue-writing, and then, of course, the playing!
We would love to hear feedback on any of the boards or clues we generate, suggestions, and anything else! We’ve got a special email, email@example.com, where you can send any thoughts, or, as always, you can get in touch by any other method.