You probably already know the most common letter in the English language is the letter e. It accounts for about 12% of all the letters you’re likely to see on a given page (go ahead…count ’em!). On your keyboard, only the venerable space bar gets more of a workout than the letter e.
On a slightly grander scale, the most common word in English is the (I’ve used it eight times already), followed by other shorties like be, to, of, and, and a.
But what about common phrases? What combinations of words are at the top of the list of things we write and say? Or, to be more precise, what’s the most popular phrase in modern American English?
I don’t know.
First of all, the most common phrase where, exactly? The phrases that are most commonly used in, say, the Wall Street Journal won’t be the same as the high-frequency phrases in Harlequin Romance novels, which are different again from popular phrases used on Facebook. Are we talking about academic writing, spoken English, fiction, consumer magazines, Broadway plays, Fox News? The sources make a big difference. So does the time period — last five years? last five decades? last five centuries?
Secondly, just what is a phrase anyway? Yes, it’s a collection of words, but phrases also carry a certain conceptual completeness. In the nick of time is a well-known and frequently used phrase, but we’re less likely to think of in the nick as a legitimate phrase, even though this particular combination of words occurs at least as often as its more familiar parent (every in the nick of time contains one).
Linguists, possibly the nerdiest of nerdy professionals, get around these issues in two ways. They tend to steer clear of phrases, per se, dealing instead with collections of words called N-grams. In the nick of time is a 5-gram, while in the nick, two words shorter, is a 3-gram. They also build linguistic corpora: immense collections, typically of millions or billions of words, from sources selected to represent something, like the corpus of all of Shakespeare’s works, all the words in the Oxford English Dictionary, or the vast collection of verbiage in Google Books.
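To make the N-gram idea concrete, here’s a minimal Python sketch that slides a window of n words across a sentence. The ngrams helper and the sample sentence are hypothetical illustrations. Splitting on whitespace is also a simplifying assumption, because real corpora use more careful tokenizers.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple (a sliding window)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical sample sentence, tokenized naively by splitting on spaces.
tokens = "he arrived in the nick of time".split()

five_grams = ngrams(tokens, 5)   # 7 words yield 7 - 5 + 1 = 3 five-grams
three_grams = ngrams(tokens, 3)

print(five_grams[-1])                        # ('in', 'the', 'nick', 'of', 'time')
print(("in", "the", "nick") in three_grams)  # True: the 3-gram sits inside the 5-gram
```

Because every window of five words contains windows of three, a shorter N-gram can never be rarer than a longer one that contains it.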
Happily for us, linguists also build online tools that anyone can use to explore N-grams in various corpora. You can do some pretty cool stuff with Google’s N-gram Viewer for displaying trends in the collection of Google Books (and if you really want to get crazy with it, check out their advanced search features). But to get at the most popular English phrases, I downloaded the database known as COCA, the Corpus of Contemporary American English (why it’s not COCAE is a bit of a mystery), thoughtfully made available by the good linguists at Brigham Young University. COCA is a 450-million-word corpus of written and spoken English from a wide variety of sources, and it covers the period from 1990 to 2012. I used the database to crank out some big lists of the highest-frequency 4-grams and 5-grams, and took a look.
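The post doesn’t say exactly how those frequency lists were produced. What follows is only a minimal Python sketch of the general idea, and it assumes the text has already been split into lowercase word tokens. The top_ngrams function and the toy sentence are hypothetical stand-ins, not COCA’s actual tools or data.

```python
from collections import Counter

def top_ngrams(tokens, n, k):
    """Count every n-gram in a token list and return the k most frequent."""
    # zip over n staggered copies of the list yields each sliding window once
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(windows).most_common(k)

# Hypothetical toy "corpus" standing in for the real thing.
text = ("at the same time on the other hand "
        "at the same time for the first time "
        "at the same time")
tokens = text.split()

for gram, count in top_ngrams(tokens, 4, 1):
    print(count, " ".join(gram))   # 3 at the same time
```

Counter.most_common returns (item, count) pairs sorted from most to least frequent. Run on a real corpus, a ranking like this also surfaces fragments such as the end of the, which is why some hand-sorting is still needed.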
An awful lot of the word groupings are familiar but decidedly unphraselike. For example, the end of the, in the middle of, and the rest of the were all high-frequency 4-grams, but they don’t really strike the ear as legitimate phrases.
So I sorted through the list and picked out the:
TOP TEN MOST COMMON PHRASES IN CONTEMPORARY AMERICAN ENGLISH
10. We don’t know
9. In the first place
8. The New York Times
7. What do you think
6. Thank you very much
5. I don’t want to
4. On the other hand
3. For the first time
2. At the same time
and the Number 1 phrase in the entire English language…
1. I don’t know
See. I told you.
P.S. #8 really surprised me, but the numbers (hopefully) don’t lie. And in case you’re wondering, linguists count contractions as two words, so I don’t know is a 4-gram, not a 3-gram; I don’t want to is the only 5-gram on the list (the next most common 5-gram is President of the United States).
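The contraction rule explains why I don’t want to counts as five words. Here is a Python sketch of one common convention, the Penn Treebank style, which splits don’t into do + n’t. The tokenize helper and its regular expression are assumptions for illustration, and they handle only a few contraction endings. This is not COCA’s actual tokenizer.

```python
import re

# Hypothetical pattern: split "n't" and a few other clitics off the word they
# attach to, so "don't" becomes "do" + "n't" (Treebank-style convention).
CLITIC = re.compile(r"(\w+)(n't|'ll|'re|'ve|'m|'s|'d)\b")

def tokenize(text):
    """Lowercase, normalize curly apostrophes, split contractions, split on spaces."""
    text = text.lower().replace("\u2019", "'")
    text = CLITIC.sub(r"\1 \2", text)
    return text.split()

print(tokenize("I don\u2019t know"))      # ['i', 'do', "n't", 'know']
print(len(tokenize("I don't want to")))   # 5
```

With this convention, I don’t know comes out as a 4-gram and I don’t want to as a 5-gram, which matches the counts above.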