Apr 28, 2014

What is the Most Common Phrase in the English Language?

"Google" peaked in the '40s, thanks to Barney Google cartoons, and again in the 1990s (image used with permission from Google)

You probably already know the most common letter in the English language is the letter e. It accounts for about 12% of all the letters you’re likely to see on a given page (go ahead…count ’em!). On your keyboard, only the venerable space bar gets more of a workout than the letter e.

On a slightly grander scale, the most common word in English is “the” (I’ve used it eight times already), followed by other shorties like “be”, “to”, “of”, “and”, and “a”.

But what about common phrases? Which combinations of words are at the top of the list of the things we write and say? Or, to be more precise, what’s the most popular phrase in modern American English?

I don’t know.

Seriously, I didn’t have a clue. Finding letter frequencies and word frequencies is a ten-second research task, but teasing out phrase frequencies is something else entirely.

First of all, the most common phrase where, exactly? The phrases that are most commonly used in, say, the Wall Street Journal won’t be the same as the high-frequency phrases in Harlequin Romance novels, which are different again from popular phrases used on Facebook. Are we talking about academic writing, spoken English, fiction, consumer magazines, Broadway plays, Fox News? The sources make a big difference. So does the time period — last five years? last five decades? last five centuries?

Secondly, just what is a phrase anyway? Yes, it’s a collection of words, but phrases also carry a certain conceptual completeness. “In the nick of time” is a well-known and frequently used phrase, but we’re less likely to think of “in the nick” as a legitimate phrase, even though this particular combination of words occurs at least as often as its more familiar parent.

Linguists, possibly the nerdiest of nerdy professionals, get around these issues in two ways. They tend to steer clear of phrases, per se, dealing instead with runs of N consecutive words called N-grams. “In the nick of time” is a 5-gram, while “in the nick”, two words shorter, is a 3-gram. They also build linguistic corpora: immense collections, typically of millions or billions of words, from sources selected to represent something, like the corpus of all of Shakespeare’s works, all the words in the Oxford English Dictionary, or the vast collection of verbiage in Google Books.
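If you'd like to see the N-gram idea in action, here is a minimal Python sketch. The function name ngrams is my own invention for illustration, not something from any linguistics toolkit, and it assumes the text has already been split into words.

```python
def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    # Slide a window of width n across the word list, one step at a time.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


phrase = "in the nick of time".split()

# The whole phrase is the only 5-gram it contains.
print(ngrams(phrase, 5))

# It contains three overlapping 3-grams, the first being "in the nick".
print(ngrams(phrase, 3))
```

Notice that every longer phrase carries its shorter N-grams along with it, which is why “in the nick” can never be rarer than “in the nick of time”.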

Happily for us, linguists also build online tools that anyone can use to explore N-grams in various corpora. You can do some pretty cool stuff with Google’s N-gram Viewer for displaying trends in the collection of Google Books (and if you really want to get crazy with it, check out their advanced search features). But to get at the most popular English phrases, I downloaded the database known as COCA, the Corpus of Contemporary American English (why it’s not COCAE is a bit of a mystery), thoughtfully made available by the good linguists at Brigham Young University. COCA is a 450-million-word corpus of written and spoken English from a wide variety of sources, and covers the period 1990-2012. I used the database to crank out some big lists of the highest-frequency 4-grams and 5-grams, and took a look.
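For the curious, the general idea of cranking out a frequency list can be sketched in a few lines of Python. This is not the query I ran against COCA. It is a toy version on a made-up sample text, and the names tokenize, top_ngrams, and sample are all invented for the example. It also ignores sentence boundaries, which a real corpus tool would likely handle more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep runs of letters and apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())


def top_ngrams(text, n, k):
    """Return the k most frequent n-grams in text as (phrase, count) pairs."""
    words = tokenize(text)
    # Zipping n staggered copies of the word list yields each n-word window.
    windows = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(w) for w in windows).most_common(k)


# A made-up sample, standing in for a real corpus.
sample = ("At the same time, the end of the day came. "
          "For the first time, at the same time, we met at the end of the road.")

print(top_ngrams(sample, 4, 2))
```

On this tiny sample the two winners are “at the same time” and “the end of the”, each appearing twice, so the second one already shows how unphraselike word groupings can climb the list.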

An awful lot of the word groupings are familiar but decidedly unphraselike. For example, “the end of the”, “in the middle of”, and “the rest of the” were all high-frequency 4-grams, but they don’t really strike the ear as legitimate phrases.

So I sorted through the list, and picked out the:

=================================================================
TOP TEN MOST COMMON PHRASES IN CONTEMPORARY AMERICAN ENGLISH
=================================================================

10. We don’t know
9. In the first place
8. The New York Times
7. What do you think
6. Thank you very much
5. I don’t want to
4. On the other hand
3. For the first time
2. At the same time

and the Number 1 phrase in the entire English language…

1. I don’t know

=================================================================

See. I told you.

P.S. #8 really surprised me, but the numbers (hopefully) don’t lie. And in case you’re wondering, linguists count contractions as two words, so “I don’t know” is a 4-gram, not a 3-gram; “I don’t want to” is the only 5-gram on the list (the next most common 5-gram is “President of the United States”).
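To make the contraction counting concrete, here is a rough Python sketch. It assumes that a contraction like “don’t” is split into “do” plus “n’t”, a convention used in many tagged corpora; the exact splitting rules in any particular corpus may differ, and the function name tokenize is my own. It only handles the n’t family, not contractions like “I’m” or “we’ll”.

```python
import re


def tokenize(text):
    """Split text into word tokens, breaking n't contractions in two."""
    # Treat the curly apostrophe the same as the straight one.
    text = text.lower().replace("\u2019", "'")
    tokens = []
    for word in re.findall(r"[a-z']+", text):
        if word.endswith("n't") and len(word) > 3:
            # Assumed convention: don't -> do + n't
            tokens.extend([word[:-3], "n't"])
        else:
            tokens.append(word)
    return tokens


print(tokenize("I don\u2019t know"))     # four tokens, so a 4-gram
print(tokenize("I don\u2019t want to"))  # five tokens, so a 5-gram
```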

4 Comments »

  • eiffel says:

    “I don’t know” is in the first place for the first time. On the other hand, at the same time, “We don’t know” is in The New York Times!

    I don’t want to thank you very much. What do you think?

  • David says:

    I’ll have to ask the President of the United States.

  • eiffel says:

    As for “The New York Times” ranking highly on the list, I think this tells us more about the sources of the COCA than about the popularity of the phrase.

  • David says:

    I’m not sure what to make of its popularity. COCA comes from a wide variety of sources. Less than 20% of its 450 million words are from newspapers, and those are a mix of 10 papers (including, of course, the NY Times). I think a lot of people, in a lot of contexts, actually do make reference to “the New York Times”. I haven’t looked for “NY Times” — wonder how that stacks up?
