Two years ago, Google released a collection of n-grams drawn from web pages and made it available through the Linguistic Data Consortium's website. "We processed 1,024,908,267,229 words of running text and are publishing the counts for all 1,176,470,663 five-word sequences that appear at least 40 times. There are 13,588,391 unique words, after discarding words that appear...