LDC Catalog - Web 1T 5-gram Version 1 - Dave's Blog

Search
My timeline on Mastodon

LDC Catalog - Web 1T 5-gram Version 1

2009 Mar 16, 4:22"This data set, contributed by Google Inc., contains English word n-grams and their observed frequency counts. The length of the n-grams ranges from unigrams (single words) to five-grams. We expect this data will be useful for statistical language modeling, e.g., for machine translation or speech recognition, as well as for other uses." 6 DVDs for only $150 with licensing restri... ok nm.PermalinkCommentslanguage google statistics database text
Creative Commons License Some rights reserved.