Re: [Openadaptxt-linguists] Phrases & text
From: Michael B. <fi...@ak...> - 2012-02-08 00:27:56
OK, this took us a while because I had to build a Gaelic corpus.

A question, though: if we just analyze for 2-4 word combos, won't that result in some oddities like "Jimmy said"? Or do you set a fairly high cutoff? The English corpus (the 2-4 word items) doesn't actually look that big for something built on 4MB of text.

Cheers,
Michael

On 20/01/2012 12:20, Jens Christensen wrote:
> The phrases that we added are simply a selection of the most common 2-4 word phrases that appear in the original corpora (it varies a bit from language to language).
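For reference, the kind of extraction discussed here (counting 2-4 word combinations and dropping those below a frequency cutoff) could be sketched roughly as follows. This is only an illustration, not the actual OpenAdaptxt tooling: the function name `common_phrases`, the `min_count` parameter, and the simple regex-based sentence and word splitting are all assumptions made for the example.

```python
import re
from collections import Counter


def common_phrases(text, min_n=2, max_n=4, min_count=3):
    """Count word n-grams of length min_n..max_n and keep the frequent ones.

    Hypothetical helper: the cutoff (min_count) and the tokenisation rules
    are illustrative guesses, not the project's real settings.
    """
    counts = Counter()
    # Split into rough sentences first so no phrase spans a sentence boundary.
    for sentence in re.split(r"[.!?]+", text.lower()):
        # \w is Unicode-aware in Python 3, so accented letters (e.g. Gaelic ì) are kept.
        words = re.findall(r"\w+(?:['’-]\w+)*", sentence)
        for n in range(min_n, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    # The cutoff is what filters out one-off combinations such as "jimmy said".
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


sample = (
    "Jimmy said hello. Thank you very much. "
    "Thank you very much. Thank you very much."
)
phrases = common_phrases(sample, min_count=3)
for phrase, count in sorted(phrases.items()):
    print(count, phrase)
```

With a cutoff like this, a combination that occurs only once or twice in the corpus never reaches the phrase list, which would also explain why the resulting list can look small relative to the size of the source text.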