From: <fri...@us...> - 2009-05-25 22:51:11
Revision: 9769
http://zaf.svn.sourceforge.net/zaf/?rev=9769&view=rev
Author: friedelwolff
Date: 2009-05-25 22:50:59 +0000 (Mon, 25 May 2009)
Log Message:
-----------
Some information on the files in this directory
Added Paths:
-----------
trunk/dict/zu/wordlists/README
Added: trunk/dict/zu/wordlists/README
===================================================================
--- trunk/dict/zu/wordlists/README (rev 0)
+++ trunk/dict/zu/wordlists/README 2009-05-25 22:50:59 UTC (rev 9769)
@@ -0,0 +1,36 @@
+This directory contains word lists used to build the spell checker.
+
+There are different types of word lists, and they are of differing quality.
+Lots of improvements are still possible in terms of finer classifications,
+review, etc.
+
+In all the files, lines starting with '#' are considered to be comments, and
+are removed when building the spell checker. See the note below about the
+special comments for classified files.
+
+
+File names in the form of wordlist.something.in are for files containing plain
+words, one per line. Most of these have had little review, and have
+been obtained from various sources. Check each file for details. These files
+are probably best suited for new missing words, unclassified words from new
+corpora, and possibly the categories that won't exhibit rich morphology.
+
+File names in the form classified.something.in are for files that contain
+classified words, one per line. Each file will only contain words of a certain
+part of speech, and sometimes of a much finer category. For example, all class
+1a nouns should be kept separately and not mixed with other class 1 nouns or nouns
+in general. This allows the build process to tag the words correctly for use
+with the spell checker. Word categories without rich morphology can also be put
+in such files - we will probably be thankful later.
+
+Their file names have these components: classified.$POS.$PRIO.in
+
+$POS - the specific, fine-grained part of speech
+$PRIO - the priority of the word list, mostly as a function of frequency or
+ gut feeling
+
+Therefore classified.noun9.1.in contains important/frequently used words classified as being in class 9.
+
+Classified files can contain a special comment of the form
+# flags: ABCx
+to indicate with which affix flags they should be stored in the .dic file.
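
The conventions the README describes (comment lines starting with '#', the
classified.$POS.$PRIO.in naming scheme, and the special "# flags:" comment)
could be handled roughly as in the sketch below. It is not the project's actual
build code. The function names are hypothetical, and the word/FLAGS output
format is an assumption based on the usual Hunspell-style .dic convention,
which the README does not spell out. The sketch also assumes that neither $POS
nor $PRIO contains a dot.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple


class ClassifiedName(NamedTuple):
    pos: str   # fine-grained part of speech, e.g. "noun9"
    prio: str  # priority component, kept as text, e.g. "1"


# Assumption: neither $POS nor $PRIO contains a dot.
_NAME_RE = re.compile(r"^classified\.(?P<pos>[^.]+)\.(?P<prio>[^.]+)\.in$")
# Matches the special comment "# flags: ABCx".
_FLAGS_RE = re.compile(r"^#\s*flags:\s*(?P<flags>\S+)$")


def parse_classified_name(filename: str) -> Optional[ClassifiedName]:
    """Split classified.$POS.$PRIO.in into its parts, or return None."""
    match = _NAME_RE.match(filename)
    if match is None:
        return None
    return ClassifiedName(match.group("pos"), match.group("prio"))


def read_wordlist(lines: Iterable[str]) -> Tuple[List[str], Optional[str]]:
    """Return (words, flags) with comment lines and blank lines removed."""
    words: List[str] = []
    flags: Optional[str] = None
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue
        if line.startswith("#"):
            # A "# flags:" comment is still a comment, but we remember it.
            match = _FLAGS_RE.match(line)
            if match is not None:
                flags = match.group("flags")
            continue
        words.append(line)
    return words, flags


def to_dic_entries(words: Iterable[str], flags: Optional[str]) -> List[str]:
    """Attach flags as word/FLAGS (assumed Hunspell-style .dic entries)."""
    if flags is None:
        return list(words)
    return [f"{word}/{flags}" for word in words]


if __name__ == "__main__":
    # Sample input for illustration only, not taken from the repository.
    sample = ["# some source note\n", "# flags: ABCx\n", "inja\n", "indlu\n"]
    print(parse_classified_name("classified.noun9.1.in"))
    words, flags = read_wordlist(sample)
    print(to_dic_entries(words, flags))
```

Keeping the priority as text avoids assuming that $PRIO is always numeric. The
README only shows the value 1.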