From: Ted P. <tpederse@d.umn.edu> - 2008-07-23 19:35:34
Thanks Bridget,

This all sounds good!

Ted

On Wed, Jul 23, 2008 at 12:15 PM, Bridget Thomson McInnes <bth...@cs...> wrote:
> Hi Ted,
>
>> Do we have any way to convert the text to lower case? If not that's
>> fairly easy to do, but couldn't remember if that was something we had
>> included or not...
>>
>
> We do. I am sorry it didn't get into the supervised-disambiguate
> documentation. It is the --lc option.
>
>> Also, in the help for supervised-disambiguate.pl - there are two
>> entries for --mesh, there isn't a space under the --wekacv entry,
>> and a few of the left hand margins are slightly off (some indented
>> once, others not at all, something like that...) The entry for
>> --cv has a kind of funny line break in it...--nokey seems to have a
>> repeated word (This) Very very minor cosmetic issues.
>>
>
> Thanks! I made the changes to these in the documentation.
>
>> Slightly (but just barely) more substantial issues in the --help are
>> that I don't think it makes it clear that SOURCE can either be a
>> directory containing lots of .mm files or a single .mm file...and it
>> also says that at least one feature must be selected - however,
>> it will run ok with just the SOURCE indicated, so we should probably
>> make it clear what will happen if you just give it SOURCE.
>> --version shows copyright as 2007 but we probably want that to be
>> 2007-2008 at this point, and you should list yourself as
>> first author on copyright and other stuff...
>>
>
> I fixed the copyright in the version number.
>
> And added to the perldoc and help
>
>     If no feature option is chosen, the default feature setting
>     is used:
>
>         --ngramcount "--ngram 1 --frequency 2"
>
> in order to be clear that no feature option is required to run the
> supervised-disambiguate.pl program.
>
> I also A directory containing the CuiTools xml-like .mm formatted
> training file(s) or a single file in the CuiTools xml-like
> .mm format.
>
> I modified the help and the perldoc description to be more
> clear that SOURCE can be either a directory containing the
> .mm files or a single .mm file.
>
> Here is what I wrote:
>
>     This is a wrapper program for supervised WSD using CuiTools
>     programs. It takes as input (SOURCE) a directory containing
>     files in the CuiTools xml-like .mm format or a single file
>     in the xml-like .mm format. The program extracts specified
>     features and trains/tests a classifier using the WEKA data
>     mining package. The overall results are stored in a file
>     called overallResults located in the results directory. If
>     no feature option is chosen, the default feature setting is
>     used: --ngramcount "--ngram 1 --frequency 2"
>
> I hope this makes the documentation clearer.
>
> Thanks!
>
> Bridget
>

--
Ted Pedersen
http://www.d.umn.edu/~tpederse