From: David H. <dl...@st...> - 2006-07-15 01:33:04
On 7/14/06, Tonio Wandmacher <ton...@un...> wrote:
> first of all: Thank you very much for making your software available. It
> spares me a lot of work!
> I have two questions concerning the model construction.
>
> 1. Is there a switch to allow upper case characters without modifying the
> code? Even though I have "A-Z" included in valid.chars list, all words are
> transformed to lower case.

Before I say anything: despite my email address, I'm not affiliated with this project, so take everything I say with a grain of salt.

That said, if you look at preprocessing/tokenizer.c:249 (in some version...), you'll see a call to "tolower". Simply remove that call, and that should help you out.

Devs, if you read this, would there be interest in making this a switch? I can try to write up a patch...

>
> 2. Is there a reason why the maximum number of columns is set to 2999 ? I
> would like to try out squared termxterm matrices, as used by Reinhard Rapp
> (2003), who got excellent results in the TOEFL synonym test.
> Does anyone have experiences about the optimal number of columns?
>

Have you looked at the default.params file? Beyond that, I don't know exactly what's going on here.

HTH,
David
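P.S. In case it helps whoever writes the patch, here is a minimal sketch of what such a switch might look like. `normalize_token` and `preserve_case` are hypothetical names I made up for illustration; they are not taken from tokenizer.c, and I haven't checked how the real code is structured around the "tolower" call.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of the actual tokenizer.
   preserve_case != 0: keep tokens exactly as they appear in the corpus.
   preserve_case == 0: fold to lower case, i.e. the current behaviour. */
void normalize_token(char *tok, int preserve_case)
{
    assert(tok != NULL);
    if (preserve_case)
        return;                 /* leave A-Z untouched */
    for (size_t i = 0; tok[i] != '\0'; i++)
        tok[i] = (char) tolower((unsigned char) tok[i]);
}
```

How the flag would actually be set (a command-line option, an entry in a params file, etc.) is up to the devs; I'm only assuming it would be read once at startup and passed down to wherever tokens are normalized.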