Re: [infomap-nlp-users] Upper case chars / Limitation of columns


On 7/14/06, Tonio Wandmacher <ton...@un...> wrote:
> first of all: Thank you very much for making your software available. It
> spares me a lot of work!
> I have two questions concerning the model construction.
>
> 1. Is there a switch to allow upper case characters without modifying the
> code? Even though I have "A-Z" included in valid.chars list, all words are
> transformed to lower case.

Before I say anything: despite my email address, I'm not affiliated
with this project, so take everything I say with a grain of salt. That
said, if you look at

preprocessing/tokenizer.c:249  (in some version...) you'll see a call
to "tolower". Simply remove that call, and that should help you out.
Devs, if you read this, would there be interest in making this a
switch? I can try to write up a patch...
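
For what it's worth, here is a rough sketch of what such a switch might
look like. This is not the actual Infomap code: the function name
normalize_token and the fold_case flag are made up for illustration, and
I'm assuming the tokenizer has a NUL-terminated token buffer it can
modify in place.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of Infomap: lowercases tok in place
 * only when fold_case is nonzero; otherwise leaves the token as-is. */
void normalize_token(char *tok, int fold_case)
{
    if (tok == NULL || !fold_case)
        return;

    for (; *tok != '\0'; ++tok) {
        /* Cast to unsigned char: passing a negative char value to
         * tolower() is undefined behavior. */
        *tok = (char)tolower((unsigned char)*tok);
    }
}
```

With fold_case set to 0, a token like "Hello" would pass through
unchanged; with 1 it would become "hello", which matches the current
behavior you're seeing.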

>
> 2. Is there a reason why the maximum number of columns is set to 2999 ? I
> would like to try out squared termxterm matrices, as used by Reinhard Rapp
> (2003), who got excellent results in the TOEFL synonym test.
> Does anyone have experiences about the optimal number of columns?
>

Have you looked at the default.params file? It may be worth checking
whether the column count is set there. Beyond that, I don't know
exactly what's going on here.

HTH,
David