From: <MS...@FS...> - 2005-02-08 10:25:38
|
"There is no limit to the number of neighbors. You'll simply have to change= =20 MAX_NEIGHBOR and PRINT_NEIGHBOR in associate.h to a bigger number and use= =20 associate -n <big_number>." Got it, thanks ! > Another, unrelated question that should be easily answered then. If I=20 > recall correctly, some words in infomap are not taken into account,=20 > for example, the word 'the' does not hold semantic content and is=20 > therefore not taken into consideration in computing correspondences.=20 > Am I right there? Since I am trying to get this program to work using=20 > Dutch documents, I was wondering which info-map file holds the words=20 > that are ignored in the computation, so it would be more easy to adapt=20 > it to the current languange. "It's "stop.list" in the admin directory which contains the words to be=20 disregarded. If you want to use a different set of stopwords, you can=20 simply set STOPLIST_FILE in "default-params.in" to point to a different=20 stoplist, the format of which should be such that each line contains=20 exactly one stopword." I see. Taking a look at the stop.list right now, I must say it looks rather= extensive. Could you or anyone else tell me what exactly the reasons are s= ome words are in this stoplist and others not? Several words, such as 'arti= cles', 'americans', 'training', etc, would seem to hold semantic content bu= t appear here nevertheless. Is there any literature why these words are in = the list, or could the underlying reasoning be explained? Thanks, Mich (strolling the internet right now looking for Dutch books - the Guthen= berg project, although having an archive of quite a few of these, seems to = be quite outdated in terms of spelling changes that arose over the last two= centuries). ********************************************************************** This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. 
If you have received this email in error please notify the system manager. ********************************************************************** |
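[Editor's note: the stoplist format described above is simply one stopword per line. As a minimal illustration of how such a file can be read and applied, here is a hedged Python sketch; the file name "stop.list.nl" and the tiny Dutch word list are invented for the example and are not part of the infomap distribution.]

```python
# Sketch: load a stoplist in the format infomap expects
# (exactly one stopword per line) and filter a token stream.
# The file name and contents below are hypothetical.

def load_stoplist(path):
    """Read one stopword per line; skip blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def remove_stopwords(tokens, stoplist):
    """Drop any token that appears in the stoplist."""
    return [t for t in tokens if t.lower() not in stoplist]

if __name__ == "__main__":
    # Write a tiny Dutch-style stoplist inline for the demo.
    with open("stop.list.nl", "w", encoding="utf-8") as f:
        f.write("de\nhet\neen\nen\nvan\n")

    stoplist = load_stoplist("stop.list.nl")
    tokens = "de kat van de buren en het boek".split()
    print(remove_stopwords(tokens, stoplist))  # ['kat', 'buren', 'boek']
```

Pointing STOPLIST_FILE in "default-params.in" at a file in this one-word-per-line shape is all the adaptation a new language should need.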
From: <lei...@gm...> - 2005-02-08 12:32:32
|
Hi Mich,

if you create a stoplist for a specific domain, some of the normally content-bearing words that are too common in that domain should be included in the stop list. I supervised a master's thesis about IR once, in which the students worked with the intranet at Volvo. Their stoplist included words like 'car' and 'Volvo' to get better search results. As long as the results do not get worse, it can be a good idea to include high-frequency words in the stop list.

Regards,
Leif

On Tue, 8 Feb 2005 11:25:15 +0100, Spapé, Michiel <MS...@fs...> wrote:
> [...]

--=20
Leif Grönqvist, GSLT, le...@li..., www.ling.gu.se/~leifg, 031-821515 (home)
School of Mathematics and Systems Engineering, Växjö University, 0707164380 (mob)
Department of Linguistics, Göteborg University, +46 31 773 1177, 773 4853 (fax)
|
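[Editor's note: the domain-specific approach Leif describes can be semi-automated by counting word frequencies over the target collection and reviewing the top words by hand. A hedged Python sketch follows; the toy documents and the cutoff are made up for illustration and any candidates it produces would still need human inspection before going into stop.list.]

```python
# Sketch: propose stoplist candidates for a specific domain by raw
# frequency, echoing the Volvo-intranet example. The corpus below is
# invented; in practice you would count over your real documents.

from collections import Counter

def stoplist_candidates(documents, top_n=3):
    """Return the top_n most frequent words across the documents."""
    counts = Counter(word.lower() for doc in documents for word in doc.split())
    return [word for word, _ in counts.most_common(top_n)]

if __name__ == "__main__":
    docs = [
        "volvo car safety report",
        "volvo truck engine car",
        "car maintenance volvo manual",
    ]
    # 'volvo' and 'car' dominate this toy intranet: plausible stoplist
    # candidates even though they carry content in general text.
    print(stoplist_candidates(docs, top_n=2))
```

Whether such words actually belong in the stoplist is an empirical question, as Leif notes: add them only if retrieval results do not get worse.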