[infomap-nlp-users] How to add extra linguistic information


Hi

I have downloaded Infomap and I am trying it out.

While reading the documentation describing the algorithm on the
website (http://infomap-nlp.sourceforge.net/doc/algorithm.html), I
found this passage:

"It is at this stage of the preprocessing that the WORDSPACE software  
can incorporate extra linguistic information such as part of speech  
tags and multiword expressions, if these are suitably recorded in the  
corpus."

Unfortunately, no further information is provided.

I would like to exploit exactly this kind of extra linguistic
information, such as part-of-speech tags and multiword expressions.

Currently I am using the single-file format described in
http://infomap-nlp.sourceforge.net/doc/input_formats.html, and it works
well with the raw text corpus.

As a next step, I would like to experiment with feeding Infomap text
that has already been preprocessed (tokenized, lemmatized, POS-tagged
and NER-tagged), so that Infomap skips its own preprocessing and
directly builds the matrix used for the SVD.

How can I do this? What would the matrix look like? Is there a way
for the column dimensions of the matrix to be only the extra
linguistic features, rather than the words?

What does "suitably recorded in the corpus" mean?

Is there a particular input format?
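To make the question about feature-only columns concrete, here is a
minimal sketch (my own illustration, not Infomap code or its input
format) of the kind of matrix I have in mind. It assumes a hypothetical
`word/TAG` token format with multiword expressions joined by
underscores (e.g. `New_York/NNP`), and counts, for each word, the POS
tags of its neighbours within a window of 2 positions, so the columns
are tags instead of words. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def parse_tagged(line):
    """Split hypothetical 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for tok in line.split():
        word, sep, tag = tok.rpartition("/")
        if not sep:  # token carries no tag: keep the word, mark tag unknown
            word, tag = tok, "UNK"
        pairs.append((word.lower(), tag))
    return pairs


def word_by_feature_counts(lines, window=2):
    """For each word, count the tags of neighbours within +/- window."""
    counts = defaultdict(Counter)
    for line in lines:
        pairs = parse_tagged(line)
        for i, (word, _) in enumerate(pairs):
            lo, hi = max(0, i - window), min(len(pairs), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][pairs[j][1]] += 1
    return counts


def to_matrix(counts):
    """Turn the nested counts into (row labels, column labels, dense matrix)."""
    rows = sorted(counts)
    cols = sorted({feat for c in counts.values() for feat in c})
    matrix = [[counts[r][c] for c in cols] for r in rows]
    return rows, cols, matrix


corpus = ["New_York/NNP is/VBZ big/JJ", "the/DT city/NN is/VBZ big/JJ"]
rows, cols, matrix = to_matrix(word_by_feature_counts(corpus))
print(cols)  # ['DT', 'JJ', 'NN', 'NNP', 'VBZ']
for label, row in zip(rows, matrix):
    print(label, row)
```

A matrix like this (words x tags) is what I would then hope to pass to
the SVD step in place of the usual words x content-bearing-words matrix.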

Thank you

Sincerely

Davide