[infomap-nlp-users] Re: problems in constructing a sentence model


Hi Infomap users,

I have the same problem as montse
(http://sourceforge.net/mailarchive/forum.php?thread_id=8855577&forum_id=37265).
I just subscribed to this list and don't know how to reply to a previous
message.

I have a corpus with >1,000,000 paragraphs that I'd like to index, but I
get the same error message about memory allocation. I could put everything
in a single file (actually I would prefer that), but I'd like to keep the
file names when querying the index. As far as I understand, when building a
model on a single file I only get the byte offset of the document, and then
I have to find the document name myself. Or is there a smarter way of
doing that?
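
To illustrate what "finding the document name myself" could look like, here is a minimal sketch in Python. It does not use any Infomap API: the helper names `concatenate` and `name_for_offset` are hypothetical, and it assumes the byte offset reported for a hit falls somewhere inside the document's span in the combined file.

```python
import bisect
import os

def concatenate(paths, corpus_path):
    """Write all files into one corpus file.

    Returns two parallel lists: the byte offset at which each document
    starts in the combined file, and the original file name.
    """
    offsets, names = [], []
    with open(corpus_path, "wb") as out:
        for path in paths:
            offsets.append(out.tell())           # start of this document
            names.append(os.path.basename(path))
            with open(path, "rb") as src:
                out.write(src.read())
            out.write(b"\n")                     # separator between documents
    return offsets, names

def name_for_offset(offsets, names, offset):
    """Map a byte offset in the combined file back to a file name."""
    i = bisect.bisect_right(offsets, offset) - 1  # last start <= offset
    if i < 0:
        raise ValueError("offset lies before the first document")
    return names[i]
```

The offset table is sorted by construction, so each lookup is a binary search, which should stay cheap even with more than a million documents.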

I'd like to add the file names to the single corpus document in,
let's say, <DOCID>name</DOCID> tags before the actual text of each
document. It would be great if the index builder could pick up these names
and use them when replying to a query instead of giving me the internal
doc IDs with the offset.
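
Until the index builder supports something like this, the tags could be resolved outside Infomap. The following is only a sketch under assumptions: `docid_table` and `name_at` are hypothetical helpers, the <DOCID> tag is the one proposed above (not an existing Infomap feature), and a reported offset is assumed to point at or after the tag of the document it belongs to.

```python
import re

# The <DOCID>name</DOCID> tag proposed above, matched on raw bytes so that
# match positions are byte offsets into the corpus file.
DOCID_RE = re.compile(rb"<DOCID>(.*?)</DOCID>")

def docid_table(corpus_bytes):
    """Return (byte_offset, name) for every <DOCID> tag, in file order."""
    return [(m.start(), m.group(1).decode("utf-8"))
            for m in DOCID_RE.finditer(corpus_bytes)]

def name_at(table, offset):
    """Return the name from the last tag starting at or before offset."""
    name = None
    for start, doc_name in table:
        if start > offset:
            break
        name = doc_name
    return name
```

For example, with the corpus bytes `b"<DOCID>a.txt</DOCID>\nfirst text\n<DOCID>b.txt</DOCID>\nsecond text\n"`, an offset inside "second text" resolves to `b.txt`.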

Would that be easy to implement? (I haven't checked the source code yet.)
Thanks in advance for your reply!

best,

Jörg

***********/\/\/\/\/\/\/\/\/\/\/\************************************
**  Jörg Tiedemann                   tie...@le...             **
**  Alfa-Informatica               http://www.let.rug.nl/~tiedeman **
**  Rijksuniversiteit Groningen     Harmoniegebouw, room 1311-429  **
**  Oude Kijk in 't Jatstraat 26    phone: +31 (0)50-363 5935      **
**  9712 EK Groningen               fax:   +31 (0)50-363 6855      **
*************************************/\/\/\/\/\/\/\/\/\/\/\**********