From: Joerg T. <tie...@le...> - 2006-02-14 14:33:54
Hi infomap users,

I have the same problem as montse
(http://sourceforge.net/mailarchive/forum.php?thread_id=8855577&forum_id=37265).
I just subscribed to this list and don't know how to reply to a previous
message.

I have a corpus with >1,000,000 paragraphs that I'd like to index, but I
get the same error message about memory allocation. I could put everything
in a single file (actually I would prefer that), but I'd like to keep the
file names when querying the index. As far as I understand, I only get the
byte offset of the document when building a model on a single file, and then
I have to find the document name myself. Or is there a smart way of
doing that?

I'd like to add the file names to the single corpus document in,
let's say, <DOCID>name</DOCID> tags before the actual text of each
document. It would be great if the index builder could take these names and
use them when replying to a query instead of giving me the internal
doc IDs with the offset. Is that maybe easy to implement? (I haven't
checked the source code yet.)

Thanks in advance for your reply!

Best,
Jörg

***********/\/\/\/\/\/\/\/\/\/\/\************************************
** Jörg Tiedemann                tie...@le...                      **
** Alfa-Informatica              http://www.let.rug.nl/~tiedeman   **
** Rijksuniversiteit Groningen   Harmoniegebouw, room 1311-429     **
** Oude Kijk in 't Jatstraat 26  phone: +31 (0)50-363 5935         **
** 9712 EK Groningen             fax:   +31 (0)50-363 6855         **
*************************************/\/\/\/\/\/\/\/\/\/\/\**********
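
The workaround described in the message (one corpus file, with each document's name recovered from its byte offset) could be sketched roughly as follows. This is only an illustration, not part of Infomap: the functions `build_corpus` and `name_for_offset` are hypothetical helpers, the `<DOCID>` line layout is the one proposed in the message rather than a format the index builder is known to accept, and the sketch assumes that a query result gives a byte offset into the single corpus file.

```python
import bisect
import os


def build_corpus(paths, corpus_path):
    """Concatenate the files in `paths` into one corpus file.

    Each document is preceded by a <DOCID>name</DOCID> line (the tag
    proposed in the message; an assumption, not an Infomap feature).
    Returns two parallel lists: the byte offset at which each document
    starts in the corpus, and the corresponding file name.
    """
    offsets, names = [], []
    with open(corpus_path, "wb") as out:
        for path in paths:
            name = os.path.basename(path)
            offsets.append(out.tell())  # start of this document's block
            names.append(name)
            out.write(b"<DOCID>" + name.encode("utf-8") + b"</DOCID>\n")
            with open(path, "rb") as src:
                out.write(src.read())
            out.write(b"\n")
    return offsets, names


def name_for_offset(offset, offsets, names):
    """Return the name of the document whose block contains `offset`."""
    # offsets is sorted ascending, so a binary search finds the last
    # document that starts at or before the given byte offset.
    i = bisect.bisect_right(offsets, offset) - 1
    if i < 0:
        raise ValueError("offset lies before the first document")
    return names[i]
```

With the two lists kept next to the index (for example, saved to a small side file), a byte offset returned by a query can be turned back into a file name with `name_for_offset(offset, offsets, names)`, without changing the index builder itself.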