From: Mich <pse...@zo...> - 2004-10-28 02:25:45
While I'm at it (again, actually), could I ask a very basic question? What exactly is the format for a multiple-file list of corpora?

Right now I have a small number of .txt files in a single directory, and according to the manual, another file should point to those files. I have done this by numbering the .txt filenames as 1.txt, 2.txt, 3.txt etc., and have made a 'reference file' with all these filenames listed one underneath another:

1.txt
2.txt
3.txt
[..]
19.txt

However, infomap-build doesn't seem to recognize this. It stops with 'can't open current corpus file' and 'make *** [/home/jrandom/infomap-models/ned/wordlist] Error 1', so I figured the reference file probably has a different format from what I thought it would be.

Actually, I had to guess, since the manual states that "In a multiple-file corpus, each disk file that is part of the corpus must contain exactly one document. No tags are used; the entire contents of the file are considered to make up the text of the document and are processed by the Infomap software." That leaves me puzzled as to the exact specification of the reference file.

I have tried several alterations to my reference file, but infomap-build either stops with the error message mentioned above, or continues and treats the reference file itself as a single corpus anyway.

If someone could send an example of a multi-file reference file, I would be most pleased (as the one in the documentation seems to be lacking).

Thank you kindly,
Mich
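
For illustration, the reference file described above could be generated with a short script like the one below. This is only a sketch under stated assumptions: the format Infomap actually expects is not confirmed anywhere in this message, so "one path per line" and the use of absolute paths are guesses, and `write_file_list` and `numeric_key` are hypothetical helper names. Absolute paths are used only because a 'can't open' error is sometimes caused by relative names being resolved against a different working directory.

```python
import os
import re


def numeric_key(name):
    """Sort '2.txt' before '10.txt' by comparing the leading number."""
    match = re.match(r"(\d+)", name)
    return (int(match.group(1)) if match else float("inf"), name)


def write_file_list(corpus_dir, list_path):
    """Write one absolute path per line for every .txt file in corpus_dir.

    ASSUMPTION: one path per line is a guess at the reference-file format,
    not something the Infomap manual excerpt confirms.
    """
    corpus_dir = os.path.abspath(corpus_dir)
    list_path = os.path.abspath(list_path)

    names = sorted(
        (n for n in os.listdir(corpus_dir) if n.endswith(".txt")),
        key=numeric_key,
    )

    paths = []
    for name in names:
        path = os.path.join(corpus_dir, name)
        if path == list_path:
            continue  # never list the reference file as a document
        with open(path, "rb"):
            pass  # fail early if a document cannot be opened
        paths.append(path)

    with open(list_path, "w", newline="\n") as handle:
        for path in paths:
            handle.write(path + "\n")
    return paths
```

Calling `write_file_list("corpus", "corpus_files.list")` (both names are placeholders) would write the list and raise an error right away if any document is unreadable, which at least rules out a plain path problem before infomap-build is run.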