From: Dominic W. <wi...@ma...> - 2005-10-03 16:12:16
Hi Alan,

Another thing to check is whether anything is getting written to your model directory. Running "ls -l" on the output files can give you a good idea of how far the processing got while still producing useful data. If you want to run this on your files and send us the output, that might help.

However, since you have the line "Typecount = 0" in the output of "prepare_corpus" below, I'm guessing that even the wordlists are not being properly computed.

One frustrating possibility is that the tags are case-sensitive, so you could try <DOC> instead of <doc>, etc.

Best wishes,
Dominic

On Oct 3, 2005, at 12:23 AM, Scott Cederberg wrote:

> On 10/1/05, Alan J. Salmoni <sa...@gm...> wrote:
>> Hi.
>>
>> I'm using infomap 0.8.6 to build a matrix but nothing I do seems to
>> work.
>>
>> I've formatted the King James' Bible using the doc and text tags thus:
>>
>> <doc>
>> <text>
>> The Old Testament of...
>> [...missing text...]
>> End of the Project Gutenberg Edition of the King James Bible
>> </text>
>> </doc>
>>
>
> Do you have only a single set of <doc>/</doc> tags and a single set of
> <text>/</text> tags surrounding the entire document? (It seems like
> that's what's suggested above, but I'm not sure.) If so, the system
> is interpreting the entire Bible as one long document. I'm not sure
> how this would lead to the error below (there should still be word
> vectors), but it'd be worth retrying with a set of <doc> and <text>
> tags around, say, every chapter.
>
> Let me know if that helps...
>
>
> Scott
>
>> and tried the infomap-build command thus:
>>
>> :infomap-build -s /Users/alan/Projects/infomap/corpus/kjv10.txt test1
>>
>> The output is below. However, when I run the associate command thus:
>>
>> :associate -t -c test1 God
>>
>> I get:
>>
>> :No word/document vector for "God".
>>
>> Or "GOD" or "god". I've tried this on OS X, Ubuntu and Mandrake Linux
>> all with the same results. I've checked the KJ Bible to ensure that no
>> errant tags are contained (they are not) and followed the examples in
>> the user manual to the letter and still nothing. I've also tried
>> changing the SVD_ITER value to 400 thus:
>>
>> :infomap-build -D SVD_ITER=400 -s
>> /Users/alan/Projects/infomap/corpus/kjv10.txt test1
>>
>> which took the new value into account, but again the associate command
>> doesn't work. Can anyone shed any light on what is going wrong or on
>> what I should be doing but don't yet know about?
>>
>> Rgds,
>>
>> Alan J. Salmoni.
>>
>> This is the output of the infomap-build command (with SVD-ITER
>> changed to 400):
>>
>> Sourcing param file "/usr/local/share/infomap-nlp/default-params"
>> Sourcing extra param file "/tmp/infomap-build.kyInQJ"
>> Contents are:
>> SVD_ITER=400
>> Removing extra param file
>> WORKING_DATA_DIR = "/Users/alan/Projects/infomap/corpus/models/test1"
>> CORPUS_DIR = "/Users/alan/Projects/infomap/corpus"
>> CORPUS_FILE = "/Users/alan/Projects/infomap/corpus/kjv10.txt"
>> FNAMES_FILE = ""
>> ROWS = "20000"
>> COLUMNS = "1000"
>> SINGVALS = "100"
>> SVD_ITER = "400"
>> PRE_CONTEXT_SIZE = "15"
>> POST_CONTEXT_SIZE = "15"
>> WRITE_MATLAB_FORMAT = "0"
>> VALID_CHARS_FILE = "/usr/local/share/infomap-nlp/valid_chars.en"
>> STOPLIST_FILE = "/usr/local/share/infomap-nlp/stop.list"
>> COL_LABELS_FROM_FILE = "0"
>> COL_LABEL_FILE = ""
>> echo "Making datadir"
>> Making datadir
>> mkdir -p /Users/alan/Projects/infomap/corpus/models/test1
>>
>> ==================================================
>> Building target:
>> /Users/alan/Projects/infomap/corpus/models/test1/wordlist
>> Prerequisites: /Users/alan/Projects/infomap/corpus/kjv10.txt
>> Sat Oct 1 15:49:47 BST 2005
>> ..................................................
>> prepare_corpus \
>> -cdir "/Users/alan/Projects/infomap/corpus" \
>> -mdir "/Users/alan/Projects/infomap/corpus/models/test1" \
>> -cfile "/Users/alan/Projects/infomap/corpus/kjv10.txt" \
>> -fnfile "" \
>> -chfile "/usr/local/share/infomap-nlp/valid_chars.en" \
>> -slfile "/usr/local/share/infomap-nlp/stop.list" \
>> -rptfile ""
>> Locale set to en_US.
>> Opening File for "r":
>> "/usr/local/share/infomap-nlp/valid_chars.en"
>> Opening File for "r":
>> ""
>> my_fopen: No such file or directory
>> Opening File for "r":
>> "/usr/local/share/infomap-nlp/stop.list"
>> Opening File for "w":
>> "/Users/alan/Projects/infomap/corpus/models/test1/wordlist"
>> Opening File for "r":
>> "/Users/alan/Projects/infomap/corpus/kjv10.txt"
>> Opening File for "w":
>> "/Users/alan/Projects/infomap/corpus/models/test1/numDocs"
>> Typecount = 0
>> Preparing to sort ... Sorting ... Done.
>> Opening File for "w":
>> "/Users/alan/Projects/infomap/corpus/models/test1/dic"
>> ..................................................
>> Finishing target:
>> /Users/alan/Projects/infomap/corpus/models/test1/wordlist
>> ==================================================
>>
>>
>> ==================================================
>> Building target: /Users/alan/Projects/infomap/corpus/models/test1/coll
>> Prerequisites:
>> /Users/alan/Projects/infomap/corpus/models/test1/wordlist
>> /Users/alan/Projects/infomap/corpus/models/test1/dic
>> /Users/alan/Projects/infomap/corpus/models/test1/numDocs
>> Sat Oct 1 15:49:48 BST 2005
>> ..................................................
>> count_wordvec \
>> -mdir /Users/alan/Projects/infomap/corpus/models/test1 \
>> -matlab 0 \
>> -precontext 15 \
>> -postcontext 15 \
>> -rows 20000 \
>> -columns 1000 \
>> -col_labels_from_file 0 \
>> -col_label_file ""
>> model data dir is "/Users/alan/Projects/infomap/corpus/models/test1".
>> count_wordvec.c: looking for 0 rows
>> which had better match 0
>> Reading the dictionary... Opening File for "r":
>> "/Users/alan/Projects/infomap/corpus/models/test1/dic"
>> Opening File for "r":
>> "/Users/alan/Projects/infomap/corpus/models/test1/numDocs"
>> Initializing row indices...Done.
>> Initializing column indices...Done.
>> Allocating matrix memory...done.
>> Initializing matrix...done.
>> model data dir is "/Users/alan/Projects/infomap/corpus/models/test1".
>> count_wordvec.c: about to call process_wordlist
>> Entering process_wordlist.
>> About to call initialize_wordlist.
>> Opening File for "r":
>> "/Users/alan/Projects/infomap/corpus/models/test1/wordlist"
>> Returned from initialize_wordlist.
>> Writing the co-occurrence matrix.
>> Entering write_matrix_svd; rows = 0 and columns = 1000.
>> Opening File for "w":
>> "/Users/alan/Projects/infomap/corpus/models/test1/coll"
>> Opening File for "w":
>> "/Users/alan/Projects/infomap/corpus/models/test1/indx"
>> ..................................................
>> Finishing target:
>> /Users/alan/Projects/infomap/corpus/models/test1/coll
>> ==================================================
>>
>>
>> ==================================================
>> Building target: /Users/alan/Projects/infomap/corpus/models/test1/left
>> Prerequisites: /Users/alan/Projects/infomap/corpus/models/test1/coll
>> /Users/alan/Projects/infomap/corpus/models/test1/indx
>> Sat Oct 1 15:49:48 BST 2005
>> ..................................................
>> cd /Users/alan/Projects/infomap/corpus/models/test1 && rm -f svd_diag
>> left \
>> rght sing
>> cd /Users/alan/Projects/infomap/corpus/models/test1 && svdinterface \
>> -singvals 100 \
>> -iter 400
>>
>> This is svdinterface.
>>
>> Writing to: left
>> Writing to: rght
>> Writing to: sing
>> Writing to: svd_diag
>> Reading: indx
>> Reading: indx
>> Reading: coll
>>
>> FEWER THAN EXPECTED SINGULAR VALUES
>> ..................................................
>> Finishing target:
>> /Users/alan/Projects/infomap/corpus/models/test1/left
>> ==================================================
>>
>>
>> ==================================================
>> Building target:
>> /Users/alan/Projects/infomap/corpus/models/test1/wordvec.bin
>> Prerequisites: /Users/alan/Projects/infomap/corpus/models/test1/left
>> /Users/alan/Projects/infomap/corpus/models/test1/dic
>> Sat Oct 1 15:49:49 BST 2005
>> ..................................................
>> encode_wordvec \
>> -m /Users/alan/Projects/infomap/corpus/models/test1
>> Opening File for "r":
>> "/Users/alan/Projects/infomap/corpus/models/test1/left"
>> Opening File for "w":
>> "/Users/alan/Projects/infomap/corpus/models/test1/wordvec.bin"
>> Reading the dictionary...
>> Opening File for "r":
>> "/Users/alan/Projects/infomap/corpus/models/test1/dic"
>> Initializing row indices...Done.
>> ..................................................
>> Finishing target:
>> /Users/alan/Projects/infomap/corpus/models/test1/wordvec.bin
>> ==================================================
>>
>>
>> ==================================================
>> Building target:
>> /Users/alan/Projects/infomap/corpus/models/test1/artvec.bin
>> Prerequisites:
>> /Users/alan/Projects/infomap/corpus/models/test1/wordvec.bin
>> /Users/alan/Projects/infomap/corpus/models/test1/wordlist
>> /Users/alan/Projects/infomap/corpus/models/test1/dic
>> /Users/alan/Projects/infomap/corpus/models/test1/numDocs
>> Sat Oct 1 15:49:49 BST 2005
>> ..................................................
>> count_artvec -m /Users/alan/Projects/infomap/corpus/models/test1
>> Opening File for "w":
>> "/Users/alan/Projects/infomap/corpus/models/test1/artvec.bin"
>> Opening File for "r":
>> "/Users/alan/Projects/infomap/corpus/models/test1/numDocs"
>> Reading the dictionary... Opening File for "r":
>> "/Users/alan/Projects/infomap/corpus/models/test1/dic"
>> Initializing row indices...Done.
>> Allocating matrix memory...done.
>> Initializing matrix...done.
>> Opening File for "r":
>> "/Users/alan/Projects/infomap/corpus/models/test1/wordvec.bin"
>> count_artvec.c: about to read 0 rows from wordvector file.
>> Entering process_wordlist.
>> About to call initialize_wordlist.
>> Opening File for "r":
>> "/Users/alan/Projects/infomap/corpus/models/test1/wordlist"
>> Returned from initialize_wordlist.
>> ..................................................
>> Finishing target:
>> /Users/alan/Projects/infomap/corpus/models/test1/artvec.bin
>> ==================================================
>>
>>
>> ==================================================
>> Building target:
>> /Users/alan/Projects/infomap/corpus/models/test1/model_params.txt
>> Prerequisites:
>> /Users/alan/Projects/infomap/corpus/models/test1/model_params.bin
>> /Users/alan/Projects/infomap/corpus/models/test1/model_info.bin
>> /Users/alan/Projects/infomap/corpus/models/test1/corpus_format.bin
>> Sat Oct 1 15:49:49 BST 2005
>> ..................................................
>> write_text_params -mdir
>> /Users/alan/Projects/infomap/corpus/models/test1
>> ..................................................
>> Finishing target:
>> /Users/alan/Projects/infomap/corpus/models/test1/model_params.txt
>> ==================================================
>>
>>
>> -------------------------------------------------------
>> This SF.Net email is sponsored by:
>> Power Architecture Resource Center: Free content, downloads,
>> discussions,
>> and more. http://solutions.newsforge.com/ibmarch.tmpl
>> _______________________________________________
>> infomap-nlp-users mailing list
>> inf...@li...
>> https://lists.sourceforge.net/lists/listinfo/infomap-nlp-users
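
A minimal sketch of the tag-case suggestion from the top of this message, in Python. It assumes, without confirmation, that the corpus reader only recognises <DOC> and <TEXT> in upper case; that is a guess based on the "Typecount = 0" line, not documented behaviour. The helper names uppercase_tags and count_documents are hypothetical and not part of Infomap. The second helper addresses Scott's question by reporting how many documents the tags actually delimit.

```python
import re

# Matches <doc>, </doc>, <text>, </text> in any letter case.
TAG_RE = re.compile(r"<(/?)(doc|text)>", re.IGNORECASE)


def uppercase_tags(corpus: str) -> str:
    """Return the corpus with its doc/text tags rewritten in upper case.

    Assumption: upper-case tags are what the corpus reader expects.
    """
    return TAG_RE.sub(lambda m: f"<{m.group(1)}{m.group(2).upper()}>", corpus)


def count_documents(corpus: str) -> int:
    """Count opening doc tags (any case), i.e. the number of documents."""
    return len(re.findall(r"<doc>", corpus, re.IGNORECASE))


if __name__ == "__main__":
    # Tiny stand-in for the real corpus file.
    sample = "<doc>\n<text>\nIn the beginning...\n</text>\n</doc>\n"
    print(uppercase_tags(sample))
    print("documents:", count_documents(sample))
```

To try it on the real corpus, read kjv10.txt into a string, pass it through uppercase_tags, write the result to a new file, and rebuild the model from that file. If count_documents reports 1, the whole Bible is being treated as a single document, as Scott suspected.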