From: Scott C. <ced...@gm...> - 2005-10-03 04:24:07
On 10/1/05, Alan J. Salmoni <sa...@gm...> wrote:
> Hi.
>
> I'm using infomap 0.8.6 to build a matrix but nothing I do seems to work.
>
> I've formatted the King James' Bible using the doc and text tags thus:
>
> <doc>
> <text>
> The Old Testament of...
> [...missing text...]
> End of the Project Gutenberg Edition of the King James Bible
> </text>
> </doc>
>

Do you have only a single set of <doc>/</doc> tags and a single set of
<text>/</text> tags surrounding the entire document? (It seems like
that's what's suggested above, but I'm not sure.) If so, the system is
interpreting the entire Bible as one long document. I'm not sure how
this would lead to the error below (there should still be word
vectors), but it'd be worth retrying with a set of <doc> and <text>
tags around, say, every chapter.

Let me know if that helps...

Scott

> and tried the infomap-build command thus:
>
> :infomap-build -s /Users/alan/Projects/infomap/corpus/kjv10.txt test1
>
> The output is below. However, when I run the associate command thus:
>
> :associate -t -c test1 God
>
> I get:
>
> :No word/document vector for "God".
>
> Or "GOD" or "god". I've tried this on OS X, Ubuntu and Mandrake Linux
> all with the same results. I've checked the KJ Bible to ensure that no
> errant tags are contained (they are not) and followed the examples in
> the user manual to the letter and still nothing. I've also tried
> changing the SVD_ITER value to 400 thus:
>
> :infomap-build -D SVD_ITER=400 -s
> /Users/alan/Projects/infomap/corpus/kjv10.txt test1
>
> which took the new value into account, but again the associate command
> doesn't work. Can anyone shed any light on what is going wrong or on
> what I should be doing but don't yet know about?
>
> Rgds,
>
> Alan J. Salmoni.
>
> This is the output of the infomap-build command (with SVD-ITER changed to 400):
>
> Sourcing param file "/usr/local/share/infomap-nlp/default-params"
> Sourcing extra param file "/tmp/infomap-build.kyInQJ"
> Contents are:
> SVD_ITER=400
> Removing extra param file
> WORKING_DATA_DIR = "/Users/alan/Projects/infomap/corpus/models/test1"
> CORPUS_DIR = "/Users/alan/Projects/infomap/corpus"
> CORPUS_FILE = "/Users/alan/Projects/infomap/corpus/kjv10.txt"
> FNAMES_FILE = ""
> ROWS = "20000"
> COLUMNS = "1000"
> SINGVALS = "100"
> SVD_ITER = "400"
> PRE_CONTEXT_SIZE = "15"
> POST_CONTEXT_SIZE = "15"
> WRITE_MATLAB_FORMAT = "0"
> VALID_CHARS_FILE = "/usr/local/share/infomap-nlp/valid_chars.en"
> STOPLIST_FILE = "/usr/local/share/infomap-nlp/stop.list"
> COL_LABELS_FROM_FILE = "0"
> COL_LABEL_FILE = ""
> echo "Making datadir"
> Making datadir
> mkdir -p /Users/alan/Projects/infomap/corpus/models/test1
>
> ==================================================
> Building target: /Users/alan/Projects/infomap/corpus/models/test1/wordlist
> Prerequisites: /Users/alan/Projects/infomap/corpus/kjv10.txt
> Sat Oct 1 15:49:47 BST 2005
> ..................................................
> prepare_corpus \
> -cdir "/Users/alan/Projects/infomap/corpus" \
> -mdir "/Users/alan/Projects/infomap/corpus/models/test1" \
> -cfile "/Users/alan/Projects/infomap/corpus/kjv10.txt" \
> -fnfile "" \
> -chfile "/usr/local/share/infomap-nlp/valid_chars.en" \
> -slfile "/usr/local/share/infomap-nlp/stop.list" \
> -rptfile ""
> Locale set to en_US.
> Opening File for "r":
> "/usr/local/share/infomap-nlp/valid_chars.en"
> Opening File for "r":
> ""
> my_fopen: No such file or directory
> Opening File for "r":
> "/usr/local/share/infomap-nlp/stop.list"
> Opening File for "w":
> "/Users/alan/Projects/infomap/corpus/models/test1/wordlist"
> Opening File for "r":
> "/Users/alan/Projects/infomap/corpus/kjv10.txt"
> Opening File for "w":
> "/Users/alan/Projects/infomap/corpus/models/test1/numDocs"
> Typecount = 0
> Preparing to sort ... Sorting ... Done.
> Opening File for "w":
> "/Users/alan/Projects/infomap/corpus/models/test1/dic"
> ..................................................
> Finishing target: /Users/alan/Projects/infomap/corpus/models/test1/wordlist
> ==================================================
>
>
> ==================================================
> Building target: /Users/alan/Projects/infomap/corpus/models/test1/coll
> Prerequisites: /Users/alan/Projects/infomap/corpus/models/test1/wordlist
> /Users/alan/Projects/infomap/corpus/models/test1/dic
> /Users/alan/Projects/infomap/corpus/models/test1/numDocs
> Sat Oct 1 15:49:48 BST 2005
> ..................................................
> count_wordvec \
> -mdir /Users/alan/Projects/infomap/corpus/models/test1 \
> -matlab 0 \
> -precontext 15 \
> -postcontext 15 \
> -rows 20000 \
> -columns 1000 \
> -col_labels_from_file 0 \
> -col_label_file ""
> model data dir is "/Users/alan/Projects/infomap/corpus/models/test1".
> count_wordvec.c: looking for 0 rows
> which had better match 0
> Reading the dictionary... Opening File for "r":
> "/Users/alan/Projects/infomap/corpus/models/test1/dic"
> Opening File for "r":
> "/Users/alan/Projects/infomap/corpus/models/test1/numDocs"
> Initializing row indices...Done.
> Initializing column indices...Done.
> Allocating matrix memory...done.
> Initializing matrix...done.
> model data dir is "/Users/alan/Projects/infomap/corpus/models/test1".
> count_wordvec.c: about to call process_wordlist
> Entering process_wordlist.
> About to call initialize_wordlist.
> Opening File for "r":
> "/Users/alan/Projects/infomap/corpus/models/test1/wordlist"
> Returned from initialize_wordlist.
> Writing the co-occurrence matrix.
> Entering write_matrix_svd; rows = 0 and columns = 1000.
> Opening File for "w":
> "/Users/alan/Projects/infomap/corpus/models/test1/coll"
> Opening File for "w":
> "/Users/alan/Projects/infomap/corpus/models/test1/indx"
> ..................................................
> Finishing target: /Users/alan/Projects/infomap/corpus/models/test1/coll
> ==================================================
>
>
> ==================================================
> Building target: /Users/alan/Projects/infomap/corpus/models/test1/left
> Prerequisites: /Users/alan/Projects/infomap/corpus/models/test1/coll
> /Users/alan/Projects/infomap/corpus/models/test1/indx
> Sat Oct 1 15:49:48 BST 2005
> ..................................................
> cd /Users/alan/Projects/infomap/corpus/models/test1 && rm -f svd_diag left \
> rght sing
> cd /Users/alan/Projects/infomap/corpus/models/test1 && svdinterface \
> -singvals 100 \
> -iter 400
>
> This is svdinterface.
>
> Writing to: left
> Writing to: rght
> Writing to: sing
> Writing to: svd_diag
> Reading: indx
> Reading: indx
> Reading: coll
>
> FEWER THAN EXPECTED SINGULAR VALUES
> ..................................................
> Finishing target: /Users/alan/Projects/infomap/corpus/models/test1/left
> ==================================================
>
>
> ==================================================
> Building target: /Users/alan/Projects/infomap/corpus/models/test1/wordvec.bin
> Prerequisites: /Users/alan/Projects/infomap/corpus/models/test1/left
> /Users/alan/Projects/infomap/corpus/models/test1/dic
> Sat Oct 1 15:49:49 BST 2005
> ..................................................
> encode_wordvec \
> -m /Users/alan/Projects/infomap/corpus/models/test1
> Opening File for "r":
> "/Users/alan/Projects/infomap/corpus/models/test1/left"
> Opening File for "w":
> "/Users/alan/Projects/infomap/corpus/models/test1/wordvec.bin"
> Reading the dictionary...
> Opening File for "r":
> "/Users/alan/Projects/infomap/corpus/models/test1/dic"
> Initializing row indices...Done.
> ..................................................
> Finishing target: /Users/alan/Projects/infomap/corpus/models/test1/wordvec.bin
> ==================================================
>
>
> ==================================================
> Building target: /Users/alan/Projects/infomap/corpus/models/test1/artvec.bin
> Prerequisites: /Users/alan/Projects/infomap/corpus/models/test1/wordvec.bin
> /Users/alan/Projects/infomap/corpus/models/test1/wordlist
> /Users/alan/Projects/infomap/corpus/models/test1/dic
> /Users/alan/Projects/infomap/corpus/models/test1/numDocs
> Sat Oct 1 15:49:49 BST 2005
> ..................................................
> count_artvec -m /Users/alan/Projects/infomap/corpus/models/test1
> Opening File for "w":
> "/Users/alan/Projects/infomap/corpus/models/test1/artvec.bin"
> Opening File for "r":
> "/Users/alan/Projects/infomap/corpus/models/test1/numDocs"
> Reading the dictionary... Opening File for "r":
> "/Users/alan/Projects/infomap/corpus/models/test1/dic"
> Initializing row indices...Done.
> Allocating matrix memory...done.
> Initializing matrix...done.
> Opening File for "r":
> "/Users/alan/Projects/infomap/corpus/models/test1/wordvec.bin"
> count_artvec.c: about to read 0 rows from wordvector file.
> Entering process_wordlist.
> About to call initialize_wordlist.
> Opening File for "r":
> "/Users/alan/Projects/infomap/corpus/models/test1/wordlist"
> Returned from initialize_wordlist.
> ..................................................
> Finishing target: /Users/alan/Projects/infomap/corpus/models/test1/artvec.bin
> ==================================================
>
>
> ==================================================
> Building target:
> /Users/alan/Projects/infomap/corpus/models/test1/model_params.txt
> Prerequisites: /Users/alan/Projects/infomap/corpus/models/test1/model_params.bin
> /Users/alan/Projects/infomap/corpus/models/test1/model_info.bin
> /Users/alan/Projects/infomap/corpus/models/test1/corpus_format.bin
> Sat Oct 1 15:49:49 BST 2005
> ..................................................
> write_text_params -mdir /Users/alan/Projects/infomap/corpus/models/test1
> ..................................................
> Finishing target:
> /Users/alan/Projects/infomap/corpus/models/test1/model_params.txt
> ==================================================
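
For the per-chapter retry suggested earlier in this message, here is a minimal sketch in Python. It is not part of infomap. The helper name wrap_chapters and the chapter-detection pattern are assumptions for illustration: it guesses that each chapter begins on a line starting with a "chapter:verse" marker whose verse number is 1 (for example "1:1 In the beginning..."). If kjv10.txt marks chapters differently, the pattern needs adjusting.

```python
import re
import sys

# Assumption: a new chapter starts on a line that begins with "<chapter>:1 ",
# e.g. "1:1 In the beginning ...". Change this pattern if the file differs.
CHAPTER_START = re.compile(r"^\d+:1\s", re.MULTILINE)


def wrap_chapters(text):
    """Wrap each chapter-sized chunk of text in its own <doc>/<text> pair."""
    # Drop any existing wrapper tags so the new ones are not nested inside them.
    text = re.sub(r"</?(?:doc|text)>\s*", "", text, flags=re.IGNORECASE)

    starts = [m.start() for m in CHAPTER_START.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any front matter as a document of its own
    bounds = starts + [len(text)]

    docs = []
    for begin, end in zip(bounds, bounds[1:]):
        chunk = text[begin:end].strip()
        if chunk:  # skip empty chunks
            docs.append("<doc>\n<text>\n" + chunk + "\n</text>\n</doc>\n")
    return "".join(docs)


# Hypothetical command-line use: python wrap_chapters.py kjv10.txt kjv10-chapters.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    # latin-1 decodes any byte sequence; the file's real encoding is an assumption.
    with open(sys.argv[1], encoding="latin-1") as src:
        wrapped = wrap_chapters(src.read())
    with open(sys.argv[2], "w", encoding="latin-1") as dst:
        dst.write(wrapped)
    print("wrote", wrapped.count("<doc>"), "documents")
```

After writing the re-tagged file, infomap-build can be rerun with -s pointing at it, using a fresh model name. The log above reports "Typecount = 0" and "rows = 0", so those two lines are worth comparing in the new output before trying associate again.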