From: Trevor C. <tr...@ya...> - 2005-02-11 23:24:40
thanks michiel - i've noticed the wordlist file generated in the <modeltag> subdirectory has document labels, so they could be collected there also. still need to check all of this for consistency somehow, though...

regards,
trevor

-------------

Maybe it will not help you, but I have similar problems with print_doc; that is, it reports "Segmentation fault, core dumped" and writes the following to a text file:

"Exception: STATUS_ACCESS_VIOLATION at eip=610D3DA0
eax=61792D11 ebx=61792D0C ecx=61792D11 edx=00000000 esi=10010210 edi=61792D11
ebp=0022E308 esp=0022E2FC program=F:\Cygwin\usr\local\bin\print_doc.exe, pid 1912, thread main
cs=001B ds=0023 es=0023 fs=0038 gs=0000 ss=0023
Stack trace:
Frame     Function  Args
0022E308  610D3DA0  (61792D11, 00000000, 0022E328, 6100650A)
0022E328  610D3C35  (61792D0C, 00000000, 00000000, 00000000)
0022EFC8  004010E5  (00000007, 61792C88, 100100A8, 0022F020)
0022F008  61006145  (0022F020, 0022F31C, 77F64EAC, 0022F340)
0022FF88  61006350  (00000000, 00000000, 00000000, 00000000)
End of stack trace"

----------

As you might notice, I'm using Cygwin, which works, except for print_doc. Would it be possible to take the doc.id and use it to look up the title without print_doc?

What I'm currently doing is a workaround: the document ids seem to increase with the document's position in the corpus file. So I query the model with a word that I know is present in every document, so that every document comes back in the output. Then I sort the doc.ids, put them next to the list of files in the corpus (I appended all documents to one another at an earlier stage) and 'bind' the ids to the titles.

Is this a valid approach? If it is, maybe it's useful to Trevor as well.

Thanks,
Mich
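
The workaround Mich describes (sort the returned doc.ids and pair them with the file list in corpus order) could be sketched roughly as below. This is only an illustration under stated assumptions: the ids are taken to be integers that strictly increase with position in the corpus file, the query is taken to return every document, and the function name bind_ids_to_titles and its arguments are hypothetical, not part of any tool mentioned in this thread. The length check is a crude consistency test: if the number of distinct ids differs from the number of titles, the binding is rejected rather than silently shifted.

```python
def bind_ids_to_titles(doc_ids, titles):
    """Pair document ids with titles, assuming ids increase with corpus order.

    doc_ids: integer ids returned by a query that matches every document
             (hypothetical input; duplicates are tolerated).
    titles:  document titles in the order the documents appear in the corpus.
    """
    # Drop duplicates and sort, so the smallest id lines up with the first title.
    unique_ids = sorted(set(doc_ids))

    # Consistency check: one id per document, otherwise the pairing is unreliable.
    if len(unique_ids) != len(titles):
        raise ValueError(
            f"{len(unique_ids)} distinct ids but {len(titles)} titles; "
            "the query may have missed documents"
        )

    return dict(zip(unique_ids, titles))


if __name__ == "__main__":
    ids_from_query = [1042, 17, 530]            # made-up ids, in arbitrary order
    corpus_titles = ["a.txt", "b.txt", "c.txt"]  # made-up titles, in corpus order
    print(bind_ids_to_titles(ids_from_query, corpus_titles))
```

The mapping is only as good as the ordering assumption, so spot-checking a few ids against documents whose content is known would still be worthwhile.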