From: Beate D. <do...@IM...> - 2004-10-27 16:28:28
Hi!

> My question, however, concerns if it
> would be possible to do document to document comparisons. Could somebody
> here provide me with info on this?

To answer Mich's question: it is already possible to use documents as queries. When the option "-i d" (for "input is document") is specified, "associate" expects document identifiers rather than words as query "terms".

In the case of a single-file corpus, document identifiers are the offsets in the corpus (the numbers enclosed in <f> and </f> in the wordlist file). In the case of a multiple-file corpus, in which each file constitutes a document, the identifiers are the names of the files.

E.g., "associate -d -i d ... <doc_1> .. <doc_k> NOT <doc_k+1> .. <doc_k+n>" will return documents which are similar to doc_1 .. doc_k and dissimilar to doc_k+1 .. doc_k+n, where the doc_i are the identifiers of the corresponding documents.

To actually compare two documents, you can use "associate -q -i d ..." for each of the two documents to obtain their vector representations, and then simply compute the scalar product of the two vectors.

Best,
Beate
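
The last step, computing the scalar product of the two query vectors, could be sketched roughly as follows in Python. This is only an illustration: it assumes each vector has already been saved as a line of whitespace-separated numbers, which may not match the actual output format of "associate -q". The helper names parse_vector and scalar_product, and the sample values, are made up for the example.

```python
def parse_vector(text):
    """Parse a string of whitespace-separated numbers into a list of floats.

    Assumption: the vector is available in this textual form; adapt the
    parsing to whatever "associate -q" actually prints.
    """
    return [float(token) for token in text.split()]


def scalar_product(u, v):
    """Return the scalar (dot) product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(a * b for a, b in zip(u, v))


# Hypothetical vectors standing in for the output of two "associate -q -i d" runs.
doc_a = parse_vector("0.5 0.0 -1.0 2.0")
doc_b = parse_vector("1.0 3.0 0.5 0.25")

similarity = scalar_product(doc_a, doc_b)
print(similarity)
```

A larger value means the two documents point in more similar directions in the vector space. If the vectors are not already normalized to unit length, dividing by the product of their lengths gives the cosine similarity instead.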