Re: [infomap-nlp-users] document x document


Hi!

> My question, however, concerns if it
> would be possible to do document to document comparisons. Could somebody
> here provide me with info on this?

To answer Mich's question:

It is already possible to use documents as queries.
By specifying the option "-i d" (for "input is document"), "associate" 
expects document identifiers rather than words as query "terms".

In the case of a single-file corpus, the document identifiers are the offsets 
in the corpus (the numbers enclosed in <f> and </f> in the wordlist file). 
In the case of a multiple-file corpus, in which each file constitutes a 
document, the identifiers are the names of the files.

E.g., "associate -d -i d ... <doc_1> .. <doc_k> NOT <doc_k+1> .. 
<doc_k+n>" will return documents which are similar to doc_1 .. doc_k and 
dissimilar to doc_k+1 .. doc_k+n, where the doc_i are the identifiers of 
the corresponding documents.

To actually compare two documents, you can run "associate -q -i d ..." once 
for each of the two documents to obtain their vector representations, and 
then simply compute the scalar product of the two vectors.
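
As a rough illustration of that last step, here is a minimal Python sketch. It assumes you have already saved the two vectors printed by "associate -q -i d ..." and that each one can be read as whitespace-separated numbers; that layout is my assumption, so adjust parse_vector to whatever your output actually looks like. The function names are made up for this example and are not part of Infomap. If the vectors are not already of unit length, dividing the scalar product by the two norms gives the cosine, which is easier to compare across document pairs.

```python
import math


def parse_vector(text):
    """Turn a string of whitespace-separated numbers into a list of floats.

    Assumption: the saved vector is plain numbers separated by whitespace.
    """
    return [float(token) for token in text.split()]


def scalar_product(u, v):
    """Return the scalar (dot) product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Return the scalar product divided by the product of the norms."""
    norm_u = math.sqrt(scalar_product(u, u))
    norm_v = math.sqrt(scalar_product(v, v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return scalar_product(u, v) / (norm_u * norm_v)
```

Usage would be along the lines of cosine_similarity(parse_vector(text_a), parse_vector(text_b)), where text_a and text_b hold the two saved vectors.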

Best,
Beate