From: Mich <pse...@zo...> - 2005-03-03 19:52:32
Hello Trevor,

Wednesday, March 2, 2005, 8:04:37 PM, you wrote:

TC> Hello and thank you to the developers of infomap, it is
TC> proving most useful. I have an issue regarding document x document
TC> comparison. Looking through the archive, I found this comment:
TC>
TC> "To actually compare two documents, you can use "associate -q
TC> -i d ..." for each of the two documents to obtain their vector
TC> representations, and then simply compute the scalar product."
TC>
TC> This would be great, but the 0.8.5 version of infomap that
TC> I'm using has an associate that doesn't take the "-i" option.
TC>
TC> Is there a way in this version to compare documents to one
TC> another or retrieve their vectors?

It was me who originally asked this question about document x document comparisons.

Incidentally, someone approached me today about a case of possible plagiarism and asked whether infomap could be used to detect it. I think it could, mostly by doing something like what you mentioned: if your corpus contains enough documents, including the suspected case of plagiarism, it should be possible to find out which document yields the greatest similarity to it.

Anyway, the suggestion made at the time was to enter the document to be compared as a full plaintext parameter to the 'associate' program (i.e. associate ... -d [document as plaintext]). This would be a "crude" way of doing comparisons.

Let me know if you find anything, and whether you succeed in getting those vector representations and scalar products.

Cheers,
Mich
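For reference, the comparison described in the quoted archive comment (take the vector for each document, then compute their scalar product) can be sketched in Python as below. This is a minimal sketch, not part of infomap: it assumes you have already extracted each document's vector as a list of floats (parsing the output of 'associate' is not shown), and the names `dot`, `cosine`, `rank_by_similarity` and the toy vectors are hypothetical. It normalizes each vector to unit length before taking the scalar product (cosine similarity), so the score does not depend on vector length. If the vectors are already unit length, this reduces to the plain scalar product.

```python
import math
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Scalar (dot) product of two vectors of equal dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Scalar product of u and v after scaling each to unit length."""
    nu = math.sqrt(dot(u, u))
    nv = math.sqrt(dot(v, v))
    if nu == 0 or nv == 0:
        return 0.0  # a zero vector carries no direction; treat as unrelated
    return dot(u, v) / (nu * nv)


def rank_by_similarity(
    query: Sequence[float], corpus: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Score every corpus document against the query, most similar first."""
    scores = [(doc_id, cosine(query, vec)) for doc_id, vec in corpus.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy vectors standing in for document vectors from infomap.
    suspect = [0.6, 0.8, 0.0]
    corpus = {
        "doc_a": [0.6, 0.8, 0.0],
        "doc_b": [0.0, 0.0, 1.0],
        "doc_c": [0.8, 0.6, 0.0],
    }
    for doc_id, score in rank_by_similarity(suspect, corpus):
        print(f"{doc_id}\t{score:.3f}")
```

For the plagiarism case, the suspect document would be the query and the top-ranked entries the candidates worth reading by hand. A high score only says the two documents use related vocabulary, not that one copied the other.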