Re: [infomap-nlp-users] Document vectors

> There must be an easier way, but I think not many people will be
> interested in the raw document vectors (or am I wrong)?

Hi Menno,

It sounds like your work-around to get the document vectors is pretty
effective, though as you say there should be an easier way.

For word and query vectors there's an "associate -q" option, which simply
prints out the query vector rather than performing a search. One way I've
often used to get document vectors is to pass the whole document as an
argument to "associate -q". This is pretty unsatisfactory, though it does
have the benefit that you can get document vectors for text files that
weren't in your original corpus.
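
As a rough illustration of that approach, here is a minimal Python sketch
that reads a text file and hands its words to "associate -q". The helper
names (build_query_command, document_vector_output) are hypothetical, and
splitting the document on whitespace into separate arguments is an
assumption about how the text is best passed. Any options needed to select
your model are left out because they depend on your installation, and the
output is returned unparsed because its format is not assumed here.

```python
import subprocess
from pathlib import Path


def build_query_command(text, binary="associate"):
    """Build the argument list for `associate -q` from raw document text.

    Assumption: the document is split on whitespace and each word is
    passed as a separate command-line argument.
    """
    words = text.split()
    return [binary, "-q", *words]


def document_vector_output(path, binary="associate"):
    """Run `associate -q` on the contents of a text file.

    Returns whatever the program prints, unparsed. Model-selection
    options are deliberately omitted; add the ones your setup needs.
    """
    text = Path(path).read_text(encoding="utf-8")
    result = subprocess.run(
        build_query_command(text, binary),
        capture_output=True,
        text=True,
        check=True,  # raise if the program exits with an error
    )
    return result.stdout
```

Bear in mind that the operating system limits the total length of a command
line, so this is only practical for fairly short documents.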

If the "associate -q" option were combined with the "associate_doc"
function Beate described, this would solve the problem properly, and I
could see benefits to making this available (e.g. for work on document
clustering). It sounds as though you've already got a workable solution,
but if enough other people on the list express an interest we should look
into it.

I'm delighted to hear about people using the infomap software as part of a
richer and more complex system of features - I'd be interested to hear
more about your work whenever you are ready.

Best wishes,
Dominic