Files (all dated 2014-01-05):
compiled.zip               1.3 MB
readme.txt                 1.2 kB
RedLdaGibbsSampler.java   21.6 kB
preprocessRedLDA.py       12.3 kB
lineBasedFingerPrint.py    2.8 kB

(C) Copyright Raphael Cohen, Iddo Aviram, Michael and Noemie Elhadad 2013
RedLdaGibbsSampler is a suite for topic modeling of corpora with redundancy, i.e. corpora in which documents contain text copied from other documents.

License:
RedLdaGibbsSampler is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option) any
later version. It is based on Heinrich's LdaGibbsSampler and is distributed
WITHOUT ANY WARRANTY.


USAGE:
> python preprocessRedLDA.py [directory with documents]
> java RedLdaGibbsSampler [corpus-name] [K: number of topics] [number of iterations] [alpha] [beta] [hyperparameter optimization: true/false]
 
Example usage (toy corpus provided):
python preprocessRedLDA.py toycorpus
java RedLdaGibbsSampler toycorpus 3 1500 0.5 0.1 true
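
The last argument toggles hyperparameter optimization. This readme does not say which update rule RedLdaGibbsSampler applies; one common choice for a symmetric Dirichlet prior alpha is Minka's fixed-point iteration, sketched below in Python as an assumption. The helper names are hypothetical, and the input is taken to be the per-document topic counts n_dk that a Gibbs sampler maintains. Because those counts are integers, each digamma difference psi(x + n) - psi(x) reduces to a finite sum, so only the standard library is needed.

```python
from typing import List


def digamma_diff(n: int, x: float) -> float:
    """Return psi(x + n) - psi(x) for a non-negative integer n.

    Uses the recurrence psi(y + 1) = psi(y) + 1/y, so no general
    digamma implementation is needed when n is an integer count.
    """
    return sum(1.0 / (x + i) for i in range(n))


def update_symmetric_alpha(doc_topic_counts: List[List[int]], alpha: float) -> float:
    """One Minka fixed-point step for a symmetric Dirichlet prior (assumed rule).

    doc_topic_counts[d][k] is n_dk, the number of tokens in document d
    currently assigned to topic k.
    """
    num_topics = len(doc_topic_counts[0])
    numerator = 0.0
    denominator = 0.0
    for counts in doc_topic_counts:
        # sum_k [psi(n_dk + alpha) - psi(alpha)]
        numerator += sum(digamma_diff(n_dk, alpha) for n_dk in counts)
        # psi(n_d + K*alpha) - psi(K*alpha)
        denominator += digamma_diff(sum(counts), num_topics * alpha)
    return alpha * numerator / (num_topics * denominator)


def optimize_alpha(doc_topic_counts: List[List[int]], alpha: float,
                   iterations: int = 20) -> float:
    """Apply a fixed number of fixed-point steps (hypothetical schedule)."""
    for _ in range(iterations):
        alpha = update_symmetric_alpha(doc_topic_counts, alpha)
    return alpha
```

For example, with counts [[10, 0, 0], [0, 10, 0], [0, 0, 10]] (each document concentrated on one topic), one step from alpha = 0.5 gives roughly 0.30, i.e. a sparser prior. A sampler would typically interleave such updates with sampling sweeps rather than run them only once.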

Step 1 (preprocessRedLDA.py) identifies the document clusters and the copied tokens.
Step 2 (RedLdaGibbsSampler) runs Gibbs sampling.
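
Step 1 is only summarized here. The package ships a file named lineBasedFingerPrint.py, which suggests line-level fingerprinting, but its method is not described in this readme. The Python sketch below is therefore an assumption, not RedLDA's actual preprocessing: it hashes whitespace- and case-normalized lines, marks a line as copied when its fingerprint occurs in more than one document, and groups documents that share any copied line using union-find. All function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set


def fingerprint(line: str) -> str:
    """Hash a line after collapsing whitespace and lowercasing (assumed normalization)."""
    normalized = " ".join(line.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def line_owners(docs: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each non-blank line fingerprint to the set of documents containing it."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for name, lines in docs.items():
        for line in lines:
            if line.strip():
                owners[fingerprint(line)].add(name)
    return owners


def find_copied_lines(docs: Dict[str, List[str]]) -> Dict[str, Set[int]]:
    """Return, per document, the indices of lines that also occur in another document."""
    owners = line_owners(docs)
    return {
        name: {i for i, line in enumerate(lines)
               if line.strip() and len(owners[fingerprint(line)]) > 1}
        for name, lines in docs.items()
    }


def cluster_documents(docs: Dict[str, List[str]]) -> List[Set[str]]:
    """Group documents that share at least one copied line (union-find)."""
    parent = {name: name for name in docs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for names in line_owners(docs).values():
        first, *rest = sorted(names)
        for other in rest:
            parent[find(other)] = find(first)

    clusters: Dict[str, Set[str]] = defaultdict(set)
    for name in docs:
        clusters[find(name)].add(name)
    return sorted(clusters.values(), key=sorted)
```

With docs = {"a": ["The quick brown fox.", "Line only in a"], "b": ["the  quick brown fox.", "Line only in b"], "c": ["Another document"]}, find_copied_lines marks line 0 of "a" and "b" as copied, and cluster_documents returns [{"a", "b"}, {"c"}]. Exact-line hashing misses near-duplicates; a real preprocessor might use a fuzzier fingerprint.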

Topic modeling produces 4 output files:
Topic distribution per document - [corpus].doctopic.dist
Topic distribution per word - [corpus].topicword.dist
Topic word counts - [corpus].toycorpus.wordCountPerTopic
Word assignments (document per line) - [corpus].wordAssignments
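
Step 2 and its four outputs can be illustrated with a plain collapsed Gibbs sampler for LDA, the model that Heinrich's LdaGibbsSampler implements. This Python sketch is an assumption, not RedLDA itself: it ignores document clusters and copied tokens entirely, words are assumed to be pre-mapped to integer ids, and the correspondence of its return values to the output files (theta to doctopic.dist, phi to topicword.dist, the topic-word counts to wordCountPerTopic, the assignments z to wordAssignments) is inferred from the file names, since the file formats are not documented here.

```python
import random
from typing import List


def lda_gibbs(docs: List[List[int]], vocab_size: int, num_topics: int,
              iterations: int, alpha: float, beta: float, seed: int = 0):
    """Collapsed Gibbs sampling for plain LDA (no redundancy handling).

    docs holds word ids in [0, vocab_size). Returns (theta, phi, nkw, z):
    per-document topic distributions, per-topic word distributions,
    topic-word counts, and the final topic assignment of every token.
    """
    rng = random.Random(seed)
    ndk = [[0] * num_topics for _ in docs]               # n_dk: tokens of doc d in topic k
    nkw = [[0] * vocab_size for _ in range(num_topics)]  # n_kw: word w assigned to topic k
    nk = [0] * num_topics                                # n_k: tokens assigned to topic k
    z: List[List[int]] = []

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | rest), up to a normalizing constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Point estimates from the final sample.
    theta = [[(ndk[d][k] + alpha) / (len(doc) + num_topics * alpha)
              for k in range(num_topics)] for d, doc in enumerate(docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + vocab_size * beta)
            for w in range(vocab_size)] for k in range(num_topics)]
    return theta, phi, nkw, z
```

The toy command above (3 topics, 1500 iterations, alpha 0.5, beta 0.1) would correspond to lda_gibbs(docs, vocab_size, 3, 1500, 0.5, 0.1). The estimates theta_{d,k} = (n_dk + alpha) / (n_d + K alpha) and phi_{k,w} = (n_kw + beta) / (n_k + V beta) are taken from a single final sample here; samplers derived from Heinrich's code commonly average them over several samples collected after burn-in instead.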

Source: readme.txt, updated 2014-01-05