RedLDA - Browse Files at SourceForge.net

Name	Modified	Size
compiled.zip	2014-01-05	1.3 MB
readme.txt	2014-01-05	1.2 kB
RedLdaGibbsSampler.java	2014-01-05	21.6 kB
preprocessRedLDA.py	2014-01-05	12.3 kB
lineBasedFingerPrint.py	2014-01-05	2.8 kB
Totals: 5 Items		1.3 MB

(C) Copyright Raphael Cohen, Iddo Aviram, Michael and Noemie Elhadad 2013
RedLdaGibbsSampler is a suite for topic modeling of corpora with redundancy, that is, corpora in which documents contain text copied from other documents.

License:
RedLdaGibbsSampler is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option) any
later version. Based on Heinrich's LdaGibbsSampler. It is distributed WITHOUT ANY WARRANTY.


USAGE:
> python preprocessRedLDA.py [directory with documents]
> java RedLdaGibbsSampler [corpus-name] [K: number of topics] [number of iterations] [alpha] [beta] [hyperparameter optimization: true/false]
 
Example usage (toy corpus provided):
python preprocessRedLDA.py toycorpus
java RedLdaGibbsSampler toycorpus 3 1500 0.5 0.1 true
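The argument order above can be captured in a small Python wrapper. This is a minimal sketch, not part of the RedLDA distribution: `sampler_command` and `run_pipeline` are hypothetical helper names, and the wrapper assumes that `python` runs an interpreter able to execute preprocessRedLDA.py, that the compiled RedLdaGibbsSampler class is on the Java classpath, and that the corpus name given to the sampler equals the document directory name, as in the toy example.

```python
import subprocess
from typing import List


def sampler_command(corpus: str, k: int, iterations: int,
                    alpha: float, beta: float, optimize: bool) -> List[str]:
    """Hypothetical helper: build the sampler arguments in the readme's order."""
    if k < 1 or iterations < 1:
        raise ValueError("K and the number of iterations must be positive")
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    return ["java", "RedLdaGibbsSampler", corpus, str(k), str(iterations),
            str(alpha), str(beta), "true" if optimize else "false"]


def run_pipeline(corpus_dir: str, k: int, iterations: int,
                 alpha: float, beta: float, optimize: bool) -> None:
    """Hypothetical helper: run step 1 (preprocessing), then step 2 (sampling).

    Assumes the corpus name equals the document directory name.
    """
    subprocess.run(["python", "preprocessRedLDA.py", corpus_dir], check=True)
    subprocess.run(sampler_command(corpus_dir, k, iterations, alpha, beta, optimize),
                   check=True)
```

Calling `run_pipeline("toycorpus", 3, 1500, 0.5, 0.1, True)` would issue the two example commands. Passing arguments as a list to `subprocess.run` avoids shell quoting, and `check=True` stops before sampling if preprocessing fails.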

Step 1 (preprocessRedLDA.py) identifies document clusters and copied tokens.
Step 2 (RedLdaGibbsSampler) runs Gibbs sampling.
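The readme does not say how step 1 detects redundancy. Given the included lineBasedFingerPrint.py, one plausible approach is to fingerprint normalized lines, flag tokens on lines already seen in an earlier document as copied, and group documents that share such lines. The sketch below is a hypothetical illustration of that idea, not the algorithm in preprocessRedLDA.py: the normalization, the MD5 line hash, the `min_tokens` threshold, the union-find clustering, and the function names are all assumptions.

```python
import hashlib
import re
from typing import Dict, List, Tuple

# Hypothetical sketch: RedLDA's actual preprocessing rules may differ.


def line_fingerprint(line: str) -> str:
    """Hash a line after lower-casing it and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", line.strip().lower())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def mark_copied_tokens(docs: List[str], min_tokens: int = 3) -> List[List[Tuple[str, bool]]]:
    """Return (token, is_copied) pairs per document.

    A token is flagged as copied when its line has at least min_tokens tokens
    and the same normalized line already occurred in an earlier document.
    """
    first_doc: Dict[str, int] = {}  # fingerprint -> first document containing it
    result = []
    for doc_id, doc in enumerate(docs):
        marked = []
        for line in doc.splitlines():
            tokens = line.split()
            if not tokens:
                continue
            first = first_doc.setdefault(line_fingerprint(line), doc_id)
            copied = len(tokens) >= min_tokens and first < doc_id
            marked.extend((tok, copied) for tok in tokens)
        result.append(marked)
    return result


def cluster_documents(docs: List[str], min_tokens: int = 3) -> List[int]:
    """Label documents so that any two sharing a long-enough line get the same label."""
    parent = list(range(len(docs)))  # union-find forest over document ids

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner: Dict[str, int] = {}  # fingerprint -> first document containing it
    for doc_id, doc in enumerate(docs):
        for line in doc.splitlines():
            if len(line.split()) < min_tokens:
                continue
            fp = line_fingerprint(line)
            if fp in owner:
                parent[find(doc_id)] = find(owner[fp])
            else:
                owner[fp] = doc_id
    return [find(i) for i in range(len(docs))]
```

Working at line granularity keeps the check linear in corpus size; a real system might instead use overlapping word n-grams to catch partially edited copies.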

Topic modeling produces 4 output files:
Topic distribution per document - [corpus].doctopic.dist
Topic distribution per word - [corpus].topicword.dist
Topic word counts - [corpus].toycorpus.wordCountPerTopic
Word assignments (document per line) - [corpus].wordAssignments
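These four outputs correspond to the quantities a collapsed Gibbs sampler for LDA maintains: per-document topic proportions (theta), per-topic word distributions (phi), topic-word counts, and one topic label per token. The sketch below is plain LDA in the style of Heinrich's LdaGibbsSampler, written in Python for illustration. It is an assumption-laden simplification: it does not implement RedLDA's redundancy handling or hyperparameter optimization, it reads the estimates off the final sample instead of averaging several, it does not reproduce the actual file formats, and the function name `lda_gibbs` and its `seed` parameter are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def lda_gibbs(docs: List[List[int]], vocab_size: int, k: int, iterations: int,
              alpha: float, beta: float, seed: int = 0
              ) -> Tuple[Matrix, Matrix, List[List[int]], List[List[int]]]:
    """Collapsed Gibbs sampling for plain LDA (no redundancy handling).

    docs holds each document as a list of word ids in range(vocab_size).
    Returns (theta, phi, nkw, z): document-topic distributions, topic-word
    distributions, topic-word counts and per-token topic assignments.
    """
    rng = random.Random(seed)
    ndk = [[0] * k for _ in docs]               # tokens in document d assigned to topic t
    nkw = [[0] * vocab_size for _ in range(k)]  # times word w is assigned to topic t
    nk = [0] * k                                # tokens assigned to topic t
    z: List[List[int]] = []

    def update(d: int, w: int, t: int, delta: int) -> None:
        ndk[d][t] += delta
        nkw[t][w] += delta
        nk[t] += delta

    for d, doc in enumerate(docs):              # random initial assignments
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            update(d, w, t, +1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)       # exclude the current token
                # p(z_i = t | rest), up to a constant factor
                weights = [(nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           * (ndk[d][t] + alpha) for t in range(k)]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                update(d, w, z[d][i], +1)

    theta = [[(ndk[d][t] + alpha) / (len(doc) + k * alpha) for t in range(k)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[t][w] + beta) / (nk[t] + vocab_size * beta) for w in range(vocab_size)]
           for t in range(k)]
    return theta, phi, nkw, z
```

Under this reading, `theta` would back the doctopic file, `phi` the topicword file, `nkw` the word counts per topic, and `z` the word assignments, one document per row; that mapping is an assumption based on the file descriptions only.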

Source: readme.txt, updated 2014-01-05

RedLDA Files

Redundancy Aware LDA Gibbs Sampler