Name | Modified | Size | Downloads / Week |
---|---|---|---|
compiled.zip | 2014-01-05 | 1.3 MB | |
readme.txt | 2014-01-05 | 1.2 kB | |
RedLdaGibbsSampler.java | 2014-01-05 | 21.6 kB | |
preprocessRedLDA.py | 2014-01-05 | 12.3 kB | |
lineBasedFingerPrint.py | 2014-01-05 | 2.8 kB | |
Totals: 5 Items | | 1.3 MB | 0 |
(C) Copyright Raphael Cohen, Iddo Aviram, Michael and Noemie Elhadad 2013

RedLdaGibbsSampler is a suite for topic modeling of corpora with redundancy. It is based on Heinrich's LdaGibbsSampler.

License: RedLdaGibbsSampler is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. It is distributed WITHOUT ANY WARRANTY.

USAGE:

    > python preprocessRedLDA.py [directory with documents]
    > java RedLdaGibbsSampler [corpus-name] [K-number of topics] [number of iterations] [alpha] [beta] [hyperparameter optimization- true/false]

Example usage (toy corpus provided):

    python preprocessRedLDA.py toycorpus
    java RedLdaGibbsSampler toycorpus 3 1500 0.5 0.1 true

Step 1 identifies the document clusters and copied tokens. Step 2 runs Gibbs sampling.

Topic modeling produces 4 output files:

    Topic distribution per document - [corpus].doctopic.dist
    Topic distribution per word - [corpus].topicword.dist
    Topic word counts - [corpus].toycorpus.wordCountPerTopic
    Word assignments (document per line) - [corpus].wordAssignments
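To give a feel for what step 2 does with K, the iteration count, alpha and beta, here is a minimal sketch of collapsed Gibbs sampling for plain LDA. It is not the code in RedLdaGibbsSampler.java: it ignores document clusters and copied tokens, does no hyperparameter optimization, and the function name `lda_gibbs` and its input format (documents as lists of integer word ids) are assumptions made for this illustration.

```python
import random

def lda_gibbs(docs, vocab_size, K, iterations, alpha, beta, seed=0):
    """Collapsed Gibbs sampling for plain LDA (no redundancy handling)."""
    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and per-topic totals.
    ndk = [[0] * K for _ in docs]
    nkw = [[0] * vocab_size for _ in range(K)]
    nk = [0] * K
    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalised full conditional p(z = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta)
                    / (nk[t] + vocab_size * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    # Smoothed estimates of the topic-per-document and word-per-topic distributions.
    theta = [[(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
            for w in range(vocab_size)] for t in range(K)]
    return theta, phi, z

# Tiny made-up corpus: three documents over a four-word vocabulary.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
theta, phi, z = lda_gibbs(docs, vocab_size=4, K=2, iterations=200, alpha=0.5, beta=0.1)
```

In this sketch `theta` and `phi` play the role of the per-document and per-word topic distributions, and `z` the role of the word assignments; the actual layout of the output files is not described in the readme, so no correspondence of format is implied.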