Topic Model Alignment


Name	Modified	Size
example.zip	2012-12-03	1.6 MB
topicsJS-alignement.py	2012-12-03	2.8 kB
README.txt	2012-12-02	1.5 kB
Totals: 3 Items		1.6 MB

Created by: Raphael Cohen
Date: Dec 2, 2012

This script aligns two topic models produced by MALLET (http://mallet.cs.umass.edu/).
It reports the reciprocal topic pairs together with their Jensen-Shannon (JS) divergence.
A pair (i, j), with topic i from the first model (M1) and topic j from the second model (M2), is reciprocal when
the distance between i and j is minimal over all pairs (i, k) for k in M2 and over all pairs (l, j) for l in M1, i.e. each topic is the other's best match.
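The reciprocal-pair rule can be sketched as follows. This is a minimal illustration, not the script's actual code. The function name `reciprocal_pairs` is hypothetical. `dist` stands in for whatever topic distance is used (JS divergence in this tool), and ties are broken by taking the first index.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def reciprocal_pairs(m1: Sequence[T], m2: Sequence[T],
                     dist: Callable[[T, T], float]) -> List[Tuple[float, int, int]]:
    """Hypothetical helper: return (distance, i, j) for every pair where
    topic i of M1 and topic j of M2 are each other's closest match."""
    # Full distance matrix: d[i][j] = dist(M1 topic i, M2 topic j)
    d = [[dist(a, b) for b in m2] for a in m1]
    # Best M2 match for each M1 topic (row-wise argmin, first index on ties)
    best_for_i = [min(range(len(m2)), key=lambda j: row[j]) for row in d]
    # Best M1 match for each M2 topic (column-wise argmin, first index on ties)
    best_for_j = [min(range(len(m1)), key=lambda i: d[i][j]) for j in range(len(m2))]
    # Keep only mutual best matches
    return [(d[i][j], i, j) for i, j in enumerate(best_for_i) if best_for_j[j] == i]
```

In this tool's setting, the items in `m1` and `m2` would be per-topic word distributions and `dist` the JS divergence. Topics without a mutual best match are simply not reported.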

This is useful for:
1. Qualitatively comparing different modeling parameters or algorithms
2. Identifying topics that remain stable across several runs

The input is two gzipped topic-state files produced by MALLET,
for example by running:
[MALLET DIR]/bin/mallet train-topics --input data.mallet --num-topics 25 --num-iterations 2000 --output-state topic-state.gz
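The following is a minimal sketch of turning one such state file into per-topic word counts. It assumes the usual MALLET state layout: header lines beginning with `#`, then one token per line with the columns `doc source pos typeindex type topic`. The function name `load_topic_word_counts` is hypothetical, and the script may read the file differently.

```python
import gzip
from collections import Counter, defaultdict
from typing import Dict


def load_topic_word_counts(path: str) -> Dict[str, Counter]:
    """Hypothetical helper: count how often each word type is assigned to
    each topic in a gzipped MALLET --output-state file."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            # Header lines (column names, alpha, beta) start with '#'
            if line.startswith("#"):
                continue
            fields = line.split()
            # Assumed columns: doc source pos typeindex type topic
            if len(fields) < 6:
                continue
            # Take word and topic from the end in case 'source' contains spaces
            word, topic = fields[-2], fields[-1]
            counts[topic][word] += 1
    return dict(counts)
```

Topic IDs are kept as strings, which matches how they appear in the example result.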

Feel free to use or modify this code at your own risk.


USAGE:
%python JS-divergence.py topic-state1.gz topic-state2.gz

Option 2: specify a smoothing factor as a third argument

%python JS-divergence.py topic-state1.gz topic-state2.gz 0.0000001
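The following is a minimal sketch of a smoothed JS divergence between two topics' word counts. It assumes additive smoothing over the union of the two topics' words and natural logarithms, so the value lies between 0 and ln 2. Whether the script uses exactly this smoothing scheme and log base is an assumption. `js_divergence` and its `smoothing` parameter are hypothetical names.

```python
import math
from typing import Mapping


def js_divergence(c1: Mapping[str, float], c2: Mapping[str, float],
                  smoothing: float = 1e-7) -> float:
    """Hypothetical helper: Jensen-Shannon divergence (natural log) between
    two word-count maps, with additive smoothing (an assumption)."""
    vocab = set(c1) | set(c2)
    # Assumption: add `smoothing` to every word in the union vocabulary
    z1 = sum(c1.values()) + smoothing * len(vocab)
    z2 = sum(c2.values()) + smoothing * len(vocab)
    js = 0.0
    for w in vocab:
        p = (c1.get(w, 0) + smoothing) / z1
        q = (c2.get(w, 0) + smoothing) / z2
        m = 0.5 * (p + q)
        # JS = 0.5*KL(P||M) + 0.5*KL(Q||M); zero-probability terms contribute 0
        if p > 0:
            js += 0.5 * p * math.log(p / m)
        if q > 0:
            js += 0.5 * q * math.log(q / m)
    return js
```

Smoothing matters mainly for other measures such as plain KL divergence, which is undefined when a word is missing from one topic. JS divergence stays finite even with `smoothing=0`, and the guards above handle that case.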


Result (example; t1 is the topic ID in M1 and t2 the topic ID in M2):
   JS Divergence      t1    t2
(0.5524645751867814, '15', '11')
(0.1312120698128315, '20', '10')
(0.06103903882230567, '24', '12')
(0.03749075669779891, '6', '20')
(0.09601937201648371, '18', '19')
(0.025672544059170105, '9', '18')
(0.1120611237407785, '2', '3')
(0.11165026591229285, '10', '24')
(0.05849937442765494, '3', '5')
(0.1523135314850376, '23', '6')
(0.15335010058956877, '22', '9')
(0.026982916171330196, '11', '8')
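One way to use this output for identifying stable topics is to keep only the pairs whose divergence falls below a cutoff. The sketch is hypothetical: the `stable_pairs` helper and the 0.05 cutoff are illustrative choices, not part of the script.

```python
from typing import List, Tuple

Pair = Tuple[float, str, str]  # (JS divergence, topic in M1, topic in M2)


def stable_pairs(pairs: List[Pair], max_js: float) -> List[Pair]:
    """Hypothetical helper: reciprocal pairs with divergence <= max_js,
    most similar first."""
    return sorted((p for p in pairs if p[0] <= max_js), key=lambda p: p[0])


# A few rows copied from the example result above
pairs = [
    (0.5524645751867814, '15', '11'),
    (0.03749075669779891, '6', '20'),
    (0.025672544059170105, '9', '18'),
    (0.026982916171330196, '11', '8'),
]
# Hypothetical cutoff; a sensible value depends on the data and model size
for js, t1, t2 in stable_pairs(pairs, max_js=0.05):
    print(f"{js:.4f}  M1 topic {t1} <-> M2 topic {t2}")
```

With these four rows, the loop prints the pairs (9, 18), (11, 8) and (6, 20), lowest divergence first.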

Source: README.txt, updated 2012-12-02

Topic Model Alignment Files

Aligns two LDA topic models