Name                    Modified    Size
example.zip             2012-12-03  1.6 MB
topicsJS-alignement.py  2012-12-03  2.8 kB
README.txt              2012-12-02  1.5 kB
Totals: 3 items, 1.6 MB

Created by: Raphael Cohen
Date: Dec 2, 2012

This script aligns two topic models produced by MALLET (http://mallet.cs.umass.edu/).
Reciprocal topic pairs are reported together with their Jensen-Shannon (JS) divergence.
A pair (i,j) is reciprocal when topic i from the first model (M1) and topic j from the second model (M2)
are each other's best match: their distance is minimal over all pairs (i,k) for k in M2 and over all pairs (l,j) for l in M1.
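
The definition above can be illustrated with a minimal Python sketch. It is not the code of the script itself: the function names js_divergence and reciprocal_pairs are hypothetical, the natural logarithm is an assumption, and each topic is assumed to be already available as a word distribution over a vocabulary shared by both models.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two distributions."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    # Mixture distribution M = (P + Q) / 2
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def reciprocal_pairs(m1, m2):
    """Return (divergence, i, j) for topics that are each other's best match.

    m1, m2: dicts mapping a topic id to its word distribution.
    """
    dist = {(i, j): js_divergence(p, q)
            for i, p in m1.items() for j, q in m2.items()}
    # Best match in M2 for every topic of M1, and the other way round.
    best_for_i = {i: min(m2, key=lambda j: dist[i, j]) for i in m1}
    best_for_j = {j: min(m1, key=lambda i: dist[i, j]) for j in m2}
    return [(dist[i, j], i, j)
            for i, j in best_for_i.items() if best_for_j[j] == i]


# Toy models with two topics each over a three-word vocabulary.
model1 = {"0": [0.7, 0.2, 0.1], "1": [0.1, 0.1, 0.8]}
model2 = {"0": [0.1, 0.2, 0.7], "1": [0.6, 0.3, 0.1]}
for row in reciprocal_pairs(model1, model2):
    print(row)
```

In this toy example topic 0 of M1 pairs with topic 1 of M2 and vice versa. A topic whose best match prefers a different partner is simply not reported.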

This is useful for:
1. Qualitatively comparing different modeling parameters or algorithms
2. Identifying topics that remain stable across repeated runs

The input is two topic-state .gz files produced by MALLET,
for example after running:
[MALLET DIR]/bin/mallet train-topics --input data.mallet --num-topics 25 --num-iterations 2000 --output-state topic-state.gz
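
For readers who want to inspect such a state file, the following Python sketch collects word counts per topic. It assumes the usual layout of the MALLET state file: header lines starting with '#', followed by one token per line with six whitespace-separated fields (doc, source, pos, typeindex, type, topic). The function name read_topic_word_counts is hypothetical and the script may read the file differently.

```python
import gzip
from collections import Counter, defaultdict


def read_topic_word_counts(path):
    """Map each topic id to a Counter of the words assigned to it."""
    counts = defaultdict(Counter)
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # header lines (column names, alpha, beta)
            fields = line.split()
            if len(fields) != 6:
                continue  # skip lines that do not match the assumed layout
            word, topic = fields[4], fields[5]
            counts[topic][word] += 1
    return counts
```

Calling read_topic_word_counts("topic-state.gz") on the file produced by the command above would return one Counter per topic, keyed by the topic id as a string.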

Feel free to use / change this code at your own risk.


USAGE:
Option 1 - use the default smoothing factor

%python JS-divergence.py topic-state1.gz topic-state2.gz

Option 2 - specify smoothing factor

%python JS-divergence.py topic-state1.gz topic-state2.gz 0.0000001
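
How the script applies the smoothing factor is not described here. One common choice, shown below purely as an assumption, is additive smoothing: the factor is added to every word count before normalizing, so that words never assigned to a topic still receive a small non-zero probability. The function name smoothed_distribution is hypothetical, and its default value is taken from the usage example above, not from the script.

```python
def smoothed_distribution(word_counts, vocabulary, smoothing=0.0000001):
    """Turn raw word counts into a probability distribution over `vocabulary`.

    Assumed scheme: add `smoothing` to every count, then normalize.
    """
    total = sum(word_counts.get(w, 0) for w in vocabulary)
    total += smoothing * len(vocabulary)
    return [(word_counts.get(w, 0) + smoothing) / total for w in vocabulary]


# The word "c" never occurs in this topic but still gets a tiny probability.
vocabulary = ["a", "b", "c"]
dist = smoothed_distribution({"a": 3, "b": 1}, vocabulary)
print(dist)
```

Under this scheme a larger factor pulls every topic towards the uniform distribution, which lowers all divergences.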


Result (example):
   JS Divergence      t1    t2
(0.5524645751867814, '15', '11')
(0.1312120698128315, '20', '10')
(0.06103903882230567, '24', '12')
(0.03749075669779891, '6', '20')
(0.09601937201648371, '18', '19')
(0.025672544059170105, '9', '18')
(0.1120611237407785, '2', '3')
(0.11165026591229285, '10', '24')
(0.05849937442765494, '3', '5')
(0.1523135314850376, '23', '6')
(0.15335010058956877, '22', '9')
(0.026982916171330196, '11', '8')
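
Because a lower JS divergence means a closer match, sorting the reported pairs puts the most similar topics first. The sketch below assumes that each result line is a Python tuple literal, as in the example above. The function name parse_result_lines is hypothetical.

```python
import ast


def parse_result_lines(lines):
    """Parse '(divergence, t1, t2)' lines and sort them by divergence."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line.startswith("("):
            continue  # skip the header and blank lines
        divergence, t1, t2 = ast.literal_eval(line)
        pairs.append((divergence, t1, t2))
    return sorted(pairs)


# Three lines copied from the example result above, plus the header.
sample = [
    "   JS Divergence      t1    t2",
    "(0.5524645751867814, '15', '11')",
    "(0.1312120698128315, '20', '10')",
    "(0.025672544059170105, '9', '18')",
]
for divergence, t1, t2 in parse_result_lines(sample):
    print(t1, t2, divergence)
```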

Source: README.txt, updated 2012-12-02