Name                               Modified     Size
README                             2014-01-31   845 Bytes
dump_signatures.tar.gz             2014-01-31   1.0 MB
clueweb12.cfg.gz                   2014-01-31   188.4 kB
clueweb09.cfg.gz                   2014-01-31   76.4 kB
clueweb09_output.txt               2014-01-27   1.7 kB
clueweb12_output.txt               2014-01-27   1.7 kB
clueweb12_level2_stats.csv.gz      2014-01-26   6.1 MB
clueweb12_level1_stats.csv.gz      2014-01-26   13.9 kB
clueweb12_level1_clusters.csv.gz   2014-01-26   5.5 GB
clueweb09_level2_stats.csv.gz      2014-01-26   7.0 MB
clueweb09_level2_clusters.csv.gz   2014-01-26   4.5 GB
clueweb09_level1_stats.csv.gz      2014-01-26   13.6 kB
clueweb09_level1_clusters.csv.gz   2014-01-26   3.9 GB

Totals: 13 items, 13.9 GB, 0 downloads / week

Clusters - http://github.com/cmdevries/LMW-tree
-----------------------------------------------

The *clusters.csv.gz files contain a mapping from document ID to cluster ID.

The *stats.csv.gz files contain the tree structure and other statistics.

The first row of each file describes the data contained in the columns.
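
Because the first row names the columns, the files can be inspected without
knowing the layout in advance. The following is a minimal Python sketch, not
part of LMW-tree: it assumes the files are comma-separated, and the helper
names read_header and count_values are hypothetical. It streams the data
rather than loading it, since the cluster files are several gigabytes each.

```python
import csv
import gzip
from collections import Counter


def read_header(path):
    """Return the column names from the first row of a gzipped CSV file."""
    with gzip.open(path, mode="rt", newline="") as handle:
        return next(csv.reader(handle))


def count_values(path, column):
    """Stream the file row by row and count each value seen in `column`.

    Assumption: the file is comma-separated and `column` is one of the
    names returned by read_header(path).
    """
    counts = Counter()
    with gzip.open(path, mode="rt", newline="") as handle:
        for row in csv.DictReader(handle):
            counts[row[column]] += 1
    return counts
```

A typical use would be to call read_header("clueweb09_level1_clusters.csv.gz")
first, pick whichever column holds the cluster ID, and pass that name to
count_values to get the number of documents per cluster.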

The *output.txt files contain the output of the LMW-tree program running the
streaming and parallel EM-tree algorithm. They include running times and
other statistics.


TopSig related - http://topsig.googlecode.com
---------------------------------------------

The *.cfg.gz files are the TopSig configuration files used when indexing
the ClueWeb 2009 and 2012 collections.

The dump_signatures.tar.gz archive contains a program that converts the
TopSig index into a format that is easier for other programs, such as
LMW-tree, to load.
Source: README, updated 2014-01-31