Name | Modified | Size | Downloads / Week |
---|---|---|---|
README | 2014-01-31 | 845 Bytes | |
dump_signatures.tar.gz | 2014-01-31 | 1.0 MB | |
clueweb12.cfg.gz | 2014-01-31 | 188.4 kB | |
clueweb09.cfg.gz | 2014-01-31 | 76.4 kB | |
clueweb09_output.txt | 2014-01-27 | 1.7 kB | |
clueweb12_output.txt | 2014-01-27 | 1.7 kB | |
clueweb12_level2_stats.csv.gz | 2014-01-26 | 6.1 MB | |
clueweb12_level1_stats.csv.gz | 2014-01-26 | 13.9 kB | |
clueweb12_level1_clusters.csv.gz | 2014-01-26 | 5.5 GB | |
clueweb09_level2_stats.csv.gz | 2014-01-26 | 7.0 MB | |
clueweb09_level2_clusters.csv.gz | 2014-01-26 | 4.5 GB | |
clueweb09_level1_stats.csv.gz | 2014-01-26 | 13.6 kB | |
clueweb09_level1_clusters.csv.gz | 2014-01-26 | 3.9 GB | |
Totals: 13 Items | | 13.9 GB | 0 |

Clusters - http://github.com/cmdevries/LMW-tree
-----------------------------------------------

The *clusters.csv.gz files contain a mapping from document ID to cluster ID.

The *stats.csv.gz files contain the tree structure and other statistics.

The first row of each file describes the data contained in the columns.

The *output.txt files contain the output of the LMW-tree program running the
streaming and parallel EM-tree algorithm. They include running times and other
statistics.

TopSig related - http://topsig.googlecode.com
---------------------------------------------

The *.cfg.gz files contain the TopSig configuration files used when indexing
the ClueWeb 2009 and 2012 collections.

The dump_signatures.tar.gz archive contains a program to convert the TopSig
index into a format that is easier to load for other programs such as LMW-tree.
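
Because the *clusters.csv.gz files are several gigabytes each, it can help to read the descriptive first row before loading anything in full. The following is a minimal Python sketch, not part of the LMW-tree or TopSig tools. It assumes the files are comma-separated and UTF-8 encoded, which the README does not state, and `peek_csv_gz` is a hypothetical helper name.

```python
import csv
import gzip
from itertools import islice


def peek_csv_gz(path, n_rows=5):
    """Return (header, first n_rows data rows) of a gzip-compressed CSV file.

    Assumption: comma-delimited, UTF-8 text. The file is streamed, so only
    the first few rows are decompressed rather than the whole archive.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)  # first row describes the columns
        rows = list(islice(reader, n_rows))
    return header, rows
```

For example, `header, rows = peek_csv_gz("clueweb09_level1_stats.csv.gz")` would return the column descriptions and the first five data rows of the smallest stats file. The same call works on the cluster files without reading them to the end.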