<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to models</title><link>https://sourceforge.net/p/jobimtext/wiki/models/</link><description>Recent changes to models</description><language>en</language><lastBuildDate>Tue, 08 Apr 2014 11:53:34 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/jobimtext/wiki/models/feed" rel="self" type="application/rss+xml"/><item><title>models modified by Chris Biemann</title><link>https://sourceforge.net/p/jobimtext/wiki/models/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v23
+++ v24
@@ -34,4 +34,4 @@
 #Sense Clusters
 Dataset |Download| Holing System|Clustering
 ------|--------|--------|--------
-news120M|[I](http://sourceforge.net/projects/jobimtext/files/data/sensecluster/news120M_sense_cluster_cw.tgz/download)|Stanford Parser &amp; Lemma|Chinese Whisperer
+news120M|[I](http://sourceforge.net/projects/jobimtext/files/data/sensecluster/news120M_sense_cluster_cw.tgz/download)|Stanford Parser &amp; Lemma|Chinese Whispers
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Chris Biemann</dc:creator><pubDate>Tue, 08 Apr 2014 11:53:34 -0000</pubDate><guid>https://sourceforge.neteecf3edf26d2d7a806cc958fc5d1cadd9fab023e</guid></item><item><title>models modified by Eugen Ruppert</title><link>https://sourceforge.net/p/jobimtext/wiki/models/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v22
+++ v23
@@ -3,6 +3,8 @@

 This page lists the datasets and models which are available at:
 https://sourceforge.net/projects/jobimtext/files/data/
+
+We also computed Models for various time slices on [Google Books](http://sourceforge.net/p/jobimtext/wiki/LREC2014_Google_DT/) data.

 [TOC]

&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Eugen Ruppert</dc:creator><pubDate>Wed, 05 Mar 2014 15:02:24 -0000</pubDate><guid>https://sourceforge.netec370006ec34361b10cc3aba91534d0d8dc21ab0</guid></item><item><title>models modified by Martin Riedl</title><link>https://sourceforge.net/p/jobimtext/wiki/models/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v21
+++ v22
@@ -31,5 +31,5 @@

 #Sense Clusters
 Dataset |Download| Holing System|Clustering
-------|--------|--------
+------|--------|--------|--------
 news120M|[I](http://sourceforge.net/projects/jobimtext/files/data/sensecluster/news120M_sense_cluster_cw.tgz/download)|Stanford Parser &amp; Lemma|Chinese Whisperer
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Martin Riedl</dc:creator><pubDate>Fri, 19 Jul 2013 17:47:07 -0000</pubDate><guid>https://sourceforge.netc969dc9558199514476c7cf472406b43ce3f029f</guid></item><item><title>models modified by Martin Riedl</title><link>https://sourceforge.net/p/jobimtext/wiki/models/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v20
+++ v21
@@ -1,4 +1,4 @@
-Data and Models
+Datasets and Precomputed Models
 =======

 This page lists the datasets and models which are available at:
@@ -13,7 +13,7 @@
 - **[en_wikipedia](http://sourceforge.net/projects/jobimtext/files/data/dataset/dataset_wikipedia_en.tar.gz/download)**: This dataset is constructed using English Wikipedia. It consists of 35.9 million sentences.
 - **[en_google_books](http://commondatastorage.googleapis.com/books/syntactic-ngrams/index.html)**: This dataset is constructed by Yoav Goldberg ([A Dataset of Syntactc-Ngrams over Time from a Very Large Corpus of English Books, *SEM 2013](http://commondatastorage.googleapis.com/books/syntactic-ngrams/syntngrams.final.pdf)) and can be downloaded [here](http://commondatastorage.googleapis.com/books/syntactic-ngrams/index.html).

-#Models
+#Distributional Thesaurus Models

 Dataset |Download| Holing System|Word Count | Feature Count | Word Feature Count| Word Feature Significances | Similarities
@@ -27,3 +27,9 @@
 en_news120M pruned | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_3gram2_pruned.zip/download) | 3gram w. hole at pos. 2 | all |none|none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
 de_news70M  pruned | [I](http://sourceforge.net/projects/jobimtext/files/data/models/de_news70M_pruned.zip/download)| 3gram w. hole at pos. 2 | all |none|none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
 google_books|[I](http://sourceforge.net/projects/jobimtext/files/data/models/google_books_top_1M_words.tar.gz/download)| dependency parses| all | none | none| top 1000/word for words with wordcount &gt; 100| top 200 for words with wordcount &gt; 100
+
+
+#Sense Clusters
+Dataset |Download| Holing System|Clustering
+------|--------|--------
+news120M|[I](http://sourceforge.net/projects/jobimtext/files/data/sensecluster/news120M_sense_cluster_cw.tgz/download)|Stanford Parser &amp; Lemma|Chinese Whisperer
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Martin Riedl</dc:creator><pubDate>Fri, 19 Jul 2013 17:46:44 -0000</pubDate><guid>https://sourceforge.net7a9a570bf7b302e18b24334a9fd9411e213f47f8</guid></item><item><title>models modified by Martin Riedl</title><link>https://sourceforge.net/p/jobimtext/wiki/models/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v19
+++ v20
@@ -26,4 +26,4 @@
 en_news120M pruned | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_stanford_lemma_np_pruned.zip/download)| StanfordParser &amp; Lemma | all |none |none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
 en_news120M pruned | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_3gram2_pruned.zip/download) | 3gram w. hole at pos. 2 | all |none|none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
 de_news70M  pruned | [I](http://sourceforge.net/projects/jobimtext/files/data/models/de_news70M_pruned.zip/download)| 3gram w. hole at pos. 2 | all |none|none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
-google_books|[I](http://sourceforge.net/projects/jobimtext/files/data/models/google_books_top1M_words/download)| dependency parses| all | none | none| top 1000/word for words with wordcount &gt; 100| top 200 for words with wordcount &gt; 100
+google_books|[I](http://sourceforge.net/projects/jobimtext/files/data/models/google_books_top_1M_words.tar.gz/download)| dependency parses| all | none | none| top 1000/word for words with wordcount &gt; 100| top 200 for words with wordcount &gt; 100
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Martin Riedl</dc:creator><pubDate>Tue, 09 Jul 2013 13:40:06 -0000</pubDate><guid>https://sourceforge.netb0181c3adcf6e7df6bca9fc019ba59ffbd3b77f4</guid></item><item><title>models modified by Martin Riedl</title><link>https://sourceforge.net/p/jobimtext/wiki/models/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v18
+++ v19
@@ -26,4 +26,4 @@
 en_news120M pruned | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_stanford_lemma_np_pruned.zip/download)| StanfordParser &amp; Lemma | all |none |none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
 en_news120M pruned | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_3gram2_pruned.zip/download) | 3gram w. hole at pos. 2 | all |none|none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
 de_news70M  pruned | [I](http://sourceforge.net/projects/jobimtext/files/data/models/de_news70M_pruned.zip/download)| 3gram w. hole at pos. 2 | all |none|none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
-google_books|[I]()| dependency parses| all | none | none| top 1000/word for words with wordcount &gt; 100| top 200 for words with wordcount &gt; 100
+google_books|[I](http://sourceforge.net/projects/jobimtext/files/data/models/google_books_top1M_words/download)| dependency parses| all | none | none| top 1000/word for words with wordcount &gt; 100| top 200 for words with wordcount &gt; 100
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Martin Riedl</dc:creator><pubDate>Mon, 08 Jul 2013 13:47:12 -0000</pubDate><guid>https://sourceforge.net4a011a3db484b1699aeb9a3dc0285b60489d5ac4</guid></item><item><title>models modified by Martin Riedl</title><link>https://sourceforge.net/p/jobimtext/wiki/models/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v17
+++ v18
@@ -7,11 +7,11 @@
 [TOC]

 #Datasets
-There are two datasets [available](http://sourceforge.net/projects/jobimtext/files/data/dataset) that can be used for calculating a new thesaurus. The format of the files is that each line contains one sentence.
+Following [available](http://sourceforge.net/projects/jobimtext/files/data/dataset) that can be used for calculating a new thesaurus. The format of the files is that each line contains one sentence.

 - **[en_news10M](http://sourceforge.net/projects/jobimtext/files/data/dataset/dataset_news10M.gz/download)**: This dataset is taken from [LCC](http://corpora.uni-leipzig.de/download.html). It consists of 10 million English sentences taken from news web pages.
 - **[en_wikipedia](http://sourceforge.net/projects/jobimtext/files/data/dataset/dataset_wikipedia_en.tar.gz/download)**: This dataset is constructed using English Wikipedia. It consists of 35.9 million sentences.
-
+- **[en_google_books](http://commondatastorage.googleapis.com/books/syntactic-ngrams/index.html)**: This dataset is constructed by Yoav Goldberg ([A Dataset of Syntactc-Ngrams over Time from a Very Large Corpus of English Books, *SEM 2013](http://commondatastorage.googleapis.com/books/syntactic-ngrams/syntngrams.final.pdf)) and can be downloaded [here](http://commondatastorage.googleapis.com/books/syntactic-ngrams/index.html).

 #Models

@@ -26,3 +26,4 @@
 en_news120M pruned | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_stanford_lemma_np_pruned.zip/download)| StanfordParser &amp; Lemma | all |none |none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
 en_news120M pruned | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_3gram2_pruned.zip/download) | 3gram w. hole at pos. 2 | all |none|none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
 de_news70M  pruned | [I](http://sourceforge.net/projects/jobimtext/files/data/models/de_news70M_pruned.zip/download)| 3gram w. hole at pos. 2 | all |none|none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
+google_books|[I]()| dependency parses| all | none | none| top 1000/word for words with wordcount &gt; 100| top 200 for words with wordcount &gt; 100
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Martin Riedl</dc:creator><pubDate>Mon, 08 Jul 2013 13:45:31 -0000</pubDate><guid>https://sourceforge.netf948a31372eed08ade3563d9f810535452f23c1e</guid></item><item><title>models modified by Martin Riedl</title><link>https://sourceforge.net/p/jobimtext/wiki/models/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v16
+++ v17
@@ -22,7 +22,7 @@
 en_wikipedia | [I](http://sourceforge.net/projects/jobimtext/files/data/models/wikipedia_en_malt_demo.tar.gz/download) | Malt Parser &amp; Lemmatized|none|none |none|top 1000/word|top 100
 en_news10M |[I](http://sourceforge.net/projects/jobimtext/files/data/models/similarities-news10M-maltparser.tar.gz/download) | Malt Parser &amp; Lemmatized| all|all |all|all|top 200
 en_news120M | [I](http://sourceforge.net/projects/jobimtext/files/data/models/similarities-news120M_stanford_lemma_np.tar.gz_1/download) [II](http://sourceforge.net/projects/jobimtext/files/data/models/similarities-news120M_stanford_lemma_np.tar.gz_2/download) [III](http://sourceforge.net/projects/jobimtext/files/data/models/similarities-news120M_stanford_lemma_np.tar.gz_3/download) | Stanford Parser &amp; Lemmatized| all|all |all|all|top 200
-en_wikipedia 
-en_news120M | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_stanford_lemma_np_pruned.zip/download)| StanfordParser &amp; Lemma | all |none |none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
-en_news120M | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_3gram2_pruned.zip/download) | 3gram w. hole at pos. 2 | all |none|none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
-de_news70M  | [I](http://sourceforge.net/projects/jobimtext/files/data/models/de_news70M_pruned.zip/download)| 3gram w. hole at pos. 2 | all |none|none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
+en_news120M pruned mysql | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_stanford_lemma_np_pruned.mysql.gz/download)| StanfordParser &amp; Lemma | all |none |none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
+en_news120M pruned | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_stanford_lemma_np_pruned.zip/download)| StanfordParser &amp; Lemma | all |none |none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
+en_news120M pruned | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_3gram2_pruned.zip/download) | 3gram w. hole at pos. 2 | all |none|none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
+de_news70M  pruned | [I](http://sourceforge.net/projects/jobimtext/files/data/models/de_news70M_pruned.zip/download)| 3gram w. hole at pos. 2 | all |none|none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Martin Riedl</dc:creator><pubDate>Mon, 03 Jun 2013 09:12:55 -0000</pubDate><guid>https://sourceforge.net8aec6f9ef032bc61bc591f3e7ec46ee1adf56ef7</guid></item><item><title>models modified by Martin Riedl</title><link>https://sourceforge.net/p/jobimtext/wiki/models/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v15
+++ v16
@@ -16,19 +16,12 @@
 #Models

-
-- **en_wikipedia_maltparser [I](http://sourceforge.net/projects/jobimtext/files/data/model/similarities-wiki-maltparser-part-I.tar.gz/download) [II](http://sourceforge.net/projects/jobimtext/files/data/model/similarities-wiki-maltparser-part-II.tar.gz/download)**: This model is computed using the from the English Wikipedia dataset. As "Holing System" dependencies extracted from the [MaltParser](http://www.maltparser.org/) are used.
-
-- **[en_news10M_maltparser](http://sourceforge.net/projects/jobimtext/files/data/model/similarities-news10M-maltparser.tar.gz/download)**: This model is computed using the from the en_news10M dataset. As "Holing System" dependencies extracted from the [MaltParser](http://www.maltparser.org/) are used.
-
-- **en_news120M_stanfordparser [I](http://sourceforge.net/projects/jobimtext/files/data/model/similarities-news120M_stanford_lemma_np.tar.gz_1/download) [II](http://sourceforge.net/projects/jobimtext/files/data/model/similarities-news120M_stanford_lemma_np.tar.gz_2/download) [III](http://sourceforge.net/projects/jobimtext/files/data/model/similarities-news120M_stanford_lemma_np.tar.gz_3/download)**: This thesaurus is calculated based on 120M sentences extracted from news web pages which are taken from [LCC](http://corpora.uni-leipzig.de/download.html). The similarities are computed using the dependencies from the [Stanford Parser](http://nlp.stanford.edu/software/lex-parser.shtml).
-
 Dataset |Download| Holing System|Word Count | Feature Count | Word Feature Count| Word Feature Significances | Similarities
 ------|--------|--------|----------|----|----|-----|---------
-en_wikipedia |[I](http://sourceforge.net/projects/jobimtext/files/data/model/similarities-wiki-maltparser-part-I.tar.gz/download) [II](http://sourceforge.net/projects/jobimtext/files/data/model/similarities-wiki-maltparser-part-II.tar.gz/download) | Malt Parser &amp; Lemmatized| all|all |all|all|top 200
+en_wikipedia |[I](http://sourceforge.net/projects/jobimtext/files/data/models/similarities-wiki-maltparser-part-I.tar.gz/download) [II](http://sourceforge.net/projects/jobimtext/files/data/models/similarities-wiki-maltparser-part-II.tar.gz/download) | Malt Parser &amp; Lemmatized| all|all |all|all|top 200
 en_wikipedia | [I](http://sourceforge.net/projects/jobimtext/files/data/models/wikipedia_en_malt_demo.tar.gz/download) | Malt Parser &amp; Lemmatized|none|none |none|top 1000/word|top 100
-en_news10M |[I](http://sourceforge.net/projects/jobimtext/files/data/model/similarities-news10M-maltparser.tar.gz/download) | Malt Parser &amp; Lemmatized| all|all |all|all|top 200
-en_news120M | [I](http://sourceforge.net/projects/jobimtext/files/data/model/similarities-news120M_stanford_lemma_np.tar.gz_1/download) [II](http://sourceforge.net/projects/jobimtext/files/data/model/similarities-news120M_stanford_lemma_np.tar.gz_2/download) [III](http://sourceforge.net/projects/jobimtext/files/data/model/similarities-news120M_stanford_lemma_np.tar.gz_3/download) | Stanford Parser &amp; Lemmatized| all|all |all|all|top 200
+en_news10M |[I](http://sourceforge.net/projects/jobimtext/files/data/models/similarities-news10M-maltparser.tar.gz/download) | Malt Parser &amp; Lemmatized| all|all |all|all|top 200
+en_news120M | [I](http://sourceforge.net/projects/jobimtext/files/data/models/similarities-news120M_stanford_lemma_np.tar.gz_1/download) [II](http://sourceforge.net/projects/jobimtext/files/data/models/similarities-news120M_stanford_lemma_np.tar.gz_2/download) [III](http://sourceforge.net/projects/jobimtext/files/data/models/similarities-news120M_stanford_lemma_np.tar.gz_3/download) | Stanford Parser &amp; Lemmatized| all|all |all|all|top 200
 en_wikipedia 
 en_news120M | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_stanford_lemma_np_pruned.zip/download)| StanfordParser &amp; Lemma | all |none |none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
 en_news120M | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_3gram2_pruned.zip/download) | 3gram w. hole at pos. 2 | all |none|none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Martin Riedl</dc:creator><pubDate>Mon, 03 Jun 2013 09:09:56 -0000</pubDate><guid>https://sourceforge.net602e6721c791f69ce3e827f7892113995842dee3</guid></item><item><title>models modified by Martin Riedl</title><link>https://sourceforge.net/p/jobimtext/wiki/models/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v14
+++ v15
@@ -30,6 +30,6 @@
 en_news10M |[I](http://sourceforge.net/projects/jobimtext/files/data/model/similarities-news10M-maltparser.tar.gz/download) | Malt Parser &amp; Lemmatized| all|all |all|all|top 200
 en_news120M | [I](http://sourceforge.net/projects/jobimtext/files/data/model/similarities-news120M_stanford_lemma_np.tar.gz_1/download) [II](http://sourceforge.net/projects/jobimtext/files/data/model/similarities-news120M_stanford_lemma_np.tar.gz_2/download) [III](http://sourceforge.net/projects/jobimtext/files/data/model/similarities-news120M_stanford_lemma_np.tar.gz_3/download) | Stanford Parser &amp; Lemmatized| all|all |all|all|top 200
 en_wikipedia 
-en_news120M | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_stanford_lemma_np_pruned.zip/download)| StanfordParser &amp; Lemma | all |none |none|top 1000/word|top 200 for 100k frequent words
-en_news120M | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_3gram2_pruned.zip/download) | 3gram w. hole at pos. 2 | all |none|none|top 1000/word|top 200 for 100k frequent words
-de_news70M  | [I](http://sourceforge.net/projects/jobimtext/files/data/models/de_news70M_pruned.zip/download)| 3gram w. hole at pos. 2 | all |none|none|top 1000/word|top 200 for 100k frequent words
+en_news120M | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_stanford_lemma_np_pruned.zip/download)| StanfordParser &amp; Lemma | all |none |none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
+en_news120M | [I](http://sourceforge.net/projects/jobimtext/files/data/models/news120M_3gram2_pruned.zip/download) | 3gram w. hole at pos. 2 | all |none|none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
+de_news70M  | [I](http://sourceforge.net/projects/jobimtext/files/data/models/de_news70M_pruned.zip/download)| 3gram w. hole at pos. 2 | all |none|none|top 1000/word for 100k frequent words|top 200 for 100k frequent words
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Martin Riedl</dc:creator><pubDate>Thu, 04 Apr 2013 14:41:54 -0000</pubDate><guid>https://sourceforge.net94fdcc98229bd9f8fc6f302ad1e5c6973cfd365d</guid></item></channel></rss>