<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to LowLevelModules</title><link>https://sourceforge.net/p/webtextanalysis/wiki/LowLevelModules/</link><description>Recent changes to LowLevelModules</description><atom:link href="https://sourceforge.net/p/webtextanalysis/wiki/LowLevelModules/feed" rel="self"/><language>en</language><lastBuildDate>Tue, 20 May 2014 08:56:26 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/webtextanalysis/wiki/LowLevelModules/feed" rel="self" type="application/rss+xml"/><item><title>LowLevelModules modified by Kostia</title><link>https://sourceforge.net/p/webtextanalysis/wiki/LowLevelModules/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;&lt;strong&gt;Page still in progress&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Information about the low-level text mining/NLP capabilities included with Text-Analysis.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Text-Analysis includes the following low-level text mining/NLP abilities: &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Clustering analysis: k-means (MacQueen, 1967 &lt;span&gt;[1]&lt;/span&gt;; Hartigan and Wong, 1979 &lt;span&gt;[2]&lt;/span&gt;), Neural-Gas &lt;span&gt;[3]&lt;/span&gt;, hierarchical clustering &lt;/li&gt;
&lt;li&gt;Tokenizers: ICU&lt;span&gt;[4]&lt;/span&gt;, Konchady&lt;span&gt;[5]&lt;/span&gt; &lt;/li&gt;
&lt;li&gt;Stemmers: Porter, WordNet &lt;/li&gt;
&lt;li&gt;WordNet: WordNet interface, lexical relations, similarity, interactive browser &lt;/li&gt;
&lt;li&gt;Principal Component Analysis (PCA) &lt;/li&gt;
&lt;li&gt;Linear Discriminant Analysis (LDA) &lt;/li&gt;
&lt;li&gt;Support Vector Machines (SVM) &lt;/li&gt;
&lt;li&gt;String similarity: Jaccard, Jaro-Winkler, Levenshtein, Luhn, Soundex &lt;/li&gt;
&lt;li&gt;String matching: Aho–Corasick algorithm &lt;/li&gt;
&lt;li&gt;Keyword extraction: RAKE (Rapid Automatic Keyword Extraction) &lt;span&gt;[6]&lt;/span&gt; &lt;/li&gt;
&lt;li&gt;Summarisation: Luhn, lexical cohesion &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span&gt;[1]&lt;/span&gt; MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam &amp;amp; J. Neyman, 1, pp. 281–297. Berkeley, CA: University of California Press.&lt;br /&gt;
&lt;span&gt;[2]&lt;/span&gt; Hartigan, J. A. and Wong, M. A. (1979). A K-means clustering algorithm. Applied Statistics 28, pp. 100–108.&lt;br /&gt;
&lt;span&gt;[3]&lt;/span&gt; Martinetz, T., Berkovich, S. and Schulten, K. (1993). ‘Neural-Gas’ Network for Vector Quantization and its Application to Time-Series Prediction. IEEE Transactions on Neural Networks, 4 (4), pp. 558–569.&lt;br /&gt;
&lt;span&gt;[4]&lt;/span&gt; ICU - International Components for Unicode (&lt;a href="http://site.icu-project.org" rel="nofollow"&gt;http://site.icu-project.org&lt;/a&gt;).&lt;br /&gt;
&lt;span&gt;[5]&lt;/span&gt; Konchady, M. Text Mining Application Programming. Charles River Media Programming.&lt;br /&gt;
&lt;span&gt;[6]&lt;/span&gt; Rose, S., Engel, D., Cramer, N. and Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In Text Mining: Applications and Theory. Wiley.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Kostia</dc:creator><pubDate>Tue, 20 May 2014 08:56:26 -0000</pubDate><guid>https://sourceforge.net49e0307497dcf01ad6bea0cde3d9e3bf845ac950</guid></item></channel></rss>