LowLevelModules modified by Kostia

Kostia — Tue, 20 May 2014 08:56:26 -0000

Page still in progress.

Information about the low-level text mining/NLP abilities included with Text-Analysis.

Text-Analysis includes the following low-level text mining/NLP abilities:

Clustering analysis: k-means (MacQueen 1967 [1]; Hartigan and Wong 1979 [2]), Neural-Gas [3], hierarchical clustering
Tokenizers: ICU [4], Konchady [5]
Stemmers: Porter, WordNet
WordNet: WordNet interface, lexical relations, similarity, interactive browser
Principal Component Analysis (PCA)
Linear Discriminant Analysis (LDA)
Support Vector Machines (SVM)
String similarity: Jaccard, Jaro-Winkler, Levenshtein, Luhn, Soundex
String matching: Aho–Corasick algorithm
Keyword extraction: RAKE (Rapid Automatic Keyword Extraction) [6]
Summarisation: Luhn, lexical cohesion
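This page does not document the Text-Analysis API, so the following is only a standalone Python sketch of k-means clustering, not the project's code. Note that it uses the simpler Lloyd batch iteration (assign every point, then recompute every centroid) in place of the MacQueen [1] online update or the Hartigan–Wong [2] transfer steps cited above; the function name `kmeans` and its parameters are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style batch k-means on a list of equal-length numeric tuples.

    Hypothetical helper for illustration; returns (labels, centroids).
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points chosen at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` puts the first three points in one cluster and the last three in the other. Lloyd's variant is shown because it is the shortest correct form; it can converge to a different local optimum than the cited variants.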
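As an illustration of three of the listed string-similarity measures, here is a minimal standalone Python sketch (not the Text-Analysis implementation, whose API is not described here). It assumes the usual Winkler defaults (prefix scale 0.1, prefix capped at 4 characters) and computes Jaccard similarity over whitespace-separated token sets; the project may instead use character n-grams. All function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    m = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Count matched characters that appear in a different order (half-transpositions).
    half_t, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_t += 1
            k += 1
    t = half_t / 2
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro similarity boosted by the length of the common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def jaccard(a, b):
    """Jaccard similarity of the two strings' whitespace token sets."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

With these definitions, `levenshtein("kitten", "sitting")` is 3, `jaro_winkler("MARTHA", "MARHTA")` is about 0.961, and `jaccard("the cat sat", "the cat ran")` is 0.5.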
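Soundex, also listed under string similarity, encodes a word by how it sounds so that spelling variants compare equal. The following standalone Python sketch follows the common American Soundex rules (keep the first letter, map consonants to digits, drop repeats, let H and W not separate equal codes, pad to four characters); it is an illustration only, and the `soundex` name is hypothetical rather than the project's API.

```python
# Digit classes of American Soundex; vowels, H, W and Y have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character American Soundex code of a word."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = [word[0].upper()]
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        code = CODES.get(ch, "")
        # Skip letters without a code and codes equal to the previous one.
        if code and code != prev:
            result.append(code)
        # H and W do not separate equal codes; vowels (and Y) do.
        if ch not in "hw":
            prev = code
    return ("".join(result) + "000")[:4]
```

For instance, "Robert" and "Rupert" both encode to R163.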
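The Aho–Corasick algorithm finds every occurrence of a whole dictionary of patterns in a single pass over the text, using a trie plus failure links. A compact standalone Python sketch follows; it shows the technique only and is not the Text-Analysis API (the `build_automaton` and `search` names are hypothetical).

```python
from collections import deque


def build_automaton(patterns):
    """Build the trie (goto), failure links (fail) and output lists (out)."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pat)
    # Breadth-first pass: a node's failure link points to the longest proper
    # suffix of its string that is also a prefix in the trie.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, pattern) for every match, in order of match end."""
    goto, fail, out = automaton
    node, hits = 0, []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

With the classic patterns "he", "she", "his" and "hers", searching "ushers" reports "she" at index 1 and both "he" and "hers" at index 2.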
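RAKE [6] splits text into candidate phrases at stopwords and punctuation, scores each word as degree/frequency (degree counts the word's co-occurrences within candidate phrases, including itself), and scores a phrase as the sum of its word scores. The standalone Python sketch below follows that scheme; the tiny `STOPWORDS` set is a hypothetical placeholder (the RAKE authors use a much larger stoplist), and the `rake` function is illustrative, not the project's API.

```python
import re
from collections import defaultdict

# Hypothetical minimal stoplist, for illustration only.
STOPWORDS = frozenset({"a", "an", "and", "are", "at", "but", "by", "for", "in",
                       "is", "of", "on", "over", "the", "to", "with"})


def rake(text, stopwords=STOPWORDS):
    """Return (phrase, score) pairs sorted by descending RAKE score."""
    # 1. Candidate phrases: runs of non-stopwords between delimiters.
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    # 2. Word frequency and co-occurrence degree (phrase length per occurrence).
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # 3. Phrase score = sum of its word scores.
    ranked = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(ranked.items(), key=lambda kv: kv[1], reverse=True)
```

On "Compatibility of systems of linear constraints over the set of natural numbers", this ranks "linear constraints" and "natural numbers" (4.0 each) above the single-word candidates (1.0 each).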
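Luhn-style summarisation marks frequent non-stopwords as "significant", scores each sentence by its densest cluster of significant words (significant count squared divided by cluster span), and keeps the top-scoring sentences in document order. The standalone Python sketch below assumes a hypothetical frequency threshold (`min_freq=2`), a gap of at most four non-significant words inside a cluster, and a tiny placeholder stoplist; it illustrates the idea and is not the Text-Analysis implementation.

```python
import re
from collections import Counter

# Hypothetical minimal stoplist, for illustration only.
STOPWORDS = frozenset({"a", "an", "and", "are", "at", "but", "by", "for", "in",
                       "is", "of", "on", "over", "the", "to", "with"})


def sentence_score(words, significant, max_gap=4):
    """Best cluster score: (significant words in cluster)^2 / cluster span."""
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:  # still inside the current cluster
            count += 1
        else:  # close the cluster and start a new one
            best = max(best, count * count / (prev - start + 1))
            start, count = pos, 1
        prev = pos
    return max(best, count * count / (prev - start + 1))


def luhn_summary(text, n=1, min_freq=2, stopwords=STOPWORDS):
    """Return the n highest-scoring sentences, kept in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    counts = Counter(w for words in tokenized for w in words if w not in stopwords)
    significant = {w for w, c in counts.items() if c >= min_freq}
    scores = [sentence_score(words, significant) for words in tokenized]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]
```

Keeping the selected sentences in their original order, rather than by score, usually makes the extract read more naturally.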

[1] MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam and J. Neyman, vol. 1, pp. 281–297. Berkeley, CA: University of California Press.
[2] Hartigan, J. A. and Wong, M. A. (1979). A K-means clustering algorithm. Applied Statistics, 28, pp. 100–108.
[3] Martinetz, T., Berkovich, S. and Schulten, K. (1993). ‘Neural-Gas’ Network for Vector Quantization and its Application to Time-Series Prediction. IEEE Transactions on Neural Networks, 4(4), pp. 558–569.
[4] ICU: International Components for Unicode (http://site.icu-project.org).
[5] Konchady, M. Text Mining Application Programming. Charles River Media Programming.
[6] Rose, S., Engel, D., Cramer, N. and Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In Text Mining: Applications and Theory. Wiley.