
LowLevelModules

Kostia

Page still in progress.

Text-Analysis includes the following low-level text mining and NLP capabilities:

  • Clustering analysis: k-means (MacQueen, 1967 [1]; Hartigan and Wong, 1979 [2]), Neural-Gas [3], hierarchical clustering
  • Tokenizers: ICU[4], Konchady[5]
  • Stemmers: Porter, WordNet
  • WordNet: WordNet interface, lexical relations, similarity, interactive browser
  • Principal Component Analysis (PCA)
  • Linear Discriminant Analysis (LDA)
  • Support Vector Machines (SVM)
  • String similarity: Jaccard, Jaro-Winkler, Levenshtein, Luhn, Soundex
  • String matching: Aho–Corasick algorithm
  • Keyword extraction: RAKE (Rapid Automatic Keyword Extraction) [6]
  • Summarisation: Luhn, lexical cohesion
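
To illustrate two of the string similarity measures listed above, here is a minimal Python sketch of Jaccard similarity and Levenshtein distance. It is not Text-Analysis's own code or API: the function names `jaccard` and `levenshtein`, and the choice to compare lowercased, whitespace-separated tokens for Jaccard, are assumptions made for this example only.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over sets of lowercased, whitespace-separated tokens.

    Hypothetical helper for illustration; the tokenisation is an assumption.
    """
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty inputs are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions.

    Hypothetical helper for illustration; keeps only two rows of the
    dynamic-programming table at a time.
    """
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep on a match)
            ))
        previous = current
    return previous[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3, and `jaccard("text mining tools", "text mining library")` is 0.5 because two of the four distinct tokens are shared.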

[1] MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam & J. Neyman, 1, pp. 281–297. Berkeley, CA: University of California Press.
[2] Hartigan, J. A. and Wong, M. A. (1979). A K-means clustering algorithm. Applied Statistics 28, 100–108.
[3] Martinetz, T., Berkovich, S., and Schulten, K. (1993). ‘Neural-Gas’ Network for Vector Quantization and its Application to Time-Series Prediction. IEEE Transactions on Neural Networks, 4 (4), pp. 558–569.
[4] ICU - International Components for Unicode (http://site.icu-project.org)
[5] Manu Konchady. Text Mining Application Programming. Charles River Media Programming.
[6] Stuart Rose, Dave Engel, Nick Cramer and Wendy Cowley. Automatic Keyword Extraction from Individual Documents. Text Mining: Applications and Theory, Wiley 2010.


