DSTK - DataScience ToolKit is free, open-source software for statistical analysis, data visualization, text analysis, and predictive analytics. A newer version with a smaller file size is available at: https://sourceforge.net/projects/dstk3/
It is designed to be straightforward and easy to use, and familiar to SPSS users. While JASP offers more statistical features, DSTK aims to be a broader workbench that also includes text analysis and predictive analytics features. You may still prefer JASP for advanced data editing and RapidMiner for advanced prediction modeling.
DSTK is written in C#, Java, and Python, and interfaces with R, NLTK, and Weka. It can be extended with plugins written as R scripts. We have also created plugins for additional statistical functions and for big data analytics on Microsoft Azure HDInsight (Spark server) via Livy.
License: R, RStudio, NLTK, SciPy, scikit-learn, Matplotlib, Weka, ... each have their own license.
Features
- Data Scraping (Web Scraping, Video2Text, Image2Text)
- Data and Text Preprocessing (with stemming, stopwords...)
- Data Exploration and Visualizations (histogram, bar, pie, boxplot, ...)
- Document Clustering
- Text Analytics (Text Link Analysis, POS Tagging, Sentiment Analysis, ...)
- Predictive Analytics (both numerical and text, Naive Bayes, with additional Weka add-ins)
- Plugins with Big Data features (requires a Microsoft Azure account)
- Expandable with Plugins using R Scripts
- Text Explorer/Analytics uses GATE's Gazetteer .lst files and sentiment word lists published online by universities
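
To give a feel for what the text preprocessing and Naive Bayes features involve, here is a minimal Python sketch using only the standard library. It is not DSTK's own code and does not use NLTK or Weka; the stopword list, the crude suffix stripping, and the function names `preprocess`, `train`, and `predict` are all hypothetical and chosen for illustration only.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to", "i"}


def preprocess(text):
    """Lowercase, tokenize on letters, drop stopwords, strip a few suffixes."""
    stems = []
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in STOPWORDS:
            continue
        # Crude suffix stripping; a real stemmer (e.g. Porter) is more careful.
        for suffix in ("ing", "ed", "s"):
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        stems.append(tok)
    return stems


def train(samples):
    """Collect per-label document and word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for tok in preprocess(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest multinomial Naive Bayes log score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    tokens = [tok for tok in preprocess(text) if tok in vocab]
    best_label, best_score = None, float("-inf")
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data, for illustration only.
model = train([
    ("I loved this movie", "pos"),
    ("great acting and a great story", "pos"),
    ("boring and slow", "neg"),
    ("I hated the ending", "neg"),
])
print(predict(model, "a great movie"))
```

In practice a proper stemmer and a curated stopword list would replace the toy versions above; the sketch only shows how preprocessing feeds the classifier.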