hypKNOWsys aims to develop a Java-based workbench for knowledge discovery and knowledge management. So far, hypKNOWsys has released two intermediate tools: DIAsDEM Workbench (text mining for semantic tagging) and WUMprep (Web mining pre-processing).
First of all, DIAsDEM Workbench 2.2 is released under the GNU General Public License. This aligns the license of DIAsDEM Workbench with that of hypKNOWsys Algorithms, which is based on Weka.

DIAsDEM Workbench 2.2 contains many new tasks, such as import of Reuters corpora, an interface to the Google Web API, import of HTML files, list-based word sense disambiguation, conditional term frequency statistics, and an improved cluster quality monitor. Minor improvements were made to the tasks 'Vectorize Text Units 2.2', which now optionally supports vector length normalization, and 'Monitor Cluster Quality 2.2', which now has a slightly improved GUI. Moreover, 'Thesaurus Editor 2.2' offers a new table-based GUI along with a limited search function. 'Batch Script Editor' now supports cut, copy, and paste operations on batch script tasks. To facilitate visualization and further analysis, 'Weka Knowledge Explorer' has been integrated directly into DIAsDEM Workbench 2.2 (choose 'Solutions' and 'Miscellaneous'). Compared with DIAsDEM Workbench 2.1, the wording of concepts (e.g., in the cluster quality context) has been improved to better convey their inherent meaning.

In addition, the new hypKNOWsys clustering task extends the Weka clustering algorithms with k-means, bisecting k-means, the Batch SOM algorithm as proposed by Kohonen, the Jarvis-Patrick clustering algorithm, and the SNN clustering algorithm as proposed by Ertoez, Steinbach, and Kumar. Four distance measures are available: Euclidean distance, cosine distance, extended Jaccard, and extended Dice. These clustering algorithms output status messages and optionally compute cluster validity indices: the Davies-Bouldin index, the average silhouette width, the original Dunn index, the Dunn index as proposed by Bezdek, the overall cluster quality index as introduced by He et al., and the SDbw index as proposed by Halkidi et al.

Furthermore, both algorithms now support parameter looping over the number of clusters (e.g., 10/25/100). The clustering task can now also draw a random sample of text unit vectors. In the application phase, text unit vectors may optionally be accessed sequentially, which avoids the memory requirements of the previous approach of loading all vectors into memory. Moreover, sparse ARFF files can be created and passed to the hypKNOWsys clustering task to reduce memory usage and improve clustering speed.

Finally, all text files in the ZIP release file have DOS/Windows line breaks, whereas their counterparts in the TAR.GZ release file have Unix/Linux line breaks. Apart from that, both release archives have the same contents.
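As an illustration of the four distance measures named in the release notes above, the following is a minimal Java sketch using their common textbook definitions on dense term vectors. It is not taken from the hypKNOWsys sources: the class name `DistanceMeasuresSketch`, all method names, the conversion of each similarity into a distance as 1 minus similarity, and the handling of all-zero vectors are assumptions made for this example. The `normalizeLength` helper shows one plausible reading of "vector length normalization" (scaling to unit Euclidean length), which is likewise an assumption.

```java
/** Illustrative sketch only; not the hypKNOWsys implementation. */
final class DistanceMeasuresSketch {

    private DistanceMeasuresSketch() {}

    /** Dot product of two equally long vectors. */
    static double dot(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Vectors must have equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    /** Euclidean distance: square root of the summed squared differences. */
    static double euclidean(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Vectors must have equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Cosine distance: 1 - (x.y) / (|x| * |y|). Assumption: 1.0 if either vector is all zeros. */
    static double cosineDistance(double[] x, double[] y) {
        double denom = Math.sqrt(dot(x, x)) * Math.sqrt(dot(y, y));
        return denom == 0.0 ? 1.0 : 1.0 - dot(x, y) / denom;
    }

    /** Extended Jaccard distance: 1 - (x.y) / (|x|^2 + |y|^2 - x.y). Assumption: 1.0 if undefined. */
    static double extendedJaccardDistance(double[] x, double[] y) {
        double xy = dot(x, y);
        double denom = dot(x, x) + dot(y, y) - xy;
        return denom == 0.0 ? 1.0 : 1.0 - xy / denom;
    }

    /** Extended Dice distance: 1 - 2(x.y) / (|x|^2 + |y|^2). Assumption: 1.0 if undefined. */
    static double extendedDiceDistance(double[] x, double[] y) {
        double denom = dot(x, x) + dot(y, y);
        return denom == 0.0 ? 1.0 : 1.0 - 2.0 * dot(x, y) / denom;
    }

    /** Returns a copy of x scaled to unit Euclidean length; an all-zero vector stays all zeros. */
    static double[] normalizeLength(double[] x) {
        double norm = Math.sqrt(dot(x, x));
        double[] out = new double[x.length];
        if (norm == 0.0) {
            return out;
        }
        for (int i = 0; i < x.length; i++) {
            out[i] = x[i] / norm;
        }
        return out;
    }

    public static void main(String[] args) {
        // Two small term vectors that share one of their two non-zero terms.
        double[] x = {1, 0, 1};
        double[] y = {0, 1, 1};
        System.out.println("Euclidean:        " + euclidean(x, y));
        System.out.println("Cosine:           " + cosineDistance(x, y));
        System.out.println("Extended Jaccard: " + extendedJaccardDistance(x, y));
        System.out.println("Extended Dice:    " + extendedDiceDistance(x, y));
    }
}
```

For the two vectors in `main`, the cosine and extended Dice distances both come out at roughly 0.5 and the extended Jaccard distance at roughly 2/3, up to floating-point rounding. On vectors that have already been normalized to unit length, cosine and extended Dice distance coincide, which is one reason the normalization option can matter when choosing a measure.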
We have released the tutorial 'Getting Started with DIAsDEM Workbench 2.2: A Case-Based Approach' today. It is available in the download section of the DIAsDEM.workbench22 package.
Copyright © 2009 Geeknet, Inc. All rights reserved. Terms of Use