VecText is an application that converts raw text to a structured format suitable for various data mining software. The application is written in interpreted programming language Perl. A part of the functionality is realized by external modules (e.g., Lingua::Stem::Snowball for stemming). The graphical user interface enables user-friendly software employment without requiring specialized technical skills and knowledge of a particular programming language, names of libraries and their functions, etc. All preprocessing actions are specified using common graphical elements organized into logically related blocks. The graphical user interface is implemented in Perl/Tk. In the command-line interface mode, all options need to be specified using the command line parameters. This way of non-interactive communication enables incorporating the application into a more complicated data mining process integrating several software packages or performing multiple conversions in a batch.
Features
- document vector representation
- natural language processing