
RapidMiner -- Data Mining, ETL, OLAP, BI

File Release Notes and Changelog

Release Name: 3.0

Notes:
Major release. Yale now contains 197 operators, including the
learning schemes and other operators from the current Weka release 3.4.5
and LibSVM 2.8. Many improvements and bug fixes (see change log).

Please note that external operators and plugins written or released
before Yale 3.0 will not work with this release. Please wait for the new
plugin releases, which will follow in the next few days.

Changes:
Changes from Yale 2.4.1 to Yale 3.0 [2005/07/11]
------------------------------------------------------------

* New operators:
  - FeatureNameFilter (using regular expressions)
  - FeatureValueTypeFilter (replaces FeatureTypeFilter)
  - FeatureBlockTypeFilter
  - operators for all Weka tasks instead of specifying the Weka operator with a parameter (see below)
  - MultipleLabelLearning
  - MultipleLabelPerformanceEvaluator
  - MultipleLabelIterator
  - AverageBuilder
  - RenameAttribute (renaming and type changing)
  - Data generators for testing purposes
  - MinMaxWrapper for linear combinations of average and minimum values (which might lead to more stable optimizations)
  - CorrelationMatrix (which can also produce feature weights)
  - SimpleBinDiscretization
  - SimpleFrequencyDiscretization
  - Single2Series
  - PerformanceWriter (in addition to the ResultWriter)
  - ParameterCloner
  - ParameterSetWriter
  - GridParameterOptimization (replaces the old ParameterOptimization)
  - NelderMeadParameterOptimization
  - PatternParameterOptimization
  - ParameterIteration (which simply iterates through given parameter combinations instead of optimizing them)
  - IOConsumer (consumes unused outputs)
  - ARFFWriter
  - WrapperXValidation (replaces the old MethodXValidation)
  - SimpleWrapperValidation (replaces the old SimpleMethodValidation)
  - NominalExampleSetGenerator
  - JViToPlotter (in addition to the built-in plotters)
* Removed operators:
  - The external operators for the C versions of MySVM, SVMLight, and C45 are no longer part of the Yale core. Please use the Java implementations JMySVM, LibSVM, and J48 instead.
  - LegalNumberExampleFilter was replaced by the operator ExampleFilter. This operator can handle both missing values and user-defined value conditions.
  - MethodXValidation was replaced by WrapperXValidation. The old operator could handle feature selection methods but not pure feature weighting methods.
  - ParameterOptimization (see above). In addition, the parameter parameter_file was removed from all parameter optimization operators.
  - SimpleMethodValidation (see above)
  - FeatureTypeFilter was replaced by an improved FeatureValueTypeFilter
  - BatchedValidationChain
* Improved data management and statistics. Yale can now handle larger data sets.
* Undo and Redo functions
* Several new performance criteria, including MinMaxCriterion for weighted linear combinations of the minimum and the average of arbitrary criteria
* Some operators are now deprecated. Deprecated operators issue messages during application and validation and should no longer be used.
* New plotter concept, introducing the Yale color plotter, GnuPlotPlotter for 3D plots, scatter plots, and a distribution plotter (histograms). Plots are only created automatically for smaller data sets (see settings).
* In addition to the new plotter concept, the operator JViToPlotter can be used to plot some of Yale's IOObjects. The current version supports at least ExampleSet and some numerical models.
* Syntax highlighting in the message viewer and XML editor; colors can be specified in the preferences dialog
* New Weka version 3.4.5 integrated
* New LibSVM version 2.8 integrated
* Generic operator classes and operator subtypes. These allow a single class to implement several operators. This feature is used for the new Weka operator style, in which each learning scheme corresponds to one Yale operator (rather than to a parameter of an operator).
* Added learner capabilities. Each learning scheme can now declare which types of data sets it supports.
* Added stratified sampling for cross validation on data with a categorical label. This ensures that the subsets have the same class distribution as the whole data set.
* Added several additional selection and crossover schemes for evolutionary feature operators
* Learners and performance evaluators can now deliver the input example set as output if desired. This also applies to models and ModelApplier.
* New structure of the settings dialog
* (Optional) Tip of the Day at startup
* Automatic update check during startup (once a month; no personal data is transmitted or collected)
* The command line version waits at breakpoints and can be resumed by pressing Enter
* Only a user-defined number of lines will be logged; the default is 1000. This value can be changed in the settings dialog.
* Since massive logging may slow down experiments, the default log verbosity for new experiments is "init"
* Removed some rarely used verbosity levels
* Plugins can also provide a GenericOperatorFactory in their operator description file, which can be used to register additional generic operators
* Improved operator group structure in the GUI and improved package structure
* Improved Javadoc documentation; at least all classes should have a class comment
* Learners can no longer write the model directly into a file. Please use the operator ModelWriter for this purpose.
* Implementation details:
  - ATTENTION: Since operators should know their own operator description, the empty operator constructor may no longer be used for operator creation! Operators must be created with OperatorService.createOperator(String name).
  - Using the ARFF loader from Weka instead of the KDB package
  - Renamed the method getIdAttribute() to getId() in ExampleSet; some methods were removed from Example
  - Added a copy method to Parameters
  - Examples can now be queried by their id
  - Examples can also be queried by their index. This is only recommended for memory-based example tables and should not be used for iteration purposes; every operator that must iterate through complete example sets should use ExampleReaders. However, this change allows Yale to construct Weka instances on the fly, which drastically decreases memory usage.
  - Operators can now define the default behavior for input consumption; a corresponding parameter is then defined and queried automatically. This allows some operators (like validation chains or performance evaluators) to pass their input (e.g. the example set) on to the following operators.
  - Added two helper methods, getDeliveredOutputClasses() and getAllOutputClasses(Class[] innerOutput). One of these methods should be used to return the delivered output of an operator chain at the end of checkIO(). These methods reflect the changes in consumption behaviour. Please refer to the Yale tutorial for further information.
  - The implementation of the simple feature selection operators was improved. Memory usage is reduced, especially for forward selection.
  - SparseArrayDataRows need less memory than SparseMapDataRows while offering the same runtime. This data management type should be used if data is sparser than 50%.
  - Using sparse array data rows after Nominal2Binary filtering
* Bugfixes:
  - bug in Unix start scripts (plugins were not properly loaded)
  - variance adaptation in feature weighting
  - wrong conversion from Weka instances to Yale example sets for data sets with more attributes than examples
  - bug in average handling of validation operators, which mixed up weights and performance values for some feature operation experiments
  - strange plotting of some example sets
  - validation of experiments containing disabled operators
  - bug in database handling which prevented feature selection from working correctly on example sets based on databases (csv and dBase too)
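The MinMaxCriterion and MinMaxWrapper entries describe a weighted linear combination of the minimum and the average of several performance values. The Java sketch below illustrates that idea under an explicit assumption: the combined value is w * min + (1 - w) * average, with a weight w in [0, 1] applied to the minimum. The class MinMaxSketch and its method combine() are hypothetical and are not Yale's actual API.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a min/average combination in the spirit of MinMaxCriterion.
 * Assumption (not taken from Yale): combined = w * min + (1 - w) * average.
 */
public class MinMaxSketch {

    /** Combines several performance values; minimumWeight must lie in [0, 1]. */
    public static double combine(double[] values, double minimumWeight) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one value required");
        }
        if (minimumWeight < 0.0 || minimumWeight > 1.0) {
            throw new IllegalArgumentException("minimumWeight must be in [0, 1]");
        }
        double min = Arrays.stream(values).min().getAsDouble();
        double avg = Arrays.stream(values).average().getAsDouble();
        // w = 0 yields the plain average, w = 1 yields the worst case (minimum).
        return minimumWeight * min + (1.0 - minimumWeight) * avg;
    }

    public static void main(String[] args) {
        // e.g. accuracies of three cross validation runs
        double[] accuracies = {0.80, 0.60, 0.90};
        System.out.println(combine(accuracies, 0.5));
    }
}
```

Raising the weight penalizes parameter or feature sets whose worst run is poor, which is one plausible reading of why such a combination "might lead to more stable optimizations".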
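The stratified sampling entry states that each cross validation subset keeps the class distribution of the whole data set. One common way to achieve this, shown here as a Java sketch and not as Yale's implementation, is to shuffle the examples of each class separately and then deal them round-robin over the k folds. The class StratifiedFolds and its method assignFolds() are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/**
 * Hypothetical sketch of stratified fold assignment for k-fold cross validation.
 * Not Yale's code: examples of each class are shuffled and dealt round-robin,
 * so every fold receives (nearly) the same class mix.
 */
public class StratifiedFolds {

    /** Returns fold[i] in [0, k) for the example whose label is labels[i]. */
    public static int[] assignFolds(String[] labels, int k, long seed) {
        if (k < 2) {
            throw new IllegalArgumentException("k must be at least 2");
        }
        // Group example indices by class label; LinkedHashMap keeps the order deterministic.
        Map<String, List<Integer>> byClass = new LinkedHashMap<>();
        for (int i = 0; i < labels.length; i++) {
            byClass.computeIfAbsent(labels[i], key -> new ArrayList<>()).add(i);
        }
        int[] fold = new int[labels.length];
        Random random = new Random(seed);
        int next = 0; // continue dealing where the previous class stopped, to balance fold sizes
        for (List<Integer> indices : byClass.values()) {
            Collections.shuffle(indices, random);
            for (int index : indices) {
                fold[index] = next;
                next = (next + 1) % k;
            }
        }
        return fold;
    }

    public static void main(String[] args) {
        String[] labels = {"yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no"};
        int[] fold = assignFolds(labels, 3, 42L);
        for (int f = 0; f < 3; f++) {
            int yes = 0;
            int no = 0;
            for (int i = 0; i < labels.length; i++) {
                if (fold[i] == f) {
                    if (labels[i].equals("yes")) yes++; else no++;
                }
            }
            System.out.println("fold " + f + ": yes=" + yes + ", no=" + no);
        }
    }
}
```

With six "yes" and three "no" examples and k = 3, every fold receives two "yes" and one "no" example, i.e. the 2:1 ratio of the whole data set.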
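The note on SparseArrayDataRows versus SparseMapDataRows suggests why an array-backed row saves memory: a map stores a boxed key and value object per entry, whereas two parallel primitive arrays store only the raw index and value. The Java sketch below shows such a row under the assumption that it keeps sorted attribute indices and uses binary search for lookups; SparseArrayRowSketch is a hypothetical class and not Yale's SparseArrayDataRow.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a sparse data row backed by two parallel primitive arrays
 * (sorted attribute indices and their values). Only non-zero entries are stored.
 */
public class SparseArrayRowSketch {
    private int[] indices = new int[0];      // sorted attribute indices
    private double[] values = new double[0]; // values aligned with indices

    /** Returns the value of an attribute; entries that are not stored read as 0. */
    public double get(int attribute) {
        int pos = Arrays.binarySearch(indices, attribute);
        return pos >= 0 ? values[pos] : 0.0;
    }

    /** Sets a value; setting 0 removes the stored entry to keep the row sparse. */
    public void set(int attribute, double value) {
        int pos = Arrays.binarySearch(indices, attribute);
        if (pos >= 0) {
            if (value != 0.0) {
                values[pos] = value;
            } else {
                remove(pos);
            }
        } else if (value != 0.0) {
            insert(-pos - 1, attribute, value); // binarySearch encodes the insertion point
        }
    }

    public int storedEntries() {
        return indices.length;
    }

    private void insert(int pos, int attribute, double value) {
        int[] newIndices = new int[indices.length + 1];
        double[] newValues = new double[values.length + 1];
        System.arraycopy(indices, 0, newIndices, 0, pos);
        System.arraycopy(values, 0, newValues, 0, pos);
        newIndices[pos] = attribute;
        newValues[pos] = value;
        System.arraycopy(indices, pos, newIndices, pos + 1, indices.length - pos);
        System.arraycopy(values, pos, newValues, pos + 1, values.length - pos);
        indices = newIndices;
        values = newValues;
    }

    private void remove(int pos) {
        int[] newIndices = new int[indices.length - 1];
        double[] newValues = new double[values.length - 1];
        System.arraycopy(indices, 0, newIndices, 0, pos);
        System.arraycopy(values, 0, newValues, 0, pos);
        System.arraycopy(indices, pos + 1, newIndices, pos, indices.length - pos - 1);
        System.arraycopy(values, pos + 1, newValues, pos, values.length - pos - 1);
        indices = newIndices;
        values = newValues;
    }

    public static void main(String[] args) {
        SparseArrayRowSketch row = new SparseArrayRowSketch();
        row.set(7, 1.5);
        row.set(2, 3.0);
        row.set(7, 0.0); // removes the entry for attribute 7 again
        System.out.println(row.get(2) + " " + row.get(7) + " " + row.storedEntries());
    }
}
```

In this sketch, reads cost a binary search (logarithmic in the number of stored entries) and writes that add or remove an entry copy the arrays, so the layout favours rows that are built once and read often, which matches typical example tables.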