This release contains some important changes and bug fixes:
* We changed the architecture of the alignment packages. The aligner classes now live in packages named after the source of the underlying algorithm code (e.g. bioweka.aligners.jaligner.SmithWatermanGotohAligner).
* The new release fixes a problem with the "supplied test set" option of BlastClassifier and SimpleAlignmentClassifier that led to incorrect results in previous releases (other options like cross-validation or percentage split were not affected).
Bioinformatics published the paper "BioWeka - extending the Weka framework for bioinformatics" (doi:10.1093/bioinformatics/btl671) written by Jan E. Gewehr, Martin Szugat and Ralf Zimmer.
The previous release didn't contain the documentation of Weka-CG. This was fixed. No functional changes were made in this release.
In this release we added a filter under bioweka.filters.universal called AddLastAttributeByName that adds the last attribute (often the class assignments) from a second data set to the current data set by joining the instances by their first attribute (i.e. name attribute).
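The join described above can be sketched in plain Java. This is only an illustration of the idea, not the BioWeka implementation: it models each instance as a `String[]` instead of using Weka's classes, the class and method names are hypothetical, and filling unmatched instances with `?` (ARFF's missing-value marker) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rows stand in for Weka instances.
class LastAttributeJoinSketch {

    // Appends the last value of the matching row in `second` to every row of
    // `first`. Rows are matched on their first value (the name attribute).
    static List<String[]> addLastAttributeByName(List<String[]> first,
                                                 List<String[]> second) {
        // Index the second data set: name -> last attribute value.
        Map<String, String> lastByName = new HashMap<>();
        for (String[] row : second) {
            lastByName.put(row[0], row[row.length - 1]);
        }

        List<String[]> joined = new ArrayList<>();
        for (String[] row : first) {
            String[] extended = Arrays.copyOf(row, row.length + 1);
            // Assumption: instances without a partner get a missing value.
            extended[row.length] = lastByName.getOrDefault(row[0], "?");
            joined.add(extended);
        }
        return joined;
    }
}
```

Building a lookup table on the name attribute first keeps the join linear in the size of both data sets.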
Further, the release contains these bug fixes:
* Calling the toString method on an AvgScoreEvaluator object whose initialize method had not been called led to a NullPointerException. This bug was fixed.
* The default value for the buildCommand property of the BlastClassifier object specified the option "-o T", which tells the formatdb command of the BLAST suite to build an index file for the sequence database. This improves performance. However, BLAST appears to handle this option differently across versions, so we removed it from the default settings.
* Under certain circumstances the PsiBlastParser class failed to parse the output of blastpgp. The new release solves this problem.
The BioWeka 0.5.0 release provides a new filter called AddLastAttributeFrom. This filter can be used to add an attribute, e.g. the class annotation, from an external dataset to the current dataset.
In addition, the new BioWeka version includes two minor bug fixes:
* The relation name of a filtered data set no longer contains white space or the pipe character. This change became necessary because the BlastClassifier uses the relation name as the name of the BLAST database, and BLAST uses the database name as the file name for the database. However, Windows forbids the pipe character in file names.
* The iprscan.xsl stylesheet now declares the sequence.name attribute as nominal instead of string. This is a workaround for a well-known problem of Weka with sparse instances and string attributes.
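The relation name clean-up from the first fix could look roughly like the sketch below. The release note only says that white space and the pipe character no longer occur; replacing them with an underscore, as well as the class and method names, is an assumption made for illustration.

```java
// Hypothetical sketch of making a relation name safe to use as a file name.
class RelationNameSketch {

    // Replaces every run of white space and/or pipe characters with a single
    // underscore (assumption: the real filter may drop or replace them
    // differently).
    static String sanitize(String relationName) {
        return relationName.replaceAll("[\\s|]+", "_");
    }
}
```

For example, `RelationNameSketch.sanitize("a b|c")` yields `a_b_c`.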
Fixed problems with the bioweka.sh shell script for Linux. There were no changes in this release concerning the BioWeka Java library.
The new BioWeka release 0.4.2 contains some minor bug fixes and improvements:
* The default, internal SMO classifier of the Eclat classifier now returns estimated probabilities (instead of just 0.0 or 1.0), which increases the frame prediction accuracy. This affects the components EclatFrameFinder, EclatFrameClassifier and Eclat.
* Andreas Draeger provided new versions of his global alignment classes for the BioJava project, which fix some bugs. This BioWeka release incorporates these changes.
Latest changes in BioWeka 0.4.1:
* Added scripts (BATCH file for Windows, shell script for Linux) to start Weka with all BioWeka libraries
* Integrated the WLSVM class to use LibSVM within Weka
* Integrated the Weka-CG library
The Bachelor thesis "BioWeka: Extending the Weka framework for Bioinformatics" by Martin Szugat is available online (see Download)
The new release 0.4 of BioWeka combines the packages BioWeka, XML-Stylesheets, FoldRec, BioJava 1.4 Extensions and other tools and data files in one single distribution: bioweka-0.4.zip. In addition, there is some new functionality:
* The AlignmentScorer filter performs an all-against-all alignment using a specified Aligner and stores the scores within the data set.
* Such a data set can be used in conjunction with the new ScoreClassifier class to classify sequences based on the precomputed alignment scores.
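As a rough illustration of the all-against-all step, the sketch below fills a score matrix by applying a scoring function to every pair of sequences. It does not use the BioWeka Aligner interface: the scoring function is passed in as a plain Java functional interface, and `identityScore` is a toy stand-in (ungapped count of identical positions), not a real alignment algorithm. All names are hypothetical.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of precomputing pairwise alignment scores.
class AllAgainstAllSketch {

    // scores[i][j] holds the score of sequence i aligned against sequence j.
    static double[][] scoreAll(List<String> sequences,
                               ToDoubleBiFunction<String, String> aligner) {
        int n = sequences.size();
        double[][] scores = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                scores[i][j] = aligner.applyAsDouble(sequences.get(i), sequences.get(j));
            }
        }
        return scores;
    }

    // Toy stand-in for an aligner: counts identical positions without gaps.
    static double identityScore(String a, String b) {
        int length = Math.min(a.length(), b.length());
        int matches = 0;
        for (int k = 0; k < length; k++) {
            if (a.charAt(k) == b.charAt(k)) {
                matches++;
            }
        }
        return matches;
    }
}
```

Each row of the resulting matrix can then serve as the precomputed score vector of one sequence, which is the kind of input a score-based classifier would work on.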
BioWeka 0.3 and XML-Stylesheets 0.2 offer these new features:
* Classifying DNA/RNA sequences using a reimplementation of Eclat (based on codon usage and SVMs)
* Merge multiple data sets and store relation names in a nominal attribute
* Classification based on heuristic alignments using BLAST or PSI-BLAST
* Classification based on global alignments using Andreas Dräger's implementation of the Needleman-Wunsch algorithm
* Cut DNA/RNA sequences after the first stop codon
* Calculate the open reading frames of DNA/RNA sequences
* Extracting internal microarray data from MAGE-ML files
* Load tab-delimited microarray data (e.g. TIGR's tav, mev and Stanford as well as spot format) into Weka
* Transform alignment scores depending on the instance weight and the alignment rank
* Bug fixes for JAligner, Loader classes and many more
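To make the "cut after the first stop codon" feature concrete, here is a minimal sketch. It assumes the standard stop codons (TAA, TAG, TGA, or their RNA forms), reads the sequence in frame starting at the first base, and keeps the stop codon itself in the result. Whether the BioWeka filter behaves the same way on these points is not stated above, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of truncating a DNA/RNA sequence at its first stop codon.
class StopCodonSketch {

    private static final Set<String> STOP_CODONS =
            new HashSet<>(Arrays.asList("TAA", "TAG", "TGA"));

    // Returns the sequence up to and including the first in-frame stop codon,
    // or the unchanged sequence if no in-frame stop codon exists.
    static String cutAfterFirstStop(String sequence) {
        // Compare in DNA notation so that RNA input (U instead of T) also works.
        String dna = sequence.toUpperCase().replace('U', 'T');
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            if (STOP_CODONS.contains(dna.substring(i, i + 3))) {
                return sequence.substring(0, i + 3);
            }
        }
        return sequence;
    }
}
```

Because the loop advances three bases at a time, a stop triplet that only appears out of frame (such as the TGA inside ATGACC) does not trigger a cut.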
BioWeka 0.2 offers these features:
- Loaders and Savers for the sequence formats: FASTA, GenBank, EMBL and Swiss-Prot
- Classifiers based on sequence alignment: local and secondary structure element alignment
- Evaluators to transform (alignment) scores into (pseudo) likelihoods: average, sum and maximum score based as well as rank-based evaluators
- Filters for sequence data: RNA/DNA to amino acid translator; RNA to DNA transcriber; DNA/RNA complement and reverse complement translator; symbol counter (e.g. for codon usage analysis); sequence analysis (e.g. calculating average pK values or different amino acid indices based on the AAindex matrices)
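Two of the sequence filters listed above, the reverse complement and the symbol counter for codon usage, can be illustrated with a short stand-alone sketch. It works on upper-case DNA strings only, maps any unexpected symbol to N, and counts non-overlapping codons starting at the first base. These choices and all names are assumptions for illustration and do not reflect the BioWeka filter classes.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of two simple sequence transformations.
class SequenceFilterSketch {

    // Reverses the sequence and complements each base (A<->T, C<->G).
    static String reverseComplement(String dna) {
        StringBuilder result = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            result.append(complement(dna.charAt(i)));
        }
        return result.toString();
    }

    private static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default:  return 'N'; // assumption: unknown symbols become N
        }
    }

    // Counts non-overlapping codons in frame 0; a trailing partial codon is ignored.
    static Map<String, Integer> codonCounts(String dna) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            counts.merge(dna.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }
}
```

The codon counts could be turned into one numeric attribute per codon, which is the kind of feature vector a codon usage analysis would start from.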
BioWeka 0.1 includes reading and writing FASTA sequence files as well as converting XML files into the ARFF format using XSLT stylesheets (e.g. from the XML-Stylesheets subproject: http://www.bioweka.org/xml).
The first version of BioWeka's XML-Stylesheets was released. The package contains two XSLT stylesheets for converting InterProScan (http://www.ebi.ac.uk/InterProScan/) XML files and ProML (http://www.bio.ifi.lmu.de/2005/proml) files into Weka's ARFF (http://www.cs.waikato.ac.nz/~ml/weka/arff.html) format. Download them from http://prdownloads.sourceforge.net/bioweka/xml-stylesheets-0.1.zip?download.
The BioWeka project now has two subprojects:
* BioWeka Distribution (http://www.bioweka.org/dist): The aim here is to provide a ready-to-use distribution of BioWeka.
* XML-Stylesheets (http://www.bioweka.org/xml): This project provides XSLT stylesheets to convert biological XML formats (e.g. ProML and InterProScan) to the ARFF format.
Due to performance problems the web site was moved from sourceforge.net to united-domains.de. The URL is the same as before: http://www.bioweka.org.