BioWeka / News: Recent posts

BioWeka 0.7.0 released - please update

This release contains some important changes and bug fixes:

* We changed the architecture of the alignment packages. The new aligner classes are found in packages which refer to the source of the underlying algorithm code (e.g. bioweka.aligners.jaligner.SmithWatermanGotohAligner).

* The new release solves a problem when using the "supplied test set" option with BlastClassifier or SimpleAlignmentClassifier, which led to incorrect results in previous releases (other options such as cross-validation or percentage split were not affected).

Posted by Martin Szugat 2007-04-24

Paper about BioWeka published

Bioinformatics published the paper "BioWeka - extending the Weka framework for bioinformatics" (doi:10.1093/bioinformatics/btl671) written by Jan E. Gewehr, Martin Szugat and Ralf Zimmer.

Posted by Martin Szugat 2007-03-21

BioWeka 0.6.1 released

The previous release didn't contain the documentation of Weka-CG. This was fixed. No functional changes were made in this release.

Posted by Martin Szugat 2006-11-25

BioWeka 0.6.0 released

In this release we added a filter under bioweka.filters.universal called AddLastAttributeByName that adds the last attribute (often the class assignments) from a second data set to the current data set by joining the instances by their first attribute (i.e. name attribute).
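
The join performed by such a filter can be illustrated with a minimal sketch in plain Python. This is not BioWeka's code: the function name, the list-of-rows data layout and the handling of unmatched names (filled with None as a stand-in for a missing value) are assumptions made for illustration only.

```python
def add_last_attribute_by_name(current, other):
    """Append the last value of each row in `other` to the row in
    `current` that shares the same first value (the name attribute)."""
    # Index the second data set by its first attribute.
    last_by_name = {row[0]: row[-1] for row in other}
    # Assumption: rows without a partner get None (a missing value).
    return [row + [last_by_name.get(row[0])] for row in current]


# Hypothetical example data: feature rows and a separate label table.
current = [["seqA", 0.12], ["seqB", 0.57], ["seqC", 0.33]]
labels = [["seqB", "kinase"], ["seqA", "protease"]]
joined = add_last_attribute_by_name(current, labels)
```

Here `joined` carries the class label as a new last column, matched by name rather than by row order.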

Further, the release contains these bug fixes:

* Calling the toString method on an AvgScoreEvaluator object whose initialize method had not been called led to a NullPointerException. This bug was fixed.
* The default value for the buildCommand property of the BlastClassifier object specified the option "-o T", which tells the formatdb command of the BLAST suite to build an index file for the sequence database. This leads to better performance. However, BLAST appears to behave differently across versions with respect to this option, so we decided to remove it from the default settings.
* Under certain circumstances the PsiBlastParser class failed to parse the output of blastpgp. The new release solves this problem.

Posted by Martin Szugat 2006-11-24

BioWeka 0.5.0 released

The BioWeka 0.5.0 release provides a new filter called AddLastAttributeFrom. This filter can be used to add an attribute, e.g. the class annotation, from an external dataset to the current dataset.

In addition, the new BioWeka version includes two minor bug fixes:

* The relation name of a filtered data set no longer contains any whitespace or the pipe character. This change became necessary because the BlastClassifier uses the relation name as the name of the BLAST database, and BLAST uses the database name as the file name for the database. However, Windows forbids the pipe character in file names.
* The iprscan.xsl stylesheet declares the attribute as nominal instead of string. This is a workaround for a well-known problem of Weka with sparse instances and string attributes.
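
The relation name clean-up described in the first fix could look roughly like the following Python sketch. The function name and the choice of an underscore as the replacement are assumptions; the release note does not say what BioWeka substitutes for the removed characters.

```python
import re


def sanitize_relation_name(name, replacement="_"):
    """Return `name` without whitespace or pipe characters."""
    # Each run of whitespace and/or '|' collapses into one replacement.
    # The underscore default is an assumption for this sketch.
    return re.sub(r"[\s|]+", replacement, name)


# Hypothetical relation name as a chain of filters might produce it.
safe_name = sanitize_relation_name("proteins | translated frame 1")
```

The resulting name can then serve as a file name on Windows as well as on Linux.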

Posted by Martin Szugat 2006-11-09

BioWeka 0.4.3 released

Fixed problems with the shell script for Linux. There were no changes in this release concerning the BioWeka Java library.

Posted by Martin Szugat 2006-08-04

BioWeka 0.4.2 released

The new BioWeka release 0.4.2 contains some minor bug fixes and improvements:

* The default, internal SMO classifier of the Eclat classifier now returns estimated probabilities (instead of just 0.0 or 1.0) and thus increases the frame prediction accuracy. This affects the components EclatFrameFinder, EclatFrameClassifier and Eclat.
* Andreas Draeger provided new versions of his global alignments classes for the BioJava project, which fix some bugs. The BioWeka project incorporates these changes with the newest release.

Posted by Martin Szugat 2006-05-23

BioWeka 0.4.1 released

Latest changes in BioWeka 0.4.1:

* Added scripts (BATCH file for Windows, shell script for Linux) to start Weka with all BioWeka libraries
* Integrated the WLSVM class to use LibSVM within Weka
* Integrated the Weka-CG library

Posted by Martin Szugat 2005-12-04

New web site

The web site was revised: and were merged into a single Wiki.

Posted by Martin Szugat 2005-10-23

Bachelor thesis online

The Bachelor thesis "BioWeka: Extending the Weka framework for Bioinformatics" by Martin Szugat is available online (see Download).

Posted by Martin Szugat 2005-10-17

BioWeka 0.4 released

The new release 0.4 of BioWeka combines the packages BioWeka, XML-Stylesheets, FoldRec, BioJava 1.4 Extensions and other tools and data files in a single distribution. In addition, there is some new functionality:

* The AlignmentScorer filter performs an all-against-all alignment using a specified Aligner and stores the scores within the data set.

* Such a data set can be used in conjunction with the new ScoreClassifier class to classify sequences based on the precomputed alignment scores.
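
The idea behind these two components can be sketched in a few lines of Python. Everything here is an illustrative assumption: `identity_score` is a toy stand-in for a real aligner, and the nearest-neighbour rule in `classify_by_best_score` is just one plausible way to classify from scores, not necessarily what ScoreClassifier does.

```python
def identity_score(a, b):
    """Toy stand-in for an aligner: count matching positions, no gaps."""
    return sum(x == y for x, y in zip(a, b))


def all_against_all(sequences, score=identity_score):
    """Score every sequence against every other one (and itself)."""
    return [[score(a, b) for b in sequences] for a in sequences]


def classify_by_best_score(scores_to_training, training_labels):
    """Assumed rule: take the label of the best-scoring training sequence."""
    best = max(range(len(training_labels)), key=lambda i: scores_to_training[i])
    return training_labels[best]


# Hypothetical training data.
training = ["ACGTAC", "ACGTTC", "TTGGCC"]
labels = ["a", "a", "b"]

matrix = all_against_all(training)
query_scores = [identity_score("TTGGCA", s) for s in training]
predicted = classify_by_best_score(query_scores, labels)
```

Storing the score matrix in the data set means the costly alignment step runs once, while different classifiers or evaluation schemes can reuse the scores.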

Posted by Martin Szugat 2005-10-17

BioWeka 0.3 and XML-Stylesheets 0.2 released

BioWeka 0.3 and XML-Stylesheets 0.2 offer these new features:

* Classifying DNA/RNA sequences using a reimplementation of Eclat (based on codon usage and SVMs)
* Merge multiple data sets and store relation names in a nominal attribute
* Classification based on heuristic alignments using BLAST or PSI-BLAST
* Classification based on global alignments using an implementation of the Needleman-Wunsch algorithm by Andreas Draeger
* Cut DNA/RNA sequences after the first stop codon
* Calculate the open reading frames of DNA/RNA sequences
* Extracting internal microarray data from MAGE-ML files
* Load tab-delimited microarray data (e.g. TIGR's tav, mev and Stanford as well as spot format) into Weka
* Transform score alignments depending on the instance weight and the alignment rank
* Bug fixes for JAligner, Loader classes and many more
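
As an illustration of one of these features, cutting a sequence after the first stop codon, here is a minimal Python sketch. It assumes the standard genetic code, a fixed reading frame, and that the stop codon itself is kept in the result; the function name is hypothetical and BioWeka's filter may behave differently on any of these points.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def cut_after_first_stop(sequence, frame=0):
    """Truncate `sequence` after the first in-frame stop codon.

    Assumption: the stop codon is retained. If no stop codon occurs
    in the given frame, the sequence is returned unchanged.
    """
    seq = sequence.upper()
    for start in range(frame, len(seq) - 2, 3):
        # Treat RNA like DNA by mapping U to T for the comparison only.
        codon = seq[start:start + 3].replace("U", "T")
        if codon in STOP_CODONS:
            return sequence[:start + 3]
    return sequence


trimmed = cut_after_first_stop("ATGAAATAGGGGCCC")
```

Only codons in the chosen frame are inspected, so an out-of-frame stop triplet such as the TGA inside ATGAAA does not trigger the cut.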

Posted by Martin Szugat 2005-09-12

BioWeka 0.2 released

BioWeka 0.2 offers these features:

- Loaders and Savers for the sequence formats: FASTA, GenBank, EMBL and Swiss-Prot

- Classifiers based on sequence alignment: local and secondary structure element alignment

- Evaluators to transform (alignment) scores into (pseudo) likelihoods: average, sum and maximum score based as well as rank-based evaluators

- Filters for sequence data: RNA/DNA to amino acid translator; RNA to DNA transcriber; DNA/RNA complement and reverse complement translator; symbol counter (e.g. for codon usage analysis); sequence analysis (e.g. calculating average pK values or different amino acid indices based on the AAindex matrices)
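
Two of the filter operations listed above, the reverse complement and symbol counting for codon usage, can be sketched in plain Python as follows. The function names are hypothetical and the code only illustrates the operations themselves, not BioWeka's filter classes.

```python
from collections import Counter

# Translation table for DNA complement; case is preserved.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(dna):
    """Replace each base by its complementary base."""
    return dna.translate(COMPLEMENT)


def reverse_complement(dna):
    """Complement the sequence and read it in the opposite direction."""
    return complement(dna)[::-1]


def codon_usage(dna, frame=0):
    """Count complete codons (triplets) in the given reading frame."""
    seq = dna.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return Counter(codons)


rc = reverse_complement("ATGC")
usage = codon_usage("ATGAAAATGTT")
```

Trailing bases that do not form a full codon are ignored by `codon_usage`.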

Posted by Martin Szugat 2005-09-02

BioWeka 0.1 released

BioWeka 0.1 includes reading and writing FASTA sequence files as well as converting XML files into the ARFF format using XSLT stylesheets (e.g. from the XML-Stylesheets subproject).

Posted by Martin Szugat 2005-08-26

XML-Stylesheets 0.1

The first version of BioWeka's XML-Stylesheets was released. The package contains two XSLT stylesheets for converting InterProScan XML files and ProML files into Weka's ARFF format. Download them from

Posted by Martin Szugat 2005-06-13

BioWeka Distribution and XML-Stylesheets

The BioWeka project now has two subprojects:

* BioWeka Distribution: The aim here is to provide a ready-to-use distribution of BioWeka.
* XML-Stylesheets: This project provides some XSL stylesheets to convert biological XML formats (e.g. ProML and InterProScan) to the ARFF format.

Posted by Martin Szugat 2005-06-05

The site has moved

Due to performance problems the web site was moved from to The URL is the same as before:

Now there is a wiki under and at present one blog under

Posted by Martin Szugat 2005-04-07