Hamish McWilliam

x2fasta

Introduction

As the major biological sequence databases (e.g. DDBJ, EMBL-Bank, GenBank, RefSeq and UniProtKB) continue to grow, there is an increasing need for simple tools that can reformat the flat-file formats used by these databases into FASTA sequence format for use with other tools.

While many multi-purpose sequence reformatting tools are available (e.g. EMBOSS and Readseq), their generic support for many sequence formats and their extensive feature sets limit their performance compared to older dedicated tools. Unfortunately, the older tools have issues on modern platforms (e.g. support for files >2GB, library dependencies, binary compatibility, etc.).

The 'x2fasta' project aims to provide a collection of highly efficient sequence reformatting tools based on the programs provided with WU-BLAST:

  • gb2fasta: convert a GenBank-format file into FASTA format - primary sequence.
  • gt2fasta: convert a GenBank-format file into FASTA format - protein translations.
  • pir2fasta: convert a file in PIR/NBRF format into FASTA format.
  • sp2fasta: convert files in SWISS-PROT or EMBL format into FASTA format.

Where possible, the project also aims to maintain command-line and output compatibility with these WU-BLAST programs.
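To make the kind of conversion concrete, here is a minimal sketch in Python of what a GenBank-to-FASTA "primary sequence" conversion involves. It is not the x2fasta or WU-BLAST code. The function name `genbank_to_fasta` is hypothetical, and the header layout (`>accession definition`) is an assumption that may differ from the output of gb2fasta. It reads only the DEFINITION, ACCESSION and ORIGIN parts of each record and ignores everything else.

```python
def genbank_to_fasta(lines, width=60):
    """Yield one FASTA record (as a string) per GenBank flat-file entry.

    Illustrative sketch only: the header layout '>accession definition'
    is an assumption, not the documented gb2fasta output.
    """
    accession = None
    definition = []
    seq_parts = []
    in_origin = False
    in_definition = False

    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):
            # '//' terminates an entry: emit the record and reset state.
            if accession is not None:
                header = (">" + accession + " " + " ".join(definition)).rstrip()
                seq = "".join(seq_parts)
                chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
                yield "\n".join([header] + chunks) + "\n"
            accession, definition, seq_parts = None, [], []
            in_origin = in_definition = False
        elif in_origin:
            # Sequence lines carry a base count and space-separated blocks;
            # keep only the letters.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
        elif line.startswith("ORIGIN"):
            in_origin = True
            in_definition = False
        elif line.startswith("DEFINITION"):
            # Keyword field occupies the first 12 columns.
            definition = [line[12:].strip()]
            in_definition = True
        elif in_definition and line.startswith(" "):
            # Indented continuation line of a multi-line DEFINITION.
            definition.append(line.strip())
        else:
            in_definition = False
            if line.startswith("ACCESSION"):
                fields = line.split()
                if len(fields) > 1:
                    accession = fields[1]  # primary accession only
```

Because the function takes any iterable of lines and yields records one at a time, it can be fed an open file handle and written straight to the output without holding a whole database file in memory, which matters for the multi-gigabyte files mentioned above.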

The Programs

The currently available programs are:

Source Code

Complete source code can be found in the x2fasta Subversion repository (see Code).

Project Members: