Convert SWISSPROT / EMBL format sequence into fasta format
==========================================================

Implementations
---------------

* ANSI C: sp2fasta_ansic-<version>.tar.gz
Suitable for any platform with an ANSI C compiler (e.g. cc). Development and 
testing have mainly been performed on Linux with gcc.

* Java:   sp2fasta_java-<version>.jar
An executable jar for use with Java. Compiled for Java 1.5; for earlier 
versions of the Java specification, you will need to recompile from source.

* Perl:   sp2fasta_perl-<version>.tar.gz
An implementation for Perl 5 environments.

Usage
-----

Convert UniProtKB, SWISS-PROT, EMBL-Bank or EMBL-CDS formatted sequence into 
fasta sequence format.

Usage:
  sp2fasta -h
  sp2fasta -V
  sp2fasta [-c case] [-g] [-l dbPrefix] [-s] [-u] [dataFileName ...]

-h  This message.
-V  Version information.

-c  Specify the character case of the output sequence: 
    original (o), lower (l) or upper (u). Default: o
-g  Ignored for compatibility with WU-BLAST sp2fasta.
-l  Specify database label.
    Default: 'emb' for nucleotide, 'sp' for protein, 'tr' for "Unreviewed" 
    protein, 'sp' for unidentified sequence data.
-s  Simple fasta headers, entry 'ID' and description only.
-u  UniProtKB style fasta headers, appends 'OS', 'GN', 'PE' and 'SV' data to 
    description.
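
The default label rules for -l can be sketched in Python as below. This is an 
illustration only, not the code used by sp2fasta: the function name 
choose_label is hypothetical, and it assumes the sequence type is taken from 
the unit at the end of the 'ID' line ('AA' for protein, 'BP' for nucleotide).

```python
def choose_label(id_line, override=None):
    """Pick a database label for one entry from its 'ID' line.

    Hypothetical helper: the real sp2fasta logic may differ.
    """
    if override is not None:
        # A label given with -l is used for every entry.
        return override
    fields = id_line.split()
    # Assumption: the last field of the ID line is the length unit.
    unit = fields[-1].rstrip(".").upper() if fields else ""
    if unit == "BP":
        return "emb"      # nucleotide
    if unit == "AA":
        if "Unreviewed;" in fields:
            return "tr"   # "Unreviewed" protein
        return "sp"       # protein
    return "sp"           # unidentified sequence data


print(choose_label("ID   104K_THEPA              Reviewed;         924 AA."))
# prints: sp
```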

Input is read from STDIN unless file names are specified. To read from STDIN 
explicitly, use '-' as a file name.
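
The input rule above can be sketched in Python as follows. The names 
open_input and iter_input_lines are hypothetical and only illustrate the 
behaviour described here: no file names means STDIN, and '-' also means STDIN.

```python
import sys
from contextlib import contextmanager


@contextmanager
def open_input(name):
    """Yield a readable stream for one input name ('-' means STDIN)."""
    if name == "-":
        yield sys.stdin           # STDIN is not closed afterwards
    else:
        with open(name, "r") as handle:
            yield handle


def iter_input_lines(file_names):
    """Yield lines from each named input in order, or from STDIN if none."""
    names = list(file_names) or ["-"]
    for name in names:
        with open_input(name) as stream:
            for line in stream:
                yield line
```

With this sketch, iter_input_lines([]) and iter_input_lines(['-']) both read 
STDIN, and '-' can be mixed with file names on the same command line.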

Example Usage
-------------

A. Swiss-Prot

Converting SWISS-PROT 45 into the standard sp2fasta fasta format:

  > sp2fasta sprot45.dat > sprot45

This gives headers like:

  >sp|P15711|104K_THEPA 104 kDa microneme-rhoptry antigen.

The default database prefix is 'sp' for amino-acid entries. The database 
prefix can be specified using the -l option.

B. UniProtKB

With UniProtKB release 14.0 the format of the DE lines changed to be more 
structured:

ID   104K_THEPA              Reviewed;         924 AA.
AC   P15711; Q4N2B5;
...
DE   RecName: Full=104 kDa microneme/rhoptry antigen;
DE   AltName: Full=p104;
DE   Flags: Precursor;
...

Using the default options this gives structured descriptions in the fasta 
format headers:

>sp|P15711|104K_THEPA RecName: Full=104 kDa microneme/rhoptry antigen; AltName: Full=p104; Flags: Precursor;
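
As a rough illustration of how such a header can be assembled from the entry 
lines shown above, here is a minimal Python sketch. The function name 
default_header is hypothetical, and the sketch assumes the usual flat file 
layout of a two-letter line code followed by data from column 6. It is not 
the sp2fasta implementation.

```python
def default_header(entry_lines, label="sp"):
    """Build '>label|accession|entry_name description' from entry lines."""
    entry_name = None
    accession = None
    description = []
    for line in entry_lines:
        code, data = line[:2], line[5:].strip()
        if code == "ID" and entry_name is None:
            entry_name = data.split()[0]            # first field of ID line
        elif code == "AC" and accession is None:
            accession = data.split(";")[0].strip()  # primary accession
        elif code == "DE":
            description.append(data)                # keep every DE line
    return ">%s|%s|%s %s" % (label, accession, entry_name,
                             " ".join(description))


entry = [
    "ID   104K_THEPA              Reviewed;         924 AA.",
    "AC   P15711; Q4N2B5;",
    "DE   RecName: Full=104 kDa microneme/rhoptry antigen;",
    "DE   AltName: Full=p104;",
    "DE   Flags: Precursor;",
]
print(default_header(entry))
# prints: >sp|P15711|104K_THEPA RecName: Full=104 kDa microneme/rhoptry antigen; AltName: Full=p104; Flags: Precursor;
```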

In some cases these descriptions can be very long and difficult to read. The 
-u option trims the description to just the primary description (i.e. the 
first one), appends the 'OS', 'GN', 'PE' and 'SV' data, and so simulates the 
UniProtKB fasta format headers:

sp2fasta -u uniprot_sprot.dat > uniprot_sprot.fasta

This gives headers like:

>sp|P15711|104K_THEPA 104 kDa microneme/rhoptry antigen OS=Theileria parva GN=TP04_0437 PE=2 SV=1
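
The trimming step can be sketched in Python as below. The function name 
primary_description is hypothetical, and the fallback for entries without a 
'RecName' is an assumption. The 'OS', 'GN', 'PE' and 'SV' values are not 
handled here because the lines they come from are not shown in the example 
entry above.

```python
PREFIX = "RecName: Full="


def primary_description(entry_lines):
    """Return the first 'RecName: Full=' value found in the DE lines."""
    de_data = [line[5:].strip() for line in entry_lines
               if line.startswith("DE")]
    for data in de_data:
        if data.startswith(PREFIX):
            # Drop the prefix and the trailing semicolon.
            return data[len(PREFIX):].rstrip(";")
    # Assumption: with no RecName, fall back to the joined DE text.
    return " ".join(de_data)


de_lines = [
    "DE   RecName: Full=104 kDa microneme/rhoptry antigen;",
    "DE   AltName: Full=p104;",
    "DE   Flags: Precursor;",
]
print(primary_description(de_lines))
# prints: 104 kDa microneme/rhoptry antigen
```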

C. EMBL-Bank

Converting an EMBL-Bank data file into the standard sp2fasta fasta format:

> sp2fasta pln01.dat > pln01

This gives headers like:

>emb|AB000093|AB000093 Arabidopsis thaliana gene for inorganic phosphate...

The default database prefix is 'emb' for nucleotide entries. The database 
prefix can be specified using the -l option.

D. Simple Format

To generate fasta formatted output with simple headers, use the -s option:

> sp2fasta -s sprot45.dat > sprot45

This gives headers like:

>104K_THEPA 104 kDa microneme-rhoptry antigen.

E. Database Prefix

To specify an alternative database name to use in the output, use the -l option:

> sp2fasta -l swissprot sprot45.dat > sprot45

This gives headers like:

>swissprot|P15711|104K_THEPA 104 kDa microneme-rhoptry antigen.

Note: if a label is specified it will be used for all input sequences 
regardless of their sequence type.
Source: README.sp2fasta, updated 2014-05-11