| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| archive | 2014-05-08 | ||
| sp2fasta_ansic-20140511.tar.gz | 2014-05-11 | 19.5 kB | |
| README.sp2fasta | 2014-05-11 | 3.6 kB | |
| sp2fasta_perl-20140509.tar.gz | 2014-05-09 | 10.9 kB | |
| sp2fasta_java-20140509-sources.jar | 2014-05-09 | 7.1 kB | |
| sp2fasta_java-20140509.jar | 2014-05-09 | 5.9 kB | |
| Totals: 6 Items | | 47.0 kB | 0 |
Convert SWISSPROT / EMBL format sequence into fasta format
==========================================================
Implementations
---------------
* ANSI C: sp2fasta_ansic-<version>.tar.gz
Suitable for any platform with an ANSI C compiler (e.g. cc). Development and
testing have mainly been performed on Linux with gcc.
* Java: sp2fasta_java-<version>.jar
An executable jar for use with Java. Compiled for Java 1.5; for earlier
versions of the Java specification, you will need to recompile from source.
* Perl: sp2fasta_perl-<version>.tar.gz
An implementation for Perl 5 environments.
Usage
-----
Convert UniProtKB, SWISS-PROT, EMBL-Bank or EMBL-CDS formatted sequence into
fasta sequence format.
Usage:
sp2fasta -h
sp2fasta -V
sp2fasta [-c case] [-g] [-l dbPrefix] [-s] [-u] [dataFileName ...]
-h This message.
-V Version information
-c Specify the character case of the output sequence:
original (o), lower (l) or upper (u). Default: o
-g Ignored for compatibility with WU-BLAST sp2fasta.
-l Specify database label.
Default: 'emb' for nucleotide, 'sp' for protein, 'tr' for "Unreviewed"
protein, 'sp' for unidentified sequence data.
-s Simple fasta headers, entry 'ID' and description only.
-u UniProtKB style fasta headers, appends 'OS', 'GN', 'PE' and 'SV' data to
description.
Default input is read from STDIN unless file names are specified. To explicitly
specify STDIN to be used for input, use '-' as a file name.
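The two conventions above (the -c case values, and '-' or no file names
meaning STDIN) can be sketched in a few lines of Python. This is only an
illustration and not code from any of the sp2fasta implementations: the
function names are hypothetical, and accepting the full words 'original',
'lower' and 'upper' alongside the single letters is an assumption.

```python
import sys


def apply_case(sequence, mode="o"):
    """Return the sequence in the case selected by a -c style value."""
    key = mode.lower()
    if key in ("o", "original"):
        return sequence          # default: leave the sequence untouched
    if key in ("l", "lower"):
        return sequence.lower()
    if key in ("u", "upper"):
        return sequence.upper()
    raise ValueError("unknown case option: " + mode)


def input_names(args):
    """With no file names given, fall back to STDIN, written here as '-'."""
    return list(args) if args else ["-"]


def open_input(name):
    """Open one input source; the name '-' explicitly selects STDIN."""
    return sys.stdin if name == "-" else open(name, "r")
```

For example, input_names([]) yields ['-'], so a caller can treat "no
arguments" and an explicit '-' in the same way.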
Example Usage
-------------
A. Swiss-Prot
Converting SWISS-PROT 45 into the standard sp2fasta fasta format:
> sp2fasta sprot45.dat > sprot45
This gives headers like:
>sp|P15711|104K_THEPA 104 kDa microneme-rhoptry antigen.
The default database prefix is 'sp' for amino-acid entries. The database
prefix can be specified using the -l option.
B. UniProtKB
With UniProtKB release 14.0 the format of the DE lines changed to be more
structured:
ID 104K_THEPA Reviewed; 924 AA.
AC P15711; Q4N2B5;
...
DE RecName: Full=104 kDa microneme/rhoptry antigen;
DE AltName: Full=p104;
DE Flags: Precursor;
...
With the default options, this gives structured descriptions in the fasta
format headers:
>sp|P15711|104K_THEPA RecName: Full=104 kDa microneme/rhoptry antigen; AltName: Full=p104; Flags: Precursor;
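How the default header relates to the entry lines can be sketched as follows.
This is a minimal Python illustration, not the actual sp2fasta code: the
function name default_header is hypothetical, and it assumes the header is
built from the first word of the ID line, the first accession on the AC line
and the DE line data joined with single spaces.

```python
def default_header(entry_lines, prefix="sp"):
    """Build a '>prefix|accession|id description' header from entry lines."""
    entry_id = None
    accession = None
    description = []
    for line in entry_lines:
        # The two-letter line code comes first, followed by the line data.
        code, _, data = line.partition(" ")
        data = data.strip()
        if code == "ID" and entry_id is None:
            entry_id = data.split()[0]                # entry name
        elif code == "AC" and accession is None:
            accession = data.split(";")[0].strip()    # primary accession
        elif code == "DE":
            description.append(data)                  # keep every DE line
    return ">%s|%s|%s %s" % (prefix, accession, entry_id, " ".join(description))


entry = [
    "ID 104K_THEPA Reviewed; 924 AA.",
    "AC P15711; Q4N2B5;",
    "DE RecName: Full=104 kDa microneme/rhoptry antigen;",
    "DE AltName: Full=p104;",
    "DE Flags: Precursor;",
]
header = default_header(entry)
```

For the entry shown above, header holds the same line as the example header.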
These descriptions can be very long and difficult to read. The -u option trims
the description to just the primary description (i.e. the first one) and
mimics the UniProtKB fasta format headers:
> sp2fasta -u uniprot_sprot.dat > uniprot_sprot.fasta
This gives headers like:
>sp|P15711|104K_THEPA 104 kDa microneme/rhoptry antigen OS=Theileria parva GN=TP04_0437 PE=2 SV=1
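The trimming performed for UniProtKB style headers can be sketched in Python
as below. This is an illustration under stated assumptions rather than the
sp2fasta implementation: the function names are hypothetical, the primary
description is taken to be the text after the first 'Full=' with the trailing
';' removed, and the OS, GN, PE and SV values are passed in already extracted
because the corresponding entry lines are not shown in this document.

```python
def primary_description(de_data):
    """Return the first 'Full=' name found in the DE line data."""
    for data in de_data:
        marker = data.find("Full=")
        if marker != -1:
            return data[marker + len("Full="):].rstrip().rstrip(";")
    # Unstructured DE lines: fall back to the joined text.
    return " ".join(de_data)


def uniprot_style_header(prefix, accession, entry_id, de_data, fields):
    """Append OS, GN, PE and SV (when present) to the primary description."""
    parts = [primary_description(de_data)]
    for key in ("OS", "GN", "PE", "SV"):
        value = fields.get(key)
        if value:
            parts.append("%s=%s" % (key, value))
    return ">%s|%s|%s %s" % (prefix, accession, entry_id, " ".join(parts))


header = uniprot_style_header(
    "sp", "P15711", "104K_THEPA",
    ["RecName: Full=104 kDa microneme/rhoptry antigen;",
     "AltName: Full=p104;",
     "Flags: Precursor;"],
    {"OS": "Theileria parva", "GN": "TP04_0437", "PE": "2", "SV": "1"},
)
```

With these inputs, header matches the example header above. Fields missing
from the dictionary are simply left out of the header in this sketch.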
C. EMBL-Bank
Converting an EMBL-Bank data file into the standard sp2fasta fasta format:
> sp2fasta pln01.dat > pln01
This gives headers like:
>emb|AB000093|AB000093 Arabidopsis thaliana gene for inorganic phosphate...
The default database prefix is 'emb' for nucleotide entries. The database
prefix can be specified using the -l option.
D. Simple Format
To generate fasta formatted output with simple headers, use the -s option:
> sp2fasta -s sprot45.dat > sprot45
This gives headers like:
>104K_THEPA 104 kDa microneme-rhoptry antigen.
E. Database Prefix
To use an alternative database name in the output, use the -l option:
> sp2fasta -l swissprot sprot45.dat > sprot45
This gives headers like:
>swissprot|P15711|104K_THEPA 104 kDa microneme-rhoptry antigen.
Note: if a label is specified, it will be used for all input sequences
regardless of their sequence type.
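The prefix rules described for the -l option can be summarised in a short
Python sketch. The function name database_prefix and its 'kind' and
'reviewed' parameters are hypothetical and exist only to illustrate the
documented defaults; this is not how any of the sp2fasta implementations
necessarily decide the sequence type.

```python
def database_prefix(kind, reviewed=True, label=None):
    """Pick the header prefix for one entry.

    kind     -- 'nucleotide', 'protein' or anything else for unidentified data
    reviewed -- False for "Unreviewed" protein entries
    label    -- value given with -l, if any
    """
    if label:
        # An explicit label is used for every entry, whatever its type.
        return label
    if kind == "nucleotide":
        return "emb"
    if kind == "protein" and not reviewed:
        return "tr"
    # Reviewed protein and unidentified sequence data both default to 'sp'.
    return "sp"
```

For example, database_prefix("nucleotide") gives 'emb', while
database_prefix("nucleotide", label="swissprot") gives 'swissprot'.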