File          Date        Author      Commit
CHANGES       2016-08-23  Chang Park  [d518d2]
README        2016-08-23  Chang Park  [d518d2]
sam2fasta.py  2016-08-23  Chang Park  [d518d2]

Read Me

sam2fasta.py

A Python script that converts a .sam file to a .fasta file.

If no command-line arguments are given, it displays the following:

Usage: sam2fasta.py [ref.fasta] [in.sam] [out.fasta]
where:
	ref.fasta: reference fasta file to read
	in.sam: input sam file to read
	out.fasta: output sequence file in aligned fasta format


The reference fasta file is a fasta file that contains the reference sequence.
Currently, only one reference sequence is supported.
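
The single-reference restriction can be illustrated with a small reader. This is a minimal sketch, not the code in sam2fasta.py: the function name read_single_reference is hypothetical, and it assumes the record name is the first word of the header line.

```python
def read_single_reference(lines):
    """Return (name, sequence) from FASTA lines holding exactly one record."""
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                # A second header means a second sequence, which is unsupported.
                raise ValueError("only one reference sequence is supported")
            words = line[1:].split()
            name = words[0] if words else ""
        elif name is None:
            raise ValueError("sequence data found before a '>' header")
        else:
            chunks.append(line)
    if name is None:
        raise ValueError("no reference sequence found")
    return name, "".join(chunks)
```

Because the function accepts any iterable of lines, an open file handle for r.fasta can be passed to it directly.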

The output fasta file is an aligned fasta file.
An aligned fasta file is a fasta file whose sequences all have the same length.
To produce it, each read is clipped and padded with "." at both its left and right ends as needed.
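
The clipping and padding step could look roughly like the sketch below. It is an illustration under stated assumptions, not the script's actual code: the name align_read is hypothetical, only the M and S CIGAR operations are handled, and the position is taken to be the 1-based leftmost mapping position from the SAM record.

```python
import re

# One CIGAR element is a count followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def align_read(pos, cigar, seq, ref_len, pad="."):
    """Place a read on the reference and pad it to ref_len characters.

    Only M (match/mismatch) and S (soft clip) are handled in this sketch.
    """
    kept = []
    i = 0  # index of the next unused base in seq
    for count, op in CIGAR_RE.findall(cigar):
        n = int(count)
        if op == "M":
            kept.append(seq[i:i + n])  # aligned bases are kept
            i += n
        elif op == "S":
            i += n  # soft-clipped bases are skipped (end clipping)
        else:
            raise ValueError("CIGAR operation %r is not handled here" % op)
    body = "".join(kept)
    # Left padding moves the read to its position; the slice guards against
    # running past the end of the reference.
    row = (pad * (pos - 1) + body)[:ref_len]
    # Right padding brings every row to the same length.
    return row + pad * (ref_len - len(row))
```

For example, a read at position 4 with CIGAR 51M on a 57-base reference gets three "." characters on the left and three on the right.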


Example:

---------------------------------------------------------------------
original fasta file (not used in this program, but used in bwa bwasw)
---------------------------------------------------------------------

>read1
CGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAAACAGGTTTGGTTG
>read2
GCCGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAAACAGG
>read3
GGGAGAGCCGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAA

------------------------------
reference fasta file (r.fasta)
------------------------------

>human
AGAGCCGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAAACAGGTTT

----------------------
input sam file (i.sam)
----------------------

@SQ	SN:human	LN:57
read1 0 human 6 11 52M5S * 0 0 CGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAAACAGGTTTGGTTG	*	AS:i:52	XS:i:0	XF:i:1	XE:i:1	XN:i:0
read2 0 human 4 21 51M   * 0 0 GCCGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAAACAGG	*	AS:i:51	XS:i:0	XF:i:3	XE:i:1	XN:i:0
read3 0 human 1 10 3S49M * 0 0 GGGAGAGCCGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAA	*	AS:i:49	XS:i:0	XF:i:2	XE:i:1	XN:i:0
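
For orientation, the alignment lines above can be split into the eleven mandatory SAM fields plus optional tags. The sketch below is an assumption-laden illustration, not code from sam2fasta.py: the names SamRecord and parse_sam_line are hypothetical, and it splits on any whitespace so that lines as displayed here parse, although the SAM format itself is tab-delimited.

```python
from collections import namedtuple

# Field names follow the mandatory columns of the SAM format.
SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar rnext pnext tlen seq qual tags",
)


def parse_sam_line(line):
    """Return a SamRecord, or None for blank lines and '@' header lines."""
    if not line.strip() or line.startswith("@"):
        return None
    fields = line.split()
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    return SamRecord(
        qname=fields[0],
        flag=int(fields[1]),
        rname=fields[2],
        pos=int(fields[3]),  # 1-based leftmost mapping position
        mapq=int(fields[4]),
        cigar=fields[5],
        rnext=fields[6],
        pnext=int(fields[7]),
        tlen=int(fields[8]),
        seq=fields[9],
        qual=fields[10],
        tags=fields[11:],  # optional TAG:TYPE:VALUE fields
    )
```

For the read2 line this yields pos 4 and cigar "51M", which are the two values needed to place the read on the reference.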

------------
Command line
------------
python sam2fasta.py r.fasta i.sam o.fasta


---------------------------
output fasta file (o.fasta)
---------------------------
>human
AGAGCCGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAAACAGGTTT
>read1
.....CGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAAACAGGTTT
>read2
...GCCGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAAACAGG...
>read3
AGAGCCGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAA........
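
One way to confirm that an output file is aligned in the sense described earlier (all sequences the same length) is a quick check like the one below. This is a sketch with hypothetical helper names (parse_fasta, is_aligned), not part of sam2fasta.py.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) tuples from FASTA text."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line  # sequences may span several lines
    return [(name, seq) for name, seq in records]


def is_aligned(records):
    """True when every sequence has the same length."""
    return len({len(seq) for _, seq in records}) <= 1
```

Applied to the o.fasta example above, every record, including the reference, should have length 57.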



Known Issues:
	Multiple reference sequences are not supported.
	Only a small subset of SAM CIGAR operations is supported.
	Insertions are not handled; inserted bases are removed.
	Unaligned sequences are not included in the output file.
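
The last two items can be pictured with the sketch below. It is illustrative only and makes assumptions the README does not confirm: the helper names are hypothetical, a read is treated as unaligned when SAM flag bit 0x4 is set or its CIGAR is "*", and only the M, I and S operations are handled.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
FLAG_UNMAPPED = 0x4  # SAM flag bit: segment unmapped


def is_unaligned(flag, cigar):
    """Assumed test for reads that would be left out of the output."""
    return bool(flag & FLAG_UNMAPPED) or cigar == "*"


def drop_insertions(cigar, seq):
    """Return the read bases that line up with reference positions.

    Inserted (I) and soft-clipped (S) bases are consumed but not kept,
    so the result has one base per reference position covered by M.
    """
    kept = []
    i = 0
    for count, op in CIGAR_RE.findall(cigar):
        n = int(count)
        if op == "M":
            kept.append(seq[i:i + n])
            i += n
        elif op in "IS":
            i += n  # bases with no reference column are removed
        else:
            raise ValueError("CIGAR operation %r is not handled here" % op)
    return "".join(kept)
```

With CIGAR 3M2I3M and read ACGTTACG, the two inserted bases (TT) are dropped and ACGACG remains, which keeps the row at reference length at the cost of losing the inserted sequence.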

Author:
	Chang Park

See Also:
	bwa, samtools