sam2fasta Code
Status: Alpha
Brought to you by: naltang
File | Date | Author | Commit |
---|---|---|---|
CHANGES | 2016-08-23 | Chang Park | [d518d2] 2016-08-23 |
README | 2016-08-23 | Chang Park | [d518d2] 2016-08-23 |
sam2fasta.py | 2016-08-23 | Chang Park | [d518d2] 2016-08-23 |
sam2fasta.py

A Python script that converts a .sam file to a .fasta file.

If no command-line arguments are given, it displays the following:

    Usage: sam2fasta.py [ref.fasta] [in.sam] [out.fasta]
    where:
      ref.fasta: reference fasta file to read
      in.sam: input sam file to read
      out.fasta: output sequence file in aligned fasta format

The reference FASTA file holds the reference sequence. Currently, only one reference sequence is supported.

The output is an aligned FASTA file, that is, a FASTA file whose sequences all have the same length (in the example below, the 57-base length of the reference). To produce it, each read is clipped at its left and right ends and padded with "." wherever it does not cover the reference. The reference sequence is written first, followed by one row per aligned read.

Example:

---------------------------------------------------------------------
original fasta file (not used by this program, but used in bwa bwasw)
---------------------------------------------------------------------
>read1
CGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAAACAGGTTTGGTTG
>read2
GCCGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAAACAGG
>read3
GGGAGAGCCGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAA

------------------------------
reference fasta file (r.fasta)
------------------------------
>human
AGAGCCGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAAACAGGTTT

----------------------
input sam file (i.sam)
----------------------
@SQ	SN:human	LN:57
read1	0	human	6	11	52M5S	*	0	0	CGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAAACAGGTTTGGTTG	*	AS:i:52	XS:i:0	XF:i:1	XE:i:1	XN:i:0
read2	0	human	4	21	51M	*	0	0	GCCGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAAACAGG	*	AS:i:51	XS:i:0	XF:i:3	XE:i:1	XN:i:0
read3	0	human	1	10	3S49M	*	0	0	GGGAGAGCCGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAA	*	AS:i:49	XS:i:0	XF:i:2	XE:i:1	XN:i:0

------------
Command line
------------
python sam2fasta.py r.fasta i.sam o.fasta

---------------------------
output fasta file (o.fasta)
---------------------------
>human
AGAGCCGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAAACAGGTTT
>read1
.....CGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAAACAGGTTT
>read2
...GCCGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAAACAGG...
>read3
AGAGCCGTAGCATGCATGGCGATCAGCTACAGTACGGTACACGACTAAA........

Known Issues:
- Multiple reference sequences are not supported.
- Only a small subset of SAM CIGAR operations is supported.
- Insertions are not handled; inserted bases are removed.
- Unaligned sequences are not included in the output file.

Author: Chang Park

See Also: bwa, samtools
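The clipping and padding described above can be illustrated with a minimal Python sketch. It is not the code of sam2fasta.py: the helper name `aligned_row` is hypothetical, and the choice of which CIGAR operations to accept is an assumption, since the README only says that a small subset is supported. The sketch places the read's first aligned base at the 1-based POS from the SAM record, drops soft-clipped (S) and inserted (I) bases, ignores hard clips (H), and pads with "." up to the reference length. Any other operation, such as a deletion (D), raises an error instead of guessing how the script treats it.

```python
import re

# One CIGAR element: a run length followed by an operation letter.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_row(pos: int, cigar: str, seq: str, ref_len: int) -> str:
    """Project one read onto reference coordinates (hypothetical helper).

    pos is the 1-based leftmost mapping position (SAM column 4, POS).
    The result is exactly ref_len characters long; '.' marks reference
    positions that the read does not cover.
    """
    elements = CIGAR_ELEMENT.findall(cigar)
    # Reject strings such as "*" or "5Q" that are not a complete CIGAR.
    if not elements or "".join(n + op for n, op in elements) != cigar:
        raise ValueError(f"unsupported or malformed CIGAR: {cigar!r}")

    aligned = []  # read bases that line up with reference bases
    offset = 0    # current position in seq
    for length, op in elements:
        n = int(length)
        if op == "M":
            aligned.append(seq[offset:offset + n])
            offset += n
        elif op in ("S", "I"):
            # Soft-clipped and inserted bases are stored in SEQ but dropped.
            offset += n
        elif op == "H":
            # Hard-clipped bases are not stored in SEQ; nothing to skip.
            continue
        else:
            # Assumption: D, N, P, = and X are deliberately not handled here.
            raise ValueError(f"CIGAR operation {op!r} is not handled here")

    # Left-pad up to POS, then clip or right-pad to the reference length.
    row = "." * (pos - 1) + "".join(aligned)
    return row[:ref_len].ljust(ref_len, ".")
```

With the 57-base reference from the example, `aligned_row(6, "52M5S", <read1 SEQ>, 57)` gives the read1 row of o.fasta (five leading dots, the trailing GGTTG soft clip removed), and POS 4 / 51M and POS 1 / 3S49M give the read2 and read3 rows in the same way.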
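Reading the two input files can also be sketched in Python. The helper names `read_reference` and `mapped_records` are hypothetical and not taken from sam2fasta.py. The first reads a reference FASTA file and refuses a second record, matching the one-reference limitation; the second skips `@` header lines and yields the QNAME, POS, CIGAR and SEQ columns (SAM columns 1, 4, 6 and 10) of each alignment line. The README only says that unaligned sequences are left out of the output, so detecting them through the 0x4 (segment unmapped) FLAG bit is an assumption of this sketch.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_reference(handle: TextIO) -> Tuple[str, str]:
    """Return (name, sequence) from a FASTA file holding one record."""
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                # Mirrors the documented limitation: one reference only.
                raise ValueError("only one reference sequence is supported")
            name = line[1:].strip()
        elif name is None:
            raise ValueError("sequence data found before a '>' header")
        else:
            chunks.append(line)
    if name is None:
        raise ValueError("no FASTA header found")
    return name, "".join(chunks)


def mapped_records(handle: TextIO) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (QNAME, POS, CIGAR, SEQ) for each aligned SAM record."""
    for line in handle:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line such as @SQ
        fields = line.rstrip("\r\n").split("\t")  # SAM columns are tab-separated
        if len(fields) < 11:
            raise ValueError(f"SAM line has fewer than 11 fields: {line!r}")
        flag = int(fields[1])
        if flag & 0x4:
            continue  # assumption: FLAG bit 0x4 marks an unaligned read
        yield fields[0], int(fields[3]), fields[5], fields[9]


if __name__ == "__main__":
    sam_text = (
        "@SQ\tSN:human\tLN:10\n"
        "r1\t0\thuman\t3\t30\t4M\t*\t0\t0\tAGCC\t*\n"
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*\n"
    )
    name, ref = read_reference(io.StringIO(">human\nAGAGCCGTAG\n"))
    print(name, len(ref))  # human 10
    print(list(mapped_records(io.StringIO(sam_text))))
    # [('r1', 3, '4M', 'AGCC')]
```

Filtering on the FLAG bit rather than on a `*` CIGAR keeps the check in one place, although a SAM writer normally sets both for an unmapped read.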