[Bio-bwa-help] BWA can generate invalid SAM (qname validation)
From: Keiran R. <kr...@sa...> - 2014-05-14 12:15:26
Hi Heng,
One of the projects I'm working on seems to have many variations of FASTQ file and has thrown up a problem which I feel should be addressed.
The BAM/SAM specification indicates that QNAME must conform as follows:
1 QNAME String [!-?A-~]{1,255} Query template NAME
Unfortunately, it appears that BWA doesn't check this, and neither does the current samtools on import, so you could spend significant time mapping data before passing it to a tool that detects the problem.
Could an option to validate the input please be added (on by default)?
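As a rough illustration of the kind of check being requested, here is a minimal Python sketch that pre-screens a FASTQ file before mapping. It is not part of BWA or samtools. The function names are hypothetical, and it assumes plain 4-line FASTQ records where the read name is the header text after the leading '@' up to the first whitespace.

```python
import re

# QNAME pattern exactly as quoted from the SAM specification above.
QNAME_RE = re.compile(r"[!-?A-~]{1,255}")


def qname_from_header(header):
    """Extract the read name from a FASTQ header line.

    Assumption: the name is the text after the leading '@',
    up to the first whitespace character.
    """
    if not header.startswith("@"):
        raise ValueError("not a FASTQ header line: %r" % header)
    fields = header[1:].split(None, 1)
    return fields[0] if fields else ""


def is_valid_qname(qname):
    """Return True if the whole name matches the quoted QNAME pattern."""
    return QNAME_RE.fullmatch(qname) is not None


def invalid_qnames(lines):
    """Yield (record_number, qname) for every record with an invalid name.

    Assumption: strict 4-line FASTQ records (no wrapped sequence lines).
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            qname = qname_from_header(line.rstrip("\n"))
            if not is_valid_qname(qname):
                yield i // 4 + 1, qname
```

It could be run over each input file before starting the alignment, for example:

    with open("test_1.fq") as fh:
        for record, name in invalid_qnames(fh):
            print("record %d: invalid QNAME %r" % (record, name))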
Test files are attached.
$ bwa mem $SCRATCH112/pan_cancer_test_sets/ref/genome.fa.gz -T 0 test_1.fq test_2.fq | samtools view -Sb - | samtools sort - sorted
[M::main_mem] read 2 sequences (200 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 2 reads in 0.010 CPU sec, 0.006 real sec
[main] Version: 0.7.8-r455
[main] CMD: bwa mem -T 0 /lustre/scratch112/sanger/kr2/pan_cancer_test_sets/ref/genome.fa.gz test_1.fq test_2.fq
[main] Real time: 8.859 sec; CPU: 8.440 sec
[samopen] SAM header is present: 86 sequences.
bwa: 0.7.8-r455
samtools: 0.1.19-96b5f2294a
Kind regards,
Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute
kr...@sa...
Tel:+44 (0)1223 834244 Ext: 7703
Office: H104
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.