Re: [Bio-bwa-help] HLA typing in bwakit-0.7.12
Status: Beta
Brought to you by:
lh3lh3
From: Holly Z. B. <zh...@eb...> - 2015-01-28 19:55:44
|
Thanks Heng for your prompt reply. Looks to me the whole set of commands was finished as the final bam file and *hal.top, *hla.all file were produced (though the later two are empty). Could you suggest a way to capture the k8 and postalt run logs? Best. Holly On 28 Jan 2015, at 14:24, Heng Li wrote: > I have just run bwa on the same input with the following command line (option -k to keep *.hla.HLA-*.fq files): > > bwa.kit/run-bwamem -o ERR013023 -t18 -kHR"@RG\tID:ERR013023\tSM:NA20502" hs38DH.fa ERR013023_1.filt.fastq.gz ERR013023_2.filt.fastq.gz | sh > > I see non-empty *.hla.HLA-*.fq files: > > -rwxrwx--- 1 hl123 GENETICS 35670 Jan 28 14:03 ERR013023.hla.HLA-A.fq > -rwxrwx--- 1 hl123 GENETICS 20390 Jan 28 14:03 ERR013023.hla.HLA-B.fq > -rwxrwx--- 1 hl123 GENETICS 18683 Jan 28 14:03 ERR013023.hla.HLA-C.fq > -rwxrwx--- 1 hl123 GENETICS 47075 Jan 28 14:03 ERR013023.hla.HLA-DQA1.fq > -rwxrwx--- 1 hl123 GENETICS 41750 Jan 28 14:03 ERR013023.hla.HLA-DQB1.fq > -rwxrwx--- 1 hl123 GENETICS 181744 Jan 28 14:03 ERR013023.hla.HLA-DRB1.fq > > I don't know why you get empty output. Is bwa.kit/run-bwamem finished? One possibility is that "k8 bwa-postalt.js" has been killed before it writes anything out. > > Nonetheless, even if you can get non-empty *.hla.HLA-*.fq files, you need to combine different runs from the same sample in order to use HLA typing (this sample has multiple runs); and even if we can do that, the HLA typing accuracy is probably not good given ~8-fold coverage. I have not evaluated typing in this case. Given the complication, it may be better to skip HLA typing, which is the default, for low-coverage 1000g samples. > > Heng > > On Jan 28, 2015, at 12:42, Holly Zheng Bradley <zheng@EBI.AC.UK> wrote: > >> Hi Heng, >> >> In preparation to map the 1000 Genomes Project data to GRCh38, I have been playing with your bwakit-0.7.12 and would like to understand HLA mapping a little more. >> >> In my final BAM files created by bwakit-0.7.12, there are hits against the HLA contigs, indicating the HLA mapping worked. Two example lines below: >> >> $ samtools view ERR013023.merge_chunks.merge.md.bam | awk -F"\t" '{if($3~/HLA/) print;}' | less >> ERR013023.22767813 2129 HLA-A*01:01:01:01 418 60 108M HLA-A*01:16N 109 0 CAGGGGCCCTCCTGGCGGGGGCGCAGGACCGGGGGAGCCGCGCCGGGAGGAGGGTCGGGCAGGTCTCAGCCACTGCTCGCCCCCAGGCTCCCACTCCATGAGGTATTT '%/'/%')1)24.66466444553333/4'544667668:8:97<:<99?9A:?=:<CA;:C:?DAH?DDD;GGHDBFHHBA@ECHHHHDHHBHHHHHHHHHHHHHHH RG:Z:ERR013023 NM:i:1 lt:Z:chr6,29942669,29942777,-; BQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@CFH MD:Z:0A107 >> ERR013023.13238999 2131 HLA-A*01:01:01:01 434 60 12S96M chr6 29942643 0 GGCCCGGCGGGGGGGGGCGCAGGACGGGGGGAGCCGCGACGGGAGGAGGGTCGGGCAGGTCTCAGCCACTGCTCGCCCCCAGGCTCCCACTCCATGAGGTATTTCTTC /+%'$$$$'))$,(,2/-10.0122,23321+24556777:;98;85=DH><HAE;;;<:<;B>=><8ED>?:CCC@;>ABFG@?<=?D@HHHHHHHHHHHHHHHHHH SA:Z:chr6,29942687,-,12S96M,60,3; >> XA:Z:chr6,-29942687,12S96M,3;HLA-A*01:09,-134,12S96M,2;chr6_GL000250v2_alt,-1200434,12S96M,2;HLA-A*01:16N,-155,12S96M,2;HLA-A*01:01:01:02N,-355,12S96M,2;HLA-A*36:01,-134,12S96M,2;HLA-A*01:14,-134,12S96M,2;HLA-A*01:11N,-434,12S96M,2;HLA-A*01:01:38L,-434,12S96M,2;HLA-A*01:03,-434,12S96M,2;HLA-A*01:04N,-350,12S96M,2;chr6_GL000251v2_alt,-1422109,12S96M,2;HLA-A*11:01:18,-434,12S94M2S,2;HLA-A*11:60,-266,12S94M2S,2;HLA-A*11:01:01,-434,12S94M2S,2;HLA-A*11:50Q,-434,12S94M2S,2;HLA-A*11:110,-134,12S94M2S,2;HLA-A*11:69N,-431,12S94M2S,2;HLA-A*11:05,-434,12S94M2S,2;HLA-A*11:75,-205,12S94M2S,2;HLA-A*11:77,-255,12S94M2S,2;HLA-A*11:02:01,-434,12S94M2S,2;HLA-A*11:74,-251,12S94M2S,2;HLA-A*11:25,-134,12S94M2S,2;HLA-A*03:01:01:03,-134,12S96M,3;HLA-A*03:01:01:02N,-434,12S96M,3;HLA-A*03:02:01,-434,12S96M,3;HLA-A*03:01:01:01,-434,12S96M,3;HLA-A*03:21N,-134,12S96M,3;HLA-A*03:11N,-434,12S96M,3;HLA-A*03:36N,-365,12S96M,3;HLA-A*30:02:01:02,-434,12S94M2S,3;HLA-A*01:20,-134,12S94M2S,3;HLA-A*30:04:01,-434, >> 12S94M2S,3;HLA-A*01:02,-434,12S94M2S,3;HLA-A*30:01:01,-434,12S94M2S,3;HLA-A*30:02:01:01,-134,12S94M2S,3;HLA-A*30:89,-134,12S94M2S,3; MD:Z:13C12C69 RG:Z:ERR013023 NM:i:2 AS:i:86 XS:i:86 om:i:0 lt:Z:chr6,29942685,29942781,-; BQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@D >> >> However, for all cases I checked, HLA related files such as *hla.all and *hla.top are all of size 0; the *hla*.fq files are all size 0 as well. The hla log file reads: >> >> *** Processing gene HLA-A... >> >> ** Empty input file. Abort! >> >> *** Processing gene HLA-B... >> >> ** Empty input file. Abort! >> >> *** Processing gene HLA-C... >> >> ** Empty input file. Abort! >> >> *** Processing gene HLA-DQA1... >> >> ** Empty input file. Abort! >> >> *** Processing gene HLA-DQB1... >> >> ** Empty input file. Abort! >> >> *** Processing gene HLA-DRB1... >> >> ** Empty input file. Abort! >> >> >> Sometimes, the HLA log reads: >> >> *** Processing gene HLA-A... >> >> ** De novo assembling... >> /nfs/1000g-work/G1K/work/bin/bwakit-0.7.12/ropebwt2 -drq20 -x `expr 29 + 2` ERR013023.hla.HLA-A.fq > ERR013023.hla.HLA-A.tmp.raw.fmd 2> ERR013023.hla.HLA-A.tmp.raw.fmd.log >> /nfs/1000g-work/G1K/work/bin/bwakit-0.7.12/fermi2 correct -D -t 2 -k 29 ERR013023.hla.HLA-A.tmp.raw.fmd ERR013023.hla.HLA-A.fq 2> ERR013023.hla.HLA-A.tmp.ec.fq.gz.log | gzip -1 > ERR013023.hla.HLA-A.tmp.ec.fq.gz >> /nfs/1000g-work/G1K/work/bin/bwakit-0.7.12/ropebwt2 -dNCr ERR013023.hla.HLA-A.tmp.ec.fq.gz > ERR013023.hla.HLA-A.tmp.ec.fmd 2> ERR013023.hla.HLA-A.tmp.ec.fmd.log >> /nfs/1000g-work/G1K/work/bin/bwakit-0.7.12/fermi2 occflt -t 2 ERR013023.hla.HLA-A.tmp.ec.fmd > ERR013023.hla.HLA-A.tmp.flt.sub 2> ERR013023.hla.HLA-A.tmp.flt.sub.log >> /nfs/1000g-work/G1K/work/bin/bwakit-0.7.12/fermi2 sub -ct 2 ERR013023.hla.HLA-A.tmp.ec.fmd ERR013023.hla.HLA-A.tmp.flt.sub > ERR013023.hla.HLA-A.tmp.flt.fmd 2> ERR013023.hla.HLA-A.tmp.flt.fmd.log >> /nfs/1000g-work/G1K/work/bin/bwakit-0.7.12/fermi2 assemble -l 55 -m 82 -t 2 ERR013023.hla.HLA-A.tmp.flt.fmd 2> ERR013023.hla.HLA-A.tmp.pre.gz.log | gzip -1 > ERR013023.hla.HLA-A.tmp.pre.gz >> /nfs/1000g-work/G1K/work/bin/bwakit-0.7.12/fermi2 simplify -CSo 60 -m 82 ERR013023.hla.HLA-A.tmp.pre.gz 2> ERR013023.hla.HLA-A.tmp.mag.gz.log | gzip -1 > ERR013023.hla.HLA-A.tmp.mag.gz >> ** Selecting contigs overlapping target exons... >> ** Mapping exons to de novo contigs... >> ** Typing... >> - Number of genes for each exon: [] >> - List of primary exon(s): [] >> - Found 0 genes fully covered by perfect matches on the primary exon(s) >> - Total length of contigs consistent/inconsistent with perfect genes: 0/0 >> - Typing in the imperfect mode... >> - Processing primary exon(s)... >> - Collected 0 top genotypes on the primary exon(s); minimal edit distance: 2147483647 >> - Processing other exon(s)... >> - Perfect mode is not attempted >> >> >> I imagine the *hla.*.fq files are the de novo assembly consensus at the HLA genes. I cannot understand why the *fq files are all size 0 while there are hits in the HLA regions (as shown in the BAM files). It was some low coverage samples I used in these test cases. Command line I used is something like: >> >> /nfs/1000g-work/G1K/work/bin/bwakit-0.7.12/run-bwamem -o ERR013023 -HR"@RG\tID:ERR013023\tSM:NA20502" /nfs/1000g-work/G1K/work/bin/bwakit-0.7.12/hs38DH.fa /nfs/1000g-archive/vol1/ftp/data/NA20502/sequence_read/ERR013023_1.filt.fastq.gz /nfs/1000g-archive/vol1/ftp/data/NA20502/sequence_read/ERR013023_2.filt.fastq.gz | sh & >> >> Do you see any problem? What would the *hla.top and *hla.all files suppose to contain? I am trying to decide if it is necessary to output the *hla.top and *hla.all at the end of the whole process. If yes, the chunky intermediate *hla.top and *hla.all files need to be merged. >> >> Thanks much for your help. >> >> Best. >> >> Holly >> EBI >> ------------------------------------------------------------------------------ >> Dive into the World of Parallel Programming. The Go Parallel Website, >> sponsored by Intel and developed in partnership with Slashdot Media, is your >> hub for all things parallel software development, from weekly thought >> leadership blogs to news, videos, case studies, tutorials and more. Take a >> look and join the conversation now. http://goparallel.sourceforge.net/ >> _______________________________________________ >> Bio-bwa-help mailing list >> Bio...@li... >> https://lists.sourceforge.net/lists/listinfo/bio-bwa-help > |