I just want to make sure I am interpreting the qmotif output correctly. In the "summary" section, is the "bases_containing_motifs count" field the one that represents the total telomere length?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi,
This field is a sum of all the telomeric motif lengths multiplied by the number of reads they were observed on.
If you are after the total telomere length without the read count multiplication, then you will need to loop through the <motifs> elements, and accumulate the <motif> lengths in there.
HTH
Cheers,
Oliver Holmes
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
And one more question: I tried running qmotif on BAM files containing individual chromosomes (i.e., one chromosome per BAM file). For some reason, some chromosomes gave me 0 for bases_containing_motifs count whereas other chromosomes gave me very large numbers. This was consistent for different samples and for both BWA and Isaac aligners (i.e., the same chromosomes had 0 values in all samples tried and for both aligners). Is there a reason that this be the case? The reads were aligned to hg19.
If it helps, the chromosomes giving me 0 are Y, 6, 8, 13, 14, 20, and 22.
Thanks again,
Brett
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi Brett,
Could it be that the bams you are testing against don't have any coverage in the regions that qmotif is looking?
Quick way to check would be to take the regions in your qmotif ini file, and use something like samtools to get the reads that overlap those positions.
If you don't have any reads in the regions that qmotif is looking in, then it would be reasonable to see a zero in the bases_containing_motifs count.
Also, is it the case that these chromosome give you values when you run qmotif against a bam containing all the chromosomes, but don't give values when run against the individual bams? If so then that is a concern, and I would be grateful if you could let me know.
Thanks for your help,
Ollie Holmes
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
There are reads mapping to these regions; below are the number of reads (using your samtools view command) for each of the 5 samples I tested and for bwa/isaac:
These counts are similar to the telomere regions of chromosome 7 (samtools view $file chr7:10001-12238 chr7:159126558-159128662), which does have counts:
To answer your second question, there does not seem to be any difference in behavior on the individual-chromosome BAMs versus the combined BAM file. When I added up the bases_containing_motifs count for the individual-chromosome BAMs, the total was the same as bases_containing_motifs count for the combined BAM file. Further, there are no reads mapped to chromosome 6 (for example) in the BAM file produced by qmotif in either the individual-chromosome BAM or the combined BAM.
Thanks again for all your help!
Brett
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi Brett,
Apologies for the delay.
OK, so it looks like you have coverage at these regions, but qmotif is also looking for reads that contain the strings defined in your ini file.
Would it be possible to extend the samtools command used previously to include a grep for the stage 1 search string? eg.
If no reads are returned (which is what I get when running against a WGS test bam on our cluster) then qmotif will return 0 for these regions.
Note that if the stage1_string_rev_comp is set to true, then the reverse complement of the stage 1 search string will also be used.
You could try modifying the regions to include more reads, or change the stage 1 motif to be more inclusive.
Hope this helps.
Cheers,
Ollie
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi,
I just want to make sure I am interpreting the qmotif output correctly. In the "summary" section, is the "bases_containing_motifs count" field the one that represents the total telomere length?
Hi,
This field is a sum of all the telomeric motif lengths multiplied by the number of reads they were observed on.
If you are after the total telomere length without the read count multiplication, then you will need to loop through the <motifs> elements, and accumulate the <motif> lengths in there.
HTH
Cheers,
Oliver Holmes
Thanks Ollie!
And one more question: I tried running qmotif on BAM files containing individual chromosomes (i.e., one chromosome per BAM file). For some reason, some chromosomes gave me 0 for bases_containing_motifs count whereas other chromosomes gave me very large numbers. This was consistent for different samples and for both BWA and Isaac aligners (i.e., the same chromosomes had 0 values in all samples tried and for both aligners). Is there a reason that this be the case? The reads were aligned to hg19.
If it helps, the chromosomes giving me 0 are Y, 6, 8, 13, 14, 20, and 22.
Thanks again,
Brett
Hi Brett,
Could it be that the bams you are testing against don't have any coverage in the regions that qmotif is looking?
Quick way to check would be to take the regions in your qmotif ini file, and use something like samtools to get the reads that overlap those positions.
If you don't have any reads in the regions that qmotif is looking in, then it would be reasonable to see a zero in the bases_containing_motifs count.
Also, is it the case that these chromosome give you values when you run qmotif against a bam containing all the chromosomes, but don't give values when run against the individual bams? If so then that is a concern, and I would be grateful if you could let me know.
Thanks for your help,
Ollie Holmes
Hey Ollie,
There are reads mapping to these regions; below are the number of reads (using your samtools view command) for each of the 5 samples I tested and for bwa/isaac:
1625 PGPC-0001B.bwa.6.bam.telomore_regions.sam
295 PGPC-0001B.isaac.6.bam.telomore_regions.sam
1613 PGPC-0002B.bwa.6.bam.telomore_regions.sam
292 PGPC-0002B.isaac.6.bam.telomore_regions.sam
1208 PGPC-0054.bwa.6.bam.telomore_regions.sam
221 PGPC-0054.isaac.6.bam.telomore_regions.sam
1328 PGPC-0055.bwa.6.bam.telomore_regions.sam
270 PGPC-0055.isaac.6.bam.telomore_regions.sam
1012 PGPC-0056.bwa.6.bam.telomore_regions.sam
156 PGPC-0056.isaac.6.bam.telomore_regions.sam
These counts are similar to the telomere regions of chromosome 7 (samtools view $file chr7:10001-12238 chr7:159126558-159128662), which does have counts:
1280 PGPC-0001B.bwa.7.bam.telomere_regions.sam
1593 PGPC-0001B.isaac.7.bam.telomere_regions.sam
1698 PGPC-0002B.bwa.7.bam.telomere_regions.sam
1856 PGPC-0002B.isaac.7.bam.telomere_regions.sam
1185 PGPC-0054.bwa.7.bam.telomere_regions.sam
1329 PGPC-0054.isaac.7.bam.telomere_regions.sam
1132 PGPC-0055.bwa.7.bam.telomere_regions.sam
1411 PGPC-0055.isaac.7.bam.telomere_regions.sam
1150 PGPC-0056.bwa.7.bam.telomere_regions.sam
1379 PGPC-0056.isaac.7.bam.telomere_regions.sam
Any idea what might be going on here?
To answer your second question, there does not seem to be any difference in behavior on the individual-chromosome BAMs versus the combined BAM file. When I added up the bases_containing_motifs count for the individual-chromosome BAMs, the total was the same as bases_containing_motifs count for the combined BAM file. Further, there are no reads mapped to chromosome 6 (for example) in the BAM file produced by qmotif in either the individual-chromosome BAM or the combined BAM.
Thanks again for all your help!
Brett
Hi Brett,
Apologies for the delay.
OK, so it looks like you have coverage at these regions, but qmotif is also looking for reads that contain the strings defined in your ini file.
Would it be possible to extend the samtools command used previously to include a grep for the stage 1 search string? eg.
If no reads are returned (which is what I get when running against a WGS test bam on our cluster) then qmotif will return 0 for these regions.
Note that if the stage1_string_rev_comp is set to true, then the reverse complement of the stage 1 search string will also be used.
You could try modifying the regions to include more reads, or change the stage 1 motif to be more inclusive.
Hope this helps.
Cheers,
Ollie