#102 GAP5: Find read pair bug

Milestone: Any
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2015-01-01
Created: 2013-10-21
Creator: Robert Willows
Private: No

Hi all,

I've discovered that when using Find Read Pairs in GAP5 to find read pairs that span contigs in an assembly, every spanning pair listed in the Contig Comparator information is ALWAYS reported as forwards/forwards, e.g.

Read pair:
From contig DENOVO_c11(#2157181) at 64 reading GAPC_0042_FC:6:104:8598:9213#0/1(#2156633)
With contig DENOVO_c224(#36154940) at 462 reading GAPC_0042_FC:6:104:8598:9213#0/2(#36162837)
Direction of first read is forwards
Direction of second read is forwards
Length 35

Read pair:
From contig DENOVO_c190(#34488635) at 1191 reading GAPC_0042_FC:6:2:17711:1932#0/1(#34493286)
With contig DENOVO_c354(#38868888) at 2904 reading GAPC_0042_FC:6:2:17711:1932#0/2(#38887671)
Direction of first read is forwards
Direction of second read is forwards

I have three libraries: two paired-end (>--<) and one mate-pair (<---->).
No matter which library I choose or which type of comparison I choose (end vs end, all vs all, end vs all), the read pairs reported in the "Contig Comparator" are ALWAYS in the forwards/forwards direction.

When manually looking at the two examples above:
In the first example both directions are correct, but the second read is not at position 462 in the second contig; it is at 1847. Is it a coincidence that 462 is 1/4 of 1847, rounded up?

In the second example the reads are positioned correctly, but the second read is reverse in its contig, not forwards.
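
As a quick check of the arithmetic behind the 1/4 question, the two positions quoted for the first example can be compared directly. This is only a sketch of the calculation; it does not say anything about why the reported position is off.

```python
import math

reported = 462   # position shown in the Contig Comparator output
observed = 1847  # position seen when inspecting the contig manually

quarter = observed / 4  # 461.75

# Both rounding up and rounding to nearest give the reported value;
# plain truncation (integer division) would give 461 instead.
print(quarter, math.ceil(quarter), round(quarter), observed // 4)
```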

So in my mate-pair library, which I want to use for scaffolding, "Find Read Pairs" finds 85000 spanning read pairs (of 650000 pairs in the library). EVERY SPANNING pair is reported in the forwards/forwards direction.
At first I thought that only forwards/forwards pairs were detected, but after manually checking 10 of the reported spanning pairs it seems the problem is in how the direction of the second read is reported: it is wrong in half the cases. In one case the reported position of the second read is also wrong, being 1/4 of the true position.
I also manually found some spanning reverse/reverse pairs using the template status colours, and none of these are reported by "Find Read Pairs", even though they are present in the assembly.

The input was a CAF file generated by MIRA; I used tg_index to create the database from it. The CAF file came from a MIRA version 4.04 assembly of 454 Titanium reads plus Illumina mate-pair and paired-end reads, but this afternoon I checked assemblies from MIRA versions 3.18 and 4.03 and got the same result.

Regards
Robert


Discussion

  • James Bonfield
    2013-10-25

    I had a look with our current in-house version (close to the current SVN commit) and it reports both forward and reverse.

    Either this is a bug I've already fixed (it's been far too long since a release) or it is something with the data that is different. What does your library look like if you use the Show Libraries dialogue?

     
  • Robert Willows
    2013-10-25

    Hi James,

    I've checked the libraries and they are reported correctly in the Show Libraries dialogue. So it seems that only the reporting by Find Read Pairs is affected.

    I decided to get the information by exporting a SAM file and parsing it instead. But I found that the SAM file reports the mapping position of a read as an unpadded contig coordinate, while the position of its mate is given as a padded coordinate.

    I am fairly certain that I'd downloaded the latest version when I compiled it a few months ago. I will check when I get to work on Monday.

    Thanks
    Robert
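
The SAM-based workaround described above could look roughly like the sketch below. It relies only on the standard SAM columns (FLAG, RNAME, POS, RNEXT, PNEXT) and the standard flag bits; the function names `spanning_pairs` and `direction_counts` are made up for illustration and are not part of GAP5 or any Staden tool. Given the padded/unpadded mismatch reported in this thread, `pos2` (taken from PNEXT) should be treated with caution when the SAM file comes from a GAP5 export.

```python
from collections import Counter

# Standard SAM flag bits
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
REVERSE = 0x10        # this read is on the reverse strand
MATE_REVERSE = 0x20   # its mate is on the reverse strand
FIRST_IN_PAIR = 0x40


def spanning_pairs(sam_lines):
    """Yield one dict per read pair whose two ends map to different contigs."""
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        f = line.rstrip("\n").split("\t")
        qname, flag, rname, pos = f[0], int(f[1]), f[2], int(f[3])
        rnext, pnext = f[6], int(f[7])
        if not flag & PAIRED or flag & (UNMAPPED | MATE_UNMAPPED):
            continue
        # "=" means the mate is on the same contig; "*" means unknown
        if rnext in ("=", "*") or rnext == rname:
            continue
        if not flag & FIRST_IN_PAIR:
            continue  # report each pair once, from its first read
        yield {
            "name": qname,
            "contig1": rname, "pos1": pos,
            "dir1": "reverse" if flag & REVERSE else "forwards",
            "contig2": rnext, "pos2": pnext,
            "dir2": "reverse" if flag & MATE_REVERSE else "forwards",
        }


def direction_counts(pairs):
    """Count spanning pairs by (first read direction, second read direction)."""
    return Counter((p["dir1"], p["dir2"]) for p in pairs)
```

Calling `direction_counts(spanning_pairs(open(path)))` on an exported file (with `path` being wherever the export was saved) gives a tally per orientation. If every pair really were forwards/forwards in the data, only one key would appear; any other keys would suggest the issue is in the reporting, not in the data.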
