#14 problems in .bnk file made with toAmos -utg

open
nobody
None
5
2010-04-01
2010-03-31
tim a
No

My AMOS bank appears to contain errors, which I detected using both analyzeSNPs and hawkeye. My assembly is a hybrid and was made by the Celera Assembler.
I used: toAmos -f gsb/antrc230_0.1_hybrid.frg -a gsb/gs35/9-terminator/gs35.asm -o - -utg | bank-transact -m - -b gsb-utg.bnk -c
I have included screen shots of: a unitig with no reads showing in hawkeye (I realise that contigs can have missing reads because of surrogates); the basic statistics of that unitig; a unitig whose consensus does not match its reads; the corresponding section of the contig, where the same reads have the correct consensus; and a unitig with some reads that seem misplaced.
The assembly has not yet been published, so I have not included it, but I may be able to provide it confidentially.

Discussion

  • tim a
    2010-03-31

    a hawkeye screen shot of a unitig with incorrect consensus sequence

     
  • tim a
    2010-03-31

    a hawkeye screen shot of the corresponding contig showing the same reads with correct consensus sequence

     
  • tim a
    2010-03-31

    a hawkeye screen shot of a surrogate with some reads that seem to be misplaced.

     
  • tim a
    2010-03-31

    a hawkeye screen shot of a unitig with no reads shown.

     
  • tim a
    2010-03-31

    a hawkeye screen shot of the corresponding data for the unitig with no reads shown.

     
  • tim a
    2010-03-31

    • priority: 5 --> 6
     
  • tim a
    2010-03-31

    From my supervisor:
    "It appears to be a rather significant bug with how toAmos treats the data from an ASM file. It makes extensive use of associative arrays indexed by object identifiers. In the case of a normal output, there would only be a single reference to any read, whereas in the case of Unitig output, we get surrogates that break the assumption that associative identifiers are unique.

    From the UTG you gave me, I can see that toAmos has used the same record in both placement instances. It might require extending the associative ids to avoid the redundancy."
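
For anyone following along, the failure mode described above can be sketched in a few lines. This is only an illustration in Python, not toAmos code (toAmos is a Perl script); the read and unitig IDs, the tuple layout, and the function name are all made up for the example. It shows how a table keyed on the read ID alone ends up with a single record for a read that is placed twice.

```python
# Hypothetical data: read 7 is placed in two unitigs, as can happen with surrogates.
# Tuple layout (assumed for this sketch): (parent unitig, read id, start, end)
placements = [
    ("utg1", 7, 0, 100),
    ("utg2", 7, 250, 350),  # same read, second placement
]


def index_by_read(records):
    """Index placements by read id alone; a later record overwrites an earlier one."""
    table = {}
    for parent, read, start, end in records:
        table[read] = (parent, start, end)
    return table


by_read = index_by_read(placements)

# Two placements went in, but only one record survives for read 7,
# so both unitigs would be given the same placement data.
print(len(placements), "placements,", len(by_read), "table entry")
print(by_read[7])
```

Whichever placement is stored last wins, which would be consistent with the same record showing up in both placement instances.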

     
  • tim a
    2010-04-01

    From my supervisor:
    "I believe I've fixed the problem. Let me know if you still have problems.

    All I did was make sure that the hashes that looked to be the source of the errors had unique keys. I simply extended the key to be both the read ID and the parent contig ID. You won't see any difference in the AFG output with respect to IIDs or EIDs."
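
The kind of fix described above (extending the key to the read ID plus the parent ID) might look roughly like this. Again, this is a Python sketch under assumptions, not the actual patch: the data, tuple layout, and function name are hypothetical, and the real change was made to the Perl hashes inside toAmos.

```python
# Hypothetical data: read 7 is placed in two different parents.
# Tuple layout (assumed for this sketch): (parent id, read id, start, end)
placements = [
    ("utg1", 7, 0, 100),
    ("utg2", 7, 250, 350),
]


def index_by_read_and_parent(records):
    """Index placements by (read id, parent id) so repeated reads stay distinct."""
    table = {}
    for parent, read, start, end in records:
        table[(read, parent)] = (start, end)
    return table


by_pair = index_by_read_and_parent(placements)

# Each placement of read 7 now has its own record.
print(len(by_pair))
print(by_pair[(7, "utg1")], by_pair[(7, "utg2")])
```

Because only the lookup key changes, the read's own identifiers are untouched, which fits the remark that the IIDs and EIDs in the AFG output stay the same.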

     
  • tim a
    2010-04-01

    • priority: 6 --> 5