#14 gatekeeper problem with illumina PE data

Milestone: v1.0_(example)
Status: open
Priority: 5
Updated: 2015-02-20
Created: 2014-07-25
Creator: stefanie
Private: No

Hello,

I'm trying to get CA to run and generate unitigs, but I'm stuck and need help:

  • I have Illumina data, 300bp, paired end
  • data were quality-trimmed and surviving pairs were kept

  • I generated the .frg file using -technology illumina-long -type illumina.
    When I passed this to runCA, the gatekeeper failed. The file ara_CA01.gkpStore.BUILDING.errorLog was 2.5 GB in size and reported that the sequences have anywhere from one to 15 invalid QVs.

  • So, following advice on this forum, I regenerated the .frg file using -technology illumina-long -type sanger.
    Now the file ara_CA01.gkpStore.BUILDING.errorLog is empty, but the gatekeeper still fails. Terminal output includes:

    ERROR: Failed with signal ABRT (6)
    runCA failed.
    Failure message:
    gatekeeper failed

and the content of the file ara_CA01.gkpStore.err is given below.

Any advice on what's going wrong here?

Thanks,

stefanie


Starting file '/raid6/stefanie/arabidopsis/rawdata/illumina/ara.frg'.

Processing INNIE SANGER QV encoding reads from:
      '/raid6/stefanie/arabidopsis/rawdata/illumina/SRR1491375_tr_1P'
  and '/raid6/stefanie/arabidopsis/rawdata/illumina/SRR1491375_tr_2P'

gatekeeper: AS_PER_gkStore_IID.C:168: void gkStore::gkStore_computeRanges(AS_IID, AS_IID, int64&, int64&, int64&, int64&, int64&, int64&, int64&, int64&, int64&): Assertion `bgnIID <= endIID' failed.

Failed with 'Aborted'

Backtrace (mangled):

/usr/local/bin/wgs-8.1/Linux-amd64/bin/gatekeeper(_Z17AS_UTL_catchCrashiP7siginfoPv+0x27)[0x422e37]
/lib64/libpthread.so.0[0x3635c0f710]
/lib64/libc.so.6(gsignal+0x35)[0x3635832925]
/lib64/libc.so.6(abort+0x175)[0x3635834105]
/lib64/libc.so.6[0x363582ba4e]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x363582bb10]
/usr/local/bin/wgs-8.1/Linux-amd64/bin/gatekeeper[0x43bed7]
/usr/local/bin/wgs-8.1/Linux-amd64/bin/gatekeeper(_ZN8gkStream5resetEjj+0x1a3)[0x437dc3]
/usr/local/bin/wgs-8.1/Linux-amd64/bin/gatekeeper(_ZN12gkStoreStats4initEP7gkStore+0xd2)[0x437612]
/usr/local/bin/wgs-8.1/Linux-amd64/bin/gatekeeper(_Z22AS_GKP_summarizeErrorsPc+0x4c)[0x41925c]
/usr/local/bin/wgs-8.1/Linux-amd64/bin/gatekeeper(main+0x992)[0x408332]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x363581ed1d]
/usr/local/bin/wgs-8.1/Linux-amd64/bin/gatekeeper[0x405f19]

Backtrace (demangled):

[0] /usr/local/bin/wgs-8.1/Linux-amd64/bin/gatekeeper::AS_UTL_catchCrash(int, siginfo*, void*) + 0x27  [0x422e37]
[1] /lib64/libpthread.so.0() [0x3635c0f710]
[2] /lib64/libc.so.6::(null) + 0x35  [0x3635832925]
[3] /lib64/libc.so.6::(null) + 0x175  [0x3635834105]
[4] /lib64/libc.so.6() [0x363582ba4e]
[5] /lib64/libc.so.6::(null) + 0  [0x363582bb10]
[6] /usr/local/bin/wgs-8.1/Linux-amd64/bin/gatekeeper() [0x43bed7]
[7] /usr/local/bin/wgs-8.1/Linux-amd64/bin/gatekeeper::gkStream::reset(unsigned int, unsigned int) + 0x1a3  [0x437dc3]
[8] /usr/local/bin/wgs-8.1/Linux-amd64/bin/gatekeeper::gkStoreStats::init(gkStore*) + 0xd2  [0x437612]
[9] /usr/local/bin/wgs-8.1/Linux-amd64/bin/gatekeeper::AS_GKP_summarizeErrors(char*) + 0x4c  [0x41925c]
[10] /usr/local/bin/wgs-8.1/Linux-amd64/bin/gatekeeper::(null) + 0x992  [0x408332]
[11] /lib64/libc.so.6::(null) + 0xfd  [0x363581ed1d]
[12] /usr/local/bin/wgs-8.1/Linux-amd64/bin/gatekeeper() [0x405f19]

GDB:

Discussion

  • Brian Walenz

    Brian Walenz - 2014-07-25

    I think this indicates that no reads were loaded, but I don't know why. Can you send a sample of the fastq files (to me directly, or post on the ticket)? NCBI isn't responding, and the download from DDBJ is claiming 15 hours remain.

    While you're at it, 'ls -l' of the gkpStore.FAILED directory might be slightly helpful.

    You do want -technology illumina-long and -type sanger, so no problems there. The rest of the options just set metadata in the store.

    I have seen Illumina reads just get dropped like this. I can only think that they are failing a QV check, for example, too many low quality bases.
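
A quick way to sanity-check the QV encoding question raised above is to look at the range of quality characters in the FASTQ files. The following is only a rough sketch, not part of Celera Assembler: the function names `quality_range` and `guess_offset` are made up for illustration, the input is assumed to be strict four-line FASTQ, and the thresholds are the usual rule of thumb (characters below ';' only occur with a +33 offset; characters above 'J' with nothing below ';' suggest a +64 offset). It cannot prove which encoding a file uses.

```python
def quality_range(fastq_lines):
    """Return (lowest, highest) quality character seen in a 4-line-record FASTQ.

    Returns (None, None) if no quality characters were found.
    """
    lo, hi = None, None
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:          # the quality string is the 4th line of each record
            continue
        qual = line.rstrip("\n")
        if not qual:
            continue
        lo = min(qual) if lo is None else min(lo, min(qual))
        hi = max(qual) if hi is None else max(hi, max(qual))
    return lo, hi


def guess_offset(lo, hi):
    """Heuristic guess at the quality offset from the observed character range."""
    if lo is None:
        return "empty"
    if lo >= ";" and hi > "J":
        # Nothing below ASCII 59 and values above ASCII 74: looks like +64.
        return "phred64"
    # Either low characters that +64 cannot produce, or a range topping out at 'J'.
    return "phred33"


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        lo, hi = quality_range(handle)
    print(lo, hi, guess_offset(lo, hi))
```

Run against one of the trimmed files (for example `python check_qv.py SRR1491375_tr_1P`, where the script name is arbitrary), this prints the character range and the guess. A "phred33" result would be consistent with using -type sanger.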

     
  • Brian Walenz

    Brian Walenz - 2014-07-25
    • assigned_to: Brian Walenz
     
  • stefanie

    stefanie - 2014-07-25

    Brian, thank you for your quick response!

    sample of the fastq files:

    head SRR1491375_tr_1P
    @SRR1491375.7.1 HWI-ST365:208:D08BNACXX:4:1101:1495:2191 length=100
    AAAAACCGAGCTGATGATGATCCTGACAACTTTTTGTTATTTGAATTCTGATCATCTTTGTCGTCATGAACGCTGCTTTTCTCGGGATGCGCGGAATGGT
    +SRR1491375.7.1 HWI-ST365:208:D08BNACXX:4:1101:1495:2191 length=100
    BCCFFFFFFHHHHIJJIJJGIJJJJJJJEIIJJIJGGIHIIJIJJIJIJJIJIGIJJIJJJIJIGGJJGHGFEFFDEEEEEDDDDDBDDDDDDD>>BCCC
    @SRR1491375.8.1 HWI-ST365:208:D08BNACXX:4:1101:1365:2203 length=100
    AAAAGTTTGAATCAAATCCATATATAAAAATAAATGTGTAAAGGCTAAAGATCAACAATAATTTGTTGACAAAATCAAATACCAATAATTTTGATTAATG
    +SRR1491375.8.1 HWI-ST365:208:D08BNACXX:4:1101:1365:2203 length=100
    @C@FFDDDHHGHDHIIIFGGHHJCIIIIJJJIHHIIEDEHGHIEEGIIIIIJBGGIIGGGIIHHJFHIEIIHGIIHGHIGCHEHHHEDFFDFFDEEEEED
    @SRR1491375.9.1 HWI-ST365:208:D08BNACXX:4:1101:1713:2079 length=100
    TATTTTCAAAGATTCATTCAATTCCCAATAGAGTGGAAAGATCCTTTCATACTCTAATCTTATATTCTGTGTTTATGCTTTCTTACTCAATTATGTTGCT

    head SRR1491375_tr_2P
    @SRR1491375.7.2 HWI-ST365:208:D08BNACXX:4:1101:1495:2191 length=100
    AATATACACATTTGCTAATAAATAATGTTTTGTTGATTAGATGGTCGGCCATAGCTCGTAAAATACCAAGAAGAACAGACAATGAGATCAAGAAC
    +SRR1491375.7.2 HWI-ST365:208:D08BNACXX:4:1101:1495:2191 length=100
    @@BDDFFDGHHGHJFJJGHJJJJJGJJHIIJEEHIBHIGIIEHJBCGIIIGGGHIJJIICDHIGIEHICHBEEDFFFFCEEEDDDDCDD;@CC<?
    @SRR1491375.8.2 HWI-ST365:208:D08BNACXX:4:1101:1365:2203 length=100
    TAAAGAAAACGTTAACTAGATCATGTGGGTGTTTATGATTCCACGTTTGCTCTTCTGAGAAGAAACAAATTAAATGTTTTAATTTGGTTTGCATCCCAGC
    +SRR1491375.8.2 HWI-ST365:208:D08BNACXX:4:1101:1365:2203 length=100
    BC@FFFFFHGHFHGGIIIIBHIIEHHGIJEFFHIBHI@GIJBGHHHHIIEFDFHGJJIIGGIHIJIIIEFHHGFEFADFFE@CEEEED?DD@CDDC?CCD
    @SRR1491375.9.2 HWI-ST365:208:D08BNACXX:4:1101:1713:2079 length=100
    ACAACTATAAACCAGGCTCATCAAATAATAGTCCAAATACATATACCTATGGCAGG

    'ls -l' of the gkpStore.FAILED directory:

    There is no such directory. Here are the files and directories that were generated:

    -rw-r--r--. 1 stefanie eag 2.3K Jul 25 10:30 ara_CA01.gkpStore.err
    drwxr-xr-x. 2 stefanie eag 4.0K Jul 25 10:27 ara_CA01.gkpStore.BUILDING
    -rw-r--r--. 1 stefanie eag 0 Jul 25 10:27 ara_CA01.gkpStore.BUILDING.errorLog
    -rw-r--r--. 1 stefanie eag 0 Jul 25 10:27 ara_CA01.gkpStore.BUILDING.fastqUIDmap
    drwxr-xr-x. 2 stefanie eag 4.0K Jul 25 10:27 runCA-logs

    Thank you!

    Stefanie

     
  • Brian Walenz

    Brian Walenz - 2014-07-26

    FAILED vs BUILDING: bad memory on my part.

    That the fastqUIDmap is empty indicates that all the reads were (silently) discarded, a known problem with the loader. Are the files in ara_CA01.gkpStore.BUILDING/ also empty?

    The sample reads load fine. I'll try running the full set this weekend; I just left the download running today.

    I converted with:

    fastqToCA -insertsize 300 30 -libraryname X -technology illumina-long -type sanger -mates 1.fastq,2.fastq > r.frg

    and loaded with:

    gatekeeper -o t.gkpStore r.frg

     
  • stefanie

    stefanie - 2014-07-26

    Thanks for your help. No, the files in ara_CA01.gkpStore.BUILDING are not empty. The directory contains:

    -rw-r--r--. 1 stefanie eag 20 Jul 25 10:27 f2p
    -rw-r--r--. 1 stefanie eag 80 Jul 25 10:30 fnm
    -rw-r--r--. 1 stefanie eag 80 Jul 25 10:30 fpk
    -rw-r--r--. 1 stefanie eag 80 Jul 25 10:30 fsb
    -rw-r--r--. 1 stefanie eag 144 Jul 25 10:30 inf
    -rw-r--r--. 1 stefanie eag 296 Jul 25 10:30 lib
    -rw-r--r--. 1 stefanie eag 80 Jul 25 10:27 plc
    -rw-r--r--. 1 stefanie eag 80 Jul 25 10:30 qnm
    -rw-r--r--. 1 stefanie eag 80 Jul 25 10:30 qpk
    -rw-r--r--. 1 stefanie eag 80 Jul 25 10:30 qsb
    -rw-r--r--. 1 stefanie eag 80 Jul 25 10:30 snm
    -rw-r--r--. 1 stefanie eag 80 Jul 25 10:30 ssb
    -rw-r--r--. 1 stefanie eag 52 Jul 25 10:30 u2i
    -rw-r--r--. 1 stefanie eag 93 Jul 25 10:30 uid

    Thank you,

    Stefanie

     
  • Brian Walenz

    Brian Walenz - 2014-07-28

    Hi-

    I can't reproduce the crash. I used NCBI's fastq-dump, version 2.3.2. Looking at your steps, I suspect your trimming did something bad to the reads. Can you load the untrimmed reads?

    Load stats of the untrimmed reads:

    libIID  bgnIID  endIID    active    deleted  mated     totLen      clrLen      libName
    1       1       89797616  89797616  0        89797616  8979761600  8979761600  X

    All I can think to do next is to add a bunch of logging to see what happens to the reads in gatekeeper.
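
Before adding logging to gatekeeper, it may be worth checking what the trimmer did to the files themselves. The sketch below is an assumption-laden illustration, not a Celera Assembler tool: the helper names (`read_fastq`, `template_name`, `check_pairs`) are hypothetical, the input is assumed to be strict four-line FASTQ, and mates are assumed to share a name that differs only in a trailing `.1`/`.2` or `/1`/`/2`. It counts unpaired records, mate name mismatches, sequence/quality length mismatches, empty reads, and headers whose `length=` value no longer matches the sequence. In the sample posted earlier in this thread, the headers still say `length=100` while some trimmed sequences are shorter; whether that matters to gatekeeper is not established here.

```python
import re
from itertools import zip_longest


def read_fastq(lines):
    """Yield (header, sequence, quality) from strict 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue
        seq = next(it, "").rstrip("\n")
        next(it, "")                      # skip the '+' separator line
        qual = next(it, "").rstrip("\n")
        yield header, seq, qual


def template_name(header):
    """Strip a trailing .1/.2 or /1 /2 from the first token of the header."""
    return re.sub(r"[./][12]$", "", header.split()[0])


def check_pairs(lines1, lines2):
    """Walk two mate files in parallel and count suspicious records."""
    stats = {"pairs": 0, "unpaired": 0, "name_mismatch": 0,
             "seq_qual_len_mismatch": 0, "declared_len_mismatch": 0,
             "empty_reads": 0}
    for r1, r2 in zip_longest(read_fastq(lines1), read_fastq(lines2)):
        if r1 is None or r2 is None:      # one file has more records
            stats["unpaired"] += 1
            continue
        stats["pairs"] += 1
        if template_name(r1[0]) != template_name(r2[0]):
            stats["name_mismatch"] += 1
        for header, seq, qual in (r1, r2):
            if len(seq) != len(qual):
                stats["seq_qual_len_mismatch"] += 1
            if not seq:
                stats["empty_reads"] += 1
            declared = re.search(r"length=(\d+)", header)
            if declared and int(declared.group(1)) != len(seq):
                stats["declared_len_mismatch"] += 1
    return stats


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        print(check_pairs(f1, f2))
```

Running it on both the trimmed and the untrimmed pair and comparing the counts should narrow down what the trimming step changed.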

     
  • stefanie

    stefanie - 2014-07-29

    Yes, you are right, it was the trimmer! The untrimmed data load fine; the trimmed data cause the error.

    I used Trimmomatic. I'll look into what exactly was changed, and I will probably also try another trimmer. Thank you for your help!

    Stefanie

     
