Menu

Possible bug in mtVariantCaller

2014-11-28
2014-12-03
  • Larry Singh

    Larry Singh - 2014-11-28

    Hello,

    I've been trying to run MToolBox, but I receive the following error:

    File "/data/singhln/Projects/MToolBox/assembleMTgenome.py", line 428, in <module>
    mut_events = mtvcf_main_analysis(mt_table, sam_file, sample_name, tail=tail)
    File "/gpfs/gsfs3/users/singhln/Projects/MToolBox_v0.2.1/mtVariantCaller.py", line 702, in mtvcf_main_analysis
    r=SearchINDELsintoSAM(readNAME,mate,CIGAR,seq, qs,refposleft,tail=tail)
    File "/gpfs/gsfs3/users/singhln/Projects/MToolBox_v0.2.1/mtVariantCaller.py", line 502, in SearchINDELsintoSAM
    medR=median(qsR)
    File "/gpfs/gsfs3/users/singhln/Projects/MToolBox_v0.2.1/mtVariantCaller.py", line 402, in median
    m1 = sorted(l)[(len(l)/2)+1-1]
    IndexError: list index out of range

    I believe the problem is that qsR is an empty list because qsRight is empty in my case. I printed out some basic variable values:

    qs='DDA6AEHO'
    s=11, sr =7
    qsRight = '' qsLeft = 'HO'

    I appreciate your help with this matter.

    Grazie mille! :)
    -Larry.

     
    • Domenico Simone

      Domenico Simone - 2014-11-29

      Dear Larry,

      thanks for your interest in our tool and for this reporting.
      Could you please also report the command line you used to run the tool and attach the log(s) you obtained?

      Ciao!

       
  • Larry Singh

    Larry Singh - 2014-11-30

    Caro Domenico,

    Here's the command line:

    /data/singhln/Projects/MToolBox/MToolBox.sh \ -r rCRS \ -m "-g /data/singhln/bin/gsnap \ -D /data/singhln/Projects/ES/Data/ \ -M chrRCRS \ -H human_g1k_v37_RCRS \ -t 14 \ " \ -a "-s /usr/local/apps/samtools/0.1.19/samtools \ -r /data/singhln/Projects/ES/Data/ \ -a human_g1k_v37_RCRS.fasta \ -f chrRCRS.fa \ " \ -c "-m /usr/local/apps/muscle/3.8.31/muscle" \ -M \ -I \ -i bam

    I narrowed down the problem to the assemble genome step, and this command replicates the problem:

    assembleMTgenome.py -i OUT2.sam -o CS107349 -r /usr/local/share/genomes/ -f chrR
    CRS.fa -a hg19RCRS.fa -FCP -s /usr/local/apps/samtool
    s/0.1.19/samtools -r /data/singhln/Projects/ES/Data/ -a human_g1k_v37_RCRS.fast
    a -f chrRCRS.fa

    I've attached the stdout and stderr files. Here is the output of loghumanP.txt:

    GSNAP version 2014-11-18 called with args: /data/singhln/bin/gsnapl -D /data/sin
    ghln/Projects/ES/Data/ -d human_g1k_v37_RCRS -A sam --nofails --query-unk-mismat
    ch=1 -O -t 16 OUT_CS107349/outmt1.fastq OUT_CS107349/outmt2.fastq
    Checking compiler assumptions for popcnt: 6B8B4567 builtin_clz=1 builtin_ctz
    =0 _mm_popcnt_u32=17 __builtin_popcount=17
    Checking compiler assumptions for SSE2: 6B8B4567 327B23C6 xor=59F066A1
    Checking compiler assumptions for SSE4.1: -103 -58 max=198 => compiler zero exte
    nds
    Finished checking compiler assumptions
    Neither novel splicing (-N) nor known splicing (-s) turned on => assume reads ar
    e DNA-Seq (genomic)
    Note: >1 sequence detected, so index files are being memory mapped.
    GSNAP can run slowly at first while the computer starts to accumulate
    pages from the hard disk into its cache. To copy index files into RAM
    instead of memory mapping, use -B 3, -B 4, or -B 5, if you have enough RAM.
    For more speed, also try multiple threads (-t <int>), if you have multiple pro
    cessors or cores.
    This program gsnapl is designed for large genomes.
    For small genomes of less than 2^32 (4 billion) bp, please run gsnap instead.

    loghumanS looks similar. logmt.txt is 232Mb unfortunately, so I can't attach.

    Thanks again for your help.
    -Larry.

     

    Last edit: Larry Singh 2014-11-30
  • Claudia Calabrese

    Hi Larry,

    Could you please tell us what version of MToolBox did you use to run this command line? Are you sure is it the latest one (v0.2.1)? Because this kind of mtVariantCaller.py error is a former bug of the first released version and it has been fixed in following ones.

    Anyway, just a little suggestion that I figured out doesn't come out clearly from the current README...when you run the MToolBox command line specifying with the MToolBox.sh -r option that you want to use rCRS instead of RSRS, you can avoid to state -a and -f assembleMTgenome.py arguments. We will better clarify this issue in a forthcoming version of the MToolBox README.

    Please, don't hesitate to come back with any other issue.
    Thank you very much for using MToolBox

    Saluti
    Claudia

     
  • Larry Singh

    Larry Singh - 2014-12-01

    Cara Claudia,

    Quite sure. See:

    $ MToolBox.sh -version
    MToolBox v0.2.1

    It seems to be a problem with both versions.

    Grazie,
    -Larry.

     
  • Domenico Simone

    Domenico Simone - 2014-12-02

    Larry,

    could you please tell us some info about your data, ie the mean read length, the enrichment/sequencing method, and the QS encoding? I've noticed a "O" in your CIGAR string (the qs variable) and it seems not a Illumina1.8+ encoding. By the way, are you able to tell us what's the length of that specific read? If you could send us the OUT2.sam and your input file it could help us to figure out the matter.

    We have noticed you are using gsnapl, with whom, indeed, we have never tested MToolBox. Do you have the possibility to run gsnap?

    Grazie, a presto!

     
  • Larry Singh

    Larry Singh - 2014-12-03

    Caro Domenico,

    Could you send me an e-mail at my NIH e-mail address? I will address your questions there if you don't mind. The address is my <first name="">.<last name="">(at)nih.gov, like larry.singh(at)...

    Grazie mille!
    -Larry.

     
  • Larry Singh

    Larry Singh - 2014-12-03

    Caro Domenico,

    Could you send me an e-mail at my NIH e-mail address? I will address your questions there if you don't mind. The address is my <first name="">.<last name="">(at)nih.gov, like larry.singh(at)...

    Grazie mille!
    -Larry.

     
  • Domenico Simone

    Domenico Simone - 2014-12-03

    Dear Larry,

    if the problem is the notification of replies from this discussion, I am pleased to tell you that we have a brand new googlegroup which should help in communication (also my projectmates are experiencing difficulties in getting updated with it). The googlegroup is here: https://groups.google.com/forum/?hl=IT#!forum/mtoolbox-users. I hope you will join it so other users can benefit from our communication.

    Have a nice day :)

     

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.