I've been trying to run MToolBox, but I receive the following error:
File "/data/singhln/Projects/MToolBox/assembleMTgenome.py", line 428, in <module>
mut_events = mtvcf_main_analysis(mt_table, sam_file, sample_name, tail=tail)
File "/gpfs/gsfs3/users/singhln/Projects/MToolBox_v0.2.1/mtVariantCaller.py", line 702, in mtvcf_main_analysis
r=SearchINDELsintoSAM(readNAME,mate,CIGAR,seq, qs,refposleft,tail=tail)
File "/gpfs/gsfs3/users/singhln/Projects/MToolBox_v0.2.1/mtVariantCaller.py", line 502, in SearchINDELsintoSAM
medR=median(qsR)
File "/gpfs/gsfs3/users/singhln/Projects/MToolBox_v0.2.1/mtVariantCaller.py", line 402, in median
m1 = sorted(l)[(len(l)/2)+1-1]
IndexError: list index out of range
I believe the problem is that qsR is an empty list because qsRight is empty in my case. I printed out some basic variable values:
qs='DDA6AEHO'
s=11, sr =7
qsRight = '' qsLeft = 'HO'
I appreciate your help with this matter.
Grazie mille! :)
-Larry.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
thanks for your interest in our tool and for this reporting.
Could you please also report the command line you used to run the tool and attach the log(s) you obtained?
Ciao!
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I narrowed down the problem to the assemble genome step, and this command replicates the problem:
assembleMTgenome.py -i OUT2.sam -o CS107349 -r /usr/local/share/genomes/ -f chrR
CRS.fa -a hg19RCRS.fa -FCP -s /usr/local/apps/samtool
s/0.1.19/samtools -r /data/singhln/Projects/ES/Data/ -a human_g1k_v37_RCRS.fast
a -f chrRCRS.fa
I've attached the stdout and stderr files. Here is the output of loghumanP.txt:
GSNAP version 2014-11-18 called with args: /data/singhln/bin/gsnapl -D /data/sin
ghln/Projects/ES/Data/ -d human_g1k_v37_RCRS -A sam --nofails --query-unk-mismat
ch=1 -O -t 16 OUT_CS107349/outmt1.fastq OUT_CS107349/outmt2.fastq
Checking compiler assumptions for popcnt: 6B8B4567 builtin_clz=1 builtin_ctz
=0 _mm_popcnt_u32=17 __builtin_popcount=17
Checking compiler assumptions for SSE2: 6B8B4567 327B23C6 xor=59F066A1
Checking compiler assumptions for SSE4.1: -103 -58 max=198 => compiler zero exte
nds
Finished checking compiler assumptions
Neither novel splicing (-N) nor known splicing (-s) turned on => assume reads ar
e DNA-Seq (genomic)
Note: >1 sequence detected, so index files are being memory mapped.
GSNAP can run slowly at first while the computer starts to accumulate
pages from the hard disk into its cache. To copy index files into RAM
instead of memory mapping, use -B 3, -B 4, or -B 5, if you have enough RAM.
For more speed, also try multiple threads (-t <int>), if you have multiple pro
cessors or cores.
This program gsnapl is designed for large genomes.
For small genomes of less than 2^32 (4 billion) bp, please run gsnap instead.
loghumanS looks similar. logmt.txt is 232Mb unfortunately, so I can't attach.
Could you please tell us what version of MToolBox did you use to run this command line? Are you sure is it the latest one (v0.2.1)? Because this kind of mtVariantCaller.py error is a former bug of the first released version and it has been fixed in following ones.
Anyway, just a little suggestion that I figured out doesn't come out clearly from the current README...when you run the MToolBox command line specifying with the MToolBox.sh -r option that you want to use rCRS instead of RSRS, you can avoid to state -a and -f assembleMTgenome.py arguments. We will better clarify this issue in a forthcoming version of the MToolBox README.
Please, don't hesitate to come back with any other issue.
Thank you very much for using MToolBox
Saluti
Claudia
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
could you please tell us some info about your data, ie the mean read length, the enrichment/sequencing method, and the QS encoding? I've noticed a "O" in your CIGAR string (the qs variable) and it seems not a Illumina1.8+ encoding. By the way, are you able to tell us what's the length of that specific read? If you could send us the OUT2.sam and your input file it could help us to figure out the matter.
We have noticed you are using gsnapl, with whom, indeed, we have never tested MToolBox. Do you have the possibility to run gsnap?
Grazie, a presto!
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Could you send me an e-mail at my NIH e-mail address? I will address your questions there if you don't mind. The address is my <first name="">.<last name="">(at)nih.gov, like larry.singh(at)...
Grazie mille!
-Larry.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Could you send me an e-mail at my NIH e-mail address? I will address your questions there if you don't mind. The address is my <first name="">.<last name="">(at)nih.gov, like larry.singh(at)...
Grazie mille!
-Larry.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
if the problem is the notification of replies from this discussion, I am pleased to tell you that we have a brand new googlegroup which should help in communication (also my projectmates are experiencing difficulties in getting updated with it). The googlegroup is here: https://groups.google.com/forum/?hl=IT#!forum/mtoolbox-users. I hope you will join it so other users can benefit from our communication.
Have a nice day :)
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hello,
I've been trying to run MToolBox, but I receive the following error:
File "/data/singhln/Projects/MToolBox/assembleMTgenome.py", line 428, in <module>
mut_events = mtvcf_main_analysis(mt_table, sam_file, sample_name, tail=tail)
File "/gpfs/gsfs3/users/singhln/Projects/MToolBox_v0.2.1/mtVariantCaller.py", line 702, in mtvcf_main_analysis
r=SearchINDELsintoSAM(readNAME,mate,CIGAR,seq, qs,refposleft,tail=tail)
File "/gpfs/gsfs3/users/singhln/Projects/MToolBox_v0.2.1/mtVariantCaller.py", line 502, in SearchINDELsintoSAM
medR=median(qsR)
File "/gpfs/gsfs3/users/singhln/Projects/MToolBox_v0.2.1/mtVariantCaller.py", line 402, in median
m1 = sorted(l)[(len(l)/2)+1-1]
IndexError: list index out of range
I believe the problem is that qsR is an empty list because qsRight is empty in my case. I printed out some basic variable values:
qs='DDA6AEHO'
s=11, sr =7
qsRight = '' qsLeft = 'HO'
I appreciate your help with this matter.
Grazie mille! :)
-Larry.
Dear Larry,
thanks for your interest in our tool and for this reporting.
Could you please also report the command line you used to run the tool and attach the log(s) you obtained?
Ciao!
Caro Domenico,
Here's the command line:
/data/singhln/Projects/MToolBox/MToolBox.sh \ -r rCRS \ -m "-g /data/singhln/bin/gsnap \ -D /data/singhln/Projects/ES/Data/ \ -M chrRCRS \ -H human_g1k_v37_RCRS \ -t 14 \ " \ -a "-s /usr/local/apps/samtools/0.1.19/samtools \ -r /data/singhln/Projects/ES/Data/ \ -a human_g1k_v37_RCRS.fasta \ -f chrRCRS.fa \ " \ -c "-m /usr/local/apps/muscle/3.8.31/muscle" \ -M \ -I \ -i bam
I narrowed down the problem to the assemble genome step, and this command replicates the problem:
assembleMTgenome.py -i OUT2.sam -o CS107349 -r /usr/local/share/genomes/ -f chrR
CRS.fa -a hg19RCRS.fa -FCP -s /usr/local/apps/samtool
s/0.1.19/samtools -r /data/singhln/Projects/ES/Data/ -a human_g1k_v37_RCRS.fast
a -f chrRCRS.fa
I've attached the stdout and stderr files. Here is the output of loghumanP.txt:
GSNAP version 2014-11-18 called with args: /data/singhln/bin/gsnapl -D /data/sin
ghln/Projects/ES/Data/ -d human_g1k_v37_RCRS -A sam --nofails --query-unk-mismat
ch=1 -O -t 16 OUT_CS107349/outmt1.fastq OUT_CS107349/outmt2.fastq
Checking compiler assumptions for popcnt: 6B8B4567 builtin_clz=1 builtin_ctz
=0 _mm_popcnt_u32=17 __builtin_popcount=17
Checking compiler assumptions for SSE2: 6B8B4567 327B23C6 xor=59F066A1
Checking compiler assumptions for SSE4.1: -103 -58 max=198 => compiler zero exte
nds
Finished checking compiler assumptions
Neither novel splicing (-N) nor known splicing (-s) turned on => assume reads ar
e DNA-Seq (genomic)
Note: >1 sequence detected, so index files are being memory mapped.
GSNAP can run slowly at first while the computer starts to accumulate
pages from the hard disk into its cache. To copy index files into RAM
instead of memory mapping, use -B 3, -B 4, or -B 5, if you have enough RAM.
For more speed, also try multiple threads (-t <int>), if you have multiple pro
cessors or cores.
This program gsnapl is designed for large genomes.
For small genomes of less than 2^32 (4 billion) bp, please run gsnap instead.
loghumanS looks similar. logmt.txt is 232Mb unfortunately, so I can't attach.
Thanks again for your help.
-Larry.
Last edit: Larry Singh 2014-11-30
Hi Larry,
Could you please tell us what version of MToolBox did you use to run this command line? Are you sure is it the latest one (v0.2.1)? Because this kind of mtVariantCaller.py error is a former bug of the first released version and it has been fixed in following ones.
Anyway, just a little suggestion that I figured out doesn't come out clearly from the current README...when you run the MToolBox command line specifying with the MToolBox.sh -r option that you want to use rCRS instead of RSRS, you can avoid to state -a and -f assembleMTgenome.py arguments. We will better clarify this issue in a forthcoming version of the MToolBox README.
Please, don't hesitate to come back with any other issue.
Thank you very much for using MToolBox
Saluti
Claudia
Cara Claudia,
Quite sure. See:
$ MToolBox.sh -version
MToolBox v0.2.1
It seems to be a problem with both versions.
Grazie,
-Larry.
Larry,
could you please tell us some info about your data, ie the mean read length, the enrichment/sequencing method, and the QS encoding? I've noticed a "O" in your CIGAR string (the qs variable) and it seems not a Illumina1.8+ encoding. By the way, are you able to tell us what's the length of that specific read? If you could send us the OUT2.sam and your input file it could help us to figure out the matter.
We have noticed you are using gsnapl, with whom, indeed, we have never tested MToolBox. Do you have the possibility to run gsnap?
Grazie, a presto!
Caro Domenico,
Could you send me an e-mail at my NIH e-mail address? I will address your questions there if you don't mind. The address is my <first name="">.<last name="">(at)nih.gov, like larry.singh(at)...
Grazie mille!
-Larry.
Caro Domenico,
Could you send me an e-mail at my NIH e-mail address? I will address your questions there if you don't mind. The address is my <first name="">.<last name="">(at)nih.gov, like larry.singh(at)...
Grazie mille!
-Larry.
Dear Larry,
if the problem is the notification of replies from this discussion, I am pleased to tell you that we have a brand new googlegroup which should help in communication (also my projectmates are experiencing difficulties in getting updated with it). The googlegroup is here: https://groups.google.com/forum/?hl=IT#!forum/mtoolbox-users. I hope you will join it so other users can benefit from our communication.
Have a nice day :)