Thank you for making this pipeline open source for the community.
I managed to have a test run with MitoSAlt.pl. However, in the intel folder, I could not find a “.tsv” file; there are only “.bed", “.breakpoint", “.cluster”. Also, there is no circular plots generated.
I think problems might occur after the step of alignment as I received an error message "Error: malformed BED entry at line 4. Start was greater than end. Exiting.”. I am wondering if you have any suggestions on this issue.
Please find my log as below. Thank you!
hisat2 = hisat2
lastal = bin/last/src/lastal
lastsp = bin/last/src/last-split
mfcv = bin/last/scripts/maf-convert
reformat = reformat.sh
samtools = samtools
sambamba = bin/sambamba
b2fq = bamToFastq
gcov = genomeCoverageBed
intersectBed = intersectBed
sortBed = sortBed
clusterBed = clusterBed
randomBed = randomBed
groupBy = groupBy
bg2bw = bedGraphToBigWig
hsindex = genome/hg19_g1k
faindex = genome/hg19_g1k.fasta.fai
lastindex = genome/human_mt_rCRS
mtfaindex = genome/human_mt_rCRS.fasta.fai
gsize = genome/hg19_g1k.size
MT_fasta = genome/human_mt_rCRS.fasta
threads = 10
refchr = MT
msize = 16569
exclude = 5
orihs = 16081
orihe = 407
orils = 5730
orile = 5763
score_threshold = 80
evalue_threshold = 0.00001
split_length = 15
paired_distance = 1000
deletion_threshold_min = 30
deletion_threshold_max = 30000
breakthreshold = -2
cluster_threshold = 5
breakspan = 15
sizelimit = 10000
hplimit = 0.01
flank = 15
split_distance_threshold = 5
dna = yes
enriched = yes
nu_mt = no
rmtmp = yes
o_mt = yes
i_del = yes
cn_mt = no
Thu Jan 7 12:18:54 2021: Map to MT genome [bam_sort_core] merging from 20 files and 10 in-memory blocks...
Thu Jan 7 15:27:00 2021: Generate Bigwig
Thu Jan 7 15:31:27 2021: Build split read hash
Thu Jan 7 15:50:08 2021: Generate non-split read BED
Thu Jan 7 15:54:18 2021: Process hash to get best deletion/duplication candidates
Thu Jan 7 15:55:46 2021: Build split read clusters
Error: malformed BED entry at line 4. Start was greater than end. Exiting.
Thu Jan 7 15:55:53 2021: Generate and print results
Thu Jan 7 15:55:53 2021:Plot deletions/duplications
Thu Jan 7 15:56:03 2021:Finished
Have you checked your cluster file and breakpoint file? Is there any events
in those files? One of the possibilities of not having the final tsv and
pdf files is that there is actually no event detected (either no events at
all or no events passing your current thresholds).
Thank you for making this pipeline open source for the community.
I managed to have a test run with MitoSAlt.pl. However, in the intel
folder, I could not find a “.tsv” file; there are only “.bed",
“.breakpoint", “.cluster”. Also, there is no circular plots generated.
I think problems might occur after the step of alignment as I received an
error message "Error: malformed BED entry at line 4. Start was greater than
end. Exiting.”. I am wondering if you have any suggestions on this issue.
Please find my log as below. Thank you!
hisat2 = hisat2
lastal = bin/last/src/lastal
lastsp = bin/last/src/last-split
mfcv = bin/last/scripts/maf-convert
reformat = reformat.sh
samtools = samtools
sambamba = bin/sambamba
b2fq = bamToFastq
gcov = genomeCoverageBed
intersectBed = intersectBed
sortBed = sortBed
clusterBed = clusterBed
randomBed = randomBed
groupBy = groupBy
bg2bw = bedGraphToBigWig
hsindex = genome/hg19_g1k
faindex = genome/hg19_g1k.fasta.fai
lastindex = genome/human_mt_rCRS
mtfaindex = genome/human_mt_rCRS.fasta.fai
gsize = genome/hg19_g1k.size
MT_fasta = genome/human_mt_rCRS.fasta
threads = 10
refchr = MT
msize = 16569
exclude = 5
orihs = 16081
orihe = 407
orils = 5730
orile = 5763
score_threshold = 80
evalue_threshold = 0.00001
split_length = 15
paired_distance = 1000
deletion_threshold_min = 30
deletion_threshold_max = 30000
breakthreshold = -2
cluster_threshold = 5
breakspan = 15
sizelimit = 10000
hplimit = 0.01
flank = 15
split_distance_threshold = 5
dna = yes
enriched = yes
nu_mt = no
rmtmp = yes
o_mt = yes
i_del = yes
cn_mt = no
Thu Jan 7 12:18:54 2021: Map to MT genome [bam_sort_core] merging from 20 files and 10 in-memory blocks...
Thu Jan 7 15:27:00 2021: Generate Bigwig
Thu Jan 7 15:31:27 2021: Build split read hash
Thu Jan 7 15:50:08 2021: Generate non-split read BED
Thu Jan 7 15:54:18 2021: Process hash to get best deletion/duplication
candidates
Thu Jan 7 15:55:46 2021: Build split read clusters
Error: malformed BED entry at line 4. Start was greater than end. Exiting.
Thu Jan 7 15:55:53 2021: Generate and print results
Thu Jan 7 15:55:53 2021:Plot deletions/duplications
Thu Jan 7 15:56:03 2021:Finished
There are quite a lot of events (>10,000) in the .breakpoint file. The problem might occur at the step of "Build split read clusters", since an error showed up:
Error: malformed BED entry at line 4. Start was greater than end. Exiting.
Does this step use the .bed file as input? Is there a way I can make sure the .bed file is proper?
Best,
Yi
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I checked my .bed file and I did see a lot of sequences with start position greater than end position, which to my knowledge is not the format that bedtools will accept. I am wondering if you could double check your perl script, particularly at the step of BED file generation when you define $start and $end based on the positions of frag1 and frag2 (#GENERATE A BED FILE FOR THE SPLIT READS FOR IGV VISUALIZATION).
Let me know if you have any other solutions or suggestions. Thank you so much!
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Thanks for using our pipeline! Hopefully we will be able to troubleshoot your issue and sorry for the delay. We don't think that the issue is due to the BED file, as the generated BED, us just for visualization purpose while the error message you get is at the clustering step. The clustering step takes the .breakpoint file generated and breaks it down into two files (TAG being the name you give to MitoSAlt run)
tmp_TAG_bps.bed (contains the start positions of the breakpoint file as BED chr:start-start)
tmp_TAG_bpe.bed (contains the end positions of the breakpoint file as BED chr:end-end)
These temporary files are passed through sortBed and clusterBed to get a list of positions cluster at a certain distance threshold (default for MitoSAlt is -2, which is cluster identical positions). The error you get can happen if the input file to sortBed has its start greater than the end. For example if I take a file tmpStart.bed
Thanks a lot for spending the time to help out. I really appreciate it.
I did what you suggested.
Please find the two temporary bed files I attached: tmp_KLD54test3_bps.bed and tmp_KLD54test3_bpe.bed.
I checked quickly and found that within individual .bed files, the start position is always the same as the end position. And for the same sequence, the position in bps.bed file is always smaller than that in bpe.bed. Does it mean that the files are correct? What could be the issues otherwise?
The line 4 throws an error as sortBed from bedtools cannot handle both the start and end position as 0. Unfortunately we had not encountered this error before. We have checks inside the script so that start and end position of a breakpoint, if it falls within 5 bases of start and end of the MT genome, are skipped to avoid split reads generated due to aligning reads from circular Mt genome on a linear FASTA sequence. In your input there are 8 reads with starts as zero tmp_KLD54test3_bps.bed
It might be due to these reads being split twice, once because of an actual breakpoint and once because of the genome circularity. Thus we would have three fragments instead of the regular two fragments of a split read. In this case also the script has checks to avoid start or ends to be 0. Hence could you please share with us a fastq which only has these 8 reads in, both the P1.fq and P2.fq. That would be helpful for us to debug and sort the issue. Apologies for the inconvenience.
Meanwhile you might also try running with a exclude=50 and see if the issue persists.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Thank you for the debugging.
Please find the fastq files that contains the 8 reads in the attachment.
I ran again with exclude=50. However, the result is almost the same. I still got the error about malformed BED entry. In the bps.bed file, the same sequences with start and end positions being zero are still present. Hopefully checking the read sequences from the FASTQ files will give us some hints.
Sorry for the trouble!
We have uploaded version 1.1. Your previous installation should work fine, only replace the MitSAlt scripts with the new ones. Please let us know if your issue persists or if its solved now.
Thanks.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
With version 1.1, I am able to finish a run without generating any error. Thank you so much!
I found that my test sample gives rise many breakpoints (>10,000, most of them are unique). However there are only 11 clusters with ~10-20 reads in each cluster. From you experience, do you think the majority of the breakpoints are likely artifact as there are not >5 reads to support them and include them to a cluster?
I would like to also try to adjust some parameters in the configuration file to make it more sensitive, for example breakthreshold and cluster_threshold. Do you have suggestions on the range of breakthresholdI should test and any other parameters I should try out?
Thanks a lot!
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I’m glad that the pipeline is working for your sample now.
Currently your breakpoint threshold is -2, cluster threshold is 5, and
hplimit is 0.01. With this setting, the pipeline will only cluster reads
that have breakpoints at exactly the same location; and only events with
more than 5 reads supporting it, and with heterozygosity higher than 0.01%
will be detected.
There are three ways to increase the sensitivity:
First, you can increase the breakpoint threshold to 50. With this
parameter, reads with breakpoints close to each other will be clustered as
one event, and the number of read supporting each event will also increase.
Alternatively, you can reduce the cluster threshold to 2 or even 1, while
keep breakpoint threshold at -2.
Last, If your sequencing depth is very high, you can also reduce the
hplimit to 0.005 or even 0.001, to detect events with low heteroplasmy.
With version 1.1, I am able to finish a run without generating any error.
Thank you so much!
I found that my test sample gives rise many breakpoints (>10,000, most of
them are unique). However there are only 11 clusters with ~10-20 reads in
each cluster. From you experience, do you think the majority of the
breakpoints are likely artifact as there are not >5 reads to support them
and include them to a cluster?
I would like to also try to adjust some parameters in the configuration
file to make it more sensitive, for example breakthreshold and
cluster_threshold. Do you have suggestions on the range of breakthresholdI
should test and any other parameters I should try out?
Hi,
Thanks for developing this tool again. I´m using MitoSAlt to analyze my single-end reads. But, I have a few questions about sensitivity of the tool:
1) I obtained 22 and 26 bp duplications with a similar start and end-points in some of my samples, likely artifacts. When I increase split_legth to 27 I can remove them, but I also lose my specific deletions too. For example, I have 15 bp deletion in one of my samples, but I cannot obtain this deletion with these features.
(PS: I have already increased the breakpoint threshold to 50 and decrease the cluster threshold to 2 for high sensitivity, but I didn´t work.)
2) I have some samples with known deletions analyzed by eKLIPse before. However, most of the deletion events are classified as duplication right now by MitoSAlt. I know that the MitoSAlt initially assumes that all events are deletions then it complements and re-classify as a duplication. However, I think some of these duplications, including 26 bp deletions in the previous question, can be related to features in configuration file. Regarding to this, I increased the sizelimit to 16500 but'I couldn´t get rid of them.
Do you have any recommendation for them?
Many thanks
Bengisu
Last edit: Bengisu 2021-05-11
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi Bengsiu, the final tabular output of MitoSAlt also gives the lenght of the deletion of duplication. Here if you only set the split length, and cluster threshold in config file, you can always filter for deletion/duplication length in the final output. This way, you have a larger set of results, on which some post filtering can be done post run.
Ideally for small indels, < 20 bases, you might also want to use SNP callers tailor made for the purpose like Mity which uses an underlying freebayes run on the WGS data to call small indels.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi again,
Thanks for your suggestion. I have mtDNA enriched single-end sequences, I think Mity s not useful for me. MitoSAlt tool´s feature and scoring system also targets the pair-end seqs, and I have been trying to modify these to single-end reads for weeks. I obtained non-specific multiple alterations for my samples and these alterations could only be removed when the split length became 27-28 etc.
For instance, although I set sizelimit as 16560 in config file, I still obtain duplication events for 16546, 16547, 16543 bp deletions. When I set sizelimit as 1000, I still obtain "deletion" result for a 8186 bp deletion. But theoretically I should obtain "duplication" for these breakpoints, don´t I? What could be the reason for that?
Thanks for your support.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Could you please share with us the final output of MitoSAlt, the TSV file,
and add a column, where you mark the deletions and duplications you want to
keep and remove.
This will help us to figure out how to remove the false positives.
Hi again,
Thanks for your suggestion. I have mtDNA enriched single-end sequences, I
think Mity s not useful for me. MitoSAlt tool´s feature and scoring system
also targets the pair-end seqs, and I have been trying to modify these to
single-end reads for weeks. I obtained non-specific multiple alterations
for my samples and these alterations could only be removed when the split
length became 27-28 etc.
For instance, although I set sizelimit as 16560 in config file, I still
obtain duplication events for 16546, 16547, 16543 bp deletions. When I set
sizelimit as 1000, I still obtain "deletion" result for a 8186 bp
deletion. But theoretically I should obtain "duplication" for these
breakpoints, don´t I? What could be the reason for that?
Thanks for your support.
Thanks for your huge help. You can find one of my samples' tsv file. The sample should have only one 5kb deletion. Default parameters gave me 17 more alterations with the deletion sizes vary between 5756 (expected) and 16548 (with a final 21 bp size - duplication event). To remove these alterations I increased the score threshold, split length (the most effective parameter among other filters based on my previous analysis) , breakthreshold and followed these parameters:
Hi,
I also received an "Error: malformed BED entry at line 592. Start was greater than end. Exiting." message although I followed the same steps you recommended before. When I checked tmp bed file I saw this at line 592:
MT 0 0 0PG33:00925:03439_1 0
So. how can I solve this problem? I have 5 files and received the same message in three of them.
Thanks!
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi,
I also received an "Error: malformed BED entry at line 592. Start was
greater than end. Exiting." message although I followed the same steps you
recommended before. When I checked tmp bed file I saw this at line 592:
MT 0 0 0PG33:00925:03439_1 0
So. how can I solve this problem? I have 5 files and received the same
message in three of them.
Thanks!
I tried to analyze other fastqs too and the problem is remaining. I shared the script, which I newly downloaded and used, to be sure. You can also find my BED file and Config file in the folder.
Ok Bengsiu, could you please share the fastq, with only those reads (5-6
reads max) which give this kind of an error. I shall check locally and
update you.
I tried to analyze other fastqs too and the problem is remaining. I shared
the script, which I newly downloaded and used, to be sure. You can also
find my BED file and Config file in the folder.
Hi,
You can find two fastq files in the attachment.
(Edit: The forum doesn't allow me to share any document)
I received "This site can’t be reached" error when I add an attachment.
Last edit: Bengisu 2021-05-24
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi,
Thank you for making this pipeline open source for the community.
I managed to have a test run with MitoSAlt.pl. However, in the intel folder, I could not find a “.tsv” file; there are only “.bed", “.breakpoint", “.cluster”. Also, there is no circular plots generated.
I think problems might occur after the step of alignment as I received an error message "Error: malformed BED entry at line 4. Start was greater than end. Exiting.”. I am wondering if you have any suggestions on this issue.
Please find my log as below. Thank you!
hisat2 = hisat2
lastal = bin/last/src/lastal
lastsp = bin/last/src/last-split
mfcv = bin/last/scripts/maf-convert
reformat = reformat.sh
samtools = samtools
sambamba = bin/sambamba
b2fq = bamToFastq
gcov = genomeCoverageBed
intersectBed = intersectBed
sortBed = sortBed
clusterBed = clusterBed
randomBed = randomBed
groupBy = groupBy
bg2bw = bedGraphToBigWig
hsindex = genome/hg19_g1k
faindex = genome/hg19_g1k.fasta.fai
lastindex = genome/human_mt_rCRS
mtfaindex = genome/human_mt_rCRS.fasta.fai
gsize = genome/hg19_g1k.size
MT_fasta = genome/human_mt_rCRS.fasta
threads = 10
refchr = MT
msize = 16569
exclude = 5
orihs = 16081
orihe = 407
orils = 5730
orile = 5763
score_threshold = 80
evalue_threshold = 0.00001
split_length = 15
paired_distance = 1000
deletion_threshold_min = 30
deletion_threshold_max = 30000
breakthreshold = -2
cluster_threshold = 5
breakspan = 15
sizelimit = 10000
hplimit = 0.01
flank = 15
split_distance_threshold = 5
dna = yes
enriched = yes
nu_mt = no
rmtmp = yes
o_mt = yes
i_del = yes
cn_mt = no
Thu Jan 7 12:18:54 2021: Map to MT genome
[bam_sort_core] merging from 20 files and 10 in-memory blocks...
Thu Jan 7 15:27:00 2021: Generate Bigwig
Thu Jan 7 15:31:27 2021: Build split read hash
Thu Jan 7 15:50:08 2021: Generate non-split read BED
Thu Jan 7 15:54:18 2021: Process hash to get best deletion/duplication candidates
Thu Jan 7 15:55:46 2021: Build split read clusters
Error: malformed BED entry at line 4. Start was greater than end. Exiting.
Thu Jan 7 15:55:53 2021: Generate and print results
Thu Jan 7 15:55:53 2021:Plot deletions/duplications
Thu Jan 7 15:56:03 2021:Finished
Best,
Yi Fu
Hey,
Have you checked your cluster file and breakpoint file? Is there any events
in those files? One of the possibilities of not having the final tsv and
pdf files is that there is actually no event detected (either no events at
all or no events passing your current thresholds).
Best,
Xie
On Thu, 7 Jan 2021 at 23:51, Yi Fu minusonederland@users.sourceforge.net
wrote:
Hi Xie,
Thank you for your response.
There are quite a lot of events (>10,000) in the .breakpoint file. The problem might occur at the step of "Build split read clusters", since an error showed up:
Error: malformed BED entry at line 4. Start was greater than end. Exiting.
Does this step use the .bed file as input? Is there a way I can make sure the .bed file is proper?
Best,
Yi
Hi,
I checked my .bed file and I did see a lot of sequences with start position greater than end position, which to my knowledge is not the format that bedtools will accept. I am wondering if you could double check your perl script, particularly at the step of BED file generation when you define $start and $end based on the positions of frag1 and frag2 (#GENERATE A BED FILE FOR THE SPLIT READS FOR IGV VISUALIZATION).
Let me know if you have any other solutions or suggestions. Thank you so much!
Hi Yi Fu,
Thanks for using our pipeline! Hopefully we will be able to troubleshoot your issue and sorry for the delay. We don't think that the issue is due to the BED file, as the generated BED, us just for visualization purpose while the error message you get is at the clustering step. The clustering step takes the .breakpoint file generated and breaks it down into two files (TAG being the name you give to MitoSAlt run)
These temporary files are passed through sortBed and clusterBed to get a list of positions cluster at a certain distance threshold (default for MitoSAlt is -2, which is cluster identical positions). The error you get can happen if the input file to sortBed has its start greater than the end. For example if I take a file tmpStart.bed
And edit line 4 of the file
Then the command
gives an error
Thus if you could please re-run with the
rmtmp = no
in the config file and share with us the files named1. tmp_<>_bps.bed and
2. tmp_<>_bpe.bed
I believe we should be able to pin-point the issue. Thanks once again for helping us improve the pipeline by posting your issue!
Last edit: Swaraj Basu 2021-01-11
Hi Swaraj,
Thanks a lot for spending the time to help out. I really appreciate it.
I did what you suggested.
Please find the two temporary bed files I attached:
tmp_KLD54test3_bps.bed
andtmp_KLD54test3_bpe.bed
.I checked quickly and found that within individual .bed files, the start position is always the same as the end position. And for the same sequence, the position in bps.bed file is always smaller than that in bpe.bed. Does it mean that the files are correct? What could be the issues otherwise?
Thanks again and look forward to your reply.
Thanks Yi Fu! The 4th line in tmp_KLD54test3_bps.bed has both its start and end as 0. Thus when MitoSAlt tries to run the command
The line 4 throws an error as sortBed from bedtools cannot handle both the start and end position as 0. Unfortunately we had not encountered this error before. We have checks inside the script so that start and end position of a breakpoint, if it falls within 5 bases of start and end of the MT genome, are skipped to avoid split reads generated due to aligning reads from circular Mt genome on a linear FASTA sequence. In your input there are 8 reads with starts as zero tmp_KLD54test3_bps.bed
It might be due to these reads being split twice, once because of an actual breakpoint and once because of the genome circularity. Thus we would have three fragments instead of the regular two fragments of a split read. In this case also the script has checks to avoid start or ends to be 0. Hence could you please share with us a fastq which only has these 8 reads in, both the P1.fq and P2.fq. That would be helpful for us to debug and sort the issue. Apologies for the inconvenience.
Meanwhile you might also try running with a
exclude=50
and see if the issue persists.Hi Swaraj,
Thank you for the debugging.
Please find the fastq files that contains the 8 reads in the attachment.
I ran again with
exclude=50
. However, the result is almost the same. I still got the error about malformed BED entry. In thebps.bed
file, the same sequences with start and end positions being zero are still present. Hopefully checking the read sequences from the FASTQ files will give us some hints.Sorry for the trouble!
Last edit: Yi Fu 2021-01-12
Thank you Yi Fu.
We have uploaded version 1.1. Your previous installation should work fine, only replace the MitSAlt scripts with the new ones. Please let us know if your issue persists or if its solved now.
Thanks.
Hi Swaraj,
With version 1.1, I am able to finish a run without generating any error. Thank you so much!
I found that my test sample gives rise many breakpoints (>10,000, most of them are unique). However there are only 11 clusters with ~10-20 reads in each cluster. From you experience, do you think the majority of the breakpoints are likely artifact as there are not >5 reads to support them and include them to a cluster?
I would like to also try to adjust some parameters in the configuration file to make it more sensitive, for example
breakthreshold
andcluster_threshold
. Do you have suggestions on the range ofbreakthreshold
I should test and any other parameters I should try out?Thanks a lot!
Hi Yi,
I’m glad that the pipeline is working for your sample now.
Currently your breakpoint threshold is -2, cluster threshold is 5, and
hplimit is 0.01. With this setting, the pipeline will only cluster reads
that have breakpoints at exactly the same location; and only events with
more than 5 reads supporting it, and with heterozygosity higher than 0.01%
will be detected.
There are three ways to increase the sensitivity:
First, you can increase the breakpoint threshold to 50. With this
parameter, reads with breakpoints close to each other will be clustered as
one event, and the number of read supporting each event will also increase.
Alternatively, you can reduce the cluster threshold to 2 or even 1, while
keep breakpoint threshold at -2.
Last, If your sequencing depth is very high, you can also reduce the
hplimit to 0.005 or even 0.001, to detect events with low heteroplasmy.
Hope this helps.
Best regards,
Xie
On Sat, 16 Jan 2021 at 01:26, Yi Fu minusonederland@users.sourceforge.net
wrote:
Hi,
Thanks for developing this tool again. I´m using MitoSAlt to analyze my single-end reads. But, I have a few questions about sensitivity of the tool:
1) I obtained 22 and 26 bp duplications with a similar start and end-points in some of my samples, likely artifacts. When I increase split_legth to 27 I can remove them, but I also lose my specific deletions too. For example, I have 15 bp deletion in one of my samples, but I cannot obtain this deletion with these features.
(PS: I have already increased the breakpoint threshold to 50 and decrease the cluster threshold to 2 for high sensitivity, but I didn´t work.)
2) I have some samples with known deletions analyzed by eKLIPse before. However, most of the deletion events are classified as duplication right now by MitoSAlt. I know that the MitoSAlt initially assumes that all events are deletions then it complements and re-classify as a duplication. However, I think some of these duplications, including 26 bp deletions in the previous question, can be related to features in configuration file. Regarding to this, I increased the sizelimit to 16500 but'I couldn´t get rid of them.
Do you have any recommendation for them?
Many thanks
Bengisu
Last edit: Bengisu 2021-05-11
Yes Bengsiu the plot won't be generated if not deletion clusters are
detected.
On Mon, May 10, 2021, 6:09 PM Bengisu bkbulduk@users.sourceforge.net
wrote:
Thanks for your answer, actually I had modified my question. Could you help me about scoring? Sorry for inconvenience. Many thanks!
Last edit: Bengisu 2021-05-11
Hi Bengsiu, the final tabular output of MitoSAlt also gives the lenght of the deletion of duplication. Here if you only set the split length, and cluster threshold in config file, you can always filter for deletion/duplication length in the final output. This way, you have a larger set of results, on which some post filtering can be done post run.
Ideally for small indels, < 20 bases, you might also want to use SNP callers tailor made for the purpose like Mity which uses an underlying freebayes run on the WGS data to call small indels.
Hi again,
Thanks for your suggestion. I have mtDNA enriched single-end sequences, I think Mity s not useful for me. MitoSAlt tool´s feature and scoring system also targets the pair-end seqs, and I have been trying to modify these to single-end reads for weeks. I obtained non-specific multiple alterations for my samples and these alterations could only be removed when the split length became 27-28 etc.
For instance, although I set sizelimit as 16560 in config file, I still obtain duplication events for 16546, 16547, 16543 bp deletions. When I set sizelimit as 1000, I still obtain "deletion" result for a 8186 bp deletion. But theoretically I should obtain "duplication" for these breakpoints, don´t I? What could be the reason for that?
Thanks for your support.
Hi Bengsiu,
Could you please share with us the final output of MitoSAlt, the TSV file,
and add a column, where you mark the deletions and duplications you want to
keep and remove.
This will help us to figure out how to remove the false positives.
Regards
Swaraj
On Fri, May 28, 2021, 11:06 PM Bengisu bkbulduk@users.sourceforge.net
wrote:
Thanks for your huge help. You can find one of my samples' tsv file. The sample should have only one 5kb deletion. Default parameters gave me 17 more alterations with the deletion sizes vary between 5756 (expected) and 16548 (with a final 21 bp size - duplication event). To remove these alterations I increased the score threshold, split length (the most effective parameter among other filters based on my previous analysis) , breakthreshold and followed these parameters:
score_threshold = 90
evalue_threshold = 0.00001
split_length = 27
paired_distance = 1000
deletion_threshold_min = 30
deletion_threshold_max = 30000
breakthreshold = 50
cluster_threshold = 5
breakspan = 15
sizelimit = 10000
hplimit = 0.01
flank = 15
split_distance_threshold = 5
At the end still I have three more alterations as you'll see in the attachment.
Many thanks,
Best
Hi,
I also received an "Error: malformed BED entry at line 592. Start was greater than end. Exiting." message although I followed the same steps you recommended before. When I checked tmp bed file I saw this at line 592:
MT 0 0 0PG33:00925:03439_1 0
So. how can I solve this problem? I have 5 files and received the same message in three of them.
Thanks!
Hi Bengsiu,
Could you download the MitoSAlt scripts again and replace with your
existing scripts. We added few checks so that this error does not crop up.
You only have to use the updated script while all the downloaded files and
config can remain the same.
Please let us know if that works.
-Regards
Swaraj
On Thu, May 20, 2021, 2:43 PM Bengisu bkbulduk@users.sourceforge.net
wrote:
Thanks for this changing but unfortunately it didn´t work. I received this problem again:
Error: malformed BED entry at line 465. Start was greater than end. Exiting.
The BED line like this:
MT 0 15480 0PG33:00925:03439_1 603 - 0 15480 0 2 88,141 0,15339
I tried to analyze other fastqs too and the problem is remaining. I shared the script, which I newly downloaded and used, to be sure. You can also find my BED file and Config file in the folder.
Thank you!
Ok Bengsiu, could you please share the fastq, with only those reads (5-6
reads max) which give this kind of an error. I shall check locally and
update you.
On Mon, May 24, 2021, 12:35 PM Bengisu bkbulduk@users.sourceforge.net
wrote:
Hi,
You can find two fastq files in the attachment.
(Edit: The forum doesn't allow me to share any document)
I received "This site can’t be reached" error when I add an attachment.
Last edit: Bengisu 2021-05-24
Hi,
You can find the fastq file in the attachement.
Thanks a lot!
Hi Bengsiu,
Could you please download the v1.1.1 (updated today) and check with your data. Let us know if you face any issues.
-Regards
Swaraj