MIRA / Tickets / #44 MIRA 4 cannot recognise paired reads

\o/

:-)

On 18. May 2021, at 9:39, Maeva perez maeperez@users.sourceforge.net wrote:

Oh, never mind. The issue was in the manifest.conf file. I replaced the line

segment_naming = FR

by

segment_naming = solexa

and it worked. Thanks for a great program.
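For anyone hitting the same error, the read names can be checked against the naming scheme before starting MIRA. The following is only a sketch and not part of MIRA. It assumes that the solexa scheme pairs reads whose names are identical except for a trailing /1 or /2, that every FASTQ record spans exactly four lines, and that the read name is the first whitespace-delimited token of the header line. All function names are made up for illustration.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumption: under the "solexa" scheme, mates differ only in a trailing /1 or /2.
SOLEXA_SUFFIX = re.compile(r"^(?P<template>.+)/(?P<segment>[12])$")


def fastq_names(lines: Iterable[str]) -> List[str]:
    """Collect read names from FASTQ text made of strict 4-line records."""
    names = []
    for index, line in enumerate(lines):
        # Header lines are every fourth line and start with '@'.
        if index % 4 == 0 and line.startswith("@"):
            tokens = line[1:].split()
            if tokens:
                names.append(tokens[0])
    return names


def split_solexa(name: str) -> Optional[Tuple[str, int]]:
    """Return (template, segment) if the name ends in /1 or /2, else None."""
    match = SOLEXA_SUFFIX.match(name)
    if match is None:
        return None
    return match.group("template"), int(match.group("segment"))


def count_pairs(names1: Iterable[str], names2: Iterable[str]) -> int:
    """Count templates seen as segment 1 in names1 and segment 2 in names2."""
    first = {hit[0] for hit in map(split_solexa, names1) if hit and hit[1] == 1}
    second = {hit[0] for hit in map(split_solexa, names2) if hit and hit[1] == 2}
    return len(first & second)
```

To use it, pass the lines of subset_reads1.fastq and subset_reads2.fastq through fastq_names() and hand both lists to count_pairs(). If the count is 0 under the scheme named in the manifest, then either the read names or the segment_naming entry needs to change.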

[tickets:#44] https://sourceforge.net/p/mira-assembler/tickets/44/ MIRA 4 cannot recognise paired reads

Status: open
Version: 4.0.1
Created: Tue May 18, 2021 07:16 AM UTC by Maeva perez
Last Updated: Tue May 18, 2021 07:18 AM UTC
Owner: nobody
Attachments:

manifest-1.conf https://sourceforge.net/p/mira-assembler/tickets/44/attachment/manifest-1.conf (454 Bytes; application/octet-stream)
Hi Bastien,
I have successfully run MIRA in the past, but with these reads downloaded from NCBI I have been unable to. My guess is that something is wrong with the format of my FASTQ files, so I have tried modifying the headers in different ways, but without success.

Your help would be greatly appreciated.

Below is the log; attached are the read files (small subset) and manifest.conf.
This is MIRA 4.0.2_0+g29f87d4 .

Please cite: Chevreux, B., Wetter, T. and Suhai, S. (1999), Genome Sequence
Assembly Using Trace Signals and Additional Sequence Information.
Computer Science and Biology: Proceedings of the German Conference on
Bioinformatics (GCB) 99, pp. 45-56.

To (un-)subscribe the MIRA mailing lists, see:
http://www.chevreux.org/mira_mailinglists.html
After subscribing, mail general questions to the MIRA talk mailing list:
mira_talk@freelists.org

To report bugs or ask for features, please use the SourceForge ticketing
system at:
http://sourceforge.net/p/mira-assembler/tickets/
This ensures that requests do not get lost.

Compiled by: bach
Fri Apr 18 14:57:56 CEST 2014
On: Darwin airfau2.fritz.box 13.1.0 Darwin Kernel Version 13.1.0: Thu Jan 16 19:40:37 PST 2014; root:xnu-2422.90.20~2/RELEASE_X86_64 x86_64
Compiled in boundtracking mode.
Compiled in bugtracking mode.
Compiled with ENABLE64 activated.
Runtime settings (sorry, for debug):
Size of size_t : 8
Size of uint32 : 4
Size of uint32_t: 4
Size of uint64 : 8
Size of uint64_t: 8
Current system: Darwin Maevas-MacBook-Air.local 19.6.0 Darwin Kernel Version 19.6.0: Thu Oct 29 22:56:45 PDT 2020; root:xnu-6153.141.2.2~1/RELEASE_X86_64 x86_64

For mapping assembly: readgroup 2 named 'reads' has no 'infoonly' or 'exclusion_criterion' set for 'segment_placement',
assuming 'infoonly'.
Looking for files named in data ...Pushing back filename: "P08H-3-Mito_circ_reoriented.fa"
Pushing back filename: "subset_reads1.fastq"
Pushing back filename: "subset_reads2.fastq"
Manifest:
projectname: initial_mapping_testpool-to_ref
job: genome,mapping,accurate
parameters: -NW:mrnl=0 -AS:nop=1 SOLEXA_SETTINGS -CO:msr=no
Manifest load entries: 2
MLE 1:
RGID: 1
RGN: SN: AlvCau
SP: SPio: 0 SPC: 0 IF: -1 IT: -1 TSio: 0
ST: 5 (Text) namschem: 6 SID: 0
DQ: 30
BB: 1 Rail: 0 CER: 0

P08H-3-Mito_circ_reoriented.fa
MLE 2:
RGID: 2
RGN: reads SN: testpool
SP: ---> <--- SPio: 1 SPC: -1 IF: -1 IT: -1 TSio: 0
ST: 6 (Solexa) namschem: 3 SID: 0
DQ: 30
BB: 0 Rail: 0 CER: 0

subset_reads1.fastq subset_reads2.fastq

Parameters parsed without error, perfect.
Overriding number of threads via '-t' with 2

-CL:pec and -CO:emeas1clpec are set, setting -CO:emea values to 1.

Parameter settings seen for:
Sanger data

Used parameter settings:
General (-GE):
Project name : initial_mapping_testpool-to_ref
Number of threads (not) : 2
Automatic memory management (amm) : yes
Keep percent memory free (kpmf) : 15
Max. process size (mps) : 0
EST SNP pipeline step (esps) : 0
Colour reads by hash frequency (crhf) : yes

Load reads options (-LR):
Wants quality file (wqf) : [sxa] yes

Filecheck only (fo) : no
Assembly options (-AS):
Number of passes (nop) : 1
Skim each pass (sep) : yes
Maximum number of RMB break loops (rbl) : 1
Maximum contigs per pass (mcpp) : 0

Minimum read length (mrl) : [sxa] 20
Minimum reads per contig (mrpc) : [sxa] 10
Enforce presence of qualities (epoq) : [sxa] yes

Automatic repeat detection (ard) : yes
Coverage threshold (ardct) : [sxa] 2
Minimum length (ardml) : [sxa] 200
Grace length (ardgl) : [sxa] 20
Use uniform read distribution (urd) : no
Start in pass (urdsip) : 3
Cutoff multiplier (urdcm) : [sxa] 1.5

Spoiler detection (sd) : yes
Last pass only (sdlpo) : yes

Use genomic pathfinder (ugpf) : yes

Use emergency search stop (uess) : yes
ESS partner depth (esspd) : 500
Use emergency blacklist (uebl) : yes
Use max. contig build time (umcbt) : no
Build time in seconds (bts) : 10000
Strain and backbone options (-SB):
Bootstrap new backbone (bnb) : yes
Start backbone usage in pass (sbuip) : 0
Backbone rail from strain (brfs) :
Backbone rail length (brl) : 0
Backbone rail overlap (bro) : 0
Trim overhanging reads (tor) : yes

(Also build new contigs (abnc)) : no
Dataprocessing options (-DP):
Use read extensions (ure) : [sxa] no
Read extension window length (rewl) : [sxa] 30
Read extension w. maxerrors (rewme) : [sxa] 2
First extension in pass (feip) : [sxa] 0
Last extension in pass (leip) : [sxa] 0

Clipping options (-CL):
SSAHA2 or SMALT clipping:
Gap size (msvsgs) : [sxa] 1
Max front gap (msvsmfg) : [sxa] 2
Max end gap (msvsmeg) : [sxa] 2
Strict front clip (msvssfc) : [sxa] 0
Strict end clip (msvssec) : [sxa] 0
Possible vector leftover clip (pvlc) : [sxa] no
maximum len allowed (pvcmla) : [sxa] 18
Min qual. threshold for entire read (mqtfer): [sxa] 0
Number of bases (mqtfernob) : [sxa] 15
Quality clip (qc) : [sxa] no
Minimum quality (qcmq) : [sxa] 20
Window length (qcwl) : [sxa] 30
Bad stretch quality clip (bsqc) : [sxa] no
Minimum quality (bsqcmq) : [sxa] 5
Window length (bsqcwl) : [sxa] 20
Masked bases clip (mbc) : [sxa] no
Gap size (mbcgs) : [sxa] 5
Max front gap (mbcmfg) : [sxa] 12
Max end gap (mbcmeg) : [sxa] 12
Lower case clip front (lccf) : [sxa] no
Lower case clip back (lccb) : [sxa] no
Clip poly A/T at ends (cpat) : [sxa] no
Keep poly-a signal (cpkps) : [sxa] no
Minimum signal length (cpmsl) : [sxa] 12
Max errors allowed (cpmea) : [sxa] 1
Max gap from ends (cpmgfe) : [sxa] 9
Clip 3 prime polybase (c3pp) : [sxa] yes
Minimum signal length (c3ppmsl) : [sxa] 15
Max errors allowed (c3ppmea) : [sxa] 3
Max gap from ends (c3ppmgfe) : [sxa] 9
Clip known adaptors right (ckar) : [sxa] yes
Ensure minimum left clip (emlc) : [sxa] no
Minimum left clip req. (mlcr) : [sxa] 0
Set minimum left clip to (smlc) : [sxa] 0
Ensure minimum right clip (emrc) : [sxa] no
Minimum right clip req. (mrcr) : [sxa] 10
Set minimum right clip to (smrc) : [sxa] 20

Apply SKIM chimera detection clip (ascdc) : no
Apply SKIM junk detection clip (asjdc) : no

Propose end clips (pec) : [sxa] yes
Bases per hash (pecbph) : 31
Handle Solexa GGCxG problem (pechsgp) : yes
Front freq (pffreq) : [sxa] 0
Back freq (pbfreq) : [sxa] 0
Minimum kmer for forward-rev (pmkfr) : 1
Front forward-rev (pffore) : [sxa] yes
Back forward-rev (pbfore) : [sxa] yes
Front conf. multi-seq type (pfcmst) : [sxa] yes
Back conf. multi-seq type (pbcmst) : [sxa] yes
Front seen at low pos (pfsalp) : [sxa] no
Back seen at low pos (pbsalp) : [sxa] no

Clip bad solexa ends (cbse) : [sxa] yes
Search PhiX174 (spx174) : [sxa] yes
Filter PhiX174 (fpx174) : [sxa] no

Rare kmer mask (rkm) : [sxa] 0
Parameters for SKIM algorithm (-SK):
Number of threads (not) : 2

Also compute reverse complements (acrc) : yes
Bases per hash (bph) : 10
Automatic increase per pass (bphaipp) : 1
Automatic incr. cov. threshold (bphaict): 20
Hash save stepping (hss) : 1
Percent required (pr) : [sxa] 60

Max hits per read (mhpr) : 2000
Max megahub ratio (mmhr) : 0

SW check on backbones (swcob) : yes

Max hashes in memory (mhim) : 15000000
MemCap: hit reduction (mchr) : 4096
Parameters for Hash Statistics (-HS):
Freq. cov. estim. min (fcem) : 0
Freq. estim. min normal (fenn) : 0.4
Freq. estim. max normal (fexn) : 1.6
Freq. estim. repeat (fer) : 1.9
Freq. estim. heavy repeat (fehr) : 8
Freq. estim. crazy (fecr) : 20
Mask nasty repeats (mnr) : no
Nasty repeat ratio (nrr) : 100
Nasty repeat coverage (nrc) : 0
Lossless digital normalisation (ldn) : no

Repeat level in info file (rliif) : 6

Million hashes per buffer (mhpb) : 16
Rare kmer early kill (rkek) : no
Pathfinder options (-PF):
Use quick rule (uqr) : [sxa] yes
Quick rule min len 1 (qrml1) : [sxa] -90
Quick rule min sim 1 (qrms1) : [sxa] 100
Quick rule min len 2 (qrml2) : [sxa] -80
Quick rule min sim 2 (qrms2) : [sxa] 100
Backbone quick overlap min len (bqoml) : [sxa] 20
Max. start cache fill time (mscft) : 5

Align parameters for Smith-Waterman align (-AL):
Bandwidth in percent (bip) : [sxa] 20
Bandwidth max (bmax) : [sxa] 80
Bandwidth min (bmin) : [sxa] 20
Minimum score (ms) : [sxa] 15
Minimum overlap (mo) : [sxa] 20
Minimum relative score in % (mrs) : [sxa] 60
Solexa_hack_max_errors (shme) : [sxa] -1
Extra gap penalty (egp) : [sxa] no
extra gap penalty level (egpl) : [sxa] reject_codongaps
Max. egp in percent (megpp) : [sxa] 100

Contig parameters (-CO):
Name prefix (np) : initial_mapping_testpool-to_ref
Reject on drop in relative alignment score in % (rodirs) : [sxa] 30
Mark repeats (mr) : yes
Only in result (mroir) : no
Assume SNP instead of repeats (asir) : no
Minimum reads per group needed for tagging (mrpg) : [sxa] 3
Minimum neighbour quality needed for tagging (mnq) : [sxa] 20
Minimum Group Quality needed for RMB Tagging (mgqrt) : [sxa] 30
End-read Marking Exclusion Area in bases (emea) : [sxa] 1
Set to 1 on clipping PEC (emeas1clpec) : yes
Also mark gap bases (amgb) : [sxa] yes
Also mark gap bases - even multicolumn (amgbemc) : [sxa] yes
Also mark gap bases - need both strands (amgbnbs): [sxa] yes
Force non-IUPAC consensus per sequencing type (fnicpst) : [sxa] no
Merge short reads (msr) : [sxa] no
Max errors (msrme) : [sxa] 0
Keep ends unmerged (msrkeu) : [sxa] -1
Gap override ratio (gor) : [sxa] 66

Edit options (-ED):
Mira automatic contig editing (mace) : yes
Edit kmer singlets (eks) : yes
Edit homopolymer overcalls (ehpo) : [sxa] no

Misc (-MI):
Large contig size (lcs) : 500
Large contig size for stats (lcs4s) : 5000

I know what I do (ikwid) : no

Extra flag 1 / sanity track check (ef1) : no
Extra flag 2 / dnredreadsatpeaks (ef2) : yes
Extra flag 3 / pelibdisassemble (ef3) : yes
Extended log (el) : no
Nag and Warn (-NW):
Check NFS (cnfs) : stop
Check multi pass mapping (cmpm) : stop
Check template problems (ctp) : stop
Check duplicate read names (cdrn) : stop
Check max read name length (cmrnl) : stop
Max read name length (mrnl) : 0
Check average coverage (cac) : stop
Average coverage value (acv) : 160

Directories (-DI):
Top directory for writing files : initial_mapping_testpool-to_ref_assembly
For writing result files : initial_mapping_testpool-to_ref_assembly/initial_mapping_testpool-to_ref_d_results
For writing result info files : initial_mapping_testpool-to_ref_assembly/initial_mapping_testpool-to_ref_d_info
For writing tmp files : initial_mapping_testpool-to_ref_assembly/initial_mapping_testpool-to_ref_d_tmp
Tmp redirected to (trt) :
For writing checkpoint files : initial_mapping_testpool-to_ref_assembly/initial_mapping_testpool-to_ref_d_chkpt

Output files (-OUTPUT/-OUT):
Save simple singlets in project (sssip) : [sxa] no
Save tagged singlets in project (stsip) : [sxa] yes

Remove rollover tmps (rrot) : yes
Remove tmp directory (rtd) : no

Result files:
Saved as CAF (orc) : yes
Saved as MAF (orm) : yes
Saved as FASTA (orf) : yes
Saved as GAP4 (directed assembly) (org) : no
Saved as phrap ACE (ora) : no
Saved as GFF3 (org3) : no
Saved as HTML (orh) : no
Saved as Transposed Contig Summary (ors) : yes
Saved as simple text format (ort) : no
Saved as wiggle (orw) : yes

Temporary result files:
Saved as CAF (otc) : yes
Saved as MAF (otm) : no
Saved as FASTA (otf) : no
Saved as GAP4 (directed assembly) (otg) : no
Saved as phrap ACE (ota) : no
Saved as HTML (oth) : no
Saved as Transposed Contig Summary (ots) : no
Saved as simple text format (ott) : no

Extended temporary result files:
Saved as CAF (oetc) : no
Saved as FASTA (oetf) : no
Saved as GAP4 (directed assembly) (oetg) : no
Saved as phrap ACE (oeta) : no
Saved as HTML (oeth) : no
Save also singlets (oetas) : no

Alignment output customisation:
TEXT characters per line (tcpl) : 60
HTML characters per line (hcpl) : 60
TEXT end gap fill character (tegfc) :
HTML end gap fill character (hegfc) :

File / directory output names:
CAF : initial_mapping_testpool-to_ref_out.caf
MAF : initial_mapping_testpool-to_ref_out.maf
FASTA : initial_mapping_testpool-to_ref_out.unpadded.fasta
FASTA quality : initial_mapping_testpool-to_ref_out.unpadded.fasta.qual
FASTA (padded) : initial_mapping_testpool-to_ref_out.padded.fasta
FASTA qual.(pad): initial_mapping_testpool-to_ref_out.padded.fasta.qual
GAP4 (directory): initial_mapping_testpool-to_ref_out.gap4da
ACE : initial_mapping_testpool-to_ref_out.ace
HTML : initial_mapping_testpool-to_ref_out.html
Simple text : initial_mapping_testpool-to_ref_out.txt
TCS overview : initial_mapping_testpool-to_ref_out.tcs
Wiggle : initial_mapping_testpool-to_ref_out.wig
Deleting old directory initial_mapping_testpool-to_ref_assembly ... done.
Creating directory initial_mapping_testpool-to_ref_assembly ... done.
Creating directory initial_mapping_testpool-to_ref_assembly/initial_mapping_testpool-to_ref_d_results ... done.
Creating directory initial_mapping_testpool-to_ref_assembly/initial_mapping_testpool-to_ref_d_info ... done.
Creating directory initial_mapping_testpool-to_ref_assembly/initial_mapping_testpool-to_ref_d_chkpt ... done.
Creating directory initial_mapping_testpool-to_ref_assembly/initial_mapping_testpool-to_ref_d_tmp ... done.

Tmp directory is not on a NFS mount, good.

Localtime: Tue May 18 15:11:49 2021

Loading reference backbone from P08H-3-Mito_circ_reoriented.fa type fa
Localtime: Tue May 18 15:11:49 2021
Loading data from FASTA file:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Tue May 18 15:11:49 2021
rnm size: 0
No FASTA quality file given, using default qualities for all reads just loaded.
Localtime: Tue May 18 15:11:49 2021

Done.
Loaded 1 reads with 0 reads having quality accounted for.
Loading reads from subset_reads1.fastq type fastq
Localtime: Tue May 18 15:11:49 2021
Loading data from FASTQ file: subset_reads1.fastq
(sorry, no progress indicator for that, possible only with zlib >=1.34)

Done.
Loaded 5 reads, Localtime: Tue May 18 15:11:49 2021
Looking at FASTQ type ... guessing FASTQ-33 (Sanger)
Running quality values adaptation ... done.
Loading reads from subset_reads2.fastq type fastq
Localtime: Tue May 18 15:11:49 2021
Loading data from FASTQ file: subset_reads2.fastq
(sorry, no progress indicator for that, possible only with zlib >=1.34)

Done.
Loaded 5 reads, Localtime: Tue May 18 15:11:49 2021
Looking at FASTQ type ... guessing FASTQ-33 (Sanger)
Running quality values adaptation ... done.
Deleting gap columns in backbones ... Postprocessing backbone(s) ... this may take a while.
1 to process
P08H-3-Mito_bb 16386
Contig P08H-3-Mito_bb has strain AlvCau
TODO: Like Readpool: strain x has y reads
Checking reads for trace data (loading qualities if needed):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
No SCF data present in any read, EdIt automatic contig editing for Sanger data is now switched off.
11 reads with valid data for assembly.
Localtime: Tue May 18 15:11:49 2021

Generated 11 unique DNA template ids for 11 valid reads.
No useful template information found.
TODO: Like Readpool: strain x has y reads
Have read pool with 11 reads.

===========================================================================
Pool statistics:
Backbones: 1 Backbone rails: 0
                Sanger  454     IonTor  PcBioHQ PcBioLQ Text    Solexa  SOLiD
                ------------------------------------------------------------
Total reads     0       0       0       0       0       0       10      0
Reads wo qual   0       0       0       0       0       0       0       0
Used reads      0       0       0       0       0       0       10      0
Avg tot rlen    0       0       0       0       0       0       86      0
Avg rlen used   0       0       0       0       0       0       86      0
W/o clips       0       0       0       0       0       0       10      0

Solexa total bases: 864 used bases in used reads: 864

Checking pairs of readgroup 1 (named: ''): found 0
Checking pairs of readgroup 2 (named: 'reads'): found 0
WARNING: in the above readgroup, no read is paired although the manifest says there should be pairs. This is fishy!

In func: void Assembly::basicReadGroupChecks()
Throw message: MIRA found readgroups where pairs are expected but no read has a partner. See log above and then check your input please (either manifest file or data files loaded or segment_naming scheme).

========================== Memory self assessment ==============================
Running in 64 bit mode.

Could not read file /proc/meminfo

Could not read file /proc/self/status

Information on current assembly object:

AS_readpool: 11 reads.
AS_contigs: 0 contigs.
AS_bbcontigs: 1 contigs.
Mem used for reads: 192 (192 B)

Memory used in assembly structures:
Eff. Size Free cap. LostByAlign
AS_writtenskimhitsperid: 0 24 B 0 B 0 B
AS_skim_edges: 0 24 B 0 B 0 B
AS_adsfacts: 0 24 B 0 B 0 B
AS_confirmed_edges: 0 24 B 0 B 0 B
AS_permanent_overlap_bans: 1 24 B 0 B 0 B
AS_readhitmiss: 0 24 B 0 B 0 B
AS_readhmcovered: 0 24 B 0 B 0 B
AS_count_rhm: 0 24 B 0 B 0 B
AS_clipleft: 0 24 B 0 B 0 B
AS_clipright: 0 24 B 0 B 0 B
AS_used_ids: 0 24 B 0 B 0 B
AS_multicopies: 0 24 B 0 B 0 B
AS_hasmcoverlaps: 0 24 B 0 B 0 B
AS_maxcoveragereached: 0 24 B 0 B 0 B
AS_coverageperseqtype: 0 24 B 0 B 0 B
AS_istroublemaker: 0 24 B 0 B 0 B
AS_isdebris: 0 24 B 0 B 0 B
AS_needalloverlaps: 0 24 B 0 B 0 B
AS_readsforrepeatresolve: 0 40 B 0 B 0 B
AS_allrmbsok: 0 24 B 0 B 0 B
AS_probablermbsnotok: 0 24 B 0 B 0 B
AS_weakrmbsnotok: 0 24 B 0 B 0 B
AS_readmaytakeskim: 0 40 B 0 B 0 B
AS_skimstaken: 0 40 B 0 B 0 B
AS_numskimoverlaps: 0 24 B 0 B 0 B
AS_numleftextendskims: 0 24 B 0 B 0 B
AS_rightextendskims: 0 24 B 0 B 0 B
AS_skimleftextendratio: 0 24 B 0 B 0 B
AS_skimrightextendratio: 0 24 B 0 B 0 B
AS_usedtmpfiles: 1 48 B 0 B 0 B
Total: 984 (984 B)

================================================================================
Dynamic s allocs: 0
Dynamic m allocs: 0
Align allocs: 0

Fatal error (may be due to problems of the input data or parameters):

MIRA found readgroups where pairs are expected but no read has a partner. *
See log above and then check your input please (either manifest file or data *
files loaded or segment_naming scheme). *
->Thrown: void Assembly::basicReadGroupChecks()
->Caught: main

Aborting process, probably due to error in the input data or parametrisation.
Please check the output log for more information.
For help, please write a mail to the mira talk mailing list.
Subscribing / unsubscribing to mira talk, see: http://www.freelists.org/list/mira_talk
CWD: /Users/maeperez/Desktop/Bioinf_softwares/MITObim/AlvCau
Thank you for noticing that this is NOT a crash, but a
controlled program stop.
Failure, wrapped MIRA process aborted.

Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/mira-assembler/tickets/44/
To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/
