MIRA is a whole genome shotgun and EST sequence assembler for Sanger, 454, Solexa (Illumina), IonTorrent and PacBio data (the latter at the moment only CCS and error-corrected CLR reads). It can be seen as a Swiss army knife of sequence assembly, developed and used over the past 12 years to get assembly jobs done efficiently and, above all, accurately. That is, without putting too much manual work into finishing the assembly.
- Visit the project page on SourceForge
- Download the latest release
- Read about some features of MIRA 3
- See some interesting features of MIRA 3 explained in action
- Found a bug or have a feature request? File it here and also send a quick mail to the MIRA talk mailing list
- Need help? Want to discuss something with other users? There are mailing lists for MIRA
The most up-to-date documentation is also always available in the binary distribution packages.
- Whole documentation, stable version: HTML, one file (950 KiB + 3 MiB images)
- Whole documentation, stable version: PDF, one file (4 MiB)
- Whole documentation, development version: HTML, one file (950 KiB + 3 MiB images)
- Scaffolding with BAMBUS: instructions for scaffolding MIRA contigs and paired-end data with BAMBUS. Written by Gregory Harhay, USDA.
MIRA started in 1997 as a PhD project at the German Cancer Research Centre in Heidelberg (Deutsches Krebsforschungszentrum Heidelberg). Binaries were always distributed publicly, and over time other labs and sequencing providers have found MIRA useful for the assembly of extremely 'unfriendly' projects containing lots of repetitive sequences (as always, your mileage may vary).
In 2007 I (Bastien Chevreux) asked the DKFZ for the permission to put MIRA under an open source license ... and got it.
Hybrid de-novo assemblies with Sanger, 454, Illumina / Solexa, IonTorrent and PacBio
MIRA 3 is able to perform true hybrid de-novo assemblies using reads gathered through Sanger, 454, Solexa, IonTorrent or PacBio sequencing technologies. That is, it assembles reads instead of a mix of (possibly shredded) consensus sequences and reads. The documentation introduction shows an example of what this looks like for Sanger and 454, but it also works with any other combination of sequencing technologies. The only restrictions at the moment: reads must be <= 15 kilobases, and for PacBio, MIRA must get CCS reads or error-corrected CLR data.
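Because of the read length restriction above, it can be worth checking input files before starting a long assembly run. The following is a minimal, hypothetical pre-flight sketch (not part of MIRA) that parses FASTA records and lists reads longer than the limit; it assumes "15 kilobases" means 15,000 bases, and the names read_fasta, too_long and MAX_READ_LEN are made up for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: "<= 15 kilobases" interpreted as at most 15,000 bases.
MAX_READ_LEN = 15_000

def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header ends the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def too_long(records: Iterable[Tuple[str, str]], max_len: int = MAX_READ_LEN) -> List[str]:
    """Return the names of all reads whose sequence exceeds max_len."""
    return [name for name, seq in records if len(seq) > max_len]

# Example: r1 has 40 bases, r2 has 15,001 bases.
example = [">r1", "ACGT" * 10, ">r2", "A" * 15_001]
print(too_long(read_fasta(example)))  # ['r2']
```

To check a real file, pass an open file handle to read_fasta instead of the example list.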
An often-used combination of current sequencing technologies is a mix of de-novo 454 assembly and Solexa mapping assemblies: 454 to get long contigs built, Solexa to get rid of the pesky 454 homopolymer problems. Here's the recipe I use for sequencing a bacterium de-novo and almost perfectly for comparatively little money:
- get your DNA sequenced at ~20x coverage (30x is even better) with 454 Titanium
- get the very same DNA sequenced at ~30-40x coverage with Solexa (76 or more base pairs)
- put the sequences of 454 and Solexa (and Sanger, if you have any) into MIRA
- wait overnight for the result
- add half a day or so for prettifying the resulting assembly and checking remaining uncertainties (if you really want to) with gap4
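To turn the coverage targets in this recipe into an order of magnitude for the number of reads, the usual mean-coverage relation C = N * L / G (reads times read length divided by genome size) can be solved for N. The sketch below assumes a hypothetical 5 Mb bacterial genome and an average 454 Titanium read length of 400 bp; both numbers are illustrative assumptions, not values from the recipe. The 76 bp Solexa read length is taken from the recipe above.

```python
import math

def reads_needed(coverage: float, genome_size: int, read_length: int) -> int:
    """Solve C = N * L / G for N and round up to whole reads."""
    return math.ceil(coverage * genome_size / read_length)

GENOME_SIZE = 5_000_000  # assumption: hypothetical 5 Mb bacterium

# 454 Titanium at ~20x, assuming 400 bp average read length.
print(reads_needed(20, GENOME_SIZE, 400))  # 250000
# Solexa at ~35x (middle of the 30-40x range) with 76 bp reads.
print(reads_needed(35, GENOME_SIZE, 76))   # 2302632
```

This only estimates mean coverage; real coverage varies along the genome, which is one reason the recipe asks for some headroom.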
Granted, there may be a few more steps ... but that's basically it.
Automatic sequence editors
MIRA contains integrated editors for Sanger and 454 sequences which iteratively remove many sequencing errors from the assembly project and improve the overall alignment quality.
SNP and mutation discovery for mapping assemblies
MIRA 3 can also be used for mapping assemblies and automatic tagging of difference sites (SNPs, insertions or deletions) of mutant strains against a reference sequence.
For organisms without exon/intron gene structure (bacteria, viruses etc.) and where annotated files in GenBank format are available, MIRA can generate tables which are ready to use for biologists, as they show exactly which genes are hit and give a first estimate of whether the function of the protein is affected by the change.
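How MIRA computes its estimate is not described here. As a rough, hypothetical illustration of what such a first estimate can look like, the sketch below classifies a single-base substitution inside a coding sequence by whether it changes the encoded amino acid. It assumes the CDS is given on the coding strand, in frame from its first base, and uses the standard genetic code (its amino-acid assignments are the same as the bacterial code, NCBI table 11). The function classify_snp is made up for illustration and ignores insertions and deletions.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position pos of an in-frame CDS."""
    cds = cds.upper()
    start = pos - pos % 3
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position is not inside a complete codon")
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"   # protein sequence unchanged
    if alt_aa == "*":
        return "nonsense"     # premature stop codon
    if ref_aa == "*":
        return "stop-lost"    # stop codon turned into an amino acid
    return "missense"         # one amino acid replaced by another

cds = "ATGGCTTGGTAA"  # Met-Ala-Trp-Stop
print(classify_snp(cds, 5, "C"))  # synonymous (GCT -> GCC, Ala)
print(classify_snp(cds, 8, "A"))  # nonsense (TGG -> TGA)
```

A synonymous change is a hint, not proof, that protein function is unaffected; a missense call likewise says nothing about how severe the change is.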
- Chevreux et al. (1999). Genome Sequence Assembly Using Trace Signals and Additional Sequence Information. Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56. Slides for the talk held at the conference are also available.
- Chevreux et al. (2004). Using the miraEST Assembler for Reliable and Automated mRNA Transcript Assembly and SNP Detection in Sequenced ESTs. Genome Research 14:1147-1159.
- Chevreux et al. (1998). Computer assisted editing of genomic sequences. Poster at "Genome and Proteomics 98" in Heidelberg and the German Conference on Bioinformatics (GCB) 98 in Cologne (English).
- Chevreux et al. (1999). MIRA & EdIt. Poster presented at the ISMB 99 conference in Heidelberg and the German Conference on Bioinformatics (GCB) 99 in Hannover.
- 1997 to 2000: Bundesministerium für Bildung, Wissenschaft, Forschung und Technologie by grant number 01 KW 9611