Does metassembler only work with mate-pair libraries, or will it also work with Illumina shotgun paired-end reads?
Last edit: Gregory Rice 2016-10-19
Hi Gregory,
The metassembler relies on the mate pair library to decide which assembly
is correct when there is a conflict between them: given a conflict, it
picks the version that makes the mate pair orientation and separation
match the expected library characteristics. That said, you can run many of
the same tests with your paired end library; it just has less power to
jump over repeats than a larger-insert library. So you can use just the
paired end reads, but make sure to tell the metassembler that the reads
are oriented as "innies" rather than the "outies" you would get from a
standard mate pair library. (You could also flip the reads as necessary.)
I'd also recommend trimming the reads to ~50bp to improve the odds that
they align near the ends of the contigs (which is where the conflicts
almost always occur). Finally, I recommend you start with a partial
dataset to confirm everything is set correctly before doing a full run
(also see the metassembler supplemental materials for the exact settings
we used on the larger bird and snake genomes).
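As a rough illustration of the read preparation described above (trimming
to ~50bp, optionally flipping the reads, and starting from a partial
dataset), here is a minimal Python sketch. It is not part of the
metassembler; the function names (prepare, trim, flip, and so on) are
hypothetical, and it assumes plain uncompressed 4-line FASTQ records.

```python
from itertools import islice

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records.

    An incomplete trailing record (fewer than 4 lines) is silently dropped.
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return
        header, seq, _plus, qual = (x.rstrip("\n") for x in record)
        yield header, seq, qual


def trim(record, length=50):
    """Keep only the first `length` bases (the 5' end) of a read."""
    header, seq, qual = record
    return header, seq[:length], qual[:length]


def flip(record):
    """Reverse-complement a read and reverse its quality string to match."""
    header, seq, qual = record
    return header, reverse_complement(seq), qual[::-1]


def prepare(lines, length=50, flip_reads=False, max_reads=None):
    """Trim (and optionally flip) the first `max_reads` reads.

    Trimming happens before flipping, so the retained bases are always
    the first `length` bases of the original read.
    max_reads=None means "process every read".
    """
    for record in islice(read_fastq(lines), max_reads):
        record = trim(record, length)
        if flip_reads:
            record = flip(record)
        yield record


def to_fastq(record):
    """Format a (header, sequence, quality) tuple as FASTQ text."""
    header, seq, qual = record
    return f"{header}\n{seq}\n+\n{qual}\n"
```

If you use something like this for a test run, apply the same max_reads
value to both files of the pair so the mates stay in sync. Whether you
flip the reads or instead declare the orientation to the metassembler is
an either/or choice, as noted above.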
Longer term, we have been thinking about ways to apply the metassembler
without mate pair libraries by looking at other features of the data or
other datatypes, but realistically that is some 6 months to a year away
from having any code available.
Cheers,
Mike