Proper construction of 454 pseudo-reads

  • elmaccco

    Hi all,

    Since most assemblers (including Phrap and Gap4) aren't adapted to 454 reads, a well-known strategy for Sanger/454 hybrid assembly is to construct pseudo-reads from the Newbler 454 assembly and then assemble them together with the Sanger reads using Phrap.

    The way I've done this is to chop the Newbler contigs into 800 bp fragments, carrying over their confidence values, with 100 bp overlaps between consecutive fragments. My script (available upon request) takes a 454 CAF file as input (easily converted from ACE format with CAF-TOOLS) and outputs two multi-FASTA files, one with the pseudo-read sequences and the other with their qualities. These are then converted to EXP files using the script by Andrew (see thread ).
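    A minimal Python sketch of the fixed tiling described above, assuming each contig's consensus sequence and per-base qualities have already been parsed out of the CAF file (parsing is not shown). The function names and the `.prN` read-naming scheme are hypothetical, chosen only for illustration:

```python
from typing import Iterable, Iterator, List, Tuple

# A pseudo-read is (read_name, sequence, per-base qualities).
PseudoRead = Tuple[str, str, List[int]]


def tile_contig(name: str, seq: str, qual: List[int],
                frag_len: int = 800, overlap: int = 100) -> Iterator[PseudoRead]:
    """Chop one contig into overlapping fixed-length pseudo-reads.

    Assumes `seq` and `qual` were already parsed from the CAF file.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if not 0 <= overlap < frag_len:
        raise ValueError("overlap must satisfy 0 <= overlap < frag_len")
    if not seq:
        return
    step = frag_len - overlap  # distance between consecutive fragment starts
    start, n = 0, 0
    while True:
        end = min(start + frag_len, len(seq))
        n += 1
        # Hypothetical naming scheme: <contig>.pr<index>
        yield f"{name}.pr{n}", seq[start:end], qual[start:end]
        if end == len(seq):  # this fragment reached the contig end
            break
        start += step


def to_fasta_qual(reads: Iterable[PseudoRead]) -> Tuple[str, str]:
    """Render pseudo-reads as multi-FASTA sequence text and matching quality text."""
    fasta, quals = [], []
    for rname, s, q in reads:
        fasta.append(f">{rname}\n{s}\n")
        quals.append(f">{rname}\n{' '.join(map(str, q))}\n")
    return "".join(fasta), "".join(quals)


# Usage: a 2000 bp contig gives fragments of 800, 800 and 600 bp.
reads = list(tile_contig("contig00001", "ACGT" * 500, [30] * 2000))
fasta_text, qual_text = to_fasta_qual(reads)
```

    With this scheme the last fragment of a contig may be shorter than 800 bp; an alternative is to anchor the final fragment at the contig end so every pseudo-read has full length.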

    This is quite a straightforward approach, but I think it can be done more correctly, as described by Goldberg et al. (PNAS 2006), who took two additional factors into account:

    1. The deeper the 454 read coverage in the Newbler assembly, the larger the overlaps between pseudo-reads, so that the pseudo-reads emulate the original coverage depth.

    2. By default, pseudo-read bases are given lower quality scores than Sanger bases, reflecting the inherently lower quality of 454 reads (personally, I'm not sure whether this also applies to the newer FLX sequencers...)
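    One way the two factors could be combined, sketched in Python under clearly stated assumptions: `mean_depth` is the contig's average 454 read depth, computed elsewhere from the read placements in the CAF/ACE file; pseudo-read coverage is made to track that depth by shrinking the step between fragment starts (step ≈ fragment length / target coverage), which enlarges the overlaps; and qualities are simply capped at a hypothetical ceiling. `max_cov`, `min_overlap` and `qual_cap` are illustrative parameters, and this is not a reproduction of Goldberg et al.'s exact procedure:

```python
import math
from typing import Iterator, List, Tuple


def depth_aware_pseudo_reads(name: str, seq: str, qual: List[int],
                             mean_depth: float, frag_len: int = 800,
                             min_overlap: int = 100, max_cov: int = 10,
                             qual_cap: int = 20) -> Iterator[Tuple[str, str, List[int]]]:
    """Tile a contig so pseudo-read coverage roughly follows its 454 depth.

    Assumptions (illustrative, not from Goldberg et al.): mean_depth is computed
    elsewhere from the 454 read placements; max_cov bounds the stacking;
    qual_cap is a hypothetical ceiling that keeps pseudo-read qualities below
    typical Sanger qualities.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if not 0 <= min_overlap < frag_len:
        raise ValueError("min_overlap must satisfy 0 <= min_overlap < frag_len")
    if not seq:
        return
    # Target pseudo-read coverage: the 454 depth, clamped to [1, max_cov].
    cov = max(1, min(max_cov, round(mean_depth)))
    # With fixed-length fragments, a step of frag_len / cov puts about cov reads
    # over each interior base: deeper coverage -> smaller step -> bigger overlap.
    # The step never exceeds frag_len - min_overlap, so neighbours always overlap.
    step = min(math.ceil(frag_len / cov), frag_len - min_overlap)
    start, n = 0, 0
    while True:
        end = min(start + frag_len, len(seq))
        n += 1
        # Factor 2: cap every quality so pseudo-reads rank below Sanger bases.
        capped = [min(q, qual_cap) for q in qual[start:end]]
        yield f"{name}.pr{n}", seq[start:end], capped
        if end == len(seq):
            break
        start += step
```

    Capping is only one choice; subtracting a fixed offset from each quality, or using per-position depth instead of the contig mean, would follow the same pattern.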

    I'd be very grateful for any comments on this approach, and especially to hear of any existing scripts or methods for accomplishing it!

    Many thanks!