From: David N. <Dav...@hc...> - 2013-11-20 18:54:25
Good point, Sue. See the novoalign docs too. The -u setting varies by species.

Note, it is worthwhile piloting which settings are appropriate, e.g. -b2 or -b4. The additional compute time is considerable with -b4 and isn't needed for standard Illumina libraries run on the HiSeq or MiSeq. It's these amplicon/PCR-derived libraries that need special processing.

-cheers, D

On Nov 20, 2013, at 11:33 AM, Sue Hammoud <Sue...@hc...> wrote:

Hi Everyone,

Also, when aligning MiSeq data, make sure you use -u12 in addition to -b4.

Best,
Sue

From: David Nix <Dav...@hc...>
Date: Wednesday, November 20, 2013 11:30 AM
To: Sue Hammoud <sue...@hc...>, Magdalena Potok <Mag...@hc...>, Brad Cairns <Bra...@hc...>, Bushra Gorsi <bg...@ge...>, Somaye Dehghanizadeh <Som...@hc...>
Cc: "bio...@ut..." <bio...@ut...>, USeq <use...@li...>
Subject: Important USeq_8.7.0 update for bisulfite data analysis

Hello Folks,

I've posted a new USeq release that contains a critical patch for bisulfite sequencing analysis derived from amplicon-based library preps. These need to be aligned with the -b 4 setting to maximize the number of aligned reads and then processed through the new NovoalignBisulfiteParser 8.7.0 (see http://useq.sourceforge.net/usageBisSeq.html). The old NBP won't correctly parse these new alignment types.

I have also corrected an issue with the merging of paired overlapping alignments. Some of these were being skipped, and double counting of overlapping paired reads occurred. Note the latter fix changes the counts but not the percents of methylation. See below for an illustration on an Hg19 sperm dataset.

So moving forward, pay very close attention to amplicon -b4 aligned and parsed datasets.
Visually inspect the base fraction methylation and the converted and non-converted point datasets alongside the duplicate-filtered BAM alignment tracks in IGB. Note any discrepancies. This is a novel, complex datatype that warrants close scrutiny.

-cheers, David

#### New NovoalignBisulfiteParser Stats

Filtering statistics for 174477814 alignments:
5621028 Failed mapping quality score (13.0)
2192225 Failed alignment score (300.0)
1331036 Aligned to phiX or adapters
0 Failed vendor QC
55078179 Are unmapped
110255346 Passed filters (63.2%)

88083944 Total non-converted Cs sequenced
1504140872 Total converted Cs sequenced
0.055 Fraction non converted C's.
0.998 Fraction bp passing quality (13)
2164389480 BPs overlapping paired sequence
10644938422 BPs paired sequence
0.203 Fraction overlapping bps from paired reads.

#### Old NovoalignBisulfiteParser Stats

Filtering statistics for 174477814 alignments:
5621028 Failed mapping quality score (13.0)
2192225 Failed alignment score (300.0)
1331036 Aligned to phiX or adapters
0 Failed vendor QC
55078179 Are unmapped
110255346 Passed filters (63.2%)

94996528 Total non-converted Cs sequenced
1640576210 Total converted Cs sequenced
0.055 Fraction non converted C's.
0.998 Fraction bp passing quality (13)
2164389480 BPs overlapping paired sequence
10644938422 BPs paired sequence
0.203 Fraction overlapping bps from paired reads.

#### New NBP run through BisStat

Using Lambda data to set the expected fraction non-converted Cs to 0.00123 (16257/(16257+13221619))

Stats based on aligned genomic contexts that meet a minimum FDR threshold of 20.0.
WARNING: datasets must be subsampled to the same bp aligned for these stats to be cross dataset comparable.
0.978 (12631645/12919931) mCG/mC
0.007 (89908/12919931) mCHG/mC
0.015 (198378/12919931) mCHH/mC

Stats based on cumulative sums of all read sequences, no thresholds:
0.968 (85257586/88067661) mCG/mC
0.010 (875639/88067661) mCHG/mC
0.022 (1934436/88067661) mCHH/mC
0.059 (88067661/1490918281) mC/C
3.809 (85257586/22383883) mCG/CG
0.002 (875639/459687208) mCHG/CHG
0.002 (1934436/1008847190) mCHH/CHH
0.056 (88067661/1578985942) mC/(C+mC)
0.792 (85257586/107641469) mCG/(CG+mCG)
0.002 (875639/460562847) mCHG/(CHG+mCHG)
0.002 (1934436/1010781626) mCHH/(CHH+mCHH)

#### Old NBP run through BisStat

Using Lambda data to set the expected fraction non-converted Cs to 0.00120 (15986/(15986+13272652))

Stats based on aligned genomic contexts that meet a minimum FDR threshold of 20.0.
WARNING: datasets must be subsampled to the same bp aligned for these stats to be cross dataset comparable.
0.966 (14109977/14599772) mCG/mC
0.010 (147396/14599772) mCHG/mC
0.023 (342399/14599772) mCHH/mC

Stats based on cumulative sums of all read sequences, no thresholds:
0.969 (92009703/94980515) mCG/mC
0.010 (935216/94980515) mCHG/mC
0.021 (2035596/94980515) mCHH/mC
0.058 (94980515/1627302558) mC/C
3.842 (92009703/23946641) mCG/CG
0.002 (935216/497817018) mCHG/CHG
0.002 (2035596/1105538899) mCHH/CHH
0.055 (94980515/1722283073) mC/(C+mC)
0.793 (92009703/115956344) mCG/(CG+mCG)
0.002 (935216/498752234) mCHG/(CHG+mCHG)
0.002 (2035596/1107574495) mCHH/(CHH+mCHH)
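
As a rough check on the point that the overlap fix changes the counts but not the methylation percentages, the summary fractions can be recomputed by hand from the numbers reported above. The Python sketch below is illustrative only: the counts are copied from the parser and BisStat output in this message, and the function and variable names are made up for this example and are not part of USeq.

```python
# Counts copied from the NovoalignBisulfiteParser summaries in this message.
# The dictionary layout and names are hypothetical, for illustration only.
PARSER_COUNTS = {
    "new": {"non_converted": 88083944, "converted": 1504140872},
    "old": {"non_converted": 94996528, "converted": 1640576210},
}

# Lambda control counts copied from the two BisStat runs.
LAMBDA_COUNTS = {
    "new": {"non_converted": 16257, "converted": 13221619},
    "old": {"non_converted": 15986, "converted": 13272652},
}


def fraction_non_converted(non_converted, converted):
    """Non-converted Cs as a fraction of all Cs sequenced."""
    return non_converted / (non_converted + converted)


def relative_change(new_value, old_value):
    """Signed change of new_value relative to old_value."""
    return (new_value - old_value) / old_value


genome_fraction = {
    name: fraction_non_converted(**counts) for name, counts in PARSER_COUNTS.items()
}
lambda_fraction = {
    name: fraction_non_converted(**counts) for name, counts in LAMBDA_COUNTS.items()
}
count_change = {
    key: relative_change(PARSER_COUNTS["new"][key], PARSER_COUNTS["old"][key])
    for key in ("non_converted", "converted")
}

for name in ("new", "old"):
    print(f"{name}: genome {genome_fraction[name]:.3f}, lambda {lambda_fraction[name]:.5f}")
for key, change in count_change.items():
    print(f"{key} count change (new vs old): {change:+.1%}")
```

With these inputs, both parser versions round to the reported 0.055 fraction of non-converted Cs, while the raw C counts drop by roughly 7 to 8 percent under the new parser once overlapping paired bases are no longer double counted.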