From: David N. <dav...@gm...> - 2011-01-18 16:30:02
Hello Noboru,

Ah, that explains it. I think I'm seeing what you're seeing: 878 peaks using your settings for the EmpFDR and log2Ratio with the full dataset, and 120990 with the subsampled dataset. I should say, though, that 5 is a very relaxed threshold; at minimum I would use 13 (5%) or 20 (1%). I would also always use the qValFDR as well. When the qValFDR and EmpFDR differ, something is off. So a better setting would be -i 1,2,4 -s 13,13,1. Unfortunately, no regions pass these thresholds with either the full or the subsampled data. When this is the case, I would suggest using the -n option to generate the top 100 peaks and then carefully examining them in IGB, checking those that look real by qPCR.

The reason the EmpFDR is wonky with the subsampled data is that this test is based in large part on the input data: the input is split in half and a comparison is made between input1 vs input2. When these halves get small, odd behavior is observed.

I'd definitely recommend more reads. There does appear to be some real signal in the data, but the low read count (3.4M for the chIP, 12.9M for the input) is limiting the sensitivity of the apps. We recommend at minimum 10M duplicate-free unique alignments for the chIP and 20M for the input. Most folks are pushing this to 20M for the chIP and 20-40M for the input.

-cheers, D

On 1/10/11 2:54 PM, "Noboru Jo Sakabe" <ns...@uc...> wrote:

> Hi David, thanks for checking my data.
> It seems the difference is because I filtered by EmpFDR, not Qvalue.
> I reran Useq now and when I filter with
>
> -i 2,4 -s 5,1
>
> I get tens of thousands of peaks, but with
>
> -i 1,4 -s 20,1
>
> I get ten peaks.
>
> Can you confirm this by filtering windows by EmpFDR?
> The peaks make some biological sense overall (GO, conservation). So
> it's not like the peaks found are noise. The IP is not good, but there
> is some signal there. So I tend to trust the peaks that Useq is giving
> me. But, please, I would appreciate it if you could comment on this!
> If you don't mind sharing, in your experience, how unusual is
> something like this? I mean, a sample that has some signal, but because
> it's weak, one needs to give it a good shake to get something? Are the
> samples you analyze consistently better than this? I'm not the wet lab
> person, so I won't feel offended by your criticism ;-)
> Thanks again.
>
> noboru
>
>
> David Nix wrote:
>> Hello Noboru,
>>
>> I'm not seeing the increase in the number of regions when you subsample
>> the input control to match the chIP sample.
>>
>> I see 283 regions with the full control and 18 with the matched control
>> when thresholding using a qvalue of 20 (0.01) and a log2Ratio of 1 (2x).
>>
>> Here's what I did:
>>
>> 1) Run Tag2Point to convert your bed datasets to binary PointData
>> 2) Run the PointDataManipulator to filter out duplicate reads. Both
>>    datasets look good with 94% unique
>> 3) Run ScanSeqs to window scan your data
>> 4) Run EnrichedRegionMaker to collapse overlapping windows that exceed
>>    the above thresholds into a list of putative peaks
>>
>> For the reduced control dataset, I used the SubSamplePointData to
>> randomly toss duplicate-filtered input PointData down to 3398890 and
>> then ran ScanSeqs and the EnrichedRegionMaker.
>>
>> I wonder where the discrepancy occurred? I've attached the two
>> spreadsheet results from the EnrichedRegionMaker.
>>
>> -cheers, D
>
> _______________________________________________
> Useq-users mailing list
> Use...@li...
> https://lists.sourceforge.net/lists/listinfo/useq-users
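[Editor's note on the thresholds discussed above: the -s scores appear to be -10*log10-transformed FDRs, which is consistent with the quoted mapping of 13 to ~5% and 20 to 1%. A minimal Python sketch of that conversion; the transform is inferred from the quoted values, and the function names are illustrative, not part of USeq.]

```python
import math

def fdr_to_score(fdr):
    """Convert an FDR fraction (e.g. 0.05) to a -10*log10 transformed score."""
    return -10 * math.log10(fdr)

def score_to_fdr(score):
    """Invert the transform: turn a score back into an FDR fraction."""
    return 10 ** (-score / 10)

# The thresholds from the thread:
print(fdr_to_score(0.05))  # ~13.0 -> the "13 (5%)" threshold
print(fdr_to_score(0.01))  # 20.0  -> the "20 (1%)" threshold
print(score_to_fdr(5))     # ~0.32 -> why a score of 5 is a very relaxed cutoff
```

On this scale the relaxed threshold of 5 corresponds to roughly a 32% FDR, which makes clear why tens of thousands of windows pass it while almost none pass 20.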