From: David N. <dav...@gm...> - 2011-04-19 19:05:24
|
Hello Noboru,

Good suggestion. I'll put in a warning. You're right, many of the apps will assume you just want a simple one-sample count analysis.

-cheers, D

On 4/11/11 2:31 PM, "Noboru Jo Sakabe" <ns...@uc...> wrote:
> Hi David, I noticed that when I give a non-existing directory containing input data (typed the wrong name), USeq does not complain, and rather, goes on without computing FDR.
> I would like to suggest that USeq halts with an error in this case, or at least issues a warning.
> Thanks!
>
> Noboru |
From: Noboru Jo S. <ns...@uc...> - 2011-04-11 20:32:51
|
Hi David, I noticed that when I give a non-existing directory containing input data (typed the wrong name), USeq does not complain, and rather, goes on without computing FDR. I would like to suggest that USeq halts with an error in this case, or at least issues a warning. Thanks! Noboru |
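The check requested here can be sketched as a small pre-flight step run before launching an analysis. This is a minimal Python sketch, not USeq code: the function name `require_input_directory` is hypothetical, and treating an empty directory as an error is an added assumption.

```python
from pathlib import Path


def require_input_directory(path_text: str) -> Path:
    """Return the directory as a Path, or raise if it is missing or empty.

    Hypothetical helper: fails loudly instead of silently continuing
    when the input directory name was mistyped.
    """
    path = Path(path_text)
    if not path.is_dir():
        raise FileNotFoundError(f"Input directory does not exist: {path}")
    # Assumption: a directory with no entries is as useless as a missing one.
    if not any(path.iterdir()):
        raise ValueError(f"Input directory is empty: {path}")
    return path
```

Calling this on the treatment and control directories before starting the run turns a mistyped name into an immediate error rather than a run that finishes without FDRs.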
From: Noboru Jo S. <ns...@uc...> - 2011-04-06 16:15:47
|
Thanks, David. Indeed, that computer didn't have DESeq installed. Silly mistake: I have DESeq on other computers, but not on the one with more memory.

Maybe you could add some kind of test and warn the user when this error occurs, so you prevent silly users like me from bothering you?

Thanks again.

Noboru

David Nix wrote:
> Looks like DESeq isn't generating a results file (this is the analysis engine for MRSS, SS uses something simpler).
>
> Take a look at the temp files in your save directory. One should contain the error that R/DESeq threw.
>
> -cheers, D |
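A test of the kind suggested here could ask R to load the package before any long computation starts. The following is a minimal Python sketch, not part of USeq. It assumes an `Rscript` executable is on the machine (the commands in this thread pass an R path with `-r`; using `Rscript` instead is an assumption), and the name `r_package_installed` and its injectable `run` parameter are hypothetical.

```python
import subprocess


def r_package_installed(package: str, rscript: str = "Rscript", run=subprocess.run) -> bool:
    """Return True if library(package) loads in the given R installation.

    `run` is injectable so the check can be exercised without R present.
    """
    expression = f'suppressPackageStartupMessages(library("{package}"))'
    try:
        completed = run([rscript, "-e", expression], capture_output=True, text=True)
    except FileNotFoundError:
        # The Rscript executable itself could not be found.
        return False
    # Rscript exits with a non-zero status when library() raises an error.
    return completed.returncode == 0
```

A wrapper script could call `r_package_installed("DESeq")` and print a clear warning before starting MultipleReplicaScanSeqs.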
From: David N. <dav...@gm...> - 2011-04-06 12:14:01
|
Looks like DESeq isn't generating a results file (this is the analysis engine for MRSS, SS uses something simpler).

Take a look at the temp files in your save directory. One should contain the error that R/DESeq threw.

-cheers, D

On 4/5/11 4:58 PM, "Noboru Jo Sakabe" <ns...@uc...> wrote:
> java -Xmx8G -jar /cchome/nsakabe/chip-seq/useq/USeq_7.6.9/Apps/MultipleReplicaScanSeqs -c /cchome/nsakabe/chip-seq/useq/input.merged/useq_pointData_satellites_rmsk_Filt0bp/ -s results -p 100 -w 200 -r /apps/rh5_64/R -t ../h3k27me3.left_ventricle_1/useq_pointData_satellites_rmsk_Filt0bp/
>
> Hi David, I am trying to run MRSS, but I'm getting an error.
> When I use the same command line, but instead of MRSS, calling ScanSeqs, it works fine. The problem seems to be with MRSS. Do you have any idea what might be going on?
>
> Thank you.
>
> java -Xmx8G -jar ~/useq/USeq_7.6.9/Apps/MultipleReplicaScanSeqs -c ../input/useq_pointData_satellites_rmsk_Filt0bp/ -s results -p 100 -w 200 -r /apps/rh5_64/R -t ../h3k27me3/useq_pointData_satellites_rmsk_Filt0bp/
>
> Arguments: -c ../input/useq_pointData_satellites_rmsk_Filt0bp/ -s results -p 100 -w 200 -r /apps/rh5_64/R -t ../h3k27me3/useq_pointData_satellites_rmsk_Filt0bp/
>
> Calculating read count stats...
> 19985712 Treatment Observations
> 36594184 Control Observations
> 100 Peak shift
> 200 Window size
> 0.5 Minimum Window FDR
> Scanning chromosomes......................
> Calculating negative binomial p-values and FDRs in R using DESeq (http://www-huber.embl.de/users/anders/DESeq/). This requires patience, 64bit R, and > 6-8G RAM...
> java.io.IOException:
> R results file(s) doesn't exist. Check temp files in save directory for error.
>
>     at edu.utah.seq.analysis.MultipleReplicaScanSeqs.executeDESeq(MultipleReplicaScanSeqs.java:353)
>     at edu.utah.seq.analysis.MultipleReplicaScanSeqs.run(MultipleReplicaScanSeqs.java:146)
>     at edu.utah.seq.analysis.MultipleReplicaScanSeqs.<init>(MultipleReplicaScanSeqs.java:80)
>     at edu.utah.seq.analysis.MultipleReplicaScanSeqs.main(MultipleReplicaScanSeqs.java:682) |
From: Noboru Jo S. <ns...@uc...> - 2011-04-05 23:00:19
|
java -Xmx8G -jar /cchome/nsakabe/chip-seq/useq/USeq_7.6.9/Apps/MultipleReplicaScanSeqs -c /cchome/nsakabe/chip-seq/useq/input.merged/useq_pointData_satellites_rmsk_Filt0bp/ -s results -p 100 -w 200 -r /apps/rh5_64/R -t ../h3k27me3.left_ventricle_1/useq_pointData_satellites_rmsk_Filt0bp/

Hi David, I am trying to run MRSS, but I'm getting an error.
When I use the same command line, but instead of MRSS, calling ScanSeqs, it works fine. The problem seems to be with MRSS. Do you have any idea what might be going on?

Thank you.

java -Xmx8G -jar ~/useq/USeq_7.6.9/Apps/MultipleReplicaScanSeqs -c ../input/useq_pointData_satellites_rmsk_Filt0bp/ -s results -p 100 -w 200 -r /apps/rh5_64/R -t ../h3k27me3/useq_pointData_satellites_rmsk_Filt0bp/

Arguments: -c ../input/useq_pointData_satellites_rmsk_Filt0bp/ -s results -p 100 -w 200 -r /apps/rh5_64/R -t ../h3k27me3/useq_pointData_satellites_rmsk_Filt0bp/

Calculating read count stats...
19985712 Treatment Observations
36594184 Control Observations
100 Peak shift
200 Window size
0.5 Minimum Window FDR
Scanning chromosomes......................
Calculating negative binomial p-values and FDRs in R using DESeq (http://www-huber.embl.de/users/anders/DESeq/). This requires patience, 64bit R, and > 6-8G RAM...
java.io.IOException:
R results file(s) doesn't exist. Check temp files in save directory for error.
    at edu.utah.seq.analysis.MultipleReplicaScanSeqs.executeDESeq(MultipleReplicaScanSeqs.java:353)
    at edu.utah.seq.analysis.MultipleReplicaScanSeqs.run(MultipleReplicaScanSeqs.java:146)
    at edu.utah.seq.analysis.MultipleReplicaScanSeqs.<init>(MultipleReplicaScanSeqs.java:80)
    at edu.utah.seq.analysis.MultipleReplicaScanSeqs.main(MultipleReplicaScanSeqs.java:682) |
From: David N. <dav...@gm...> - 2011-04-05 20:44:50
|
Hello Noboru,

The simple answer is that they use different methods with different filters. Calling strong peaks is easy; accurately calling weak peaks is difficult. Furthermore, FDR estimations are often quite variable, so comparing lists at a single FDR of, say, 1% will be misleading. The error is often +/- 2-10 fold!

-cheers, D

On 3/23/11 6:04 PM, "Noboru Jo Sakabe" <ns...@uc...> wrote:
> Hi David, would you comment on why running different peak callers on the same data results in different peaks?
> I noticed that for PolII, I didn't have this problem. But for TFs that, I assume, have weaker enrichment, this seems to be common. I just ran Useq on published data and I obtained low overlap, ~20%, at 1bp.
> But, both sets make biological sense (although GO for Useq peaks/genes gave me fewer terms for the tissue), peaks are generally conserved, overrepresentation of peaks on the TSS, expression levels on the tissue make sense, peaks cluster as a sharp peak around the TSS.
> I always assumed that when the IP is not highly enriched, such variation occurs. Maybe you can say something else based on your experience?
> I mean, is this something we have to live with, or is it something wrong I'm doing and I should see higher overlap?
> Thanks a lot!
>
> noboru |
From: Noboru Jo S. <ns...@uc...> - 2011-03-24 00:03:04
|
Hi David, would you comment on why running different peak callers on the same data results in different peaks? I noticed that for PolII, I didn't have this problem. But for TFs that, I assume, have weaker enrichment, this seems to be common. I just ran Useq on published data and I obtained low overlap, ~20%, at 1bp. But, both sets make biological sense (although GO for Useq peaks/genes gave me fewer terms for the tissue), peaks are generally conserved, overrepresentation of peaks on the TSS, expression levels on the tissue make sense, peaks cluster as a sharp peak around the TSS. I always assumed that when the IP is not highly enriched, such variation occurs. Maybe you can say something else based on your experience? I mean, is this something we have to live with, or is it something wrong I'm doing and I should see higher overlap? Thanks a lot! noboru |
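The "overlap at 1bp" figure mentioned here can be made concrete. The following is a minimal Python sketch of one way to compute it, not the method used by USeq or any other peak caller. It assumes peaks are `(chrom, start, end)` tuples with 0-based, half-open coordinates; the function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_overlapping(peaks_a, peaks_b):
    """Fraction of peaks in A sharing at least 1 bp with any peak in B."""
    if not peaks_a:
        return 0.0
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))
    merged = {chrom: merge_intervals(ivs) for chrom, ivs in by_chrom.items()}
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in merged.items()}

    hits = 0
    for chrom, start, end in peaks_a:
        if chrom not in merged:
            continue
        # Last merged interval in B that begins before this peak ends.
        i = bisect_right(starts[chrom], end - 1) - 1
        if i >= 0 and merged[chrom][i][1] > start:
            hits += 1
    return hits / len(peaks_a)
```

Note that the result is asymmetric: the fraction of A overlapping B generally differs from the fraction of B overlapping A, so it helps to report both when comparing two peak callers.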
From: David N. <dav...@gm...> - 2011-03-15 21:40:52
|
Your read sizes shouldn't change, so it's easy to get the start and stop for each read when you know the center position. The ReadCoverage app does just this and calculates a per-base read coverage. It throws a warning if it finds reads of different lengths.

-cheers, D

On 3/15/11 1:14 PM, "Andrew Oler" <And...@hc...> wrote:
> Hi David,
>
> I have another question. For the NovoalignParser (and other parsers, such as Tag2Point), the documentation says that it converts to "center position binary point data", but to get a base-by-base read coverage, we would need start and stop positions for each read, especially when reads have different sizes. Has the format of the bar file been updated to include start and stop or is it still center-based? If not, do you have a recommended method for getting base-by-base read coverage?
>
> Thanks,
>
> Andrew
>
> Brad Cairns Lab
> Huntsman Cancer Institute Room #4350
> University of Utah
> 2000 Circle of Hope
> Salt Lake City, UT 84112
> (801) 585-1823
>
> From: David Nix <Dav...@hc...<mailto:Dav...@hc...>>
> Date: Tue, 15 Mar 2011 10:36:01 -0600
> To: Andrew Oler <and...@hc...<mailto:and...@hc...>>
> Subject: Re: Novo bis parser
>
> It means exactly what it says. Where there is an overlap in a pair of reads from the same template it calls a consensus so that you don't double count the same base pairs.
>
> Yes, 0.41 is a lot of overlap, the library insert size was too small and thus you're effectively cutting your data output by 40%. This should be < 5%.
>
> -cheers, D
>
> On 3/14/11 10:15 AM, "Andrew Oler" <And...@hc...> wrote:
>
> Hi David,
>
> I'm running NBP and I was wondering if you could explain the stats at the end.
>
> From the app description, what does this mean, exactly? "Flattens overlapping reads in a pair to call consensus bps." Does that mean to call consensus as to whether converted or not? If overlapping, then it should have a higher accuracy right?
>
> I got these stats at the end.
>
> 2353965845 BPs overlapping paired sequence
> 5645468238 BPs paired sequence
> 0.417 Fraction overlapping bps from paired reads.
>
> This means that I had a lot of overlapping reads, right? What do you usually get for this number?
>
> Is there a way to get coverage statistics, e.g., what fraction of the genome is covered at least 5-fold? Or coverage tracks?
>
> Thanks,
>
> Andrew
>
> Brad Cairns Lab
> Huntsman Cancer Institute Room #4350
> University of Utah
> 2000 Circle of Hope
> Salt Lake City, UT 84112
> (801) 585-1823 |
From: Andrew O. <And...@hc...> - 2011-03-15 19:29:31
|
Hi David,

I have another question. For the NovoalignParser (and other parsers, such as Tag2Point), the documentation says that it converts to "center position binary point data", but to get a base-by-base read coverage, we would need start and stop positions for each read, especially when reads have different sizes. Has the format of the bar file been updated to include start and stop or is it still center-based? If not, do you have a recommended method for getting base-by-base read coverage?

Thanks,

Andrew

Brad Cairns Lab
Huntsman Cancer Institute Room #4350
University of Utah
2000 Circle of Hope
Salt Lake City, UT 84112
(801) 585-1823

From: David Nix <Dav...@hc...<mailto:Dav...@hc...>>
Date: Tue, 15 Mar 2011 10:36:01 -0600
To: Andrew Oler <and...@hc...<mailto:and...@hc...>>
Subject: Re: Novo bis parser

It means exactly what it says. Where there is an overlap in a pair of reads from the same template it calls a consensus so that you don't double count the same base pairs.

Yes, 0.41 is a lot of overlap, the library insert size was too small and thus you're effectively cutting your data output by 40%. This should be < 5%.

-cheers, D

On 3/14/11 10:15 AM, "Andrew Oler" <And...@hc...> wrote:

Hi David,

I'm running NBP and I was wondering if you could explain the stats at the end.

From the app description, what does this mean, exactly? "Flattens overlapping reads in a pair to call consensus bps." Does that mean to call consensus as to whether converted or not? If overlapping, then it should have a higher accuracy right?

I got these stats at the end.

2353965845 BPs overlapping paired sequence
5645468238 BPs paired sequence
0.417 Fraction overlapping bps from paired reads.

This means that I had a lot of overlapping reads, right? What do you usually get for this number?

Is there a way to get coverage statistics, e.g., what fraction of the genome is covered at least 5-fold? Or coverage tracks?

Thanks,

Andrew

Brad Cairns Lab
Huntsman Cancer Institute Room #4350
University of Utah
2000 Circle of Hope
Salt Lake City, UT 84112
(801) 585-1823 |
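When every read has the same length, start and stop can be recovered from the center position, which is enough to build per-base coverage and to answer the "covered at least 5-fold" question. The following is a minimal Python sketch of that idea, not the ReadCoverage implementation. It assumes 0-based, half-open coordinates and places the start at `center - read_length // 2`; that rounding convention and the function names are assumptions.

```python
def per_base_coverage(centers, read_length, chrom_length):
    """Per-base depth from read center positions, all reads the same length."""
    diff = [0] * (chrom_length + 1)  # difference array: +1 at start, -1 at end
    half = read_length // 2
    for center in centers:
        start = center - half
        end = start + read_length
        # Clip reads that would run off either end of the chromosome.
        start = max(start, 0)
        end = min(end, chrom_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    coverage = []
    depth = 0
    for delta in diff[:chrom_length]:
        depth += delta
        coverage.append(depth)
    return coverage


def fraction_at_least(coverage, min_depth):
    """Fraction of positions covered at least min_depth times."""
    if not coverage:
        return 0.0
    return sum(1 for depth in coverage if depth >= min_depth) / len(coverage)


# The overlap statistic quoted above is the ratio of the two base-pair counts.
overlap_fraction = 2_353_965_845 / 5_645_468_238
```

The difference-array approach keeps the work proportional to the number of reads plus the chromosome length, rather than reads times read length.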
From: Noboru Jo S. <ns...@uc...> - 2011-03-02 16:19:55
|
Hi David, I sent a message to the list from my other email; you can ignore it.

I was getting this error from ScanSeqs, but I found the culprit. It seems that, sometimes, random chromosomes cause this error. It may be that the treatment or the control lacks a random chromosome that the other has, so ScanSeqs can't find the data to pair and crashes.

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at edu.utah.seq.data.PointData.combinePointData(PointData.java:1142)
    at edu.utah.seq.data.PointData.combinePointData(PointData.java:1129)
    at edu.utah.seq.analysis.ScanSeqs.subSamplePointDataForEmpFDR(ScanSeqs.java:670)
    at edu.utah.seq.analysis.ScanSeqs.calculateBinomialPValsForEmpiricalFDRs(ScanSeqs.java:568)
    at edu.utah.seq.analysis.ScanSeqs.windowScanChromosome(ScanSeqs.java:278)
    at edu.utah.seq.analysis.ScanSeqs.scan(ScanSeqs.java:126)
    at edu.utah.seq.analysis.ScanSeqs.<init>(ScanSeqs.java:107)
    at edu.utah.seq.analysis.ScanSeqs.main(ScanSeqs.java:1307) |
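If the crash does come from a chromosome present in only one dataset, the mismatch can be spotted before running ScanSeqs by comparing chromosome names. This is a minimal Python sketch under that assumption. It takes plain lists of chromosome names; how those names are extracted from the PointData directories is left out, and the function name is hypothetical.

```python
def unpaired_chromosomes(treatment_chroms, control_chroms):
    """Return (only_in_treatment, only_in_control) as sorted lists."""
    treatment = set(treatment_chroms)
    control = set(control_chroms)
    return sorted(treatment - control), sorted(control - treatment)


only_treatment, only_control = unpaired_chromosomes(
    ["chr1", "chr2", "chr1_random"],
    ["chr1", "chr2"],
)
```

Any name that appears in either returned list is a candidate for removal from the input before the scan.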
From: Noboru Jo S. <nob...@gm...> - 2011-03-01 17:47:14
|
Hi David, I am getting this error from ScanSeqs. Do you know what it's about?

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at edu.utah.seq.data.PointData.combinePointData(PointData.java:1142)
    at edu.utah.seq.data.PointData.combinePointData(PointData.java:1129)
    at edu.utah.seq.analysis.ScanSeqs.subSamplePointDataForEmpFDR(ScanSeqs.java:670)
    at edu.utah.seq.analysis.ScanSeqs.calculateBinomialPValsForEmpiricalFDRs(ScanSeqs.java:568)
    at edu.utah.seq.analysis.ScanSeqs.windowScanChromosome(ScanSeqs.java:278)
    at edu.utah.seq.analysis.ScanSeqs.scan(ScanSeqs.java:126)
    at edu.utah.seq.analysis.ScanSeqs.<init>(ScanSeqs.java:107)
    at edu.utah.seq.analysis.ScanSeqs.main(ScanSeqs.java:1307)

This is the command I'm issuing.

java -Xmx5500M -jar USeq_7.1.2/Apps/ScanSeqs -t treatment/useq_pointData_satellites_rmsk_Filt0bp/ -c control/useq_pointData_satellites_rmsk_Filt0bp/ -s ./results -p 100 -w 200

I've never got this error before. It happened with a new input set. I was tempted to think that something went wrong when converting from .bed to Useq internal format, but Tag2Point seems to be reading coordinates correctly. The .bed file is fine, as I was able to use it in other peak callers. FilterPointData also seems to be able to read the converted files.

Stats:
23452020 Starting
23040045 Ending
411975 Difference

Any ideas what may be wrong?
Thanks! |
From: David N. <dav...@gm...> - 2011-01-24 21:37:02
|
Ah, thanks for catching that. I'll add this back into the next SourceForge release, 7.6.3. I should get that posted later today or tomorrow.

As for your run, 7 reduced regions is a rather low number. Are you performing a chIP vs chIP or a chIP vs Input analysis? You can run the individual applications that the ChIPSeq app wraps; notably, run the EnrichedRegionMaker with the -n option to output a set number of peaks. Check out the tutorial/user guide for details.

-cheers, D

On 1/24/11 2:07 PM, "Ketaki Bhide" <kb...@il...> wrote:
> Hello,
> I used ChipSeq application of Useq. I had following questions regarding the same:
>
> Q. I was not able to set up the window size by typing in -w while running ChipSeq application.
> It gave me an error of -w is not an option. Please let me know right parameter to set the window size
>
> Q. Eventually I want to compare the USeq ChipSeq's application with some other software.
> It becomes really hard to compare peaks for my data since Useq application just gave 7 reduced regions with default parameters while other sofware gave me several present on different chromosomes.
> Could you please let me know which parameters need to be changed to get maximum number of peaks from USeq's ChipSeq application.
>
> Regards
> Ketaki Bhide |
From: Noboru Jo S. <ns...@uc...> - 2011-01-24 21:35:51
|
What app are you using? -w works fine for me when using ScanSeqs and MultipleReplicaScanSeqs.

How did you select your peaks (q-value, etc.)? In my experience, Useq generally finds more peaks than MACS and QuEST. I filter by empFDR (and now also by q-value, simultaneously). You will have to play with peak width and q-value/empFDR. What was the FDR for the other peak callers?

Maybe your IP failed? How many reads do you have? If it was < 10M, read the posts that David and I exchanged about balancing reads.

Ketaki Bhide wrote:
> Hello,
> I used ChipSeq application of Useq. I had following questions regarding the same:
>
> Q. I was not able to set up the window size by typing in -w while running ChipSeq application.
> It gave me an error of -w is not an option. Please let me know right parameter to set the window size
>
> Q. Eventually I want to compare the USeq ChipSeq's application with some other software.
> It becomes really hard to compare peaks for my data since Useq application just gave 7 reduced regions with default parameters while other sofware gave me several present on different chromosomes.
> Could you please let me know which parameters need to be changed to get maximum number of peaks from USeq's ChipSeq application.
>
> Regards
> Ketaki Bhide |
From: Ketaki B. <kb...@il...> - 2011-01-24 21:27:32
|
Hello,

I used the ChipSeq application of Useq and have the following questions about it:

Q. I was not able to set the window size by typing -w while running the ChipSeq application. It gave me an error saying -w is not an option. Please let me know the right parameter for setting the window size.

Q. Eventually I want to compare USeq's ChipSeq application with some other software. It is really hard to compare peaks for my data, since the Useq application gave just 7 reduced regions with default parameters while the other software gave me several, present on different chromosomes. Could you please let me know which parameters need to be changed to get the maximum number of peaks from USeq's ChipSeq application?

Regards
Ketaki Bhide |
From: David N. <dav...@gm...> - 2011-01-18 16:35:16
|
Odd that a peak could have been called at the terminus? What's happening is that 1/2 the read length is added onto each end of the EnrichedRegion. This could push the boundary off into the never never.

What I don't understand is how an alignment was made at the telomere, where repetitive sequences dominate. Are you filtering for unique alignments? I wouldn't use any repeats in the analysis. Lots of aligners, such as Bowtie, will by default assign a repeat read to a randomly chosen copy of the repeat. I'd recommend excluding these from typical chIP-seq analysis unless you're doing something unusual with heterochromatin.

-cheers, D

On 1/12/11 4:43 PM, "Noboru Jo Sakabe" <ns...@uc...> wrote:
> Hi David, occasionally, Useq reports a peak ending outside a chr.
> Is anybody else in this mailing list seeing this?
>
> I verified my input .bed file and the highest end coordinate for chr19 is 61,342,406 (chr19 ends at 61,342,430), but Useq generated a peak from 61,338,222 - 61,342,436.
> Not a big deal, but maybe it's something easy to fix for the next release. |
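Until a fix lands, a region that has been padded past the end of a chromosome can be clipped after the fact. This is a minimal Python sketch of such a clamp, not USeq code; the function name is hypothetical, and the chromosome length must be supplied by the caller (the chr19 value below is the one quoted in this thread).

```python
def clamp_region(start, end, chrom_length):
    """Clip a region so it lies within [0, chrom_length]."""
    return max(0, start), min(end, chrom_length)


# The chr19 peak reported above, clipped to the quoted chromosome end.
clamped = clamp_region(61_338_222, 61_342_436, 61_342_430)
```

Applying the clamp to every reported region leaves in-bounds regions unchanged.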
From: David N. <dav...@gm...> - 2011-01-18 16:30:02
|
Hello Noboru,

Ah, that explains it. I think I'm seeing what you're seeing: 878 peaks using your settings for the EmpFDR and log2Ratio with the full dataset, and 120990 with the subsampled dataset.

I should say, though, that 5 is a very relaxed threshold; at minimum I would use 13 (5%) or 20 (1%). I would also always use the qValFDR as well. When the qValFDR and EmpFDR differ, then something is off. So a better setting would be: -I 1,2,4 -s 13,13,1

Unfortunately, no regions pass these thresholds with either the full or the subsampled data. When this is the case, I would suggest using the -n option to generate the top 100 peaks, then carefully examining them in IGB and checking those that look real by qPCR.

The reason the empFDR is wonky with the subsampled data is that this test is based in large part on the input data. The input data is split in 1/2 and a comparison is made between input1 vs input2. When these halves get small, odd behavior is observed.

I'd definitely recommend more reads. There does appear to be some real signal in the data, but the low number (3.4M for the chIP, 12.9M input) is limiting the sensitivity of the apps. We recommend at minimum 10M dup-free unique alignments for the chIP and 20M for the input. Most folks are pushing this to 20M for the chIP and 20-40M for the input.

-cheers, D

On 1/10/11 2:54 PM, "Noboru Jo Sakabe" <ns...@uc...> wrote:
> Hi David, thanks for checking my data.
> It seems the difference is because I filtered by EmpFDR, not Qvalue.
> I reran Useq now and when I filter with
>
> -i 2,4 -s 5,1
>
> I get tens of thousands of peaks, but with
>
> -i 1,4 -s 20,1
>
> I get ten peaks.
>
> Can you confirm this by filtering windows by EmpFDR?
> The peaks make some biological sense, overall (GO, conservation). So it's not like the peaks found are noise. The IP is not good, but there is some signal there. So I tend to trust the peaks that Useq is giving me. But, please, I would appreciate if you can comment on this!
> If you don't mind sharing, in your experience, how unusual is something like this? I mean, a sample that has some signal, but because it's weak, one needs to give it a good shake to get something? Are the samples you analyze consistently better than this? I'm not the wet lab person, so I won't feel offended with your criticism ;-)
> Thanks again.
>
> noboru
>
> David Nix wrote:
>> Hello Noboru,
>>
>> I'm not seeing the increase in the number of regions when you subsample the input control to match the chIP sample.
>>
>> I see 283 regions with the full control and 18 with the matched control when thresholding using a qvalue of 20 (0.01) and a log2Ratio of 1 (2x).
>>
>> Here's what I did:
>>
>> 1) Run Tag2Point to convert your bed datasets to binary PointData
>> 2) Run the PointDataManipulator to filter out duplicate reads. Both datasets look good with 94% unique
>> 3) Run ScanSeqs to window scan your data
>> 4) Run EnrichedRegionMaker to collapse overlapping windows that exceed the above thresholds into a list of putative peaks.
>>
>> For the reduced control dataset, I used the SubSamplePointData to randomly toss duplicate filtered input PointData to 3398890 and then ran ScanSeqs and the EnrichedRegionMaker.
>>
>> I wonder where the discrepancy occurred? I've attached the two spread sheet results from the EnrichedRegionMaker.
>>
>> -cheers, D |
From: Noboru Jo S. <ns...@uc...> - 2011-01-12 23:36:35
|
Hi David, occasionally, Useq reports a peak ending outside a chr. Is anybody else in this mailing list seeing this? I verified my input .bed file and the highest end coordinate for chr19 is 61,342,406 (chr19 ends at 61,342,430), but Useq generated a peak from 61,338,222 - 61,342,436. Not a big deal, but maybe it's something easy to fix for the next release. |
From: Noboru Jo S. <ns...@uc...> - 2011-01-10 21:47:22
|
Hi David, thanks for checking my data.

It seems the difference is because I filtered by EmpFDR, not Qvalue. I reran Useq now and when I filter with

-i 2,4 -s 5,1

I get tens of thousands of peaks, but with

-i 1,4 -s 20,1

I get ten peaks.

Can you confirm this by filtering windows by EmpFDR?

The peaks make some biological sense, overall (GO, conservation). So it's not like the peaks found are noise. The IP is not good, but there is some signal there. So I tend to trust the peaks that Useq is giving me. But, please, I would appreciate if you can comment on this!

If you don't mind sharing, in your experience, how unusual is something like this? I mean, a sample that has some signal, but because it's weak, one needs to give it a good shake to get something? Are the samples you analyze consistently better than this? I'm not the wet lab person, so I won't feel offended with your criticism ;-)

Thanks again.

noboru

David Nix wrote:
> Hello Noboru,
>
> I'm not seeing the increase in the number of regions when you subsample the input control to match the chIP sample.
>
> I see 283 regions with the full control and 18 with the matched control when thresholding using a qvalue of 20 (0.01) and a log2Ratio of 1 (2x).
>
> Here's what I did:
>
> 1) Run Tag2Point to convert your bed datasets to binary PointData
> 2) Run the PointDataManipulator to filter out duplicate reads. Both datasets look good with 94% unique
> 3) Run ScanSeqs to window scan your data
> 4) Run EnrichedRegionMaker to collapse overlapping windows that exceed the above thresholds into a list of putative peaks.
>
> For the reduced control dataset, I used the SubSamplePointData to randomly toss duplicate filtered input PointData to 3398890 and then ran ScanSeqs and the EnrichedRegionMaker.
>
> I wonder where the discrepancy occurred? I've attached the two spread sheet results from the EnrichedRegionMaker.
>
> -cheers, D |
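The score thresholds passed with -s in this thread appear to be on a -10*log10 scale: the messages equate a q-value score of 20 with 0.01, and a log2Ratio of 1 with 2x. The following short Python sketch shows the conversion under that assumption. The transform is inferred from the numbers quoted here rather than taken from USeq documentation, and the function names are hypothetical.

```python
import math


def score_to_fdr(score):
    """Convert a -10*log10(FDR) score back to an FDR fraction."""
    return 10 ** (-score / 10)


def fdr_to_score(fdr):
    """Convert an FDR fraction to a -10*log10(FDR) score."""
    return -10 * math.log10(fdr)


def log2_ratio_to_fold(log2_ratio):
    """Convert a log2 ratio to a fold change."""
    return 2 ** log2_ratio
```

Under this reading, a threshold of 5 corresponds to an FDR of roughly 32%, which is far looser than the 1% implied by a threshold of 20.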
From: David N. <dav...@gm...> - 2011-01-10 19:33:48
|
Hello Noboru,

I'm not seeing the increase in the number of regions when you subsample the input control to match the chIP sample. I see 283 regions with the full control and 18 with the matched control when thresholding using a qvalue of 20 (0.01) and a log2Ratio of 1 (2x).

Here's what I did:

1) Run Tag2Point to convert your bed datasets to binary PointData.
2) Run the PointDataManipulator to filter out duplicate reads. Both datasets look good with 94% unique.
3) Run ScanSeqs to window scan your data.
4) Run EnrichedRegionMaker to collapse overlapping windows that exceed the above thresholds into a list of putative peaks.

For the reduced control dataset, I used SubSamplePointData to randomly subsample the duplicate-filtered input PointData down to 3398890 and then ran ScanSeqs and the EnrichedRegionMaker.

I wonder where the discrepancy occurred? I've attached the two spreadsheet results from the EnrichedRegionMaker.

-cheers, D

--
David Austin Nix, PhD
Bioinformatics Shared Resource
Huntsman Cancer Institute, Room 3165
2000 Circle of Hope, Salt Lake City, UT 84112
(801) 587-4611
dav...@hc...
http://bioserver.hci.utah.edu

------ Forwarded Message
From: Noboru Jo Sakabe <ns...@uc...>
Date: Thu, 06 Jan 2011 11:02:53 -0600
To: David Nix <dav...@gm...>
Subject: genome build

Hi David, I forgot to mention it's mm9.

David Nix wrote:
> Hmm, that's a bit worrying. There's no need to balance reads with USeq.
> This is internally controlled. More data should increase the number of
> regions returned at a given FDR, not decrease it. Your result is rather odd.
> Would you mind posting the data to a web accessible directory somewhere?
> Label it chIP and Input and let me know what genome build it is. I'd like to
> run some tests.
>
> -cheers, D
>
> On 1/5/11 3:41 PM, "Noboru Jo Sakabe" <ns...@uc...> wrote:
>
>> Hi David, I ran USeq on a sample that has a lot fewer reads than input.
>> I got very few peaks.
>> Then I balanced treatment and input, randomly selecting reads from
>> input.
>> Then I got ~14k peaks at FDR 4%. QuEST had also found a similar
>> number of peaks.
>> I know that balancing reads is an issue in MACS; I would like to
>> know if this is also true for USeq. I believe it is, given my results,
>> but could you comment on this?
>> Thank you!
>>
>> noboru

------ End of Forwarded Message |
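For readers following the thresholds quoted above: "a qvalue of 20 (0.01)" is consistent with a -10*log10 scaling of the q-value, and a log2Ratio of 1 is a 2x fold change. The sketch below only illustrates that arithmetic. The function names are hypothetical and are not part of USeq, and the filter is a simplified stand-in for whatever EnrichedRegionMaker actually does.

```python
import math


def score_to_qvalue(score: float) -> float:
    """Convert a -10*log10 scaled score back to a plain q-value."""
    return 10 ** (-score / 10)


def qvalue_to_score(qvalue: float) -> float:
    """Convert a plain q-value to a -10*log10 scaled score."""
    return -10 * math.log10(qvalue)


def log2_ratio_to_fold(log2_ratio: float) -> float:
    """Convert a log2 ratio to a linear fold change."""
    return 2 ** log2_ratio


def passes_thresholds(q_score: float, log2_ratio: float,
                      min_q_score: float = 20.0,
                      min_log2_ratio: float = 1.0) -> bool:
    """Hypothetical window filter: keep a window only if both values
    meet or exceed their thresholds."""
    return q_score >= min_q_score and log2_ratio >= min_log2_ratio


if __name__ == "__main__":
    print(score_to_qvalue(20))      # q-value for a score of 20
    print(log2_ratio_to_fold(1))    # fold change for a log2Ratio of 1
    print(passes_thresholds(25.0, 1.3))
```

On this scale a larger score means a smaller q-value, so raising the score threshold makes the filter stricter.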
From: David N. <dav...@gm...> - 2011-01-06 03:11:35
|
Hello Noboru,

Sure. Like MACS, USeq uses a window-based scan. It doesn't look at peak shape, so it will work equally well for TF chIPs and histone marks. For broad peaks just increase the window size to 1-5kb. You might want to try a range of window sizes. This is internally controlled, so any enrichment seen is likely real regardless of the window size. There's no penalty here.

That said, it's easy to get any app to return more regions than another: just drop the thresholds. Remember the FDR estimations are just that, estimations, not absolutes. So an FDR of 5% in one app might really be 50% in another. This is why I built two different FDR estimators into USeq, based on two very different methods (an empirical null created from the input data and a q-value conversion of the binomial pvalues). Most often they are in agreement; when they differ significantly I get suspicious of the results.

Are you generating biological replicates? I'd recommend doing so for your next set of experiments. 3 chIPs and 3 inputs (or better yet, chIPs under different conditions/time points) can really help tease out true and false positives. Dynamic maps also help narrow down the list of marked sites to those most biologically interesting.

Good luck!

-cheers, D

On 1/5/11 3:43 PM, "Noboru Jo Sakabe" <ns...@uc...> wrote:

> Hi David, to keep the list organized, I'm starting a new thread.
>
> What is your take on using USeq to call peaks on histone data? I've
> been using it and while MACS often does not find peaks in my samples,
> which seem to be poorly enriched, USeq does. But histone modifications
> are different from TFs; do you think I can trust USeq for histones?
> Thank you.
>
> noboru |
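To make the "q-value conversion of the binomial pvalues" idea above more concrete, here is a minimal sketch. It assumes a one-sided binomial test per window (the chance that a read in the window comes from the chIP rather than the input) and uses the Benjamini-Hochberg adjustment as a stand-in for the conversion. The thread does not say which test or conversion USeq actually uses, so the function names and the choice of method are illustrative assumptions only.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (often reported as
    q-values), in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over this rank and every larger rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical (chIP reads, input reads) counts per window.
    windows = [(30, 5), (12, 10), (8, 9), (40, 4)]
    chip_total, input_total = 1_000_000, 1_000_000
    # Under the null, a read lands in the chIP with this probability.
    p_null = chip_total / (chip_total + input_total)
    pvals = [binomial_upper_tail(c, c + i, p_null) for c, i in windows]
    for window, q in zip(windows, benjamini_hochberg(pvals)):
        print(window, q)
```

Setting `p_null` from the two library sizes is one simple way to account for unequal read depth without subsampling either dataset.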
From: David N. <dav...@gm...> - 2011-01-06 02:55:23
|
Hmm, that's a bit worrying. There's no need to balance reads with USeq. This is internally controlled. More data should increase the number of regions returned at a given FDR, not decrease it. Your result is rather odd. Would you mind posting the data to a web accessible directory somewhere? Label it chIP and Input and let me know what genome build it is. I'd like to run some tests.

-cheers, D

On 1/5/11 3:41 PM, "Noboru Jo Sakabe" <ns...@uc...> wrote:

> Hi David, I ran USeq on a sample that has a lot fewer reads than input.
> I got very few peaks.
> Then I balanced treatment and input, randomly selecting reads from
> input.
> Then I got ~14k peaks at FDR 4%. QuEST had also found a similar
> number of peaks.
> I know that balancing reads is an issue in MACS; I would like to
> know if this is also true for USeq. I believe it is, given my results,
> but could you comment on this?
> Thank you!
>
> noboru |
From: Noboru Jo S. <ns...@uc...> - 2011-01-05 22:36:51
|
Hi David, to keep the list organized, I'm starting a new thread.

What is your take on using USeq to call peaks on histone data? I've been using it and while MACS often does not find peaks in my samples, which seem to be poorly enriched, USeq does. But histone modifications are different from TFs; do you think I can trust USeq for histones?

Thank you.

noboru |
From: Noboru Jo S. <ns...@uc...> - 2011-01-05 22:34:42
|
Hi David, I ran USeq on a sample that has a lot fewer reads than input. I got very few peaks. Then I balanced treatment and input, randomly selecting reads from input. Then I got ~14k peaks at FDR 4%. QuEST had also found a similar number of peaks.

I know that balancing reads is an issue in MACS; I would like to know if this is also true for USeq. I believe it is, given my results, but could you comment on this?

Thank you!

noboru |
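The "balancing" step described above, randomly selecting input reads until the input matches the treatment, can be sketched as follows. This is only an illustration of random subsampling without replacement. The function name and the in-memory list of reads are assumptions, and it is not how SubSamplePointData or any other USeq app is implemented.

```python
import random


def subsample_reads(reads, target_count, seed=0):
    """Randomly keep `target_count` reads without replacement.

    `reads` is any list of read records (for example BED lines).
    If there are already `target_count` reads or fewer, return a copy
    unchanged. A fixed seed makes the selection reproducible.
    """
    if target_count >= len(reads):
        return list(reads)
    rng = random.Random(seed)
    return rng.sample(reads, target_count)


if __name__ == "__main__":
    # Hypothetical example: 10 input reads balanced down to 4.
    input_reads = [f"chr1\t{i * 100}\t{i * 100 + 36}" for i in range(10)]
    balanced = subsample_reads(input_reads, 4)
    print(len(balanced))
```

For datasets too large to hold in memory, a streaming method such as reservoir sampling would be the usual substitute.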
From: David N. <dav...@gm...> - 2010-11-18 14:17:57
|
Yes, Sonja, the best sub window comes from a rescan of the enriched region that identifies the highest scoring 50bp window. You can also take the center of the best window.

-cheers, D

On 11/18/10 4:28 AM, "Sonja Althammer" <son...@go...> wrote:

> Hello!
>
> I have been looking at the output files of the ChipSeq applications and I
> couldn't find a summit position (= maximal pileup of reads). Also, as it is a
> window approach, I guess there is no such thing? Would it be correct then to
> use the center of the best sub windows in the corresponding gff file? BTW, are
> these also sorted according to the significance of a peak?
> I want to look for motifs in the 100bp region underlying the summit of peaks,
> that's why I need this position of reference....
>
> Thanks in advance!
> Sonja |
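As a rough illustration of the rescan described above, the sketch below slides a 50bp window across an enriched region, keeps the window holding the most read positions, and then builds a 100bp interval around that window's center for motif searching. Using the raw read count as the window score is an assumption made for the example, since the thread does not describe how USeq scores sub windows, and the function names are hypothetical.

```python
from bisect import bisect_left


def best_sub_window(positions, region_start, region_end, window=50):
    """Return (start, count) for the half-open window
    [start, start + window) inside the region that contains the most
    read positions. Ties go to the leftmost window."""
    pts = sorted(p for p in positions if region_start <= p < region_end)
    best_start, best_count = region_start, -1
    last_start = max(region_start, region_end - window)
    for start in range(region_start, last_start + 1):
        # Count the positions p with start <= p < start + window.
        count = bisect_left(pts, start + window) - bisect_left(pts, start)
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


def motif_interval(best_start, window=50, flank=50):
    """Return a (start, end) interval of 2 * flank bp centered on the
    middle of the best window."""
    center = best_start + window // 2
    return center - flank, center + flank


if __name__ == "__main__":
    # Hypothetical read positions inside one enriched region.
    reads = [100, 105, 110, 300, 310, 312, 315, 320, 330]
    start, count = best_sub_window(reads, 0, 400)
    print(start, count)
    print(motif_interval(start))
```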
From: Sonja A. <son...@go...> - 2010-11-18 11:28:11
|
Hello!

I have been looking at the output files of the ChipSeq applications and I couldn't find a summit position (= maximal pileup of reads). Also, as it is a window approach, I guess there is no such thing? Would it be correct then to use the center of the best sub windows in the corresponding gff file? BTW, are these also sorted according to the significance of a peak?

I want to look for motifs in the 100bp region underlying the summit of peaks, that's why I need this position of reference....

Thanks in advance!
Sonja |