From: Fuad G. <fua...@ho...> - 2009-08-13 14:11:43
What is the format of the quality scores in Swift? For input to Bowtie I am assuming Solexa.

Thanks,
Fuad
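For context on the Solexa scale asked about here: Solexa/early-Illumina FASTQ encodes qualities as Q = -10*log10(p/(1-p)) with ASCII offset 64, while Sanger/Phred FASTQ uses Q = -10*log10(p) with offset 33. A minimal conversion sketch in Python — illustrative only, not part of Swift or Bowtie:

```python
import math

def solexa_to_phred(q_solexa: float) -> float:
    """Convert a Solexa-scaled quality to the Phred scale.

    Solexa uses Q = -10*log10(p/(1-p)); Phred uses Q = -10*log10(p).
    The two scales converge for high-quality bases.
    """
    return 10.0 * math.log10(1.0 + 10.0 ** (q_solexa / 10.0))

def solexa_fastq_quals_to_sanger(qual_line: str) -> str:
    """Re-encode a Solexa FASTQ quality string (offset 64) as Sanger (offset 33)."""
    return "".join(
        chr(int(round(solexa_to_phred(ord(c) - 64))) + 33) for c in qual_line
    )
```

For example, a Solexa score of 40 maps to Phred ~40.0, while a Solexa score of 0 maps to Phred ~3.0 (the scales diverge at the low end).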
From: Yunchen G. <yun...@gm...> - 2009-08-06 15:09:10
Thanks Nava. I've now successfully installed it.

Yunchen

On Thu, Aug 6, 2009 at 5:14 AM, Nava Whiteford <ne...@sg...> wrote:
> Hi Yunchen,
>
> You will need to install gsl, libtiff and fftw3 on your system in order
> to run Swift.
>
> In most Linux distributions these are available in the gsl, gsl-dev,
> libtiff, libtiff-dev, fftw3 and fftw3-dev packages.
>
> On Wed, Aug 05, 2009 at 04:51:22PM -0400, Yunchen Gong wrote:
> > Dear all,
> >
> > I'm compiling swift but have the error:
> >
> > # make swift
> > g++ swift_main.cpp -D'SVN_REV="192"' -O3 -DHAVE_FFTW -DFTYPE=float -Wall
> > -Wsign-compare -Wpointer-arith -I./SwiftImageAnalysis -I./Filters
> > -I./SmallAlign -I./MockImageAnalysis -I./BaseCaller -I./include
> > -I./CrossTalkCorrection -I./PhasingCorrection -I./include_lib -I
> > /software/solexa/include -I./Reporting -L /software/solexa/lib -lgsl
> > -lgslcblas -lfftw3 -ltiff ./include_lib/gnuplot_i.cc
> > ./SwiftImageAnalysis/SwiftFFT.cpp -o swift
> > In file included from ./CrossTalkCorrection/CrossTalkCorrection.h:190,
> >                  from swift_main.cpp:27:
> > ./CrossTalkCorrection/CrossTalkCorrection.cpp:24:25: error: gsl/gsl_fit.h:
> > No such file or directory
> > ......
> >
> > Any suggestions?
> >
> > Yunchen
> >
> > _______________________________________________
> > Swiftng-users mailing list
> > Swi...@li...
> > https://lists.sourceforge.net/lists/listinfo/swiftng-users
>
> --
> Nav
>
> Work: 01865 854873
> Mob : 07518-358405
From: Nava W. <ne...@sg...> - 2009-08-06 09:14:42
Hi Yunchen,

You will need to install gsl, libtiff and fftw3 on your system in order to run Swift.

In most Linux distributions these are available in the gsl, gsl-dev, libtiff, libtiff-dev, fftw3 and fftw3-dev packages.

On Wed, Aug 05, 2009 at 04:51:22PM -0400, Yunchen Gong wrote:
> Dear all,
>
> I'm compiling swift but have the error:
>
> # make swift
> g++ swift_main.cpp -D'SVN_REV="192"' -O3 -DHAVE_FFTW -DFTYPE=float -Wall
> -Wsign-compare -Wpointer-arith -I./SwiftImageAnalysis -I./Filters
> -I./SmallAlign -I./MockImageAnalysis -I./BaseCaller -I./include
> -I./CrossTalkCorrection -I./PhasingCorrection -I./include_lib -I
> /software/solexa/include -I./Reporting -L /software/solexa/lib -lgsl
> -lgslcblas -lfftw3 -ltiff ./include_lib/gnuplot_i.cc
> ./SwiftImageAnalysis/SwiftFFT.cpp -o swift
> In file included from ./CrossTalkCorrection/CrossTalkCorrection.h:190,
>                  from swift_main.cpp:27:
> ./CrossTalkCorrection/CrossTalkCorrection.cpp:24:25: error: gsl/gsl_fit.h:
> No such file or directory
> ......
>
> Any suggestions?
>
> Yunchen
>
> _______________________________________________
> Swiftng-users mailing list
> Swi...@li...
> https://lists.sourceforge.net/lists/listinfo/swiftng-users

--
Nav

Work: 01865 854873
Mob : 07518-358405
From: Yunchen G. <yun...@gm...> - 2009-08-05 20:51:32
Dear all,

I'm compiling swift but have the error:

# make swift
g++ swift_main.cpp -D'SVN_REV="192"' -O3 -DHAVE_FFTW -DFTYPE=float -Wall
-Wsign-compare -Wpointer-arith -I./SwiftImageAnalysis -I./Filters
-I./SmallAlign -I./MockImageAnalysis -I./BaseCaller -I./include
-I./CrossTalkCorrection -I./PhasingCorrection -I./include_lib -I
/software/solexa/include -I./Reporting -L /software/solexa/lib -lgsl
-lgslcblas -lfftw3 -ltiff ./include_lib/gnuplot_i.cc
./SwiftImageAnalysis/SwiftFFT.cpp -o swift
In file included from ./CrossTalkCorrection/CrossTalkCorrection.h:190,
                 from swift_main.cpp:27:
./CrossTalkCorrection/CrossTalkCorrection.cpp:24:25: error: gsl/gsl_fit.h:
No such file or directory
......

Any suggestions?

Yunchen
From: Bernd J. <ber...@pa...> - 2009-05-05 07:53:04
Nava,

Here are my results:

time     cc subimage  # pf     non-pf
13561    10           200072   1807468
3808.2   15           126344   1823356
5016     30           132000   1813200
5063     40           164652   1752824
2691     60           115936   697224

where time is in sec and corrected for CPU usage.

I find it quite surprising that by changing the number of sub-image multipliers we can change the number of clusters from 800K to more than 2.2M. Can you explain a bit please?

Thanks,
Bernd

-----Original Message-----
From: Nava Whiteford [mailto:ne...@sg...]
Sent: Thursday, April 30, 2009 12:46 PM
To: Bernd Jagla
Cc: 'Tom Skelly'; Swi...@li...
Subject: Re: long run times

Hi Bernd,

I had trouble extracting the last cycle of the tar you sent so I've discarded
that cycle and processed the rest.

On the current checkin of Swift the run took 25mins here. With the settings
you used it takes about 34mins. So I'm slightly at a loss. I've made some
small modifications to the latest checkin, perhaps that has helped. Could you
try it on just this tile and let us know what happens?

On Mon, Apr 27, 2009 at 04:41:51PM +0200, Bernd Jagla wrote:
> Here are the associated output-files...
> The one with the worst times unfortunately hasn't gotten any stdout/err for
> some odd reason. Here is the second "best"...
>
> -----Original Message-----
> From: Nava Whiteford [mailto:ne...@sg...]
> Sent: Monday, April 27, 2009 4:30 PM
> To: Bernd Jagla
> Cc: 'Tom Skelly'; Swi...@li...
> Subject: Re: long run times
>
> How big is dm3_ucsc.fa? Swift's aligner is very slow and only designed to
> deal with 5000bp sequences.
>
> -> Dm3 is not being used (too big)...
>
> Are you saying it took 39 hours using the parameter settings in the
> svn? That would be a bit odd; if you can send me the stdout/err that
> would be useful.
>
> -> See attached
>
> Sorry for the delay in getting back to you on your xml questions. I've
> not forgotten! Just snowed under.
>
> -> No worries... it is cold here in Paris too today, but not that cold ;)...
> Good luck!!
>
> Nav
>
> On Mon, Apr 27, 2009 at 04:07:06PM +0200, Bernd Jagla wrote:
> > Nava,
> >
> > Here is a list of parameters I tested and associated run-times (143,343.85
> > sec = 39hrs!!!!) for one tile.
> >
> > Could you please comment on where all the time might be spent?
> >
> > I would like to understand the problem to be able to optimize our
> > protocols.
> >
> > It doesn't really seem to be possible to reduce the time without losing
> > too many clusters.
> >
> > Thanks,
> > Bernd
> >
> > Time     = [sec] user time from time command
> > Wc pf    = number of lines in *.pf files
> > Wc nonpf = number of lines in *.nonpf files
> >
> > There is only one parameter changed compared to the standard (normal) call
> > of the program.
> >
> > idx  parameter                                time        wc pf    wc nonpf
> > 1    normal                                   143,343.85  182,256  1,720,580
> > 2    --ref dm3_ucsc.fa                        27,681.68   110,988  1,809,448
> > 3    --correlation_cc_subimage_multiplier 20  114,063.20  179,724  1,695,836
> > 4    --correlation_cc_subimage_multiplier 10  34,372.92   154,840  1,751,684
> > 5    --correlation_cc_subimage_multiplier 40  4,732.95    116,920  1,801,616
> > 6    --threshold_window 3                     2,311.50    131,260  664,312
> > 7    --threshold_window 4                     2,226.00    106,416  261,468
> > 8    --segment_cycles 2                       3,343.96    143,712  1,301,216
> > 9    --segment_cycles 1                       2,266.62    142,016  772,496
>
> --
> Nav
>
> Work: 01865 854873
> Mob : 07518-358405

--
Nav

Work: 01865 854873
Mob : 07518-358405
From: Nava W. <ne...@sg...> - 2009-04-30 10:45:51
Hi Bernd,

I had trouble extracting the last cycle of the tar you sent so I've discarded that cycle and processed the rest.

On the current checkin of Swift the run took 25mins here. With the settings you used it takes about 34mins. So I'm slightly at a loss. I've made some small modifications to the latest checkin, perhaps that has helped. Could you try it on just this tile and let us know what happens?

On Mon, Apr 27, 2009 at 04:41:51PM +0200, Bernd Jagla wrote:
> Here are the associated output-files...
> The one with the worst times unfortunately hasn't gotten any stdout/err for
> some odd reason. Here is the second "best"...
>
> -----Original Message-----
> From: Nava Whiteford [mailto:ne...@sg...]
> Sent: Monday, April 27, 2009 4:30 PM
> To: Bernd Jagla
> Cc: 'Tom Skelly'; Swi...@li...
> Subject: Re: long run times
>
> How big is dm3_ucsc.fa? Swift's aligner is very slow and only designed to
> deal with 5000bp sequences.
>
> -> Dm3 is not being used (too big)...
>
> Are you saying it took 39 hours using the parameter settings in the
> svn? That would be a bit odd; if you can send me the stdout/err that
> would be useful.
>
> -> See attached
>
> Sorry for the delay in getting back to you on your xml questions. I've
> not forgotten! Just snowed under.
>
> -> No worries... it is cold here in Paris too today, but not that cold ;)...
> Good luck!!
>
> Nav
>
> On Mon, Apr 27, 2009 at 04:07:06PM +0200, Bernd Jagla wrote:
> > Nava,
> >
> > Here is a list of parameters I tested and associated run-times (143,343.85
> > sec = 39hrs!!!!) for one tile.
> >
> > Could you please comment on where all the time might be spent?
> >
> > I would like to understand the problem to be able to optimize our
> > protocols.
> >
> > It doesn't really seem to be possible to reduce the time without losing
> > too many clusters.
> >
> > Thanks,
> > Bernd
> >
> > Time     = [sec] user time from time command
> > Wc pf    = number of lines in *.pf files
> > Wc nonpf = number of lines in *.nonpf files
> >
> > There is only one parameter changed compared to the standard (normal) call
> > of the program.
> >
> > idx  parameter                                time        wc pf    wc nonpf
> > 1    normal                                   143,343.85  182,256  1,720,580
> > 2    --ref dm3_ucsc.fa                        27,681.68   110,988  1,809,448
> > 3    --correlation_cc_subimage_multiplier 20  114,063.20  179,724  1,695,836
> > 4    --correlation_cc_subimage_multiplier 10  34,372.92   154,840  1,751,684
> > 5    --correlation_cc_subimage_multiplier 40  4,732.95    116,920  1,801,616
> > 6    --threshold_window 3                     2,311.50    131,260  664,312
> > 7    --threshold_window 4                     2,226.00    106,416  261,468
> > 8    --segment_cycles 2                       3,343.96    143,712  1,301,216
> > 9    --segment_cycles 1                       2,266.62    142,016  772,496
>
> --
> Nav
>
> Work: 01865 854873
> Mob : 07518-358405

--
Nav

Work: 01865 854873
Mob : 07518-358405
From: Nava W. <ne...@sg...> - 2009-04-28 10:12:34
Thanks, I've fixed this in the latest checkin.

We've been using perl to parse these reports. I generally view them in firefox to check that they parse ok.

On Fri, Apr 24, 2009 at 01:29:09PM +0200, Bernd Jagla wrote:
> Nava,
>
> I am trying to read in the run file using R (the xml package), using the
> following command:
>
> l1xml=xmlParse("runreport.L1-1")
>
> I get a lot of error messages, and the reason for them is that for the row
> entries the values are not enclosed in '"', e.g.
>
> <row cycle=1 errors=0 />
>
> instead of
>
> <row cycle="1" errors="0" />
>
> What do you use to read in the xml files?
>
> Thanks,
> Bernd
>
> PS. Can you please add me to the swiftng-users list?

--
Nav

Work: 01865 854873
Mob : 07518-358405
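Until a report regenerated by the fixed checkin is available, old reports can be patched up before parsing. A minimal pre-processing sketch in Python; the regex is deliberately naive and assumes bare attribute values contain no spaces, quotes or slashes, as in the `<row cycle=1 errors=0 />` example above (the function name is ours, not Swift's):

```python
import re

def quote_bare_attributes(xml_text: str) -> str:
    """Wrap unquoted XML attribute values in double quotes.

    Turns e.g. <row cycle=1 errors=0 /> into <row cycle="1" errors="0" />.
    Assumes bare values contain no whitespace, quote, '>' or '/' characters;
    already-quoted attributes are left untouched.
    """
    return re.sub(r'(\w+)=([^\s"\'>/]+)', r'\1="\2"', xml_text)
```

The patched text should then load in R via `xmlParse` (or any strict XML parser), since attribute values without quotes are not well-formed XML.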
From: Tom S. <ts...@sa...> - 2009-04-27 15:23:34
It got hung up in cross-talk correction, by the look of it. Odd, that's usually quite quick, IIRC.

There are no Memstats lines in the output. What's become of them?

--TS

Nava Whiteford wrote:
> Though it says it's not using the reference, it may be processing a large
> number of smaller contigs. I've corrected this in the latest checkin of
> swift but it would be worth trying this tile with no reference
> specified.
>
> Other than that, it does look like the tile is generating an abnormally
> large number of clusters. If possible can you send me a sample tile?
>
> On Mon, Apr 27, 2009 at 04:41:51PM +0200, Bernd Jagla wrote:
>> Here are the associated output-files...
>> The one with the worst times unfortunately hasn't gotten any stdout/err for
>> some odd reason. Here is the second "best"...
>>
>> -----Original Message-----
>> From: Nava Whiteford [mailto:ne...@sg...]
>> Sent: Monday, April 27, 2009 4:30 PM
>> To: Bernd Jagla
>> Cc: 'Tom Skelly'; Swi...@li...
>> Subject: Re: long run times
>>
>> How big is dm3_ucsc.fa? Swift's aligner is very slow and only designed to
>> deal with 5000bp sequences.
>>
>> -> Dm3 is not being used (too big)...
>>
>> Are you saying it took 39 hours using the parameter settings in the
>> svn? That would be a bit odd; if you can send me the stdout/err that
>> would be useful.
>>
>> -> See attached
>>
>> Sorry for the delay in getting back to you on your xml questions. I've
>> not forgotten! Just snowed under.
>>
>> -> No worries... it is cold here in Paris too today, but not that cold ;)...
>> Good luck!!
>>
>> Nav
>>
>> On Mon, Apr 27, 2009 at 04:07:06PM +0200, Bernd Jagla wrote:
>>> Nava,
>>>
>>> Here is a list of parameters I tested and associated run-times (143,343.85
>>> sec = 39hrs!!!!) for one tile.
>>>
>>> Could you please comment on where all the time might be spent?
>>>
>>> I would like to understand the problem to be able to optimize our
>>> protocols.
>>>
>>> It doesn't really seem to be possible to reduce the time without losing
>>> too many clusters.
>>>
>>> Thanks,
>>> Bernd
>>>
>>> Time     = [sec] user time from time command
>>> Wc pf    = number of lines in *.pf files
>>> Wc nonpf = number of lines in *.nonpf files
>>>
>>> There is only one parameter changed compared to the standard (normal) call
>>> of the program.
>>>
>>> idx  parameter                                time        wc pf    wc nonpf
>>> 1    normal                                   143,343.85  182,256  1,720,580
>>> 2    --ref dm3_ucsc.fa                        27,681.68   110,988  1,809,448
>>> 3    --correlation_cc_subimage_multiplier 20  114,063.20  179,724  1,695,836
>>> 4    --correlation_cc_subimage_multiplier 10  34,372.92   154,840  1,751,684
>>> 5    --correlation_cc_subimage_multiplier 40  4,732.95    116,920  1,801,616
>>> 6    --threshold_window 3                     2,311.50    131,260  664,312
>>> 7    --threshold_window 4                     2,226.00    106,416  261,468
>>> 8    --segment_cycles 2                       3,343.96    143,712  1,301,216
>>> 9    --segment_cycles 1                       2,266.62    142,016  772,496
>>
>> --
>> Nav
>>
>> Work: 01865 854873
>> Mob : 07518-358405

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
From: Nava W. <ne...@sg...> - 2009-04-27 14:57:19
Though it says it's not using the reference, it may be processing a large number of smaller contigs. I've corrected this in the latest checkin of swift but it would be worth trying this tile with no reference specified.

Other than that, it does look like the tile is generating an abnormally large number of clusters. If possible can you send me a sample tile?

On Mon, Apr 27, 2009 at 04:41:51PM +0200, Bernd Jagla wrote:
> Here are the associated output-files...
> The one with the worst times unfortunately hasn't gotten any stdout/err for
> some odd reason. Here is the second "best"...
>
> -----Original Message-----
> From: Nava Whiteford [mailto:ne...@sg...]
> Sent: Monday, April 27, 2009 4:30 PM
> To: Bernd Jagla
> Cc: 'Tom Skelly'; Swi...@li...
> Subject: Re: long run times
>
> How big is dm3_ucsc.fa? Swift's aligner is very slow and only designed to
> deal with 5000bp sequences.
>
> -> Dm3 is not being used (too big)...
>
> Are you saying it took 39 hours using the parameter settings in the
> svn? That would be a bit odd; if you can send me the stdout/err that
> would be useful.
>
> -> See attached
>
> Sorry for the delay in getting back to you on your xml questions. I've
> not forgotten! Just snowed under.
>
> -> No worries... it is cold here in Paris too today, but not that cold ;)...
> Good luck!!
>
> Nav
>
> On Mon, Apr 27, 2009 at 04:07:06PM +0200, Bernd Jagla wrote:
> > Nava,
> >
> > Here is a list of parameters I tested and associated run-times (143,343.85
> > sec = 39hrs!!!!) for one tile.
> >
> > Could you please comment on where all the time might be spent?
> >
> > I would like to understand the problem to be able to optimize our
> > protocols.
> >
> > It doesn't really seem to be possible to reduce the time without losing
> > too many clusters.
> >
> > Thanks,
> > Bernd
> >
> > Time     = [sec] user time from time command
> > Wc pf    = number of lines in *.pf files
> > Wc nonpf = number of lines in *.nonpf files
> >
> > There is only one parameter changed compared to the standard (normal) call
> > of the program.
> >
> > idx  parameter                                time        wc pf    wc nonpf
> > 1    normal                                   143,343.85  182,256  1,720,580
> > 2    --ref dm3_ucsc.fa                        27,681.68   110,988  1,809,448
> > 3    --correlation_cc_subimage_multiplier 20  114,063.20  179,724  1,695,836
> > 4    --correlation_cc_subimage_multiplier 10  34,372.92   154,840  1,751,684
> > 5    --correlation_cc_subimage_multiplier 40  4,732.95    116,920  1,801,616
> > 6    --threshold_window 3                     2,311.50    131,260  664,312
> > 7    --threshold_window 4                     2,226.00    106,416  261,468
> > 8    --segment_cycles 2                       3,343.96    143,712  1,301,216
> > 9    --segment_cycles 1                       2,266.62    142,016  772,496
>
> --
> Nav
>
> Work: 01865 854873
> Mob : 07518-358405

--
Nav

Work: 01865 854873
Mob : 07518-358405
From: Bernd J. <ber...@pa...> - 2009-04-27 14:42:08
Here are the associated output-files...

The one with the worst times unfortunately hasn't gotten any stdout/err for some odd reason. Here is the second "best"...

-----Original Message-----
From: Nava Whiteford [mailto:ne...@sg...]
Sent: Monday, April 27, 2009 4:30 PM
To: Bernd Jagla
Cc: 'Tom Skelly'; Swi...@li...
Subject: Re: long run times

How big is dm3_ucsc.fa? Swift's aligner is very slow and only designed to deal with 5000bp sequences.

-> Dm3 is not being used (too big)...

Are you saying it took 39 hours using the parameter settings in the svn? That would be a bit odd; if you can send me the stdout/err that would be useful.

-> See attached

Sorry for the delay in getting back to you on your xml questions. I've not forgotten! Just snowed under.

-> No worries... it is cold here in Paris too today, but not that cold ;)... Good luck!!

Nav

On Mon, Apr 27, 2009 at 04:07:06PM +0200, Bernd Jagla wrote:
> Nava,
>
> Here is a list of parameters I tested and associated run-times (143,343.85
> sec = 39hrs!!!!) for one tile.
>
> Could you please comment on where all the time might be spent?
>
> I would like to understand the problem to be able to optimize our protocols.
>
> It doesn't really seem to be possible to reduce the time without losing too
> many clusters.
>
> Thanks,
> Bernd
>
> Time     = [sec] user time from time command
> Wc pf    = number of lines in *.pf files
> Wc nonpf = number of lines in *.nonpf files
>
> There is only one parameter changed compared to the standard (normal) call
> of the program.
>
> idx  parameter                                time        wc pf    wc nonpf
> 1    normal                                   143,343.85  182,256  1,720,580
> 2    --ref dm3_ucsc.fa                        27,681.68   110,988  1,809,448
> 3    --correlation_cc_subimage_multiplier 20  114,063.20  179,724  1,695,836
> 4    --correlation_cc_subimage_multiplier 10  34,372.92   154,840  1,751,684
> 5    --correlation_cc_subimage_multiplier 40  4,732.95    116,920  1,801,616
> 6    --threshold_window 3                     2,311.50    131,260  664,312
> 7    --threshold_window 4                     2,226.00    106,416  261,468
> 8    --segment_cycles 2                       3,343.96    143,712  1,301,216
> 9    --segment_cycles 1                       2,266.62    142,016  772,496

--
Nav

Work: 01865 854873
Mob : 07518-358405
From: Nava W. <ne...@sg...> - 2009-04-27 14:30:00
How big is dm3_ucsc.fa? Swift's aligner is very slow and only designed to deal with 5000bp sequences.

Are you saying it took 39 hours using the parameter settings in the svn? That would be a bit odd; if you can send me the stdout/err that would be useful.

Sorry for the delay in getting back to you on your xml questions. I've not forgotten! Just snowed under.

Nav

On Mon, Apr 27, 2009 at 04:07:06PM +0200, Bernd Jagla wrote:
> Nava,
>
> Here is a list of parameters I tested and associated run-times (143,343.85
> sec = 39hrs!!!!) for one tile.
>
> Could you please comment on where all the time might be spent?
>
> I would like to understand the problem to be able to optimize our protocols.
>
> It doesn't really seem to be possible to reduce the time without losing too
> many clusters.
>
> Thanks,
> Bernd
>
> Time     = [sec] user time from time command
> Wc pf    = number of lines in *.pf files
> Wc nonpf = number of lines in *.nonpf files
>
> There is only one parameter changed compared to the standard (normal) call
> of the program.
>
> idx  parameter                                time        wc pf    wc nonpf
> 1    normal                                   143,343.85  182,256  1,720,580
> 2    --ref dm3_ucsc.fa                        27,681.68   110,988  1,809,448
> 3    --correlation_cc_subimage_multiplier 20  114,063.20  179,724  1,695,836
> 4    --correlation_cc_subimage_multiplier 10  34,372.92   154,840  1,751,684
> 5    --correlation_cc_subimage_multiplier 40  4,732.95    116,920  1,801,616
> 6    --threshold_window 3                     2,311.50    131,260  664,312
> 7    --threshold_window 4                     2,226.00    106,416  261,468
> 8    --segment_cycles 2                       3,343.96    143,712  1,301,216
> 9    --segment_cycles 1                       2,266.62    142,016  772,496

--
Nav

Work: 01865 854873
Mob : 07518-358405
From: Tom S. <ts...@sa...> - 2009-04-27 14:27:36
Ouch!

If you go to the run output file (the log, not the xml) and grep out 'Memstats' lines, that will give you the start/stop and delta times of the various processing steps. Note that the start/stops are nested, so you need to match up a start and a stop with the same label.

That should tell us where to look further.

--TS

Bernd Jagla wrote:
> Nava,
>
> Here is a list of parameters I tested and associated run-times (143,343.85
> sec = 39hrs!!!!) for one tile.
>
> Could you please comment on where all the time might be spent?
>
> I would like to understand the problem to be able to optimize our protocols.
>
> It doesn't really seem to be possible to reduce the time without losing too
> many clusters.
>
> Thanks,
> Bernd
>
> Time     = [sec] user time from time command
> Wc pf    = number of lines in *.pf files
> Wc nonpf = number of lines in *.nonpf files
>
> There is only one parameter changed compared to the standard (normal) call
> of the program.
>
> idx  parameter                                time        wc pf    wc nonpf
> 1    normal                                   143,343.85  182,256  1,720,580
> 2    --ref dm3_ucsc.fa                        27,681.68   110,988  1,809,448
> 3    --correlation_cc_subimage_multiplier 20  114,063.20  179,724  1,695,836
> 4    --correlation_cc_subimage_multiplier 10  34,372.92   154,840  1,751,684
> 5    --correlation_cc_subimage_multiplier 40  4,732.95    116,920  1,801,616
> 6    --threshold_window 3                     2,311.50    131,260  664,312
> 7    --threshold_window 4                     2,226.00    106,416  261,468
> 8    --segment_cycles 2                       3,343.96    143,712  1,301,216
> 9    --segment_cycles 1                       2,266.62    142,016  772,496

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
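Tom's grep-and-match bookkeeping can be scripted. The sketch below assumes a hypothetical line format `Memstats: <start|stop> <label> <seconds>` — the real Swift log layout may differ, so treat this only as an outline of the label-matching logic for nested start/stop pairs:

```python
def memstats_deltas(log_lines):
    """Pair nested Memstats start/stop lines by label and report elapsed times.

    Assumed (hypothetical) line format: "Memstats: <start|stop> <label> <seconds>".
    Because start/stops are nested, each stop is matched to the still-open
    start that carries the same label.
    """
    open_starts = {}  # label -> start time in seconds
    deltas = []       # (label, elapsed seconds), in completion order
    for line in log_lines:
        fields = line.split()
        if not fields or fields[0] != "Memstats:":
            continue
        event, label, seconds = fields[1], fields[2], float(fields[3])
        if event == "start":
            open_starts[label] = seconds
        elif event == "stop" and label in open_starts:
            deltas.append((label, seconds - open_starts.pop(label)))
    return deltas
```

For example, a log with `start total 0`, `start crosstalk 10`, `stop crosstalk 250`, `stop total 300` yields `[('crosstalk', 240.0), ('total', 300.0)]`, pointing at cross-talk correction as the dominant step.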
From: Bernd J. <ber...@pa...> - 2009-04-27 14:07:25
Nava,

Here is a list of parameters I tested and associated run-times (143,343.85 sec = 39hrs!!!!) for one tile.

Could you please comment on where all the time might be spent? I would like to understand the problem to be able to optimize our protocols.

It doesn't really seem to be possible to reduce the time without losing too many clusters.

Thanks,
Bernd

Time     = [sec] user time from time command
Wc pf    = number of lines in *.pf files
Wc nonpf = number of lines in *.nonpf files

There is only one parameter changed compared to the standard (normal) call of the program.

idx  parameter                                time        wc pf    wc nonpf
1    normal                                   143,343.85  182,256  1,720,580
2    --ref dm3_ucsc.fa                        27,681.68   110,988  1,809,448
3    --correlation_cc_subimage_multiplier 20  114,063.20  179,724  1,695,836
4    --correlation_cc_subimage_multiplier 10  34,372.92   154,840  1,751,684
5    --correlation_cc_subimage_multiplier 40  4,732.95    116,920  1,801,616
6    --threshold_window 3                     2,311.50    131,260  664,312
7    --threshold_window 4                     2,226.00    106,416  261,468
8    --segment_cycles 2                       3,343.96    143,712  1,301,216
9    --segment_cycles 1                       2,266.62    142,016  772,496
From: Bernd J. <ber...@pa...> - 2009-04-27 09:05:17
Nava,

More questions regarding the xml output.

> > Image offsets:
> > ==============
> >
> > I think there are a few issues with the runreport files:
> >
> > The XML tags are somewhat obvious:
> >
> > Base = A=0,C=1,G=2,T=3
> >
> > Cycle = cycles of solexa run = length of sequence (Problem with the cycle
> > numbers: for me they go from 0-9, 0-10, 0-10, 0-6 = 39 cycles, but we only
> > use 36!!!)
>
> ah ok. I think this is an issue with load_cycle. The pipeline loads
> reads in batches of 10 by default. During each batch it also pulls in the
> reference cycle images; that's putting the numbering out. I'll fix this
> but as a temporary workaround you should be able to change load_cycle to
> 37 and the offsets should be correct.

From looking at the offsets I have to believe that the last image is the reference image for the offsetmaps blocks following the first. I.e. there are four blocks of "offsetmaps". In the first block the first entry for each base is the reference (base=0,1,2,3 cycle=0). In the following three blocks (for a 36 cycle run) the last "offsetmap" block refers to the reference image (base=0,1,2,3 cycle=10,10,6).

Now I would expect the offset of the reference image to be zero (0), but that seems not to be the case for the last three "offsetmaps" blocks. Could you please elaborate on this?

Also, when are you planning to release a new version of the swift program with an updated xml output? Do you plan to provide a converter tool to change old xml output to new? As I am currently developing the image/cluster retrieval I would be interested in those tools because I don't want to redo everything in a week or two ;)

I am trying to do this in R. Do you by any chance already have some tools related to this?

Yet another note: do you know how to do the same thing with the Illumina image analysis, i.e. how can I get the actual raw data from the images for each cluster (image intensities)?

Thanks so much for your kind help,

Bernd
From: Bernd J. <ber...@pa...> - 2009-04-24 11:29:36
Nava,

I am trying to read in the run file using R (the xml package), using the following command:

l1xml=xmlParse("runreport.L1-1")

I get a lot of error messages, and the reason for them is that for the row entries the values are not enclosed in '"', e.g.

<row cycle=1 errors=0 />

instead of

<row cycle="1" errors="0" />

What do you use to read in the xml files?

Thanks,
Bernd

PS. Can you please add me to the swiftng-users list?
From: Nava W. <ne...@sg...> - 2009-04-20 13:50:36
|
> Seg-fault: > > ========== > > Here is our gcc version info: > > -bash-3.00$ gcc -v > > Reading specs from /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs > > Configured with: ../configure --prefix=/usr --mandir=/usr/share/man > --infodir=/usr/share/info --enable-shared --enable-threads=posix > --disable-checking --with-system-zlib --enable-__cxa_atexit > --disable-libunwind-exceptions --enable-java-awt=gtk > --host=x86_64-redhat-linux > > Thread model: posix > > gcc version 3.4.6 20060404 (Red Hat 3.4.6-9) ok, I've not been able to replicate the segfault on 4.2.3 but it also only finds one read on this cluster, so I'll dig further. > Image offsets: > > ============== > > I think there are few issues with the runreport files: > > The XML tags are somewhat obvious: > > Base = A=0,C=1,G=2,T=3 > > Cycle = cycles of solexa run = length of sequence (Problem the cycle > numbers: for me they go from 0-9, 0-10, 0-10, 0-6) = 39 cycles, but we only > use 36!!!) ah ok. I think this is an issue with load_cycle. The pipeline loads reads in batches of 10 by default. During each back it also pulls in the reference cycle images, that's putting the numbering out. I'll fix this but as a temporary workaround you should be able to change load_cycle to 37 and the offsets should be correct. > X,y = subtiles (Which corner is x=0; y=0 (upper left?)? It should be top left yes. > For me it seems to be a 30x 30 square of sub images (see your previous > answer below). > > There is no overlap/gap between the tiles. In the case the divisions > (image_width/subimages) don't end up in an integer value, what happens > (round/ceil)? IIRC it's floored > On a different note, in the paper you describe the reason for subimage > registration as: incorrect focusing, warping, of the flowcell due to > temperature variation. > > What kind of improvements did you see with this approach? A similar approach is taken in the GAPipeline. In the GAPipeline they calculate offsets for 125x125 pixel regions. 
They then place a linear regression through these points of calculate a scaling factor. I opted for simple X/Y offsets because the offset variation across the tile didn't look linear. As it appears you can calculate offsets accurately for 50x50 subregions it seemed to me that scaling wouldn't buy you much (we are only talking about a variation of 1 or 2 pixels across the image). So, I'd hope that Swift is able to align clusters more accurately than the GAPipeline. > Why did you choose 30x30 sub-images as the standard? Experimentation. This has suited the datasets I've run well, however may be dependent on cluster density. From the tile you sent the cluster density appears to be a little lower than I've seen before, you might get some benefit from using a smaller number of subimages. > I believe this information (and derived statistics of the variations) could > be very well used for QC purposes. Are you doing something like that? I think it would be interesting to develop Swift in to QC tool. Right now my main focus is on the analysis algorithms, I'm writing the report data as xml which hopefully makes it easy for others to parse. > I guess that is it for now. > > > > Thanks for your help. > > > > Bernd > > > > -----Original Message----- > From: Nava Whiteford [mailto:ne...@sg...] > Sent: Monday, April 20, 2009 12:10 AM > To: Bernd Jagla > Cc: 'Tom Skelly' > Subject: Re: segmentation fault in swift program > > > > Thanks received the download. I've ran it through on my laptop using the > > Intel C compiler and didn't get a segfault, however the Swift only found > > 1 cluster on the tile. I'll check the tile against gcc and let you know > > what happens. > > > > > If you look in the fastq files the last 2 values in the description > > > field give the X and Y coordinates of the cluster. This is the X/Y > > > position on the reference image (usually cycle 0, A image). > > > > > > Hmm... I guess I am missing the obvious... 
Aren't the other images > > > registered to that reference image and don't I need to transform those > image > > > to have the same coordinates???? Where do I get this information? > > > > Ah ok, I understand now. Yes, if you want to match the cluster positions > > back to one of the other images you'll need to apply a transformation. > > > > Swift uses a simple x/y offset. However the offset is not constant over > > the image. Each subregion (set by the parameters -correlation_subimages > > * --correlation_cc_subimage_multiplier and by default 30) gets a > > * different offset. > > > > The offsets are available in two places, firstly in the standard output > > after: > > > > Cycle: 0 base: 2 offsetmap: > > X MAP: > > > > The follow matrix gives the X offsets for the first cycle G image (bases > > are A=0,C=1,G=2,T=3). So if you wanted to find the correct position in > > the G image you'd need something like: > > > > g_image_position_x = g_image_position_x + > offset_matrix[cluster_x_position/(image_width/subimages)][cluster_y_position > /(image_height/subimages)] > > > > Similarly for the clusters y coordinate. > > > > In addition to being in the standard output the offsets are also > > available in the xml runreports under: <offsetmaps> hopefully the layout > > makes sense. > > > > btw, do my mind if I CC these mailings to the swift mailing list? It > > might be useful to others. > > > > > > > > > PPS. Any further suggestions on how to compare the two methods? > > > > > > Have you performed an alignment to determine the error rate? This would > > > probably be quite a good idea, just to make sure that real sequences are > > > being generated. The preprint describes the methods we used. > > > > > > Good point! Even though I don't think this is the optimal way to very it > it > > > certainly makes sense as a first approximation. And this information will > be > > > useful to identify potentially problematic clusters. 
(And my boss also > > > suggested this over lunch ;) ) > > > > > > > > > Thanks a lot, and sorry if some of the answers are in your paper as I > > > haven't had time to read it yet. We do so now ;) > > > > > > Thanks, > > > > > > Bernd > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > From: Nava Whiteford [mailto:ne...@sg...] > > > > Sent: Thursday, April 16, 2009 10:20 PM > > > > To: Bernd Jagla > > > > Cc: 'Tom Skelly' > > > > Subject: Re: segmentation fault in swift program > > > > > > > > > > > > > > > > Hi Bernd, > > > > > > > > > > > > > > > > Thanks for trying out Swift, we're keen on working with the community to > > > > > > > > develop Swift so feedback is always useful. > > > > > > > > > > > > > > > > 30x yield increases: > > > > > > > > > > > > > > > > 30x increase in yield seems too good to be true. :) What are the raw > > > > > > > > numbers? If the GAPipeline is producing a few thousand and Swift is > > > > > > > > producing 100k+ then this sounds reasonable. However if Swift is > > > > > > > > producing 1000k+ then there's probably a problem somewhere. > > > > > > > > > > > > > > > > Both the GAPipeline and Swift apply what is know as purity filtering. > > > > > > > > Purity filtering is a relatively coarse metric for throwing out bad > > > > > > > > data. They have both been parameterised to result in an error rate of > > > > > > > > around 1% on the total dataset (both GAPipline and Swift). It maybe that > > > > > > > > the GAPipeline is experiencing a catastrophic failure on some tiles, > > > > > > > > which Swift is able to recover from. > > > > > > > > > > > > > > > > Quality scores: > > > > > > > > > > > > > > > > Neither the GAPipeline nor Swift's quality scores are very good without > > > > > > > > calibration. If you look in the Swift QualityCalibration directory there > > > > > > > > is a very simple score shuffling calibrator I suggest you use if > > > > > > > > aligning data using MAQ. 
I've attached a preprint of the Swift paper,
> > > > which describes our quality calibration and may be of general interest.
> > > >
> > > > 4 hour run times:
> > > >
> > > > It may be that Swift is identifying many "optical duplicates" and then
> > > > filtering them out. You can try tweaking "threshold" (make it lower) and
> > > > "threshold_window" (make it higher). Also reduce "segment_cycles".
> > > >
> > > > If you have the output for one of these runs I can possibly give you
> > > > some other suggestions.
> > > >
> > > > Segmentation fault bug:
> > > >
> > > > Hmm, yes, as Tom said it seems to be a bug in the crosstalk correction. I
> > > > should make it fail more gracefully! :)
> > > >
> > > > If you have a copy of this image set you can send me, it would be very
> > > > useful in debugging. There's a script to grab a set of tile images here:
> > > >
> > > > http://linuxjunk.blogspot.com/2008/09/grab-set-of-tile-images-from.html
> > > >
> > > > Comparisons:
> > > >
> > > > Yes, I'm extremely interested in seeing the results of your comparisons.
> > > > Also check out the attached preprint, where we have some basic
> > > > comparisons of Swift against the GAPipeline. If you find datasets where
> > > > Swift performs badly, or have features you'd like added, please let me know.
> > > >
> > > > On Thu, Apr 16, 2009 at 06:19:29PM +0200, Bernd Jagla wrote:
> > > > >
> > > > > Thanks for the answer. What can I do to avoid this problem? I guess
> > > > > changing the code to account for this situation would be the best solution.
> > > > > Unfortunately I am not fluent in C++...
;)
> > > > >
> > > > > On a different note:
> > > > > I am comparing the output from swift and Firecrest currently and find that
> > > > > swift sometimes detects 30x more unique sequences than Firecrest... this
> > > > > makes me wonder about the quality scores and how to really compare the
> > > > > results. Maybe those sequences have been discarded by Firecrest for a
> > > > > reason??? I would like to see with my own eyes the clusters, hence my
> > > > > previous question about how to locate the clusters given a fastq file...
> > > > > From the documentation I don't really understand what to do in order to
> > > > > compare them.
> > > > >
> > > > > Have you done similar experiments?
> > > > > Do you have a more detailed description of how those scores are calculated?
> > > > > Do you have any suggestions on how to compare the two methods?
> > > > >
> > > > > Also, what are ways to speed up the image analysis? What parameters should I
> > > > > tweak?
> > > > > Sometimes the analysis for one tile takes more than 4 hours, which is too
> > > > > much for our environment...
> > > > >
> > > > > Thanks so much for your kind support.
> > > > >
> > > > > Best,
> > > > >
> > > > > Bernd
> > > > > PS. Please let me know if you are interested in the results of my
> > > > > comparisons...
> > > > >
> > > > > -----Original Message-----
> > > > > From: Tom Skelly [mailto:ts...@sa...]
> > > > > Sent: Thursday, April 16, 2009 4:06 PM
> > > > > To: Bernd Jagla
> > > > > Cc: ne...@sg...
> > > > > Subject: Re: segmentation fault in swift program
> > > > >
> > > > > I can see a lot of "Bin size: nan" in the output. There's a loop in
> > > > > CrossTalkCorrection that counts down current_num_bins, and divides by it
> > > > > to get the bin size. I'm guessing it's being counted down to zero, hence
> > > > > the nan.
> > > > >
> > > > > That's as far as I can take it, however, as I'm not familiar with that
> > > > > area of the code. I'm hoping Nava can take it from there.
> > > > >
> > > > > --TS
> > > > >
> > > > > Bernd Jagla wrote:
> > > > > > Hi Nava and Tom,
> > > > > >
> > > > > > First off, thanks for your swift program!!! It seems to be working much
> > > > > > better than the Illumina image analysis.
> > > > > >
> > > > > > I just discovered a potential problem where you might be able to help:
> > > > > > Occasionally I get a segmentation fault (see attached files). The files
> > > > > > were created using the following command:
> > > > > >
> > > > > > runswifttile /pasteur/solexa2/solexa_depot/090320_HWI-EAS285_0003/ 7 63
> > > > > > L7-63 > 63.out 2> 63.err
> > > > > >
> > > > > > If you need access to the images, please let me know where I can drop them.
> > > > > > I only get two such seg-faults within the current experiment (8 lanes).
> > > > > >
> > > > > > Please let me know if you know what I can do to solve this problem.
> > > > > >
> > > > > > Thanks a lot for your kind help.
> > > > > >
> > > > > > Bernd
> > > > > >
> > > > > > Bernd Jagla
> > > > > > Bioinformatician
> > > > > >
> > > > > > Institute Pasteur
> > > > > > Plate-forme puces a ADN
> > > > > > Genopole / Institut Pasteur
> > > > > > 28 rue du Docteur Roux
> > > > > > 75724 Paris Cedex 15
> > > > > > France
> > > > > >
> > > > > > <mailto:ber...@pa...> ber...@pa...
> > > > > >
> > > > > > tel:
> > > > > > <http://www.plaxo.com/click_to_call?lang=en&src=jj_signature&To=%2B33+%280%29+140+61+35+13&Email=ber...@ya...> +33 (0) 140 61 35 13
> > > > >
> > > > > --
> > > > > The Wellcome Trust Sanger Institute is operated by Genome Research
> > > > > Limited, a charity registered in England with number 1021457 and a
> > > > > company registered in England with number 2742969, whose registered
> > > > > office is 215 Euston Road, London, NW1 2BE.
> > > >
> > > > --
> > > > Nav
> > > >
> > > > Work: 01865 854873
> > > > Mob : 07518-358405
> > >
> > > --
> > > Nav
> > >
> > > Work: 01865 854873
> > > Mob : 07518-358405
> >
> > --
> > Nav
> >
> > Work: 01865 854873
> > Mob : 07518-358405

--
Nav

Work: 01865 854873
Mob : 07518-358405
|