From: Braisted, J. C. <bra...@jc...> - 2007-08-28 14:35:01
Attachments:
NonpaR_manual_entry.pdf
|
Hi John and others,

I've developed a nonparametric statistical module that includes:

- Wilcoxon Rank Sum
- Kruskal-Wallis Test
- Mack-Skillings (a generalization of the Friedman 2-way test that handles replication in cells). The MS test is probably the most important novelty since it's not available in R (I think) and it handles a common task.
- Fisher Exact (for special cases where data values are more like A/P calls, as in CGH)

These tests fall under one module. Originally the plan was to use R, but that really wasn't needed and it seemed like it would require extra effort to deploy with dependencies on R and Rserve. I think it was your advice at our last meeting to just implement the tests without R, and while R worked fine with MeV in my tests, the deployment complication outweighed the cost of developing without R.

In addition to the module, I've created a wizard dialog system that can control the process of parameter collection for stat methods. This eliminates the need for tabbed panes, and the size of the dialogs is constrained. This new stat wizard system is only utilized in my NonpaR module but it can be used by other methods in the future.

I have a manual entry finished. The last step is to build some help pages. The NonpaR module results are verified against R and also against examples from the Hollander and Wolfe book on nonparametric methods. The module is currently being used by a couple of groups internal to JCVI that are part of the PFGRC.

The manual entry pdf that I've attached will need a bit of formatting but it gives an overview of the module and also the look of the dialog wizard system (at least as rendered by Java 1.4.2).

John

________________________________

From: John Quackenbush [mailto:jo...@ji...]
Sent: Tuesday, August 28, 2007 9:31 AM
To: Braisted, John C.
Subject: Re: loading GEO format files

John,

It would be good to know about which tests you are working on since we are doing some parallel work.
Do you have a list you could send?

JQ

Braisted, John C. wrote:

Hi Steve,

I just saw JQ's response as I was drafting mine... MeV is still being actively developed. The Dana-Farber Cancer Institute (DFCI), affiliated with Harvard, has a very large and capable MeV/TM4 development team with diverse skills. The vast majority of new work is coming from the DFCI group headed by John Quackenbush. There are developers at the University of Washington in Seattle who have added many important features and modules over several years. The group at UW is headed by Roger Bumgarner and, while I haven't been in touch with them for several months, I'm pretty sure they are going strong.

The JCVI team (formerly we were TIGR) has development goals that are a bit more aligned with supporting specific research needs and less focused on general software development (like features and under-the-hood details). One module close to release is a collection of nonparametric statistical tests.

John

-----Original Message-----
From: Steve Taylor [mailto:st...@mo...]
Sent: Tuesday, August 28, 2007 4:18 AM
To: Braisted, John C.
Cc: mev; me...@ji...
Subject: Re: loading GEO format files

Hi John,

Thanks for looking at this. I look forward to your reply. Is MeV still actively being developed BTW?

Steve

I took a look at the file and came across the same error. The code doesn't seem to handle the format of your file because it looks for certain features in the file that can indicate where a data matrix starts and stops, but your file seems to deviate from the sample GEO file enough to break the loading process.

The GEO file loader was developed at Harvard so I'm not an expert on the file format or the loader. I think the loader can be made more robust by testing with files like yours, but the update of the loader isn't in my current domain of work.
I've cc'ed the group at Harvard in case they can try to tackle the problem, but I don't know their priorities and so I can't say whether it will be addressed. It's not very complex but it's a bit of work to make the changes. The group up there at Harvard may also have advice about either modification of the file or whether another format can be used from GEO.

============

I downloaded the matrix file, which is supposed to be a matrix of values where rows are spots or features and columns are separate hybridization results. This normally can be easily modified to load as a TDMS file in MeV. It looks like there may have been a formatting issue with the matrix file submission to GEO where array results were sort of concatenated or stacked. There are about 400,000 rows and I realize it's a SNP study, but I'm not sure whether it's a huge tiling array or whether the row count is correct.

Need to use WordPad (not Excel) to modify the file into TDMS format: You would need to remove the header section of the file except for the last header row with column ids that is just above the 'matrix' of values. It's pretty sparse. Then go to the last row and remove the last line labeled '!....'.

The file is huge and will load into MeV, but I suspect it's sort of a stacking of array results rather than a properly formed data matrix. The row ids are not unique, meaning that either there is a lot of replication or that the results are stacked. The number of columns in the matrix is consistent, but most rows only contain data for one hyb; that's not a strict rule, as some rows have several values.
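
The manual trimming described above (keep only the column-id row and the values, drop the header section and the trailing '!' line) could also be scripted instead of done in WordPad. What follows is only a minimal sketch, not part of MeV or its GEO loader. It assumes that every metadata line in the downloaded matrix file starts with '!' and that the table is tab-delimited with the row id in the first column; the function names are hypothetical. It also counts repeated row ids, which is one quick way to check the suspicion that array results were stacked.

```python
import csv
from collections import Counter


def strip_geo_metadata(lines):
    """Drop blank lines and '!' metadata lines (assumed GEO convention);
    what remains is the column-id row followed by the data rows."""
    return [ln for ln in lines if ln.strip() and not ln.startswith("!")]


def duplicated_row_ids(table_lines):
    """Map each row id that occurs more than once to its count.
    The first line is treated as the column-id header and skipped."""
    reader = csv.reader(table_lines[1:], delimiter="\t")
    counts = Counter(row[0] for row in reader if row)
    return {row_id: n for row_id, n in counts.items() if n > 1}


def convert(src_path, dst_path):
    """Write the trimmed table to dst_path and return the duplicated row ids."""
    with open(src_path, encoding="utf-8", errors="replace") as src:
        table = strip_geo_metadata(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(table)
    return duplicated_row_ids(table)
```

Calling `convert("matrix.txt", "matrix_trimmed.txt")` (both file names are placeholders) would write the trimmed table and return the repeated ids. Whether MeV then accepts the output as TDMS still depends on the column layout it expects, so the result should be checked by loading it.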
You can try to load the matrix file, but unless the strange format (sparse matrix with 400K rows) seems to make sense given the experiment, it might be better to either see if the GEO loader can be re-worked or contact GEO or the authors for a well-formatted matrix of values.

John Braisted

John Braisted
Software Engineer II
Pathogen Functional Genomics Resource Center (PFGRC)
J. Craig Venter Institute
9704 Medical Center Drive
Rockville, MD 20850

-----Original Message-----
From: Steve Taylor [mailto:st...@mo...]
Sent: Friday, August 24, 2007 5:05 AM
To: mev
Subject: loading GEO format files

Hi,

I was trying to load a GEO SOFT format file in TMEV4 on Windows XP SP2 (for example the one in ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SOFT/by_series/GSE5291/). I uncompressed it and renamed the extension at the end to .txt, but when I tried to load it, it didn't show that any columns had loaded (see attached screenshot).

Has the SOFT format changed or does this feature just not work?

Thanks for any help,

Steve

------------------------------------------------------------------
Head of Computational Biology Research Group
Medical Sciences Division
Weatherall Institute of Molecular Medicine/Sir William Dunn School
Oxford University
Tel: +44 (0)1865 (2)22640 (WIMM - Monday to Wednesday)
Tel: +44 (0)1865 (2)85732 (Dunn - Thursday to Friday)
Web: http://www.compbio.ox.ac.uk
|
From: Sinha, R. D. <Rak...@df...> - 2007-08-28 15:11:48
|
Hello All,

Along with the Bayesian Network analysis tool that JohnQ already mentioned, the pipeline also has the following:

1. New CGH module called ChARM with new viewers.
2. A new Data Annotation model for MeV.
3. The File Loaders are being re-worked to accommodate the new Model and make them more user-friendly.

Raktim

-----Original Message-----
From: Braisted, John C. [mailto:bra...@jc...]
Sent: Tuesday, August 28, 2007 10:35 AM
To: Quackenbush, John
Cc: mev; me...@ji...; mev...@li...; Saeed, Alexander I.
Subject: RE: loading GEO format files
|
From: John Q. <jo...@ji...> - 2007-08-29 13:35:18
|
This sounds great John. Thanks for the update. Maybe we should have a conference call in a few weeks to touch base a bit and see when we want to have a next release.

JQ

_______________________________________________
mev-tm4-devel mailing list
mev...@li...
https://lists.sourceforge.net/lists/listinfo/mev-tm4-devel
|