This sounds great John. Thanks for the update.

Maybe we should have a conference call in a few weeks to touch base and see when we want to have the next release.

JQ

Braisted, John C. wrote:
Hi John and others,

I've developed a nonparametric statistical module that includes:

-Wilcoxon Rank Sum
-Kruskal-Wallis Test
-Mack-Skillings (a generalization of the Friedman 2-way test that
handles replication in cells)
(the MS test is probably the most important novelty since it's not
available in R (I think) and it handles a common task)
-Fisher Exact (for special cases where data values are more like A/P
calls as in CGH)

These tests fall under one module.  Originally the plan was to use R, but
that really wasn't needed and it seemed like it would require extra
effort to deploy with dependencies on R and RServe.  I think it was your
advice at our last meeting to just implement the tests without R, and
while R worked fine with MeV in my tests, the deployment complication
outweighed the cost of developing without R.

In addition to the module, I've created a wizard dialog system that can
control the process of parameter collection for stat methods.  This
eliminates the need for tabbed panes and keeps the size of dialogs
constrained.  This new stat wizard system is only utilized in my NonpaR
module but it can be used by other methods in the future.

I have a manual entry finished.  The last step is to build some help
pages.  The NonpaR module results are verified against R and also
examples from the Hollander and Wolfe book on nonparametric methods.
The module is currently being used by a couple of groups internal to
JCVI that are part of the PFGRC.

The manual entry pdf that I've attached will need a bit of formatting
but it gives an overview of the module and also the look of the dialog
wizard system (at least as rendered by Java 1.4.2). 

John


________________________________

From: John Quackenbush [mailto:johnq@jimmy.harvard.edu] 
Sent: Tuesday, August 28, 2007 9:31 AM
To: Braisted, John C.
Subject: Re: loading GEO format files


John,

It would be good to know about which tests you are working on since we
are doing some parallel work. Do you have a list you could send?

JQ

Braisted, John C. wrote: 

	Hi Steve,
	
	I just saw JQ's response as I was drafting mine... MeV is still being
	actively developed.  The Dana Farber Cancer Institute (DFCI) affiliated
	with Harvard has a very large and capable MeV/TM4 development team with
	diverse skills.  The vast majority of new work is coming from the DFCI
	group headed by John Quackenbush.  There are developers at the
	University of Washington in Seattle that have added many important
	features and modules over several years.  The group at UW is headed by
	Roger Bumgarner and while I haven't been in touch with them for several
	months I'm pretty sure they are going strong.
	
	The JCVI team (formerly TIGR) has development goals that are a
	bit more aligned with supporting specific research needs and less
	focused on general software development (like features and
	under-the-hood details).  One module close to release is a collection of
	nonparametric statistical tests.
	
	John
	
	-----Original Message-----
	From: Steve Taylor [mailto:staylor@molbiol.ox.ac.uk] 
	Sent: Tuesday, August 28, 2007 4:18 AM
	To: Braisted, John C.
	Cc: mev; mev@jimmy.harvard.edu
	Subject: Re: loading GEO format files
	
	Hi John,
	
	Thanks for looking at this. I look forward to your reply. Is MeV still
	actively being developed BTW?
	
	Steve
	
	  

		I took a look at the file and came across the same error. The code
		doesn't seem to handle the format of your file because it looks for
		certain features in the file that can indicate where a data matrix
		starts and stops but your file seems to deviate from the sample GEO
		file enough to break the loading process.
		
		The GEO file loader was developed at Harvard so I'm not an expert on
		the file format or the loader.  I think the loader can be made more
		robust by testing with files like yours but the update of the loader
		isn't in my current domain of work.  I've cc'ed the group at Harvard
		in case they can try to tackle the problem but I don't know their
		priorities and so I can't say whether it will be addressed.  It's not
		very complex but it's a bit of work to make the changes.  The group up
		there at Harvard may also have advice about either modification of the
		file or whether another format can be used from GEO.
		
		============
		
		I downloaded the matrix file which is supposed to be a matrix of
		values where rows are spots or features and columns are separate
		hybridization results.  This normally can be easily modified to load
		as a TDMS file in MeV.  It looks like there may have been a
		formatting issue with the matrix file submission to GEO where array
		results were sort of concatenated or stacked.  There are about 400,000
		rows and I realize it's a SNP study but I'm not sure whether it's a
		huge tiling array or whether the row count is correct.
		
		You need to use WordPad (not Excel) to modify the file into TDMS
		format.  You would need to remove the header section of the file
		except for the last header row with column ids that is just above the
		'matrix' of values.  It's pretty sparse.  Then go to the last row and
		remove the last line labeled '!....'.
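
For anyone who would rather script that edit than do it by hand, the header stripping described above might be sketched roughly as follows in Python. This is only an illustration and not part of MeV: the function name `strip_geo_header` and the sample lines are made up, and it assumes that every header and trailer line in the matrix file begins with '!' while the column-id row and the data rows do not.

```python
import io


def strip_geo_header(lines):
    """Yield the column-id row and data rows, skipping '!' lines and blanks.

    Assumption: header/trailer lines all start with '!'; nothing else does.
    """
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue
        yield line


# Hypothetical miniature of a matrix file (tab-delimited, sparse values).
sample = io.StringIO(
    '!Series_title\t"example"\n'
    "\n"
    "!some_header_marker\n"
    "ID_REF\tGSM1\tGSM2\n"
    "probe_a\t0.5\t\n"
    "probe_b\t\t1.2\n"
    "!some_trailer_marker\n"
)

cleaned = list(strip_geo_header(sample))
```

With a real file, the same generator could be fed an open file handle and the result written out with `writelines`; whether MeV accepts the output as TDMS would still need to be checked.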
		
		The file is huge and will load into MeV but I suspect it's sort of a
		stacking of array results rather than a properly formed data matrix.
		The row ids are not unique, meaning that either there is a lot of
		replication or that the results are stacked.  The number of columns in
		the matrix is consistent and most rows only contain data for one hyb,
		but that's not a strict rule as some rows have several values.
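
The checks mentioned above (non-unique row ids, and how many hybs actually have a value in each row) could be reproduced with a short script along these lines. It is a sketch under assumptions: the input is taken to be tab-delimited with the column-id row first, and `summarize_matrix` and the toy rows are hypothetical names and data, not anything from MeV or GEO.

```python
from collections import Counter


def summarize_matrix(rows):
    """Return (number of hyb columns, duplicated row ids, filled-cell histogram).

    rows: iterable of tab-delimited lines; the first line is the column-id row.
    """
    it = iter(rows)
    header = next(it).rstrip("\n").split("\t")
    id_counts = Counter()      # how often each row id occurs
    filled_counts = Counter()  # rows grouped by number of non-empty values
    for line in it:
        fields = line.rstrip("\n").split("\t")
        id_counts[fields[0]] += 1
        filled = sum(1 for value in fields[1:] if value.strip())
        filled_counts[filled] += 1
    duplicated = {rid: n for rid, n in id_counts.items() if n > 1}
    return len(header) - 1, duplicated, dict(filled_counts)


# Toy data: two stacked rows for p1, one fully populated row for p2.
toy = [
    "ID_REF\tA\tB\tC\n",
    "p1\t1.0\t\t\n",
    "p1\t\t2.0\t\n",
    "p2\t1\t2\t3\n",
]
n_hybs, duplicated, filled_hist = summarize_matrix(toy)
```

A histogram dominated by rows with a single filled value, together with many duplicated ids, would be consistent with the stacked layout suspected above.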
		
		You can try to load the matrix file but unless the strange format
		(sparse matrix with 400K rows) seems to make sense given the
		experiment it might be better to either see if a GEO loader can be
		re-worked or if you can contact GEO or the authors for a
		well-formatted matrix of values.
		
		John Braisted
		
		John Braisted
		Software Engineer II
		Pathogen Functional Genomics Resource Center (PFGRC) J. Craig Venter Institute
		9704 Medical Center Drive
		Rockville, MD 20850
		
		
		-----Original Message-----
		From: Steve Taylor [mailto:staylor@molbiol.ox.ac.uk]
		Sent: Friday, August 24, 2007 5:05 AM
		To: mev
		Subject: loading GEO format files
		
		Hi,
		
		I was trying to load a GEO SOFT format file in TMEV4 on Windows XP SP2
		(for example the one in
		ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SOFT/by_series/GSE5291/). I
		uncompressed it and renamed the extension at the end to .txt but when
		I tried to load it, it didn't show that any columns had loaded (see
		attached screenshot).
		
		Has the SOFT format changed or does this feature just not work?
		
		Thanks for any help,
		
		Steve
	
		------------------------------------------------------------------
		Head of Computational Biology Research Group Medical Sciences Division
		Weatherall Institute of Molecular Medicine/Sir William Dunn School
		Oxford University
		Tel: +44 (0)1865 (2)22640 (WIMM - Monday to Wednesday)
		Tel: +44 (0)1865 (2)85732 (Dunn - Thursday to Friday)
		Web: http://www.compbio.ox.ac.uk
		
		    

  


_______________________________________________ mev-tm4-devel mailing list mev-tm4-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/mev-tm4-devel