Hi Carthika,
 
You can load as many mev files as you want at one time.  Figure 4.1.1, page 10
of the manual has a view of the MeV file loader.  The top portion allows the selection
of multiple MeV files, while the lower is for annotation file selection.  The values
in the expression matrix after loading mev format files will be log 2 ratios.
 
You will need to construct a single .ann file that has rows of annotation that correspond
to the rows in your mev expression file.
 
Note that once you have loaded all of your data as separate mev files you can
use the File->Save Matrix option to save it as a TDMS file.  A TDMS file has
the advantage that you do not have to select files in a particular order related
to experimental group.
 
Log transformation is certainly essential and it's important to do it if you are constructing
a TDMS file in Excel.  My point was that you can skip that step by loading multiple mev format
files.  Saving the matrix as described above will construct a TDMS for you.
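
As a rough illustration of the transformation being discussed (plain Python, not MeV or MIDAS code; the gene names and intensity values are made up), the log2 ratio for each spot and its back-transformation to a fold change can be sketched as:

```python
import math

def log2_ratio(cy5, cy3):
    """Return log2(Cy5/Cy3), or None when either intensity is missing or not positive."""
    if cy5 is None or cy3 is None or cy5 <= 0 or cy3 <= 0:
        return None
    return math.log2(cy5 / cy3)

# Hypothetical normalized intensities for one array: gene -> (Cy5, Cy3).
spots = {
    "geneA": (2000.0, 1000.0),
    "geneB": (500.0, 1000.0),
    "geneC": (0.0, 800.0),   # unusable spot, treated as missing
}

ratios = {gene: log2_ratio(cy5, cy3) for gene, (cy5, cy3) in spots.items()}

# Back-transform a log2 ratio to a fold change with 2 ** value.
fold = {gene: (None if r is None else 2 ** r) for gene, r in ratios.items()}

for gene in spots:
    print(gene, ratios[gene], fold[gene])
```

A value that has been log transformed twice will not return a sensible fold change after a single anti-log, which is one quick way to check a hand-built file.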
 
===
 
Regarding your ttest question, all replicates are considered if they are checked in
the one sample ttest module dialog box.  If data are missing for a particular row
in the matrix, no imputation is done; that gene simply has fewer replicates.
People often use the 'Percentage Cutoff' filter to eliminate rows in the matrix
with few good values to ensure that a minimum number of measurements exist.
(Page 28 of the manual, menu options are:
Adjust Data->Data Filters->Percentage Cutoff Filter,  enter a percentage)
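
The idea behind such a filter can be sketched in a few lines of Python. This is only an illustration, not MeV's implementation: the function names, the use of None for a missing value, and the choice of "greater than or equal to" for the cutoff are all assumptions.

```python
def percent_valid(row):
    """Percentage of non-missing values in one row of the expression matrix."""
    valid = [v for v in row if v is not None]
    return 100.0 * len(valid) / len(row)

def percentage_cutoff(matrix, cutoff_percent):
    """Keep rows whose share of valid values meets the cutoff (assumed inclusive)."""
    return {
        gene: row
        for gene, row in matrix.items()
        if percent_valid(row) >= cutoff_percent
    }

# Hypothetical log2 ratios for five replicate arrays; None marks a missing spot.
matrix = {
    "geneA": [0.9, 1.1, 1.0, 0.8, 1.2],      # 5 of 5 valid
    "geneB": [0.5, None, None, 0.7, None],   # 2 of 5 valid
    "geneC": [1.4, 1.6, None, 1.5, 1.3],     # 4 of 5 valid
}

kept = percentage_cutoff(matrix, 60.0)

for gene, row in kept.items():
    n = sum(v is not None for v in row)
    print(gene, "replicates available to the test:", n)
```

With a 60 percent cutoff, geneB (two good values out of five) is dropped, while geneC stays and is simply tested with four replicates instead of five.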
 
One side note, most dialogs have a button for information in the lower left corner (i).
Sometimes this can be helpful.
 
===
 
For the multi group case the ANOVA should be the perfect tool.  One point to
make is that ANOVA will create a set of significant genes but that this group
will consist of genes that are significant for different reasons.  Some may be
up in group one, others may be under expressed in group three.... you get
the idea.  To sort this out I suggest using the 'Construct Hierarchical Trees'
option at the bottom of the dialog (* Just choose to cluster the significant cluster from ANOVA).
(You will see this option at the bottom of several dialogs including the one for ANOVA)
This will organize your significant genes based on the pattern of expression and
will help to segregate genes reacting in similar ways.
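
For reference, the per-gene calculation behind a one-way ANOVA can be sketched as below. This is a textbook F statistic in plain Python, not MeV's code; the log2 ratios are invented, and only the group sizes (5 young, 3 middle-aged, 5 old) are taken from the design described in this thread.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values.

    Assumes every group has at least one value and that the within-group
    variation is not exactly zero.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)
    n = len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical log2 ratios for one gene across the three age groups.
young = [1.2, 1.0, 1.4, 1.1, 1.3]
middle = [0.6, 0.8, 0.7]
old = [0.1, -0.1, 0.2, 0.0, 0.1]

f_stat, df_b, df_w = one_way_anova_f([young, middle, old])
print("F =", round(f_stat, 2), "with df", (df_b, df_w))
```

A large F only says that at least one group mean differs from the others; it does not say which group or in which direction, which is why clustering the significant genes afterwards is useful.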
 
I'm also not sure about ordering significant genes from ANOVA since there
are multiple comparisons.
 
By the way, your area of research is fascinating.  Let us know if there are further questions.
 
John
 
 
 
 


From: Carthika Luxmanan [mailto:carthika.luxmanan@anatomy.otago.ac.nz]
Sent: Thursday, October 12, 2006 6:15 PM
To: Braisted, John C.
Subject: Re: Log transformations...

Dear John

Thank you for your most informative email. We are looking at gene expression changes following LTP (long term potentiation; an electrophysiological model for memory) across age. Briefly, one side of a rat hippocampus is stimulated for LTP, and the opposite side stays as control. In a single array, I hybridise tissue from a single animal (Stimulated vs unstimulated, Cy5 vs Cy3) and so look at a fold change. I have 3 different age groups (Young, middle-aged and old) and I have at the moment 5 biological replicates each for young and old, and 3 biological replicates for middle-aged. This is why I use the TDMS file format, so that I can load all the data at once. If I were to use the mev files, then I'll only load individual arrays, will I not? In this case, I am not sure how I can carry out statistics. Is there a way of inputting mev files for all my arrays, then look at them altogether? I'd prefer not to manually meddle with my data! But I thought I should log transform my fold changes so the data would follow a normal distribution.

Following this, indeed I carry out a one-sample TTEST for each group. Currently I have three lists of differentially expressed genes following LTP; one for the young, one for middle aged and one for old. The next aim is to look across those lists and see if there are changes across age. I am not sure if I should instead perform an ANOVA for this. The gene list is then smaller.

If I perform a SAM, I am not getting any differentially expressed genes! So I have stuck to TTEST. I am aware that depending on my p-value cut-off, there is a chance of getting a percentage of genes that are randomly found to be differentially expressed. With the ttest itself, does the program look at the variability across the biological replicates of each gene, then calculate the p-value? Are all the replicates taken into account, or does the program look to see consistent changes in perhaps 3 out of 5 arrays?

I apologise for all these questions! I would like to go ahead and start verifying some genes. And your suggestion sounds perfect, which is what I had thought I should do-get a list of differentially expressed genes, and sort them on fold change then look at the highest changed. However when I look across groups, it becomes a bit more complicated.

Thanks so much for your email. Looking forward to your reply.

Best wishes
Carthika

On 13/10/2006, at 2:58 AM, Braisted, John C. wrote:

Hi Carthika,
 
The mev files are directly loadable into MeV if you would prefer not to have to do
the log ratio calculations in Excel.  An annotation (.ann) file would be needed
to load related annotation.  The data directory contains an example of the annotation
file under 'TIGR_files' and there is a description in the MeV manual appendix on
file formats.
 
====
 
MeV simply uses the values in the TDMS file and does not perform a log transformation
on the input data.  I gather that since you are focusing on a fold change within a
particular array that you have replicate hybridizations of your experimental condition samples
against reference samples or some sort of control condition sample and that you are using
one sample TTEST to find significant genes.
 
If this is not the case please describe the design a bit more since your options will
vary based on whether you have one, two, or multiple experimental groups.
 
It is possible and likely that some genes will be statistically significant yet will have a
mean fold change that is not far from 1.   Roughly speaking, significance is a function
of mean difference over a measure of the variability of the measurements.
It can happen that a gene is not highly over expressed or under expressed yet the measurements
are so tight that the conclusion is made that there is a significant change given the
confidence in the mean value (low replicate variability).  This can happen even given
biological variation when thousands of genes are considered.
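
A small numerical sketch of that point, using the standard one-sample t statistic in plain Python (the replicate values are made up, and this is not MeV's code):

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, mu=0.0):
    """One-sample t statistic: (mean - mu) / (sample sd / sqrt(n))."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / sqrt(n))

# Hypothetical log2 ratios from five replicate arrays.
tight = [0.12, 0.14, 0.13, 0.15, 0.13]   # small change (about 1.1 fold), very consistent
noisy = [1.9, -0.4, 2.6, 0.1, 0.8]       # larger change (2 fold on average), very variable

t_tight = one_sample_t(tight)
t_noisy = one_sample_t(noisy)

print("tight: mean fold", round(2 ** mean(tight), 2), "t =", round(t_tight, 1))
print("noisy: mean fold", round(2 ** mean(noisy), 2), "t =", round(t_noisy, 1))
```

The tight gene gets the much larger t statistic, and therefore the smaller p-value, even though its mean fold change is far closer to 1.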
 
I think it is valid to use t-test or SAM to find statistically significant genes and report them
all or have them all ready in a supplemental table and then filter the list to focus on those
genes that have more biologically significant changes (mean fold change is larger).
*The fold change criteria or cutoff is arbitrary so I think reporting all statistically significant genes
sorted by mean fold change would be a good compromise.  That way you capture all
significant genes but the focus can be on those that were up or down regulated by
the greatest magnitude fold change.
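
One way to produce such a list outside MeV, sketched in plain Python with invented gene names and values, is to rank the significant genes by the absolute value of their mean log2 ratio:

```python
from statistics import mean

# Hypothetical genes that already passed the significance test,
# each with its replicate log2 ratios.
significant = {
    "geneA": [0.13, 0.14, 0.12],
    "geneB": [-1.5, -1.2, -1.8],
    "geneC": [0.9, 1.1, 1.0],
}

# Rank by magnitude so strongly down-regulated genes are not pushed to the bottom.
ranked = sorted(significant, key=lambda g: abs(mean(significant[g])), reverse=True)

for gene in ranked:
    m = mean(significant[gene])
    print(gene, "mean log2 ratio", round(m, 2), "fold", round(2 ** m, 2))
```

Sorting on the absolute value keeps up- and down-regulated genes together at the top of the table; the sign of the mean log2 ratio still shows the direction.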
 
**This email is going out to others in the MeV development group in case there are other
opinions or ideas to add.  Please let us know if there are additional questions.
Thanks for using MeV and the TM4 suite.
 
Best Regards,
 
John Braisted
The Institute for Genomic Research
 
 
 
 
 


From: Carthika Luxmanan [mailto:carthika.luxmanan@anatomy.otago.ac.nz]
Sent: Wednesday, October 11, 2006 9:54 PM
To: mev
Subject: Log transformations...

Hi there.

I have used MIDAS to carry out some normalisations on my microarray data. The result is an MDS.mev file, which contains normalised data as specified in MIDAS. Then I open these MDS.mev files for each of my arrays in Excel, and work out the Cy5/Cy3 ratio, log2 transform this ratio, and make a file containing my log transformed data for each of my arrays. Then I load this as a TDMS file in MeV, and carry out my analyses. My question is, do I need to manually log transform my data following MIDAS normalisation, or does MeV automatically take care of this? I am generally finding that my expression ratios are quite low when I look at differentially expressed genes from MeV, and I suspect that perhaps my data is being log transformed twice. One of my top differentially expressed genes (with a very small p value) only has a dye ratio of 1.1 fold (after I anti-log it). I would appreciate your thoughts on this.

Thanks heaps
Carthika


________________________________________________________
Carthika Luxmanan, BSc(Hons)
Dept. Anatomy & Structural Biology
University of Otago
Dunedin
New Zealand
Office: +64 3 479 5182
Lab: +64 3 479 5792