You can load as many mev files as you want at one
time. Figure 4.1.1, page 10
of the manual has a view of the MeV file loader. The
top portion allows the selection
of multiple MeV files, while the lower is for annotation
file selection. The values
in the expression matrix after loading .mev format files
will be log2 ratios.
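For reference, a minimal sketch of what a log2 ratio is and how to anti-log it back to a fold change. This is only an illustration, not MeV's loader code; the intensities and the helper name log2_ratio are made up for the example.

```python
import math

def log2_ratio(cy5, cy3):
    """Return log2(Cy5/Cy3), or None when either intensity is missing or not positive."""
    if cy5 is None or cy3 is None or cy5 <= 0 or cy3 <= 0:
        return None
    return math.log2(cy5 / cy3)

# Hypothetical (Cy5, Cy3) intensities for three spots.
spots = [(2000.0, 1000.0), (500.0, 1000.0), (1100.0, 1000.0)]
ratios = [log2_ratio(cy5, cy3) for cy5, cy3 in spots]

# Anti-logging a log2 ratio recovers the fold change (Cy5/Cy3).
fold_changes = [2 ** r for r in ratios]

for (cy5, cy3), r, fc in zip(spots, ratios, fold_changes):
    print(f"Cy5={cy5:.0f} Cy3={cy3:.0f} log2 ratio={r:+.3f} fold change={fc:.2f}")
```

A two-fold increase gives +1, a two-fold decrease gives -1, and no change gives 0, which is why a one sample test against zero makes sense on this scale.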
You will need to construct a single .ann file that has rows
of annotation that correspond
to the rows in your mev expression
Note that once you have loaded all of your data as separate
mev files you can
use the File->Save Matrix option to save it as a TDMS
file. The TDMS does have
the advantage of not having to select files in a particular
order related to experimental
Log transformation is certainly essential and it's
important to do it if you are constructing
a TDMS file in Excel. My point was that you can skip
that step by loading multiple mev format
files. Saving the matrix as described above will
construct a TDMS for you.
Regarding your t-test question, all replicates are
considered if they are checked in
the one sample t-test module dialog box. If
data are missing for a particular row
in the matrix, no imputation is performed; that
gene simply has fewer replicates.
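To illustrate the idea, here is a rough sketch of a one sample t statistic that skips missing values instead of imputing them. It is a textbook calculation written for this example, not MeV's source code; the function name, the use of None for a missing value, and the data are assumptions.

```python
import math
from statistics import mean, stdev

def one_sample_t(values, mu=0.0):
    """Return (t, degrees of freedom) against mu, ignoring missing values (None).

    Returns None when fewer than two values are present or the values
    have zero spread, since the statistic is undefined in those cases.
    """
    present = [v for v in values if v is not None]
    n = len(present)
    if n < 2:
        return None
    sd = stdev(present)
    if sd == 0:
        return None
    t = (mean(present) - mu) / (sd / math.sqrt(n))
    return t, n - 1

# Hypothetical log2 ratios for two genes across five replicate arrays.
tight_gene = [0.12, 0.14, 0.13, 0.15, 0.14]   # small change, very consistent
noisy_gene = [1.5, -0.2, 2.1, None, 0.3]      # larger change, one missing value

t_tight, df_tight = one_sample_t(tight_gene)
t_noisy, df_noisy = one_sample_t(noisy_gene)
print(f"tight gene: t={t_tight:.2f}, df={df_tight}")
print(f"noisy gene: t={t_noisy:.2f}, df={df_noisy}")
```

The gene with a missing value is tested with one fewer degree of freedom, and the small but consistent change yields the larger t statistic.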
People often use the 'Percentage Cutoff' filter to
eliminate rows in the matrix
with few good values to ensure that a minimum number of
(Page 28 of the manual, menu options
Adjust Data->Data Filters->Percentage Cutoff
Filter, enter a percentage)
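As a rough picture of what such a filter does, here is a small sketch. It assumes a "good value" means a value that is present (not None) and that the cutoff is the minimum percentage of present values a row needs in order to be kept; the exact rule MeV applies is the one described in the manual and may differ.

```python
def percentage_cutoff(matrix, min_percent):
    """Keep rows in which at least min_percent of the values are present (not None)."""
    kept = []
    for row in matrix:
        if not row:
            continue  # skip empty rows
        valid = sum(1 for v in row if v is not None)
        if 100.0 * valid / len(row) >= min_percent:
            kept.append(row)
    return kept

# Hypothetical matrix: three genes (rows) by five arrays (columns).
matrix = [
    [0.9, 1.1, 1.0, 1.2, 0.8],       # 100% present
    [0.4, None, 0.5, None, 0.6],     # 60% present
    [None, None, 1.3, None, None],   # 20% present
]

filtered = percentage_cutoff(matrix, 60)
print(len(filtered), "of", len(matrix), "rows kept")
```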
One side note, most dialogs have a button for
information in the lower left corner (i).
Sometimes this can be
For the multi group case the ANOVA should be the
perfect tool. One point to
make is that ANOVA will create a set of significant
genes but that this group
will consist of genes that are significant for
different reasons. Some may be
up in group one, others may be under expressed in group
three... you get
the idea. To sort this out I suggest using the
'Construct Hierarchical Trees'
option at the bottom of the dialog (* Just choose to
cluster the significant cluster from ANOVA).
(You will see this option at the bottom of several
dialogs including the one for ANOVA)
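For orientation, a minimal sketch of the one-way ANOVA F statistic for a single gene measured in three groups of unequal size. This is the standard textbook formula, not MeV's implementation, and the function name and log2 ratios are invented for the example.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values.

    Missing values (None) are dropped from each group before the calculation.
    """
    groups = [[v for v in g if v is not None] for g in groups]
    all_values = [v for g in groups for v in g]
    k = len(groups)
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = k - 1
    df_within = len(all_values) - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical log2 ratios for one gene in three groups (5, 3 and 5 replicates).
young = [1.0, 1.2, 0.9, 1.1, 1.0]
middle = [0.5, 0.6, 0.4]
old = [0.1, 0.0, -0.1, 0.2, 0.1]

f_stat, df_b, df_w = one_way_anova_f([young, middle, old])
print(f"F={f_stat:.1f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F only says that at least one group mean differs from the others. It does not say which group or in which direction, which is why clustering the significant genes afterwards helps.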
This will organize your significant genes based on the
pattern of expression and
will help to segregate genes reacting in similar
I'm also not sure about ordering significant genes from
ANOVA since there
By the way, your area of research is fascinating.
Let us know if there are further questions.
Thank you for your most informative email. We are looking at gene
expression changes following LTP (long term potentiation; an
electrophysiological model for memory) across age. Briefly, one side of a rat
hippocampus is stimulated for LTP, and the opposite side stays as control. In a
single array, I hybridise tissue from a single animal (Stimulated vs
unstimulated, Cy5 vs Cy3) and so look at a fold change. I have 3 different age
groups (Young, middle-aged and old) and I have at the moment 5 biological
replicates each for young and old, and 3 biological replicates for middle-aged.
This is why I use the TDMS file format, so that I can load all the data at once.
If I were to use the mev files, then I'll only load individual arrays, will I
not? In this case, I am not sure how I can carry out statistics. Is there a way
of inputting mev files for all my arrays, then look at them altogether? I'd
prefer not to manually meddle with my data! But I thought I should log transform
my fold changes so the data would follow a normal distribution.
Following this, indeed I carry out a one-sample TTEST for each group.
Currently I have three lists of differentially expressed genes following LTP;
one for the young, one for middle aged and one for old. The next aim is to look
across those lists and see if there are changes across age. I am not sure if I
should instead perform an ANOVA for this. The gene list is then smaller.
If I perform a SAM, I am not getting any differentially expressed genes! So
I have stuck to TTEST. I am aware that depending on my p-value cut-off, there is
a chance of getting a percentage of genes that are randomly found to be
differentially expressed. With the ttest itself, does the program look at the
variability across the biological replicates of each gene, then calculate the
p-value? Are all the replicates taken into account, or does the program look to
see consistent changes in perhaps 3 out of 5 arrays?
I apologise for all these questions! I would like to go ahead and start
verifying some genes. And your suggestion sounds perfect, which is what I had
thought I should do: get a list of differentially expressed genes, and sort them
on fold change then look at the highest changed. However when I look across
groups, it becomes a bit more complicated.
Thanks so much for your email. Looking forward to your reply.
On 13/10/2006, at 2:58 AM, Braisted, John C. wrote:
The mev files are directly loadable into MeV if you would
prefer not to have to compute
the log ratios in Excel. An annotation
(.ann) file would be needed
to load related annotation. The data
directory contains an example of the annotation
file under 'TIGR_files' and there is a description
in the MeV manual appendix on
MeV simply uses the values in the TDMS file and does not
perform a log transformation
on the input data. I gather that since you are
focusing on a fold change within a
particular array that you have replicate hybridizations
of your experimental condition samples
against reference samples or some sort of control
condition sample and that you are using
one sample TTEST to find significant
If this is not the case please describe the design a bit
more since your options will
vary based on whether you have one, two, or multiple
It is possible and likely that some genes will be
statistically significant yet will have a
mean fold change that is not far from 1.
Roughly speaking, significance is a function
of mean difference over a measure of the variability of
It can happen that a gene is not highly over expressed or
under expressed yet the measurements
are so tight that the conclusion is made that there is a
significant change given the
confidence in the mean value (low replicate
variability). This can happen even given
biological variation when thousands of genes are
I think it is valid to use t-test or SAM to find
statistically significant genes and report them
all or have them all ready in a supplemental table and
then filter the list to focus on those
genes that have more biologically significant changes
(mean fold change is larger).
*The fold change criterion or cutoff is arbitrary, so
I think reporting all statistically significant genes
sorted by mean fold change would be a good
compromise. That way you capture all
significant genes but the focus can be on those that were
up or down regulated by
the greatest magnitude fold change.
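A small sketch of that reporting approach: keep every statistically significant gene, then sort by the magnitude of the mean log2 ratio so the largest changes come first. The gene names, values, p-value cutoff and function name are hypothetical.

```python
def rank_significant(results, alpha):
    """Return significant genes sorted by absolute mean log2 ratio, largest first.

    Each result is a tuple of (gene id, mean log2 ratio, p-value).
    """
    significant = [r for r in results if r[2] <= alpha]
    return sorted(significant, key=lambda r: abs(r[1]), reverse=True)

# Hypothetical test output for four genes.
results = [
    ("geneA", 0.14, 0.0001),   # significant, but only about 1.1-fold
    ("geneB", -1.8, 0.004),    # significant, strongly down
    ("geneC", 2.3, 0.20),      # large change, not significant
    ("geneD", 0.9, 0.008),     # significant, moderately up
]

ranked = rank_significant(results, alpha=0.01)
for gene, mean_log2, p in ranked:
    print(f"{gene}: mean log2 ratio={mean_log2:+.2f}, "
          f"fold change={2 ** mean_log2:.2f}, p={p}")
```

Sorting on the absolute value keeps strongly down regulated genes next to strongly up regulated ones, and nothing significant is dropped from the table.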
**This email is going out to others in
the MeV development group in
case there are other
opinions or ideas to add. Please let us know if
there are additional questions.
Thanks for using MeV and the TM4
The Institute for Genomic Research
I have used MIDAS to carry out some normalisations on my microarray data.
The result is an MDS.mev file, which contains normalised data as specified in
MIDAS. Then I open these MDS.mev files for each of my arrays in Excel, and
work out the Cy5/Cy3 ratio, log2 transform this ratio, and make a file
containing my log transformed data for each of my arrays. Then I load this as
a TDMS file in MeV, and carry out my analyses. My question is, do I need to
manually log transform my data following MIDAS normalisation, or does MeV
automatically take care of this? I am generally finding that my expression
ratios are quite low when I look at differentially expressed genes from MeV,
and I suspect that perhaps my data is being log transformed twice. One of my top
differentially expressed genes (with a very small p-value) has a dye
ratio of only 1.1-fold (after I anti-log it). I would appreciate your thoughts on
Carthika Luxmanan, BSc(Hons)
Dept. Anatomy & Structural Biology
University of Otago
Office: +64 3 479 5182
Lab: +64 3 479