From: Braisted, J. C. <bra...@ti...> - 2006-10-12 13:59:02
|
Hi Carthika,

The mev files are directly loadable into MeV if you would prefer not to have to compute the log ratios in Excel. An annotation (.ann) file would be needed to load related annotation. The data directory has an example of the annotation file under 'TIGR_files', and there is a description in the MeV manual appendix on file formats.

====

MeV simply uses the values in the TDMS file and does not perform a log transformation on the input data.

I gather that, since you are focusing on a fold change within a particular array, you have replicate hybridizations of your experimental condition samples against reference samples or some sort of control condition sample, and that you are using a one-sample TTEST to find significant genes.

If this is not the case, please describe the design a bit more, since your options will vary based on whether you have one, two, or multiple experimental groups.

It is possible, and likely, that some genes will be statistically significant yet have a mean fold change that is not far from 1. Roughly speaking, significance is a function of the mean difference over a measure of the variability of the measurements. It can happen that a gene is not highly over-expressed or under-expressed, yet the measurements are so tight (low replicate variability) that the change is judged significant given the confidence in the mean value. This can happen even given biological variation when thousands of genes are considered.

I think it is valid to use the t-test or SAM to find statistically significant genes, report them all or have them all ready in a supplemental table, and then filter the list to focus on those genes that have more biologically significant changes (a larger mean fold change). *The fold change criterion or cutoff is arbitrary, so I think reporting all statistically significant genes sorted by mean fold change would be a good compromise. That way you capture all significant genes, but the focus can be on those that were up- or down-regulated by the greatest fold change.

**This email is going out to others in the MeV development group in case there are other opinions or ideas to add.

Please let us know if there are additional questions. Thanks for using MeV and the TM4 suite.

Best Regards,

John Braisted
The Institute for Genomic Research

________________________________
From: Carthika Luxmanan [mailto:car...@an...]
Sent: Wednesday, October 11, 2006 9:54 PM
To: mev
Subject: Log transformations...

Hi there.

I have used MIDAS to carry out some normalisations on my microarray data. The result is an MDS.mev file, which contains normalised data as specified in MIDAS. Then I open these MDS.mev files for each of my arrays in Excel, work out the Cy5/Cy3 ratio, log2 transform this ratio, and make a file containing my log transformed data for each of my arrays. Then I load this as a TDMS file in MeV and carry out my analyses.

My question is, do I need to manually log transform my data following MIDAS normalisation, or does MeV automatically take care of this? I am generally finding that my expression ratios are quite low when I look at differentially expressed genes from MeV, and I suspect that perhaps my data is being log transformed twice. One of my top differentially expressed genes (with a very small p value) only has a dye ratio of 1.1 fold (after I anti-log it).

I would appreciate your thoughts on this.

Thanks heaps
Carthika

________________________________________________________
Carthika Luxmanan, BSc(Hons)
Dept. Anatomy & Structural Biology
University of Otago
Dunedin
New Zealand
Office: +64 3 479 5182
Lab: +64 3 479 5792
|
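The point in the reply above, that significance is roughly the mean difference over a measure of variability, can be illustrated with a minimal sketch. This is not MeV's implementation: the gene values are hypothetical log2(Cy5/Cy3) ratios made up for illustration, and the function name `one_sample_t` is an assumption. It computes the standard one-sample t statistic against a mean of 0 (no change) and compares it with the two-sided 5% critical value for 4 degrees of freedom.

```python
import math
import statistics


def one_sample_t(log2_ratios, mu=0.0):
    """Return (mean, t statistic) for a one-sample t-test against mu."""
    n = len(log2_ratios)
    mean = statistics.mean(log2_ratios)
    sd = statistics.stdev(log2_ratios)  # sample standard deviation (n - 1)
    return mean, (mean - mu) / (sd / math.sqrt(n))


# Hypothetical log2(Cy5/Cy3) values for five replicate arrays (not real data).
tight_gene = [0.12, 0.14, 0.13, 0.15, 0.14]  # small change, very consistent
noisy_gene = [1.5, -0.2, 2.1, 0.1, 1.0]      # larger change, very variable

T_CRIT = 2.776  # two-sided 5% critical value of t with 4 degrees of freedom

results = {}
for name, values in (("tight", tight_gene), ("noisy", noisy_gene)):
    mean, t = one_sample_t(values)
    results[name] = {
        "fold": 2 ** mean,               # anti-log of the mean log2 ratio
        "t": t,
        "significant": abs(t) > T_CRIT,
    }
    print(name, results[name])
```

In this sketch the tight gene has a mean fold change of only about 1.1 yet clears the critical value easily, while the noisy gene has a larger mean fold change and does not. That is the same situation described in the original question.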
From: Braisted, J. C. <bra...@ti...> - 2006-10-13 13:43:57
|
Hi Carthika,

You can load as many mev files as you want at one time. Figure 4.1.1, page 10 of the manual, has a view of the MeV file loader. The top portion allows the selection of multiple MeV files, while the lower portion is for annotation file selection. The values in the expression matrix after loading mev format files will be log2 ratios.

You will need to construct a single .ann file that has rows of annotation that correspond to the rows in your mev expression file.

Note that once you have loaded all of your data as separate mev files, you can use the File->Save Matrix option to save it as a TDMS file. The TDMS does have the advantage of not having to select files in a particular order related to experimental group.

Log transformation is certainly essential, and it's important to do it if you are constructing a TDMS file in Excel. My point was that you can skip that step by loading multiple mev format files. Saving the matrix as described above will construct a TDMS for you.

===

Regarding your ttest question, all replicates are considered if they are checked in the one-sample ttest module dialog box. If you are missing data for a particular row in the matrix, there is no imputation; that gene simply has fewer replicates. People often use the 'Percentage Cutoff' filter to eliminate rows in the matrix with few good values, to ensure that a minimum number of measurements exist. (Page 28 of the manual; menu options are: Adjust Data->Data Filters->Percentage Cutoff Filter, enter a percentage.)

One side note: most dialogs have a button for information in the lower left corner (i). Sometimes this can be helpful.

===

For the multi-group case, ANOVA should be the perfect tool. One point to make is that ANOVA will create a set of significant genes, but this set will consist of genes that are significant for different reasons. Some may be up in group one, others may be under-expressed in group three... you get the idea. To sort this out, I suggest using the 'Construct Hierarchical Trees' option at the bottom of the dialog (just choose to cluster the significant cluster from ANOVA). You will see this option at the bottom of several dialogs, including the one for ANOVA. This will organize your significant genes based on the pattern of expression and will help to segregate genes reacting in similar ways.

I'm also not sure about ordering significant genes from ANOVA, since there are multiple comparisons.

By the way, your area of research is fascinating. Let us know if there are further questions.

John

________________________________
From: Carthika Luxmanan [mailto:car...@an...]
Sent: Thursday, October 12, 2006 6:15 PM
To: Braisted, John C.
Subject: Re: Log transformations...

Dear John

Thank you for your most informative email.

We are looking at gene expression changes following LTP (long term potentiation; an electrophysiological model for memory) across age. Briefly, one side of a rat hippocampus is stimulated for LTP, and the opposite side stays as control. In a single array, I hybridise tissue from a single animal (Stimulated vs unstimulated, Cy5 vs Cy3) and so look at a fold change. I have 3 different age groups (Young, middle-aged and old), and at the moment I have 5 biological replicates each for young and old, and 3 biological replicates for middle-aged.

This is why I use the TDMS file format, so that I can load all the data at once. If I were to use the mev files, then I'll only load individual arrays, will I not? In this case, I am not sure how I can carry out statistics. Is there a way of inputting mev files for all my arrays, then looking at them all together? I'd prefer not to manually meddle with my data! But I thought I should log transform my fold changes so the data would follow a normal distribution.

Following this, indeed I carry out a one-sample TTEST for each group.
Currently I have three lists of differentially expressed genes following LTP: one for the young, one for middle-aged and one for old. The next aim is to look across those lists and see if there are changes across age. I am not sure if I should instead perform an ANOVA for this. The gene list is then smaller. If I perform a SAM, I am not getting any differentially expressed genes! So I have stuck to TTEST. I am aware that, depending on my p-value cut-off, there is a chance of getting a percentage of genes that are randomly found to be differentially expressed.

With the ttest itself, does the program look at the variability across the biological replicates of each gene, then calculate the p-value? Are all the replicates taken into account, or does the program look to see consistent changes in perhaps 3 out of 5 arrays?

I apologise for all these questions! I would like to go ahead and start verifying some genes. And your suggestion sounds perfect, which is what I had thought I should do: get a list of differentially expressed genes, sort them on fold change, then look at the highest changed. However, when I look across groups, it becomes a bit more complicated.

Thanks so much for your email. Looking forward to your reply.

Best wishes
Carthika
|
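The 'Percentage Cutoff' filter mentioned in the reply can be pictured with a minimal sketch. This is an illustration only, not MeV's code: the function name `percentage_cutoff`, the gene names, the values, and the choice to treat the cutoff as inclusive are all assumptions. It keeps a row only if at least the given percentage of its values are present, so every retained gene has a minimum number of replicate measurements for the t-test.

```python
import math


def percentage_cutoff(matrix, percent):
    """Keep rows in which at least `percent` % of the values are present (not NaN)."""
    kept = {}
    for gene, values in matrix.items():
        present = sum(1 for v in values if not math.isnan(v))
        # Assumption: the cutoff is inclusive (>=); MeV's exact rule may differ.
        if 100.0 * present / len(values) >= percent:
            kept[gene] = values
    return kept


NA = float("nan")  # stands in for a missing or flagged spot

# Hypothetical log2 ratios across five replicate arrays (not real data).
matrix = {
    "gene_a": [0.4, 0.5, NA, 0.6, 0.3],   # 4 of 5 present (80%)
    "gene_b": [NA, 1.2, NA, NA, 0.9],     # 2 of 5 present (40%)
    "gene_c": [0.1, 0.2, 0.1, 0.0, 0.2],  # 5 of 5 present (100%)
}

filtered = percentage_cutoff(matrix, 60)
print(sorted(filtered))
```

With a 60% cutoff, `gene_b` is dropped because only two of its five arrays have a value, while `gene_a` is kept and would simply be tested with four replicates rather than five.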