Re: [Erppcatoolkit-support] Cross validation
From: Joseph D. <jd...@ma...> - 2017-11-26 18:03:04
There was indeed a serious bug in this function that I will be fixing immediately. Thanks for your report!

Regarding your more general question about applying the PCA results of the averaged data to the single-trial data, after some further thought I would discourage doing so. It is important to understand that this procedure is not intended to allow one to apply a factor solution to an entirely different dataset. A factor solution is specific to a dataset: factor scoring coefficients reflect not just what is needed to measure a latent variable (e.g., an ERP component) but also what is needed to disentangle it from other overlapping latent variables. So if both datasets had exactly the same P300 but only the first also had an N400, then applying the PCA results to the second dataset would be invalid. In the present case, it is likely that the single-trial data have features (e.g., alpha waves and artifacts) that were averaged out.

It would only be valid to apply the factor scoring coefficients from the PCA of one dataset to another dataset if the second dataset had exactly the same latent variables (e.g., for a temporal PCA, the same ERP components with exactly the same time courses, differing only in amplitude) and the same inter-factor correlations. This can rarely be said to be the case. The sole intent of the procedure is to allow one to compare two different factor solutions on the same dataset and quantitatively determine how similar they are (by correlating the two sets of factor scores). I had chosen to call this procedure “cross-validation” to emphasize this point, although I should probably change the name to something like “cross-verification” to avoid confusion with more common applications of the term.

Thanks again!

Joe

> On Nov 19, 2017, at 01:19, Joseph Dien <jd...@ma...> wrote:
>
> Hi Andreas,
>
> 1) yes, that is correct.
>
> 2) I haven’t tried doing anything like this.
> When you say you tried the former, what you did was take all the single trials from all the subjects, combine them into a single dataset, and then apply the cross-validation to this combined dataset?
>
> 3) it should be the same. Send me your data file along with the settings you used and I’ll take a look at it.
>
> Joe
>
>> On Nov 14, 2017, at 10:01, Andreas Widmann <wi...@un...> wrote:
>>
>> Dear Joe,
>>
>> may I ask three (hopefully not too stupid) questions with respect to cross-validation?
>>
>> (1) I successfully computed a temporal PCA for an ERP dataset (consisting of 4 conditions and 24 subjects, each averaged over the individual trials). Now a reviewer wants us to perform a single-trial analysis. My initial idea was to somehow "apply" the PCA pattern resulting from the grand-average PCA to the single trials (as we sometimes "apply" ICA weights to other datasets filtered differently). My naive understanding is that this is what cross-validation does. Is this correct?
>>
>> (2) In case yes, I’m not sure how standardization of factor scores should be performed. Should standardization (as in line 812 of ep_doPCA.m) be done including all trials from all subjects and conditions? Or should standardization be done per subject? I tried the former. The resulting factor scores were (averaged across trials per subject and condition) about a factor of 4 smaller than the factor scores from the grand-average PCA. Is this expected/plausible?
>>
>> (3) Finally, I tried cross-validation on the same dataset the PCA originally was computed from. The resulting factor scores were similar but not identical. Is this expected/plausible?
>>
>> Thanks a lot for your help! Best,
>> Andreas

--------------------------------------------------------------------------------
Joseph Dien, PhD
Senior Research Scientist
Department of Human Development and Quantitative Methodology
University of Maryland, College Park
http://joedien.com
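[Editor's note] The mechanics discussed in this thread — deriving factor scoring coefficients from a PCA, applying them to standardized data, and comparing two factor solutions of the same dataset by correlating their factor scores — can be sketched in a few lines. The following is a generic NumPy illustration under simplifying assumptions (unrotated PCA, regression-method scoring, simulated data); it is not the ep_doPCA.m implementation, and all names and the toy dataset are hypothetical.

```python
import numpy as np

def pca_scoring(X, n_factors):
    """PCA with regression-method factor scoring (generic sketch,
    NOT the ep_doPCA.m implementation).
    X: observations x variables (temporal PCA: waveforms x time points)."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd                                # standardize each variable
    R = np.corrcoef(Z, rowvar=False)                 # correlation matrix
    vals, vecs = np.linalg.eigh(R)                   # eigenvalues, ascending
    idx = np.argsort(vals)[::-1][:n_factors]         # keep the largest ones
    loadings = vecs[:, idx] * np.sqrt(vals[idx])     # unrotated factor loadings
    coef = np.linalg.solve(R, loadings)              # scoring coefficients R^-1 A
    return loadings, coef, mu, sd

def apply_scoring(X, coef, mu, sd):
    """Apply scoring coefficients to data standardized with the given mean/SD.
    Only valid when the data share the same latent structure."""
    return ((X - mu) / sd) @ coef

# simulated data: 300 "waveforms" mixing 3 latent components over 40 time points
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 3)) @ rng.standard_normal((3, 40))
X += 0.1 * rng.standard_normal((300, 40))

_, coef3, mu, sd = pca_scoring(X, 3)
scores3 = apply_scoring(X, coef3, mu, sd)            # unit-variance factor scores

# "cross-verification": compare two solutions of the SAME dataset by
# correlating their factor scores (here a 3- vs. a 4-factor solution; for
# unrotated solutions the first factors coincide, so |r| is ~1 by construction)
_, coef4, mu4, sd4 = pca_scoring(X, 4)
scores4 = apply_scoring(X, coef4, mu4, sd4)
r = np.corrcoef(scores3[:, 0], scores4[:, 0])[0, 1]
```

As the thread emphasizes, `apply_scoring` is only meaningful across datasets when the latent structure is identical; its intended use here is comparing two solutions computed on the same data.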