## Suggestion for development

• Tom Sambrook - 2014-02-04

If I understand it correctly, the Horn test comprises a single-sample Monte Carlo simulation of the variance “explained” by bogus factors extracted from a random dataset (a random reshuffling of the cells in the data matrix?).

If so, the criterion that factors be retained only if they exceed the variance explained by these bogus factors (at that factor’s rank order) amounts, in conventional hypothesis-testing terms, to a demonstration that p < .5, i.e. a very lax criterion.

I appreciate that significance testing is not necessarily the aim of every PCA; however, it would be useful to be able to demonstrate how confident one can be that a given factor is not noise.

It seems to me that the solution would be to run a more standard Monte Carlo simulation: generate multiple random datasets to build a distribution of “variance explained” at each factor rank, compare the variance explained by the observed factor against that distribution, and then assign a p value based on the number of bogus factors that explained more variance than it did.

E.g., significance testing of observed factor 6: run 1000 simulations; in three of these, bogus factor 6 explains more variance than the observed factor 6, so p = .003.
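The multi-sample procedure proposed above could be sketched as follows. This is a minimal illustration, not the EP Toolkit’s actual implementation: the function name, the column-wise permutation scheme, and the choice of eigenvalues of the correlation matrix as the “variance explained” measure are all assumptions made here for concreteness.

```python
import numpy as np

def parallel_analysis_pvalues(X, n_sims=1000, seed=0):
    """Assign a p value to each factor rank by comparing observed
    eigenvalues against eigenvalues from reshuffled (bogus) data.

    X : (observations x variables) data matrix.
    Returns an array of p values, one per factor rank.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Observed variance explained at each rank: eigenvalues of the
    # correlation matrix, sorted from largest to smallest.
    obs = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    exceed = np.zeros(p)
    for _ in range(n_sims):
        # Reshuffle each column independently to destroy the
        # correlational structure while preserving marginal distributions.
        Xr = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
        sim = np.linalg.eigvalsh(np.corrcoef(Xr, rowvar=False))[::-1]
        # Count simulations where the bogus factor at the same rank
        # explained at least as much variance as the observed factor.
        exceed += sim >= obs
    return exceed / n_sims
```

On Tom’s example, if the bogus factor at rank 6 out-explained the observed factor 6 in 3 of 1000 simulations, the returned p value at that rank would be .003.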

I don’t know how difficult this would be to implement. At the user end I have no problem with calculations that run overnight.

Tom

• Joe Dien - 2014-02-04

Factor retention is a pretty complex topic. There are a lot of published papers on the Parallel test alone and a lot of different variations on how it is implemented. I’m currently working on a paper looking at the topic as it applies to ERP datasets. Once I have some conclusions, I’ll update the EP Toolkit accordingly.

Thanks for the suggestion!

Joe


Joseph Dien,
Senior Research Scientist
Maryland Neuroimaging Center
University of Maryland

E-mail: jdien07@mac.com
Phone: 202-297-8117
http://joedien.com