Suggestion for development

Help
2014-02-04
  • Tom Sambrook

    Tom Sambrook - 2014-02-04

    If I understand it correctly, the Horn Test comprises a single-sample Monte Carlo simulation of the variance “explained” by bogus factors extracted from a random dataset (a random reshuffling of the cells in the data matrix?).

    If so, the criterion that a factor be extracted if it exceeds the variance explained by the bogus factor at that factor’s rank order amounts, in conventional hypothesis-testing terms, to a demonstration that p < .5, i.e. very lax.

    I appreciate that significance testing is not necessarily the aim of every PCA; however, it would be useful to demonstrate the degree of certainty with which a factor can be claimed not to be noise.

    It seems to me that the solution would be to run a more standard Monte Carlo simulation: generate multiple random datasets to build a distribution of “variance explained” at each factor rank, compare the variance explained by each observed factor against this distribution, and then assign a p-value based on the number of bogus factors that explained more variance than it.

    E.g., significance testing of observed Factor 6: run 1,000 simulations; in three of these the bogus factor 6 explains more variance than the observed factor 6, so p = .003.

    I don’t know how difficult this would be to implement. At the user end I have no problem with calculations that run overnight.
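    The procedure described above can be sketched in a few lines. This is a hypothetical illustration in Python, not the EP Toolkit’s actual (MATLAB) implementation; the function name and the choice of column-wise shuffling as the null model are assumptions made for the example:

    ```python
    import numpy as np

    def parallel_analysis_pvalues(data, n_sims=1000, seed=0):
        """For each factor rank, estimate a p-value for the observed
        eigenvalue against eigenvalues from shuffled (noise) datasets."""
        rng = np.random.default_rng(seed)

        # Eigenvalues of the observed correlation matrix, sorted descending.
        obs_eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

        # Null distribution: eigenvalues of datasets whose columns are
        # independently shuffled, destroying the correlational structure
        # while preserving each variable's marginal distribution.
        n_vars = data.shape[1]
        null_eigs = np.empty((n_sims, n_vars))
        for i in range(n_sims):
            shuffled = np.column_stack([rng.permutation(col) for col in data.T])
            null_eigs[i] = np.sort(
                np.linalg.eigvalsh(np.corrcoef(shuffled, rowvar=False))
            )[::-1]

        # p-value at each rank: proportion of simulations in which the
        # bogus eigenvalue meets or exceeds the observed one (the add-one
        # correction avoids p = 0).
        p_values = (np.sum(null_eigs >= obs_eigs, axis=0) + 1) / (n_sims + 1)
        return obs_eigs, p_values
    ```

    With 1,000 simulations, the smallest reportable p-value is about .001, matching the worked example above; the run time scales linearly with the number of simulations, which fits the suggestion of letting the calculation run overnight.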

    Tom

    • Joe Dien

      Joe Dien - 2014-02-04

      Factor retention is a pretty complex topic. There are a lot of published papers on the Parallel test alone and a lot of different variations on how it is implemented. I’m currently working on a paper looking at the topic as it applies to ERP datasets. Once I have some conclusions, I’ll update the EP Toolkit accordingly.

      Thanks for the suggestion!

      Joe


      Joseph Dien,
      Senior Research Scientist
      Maryland Neuroimaging Center
      University of Maryland

      E-mail: jdien07@mac.com
      Phone: 202-297-8117
      http://joedien.com

