Empty cells

Help
Nefeli
2013-11-01
2013-11-13
  • Nefeli

    Nefeli - 2013-11-01

    Hi, I am new to the imdev add-in and would like some help.
    I want to calculate a correlation matrix, but my data has some missing values, so some cells are empty. This should not be a problem, but unfortunately it is: the matrix is not produced.
    R shows this message:
    Error in as.data.frame(cor(start.data, method = calc.method, use = "pairwise.complete.obs")) :
    error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': Error in cor(start.data, method = calc.method, use = "pairwise.complete.obs") :
    'x' is empty

    Is there any way to tackle this problem?
    Thanks

     
  • Dmitry Grapov

    Dmitry Grapov - 2013-11-01

    Hi Nefeli,

    You are getting the error because there may be no pairwise complete observations. In general, you may have too many missing values to compute anything useful. You could try imputing them first. Another option is to use R directly (see "help(cor)") and choose a more suitable value for the "use" argument, or to find a package that handles missing values better.

    -Dmitry
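
One way to see how much usable data each pair of variables shares is to count the rows where both values are present before calling cor(). This is a minimal base R sketch; the toy data frame `x` and the names `present` and `n_pairs` are made up for illustration and are not part of imdev.

```r
# Toy data with missing values (hypothetical, not the poster's data).
x <- data.frame(
  a = c(1.0, 2.0, NA, 4.0, 5.0),
  b = c(2.1, NA, 3.2, 4.8, 5.1),
  c = c(NA, NA, NA, 1.0, NA)
)

# 1 where a value is observed, 0 where it is missing.
present <- (!is.na(as.matrix(x))) * 1

# Entry [i, j] counts the rows where variables i and j are both observed.
n_pairs <- crossprod(present)
print(n_pairs)

# Pairwise-complete correlation; a pair with fewer than two shared rows gives NA.
r <- cor(x, use = "pairwise.complete.obs")
print(r)
```

Pairs with very small counts in `n_pairs` are the ones where the correlation is missing or unreliable, which is a sign that filtering or imputation is needed first.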

     
  • Nefeli

    Nefeli - 2013-11-02

    Thank you Dmitry! Your help was valuable!
    I tried the same thing on another data set with fewer missing values and everything works just fine.

     
    • John W. Newman

      John W. Newman - 2013-11-08

      Hi Nefeli,

      You should play around with the data imputation under PCA. Using 2 components, no normalization, and ppca, you generally get good imputations. You can prove this to yourself by taking a complete data set, deleting random values, then imputing and comparing.
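
The "delete, impute, compare" check can be sketched in base R as follows. Everything here is an assumption made for illustration: the simulated data, the 10% deletion rate, and above all the imputer, which is a plain column-mean fill standing in for the ppca imputation (it is not PPCA). The point is only the evaluation loop, into which any imputer can be swapped.

```r
set.seed(1)

# Complete toy data set (hypothetical): 50 samples, 4 correlated variables.
n <- 50
base <- rnorm(n)
complete <- cbind(
  v1 = base + rnorm(n, sd = 0.3),
  v2 = base + rnorm(n, sd = 0.3),
  v3 = base + rnorm(n, sd = 0.3),
  v4 = base + rnorm(n, sd = 0.3)
)

# Delete 10% of the cells at random and remember where they were.
holed <- complete
idx <- sample(length(complete), size = round(0.1 * length(complete)))
holed[idx] <- NA

# Stand-in imputer: replace each missing value with its column mean.
impute_col_mean <- function(m) {
  for (j in seq_len(ncol(m))) {
    miss <- is.na(m[, j])
    m[miss, j] <- mean(m[, j], na.rm = TRUE)
  }
  m
}

imputed <- impute_col_mean(holed)

# Compare imputed values with the known truth at the deleted positions.
rmse <- sqrt(mean((imputed[idx] - complete[idx])^2))
print(rmse)
```

A lower `rmse` for one imputation method than for another on the same deleted cells suggests it recovers the data better for that data set.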

      I have found that if you know the group classification and the data set is large enough, imputation using a set of like samples gives the best results.

      You always need to be careful. If more than 70% of the measures are missing for a given data point, this imputation will not work. Another generally applied and accepted approach is to replace missing values with half of the lowest detected value. This marks the data as "very low" without eliminating potentially important variables from multivariate considerations.
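
The half-minimum replacement might look like this in base R. This is a sketch under assumptions: the function name `half_min_impute` is hypothetical, "lowest detected" is taken per variable (column), and the values are assumed to be positive measurements such as concentrations.

```r
# Replace missing values in each column with half of that column's
# smallest observed value. Columns with no observed values are left as is.
half_min_impute <- function(m) {
  m <- as.matrix(m)
  for (j in seq_len(ncol(m))) {
    miss <- is.na(m[, j])
    if (any(miss) && !all(miss)) {
      m[miss, j] <- min(m[, j], na.rm = TRUE) / 2
    }
  }
  m
}

# Toy example (hypothetical values).
x <- cbind(
  met1 = c(4, NA, 10, 8),
  met2 = c(NA, NA, 0.6, 0.2)
)
filled <- half_min_impute(x)
print(filled)
```

In this toy example the gap in `met1` becomes 2 (half of 4) and the gaps in `met2` become 0.1 (half of 0.2).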

      John

       
  • Nefeli

    Nefeli - 2013-11-09

    John, thank you! Your advice was very helpful.
    I set a restriction: no more than 40% of my data set may be missing.
    I think that is enough to make a reasonable assumption.
    Nefeli

     
    • John W. Newman

      John W. Newman - 2013-11-13

      Hi Nefeli, I actually made that statement upside down... missing more than 30% can lead to problems (i.e., data should be more than 70% complete). So tighten your filter a little more and you will get no complaints from reviewers in the future. ;)

      sorry about that...
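
A filter along these lines could be written in base R as below. It is a sketch, assuming the 30% limit is applied per variable (column); the function name `keep_complete_enough` and the toy matrix are hypothetical.

```r
# Keep only the columns whose fraction of missing values is at most max_missing.
keep_complete_enough <- function(m, max_missing = 0.30) {
  frac_missing <- colMeans(is.na(m))
  m[, frac_missing <= max_missing, drop = FALSE]
}

# Toy example (hypothetical values).
x <- cbind(
  v1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),       # 0% missing
  v2 = c(1, NA, 3, NA, 5, 6, 7, 8, 9, 10),     # 20% missing
  v3 = c(NA, NA, NA, NA, 5, 6, 7, 8, 9, 10)    # 40% missing
)
kept <- keep_complete_enough(x)
print(colnames(kept))
```

Here `v3` is dropped because 40% of its values are missing, while `v1` and `v2` pass the filter. Using rowMeans() instead of colMeans() would apply the same rule per sample.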

       
