Empty cells

Tools for multivariate data visualization, exploration and analysis.

Status: Alpha

Brought to you by: dgrapov

Forum: Help

Creator: Nefeli

Created: 2013-11-01

Updated: 2013-11-13

Nefeli - 2013-11-01

Hi, I am new to the imDEV add-in and I would like some help.
I want to calculate a correlation matrix, but my data has some missing values, so some cells are empty. This should not be a problem, but unfortunately it is: the matrix is not produced.
R shows this message:
Error in as.data.frame(cor(start.data, method = calc.method, use = "pairwise.complete.obs")) :
error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': Error in cor(start.data, method = calc.method, use = "pairwise.complete.obs") :
'x' is empty

Is there any way to tackle this problem?
Thanks

Dmitry Grapov - 2013-11-01

Hi Nefeli,

You are getting the error because there may be no pairwise complete observations. In general, you may have too many missing values to compute anything useful. You could try imputing them first. Another option is to use R directly (see "help(cor)") and choose a more suitable value for the "use" argument, or find a package that handles missing values better.
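To make "pairwise complete observations" concrete, here is a minimal sketch in plain Python (not R, and not what imDEV runs internally). The function names and the sample columns are hypothetical; None stands for an empty cell. A pair of variables only has a defined correlation if at least two rows have both values present.

```python
import math

def pairwise_complete(x, y):
    """Return the (x, y) pairs in which neither value is missing (None)."""
    return [(a, b) for a, b in zip(x, y) if a is not None and b is not None]

def pearson_pairwise(x, y):
    """Pearson correlation over pairwise complete observations.

    Returns None when fewer than two complete pairs exist, or when either
    variable is constant over those pairs, because r is undefined there.
    """
    pairs = pairwise_complete(x, y)
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

# Hypothetical columns; None marks an empty cell.
a = [1.0, 2.0, None, 4.0, 5.0]
b = [2.0, None, 6.0, 8.0, 10.0]
c = [None, 3.0, 1.0, None, None]

# a and b share three complete rows, so a correlation can be computed.
print(len(pairwise_complete(a, b)), pearson_pairwise(a, b))
# a and c share only one complete row, so the correlation is undefined.
print(len(pairwise_complete(a, c)), pearson_pairwise(a, c))
```

Counting the complete pairs for each pair of variables before calling a correlation routine is a quick way to see which columns are too sparse.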

-Dmitry

Nefeli - 2013-11-02

Thank you Dmitry! Your help was valuable!
I tried the same with some other data that has fewer missing values, and everything works just fine.

- John W. Newman - 2013-11-08
  
  Hi Nefeli,
  
  You should play around with the data imputation under PCA. Using 2 components, no normalization, and ppca, you generally get good imputations. You can prove this to yourself by taking a complete data set, deleting random data, and then imputing and comparing.
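The "delete random data, impute, compare" check can be sketched roughly as follows, in plain Python. Note the swap: this sketch uses a simple column-mean imputer as a stand-in, not ppca, so it only illustrates the evaluation procedure. All function names and the data matrix are hypothetical.

```python
import math
import random

def mask_random(data, fraction, rng):
    """Copy `data` and replace about `fraction` of its cells with None."""
    cells = [(i, j) for i, row in enumerate(data) for j in range(len(row))]
    hidden = rng.sample(cells, int(round(fraction * len(cells))))
    masked = [list(row) for row in data]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden

def impute_column_mean(data):
    """Stand-in imputer (not ppca): fill missing cells with the column mean."""
    filled = [list(row) for row in data]
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        if not observed:
            continue  # nothing to estimate from; leave the column empty
        mean = sum(observed) / len(observed)
        for row in filled:
            if row[j] is None:
                row[j] = mean
    return filled

def rmse_on_hidden(truth, imputed, hidden):
    """Root mean squared error over the hidden cells that were imputed."""
    errors = [(truth[i][j] - imputed[i][j]) ** 2
              for i, j in hidden if imputed[i][j] is not None]
    if not errors:
        return None
    return math.sqrt(sum(errors) / len(errors))

# Hypothetical complete data set: 6 samples (rows) by 3 variables (columns).
complete = [
    [1.0, 10.0, 5.0],
    [2.0, 12.0, 4.5],
    [3.0, 14.5, 4.0],
    [4.0, 15.5, 3.0],
    [5.0, 18.0, 2.5],
    [6.0, 20.0, 2.0],
]

rng = random.Random(0)  # fixed seed so the same cells are hidden each run
masked, hidden = mask_random(complete, 0.2, rng)
imputed = impute_column_mean(masked)
score = rmse_on_hidden(complete, imputed, hidden)
print("hidden cells:", len(hidden), "RMSE on hidden cells:", score)
```

Running the same comparison with different imputers, or different fractions of hidden cells, shows how quickly the imputation degrades for a given data set.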
  
  I have found that if you know the group classification and the data set is large enough, imputation using a set of like samples gives the best results.
  
  You always need to be careful. If more than 70% of the measures are missing for a given data point, this imputation will not work. Another generally applied and accepted approach is to replace missing values with half of the lowest detected value. This marks the data as "very low" without eliminating potentially important variables from multivariate considerations.
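The half-of-the-lowest-detected replacement is simple enough to sketch in a few lines of plain Python. The function name and the sample values are hypothetical, and the sketch assumes positive measurements (for example concentrations), applied to one variable at a time.

```python
def impute_half_minimum(values):
    """Replace missing entries (None) with half the lowest detected value."""
    detected = [v for v in values if v is not None]
    if not detected:
        return list(values)  # nothing detected, so nothing to base a fill on
    fill = min(detected) / 2
    return [fill if v is None else v for v in values]

# Hypothetical measurements for one variable; None marks an empty cell.
measurements = [12.0, None, 3.0, 8.5, None]
print(impute_half_minimum(measurements))  # [12.0, 1.5, 3.0, 8.5, 1.5]
```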
  
  John
  
Nefeli - 2013-11-09

John, thank you! Your advice was very helpful.
I set a restriction: no more than 40% of my data set may be missing.
I think that is enough to make a reasonable assumption.
Nefeli

- John W. Newman - 2013-11-13
  
  Hi Nefeli, I actually made that statement upside down... missing more than 30% can lead to problems (i.e., data should be more than 70% complete). So tighten your filter a little more and you will get no complaints from reviewers in the future. ;)
  
  sorry about that...
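A filter along these lines could look like the sketch below, in plain Python. It assumes the 30% limit is applied per variable (column), which is one reading of the advice; the function names, variable names, and values are hypothetical.

```python
def missing_fraction(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def keep_mostly_complete(columns, max_missing=0.30):
    """Keep only variables whose missing fraction is at most `max_missing`."""
    return {name: values for name, values in columns.items()
            if missing_fraction(values) <= max_missing}

# Hypothetical variables with 10 samples each; None marks an empty cell.
columns = {
    "var_a": [1.0, 2.0, 3.0, None, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],     # 10% missing
    "var_b": [None, 2.0, None, 4.0, 5.0, None, 7.0, 8.0, 9.0, 10.0],   # 30% missing
    "var_c": [None, None, 3.0, None, 5.0, None, 7.0, 8.0, 9.0, 10.0],  # 40% missing
}

kept = keep_mostly_complete(columns)
print(sorted(kept))  # ['var_a', 'var_b']
```

Variables that pass the filter can then be imputed; the rest are dropped before the multivariate analysis.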
  