Hi, I am new to the imdev add-in and I would like some help.
For instance, I want to calculate a correlation matrix, but I have some missing values, so some cells are empty. This should not be a problem, but unfortunately it is: the matrix does not appear.
R shows this message:
Error in as.data.frame(cor(start.data, method = calc.method, use = "pairwise.complete.obs")) :
error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': Error in cor(start.data, method = calc.method, use = "pairwise.complete.obs") :
'x' is empty
Is there any way to tackle this problem?
Thanks
Hi Nefeli,
You are getting the error because there may be no pairwise complete observations. In general, you may have too many missing values to compute anything useful. You could try imputing them first. Another idea is to use R directly (see "help(cor)") and choose a more suitable value for the "use" argument, or find a package that handles missing values better.
-Dmitry
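To see whether a lack of pairwise complete observations is plausible for a given sheet, one option is to count, for every pair of columns, the rows where both values are present. The sketch below does this in plain Python as a language-neutral illustration; it is not part of imdev or R. The function name, the toy data, and the cutoff of 2 shared rows are assumptions made for the example.

```python
import math


def pairwise_complete_counts(columns):
    """For each pair of columns, count rows where both values are observed."""
    def present(v):
        # Treat None and NaN as missing (an empty spreadsheet cell).
        return v is not None and not (isinstance(v, float) and math.isnan(v))

    names = list(columns)
    counts = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            counts[(a, b)] = sum(
                1 for x, y in zip(columns[a], columns[b])
                if present(x) and present(y)
            )
    return counts


# Hypothetical toy data: None marks an empty cell.
data = {
    "A": [1.0, 2.0, None, 4.0],
    "B": [None, None, 3.0, None],
    "C": [2.0, None, 6.0, 8.0],
}

counts = pairwise_complete_counts(data)

# Pairs sharing fewer than 2 rows cannot give a correlation at all
# (the cutoff of 2 is an assumption; in practice you would want many more).
too_sparse = [pair for pair, n in counts.items() if n < 2]
```

Any pair listed in too_sparse has essentially nothing to correlate, so those variables are candidates for imputation or removal before building the matrix.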
Thank you Dmitry! Your help was valuable!
I tried the same thing on some other data with fewer missing values and everything works just fine.
Hi Nefeli,
You should play around with the data imputation under PCA. Using 2 components, no normalization, and ppca, you generally get good imputations. You can prove this to yourself by taking a complete data set, deleting random data, and then imputing and comparing.
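The delete-then-impute-then-compare check described above can be sketched roughly as follows. Note the swap: this uses a simple column-mean fill as a stand-in for ppca, only to keep the example self-contained; the point is the evaluation loop, not the imputation method. All function names, the toy data, and the 20% masking fraction are assumptions for illustration.

```python
import random


def mask_random_cells(rows, fraction, seed=0):
    """Copy `rows`, set int(n_cells * fraction) random cells to None.

    Returns the masked copy and the list of (row, col) positions removed.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    masked = rng.sample(cells, int(len(cells) * fraction))
    copy = [list(r) for r in rows]
    for i, j in masked:
        copy[i][j] = None
    return copy, masked


def column_mean_impute(rows):
    """Stand-in imputer (NOT ppca): fill None with the column's observed mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def rmse_on_masked(truth, imputed, masked):
    """Root mean squared error, computed only on the cells that were removed."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in masked]
    return (sum(squared) / len(squared)) ** 0.5


# Hypothetical complete data set: 6 samples x 3 variables.
complete = [
    [1.0, 10.0, 100.0],
    [2.0, 21.0, 190.0],
    [3.0, 29.0, 310.0],
    [4.0, 42.0, 405.0],
    [5.0, 48.0, 480.0],
    [6.0, 61.0, 620.0],
]

holed, masked = mask_random_cells(complete, fraction=0.2, seed=1)
imputed = column_mean_impute(holed)
error = rmse_on_masked(complete, imputed, masked)
```

Running the same loop with different imputation settings and comparing the resulting error values gives a rough idea of which settings suit your data.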
I have found that if you know the group classification and the data set is large enough, imputation using a set of like samples gives the best results.
You always need to be careful. If more than 70% of the measures are missing for a given data point, this imputation will not work; another generally applied and accepted approach is to replace missing values with half of the lowest detected value. This marks the data as "very low" without eliminating potentially important variables from multivariate considerations.
John
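The half-of-the-lowest-detected replacement mentioned above can be sketched like this. It assumes the minimum is taken per variable (one list of measurements at a time), which the post does not state explicitly; the function name and the toy values are also made up for the example.

```python
def half_minimum_impute(values):
    """Replace None with half of the lowest observed value in `values`.

    Assumption: `values` holds all measurements of a single variable.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to derive a replacement from")
    fill = min(observed) / 2
    return [fill if v is None else v for v in values]


# Hypothetical measurements for one variable; None marks a missing value.
intensities = [8.0, None, 4.0, 12.0, None]
filled = half_minimum_impute(intensities)
```

Here the lowest detected value is 4.0, so both gaps are filled with 2.0.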
John, thank you! Your advice was very helpful.
I set a restriction: no more than 40% of my data set may be missing.
I think that is enough to make a reasonable assumption.
Nefeli
Hi Nefeli, I actually made that statement upside down... missing more than 30% can lead to problems (i.e., the data should be more than 70% complete). So tighten your filter a little more and you will get no complaints from reviewers in the future. ;)
Sorry about that...
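A filter along the lines of the corrected 30% rule could look roughly like the sketch below. It assumes the rule is applied per variable (column) and that a variable is dropped only when strictly more than 30% of its values are missing; the thread does not say which, so treat both as assumptions. The names and toy data are hypothetical.

```python
def missing_fraction(values):
    """Fraction of entries in `values` that are None."""
    return sum(v is None for v in values) / len(values)


def drop_sparse_variables(columns, max_missing=0.3):
    """Keep only variables whose missing fraction does not exceed `max_missing`."""
    return {
        name: values
        for name, values in columns.items()
        if missing_fraction(values) <= max_missing
    }


# Hypothetical data: 10 samples per variable, None marks a missing value.
columns = {
    "full":   [1.0] * 10,               # 0% missing
    "mostly": [1.0] * 8 + [None] * 2,   # 20% missing
    "sparse": [1.0] * 6 + [None] * 4,   # 40% missing
}

kept = drop_sparse_variables(columns)
```

With the 40% limit the "sparse" variable would have passed; with the tighter 30% limit it is dropped, and only "full" and "mostly" remain.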