Marcelo, thank you for posting this code. Is 6.4 still the newest version, or has the function been implemented in scikit-learn since then?
I opened your notebook and tried executing the cells one by one, but I ran into errors. In cell 4, calling D = gower_distances(X) raises ValueError: Input contains NaN. The same happens in the unit test in cell 5. I can see that None values have been placed in X on purpose. Why is this error happening, and is it safe to ignore it?
But also: my real data actually contains NaNs. If the gower_distances function does not accept NaNs, what data preprocessing do you recommend before running your procedure?
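To make the question concrete, here is a minimal sketch of how the missing cells could be located before calling gower_distances. It uses plain Python lists rather than the notebook's X, and find_missing and the sample rows are hypothetical names for illustration, not part of gower-function-v6.4. It only finds the gaps; whether to drop or impute them is the part I am unsure about.

```python
import math


def find_missing(rows):
    """Return (row, column) positions that hold None or a float NaN."""
    missing = []
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            # None and float NaN are distinct objects, so check both.
            if value is None or (isinstance(value, float) and math.isnan(value)):
                missing.append((i, j))
    return missing


# Hypothetical mixed-type rows, not the notebook's actual X.
rows = [
    [21, "M", 3000.0],
    [None, "F", float("nan")],
    [30, None, 1200.0],
]

print(find_missing(rows))  # prints [(1, 0), (1, 2), (2, 1)]
```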
Below is the full output of the error. I am running Python 3.7.3 inside Jupyter Notebook 4.4.0, with your most up-to-date package, gower-function-v6.4. Thank you in advance for your response, and for the effort you put into publishing this.
Below is the error output after running cell 4 with Shift+Enter:
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py:432: DeprecationWarning: 'warn_on_dtype' is deprecated in version 0.21 and will be removed in 0.23. Don't set `warn_on_dtype` to remove this warning.
  DeprecationWarning)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-271d29ea1af0> in <module>
----> 1 D = gower_distances(X)
      2
      3 print(D)

<ipython-input-3-182d2eccf8e9> in gower_distances(X, Y, feature_weight, categorical_features)
     52     array_type = type(np.zeros(1,X.dtype).flat[0])
     53
---> 54     X, Y = check_pairwise_arrays(X, Y, precomputed=False, dtype=array_type)
     55
     56     n_rows, n_cols = X.shape

<ipython-input-2-a839000181b1> in check_pairwise_arrays(X, Y, precomputed, dtype)
      9     if Y is X or Y is None:
     10         X = Y = validation.check_array(X, accept_sparse='csr', dtype=dtype,
---> 11                             warn_on_dtype=warn_on_dtype, estimator=estimator)
     12     else:
     13         X = validation.check_array(X, accept_sparse='csr', dtype=dtype,

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    540         if force_all_finite:
    541             _assert_all_finite(array,
--> 542                                allow_nan=force_all_finite == 'allow-nan')
    543
    544     if ensure_min_samples > 0:

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan)
     58     elif X.dtype == np.dtype('object') and not allow_nan:
     59         if _object_dtype_isnan(X).any():
---> 60             raise ValueError("Input contains NaN")
     61
     62

ValueError: Input contains NaN
Last edit: Pawel Plaz 2019-09-25