Marcelo, thank you for posting this code. Is 6.4 still the newest version, or has it since been implemented in scikit-learn?
I opened your notebook and tried executing the cells one by one, but I ran into errors. In cell 4, calling D = gower_distances(X) raises ValueError: Input contains NaN. The same happens in the unit test in cell 5. I can see that None values have been placed in X on purpose. Why is this error happening, and is it safe to ignore it?
Also, my real data actually contains NaNs. If the gower_distances function does not accept NaNs, what is the recommended data preprocessing before running your procedure?
I am running Python 3.7.3 inside Jupyter Notebook 4.4.0, with your most up-to-date package, gower-function-v6.4. Thank you in advance for your response, and for the effort in publishing this.
Below is the full error output after running cell 4 with [Shift+Enter]:
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py:432: DeprecationWarning: 'warn_on_dtype' is deprecated in version 0.21 and will be removed in 0.23. Don't set `warn_on_dtype` to remove this warning.
  DeprecationWarning)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-271d29ea1af0> in <module>
----> 1 D = gower_distances(X)
      2
      3 print(D)

<ipython-input-3-182d2eccf8e9> in gower_distances(X, Y, feature_weight, categorical_features)
     52     array_type = type(np.zeros(1, X.dtype).flat[0])
     53
---> 54     X, Y = check_pairwise_arrays(X, Y, precomputed=False, dtype=array_type)
     55
     56     n_rows, n_cols = X.shape

<ipython-input-2-a839000181b1> in check_pairwise_arrays(X, Y, precomputed, dtype)
      9     if Y is X or Y is None:
     10         X = Y = validation.check_array(X, accept_sparse='csr', dtype=dtype,
---> 11                                        warn_on_dtype=warn_on_dtype, estimator=estimator)
     12     else:
     13         X = validation.check_array(X, accept_sparse='csr', dtype=dtype,

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    540         if force_all_finite:
    541             _assert_all_finite(array,
--> 542                                allow_nan=force_all_finite == 'allow-nan')
    543
    544     if ensure_min_samples > 0:

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan)
     58     elif X.dtype == np.dtype('object') and not allow_nan:
     59         if _object_dtype_isnan(X).any():
---> 60             raise ValueError("Input contains NaN")
     61
     62

ValueError: Input contains NaN
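
To make the preprocessing question concrete, here is a minimal sketch for locating missing cells in mixed-type data before passing it to gower_distances. It is not taken from the notebook: the helper name find_missing and the sample rows are hypothetical, and it assumes that "missing" means either None or a float NaN. It uses only the standard library, so it does not depend on any particular scikit-learn version.

```python
import math


def find_missing(rows):
    """Return (row, column) positions of cells that are None or float NaN.

    Hypothetical helper for illustration; not part of gower-function-v6.4.
    """
    missing = []
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            # Only floats can be NaN; strings and ints are skipped safely.
            is_nan = isinstance(value, float) and math.isnan(value)
            if value is None or is_nan:
                missing.append((i, j))
    return missing


# Hypothetical mixed-type rows, not the notebook's actual X.
rows = [
    [21, "M", 3000.0],
    [None, "F", float("nan")],
    [35, None, 4200.0],
]

print(find_missing(rows))
```

Knowing which cells are missing would at least let me decide between dropping those rows and imputing values, although which of the two (if either) suits this Gower implementation is exactly what I am unsure about.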
Last edit: Pawel Plaz 2019-09-25