Pawel Plaz - 2019-09-25

Marcelo, thank you for posting this code. Is 6.4 still the newest version, or has the function been implemented in scikit-learn since then?

I opened your notebook and tried executing the cells one by one, but I ran into errors. In cell 4, calling D = gower_distances(X) raises this error: ValueError: Input contains NaN. The same happens after the unit test in cell 5. I can see that None values have been placed in X on purpose. Why is this error happening, and is it safe to ignore it?

Also, my real data actually contains NaNs. If the gower_distances function does not accept NaNs, what data preprocessing do you recommend before running your procedure?
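To make the question concrete: the only generic fallback I know of is simple imputation, sketched below. This is just an illustration, not something taken from your package. The helper name impute_simple and the column names are made up for the example, and I do not know whether median/mode filling is appropriate for Gower distances.

```python
import numpy as np
import pandas as pd


def impute_simple(df):
    """Fill numeric NaNs with the column median, other NaNs with the column mode.

    Hypothetical helper for illustration only; not part of gower_distances.
    """
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numeric column: replace missing values with the median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Categorical column: replace missing values with the most
            # frequent category, if the column has any non-missing value.
            mode = out[col].mode(dropna=True)
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out


# Made-up mixed-type data with one missing value per column.
df = pd.DataFrame({
    "age": [21.0, np.nan, 35.0, 50.0],
    "gender": ["M", "F", None, "F"],
})

clean = impute_simple(df)
print(clean)
print(clean.isna().any().any())
```

With data cleaned this way, the NaN check should no longer fire, but I would like to know whether the function is meant to handle missing values itself instead.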

Below is the full output of the error. I am running Python 3.7.3 inside Jupyter Notebook 4.4.0, with your most up-to-date package, gower-function-v6.4. Thank you in advance for your response, and for the effort you put into publishing this.

Below is the error output after running cell 4 with [Shift+Enter]:

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py:432: DeprecationWarning: 'warn_on_dtype' is deprecated in version 0.21 and will be removed in 0.23. Don't set `warn_on_dtype` to remove this warning.
  DeprecationWarning)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-271d29ea1af0> in <module>
----> 1 D = gower_distances(X)
      2 
      3 print(D)

<ipython-input-3-182d2eccf8e9> in gower_distances(X, Y, feature_weight, categorical_features)
     52         array_type = type(np.zeros(1,X.dtype).flat[0])
     53 
---> 54     X, Y = check_pairwise_arrays(X, Y, precomputed=False, dtype=array_type)
     55 
     56     n_rows, n_cols = X.shape

<ipython-input-2-a839000181b1> in check_pairwise_arrays(X, Y, precomputed, dtype)
      9     if Y is X or Y is None:
     10         X = Y = validation.check_array(X, accept_sparse='csr', dtype=dtype,
---> 11                             warn_on_dtype=warn_on_dtype, estimator=estimator)
     12     else:
     13         X = validation.check_array(X, accept_sparse='csr', dtype=dtype,

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    540         if force_all_finite:
    541             _assert_all_finite(array,
--> 542                                allow_nan=force_all_finite == 'allow-nan')
    543 
    544     if ensure_min_samples > 0:

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan)
     58     elif X.dtype == np.dtype('object') and not allow_nan:
     59         if _object_dtype_isnan(X).any():
---> 60             raise ValueError("Input contains NaN")
     61 
     62 

ValueError: Input contains NaN
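For what it is worth, the traceback ends in the object-dtype branch of the check. As far as I understand, NaN in an object array can be detected by comparing the array with itself, because NaN is the only value not equal to itself. Here is a minimal NumPy-only sketch of that idea, with made-up data. It is my assumption of how the check behaves, not a copy of the scikit-learn code:

```python
import numpy as np

# Object-dtype array mixing numbers, strings, None and NaN (illustrative data).
X = np.array([[1.0, "a"], [np.nan, "b"], [3.0, None]], dtype=object)

# NaN is not equal to itself, so an elementwise self-comparison flags
# NaN entries even in an object array. None compares equal to itself,
# so a literal None is not flagged by this test.
nan_mask = X != X

print(nan_mask)
print(nan_mask.any())
```

If that is right, the None values in the notebook's X are presumably being converted to NaN somewhere before the check runs, but I have not verified this.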
 

Last edit: Pawel Plaz 2019-09-25