K-means with Scikit-Learn


According to this article ( https://software.intel.com/en-us/articles/intelr-distribution-for-python... ), Intel Distribution for Python 2017 Update 2 uses Intel DAAL as a backend for K-means clustering in Scikit-Learn.
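
For reference, this is the work a K-means backend has to do: pick random initial centers ("random" init chooses existing observations), then repeat an assignment step and an update step. The sketch below is a pure-Python illustration with hypothetical helper names (lloyd_kmeans, squared_distance). It is neither scikit-learn's Cython code nor DAAL's. Optimised implementations perform the same two steps far more efficiently. Empty clusters simply keep their previous center here, which is a simplification.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, n_clusters, n_iterations, seed=0):
    """Illustrative Lloyd K-means (hypothetical helper, not a library API).

    Runs exactly n_iterations assign/update rounds, like a benchmark that
    fixes max_iter and disables the tolerance-based early stop.
    """
    rng = random.Random(seed)
    # "random" init: use n_clusters distinct input points as starting centers
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    labels = [0] * len(points)
    dim = len(points[0])
    for _ in range(n_iterations):
        # Assignment step: each point goes to its nearest center
        for i, p in enumerate(points):
            labels[i] = min(range(n_clusters),
                            key=lambda k: squared_distance(p, centers[k]))
        # Update step: each center moves to the mean of its assigned points
        sums = [[0.0] * dim for _ in range(n_clusters)]
        counts = [0] * n_clusters
        for p, k in zip(points, labels):
            counts[k] += 1
            for d in range(dim):
                sums[k][d] += p[d]
        for k in range(n_clusters):
            if counts[k]:  # an empty cluster keeps its previous center
                centers[k] = [s / counts[k] for s in sums[k]]
    return centers, labels
```

With 60000 points of dimension 784, each assignment step evaluates 600000 point-to-center distances. That is why the choice between the Cython and DAAL code paths matters for timing.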

When I run the following code on Linux, VTune shows that DAAL is not being used: the original Scikit-Learn Cython code still runs. Is there anything I need to do to enable DAAL?

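import os

# Thread count for the OpenMP runtime. Set it before NumPy/scikit-learn are
# imported, since the runtime typically reads it only once, when it is loaded.
nb_threads = 28
os.environ['OMP_NUM_THREADS'] = str(nb_threads)

from time import time
import numpy as np
from sklearn.cluster import KMeans

dim = 784
nb_points = 60000
nb_clusters = 10
nb_iterations = 20

points = np.random.rand(nb_points, dim)

# algorithm="full" and precompute_distances match the scikit-learn releases of
# that time; later releases deprecated both (algorithm="full" became "lloyd").
estimator = KMeans(init="random", verbose=1, max_iter=nb_iterations,
                   algorithm="full", precompute_distances=False, tol=0.0,
                   n_clusters=nb_clusters, n_init=1)
print("Before computing")
time_begin = time()
estimator.fit(points)  # the clustering run being timed
time_end = time()
print("After computing")
# Average wall-clock time per iteration
print((time_end - time_begin) / nb_iterations)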
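
OpenMP runtimes generally read OMP_NUM_THREADS when they are first loaded. Assuming that holds here, a robust way to control the thread count is to start the benchmark in a fresh interpreter whose environment already contains the setting, which is equivalent to running OMP_NUM_THREADS=28 python kmeans.py from the shell. run_with_threads is a hypothetical helper, and the child command below only echoes the variable back. Substitute the real benchmark.

```python
import os
import subprocess
import sys


def run_with_threads(code, n_threads):
    """Run `code` in a new Python process with OMP_NUM_THREADS preset.

    The variable is in the child's environment from process start, so any
    OpenMP runtime the child loads can see it when it initialises.
    """
    env = dict(os.environ, OMP_NUM_THREADS=str(n_threads))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The child only reports what it sees; replace with the benchmark code.
    print(run_with_threads("import os; print(os.environ['OMP_NUM_THREADS'])", 28))
```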


Hi Velvia,

Check the following comment related to KMeans and DAAL: https://software.intel.com/en-us/forums/intel-distribution-for-python/topic/731351#comment-1903917


Hi Velvia,

I think the entire thread will be useful: it suggested disabling the DAAL optimizations, so the same problem probably applies to the issue you are reporting. I haven't tested your code on my configurations, but I remember the last comments of that thread: https://software.intel.com/en-us/forums/intel-distribution-for-python/topic/731351

Hi Velvia,

Could you send me the output of 'conda list' and 'conda info', along with your system specs?
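
The requested details can be gathered in one pass. The sketch below is a hypothetical helper (system_report) that uses only the standard library: platform and os for the system specs, plus the output of 'conda info' and 'conda list' when a conda executable is found on PATH. It is an assumption that these fields are enough for the report; add others as needed.

```python
import os
import platform
import shutil
import subprocess


def system_report():
    """Collect basic system specs plus conda output if conda is on PATH."""
    report = {
        "os": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),  # may be empty on some Linux systems
        "logical_cpus": os.cpu_count(),
        "python": platform.python_version(),
    }
    conda = shutil.which("conda")
    if conda is not None:
        for cmd in ("info", "list"):
            out = subprocess.run([conda, cmd], capture_output=True, text=True)
            report["conda " + cmd] = out.stdout
    return report


if __name__ == "__main__":
    for key, value in system_report().items():
        print(key, ":", value)
```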



Hi David,

Sure. Here it is:

[fayard@grisbouille Digits]$ conda info
Current conda install:

               platform : linux-64
          conda version : 4.2.12
       conda is private : False
      conda-env version : 4.2.12
    conda-build version : not installed
         python version : 3.5.2.final.0
       requests version : 2.11.1
       root environment : /opt/intel/intelpython3  (writable)
    default environment : /opt/intel/intelpython3
       envs directories : /opt/intel/intelpython3/envs
          package cache : /opt/intel/intelpython3/pkgs
           channel URLs : https://conda.anaconda.org/intel/linux-64
            config file : /opt/intel/intelpython3/.condarc
           offline mode : False

[fayard@grisbouille Digits]$ conda list
Hi Velvia,

Thanks for your reply. I've asked engineering for further clarification on how DAAL is used under scikit-learn, which will hopefully shed some light on why your example falls back to the regular Cython code.



Hi Velvia,

There have been some updates on our channel. You can try updating scikit-learn with:

conda update -c intel scikit-learn
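
After updating, it may help to confirm which scikit-learn the interpreter actually picks up, since a second copy elsewhere on the path could explain seeing the stock Cython code. The sketch below uses hypothetical helpers (installed_version, import_location) built on importlib.metadata, which needs Python 3.8 or later. On the Python 3.5 environment shown above, python -c "import sklearn; print(sklearn.__version__, sklearn.__file__)" gives the same information.

```python
import importlib.util
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def import_location(module_name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("scikit-learn version:", installed_version("scikit-learn"))
    print("sklearn imported from:", import_location("sklearn"))
```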

We've also released Update 3, so you can try that as well.