# How to Utilize DAAL KNN to Sort Feature Vectors by Distance?

Hello DAALers,

I have written a simple serial implementation of KNN that calculates the distance from vector "a" to every row vector of matrix "b", where

"a" is 1 by n

"b" is m by n

The distance to the i'th vector in "b" is recorded in result[i].distance. At the end, the array of structs is sorted by distance.

I have been trying to map my inputs and outputs to DAAL's KD-Tree KNN, but no luck so far. I seem to be having difficulty passing "a" and "b" in the data frame format expected by the function. Also, the example that comes with DAAL only shows how to train the model and then invoke prediction on the testing data; it is not clear how to retrieve the distances and indices from the model. I would highly appreciate the help, as the KNN function is the performance bottleneck in my program.

```cpp
#include <cmath>    // pow, fabs
#include <cstdlib>  // qsort

struct knn_output
{
    int index;
    double distance;
};

// Comparator for qsort: orders results by ascending distance.
// Defined before knn_serial so it is declared at the point of use.
int compare_knn(const void* a, const void* b)
{
    const knn_output* a_knn = (const knn_output*)a;
    const knn_output* b_knn = (const knn_output*)b;

    if (a_knn->distance < b_knn->distance) return -1;
    else if (a_knn->distance > b_knn->distance) return 1;
    else return 0;
}

void knn_serial(double* a, double* b, int m, int n, knn_output* result)
{
    // Norm level used for distance calculation
    double L = 2;
    for (int i = 0; i < m; i++)
    {
        result[i].distance = 0;
        result[i].index = i;
        for (int j = 0; j < n; j++)
        {
            // fabs, not abs: the C abs() takes an int and would truncate
            result[i].distance += pow(fabs(a[j] - b[n * i + j]), L);
        }
        result[i].distance = pow(result[i].distance, 1 / L);
    }
    qsort(result, m, sizeof(knn_output), compare_knn);
}
```
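If only the k nearest rows are needed, sorting all m results is more work than necessary. The following is a minimal sketch, not a DAAL function, assuming the Euclidean norm (L = 2) and a row-major "b". The names `Neighbor` and `k_nearest` are hypothetical and introduced only for illustration. It accumulates squared differences instead of calling pow, orders only the first k entries with std::partial_sort, and takes the square root of just those k distances.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Neighbor
{
    int index;        // row position in b
    double distance;  // Euclidean distance to a
};

// Hypothetical helper (not part of DAAL): returns the k rows of the
// row-major m-by-n matrix b that are closest to a, nearest first.
std::vector<Neighbor> k_nearest(const double* a, const double* b,
                                int m, int n, int k)
{
    std::vector<Neighbor> all(m);
    for (int i = 0; i < m; ++i)
    {
        double sum = 0.0;
        for (int j = 0; j < n; ++j)
        {
            const double d = a[j] - b[static_cast<std::size_t>(n) * i + j];
            sum += d * d;  // squared difference, no pow() call
        }
        all[i].index = i;
        all[i].distance = sum;  // squared distance for now
    }

    k = std::max(0, std::min(k, m));  // clamp k to the valid range [0, m]
    // Order only the first k entries; the remaining ones stay unsorted.
    std::partial_sort(all.begin(), all.begin() + k, all.end(),
                      [](const Neighbor& x, const Neighbor& y)
                      { return x.distance < y.distance; });
    all.resize(k);
    // Square roots are taken for the k survivors only.
    for (Neighbor& nb : all) nb.distance = std::sqrt(nb.distance);
    return all;
}
```

Because the square root is monotonic, ranking by squared distance gives the same order as ranking by distance. Whether this removes the bottleneck depends on m, n, and k, so it is worth measuring against the original routine.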

Best regards,

Ali

Attachment: KNN_Serial.cpp (777 bytes)

Hello.

Intel® Data Analytics Acceleration Library does not provide indices and distances to the user at the moment.

Could you please provide some additional details about your use case and the reason for sorting feature vectors?

Given an out-of-sample feature vector "a", I attempt to find the k nearest (sorted by distance) feature vectors in the in-sample feature matrix "b". Then, by applying a statistical learning algorithm, I can learn the relationship between the k nearest feature vectors and their corresponding labels to predict the label of "a". Looking forward to seeing updates in the future. Also, please let me know if there is a current workaround for this. DAAL KD-Tree KNN must internally calculate the distances and sort the feature vectors to produce the final prediction.

Thanks,

Ali

So, is k-nearest neighbors a step in some algorithm that predicts the label of "a", but with a different approach from k-nearest neighbors itself? Or do you just want to use a specific approach to estimate the label based on distances (such as weighting the classes of neighbors by distance)?

Yes, exactly. I use KNN as a setup step for my own statistical learning algorithm, to identify the indices (row positions) of the k nearest feature vectors in "b". I apply learning only to the k nearest "b" feature vectors and their corresponding labels, then I apply the learned model to the out-of-sample feature vector "a" to predict its unknown label. Sorry if I'm repeating myself. Please let me know if we can still do this using current Intel libraries.
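To make the distance-based option mentioned in the question concrete, here is a minimal sketch of distance-weighted voting over neighbors that have already been found. It is an illustration only, not the poster's learning algorithm and not a DAAL API. The function name `weighted_vote`, the 1 / (distance + eps) weighting, and the `eps` default are all assumptions chosen for the example.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper (not part of DAAL): predicts a class label from the
// k nearest neighbors, weighting each neighbor's vote by 1 / (distance + eps).
// neighbor_index[i] is a row position in "b"; labels[row] is that row's label.
int weighted_vote(const std::vector<int>& neighbor_index,
                  const std::vector<double>& neighbor_distance,
                  const std::vector<int>& labels,
                  double eps = 1e-12)
{
    std::map<int, double> score;  // class label -> accumulated weight
    for (std::size_t i = 0; i < neighbor_index.size(); ++i)
    {
        const int label = labels[neighbor_index[i]];
        // eps avoids division by zero when a neighbor coincides with "a"
        score[label] += 1.0 / (neighbor_distance[i] + eps);
    }

    int best_label = -1;       // returned when there are no neighbors
    double best_score = -1.0;
    for (const auto& entry : score)
    {
        if (entry.second > best_score)
        {
            best_label = entry.first;
            best_score = entry.second;
        }
    }
    return best_label;
}
```

Either way, the caller needs both the row indices and the distances of the k nearest vectors, which is exactly the output being requested from KD-Tree KNN in this thread.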

Thanks,

Ali

Hi Mikhail,

Please let me know if there's any workaround to retrieve indices and distances from KD-Tree KNN. If not, I would appreciate it if you can pass my request to the DAAL development team. Either way, please let me know.

Many thanks,

Ali

Hi,

Thank you for the request and for providing details about the use case. At the moment, the only option is to use the open source DAAL (https://github.com/intel/daal) and manually add an interface for accessing the data you need. Please note that the distances are calculated internally in the algorithm, as you mentioned. The DAAL team will take your request into account when expanding the API in future releases.

Thank you, and looking forward to the updates. I think I found where the sorted indices are located. They are stored in "indexes" after sorting "inSortValues[i].idx". They can be found in "algorithms\kernel\k_nearest_neighbors\kdtree_knn_impl.i" (file is attached). However, I am having trouble understanding how to access "indexes" after executing the model.

## Attachments:

kdtree_knn_impl.i (4.21 KB)
