====================
K-Nearest Neighbours
====================

KNN classifies a data point by majority vote among the labels of its :math:`k` nearest training points, where nearness is measured by some underlying distance function :math:`d(x,x')`.

For :math:`k=1`, the label predicted for a test point :math:`x^*` is that of its closest training point :math:`x_{i}`, i.e. :math:`y_{i}`, where

.. math::

   i=\operatorname*{arg\,min}_j d(x^*, x_j).
   
See Chapter 14 in :cite:`barber2012bayesian` for a detailed introduction.
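
The :math:`k=1` rule above can be sketched in a few lines of NumPy. This is only an illustration of the formula, not Shogun code: the helper names ``euclidean`` and ``predict_1nn`` and the toy data are assumptions made for this example.

.. code-block:: python

   import numpy as np

   def euclidean(a, b):
       """Euclidean distance between two vectors (one possible choice of d)."""
       return float(np.sqrt(np.sum((a - b) ** 2)))

   def predict_1nn(x_star, X_train, y_train, d=euclidean):
       """Return the label of the training point closest to x_star under d."""
       # Evaluate d(x*, x_j) for every training point x_j.
       distances = [d(x_star, x_j) for x_j in X_train]
       # Index of the closest training point (the argmin in the formula).
       i = int(np.argmin(distances))
       return y_train[i]

   # Hypothetical toy data: three training points with labels 0, 0 and 1.
   X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
   y_train = np.array([0, 0, 1])
   label = predict_1nn(np.array([4.0, 4.5]), X_train, y_train)

Here the test point lies closest to ``[5.0, 5.0]``, so ``label`` is ``1``. Any other distance function with the same two-argument signature could be passed as ``d``.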

-------
Example
-------

Imagine we have files with training and test data. We create :sgclass:`CDenseFeatures` (here 64-bit floats, aka RealFeatures) and :sgclass:`CMulticlassLabels` as

.. sgexample:: knn.sg:create_features

In order to run :sgclass:`CKNN`, we need to choose a distance, for example :sgclass:`CEuclideanDistance` or any other subclass of :sgclass:`CDistance`. The distance is initialized with the data we want to classify.

.. sgexample:: knn.sg:choose_distance

Once we have chosen a distance, we create an instance of the :sgclass:`CKNN` classifier, passing it :math:`k`.

.. sgexample:: knn.sg:create_instance

Then we train the KNN classifier and apply it to the test data, which here gives :sgclass:`CMulticlassLabels`.

.. sgexample:: knn.sg:train_and_apply
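
To make the train-and-apply step concrete, the following is a minimal brute-force sketch in NumPy, assuming Euclidean distance. It is not the :sgclass:`CKNN` implementation: the function ``knn_predict`` and the toy data are hypothetical, and the tie-breaking shown (the label seen first among the nearest neighbours wins) is an assumption that may differ from Shogun's behaviour. In this simple form, "training" amounts to storing the training data.

.. code-block:: python

   from collections import Counter

   import numpy as np

   def knn_predict(X_train, y_train, X_test, k):
       """Predict a label for each row of X_test by majority vote among
       its k nearest training points under Euclidean distance."""
       predictions = []
       for x in X_test:
           # Distance from the test point to every training point.
           dists = np.linalg.norm(X_train - x, axis=1)
           # Indices of the k smallest distances.
           nearest = np.argsort(dists, kind="stable")[:k]
           # Majority vote over the labels of those neighbours.
           votes = Counter(y_train[nearest].tolist())
           predictions.append(votes.most_common(1)[0][0])
       return np.array(predictions)

   # Hypothetical toy data: two well-separated clusters.
   X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                       [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
   y_train = np.array([0, 0, 0, 1, 1, 1])
   X_test = np.array([[0.5, 0.5], [5.5, 5.5]])
   y_pred = knn_predict(X_train, y_train, X_test, k=3)

With these clusters, the first test point is assigned label ``0`` and the second label ``1``.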

We can evaluate test performance via e.g. :sgclass:`CMulticlassAccuracy`.

.. sgexample:: knn.sg:evaluate_accuracy
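
Multiclass accuracy is the fraction of test points whose predicted label equals the true label. A small NumPy sketch of that computation follows. The function name ``multiclass_accuracy`` and the label vectors are made up for illustration and are not part of Shogun's API.

.. code-block:: python

   import numpy as np

   def multiclass_accuracy(y_true, y_pred):
       """Fraction of positions where the predicted label matches the true one."""
       y_true = np.asarray(y_true)
       y_pred = np.asarray(y_pred)
       return float(np.mean(y_true == y_pred))

   # Hypothetical labels: three of the four predictions are correct.
   acc = multiclass_accuracy([0, 1, 1, 2], [0, 1, 2, 2])

For these vectors ``acc`` is ``0.75``.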


----------
References
----------
:wiki:`K-nearest_neighbors_algorithm`

.. bibliography:: ../../references.bib
    :filter: docname in docnames
