Implementations of the KNeighborsTransformer interface#

This module contains transformers that map array-like inputs of shape (n_samples, n_features) to kNN-graphs encoded as scipy.sparse.csr_matrix. They conform to the KNeighborsTransformer interface. Each submodule provides facilities for exactly one external nearest-neighbour library.
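For illustration, a minimal sketch of this contract using scikit-learn's built-in KNeighborsTransformer; the transformers documented below are assumed to behave the same way (the (n_samples, n_samples) output shape is an assumption based on that contract):

```python
import numpy as np
from scipy.sparse import issparse
from sklearn.neighbors import KNeighborsTransformer

X = np.random.RandomState(0).rand(100, 16)   # (n_samples, n_features)

# fit_transform returns the kNN graph as a sparse matrix of pairwise distances
graph = KNeighborsTransformer(n_neighbors=5, mode="distance").fit_transform(X)

assert issparse(graph)                       # CSR-encoded kNN graph
assert graph.shape == (100, 100)             # (n_samples, n_samples)
```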

Annoy#

Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point. The project originates from Spotify. It uses a forest of random projection trees.

class sklearn_ann.kneighbors.annoy.AnnoyTransformer(n_neighbors=5, *, metric='euclidean', n_trees=10, search_k=-1)[source]#

Wrapper for using annoy.AnnoyIndex as sklearn’s KNeighborsTransformer

fit(X, y=None)[source]#
fit_transform(X, y=None)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X array-like of shape (n_samples, n_features)

Input samples.

y array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params dict

Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

transform(X)[source]#
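A hedged usage sketch of AnnoyTransformer, using the constructor parameters documented above; the (n_samples, n_samples) sparse output shape is an assumption from the KNeighborsTransformer contract:

```python
import numpy as np
from sklearn_ann.kneighbors.annoy import AnnoyTransformer

X = np.random.RandomState(42).rand(200, 32)

# Build a forest of 10 random-projection trees and query 10 neighbours per sample
ann = AnnoyTransformer(n_neighbors=10, metric="euclidean", n_trees=10)
graph = ann.fit_transform(X)   # scipy.sparse.csr_matrix, assumed shape (200, 200)
```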

FAISS#

FAISS (Facebook AI Similarity Search) is a library for efficient similarity search and clustering of dense vectors. The project originates from Facebook AI Research (FAIR). It contains multiple algorithms, including exact/brute-force nearest-neighbour search, methods based on quantization and product quantization, and methods based on Hierarchical Navigable Small World (HNSW) graphs. There are guidelines on how to choose the best index for your purposes.

class sklearn_ann.kneighbors.faiss.FAISSTransformer(n_neighbors=5, *, metric='euclidean', index_key='', n_probe=128, n_jobs=-1, include_fwd=True, include_rev=False)[source]#
fit(X, y=None)[source]#
fit_transform(X, y=None)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X array-like of shape (n_samples, n_features)

Input samples.

y array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params dict

Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

transform(X)[source]#
sklearn_ann.kneighbors.faiss.mk_faiss_index(feats, inner_metric, index_key='', nprobe=128) → faiss.Index[source]#
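A hedged sketch of using FAISSTransformer as the neighbour-graph step ahead of a clustering estimator that accepts precomputed sparse distances; the parameter names follow the signature documented above, while the DBSCAN values and the float32 cast are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.pipeline import make_pipeline
from sklearn_ann.kneighbors.faiss import FAISSTransformer

X = np.random.RandomState(0).rand(500, 64).astype("float32")  # FAISS operates on float32

pipe = make_pipeline(
    # Precompute a sparse kNN distance graph with FAISS ...
    FAISSTransformer(n_neighbors=20, metric="euclidean"),
    # ... and hand it to an estimator that accepts precomputed distances.
    DBSCAN(eps=0.5, min_samples=10, metric="precomputed"),
)
labels = pipe.fit_predict(X)
```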

nmslib#

nmslib (non-metric space library) is a library for similarity search supporting metric and non-metric spaces. It contains multiple algorithms.

class sklearn_ann.kneighbors.nmslib.NMSlibTransformer(n_neighbors=5, *, metric='euclidean', method='sw-graph', n_jobs=1)[source]#

Wrapper for using nmslib as sklearn’s KNeighborsTransformer

fit(X, y=None)[source]#
transform(X)[source]#
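A hedged usage sketch of NMSlibTransformer, keeping the documented default method='sw-graph'; as above, the sparse (n_samples, n_samples) output is assumed from the KNeighborsTransformer contract:

```python
import numpy as np
from sklearn_ann.kneighbors.nmslib import NMSlibTransformer

X = np.random.RandomState(1).rand(300, 24)

# method is left at its documented default ('sw-graph')
graph = NMSlibTransformer(n_neighbors=15).fit_transform(X)
```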

PyNNDescent#

PyNNDescent is a Python implementation of nearest neighbour descent for approximate nearest neighbour search. It iteratively improves a kNN-graph using the transitive property, with random projections for initialisation. This transformer is actually implemented as part of PyNNDescent and simply re-exported here for (foolish) consistency. If you only need this transformer, just use PyNNDescent directly.

class sklearn_ann.kneighbors.pynndescent.PyNNDescentTransformer(*args: Any, **kwargs: Any)[source]#
fit(X, compress_index=True)[source]#
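Since the transformer lives in PyNNDescent itself, a minimal sketch can import it from that package directly, as the note above suggests; n_neighbors is a PyNNDescent parameter, the rest is illustrative:

```python
import numpy as np
from pynndescent import PyNNDescentTransformer

X = np.random.RandomState(2).rand(400, 20)

# Equivalent to using the re-export in this module (per the note above)
graph = PyNNDescentTransformer(n_neighbors=15).fit_transform(X)
```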

sklearn#

scikit-learn itself contains ball tree and k-d tree indices. Its KNeighborsTransformer is re-exported here, specialised for these two index types, for consistency.
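A hedged sketch of the underlying scikit-learn behaviour that these re-exports specialise: the exact index type can be pinned via KNeighborsTransformer's algorithm parameter (the re-exported class names themselves are not shown here):

```python
import numpy as np
from sklearn.neighbors import KNeighborsTransformer

X = np.random.RandomState(3).rand(250, 8)

# Pin the index type explicitly rather than letting 'auto' decide
ball_graph = KNeighborsTransformer(n_neighbors=5, algorithm="ball_tree").fit_transform(X)
kd_graph = KNeighborsTransformer(n_neighbors=5, algorithm="kd_tree").fit_transform(X)
```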