Clustering

While the transformers of the sklearn_ann.kneighbors module can be used directly with clustering algorithms from scikit-learn, there is often a mismatch between techniques like DBSCAN, which require for each node its neighbors within a certain radius, and a kNN-graph, which has a fixed number of neighbors per node. This mismatch may result in k being set too high to make sure that all neighbors within the radius are included, slowing things down.
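To illustrate the mismatch, the sketch below compares the two kinds of neighborhood on a small 1-D dataset, using brute force and only the standard library. The points, the radius and k are hypothetical values chosen for illustration and do not come from sklearn_ann.

```python
# Hypothetical 1-D points: a dense group and two isolated points.
points = [0.0, 0.1, 0.2, 0.3, 5.0, 7.0]
radius = 0.35  # illustrative DBSCAN-style radius
k = 2          # illustrative kNN-graph size


def radius_neighbors(i):
    """Indices of all other points within `radius` of point i."""
    return [j for j, q in enumerate(points)
            if j != i and abs(points[i] - q) <= radius]


def k_nearest(i):
    """Indices of the k closest other points to point i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: abs(points[i] - points[j]))
    return others[:k]


# Radius neighborhoods vary in size; kNN neighborhoods never do.
radius_counts = [len(radius_neighbors(i)) for i in range(len(points))]
knn_counts = [len(k_nearest(i)) for i in range(len(points))]

# Smallest k (ignoring ties) for which every radius neighborhood
# is fully contained in the kNN graph.
k_needed = max(radius_counts)
```

Here the dense points each have three neighbors within the radius while the isolated points have none, so covering the radius neighborhoods requires raising k for every node, including those that gain nothing from it.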

This module contains an implementation of RNN-DBSCAN, which is based on the kNN-graph structure.

class sklearn_ann.cluster.rnn_dbscan.RnnDBSCAN(n_neighbors=5, *, input_guarantee='none', n_jobs=None, keep_knns=False)

Implements the RNN-DBSCAN clustering algorithm.

Parameters:

n_neighbors int: The number of neighbors in the kNN-graph (the k in kNN), and the threshold of reverse nearest neighbors for a node to be considered a core node.
input_guarantee "none" | "kneighbors": A guarantee on input matrices. If equal to “kneighbors”, the algorithm will assume you are passing in the kNN graph exactly as required, e.g. with n_neighbors. This can be used to pass in a graph produced by one of the implementations of the KNeighborsTransformer interface.
n_jobs int: The number of jobs to use. Currently has no effect since no part of the algorithm has been parallelized.
keep_knns bool: If true, the kNN and inverse kNN graph will be saved to knns_ and rev_knns_ after fitting.
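
The core-node criterion described for n_neighbors can be sketched with a brute-force kNN graph. This is a simplified illustration and not the library's implementation: the helper functions and the data are hypothetical, the threshold is assumed to be inclusive (at least n_neighbors reverse nearest neighbors), and only the core-node test is shown, not the full clustering. The names knns and rev_knns loosely mirror the knns_ and rev_knns_ attributes mentioned above.

```python
from math import dist

# Hypothetical 2-D data: a tight group of four points and one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 12.0)]
n_neighbors = 2


def knn_graph(pts, k):
    """Brute-force kNN graph: map each index to its k nearest other indices."""
    graph = {}
    for i, p in enumerate(pts):
        others = [j for j in range(len(pts)) if j != i]
        others.sort(key=lambda j: dist(p, pts[j]))
        graph[i] = others[:k]
    return graph


def reverse_knn_graph(knns):
    """Invert the kNN graph: map each index to the nodes that list it as a neighbor."""
    rev = {i: [] for i in knns}
    for i, neighbors in knns.items():
        for j in neighbors:
            rev[j].append(i)
    return rev


knns = knn_graph(points, n_neighbors)
rev_knns = reverse_knn_graph(knns)

# Assumed rule: a node is core when at least n_neighbors nodes
# count it among their own k nearest neighbors.
core = [len(rev_knns[i]) >= n_neighbors for i in range(len(points))]
```

In this example the four grouped points are each chosen as a neighbor by at least two other nodes and so qualify as core, while the outlier has two outgoing kNN edges but no incoming ones and does not.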
