Clustering

While it is possible to use the transformers of the sklearn_ann.kneighbors module directly with clustering algorithms from scikit-learn, there is often a mismatch between techniques like DBSCAN, which require for each node its neighbors within a certain radius, and a kNN graph, which stores a fixed number of neighbors for each node. This mismatch may result in k being set too high in order to make sure that every neighbor within the radius is included, which slows things down.
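The mismatch can be made concrete with a small pure-Python sketch. It is only an illustration: the helper names radius_neighbors and k_nearest_neighbors and the toy points are assumptions made for this example and are not part of sklearn_ann or scikit-learn.

```python
import math


def radius_neighbors(points, radius):
    """For each point, indices of the other points within `radius` (variable length)."""
    return [
        [j for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius]
        for i, p in enumerate(points)
    ]


def k_nearest_neighbors(points, k):
    """For each point, indices of its k closest other points (always length k)."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(others[:k])
    return result


# Toy data (hypothetical): a dense group of four points plus one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]

radius_sizes = [len(n) for n in radius_neighbors(points, radius=0.2)]
knn_sizes = [len(n) for n in k_nearest_neighbors(points, k=2)]

# Smallest k for which a kNN graph could cover every radius neighborhood.
k_needed = max(radius_sizes)
```

In this toy data the radius neighborhoods vary in size from point to point, while every kNN list has exactly k entries, so a kNN graph can only cover the largest radius neighborhood by raising k for every node.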

This module contains an implementation of RNN-DBSCAN, which is based on the kNN-graph structure.

class sklearn_ann.cluster.rnn_dbscan.RnnDBSCAN(n_neighbors=5, *, input_guarantee='none', n_jobs=None, keep_knns=False)

Implements the RNN-DBSCAN clustering algorithm.

Parameters:
n_neighbors int

The number of neighbors in the kNN graph (the k in kNN), and the threshold on the number of reverse nearest neighbors for a node to be considered a core node.

input_guarantee "none" | "kneighbors"

A guarantee on input matrices. If equal to "kneighbors", the algorithm will assume that the input is already a kNN graph in exactly the required form, e.g. built with the same n_neighbors. This can be used to pass in a graph produced by one of the implementations of the KNeighborsTransformer interface.

n_jobs int

The number of jobs to use. Currently has no effect, since no part of the algorithm has been parallelized.

keep_knns bool

If true, the kNN and inverse kNN graph will be saved to knns_ and rev_knns_ after fitting.
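To illustrate what the kNN graph, the reverse kNN graph, and the core-node threshold mean, here is a minimal pure-Python sketch. It follows the description of n_neighbors above and is not the library's implementation; the function names knn_graph and reverse_knn_graph and the toy points are assumptions made for this example.

```python
import math


def knn_graph(points, k):
    """For each point, the indices of its k nearest other points."""
    knns = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        knns.append(others[:k])
    return knns


def reverse_knn_graph(knns):
    """For each point j, the indices of the points that list j among their kNN."""
    rev_knns = [[] for _ in knns]
    for i, neighbors in enumerate(knns):
        for j in neighbors:
            rev_knns[j].append(i)
    return rev_knns


# Toy data (hypothetical): four points on a unit square plus one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (6.0, 6.0)]
k = 2

knns = knn_graph(points, k)
rev_knns = reverse_knn_graph(knns)

# A node counts as a core node when at least k other nodes list it as a neighbor.
is_core = [len(rev) >= k for rev in rev_knns]
```

The far-away point still has k nearest neighbors, but no other point lists it as a neighbor, so its reverse neighbor list is empty and it does not pass the core threshold.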

See also

simple_rnn_dbscan_pipeline

Create a pipeline of a KNeighborsTransformer and RnnDBSCAN

References

A. Bryant and K. Cios, “RNN-DBSCAN: A Density-Based Clustering Algorithm Using Reverse Nearest Neighbor Density Estimates,” in IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 6, pp. 1109-1121, 1 June 2018, doi: 10.1109/TKDE.2017.2787640.

drop_knns()
fit(X, y=None)
fit_predict(X, y=None)

Performs clustering on X and returns cluster labels.

Parameters:
X array-like of shape (n_samples, n_features)

Input data.

y Ignored

Not used, present for API consistency by convention.

**kwargs dict

Arguments to be passed to fit.

Added in version 1.4.

Returns:

labels – Cluster labels.

Return type:

ndarray of shape (n_samples,), dtype=np.int64
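As a rough picture of how labels can be derived from a kNN graph, the following pure-Python sketch grows clusters outward from core nodes, in the spirit of the referenced paper. It is a simplified illustration under stated assumptions, not the code behind fit_predict: the names sketch_neighborhood and sketch_rnn_dbscan, the label constants (with -1 for noise, borrowed from the scikit-learn DBSCAN convention), and the hand-written graph are all hypothetical, and refinements described in the paper are left out.

```python
from collections import deque

# Hypothetical label constants for this sketch only.
UNCLASSIFIED = -2
NOISE = -1


def sketch_neighborhood(is_core, knns, rev_knns, idx):
    """kNN of idx together with those reverse neighbors of idx that are core nodes."""
    core_rev = [j for j in rev_knns[idx] if is_core[j]]
    return sorted(set(knns[idx]) | set(core_rev))


def sketch_rnn_dbscan(knns, k):
    """Label nodes of a kNN graph (list of neighbor index lists) by expanding from core nodes."""
    n = len(knns)
    rev_knns = [[] for _ in range(n)]
    for i, neighbors in enumerate(knns):
        for j in neighbors:
            rev_knns[j].append(i)
    is_core = [len(rev) >= k for rev in rev_knns]

    labels = [UNCLASSIFIED] * n
    next_label = 0
    for start in range(n):
        if labels[start] != UNCLASSIFIED:
            continue
        if not is_core[start]:
            labels[start] = NOISE  # may still be claimed by a cluster later
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if not is_core[node]:
                continue  # non-core nodes join a cluster but do not grow it
            for other in sketch_neighborhood(is_core, knns, rev_knns, node):
                if labels[other] == UNCLASSIFIED:
                    labels[other] = next_label
                    queue.append(other)
                elif labels[other] == NOISE:
                    labels[other] = next_label
        next_label += 1
    return labels


# Hand-written 2-NN graph: two mutual triangles (0-2 and 3-5) and an outlier (6).
toy_knns = [[1, 2], [0, 2], [0, 1], [4, 5], [3, 5], [3, 4], [0, 3]]
toy_labels = sketch_rnn_dbscan(toy_knns, k=2)
```

In the toy graph, node 6 points at nodes 0 and 3 but nothing points back at it, so it is not a core node and is never reached from either triangle.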

sklearn_ann.cluster.rnn_dbscan.join(it1, it2)
sklearn_ann.cluster.rnn_dbscan.neighborhood(is_core, knns, rev_knns, idx)
sklearn_ann.cluster.rnn_dbscan.rnn_dbscan_inner(is_core, knns, rev_knns, labels)
sklearn_ann.cluster.rnn_dbscan.simple_rnn_dbscan_pipeline(neighbor_transformer, n_neighbors, n_jobs=None, keep_knns=None, **kwargs)

Create a simple pipeline comprising a transformer and RnnDBSCAN.

Parameters:
neighbor_transformer class implementing KNeighborsTransformer interface

n_neighbors

Passed to neighbor_transformer and RnnDBSCAN

n_jobs

Passed to neighbor_transformer and RnnDBSCAN

keep_knns

Passed to RnnDBSCAN

kwargs

Passed to neighbor_transformer
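The parameter routing listed above can be sketched without any dependencies. This is only an illustration of which argument goes where: RecordingEstimator and sketch_pipeline_steps are hypothetical stand-ins invented for this example, the metric keyword is an arbitrary extra argument, and the real function builds a pipeline from an actual transformer class and RnnDBSCAN.

```python
class RecordingEstimator:
    """Hypothetical stand-in that only records the keyword arguments it is given."""

    def __init__(self, **params):
        self.params = params


def sketch_pipeline_steps(neighbor_transformer, n_neighbors, n_jobs=None,
                          keep_knns=None, **kwargs):
    """Build the two steps using the parameter routing documented above."""
    # n_neighbors and n_jobs go to both steps; extra kwargs go to the transformer only.
    transformer = neighbor_transformer(n_neighbors=n_neighbors, n_jobs=n_jobs, **kwargs)
    # keep_knns goes to the clustering step only.
    clusterer = RecordingEstimator(
        n_neighbors=n_neighbors, n_jobs=n_jobs, keep_knns=keep_knns
    )
    return transformer, clusterer


transformer, clusterer = sketch_pipeline_steps(
    RecordingEstimator, n_neighbors=10, n_jobs=2, keep_knns=True, metric="euclidean"
)
```

Note that neighbor_transformer is passed as a class rather than an instance, so that the function can construct it with the shared n_neighbors and n_jobs values.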