Clustering
While it is possible to use the transformers of the sklearn_ann.kneighbors module together with clustering algorithms from scikit-learn directly, there is often a mismatch between techniques like DBSCAN, which require for each node its neighbors within a certain radius, and a kNN-graph, which has a fixed number of neighbors for each node. This mismatch may result in k being set too high, to make sure that all neighbors within the radius are included, which slows things down.
This module contains an implementation of RNN-DBSCAN, which is based on the kNN-graph structure.
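The mismatch described above can be illustrated with a small sketch. The toy 1-D data and the radius value are assumptions chosen for illustration only; nothing here uses sklearn_ann itself. It shows that radius neighborhoods vary in size, so a kNN-graph meant to cover all of them needs k at least as large as the biggest neighborhood, even for isolated points.

```python
# Toy 1-D data: a dense group near 0 and two isolated points (illustrative only).
points = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 9.0]
radius = 0.5  # hypothetical DBSCAN-style radius


def radius_neighbors(i):
    """Indices of all other points within `radius` of point i."""
    return [
        j
        for j in range(len(points))
        if j != i and abs(points[i] - points[j]) <= radius
    ]


# Neighborhood sizes differ per point, unlike in a kNN-graph.
counts = [len(radius_neighbors(i)) for i in range(len(points))]

# To guarantee that a kNN-graph contains every radius neighborhood,
# k must be at least the size of the largest neighborhood.
k_needed = max(counts)

print(counts)
print(k_needed)
```

With this data the isolated points have no radius neighbors at all, yet a kNN-graph built with `k_needed` would still compute that many neighbors for each of them.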
- class sklearn_ann.cluster.rnn_dbscan.RnnDBSCAN(n_neighbors=5, *, input_guarantee='none', n_jobs=None, keep_knns=False)
Implements the RNN-DBSCAN clustering algorithm.
- Parameters:
- n_neighbors int
The number of neighbors in the kNN-graph (the k in kNN), and the threshold of reverse nearest neighbors for a node to be considered a core node.
- input_guarantee "none" | "kneighbors"
A guarantee on input matrices. If equal to “kneighbors”, the algorithm will assume you are passing in the kNN graph exactly as required, e.g. with n_neighbors. This can be used to pass in a graph produced by one of the implementations of the KNeighborsTransformer interface.
- n_jobs int
The number of jobs to use. Currently has no effect since no part of the algorithm has been parallelized.
- keep_knns bool
If true, the kNN and inverse kNN graphs will be saved to knns_ and rev_knns_ after fitting.
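The double role of n_neighbors can be sketched in plain Python. This is a conceptual illustration, not the library's implementation: the helper names `knn_graph` and `core_nodes` are hypothetical, the kNN search is brute force over 1-D points, a point is assumed not to be its own neighbor, and the threshold is assumed to be "at least k reverse neighbors", following the paper cited in the References.

```python
def knn_graph(points, k):
    """Brute-force kNN lists for 1-D points (a point is not its own neighbor)."""
    graph = []
    for i, p in enumerate(points):
        # Sort the other points by distance, breaking ties by index.
        others = sorted((abs(p - q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in others[:k]])
    return graph


def core_nodes(graph, k):
    """Return nodes that appear in at least k kNN lists, plus all reverse counts."""
    reverse_counts = [0] * len(graph)
    for neighbors in graph:
        for j in neighbors:
            reverse_counts[j] += 1
    cores = [i for i, count in enumerate(reverse_counts) if count >= k]
    return cores, reverse_counts


points = [0.0, 1.0, 2.0, 3.0, 10.0]  # toy data; the last point is an outlier
k = 2
graph = knn_graph(points, k)
cores, reverse_counts = core_nodes(graph, k)

print(graph)
print(reverse_counts)
print(cores)
```

The outlier at 10.0 still has k nearest neighbors of its own, but no other point lists it as a neighbor, so its reverse count is zero and it cannot be a core node.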
See also
simple_rnn_dbscan_pipeline
Create a pipeline of a KNeighborsTransformer and RnnDBSCAN
References
A. Bryant and K. Cios, “RNN-DBSCAN: A Density-Based Clustering Algorithm Using Reverse Nearest Neighbor Density Estimates,” in IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 6, pp. 1109-1121, 1 June 2018, doi: 10.1109/TKDE.2017.2787640.
- fit_predict(X, y=None)
Performs clustering on X and returns cluster labels.
- Parameters:
- X array-like of shape (n_samples, n_features)
Input data.
- y Ignored
Not used, present for API consistency by convention.
- **kwargs dict
Arguments to be passed to fit.
Added in version 1.4.
- Returns:
labels – Cluster labels.
- Return type:
ndarray of shape (n_samples,), dtype=np.int64
- sklearn_ann.cluster.rnn_dbscan.simple_rnn_dbscan_pipeline(neighbor_transformer, n_neighbors, n_jobs=None, keep_knns=None, **kwargs)
Create a simple pipeline comprising a transformer and RnnDBSCAN.
- Parameters:
- neighbor_transformer class implementing KNeighborsTransformer interface
- n_neighbors
Passed to neighbor_transformer and RnnDBSCAN
- n_jobs
Passed to neighbor_transformer and RnnDBSCAN
- keep_knns
Passed to RnnDBSCAN
- kwargs
Passed to neighbor_transformer