BBKNN

bbknn.bbknn(adata, batch_key='batch', save_knn=False, copy=False, **kwargs)

Batch balanced KNN, altering the KNN procedure to identify each cell’s top neighbours in each batch separately instead of searching the entire cell pool with no accounting for batch. Aligns batches in a quick and lightweight manner. For use in the scanpy workflow as an alternative to scanpy.api.pp.neighbors().

adata : AnnData
    Needs the PCA computed and stored in adata.obsm["X_pca"].
batch_key : str, optional (default: "batch")
    adata.obs column name discriminating between your batches.
neighbors_within_batch : int, optional (default: 3)
    How many top neighbours to report for each batch; total number of neighbours will be this number times the number of batches.
n_pcs : int, optional (default: 50)
    How many principal components to use in the analysis.
trim : int or None, optional (default: None)
    If not None, trim the neighbours of each cell to this many top connectivities. May help with population independence and improve the tidiness of clustering.
approx : bool, optional (default: True)
    If True, use annoy’s approximate neighbour finding. This results in a quicker run time for large datasets while also potentially increasing the degree of batch correction.
n_trees : int, optional (default: 10)
    Only used when approx=True. The number of trees to construct in the annoy forest. More trees give higher precision when querying, at the cost of increased run time and resource intensity.
use_faiss : bool, optional (default: True)
    If approx=False and the metric is "euclidean", use the faiss package to compute nearest neighbours if installed. This improves performance at a minor cost to numerical precision as faiss operates on float32.
metric : str or sklearn.neighbors.DistanceMetric, optional (default: "angular")
    What distance metric to use. If using approx=True, the options are "angular", "euclidean", "manhattan" and "hamming". Otherwise, the options are "euclidean", a member of the sklearn.neighbors.KDTree.valid_metrics list, or parameterised sklearn.neighbors.DistanceMetric objects:

    >>> from sklearn import neighbors
    >>> neighbors.KDTree.valid_metrics
    ['p', 'chebyshev', 'cityblock', 'minkowski', 'infinity', 'l2', 'euclidean', 'manhattan', 'l1']
    >>> pass_this_as_metric = neighbors.DistanceMetric.get_metric('minkowski', p=3)
bandwidth : float, optional (default: 1)
    scanpy.neighbors.compute_connectivities_umap parameter; higher values result in a gentler slope of the connectivities exponentials (i.e. larger connectivity values being returned).
local_connectivity : int, optional (default: 1)
    scanpy.neighbors.compute_connectivities_umap parameter; how many nearest neighbours of each cell are assumed to be fully connected (and given a connectivity value of 1).
save_knn : bool, optional (default: False)
    If True, save the indices of the nearest neighbours for each cell in adata.uns['bbknn'].
copy : bool, optional (default: False)
    If True, return a copy instead of writing to the supplied adata.
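
The batch-balanced search controlled by neighbors_within_batch can be illustrated with a small pure-Python sketch. This is not BBKNN's implementation: the function name batch_balanced_neighbours and the toy data are hypothetical, a brute-force Euclidean search is assumed for simplicity, and no connectivities are computed. It only shows that each cell receives neighbors_within_batch neighbours from every batch, so the total is that number times the number of batches.

```python
import math


def batch_balanced_neighbours(coords, batches, neighbors_within_batch=3):
    """Return, for each cell, the indices of its nearest cells in every batch.

    Hypothetical helper for illustration only: brute-force Euclidean search,
    not the BBKNN implementation.
    """
    # Group cell indices by batch label, keeping first-seen batch order.
    members = {}
    for idx, label in enumerate(batches):
        members.setdefault(label, []).append(idx)

    result = []
    for query in coords:
        picked = []
        for label, indices in members.items():
            # Rank the cells of this batch alone by distance to the query cell.
            # The query cell is part of its own batch, so it ranks first there.
            ranked = sorted(indices, key=lambda i: math.dist(query, coords[i]))
            picked.extend(ranked[:neighbors_within_batch])
        result.append(picked)
    return result


# Toy data: two batches of three cells each in a 2-D stand-in for PCA space.
coords = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0),
          (5.0, 5.0), (5.1, 5.0), (5.2, 5.0)]
batches = ["a", "a", "a", "b", "b", "b"]

neighbours = batch_balanced_neighbours(coords, batches, neighbors_within_batch=2)
print(neighbours[0])  # two neighbours from batch "a", then two from batch "b"
```

An ordinary KNN search with four neighbours would, on this toy data, return only cells from the query's own batch; searching each batch separately is what forces cross-batch links into the graph.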

bbknn.bbknn_pca_matrix(pca, batch_list, neighbors_within_batch=3, n_pcs=50, trim=None, approx=True, n_trees=10, use_faiss=True, metric='angular', bandwidth=1, local_connectivity=1, save_knn=False)

Scanpy-independent BBKNN variant that runs on a PCA matrix and list of per-cell batch assignments instead of an AnnData object. Non-data-entry arguments behave the same way as in bbknn.bbknn(). Returns a (distances, connectivities) tuple, like what would have been stored in the AnnData object. The connectivities are the actual neighbourhood graph. If save_knn=True, the tuple also includes the nearest neighbour indices for each cell as a third element.

pca : numpy.array
    PCA coordinates for each cell, with cells as rows.
batch_list : numpy.array or list
    A list of batch assignments for each cell.
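
Because the length of the returned tuple depends on save_knn, calling code may want to unpack it defensively. The sketch below uses a hypothetical helper, unpack_bbknn_result, with placeholder strings standing in for the real matrices; it does not call BBKNN itself.

```python
def unpack_bbknn_result(result):
    """Split a bbknn_pca_matrix-style result into its three possible parts.

    Hypothetical helper: accepts the 2-tuple (distances, connectivities) or,
    when save_knn=True was used, the 3-tuple that also carries the
    nearest-neighbour indices. Missing indices are returned as None.
    """
    if len(result) == 2:
        distances, connectivities = result
        knn_indices = None
    elif len(result) == 3:
        distances, connectivities, knn_indices = result
    else:
        raise ValueError("expected a tuple of length 2 or 3, got %d" % len(result))
    return distances, connectivities, knn_indices


# Placeholder values stand in for the real matrices in this illustration.
without_knn = unpack_bbknn_result(("distances", "connectivities"))
with_knn = unpack_bbknn_result(("distances", "connectivities", "knn_indices"))
print(without_knn[2], with_knn[2])
```

With BBKNN installed, the same helper could wrap a call such as bbknn.bbknn_pca_matrix(pca, batch_list, save_knn=True), following the signature documented above.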