scimilarity.cell_annotation
- class scimilarity.cell_annotation.CellAnnotation(model_path, use_gpu=False, filenames=None)
Bases: CellSearchKNN
A class that annotates cells by computing a cell embedding and then running a k-nearest-neighbor (kNN) search.
- Parameters:
model_path (str) – Path to the directory containing model files.
use_gpu (bool, default: False) – Use GPU instead of CPU.
filenames (dict, optional, default: None) – A dictionary of custom filenames to use instead of the defaults.
Examples
>>> ca = CellAnnotation(model_path="/opt/data/model")
- annotate_dataset(data)
Annotate dataset with celltype predictions.
- Parameters:
data (anndata.AnnData) – The annotated data matrix with rows for cells and columns for genes. This function assumes the data has already been log-normalized (i.e. via lognorm_counts).
- Returns:
- A data object where:
celltype predictions are in obs["celltype_hint"]
embeddings are in obsm["X_scimilarity"].
- Return type:
anndata.AnnData
Examples
>>> ca = CellAnnotation(model_path="/opt/data/model")
>>> data = ca.annotate_dataset(data)
- blocklist_celltypes(labels)
Blocklist celltypes.
- Parameters:
labels (List[str], Set[str]) – A list or set containing blocklist labels.
Notes
Blocking a celltype persists for this instance of the class, and all subsequent predictions will use this blocklist. Blocklists and safelists are mutually exclusive; setting one clears the other.
Examples
>>> ca.blocklist_celltypes(["T cell"])
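To illustrate what a blocklist could mean for a neighbor-based prediction, here is a minimal sketch. It does not use scimilarity; the `consensus` function and its `blocklist` argument are hypothetical, and it simply assumes that neighbors carrying a blocked label are ignored before the majority vote. The library's actual handling may differ.

```python
from collections import Counter


def consensus(neighbor_labels, blocklist=frozenset()):
    """Majority label among neighbors, skipping blocked labels.

    Hypothetical helper for illustration; not part of scimilarity.
    """
    # Assumption: blocked labels are dropped before counting votes.
    kept = [label for label in neighbor_labels if label not in blocklist]
    if not kept:
        return None  # every neighbor was blocked
    return Counter(kept).most_common(1)[0][0]


neighbors = ["T cell", "T cell", "B cell", "T cell", "B cell"]
without_blocklist = consensus(neighbors)
with_blocklist = consensus(neighbors, blocklist={"T cell"})
```

Under this assumption, blocking the majority label lets the next most common label win the vote.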
- property classes: set
Get the set of all viable prediction classes.
- get_predictions_knn(embeddings, k=50, ef=100, weighting=False, disable_progress=False)
Get predictions from knn search results.
- Parameters:
embeddings (numpy.ndarray) – Embeddings as a numpy array.
k (int, default: 50) – The number of nearest neighbors.
ef (int, default: 100) – The size of the dynamic list for the nearest neighbors. See the nmslib/hnswlib documentation.
weighting (bool, default: False) – Use distance weighting when getting the consensus prediction.
disable_progress (bool, default: False) – Disable the tqdm progress bar.
- Returns:
predictions (pandas.Series) – A pandas series containing celltype label predictions.
nn_idxs (numpy.ndarray) – A 2D numpy array of nearest neighbor indices [num_cells x k].
nn_dists (numpy.ndarray) – A 2D numpy array of nearest neighbor distances [num_cells x k].
stats (pandas.DataFrame) – Prediction statistics dataframe with columns:
"hits" is a JSON string with the count for every class in k cells.
"min_dist" is the minimum distance.
"max_dist" is the maximum distance.
"vs2nd" is sum(best) / sum(best + 2nd best).
"vsAll" is sum(best) / sum(all hits).
"hits_weighted" is a JSON string with the weighted count for every class in k cells.
"vs2nd_weighted" is weighted sum(best) / sum(best + 2nd best).
"vsAll_weighted" is weighted sum(best) / sum(all hits).
- Return type:
Tuple[pandas.Series, numpy.ndarray, numpy.ndarray, pandas.DataFrame]
Examples
>>> ca = CellAnnotation(model_path="/opt/data/model")
>>> embeddings = ca.get_embeddings(align_dataset(data, ca.gene_order).X)
>>> predictions, nn_idxs, nn_dists, stats = ca.get_predictions_knn(embeddings)
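The "hits", "vs2nd", and "vsAll" statistics described above can be sketched for a single cell as follows. This is a standalone illustration, not the library's implementation: `vote_stats` is a hypothetical helper, and the inverse-distance weighting used when distances are supplied is an assumption about how distance weighting might work.

```python
import json
from collections import Counter


def vote_stats(neighbor_labels, neighbor_dists=None, eps=1e-6):
    """Summarise one cell's k nearest-neighbor labels (illustrative only)."""
    if neighbor_dists is None:
        # Unweighted: every neighbor contributes one vote.
        weights = [1.0] * len(neighbor_labels)
    else:
        # Assumed weighting scheme: inverse distance, floored at eps.
        weights = [1.0 / max(d, eps) for d in neighbor_dists]

    hits = Counter()
    for label, weight in zip(neighbor_labels, weights):
        hits[label] += weight

    ranked = hits.most_common()  # classes sorted by descending vote
    best_label, best = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0.0
    return {
        "prediction": best_label,
        "hits": json.dumps(dict(hits)),
        "vs2nd": best / (best + second),    # sum(best) / sum(best + 2nd best)
        "vsAll": best / sum(hits.values()),  # sum(best) / sum(all hits)
    }


stats = vote_stats(["T cell", "T cell", "B cell", "T cell", "NK cell"])
weighted = vote_stats(["A", "B", "B"], neighbor_dists=[0.1, 1.0, 1.0])
```

In the unweighted call, three of five neighbors agree, so "vsAll" is 3/5 and "vs2nd" is 3/(3+1). In the weighted call, the single very close neighbor outweighs the two farther ones under the assumed inverse-distance scheme.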
- safelist_celltypes(labels)
Safelist celltypes.
- Parameters:
labels (List[str], Set[str]) – A list or set containing safelist labels.
Notes
Safelisting a celltype persists for this instance of the class, and all subsequent predictions will use this safelist. Blocklists and safelists are mutually exclusive; setting one clears the other.
Examples
>>> ca.safelist_celltypes(["CD4-positive, alpha-beta T cell"])
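The mutual exclusivity of blocklists and safelists can be pictured with a small state-holding class. `LabelFilter` and its `allows` method are hypothetical names invented for this sketch; only the two method names mirror the documented API, and the internal attributes are assumptions.

```python
class LabelFilter:
    """Hypothetical stand-in for per-instance blocklist/safelist state."""

    def __init__(self):
        self.blocklist = None
        self.safelist = None

    def blocklist_celltypes(self, labels):
        self.blocklist = set(labels)
        self.safelist = None  # setting the blocklist clears the safelist

    def safelist_celltypes(self, labels):
        self.safelist = set(labels)
        self.blocklist = None  # setting the safelist clears the blocklist

    def allows(self, label):
        # At most one of the two lists is active at any time.
        if self.safelist is not None:
            return label in self.safelist
        if self.blocklist is not None:
            return label not in self.blocklist
        return True


label_filter = LabelFilter()
label_filter.blocklist_celltypes(["T cell"])
allowed_while_blocked = label_filter.allows("T cell")

label_filter.safelist_celltypes(["T cell", "B cell"])
allowed_after_safelist = label_filter.allows("T cell")
```

Because the state lives on the instance, the active list keeps applying until the other setter is called.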