scimilarity.cell_annotation

class scimilarity.cell_annotation.CellAnnotation(model_path, use_gpu=False, filenames=None)

Bases: CellSearchKNN

A class that annotates cells by computing a cell embedding and then running a k-nearest-neighbor (kNN) search on it.

Parameters:
  • model_path (str) – Path to the directory containing model files.

  • use_gpu (bool, default: False) – Use GPU instead of CPU.

  • filenames (dict, optional, default: None) – A dictionary of custom filenames to use for the model files instead of the defaults.

Examples

>>> ca = CellAnnotation(model_path="/opt/data/model")
annotate_dataset(data)

Annotate dataset with celltype predictions.

Parameters:

data (anndata.AnnData) – The annotated data matrix with rows for cells and columns for genes. The data is assumed to be log normalized already (for example, via lognorm_counts).

Returns:

A data object where:
  • celltype predictions are in obs["celltype_hint"]

  • embeddings are in obsm["X_scimilarity"].

Return type:

anndata.AnnData

Examples

>>> ca = CellAnnotation(model_path="/opt/data/model")
>>> data = ca.annotate_dataset(data)
blocklist_celltypes(labels)

Blocklist celltypes so that they are excluded from subsequent predictions.

Parameters:

labels (List[str], Set[str]) – A list or set containing blocklist labels.

Notes

Blocking a celltype persists for this instance of the class, and all subsequent predictions use this blocklist. Blocklists and safelists are mutually exclusive: setting one clears the other.

Examples

>>> ca.blocklist_celltypes(["T cell"])
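The mutual exclusivity described in the note can be pictured with a small toy model. This is only a sketch of the documented behavior: the `LabelFilter` class and its `allowed` method are hypothetical names made up for illustration and are not part of scimilarity.

```python
class LabelFilter:
    """Hypothetical stand-in for the blocklist/safelist state (not scimilarity code)."""

    def __init__(self, classes):
        self.classes = set(classes)
        self.blocklist = set()
        self.safelist = set()

    def blocklist_celltypes(self, labels):
        self.blocklist = set(labels)
        self.safelist = set()  # setting a blocklist clears any safelist

    def safelist_celltypes(self, labels):
        self.safelist = set(labels)
        self.blocklist = set()  # setting a safelist clears any blocklist

    def allowed(self):
        # With a safelist, only safelisted labels remain; otherwise drop blocked ones.
        if self.safelist:
            return self.classes & self.safelist
        return self.classes - self.blocklist


label_filter = LabelFilter(["T cell", "B cell", "monocyte"])
label_filter.safelist_celltypes(["B cell"])
label_filter.blocklist_celltypes(["T cell"])  # replaces the safelist set above
allowed = sorted(label_filter.allowed())
print(allowed)
```

Because the second call clears the earlier safelist, only the blocklist is in effect at the end, so every label except "T cell" remains available.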
property classes: set

Get the set of all viable prediction classes.

get_predictions_knn(embeddings, k=50, ef=100, weighting=False, disable_progress=False)

Get predictions from knn search results.

Parameters:
  • embeddings (numpy.ndarray) – Embeddings as a numpy array.

  • k (int, default: 50) – The number of nearest neighbors.

  • ef (int, default: 100) – The size of the dynamic list for the nearest neighbors. See the nmslib/hnswlib documentation.

  • weighting (bool, default: False) – Use distance weighting when getting the consensus prediction.

  • disable_progress (bool, default: False) – Disable the tqdm progress bar.

Returns:

  • predictions (pandas.Series) – A pandas series containing celltype label predictions.

  • nn_idxs (numpy.ndarray) – A 2D numpy array of nearest neighbor indices [num_cells x k].

  • nn_dists (numpy.ndarray) – A 2D numpy array of nearest neighbor distances [num_cells x k].

  • stats (pandas.DataFrame) – Prediction statistics dataframe with columns:

      • "hits" is a JSON string with the count for every class in k cells.

      • "min_dist" is the minimum distance.

      • "max_dist" is the maximum distance.

      • "vs2nd" is sum(best) / sum(best + 2nd best).

      • "vsAll" is sum(best) / sum(all hits).

      • "hits_weighted" is a JSON string with the weighted count for every class in k cells.

      • "vs2nd_weighted" is weighted sum(best) / sum(best + 2nd best).

      • "vsAll_weighted" is weighted sum(best) / sum(all hits).

Return type:

Tuple[pandas.Series, numpy.ndarray, numpy.ndarray, pandas.DataFrame]
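To make the unweighted statistics concrete, the sketch below recomputes "hits", "min_dist", "max_dist", "vs2nd" and "vsAll" for a single cell from the labels and distances of its neighbors. It follows the formulas listed above but is not the library's implementation; the helper name `knn_stats` and the toy inputs are assumptions made for illustration.

```python
import json
from collections import Counter


def knn_stats(neighbor_labels, neighbor_dists):
    """Summarize one cell's k neighbors (illustrative helper, not library code)."""
    hits = Counter(neighbor_labels)
    ranked = hits.most_common()  # classes sorted by count, highest first
    best_label, best = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0
    return {
        "prediction": best_label,
        "hits": json.dumps(dict(hits)),
        "min_dist": min(neighbor_dists),
        "max_dist": max(neighbor_dists),
        "vs2nd": best / (best + second),    # sum(best) / sum(best + 2nd best)
        "vsAll": best / sum(hits.values()),  # sum(best) / sum(all hits)
    }


# Toy neighborhood with k = 5: three "T cell" hits, one "B cell", one "monocyte".
stats = knn_stats(
    ["T cell", "T cell", "T cell", "B cell", "monocyte"],
    [0.25, 0.5, 0.5, 0.75, 1.0],
)
print(stats)
```

Here "vs2nd" is 3 / (3 + 1) and "vsAll" is 3 / 5. Values near 1 indicate that the neighbors largely agree on one class, while lower values flag ambiguous calls.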

Examples

>>> ca = CellAnnotation(model_path="/opt/data/model")
>>> embeddings = ca.get_embeddings(align_dataset(data, ca.gene_order).X)
>>> predictions, nn_idxs, nn_dists, stats = ca.get_predictions_knn(embeddings)
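The effect of `weighting=True` can be illustrated with a small distance-weighted vote. The documentation above does not state the weighting function, so the inverse-distance weight used here is an assumption, and `weighted_vote` and its `eps` parameter are hypothetical names for illustration only.

```python
from collections import defaultdict


def weighted_vote(neighbor_labels, neighbor_dists, eps=1e-6):
    """Pick the class with the largest summed weight (assumed weight: 1 / distance)."""
    weights = defaultdict(float)
    for label, dist in zip(neighbor_labels, neighbor_dists):
        # eps guards against division by zero for exact matches
        weights[label] += 1.0 / max(dist, eps)
    best = max(weights, key=weights.get)
    return best, dict(weights)


# Three distant "T cell" neighbors versus two close "B cell" neighbors.
labels = ["T cell", "T cell", "T cell", "B cell", "B cell"]
dists = [1.0, 1.0, 1.0, 0.25, 0.25]
winner, weights = weighted_vote(labels, dists)
print(winner, weights)
```

A plain majority vote would pick "T cell" (3 hits versus 2), whereas under this assumed weighting the two much closer "B cell" neighbors carry more total weight. That difference is what the "_weighted" columns of the stats dataframe are meant to capture.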
reset_knn()

Reset the kNN index so that no entries are marked as deleted.

Examples

>>> ca.reset_knn()
safelist_celltypes(labels)

Safelist celltypes so that subsequent predictions are restricted to them.

Parameters:

labels (List[str], Set[str]) – A list or set containing safelist labels.

Notes

Safelisting a celltype persists for this instance of the class, and all subsequent predictions use this safelist. Blocklists and safelists are mutually exclusive: setting one clears the other.

Examples

>>> ca.safelist_celltypes(["CD4-positive, alpha-beta T cell"])