scimilarity.triplet_selector
- class scimilarity.triplet_selector.TripletSelector(margin, negative_selection='semihard', perturb_labels=True, perturb_labels_fraction=0.5)
Bases: object
For each anchor-positive pair, mine negative samples to create a triplet.
- Parameters:
margin (float) –
negative_selection (str) –
perturb_labels (bool) –
perturb_labels_fraction (float) –
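The default negative_selection='semihard' can be illustrated with a minimal sketch. It assumes the common definition of a semi-hard negative: one whose triplet loss, d(anchor, positive) - d(anchor, negative) + margin, lies strictly between 0 and the margin. The function name semihard_negatives and the toy distances are hypothetical and not part of scimilarity; the library's own selection logic may differ.

```python
import numpy as np


def semihard_negatives(loss_values: np.ndarray, margin: float) -> np.ndarray:
    """Return indices of candidates whose triplet loss is in (0, margin).

    Hypothetical helper for illustration; not part of scimilarity.
    """
    return np.where((loss_values > 0) & (loss_values < margin))[0]


margin = 0.5
d_ap = 0.4                          # anchor-positive distance (toy value)
d_an = np.array([0.2, 0.6, 1.5])    # anchor-negative distances (toy values)

# Triplet loss of every candidate negative for this anchor-positive pair.
loss_values = d_ap - d_an + margin

# Candidate 0 is closer than the positive (hard), candidate 2 is already
# beyond the margin (easy); only candidate 1 is semi-hard.
print(semihard_negatives(loss_values, margin))
```

Under this definition a semi-hard negative is farther from the anchor than the positive but still inside the margin, so it contributes a non-zero loss without being the most extreme violation.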
- get_asw(embeddings, labels, int2label, metric='cosine')
Get the average silhouette width of cell types, taking the cell ontology into account:
ancestors are not considered inter-cluster and descendants are considered intra-cluster.
- Parameters:
embeddings (numpy.ndarray, torch.Tensor) – Cell embeddings.
labels (List[str]) – Celltype names.
int2label (dict) – Dictionary mapping labels in integer form to string form.
metric (str, default: "cosine") – The distance metric to use for scipy.spatial.distance.cdist().
- Returns:
asw – The average silhouette width.
- Return type:
float
Examples
>>> asw = triplet_selector.get_asw(embeddings, labels, int2label, metric="cosine")
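For orientation, here is a minimal sketch of a plain average silhouette width using cosine distance. It deliberately ignores the cell ontology, so ancestor and descendant labels are treated like any other label, which is not what get_asw does. The function name plain_silhouette_width and the toy data are hypothetical, and the sketch assumes at least two distinct labels and non-zero embeddings.

```python
import numpy as np


def plain_silhouette_width(embeddings, labels) -> float:
    """Average silhouette width with cosine distance, without ontology handling.

    Hypothetical helper for illustration; not part of scimilarity.
    """
    x = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    dist = 1.0 - unit @ unit.T          # pairwise cosine distances

    scores = []
    for i, label in enumerate(labels):
        same = labels == label
        same[i] = False                 # exclude the sample itself
        if not same.any():
            scores.append(0.0)          # usual convention for singleton clusters
            continue
        a = dist[i, same].mean()        # mean intra-cluster distance
        b = min(                        # mean distance to the nearest other cluster
            dist[i, labels == other].mean()
            for other in np.unique(labels)
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = ["T cell", "T cell", "B cell", "B cell"]
asw = plain_silhouette_width(embeddings, labels)
print(asw)
```

An ontology-aware variant would change which cells count as intra-cluster and inter-cluster for each sample, as described above, while keeping the same (b - a) / max(a, b) score.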
- get_triplets(embeddings, labels, int2label, studies=None)
Get triplets as anchor, positive, and negative cell embeddings.
- Parameters:
embeddings (numpy.ndarray, torch.Tensor) – Cell embeddings.
labels (numpy.ndarray, torch.Tensor) – Cell labels in integer form.
int2label (dict) – Dictionary mapping labels in integer form to string form.
studies (numpy.ndarray, torch.Tensor, optional, default: None) – Studies metadata for each cell.
- Returns:
triplets (Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]) – A tuple of numpy arrays containing anchor, positive, and negative cell embeddings.
num_hard_triplets (int) – Number of hard triplets.
num_viable_triplets (int) – Number of viable triplets.
- get_triplets_idx(embeddings, labels, int2label, studies=None)
Get triplets as anchor, positive, and negative cell indices.
- Parameters:
embeddings (numpy.ndarray, torch.Tensor) – Cell embeddings.
labels (numpy.ndarray, torch.Tensor) – Cell labels in integer form.
int2label (dict) – Dictionary mapping labels in integer form to string form.
studies (numpy.ndarray, torch.Tensor, optional, default: None) – Studies metadata for each cell.
- Returns:
triplets (Tuple[List, List, List]) – A tuple of lists containing anchor, positive, and negative cell indices.
num_hard_triplets (int) – Number of hard triplets.
num_viable_triplets (int) – Number of viable triplets.
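The relationship between index triplets and embedding triplets can be sketched with plain NumPy indexing. The index lists below are made-up toy values rather than output from get_triplets_idx, and the sketch only assumes that each list holds row positions into the embeddings array.

```python
import numpy as np

# Toy embeddings: 4 cells, 3 dimensions each.
embeddings = np.arange(12, dtype=float).reshape(4, 3)

# Hypothetical index triplets in the (anchor, positive, negative) layout
# documented for get_triplets_idx.
anchors_idx, positives_idx, negatives_idx = [0, 1], [1, 0], [2, 3]

# Gather the corresponding rows to obtain embedding triplets, matching the
# (anchor, positive, negative) layout documented for get_triplets.
anchors = embeddings[anchors_idx]
positives = embeddings[positives_idx]
negatives = embeddings[negatives_idx]

print(anchors.shape, positives.shape, negatives.shape)
```

Working with indices keeps the triplets small and lets the same selection be applied to other per-cell arrays, such as labels or study metadata.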
- hardest_negative(loss_values)
Get hardest negative.
- Parameters:
loss_values (numpy.ndarray) – Triplet loss of all negatives for given anchor positive pair.
- Returns:
Index of the selected negative.
- Return type:
int
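A minimal sketch of hardest-negative selection, assuming the "hardest" negative is the candidate with the largest triplet loss and that a selection is only made when that loss is positive. The function name pick_hardest and the None fallback are assumptions for illustration; the documented return type above is int.

```python
import numpy as np


def pick_hardest(loss_values: np.ndarray):
    """Return the index of the largest positive loss, or None if no loss is positive.

    Hypothetical helper for illustration; not part of scimilarity.
    """
    idx = int(np.argmax(loss_values))
    return idx if loss_values[idx] > 0 else None


# Triplet losses of three candidate negatives for one anchor-positive pair.
loss_values = np.array([0.7, 0.3, -0.6])
print(pick_hardest(loss_values))
```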
- pdist(vectors)
Get pair-wise distance between all cell embeddings.
- Parameters:
vectors (numpy.ndarray) – Cell embeddings.
- Returns:
Distance matrix of cell embeddings.
- Return type:
numpy.ndarray
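The distance metric used by pdist is not stated here. As a sketch under the assumption of Euclidean distance, an all-pairs distance matrix can be computed with NumPy broadcasting. The function name pairwise_euclidean is hypothetical and not part of scimilarity.

```python
import numpy as np


def pairwise_euclidean(vectors: np.ndarray) -> np.ndarray:
    """Return the (n, n) matrix of Euclidean distances between rows of vectors.

    Hypothetical helper for illustration; not part of scimilarity.
    """
    sq_norms = np.sum(vectors ** 2, axis=1)
    # ||u - v||^2 = ||u||^2 + ||v||^2 - 2 u.v, evaluated for all pairs at once.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (vectors @ vectors.T)
    # Clip tiny negative values caused by floating-point round-off.
    np.maximum(sq_dist, 0.0, out=sq_dist)
    return np.sqrt(sq_dist)


vectors = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
dist = pairwise_euclidean(vectors)
print(dist)
```

The result is symmetric with a zero diagonal, so entry (i, j) can be read as the distance between cell i and cell j.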