SIGnature.models.scimilarity

class SIGnature.models.scimilarity.Encoder(*args, **kwargs)

Bases: Module

A class that encapsulates the SCimilarity encoder.

Parameters:

n_genes (int)
latent_dim (int)
hidden_dim (List[int])
dropout (float)
input_dropout (float)
residual (bool)

forward(x)

Runs the forward pass of the encoder.

Parameters: x (torch.Tensor) – Input tensor fed to the input layer.
Returns: Output tensor produced by the output layer.
Return type: torch.Tensor

load_state(filename, use_gpu=False)

Load model state.

Parameters:

filename (str) – Filename containing the model state.
use_gpu (bool, default: False) – Whether to use a GPU.

save_state(filename)

Save model state.

Parameters: filename (str) – Path of the file to which the model state is saved.
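The save and load methods above can be illustrated with a minimal sketch of a state round trip in PyTorch. The checkpoint layout (a dictionary with a `state_dict` key), the CPU fallback, and the standalone helper functions are assumptions made for illustration; they are not taken from the SIGnature source, and a small `nn.Linear` stands in for the encoder.

```python
import os
import tempfile

import torch
from torch import nn


def save_state(model, filename):
    # Assumed checkpoint layout: a dict holding only the model's state_dict.
    torch.save({"state_dict": model.state_dict()}, filename)


def load_state(model, filename, use_gpu=False):
    # Fall back to the CPU when no GPU is requested or available (an assumption).
    device = torch.device("cuda" if use_gpu and torch.cuda.is_available() else "cpu")
    checkpoint = torch.load(filename, map_location=device)
    model.load_state_dict(checkpoint["state_dict"])
    return model.to(device)


# Stand-in modules; a real encoder would be used in place of nn.Linear.
source = nn.Linear(4, 2)
restored = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "encoder.ckpt")
    save_state(source, path)
    load_state(restored, path, use_gpu=False)
```

Passing `map_location` lets a checkpoint written on a GPU machine be loaded on a CPU-only machine, which is the usual reason for exposing a flag such as `use_gpu`.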

class SIGnature.models.scimilarity.SCimilarityWrapper(*args, **kwargs)

Bases: Module

A wrapper for the SCimilarity (https://doi.org/10.1038/s41586-024-08411-y) Encoder model to enable attribution methods.

The wrapper loads the SCimilarity model and adapts its output for use with Captum’s attribution algorithms. Its forward method takes an additional ‘weights’ tensor, which is the output of the original model on the same input.

Parameters:

model_path (str)
use_gpu (bool)

calculate_attributions(X, method='ig', batch_size=500, multiply_by_inputs=True, disable_tqdm=False, target_sum=1000.0, npz_path=None)

Calculates gene attributions for the SCimilarity model using a specified method.

Parameters:

X (torch.Tensor | numpy.ndarray | scipy.sparse.csr_matrix) – The input data matrix (e.g., log-normalized gene expression).
method (str) – The attribution method to use. Options are “ig” (Integrated Gradients), “dl” (DeepLift), or “ixg” (Saliency, which yields input × gradient when multiply_by_inputs is True).
batch_size (int) – The number of samples to process in each batch.
multiply_by_inputs (bool) – Whether to multiply attributions by input values. Note: for Integrated Gradients and DeepLift, this is passed to the Captum constructor. For Saliency, the multiplication is done manually after calculation.
disable_tqdm (bool) – Whether to disable the progress bar.
target_sum (float) – The desired sum for each row after normalization.
npz_path (str | None) – Path to save the resulting sparse attribution matrix.

Returns:

A scipy.sparse.csr_matrix containing the calculated attributions.

Return type:

scipy.sparse.csr_matrix
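To make the `method`, `batch_size`, and `multiply_by_inputs` parameters concrete, here is a minimal NumPy sketch of what Integrated Gradients computes, processed in batches. It is not the SIGnature or Captum implementation: the helper `integrated_gradients`, the zero baseline, the midpoint Riemann sum, and the toy function f(x) = Σ x² (whose gradient is 2x) are all assumptions chosen so the result can be checked by hand.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline=None, n_steps=50, multiply_by_inputs=True):
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    if baseline is None:
        baseline = np.zeros_like(x)  # assumed all-zero baseline
    # Midpoints of n_steps equal sub-intervals of [0, 1].
    alphas = (np.arange(n_steps) + 0.5) / n_steps
    avg_grad = np.zeros_like(x)
    for alpha in alphas:
        # Gradient evaluated on the straight path from baseline to input.
        avg_grad += grad_fn(baseline + alpha * (x - baseline))
    avg_grad /= n_steps
    # With multiply_by_inputs, scale the averaged gradient by (input - baseline).
    return avg_grad * (x - baseline) if multiply_by_inputs else avg_grad


def grad_sum_of_squares(x):
    # Gradient of the toy function f(x) = sum(x ** 2), standing in for a model.
    return 2.0 * x


X = np.array([[1.0, 2.0, 0.0], [0.5, 0.0, 3.0], [4.0, 1.0, 1.0]])
batch_size = 2

# Process the rows in batches and stack the per-batch attributions.
batches = [
    integrated_gradients(grad_sum_of_squares, X[start:start + batch_size])
    for start in range(0, X.shape[0], batch_size)
]
attributions = np.vstack(batches)
```

For this toy function the attributions equal x² elementwise, so each row sums to f(x) − f(baseline). Zero inputs receive zero attribution when `multiply_by_inputs` is enabled, which is one reason a sparse matrix is a natural container for the result.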

forward(inputs, weights)

The forward pass designed for Captum.

This method is a simple pass-through to the original model, with the final output multiplied by a pre-computed ‘weights’ tensor.

Parameters:

inputs (torch.Tensor) – A tensor of shape [batch_size, n_genes].
weights (torch.Tensor) – The pre-computed model output for the same batch, with shape [batch_size, latent_dim].

Returns:

A tensor of shape [batch_size, 2].

Return type:

torch.Tensor

preprocess_adata(adata, gene_overlap_threshold=500)

Preprocesses an AnnData object for use with the SCimilarity model.

This method aligns the gene space, subsets the data to the model’s gene order, and log-normalizes the counts.

Parameters:

adata (anndata.AnnData) – The AnnData object to be preprocessed.
gene_overlap_threshold (int) – The minimum number of genes that the AnnData object must share with the model’s gene order.

Returns:

The preprocessed AnnData object.

Return type:

anndata.AnnData
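The preprocessing steps described above (gene alignment, overlap check, log-normalization) can be sketched on plain NumPy arrays. This is an illustration under stated assumptions, not the library's code: the helper names `align_to_gene_order` and `log_normalize`, the zero-filling of genes missing from the data, and the `target_sum` of 1e4 are all hypothetical, and a dense array stands in for the AnnData object.

```python
import numpy as np


def align_to_gene_order(X, data_genes, model_genes, gene_overlap_threshold=500):
    """Reorder the columns of X to follow model_genes, zero-filling missing genes."""
    column_of = {gene: j for j, gene in enumerate(data_genes)}
    n_shared = sum(gene in column_of for gene in model_genes)
    if n_shared < gene_overlap_threshold:
        raise ValueError(
            f"Only {n_shared} genes overlap; need at least {gene_overlap_threshold}."
        )
    aligned = np.zeros((X.shape[0], len(model_genes)), dtype=float)
    for j, gene in enumerate(model_genes):
        if gene in column_of:
            aligned[:, j] = X[:, column_of[gene]]
    return aligned


def log_normalize(counts, target_sum=1e4):
    """Scale each row to target_sum, then apply log1p."""
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid dividing by zero for empty rows
    return np.log1p(counts / totals * target_sum)


# Toy data: gene "X" is not in the model's gene order and "C" is missing from the data.
counts = np.array([[1.0, 3.0, 6.0], [0.0, 5.0, 5.0]])
data_genes = ["B", "A", "X"]
model_genes = ["A", "B", "C"]

aligned = align_to_gene_order(counts, data_genes, model_genes, gene_overlap_threshold=2)
normalized = log_normalize(aligned)
```

Here the columns are reordered to A, B, C, the count for gene X is dropped, and gene C is filled with zeros. Raising the threshold above 2 in this toy example triggers the overlap error.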