decima package

Subpackages

Submodules

decima.constants module

Module contents

class decima.DecimaResult(anndata)

Bases: object

Container for Decima results and model predictions.

This class provides a unified interface for loading pre-trained Decima models and associated metadata, making predictions, and performing attribution analyses.

The DecimaResult object contains:

An AnnData object with gene expression and metadata
A trained model for making predictions
Methods for attribution analysis and interpretation

Parameters: anndata – AnnData object containing gene expression data and metadata

Examples

>>> # Load default pre-trained model and metadata
>>> result = DecimaResult.load()
>>> result.load_model(
...     model=0
... )
>>> # Perform attribution analysis
>>> attributions = result.attributions(
...     gene="SPI1",
...     tasks='cell_type == "classical monocyte"',
... )

Properties:

model: Decima model
genes: List of gene names
cells: List of cell names
cell_metadata: Cell metadata
gene_metadata: Gene metadata
shape: Shape of the expression matrix
attributions: Attributions for a gene

__init__(anndata)

__repr__()

Return repr(self).

attributions(gene, tasks=None, off_tasks=None, transform='specificity', method='inputxgradient', threshold=0.0005, min_seqlet_len=4, max_seqlet_len=25, additional_flanks=0)

Get attributions for a specific gene.

Parameters:

gene (str) – Gene name
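tasks (Optional[List[str]]) – Cells to use as the on task; the class example above passes a query string such as 'cell_type == "classical monocyte"'
off_tasks (Optional[List[str]]) – Cells to use as the off task, for contrast with tasks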
transform (str) – Attribution transform method
method (str) – Attribution method
n_peaks – Number of peaks to find
min_dist – Minimum distance between peaks

Returns:

Container with inputs, predictions, attribution scores and TSS position

Return type:

Attribution
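
The default method='inputxgradient' names a general attribution technique: each input value is multiplied by the gradient of the output with respect to that input. The sketch below illustrates the idea on a toy linear scorer over a one-hot sequence, where the gradient is simply the weight matrix. It is not Decima's implementation; toy_weights, score and input_x_gradient are hypothetical names introduced only for this illustration.

```python
# Toy illustration of input-x-gradient attribution (not Decima's code).
# For a linear score f(x) = sum_ij w[i][j] * x[i][j], the gradient of f
# with respect to x[i][j] is w[i][j], so the attribution is x[i][j] * w[i][j].

def score(x, w):
    """Linear score: element-wise product of inputs and weights, summed."""
    return sum(xi * wi for row_x, row_w in zip(x, w) for xi, wi in zip(row_x, row_w))

def input_x_gradient(x, w):
    """Attribution matrix for the linear score above (gradient == w)."""
    return [[xi * wi for xi, wi in zip(row_x, row_w)] for row_x, row_w in zip(x, w)]

# One row per sequence position, columns ordered A, C, G, T (made-up weights).
toy_weights = [
    [0.5, -0.1, 0.0, 0.2],
    [0.0, 0.3, -0.4, 0.1],
    [-0.2, 0.0, 0.9, 0.0],
]

# One-hot encoding of the sequence "ACG" in the same layout.
x = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

attr = input_x_gradient(x, toy_weights)
# Collapse the base axis to get one attribution value per position.
per_position = [sum(row) for row in attr]
```

For a linear scorer without a bias term, the per-position attributions add up to the score itself, which makes the toy case easy to check by hand.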

property cell_metadata: DataFrame

Cell metadata including annotations, metrics, etc.

property cells: List[str]

List of cell identifiers in the dataset.

property gene_metadata: DataFrame

Gene metadata.

gene_sequence(gene, stranded=True)

Get sequence for a gene.

Return type: str

property genes: List[str]

List of gene names in the dataset.

get_cell_metadata(cell)

Get metadata for a specific cell.

Return type: CellMetadata

get_gene_metadata(gene)

Get metadata for a specific gene.

Return type: GeneMetadata

classmethod load(anndata_path=None)

Load a DecimaResult object from an AnnData object or a path to an anndata file.

Parameters: anndata_path (Union[str, AnnData, None]) – Path to anndata file or AnnData object
Returns: DecimaResult object

Examples

>>> result = DecimaResult.load()  # Load default decima metadata
>>> result = DecimaResult.load(
...     "path/to/anndata.h5ad"
... )  # Load custom anndata object from file

load_model(model=0, device='cpu')

Load the trained model from a checkpoint path or a pre-trained replicate number.

Parameters:

model (Union[str, int, None]) – Path to model checkpoint or replicate number (0-3) for pre-trained models
device (str) – Device to load model on

Returns:

self

Examples

>>> result = DecimaResult.load()
>>> result.load_model()  # Load default model (rep0)
>>> result.load_model(
...     model="path/to/checkpoint.ckpt"
... )
>>> result.load_model(
...     model=2
... )

property model

Decima model.

predicted_expression_matrix(genes=None, model_name=None)

Get predicted expression matrix for all or specific genes.

Parameters: genes (Optional[List[str]]) – Optional list of genes to get predictions for. If None, returns all genes.
Returns: Predicted expression matrix (cells x genes)
Return type: pd.DataFrame

predicted_gene_expression(gene, model_name)

prepare_one_hot(gene, variants=None, padding=0)

Prepare one-hot encoding for a gene.

Parameters: gene (str) – Gene name
Returns: One-hot encoding of the gene
Return type: torch.Tensor
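
As a reminder of what a one-hot encoding of DNA looks like, here is a minimal pure-Python sketch. It makes several assumptions that the reference above does not confirm: a channel-first layout (one row per base, in A, C, G, T order), all-zero columns for ambiguous bases such as N, and plain lists instead of the torch.Tensor that prepare_one_hot returns. The function name one_hot_encode is hypothetical.

```python
BASES = "ACGT"  # assumed channel order

def one_hot_encode(seq):
    """Return a 4 x len(seq) matrix with rows ordered A, C, G, T.

    Bases outside ACGT (for example N) are left as all-zero columns.
    """
    seq = seq.upper()
    return [[1 if base == b else 0 for base in seq] for b in BASES]

encoded = one_hot_encode("ACGTN")
# Each column has at most a single 1; the final column (N) is all zeros.
column_sums = [sum(row[i] for row in encoded) for i in range(len("ACGTN"))]
```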

query_cells(query)

query_tasks(tasks=None, off_tasks=None)

property shape: tuple

Shape of the expression matrix (n_cells, n_genes).

decima.predict_save_attributions(output_dir, genes=None, seqs=None, tasks=None, off_tasks=None, model=0, metadata_anndata=None, method='inputxgradient', device=None, plot_peaks=True, plot_seqlogo=False, seqlogo_window=50, dpi=100)

Generate and save attribution analysis results for one or more genes. This function performs attribution analysis for the given genes and cell types, saving the following output files to the specified directory:

output_dir/
├── peaks.bed                 # List of attribution peaks in BED format
├── peaks.png                 # Plot showing peak locations
├── qc.log                    # QC warnings about prediction reliability
├── motifs.tsv                # Detected motifs in peak regions
├── attributions.h5           # Raw attribution score matrix
├── attributions.bigwig       # Genome browser track of attribution scores
└── attributions_seq_logos/   # Directory containing attribution plots
    └── {peak}.png            # Attribution plot for each peak region
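
Since peaks.bed is described as BED format, its first three tab-separated columns should be the chromosome (or sequence name), a 0-based start and an exclusive end. The sketch below reads just those three columns with the standard library. Any further columns in Decima's file are not documented here, so they are ignored; the sample content and the function name read_bed_intervals are made up for illustration.

```python
import io

def read_bed_intervals(handle):
    """Parse (chrom, start, end) from the first three columns of a BED stream."""
    intervals = []
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and the optional BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals

# Made-up example content standing in for output_dir/peaks.bed.
example = io.StringIO("chr11\t100\t125\tpeak_a\nchr11\t300\t310\tpeak_b\n")
peaks = read_bed_intervals(example)

# BED intervals are half-open, so the width is simply end - start.
widths = [end - start for _, start, end in peaks]
```

With a real run, the same function could be applied to open("output_dir/peaks.bed"), assuming the file follows the standard layout.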

Parameters:

output_dir (str) – Directory to save output files
genes – Gene symbols or IDs to analyze
tasks (Optional[List[str]]) – List of cell types to analyze attributions for
off_tasks (Optional[List[str]]) – Optional list of cell types to contrast against
model (Union[str, int, None]) – Optional model to use for attribution analysis
method (str) – Method to use for attribution analysis
device (Optional[str]) – Device to use for attribution analysis
dpi (int) – DPI for attribution plots.

Raises:

FileExistsError – If output directory already exists.
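
Because the function raises FileExistsError when the output directory already exists, a caller that reruns an analysis may want to choose an unused directory name first. The helper below is a hypothetical convenience written with the standard library only; it is not part of decima, and the suffix scheme is an arbitrary choice.

```python
import os
import tempfile

def next_free_dir(base):
    """Return base if it does not exist yet, else base_1, base_2, ..."""
    candidate, n = base, 0
    while os.path.exists(candidate):
        n += 1
        candidate = f"{base}_{n}"
    return candidate

# Demonstrate inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "attrs")
    first = next_free_dir(base)    # nothing exists yet, so base is returned
    os.makedirs(first)
    second = next_free_dir(base)   # base is taken, so a suffix is added
    first_name = os.path.basename(first)
    second_name = os.path.basename(second)
```

The returned path could then be passed as output_dir.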

Examples

>>> predict_save_attributions(
...     output_dir="output_dir",
...     genes=[
...         "SPI1",
...         "CD68",
...     ],
...     tasks="cell_type == 'classical monocyte'",
... )

decima.predict_variant_effect(df_variant, output_pq=None, tasks=None, model='ensemble', metadata_anndata=None, chunksize=10000, batch_size=8, num_workers=16, device=None, include_cols=None, gene_col=None, distance_type='tss', min_distance=0, max_distance=inf, genome='hg38', save_replicates=False, reference_cache=True, float_precision='32')

Predict variant effects and save them to a parquet file.

Parameters:

df_variant (pd.DataFrame) – DataFrame with variant information
output_pq (str, optional) – Path to save the parquet file. Defaults to None.
tasks (str, optional) – Tasks to predict. Defaults to None.
model (Union[str, int], optional) – Model to use. Defaults to "ensemble".
metadata_anndata (str, optional) – Path to anndata file. Defaults to None.
chunksize (int, optional) – Number of variants to predict in each chunk. Defaults to 10_000.
batch_size (int, optional) – Batch size. Defaults to 8.
num_workers (int, optional) – Number of workers. Defaults to 16.
device (str, optional) – Device to use. Defaults to None.
include_cols (list, optional) – Columns to include in the output. Defaults to None.
gene_col (str, optional) – Column name for gene names. Defaults to None.
distance_type (str, optional) – Type of distance. Defaults to "tss".
min_distance (float, optional) – Minimum distance from the end of the gene. Defaults to 0 (inclusive).
max_distance (float, optional) – Maximum distance from the TSS. Defaults to inf (exclusive).
genome (str, optional) – Genome build. Defaults to "hg38".

Return type:

None
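
The parameter list describes min_distance as inclusive, max_distance as exclusive, and chunksize as the number of variants predicted per chunk. The sketch below shows one way those two ideas could look in plain Python. It is only an illustration under stated assumptions: the distance is taken as an absolute value, the (variant_id, distance) pairs are invented, and within_distance and iter_chunks are hypothetical helpers, not decima functions.

```python
import math
from itertools import islice

def within_distance(distance, min_distance=0, max_distance=math.inf):
    """True when min_distance <= |distance| < max_distance."""
    return min_distance <= abs(distance) < max_distance

def iter_chunks(rows, chunksize):
    """Yield consecutive lists of at most chunksize items."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunksize))
        if not chunk:
            return
        yield chunk

# Invented (variant_id, signed distance to the TSS in bp) pairs.
variants = [("v1", 0), ("v2", -1500), ("v3", 100000), ("v4", 99999)]

# v3 sits exactly at max_distance and is dropped because the bound is exclusive.
kept = [vid for vid, d in variants if within_distance(d, 0, 100000)]

# Process the surviving variants two at a time.
chunks = list(iter_chunks(kept, 2))
```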