decima.interpret package

Submodules

decima.interpret.attributions module

class decima.interpret.attributions.Attribution(inputs, attrs, gene='', chrom=None, start=None, end=None, strand=None, threshold=0.0005, min_seqlet_len=4, max_seqlet_len=25, additional_flanks=0)[source]

Bases: object

Attribution analysis results for a gene.

Parameters:
  • inputs (Tensor) – One-hot encoded sequence

  • attrs (ndarray) – Attribution scores

  • gene (Optional[str]) – Gene name (symbol or ID)

  • chrom (Optional[str]) – Chromosome name

  • start (Optional[int]) – Start position

  • end (Optional[int]) – End position

  • strand (Optional[str]) – Strand

  • threshold (Optional[float]) – Threshold for peak finding

  • min_seqlet_len (Optional[int]) – Minimum sequence length for peak finding

  • max_seqlet_len (Optional[int]) – Maximum sequence length for peak finding

  • additional_flanks (Optional[int]) – Additional flanks to add to the gene

Returns:

Attribution analysis results for the gene and tasks

Return type:

Attribution

Examples

>>> attribution = Attribution(
...     gene="A1BG",
...     inputs=inputs,
...     attrs=attrs,
...     chrom="chr1",
...     start=100,
...     end=200,
...     strand="+",
...     threshold=5e-4,
...     min_seqlet_len=4,
...     max_seqlet_len=25,
...     additional_flanks=0,
... )
>>> attribution.plot_peaks()
>>> attribution.scan_motifs()
>>> attribution.save_bigwig(
...     "attributions.bigwig"
... )
>>> attribution.peaks_to_bed()
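
The inputs argument above is described as a one-hot encoded sequence. As a rough illustration of that layout only, the sketch below encodes a DNA string into a 4 x L matrix using plain Python lists. The helper name, the A, C, G, T row order, and the all-zero column for unknown bases are assumptions made for this example and are not taken from the package; in practice the matrix would be converted to a Tensor before use.

```python
# Hypothetical helper, not part of decima: illustrates a (4, L) one-hot layout.
BASES = "ACGT"  # assumed channel order


def one_hot_encode(seq):
    """Return a 4 x len(seq) matrix with a 1 in the row of each base."""
    seq = seq.upper()
    matrix = [[0] * len(seq) for _ in BASES]
    for pos, base in enumerate(seq):
        row = BASES.find(base)
        if row >= 0:  # unknown bases such as N stay all-zero (assumption)
            matrix[row][pos] = 1
    return matrix


encoded = one_hot_encode("ACGTN")
for base, row in zip(BASES, encoded):
    print(base, row)
```

Each column holds at most one 1, so summing attribution scores over the four rows gives one value per position.
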
__init__(inputs, attrs, gene='', chrom=None, start=None, end=None, strand=None, threshold=0.0005, min_seqlet_len=4, max_seqlet_len=25, additional_flanks=0)[source]

Initialize Attribution.

Parameters:
  • inputs (Tensor) – One-hot encoded sequence

  • attrs (ndarray) – Attribution scores

  • gene (Optional[str]) – Gene name

  • chrom (Optional[str]) – Chromosome name

  • start (Optional[int]) – Start position

  • end (Optional[int]) – End position

  • strand (Optional[str]) – Strand

  • threshold (Optional[float]) – Threshold for peak finding

  • min_seqlet_len (Optional[int]) – Minimum sequence length for peak finding

  • max_seqlet_len (Optional[int]) – Maximum sequence length for peak finding

  • additional_flanks (Optional[int]) – Additional flanks to add to the gene

__repr__()[source]

Return repr(self).

property chrom: str

Get the chromosome name.

property end: int

Get the end position.

fasta_str()[source]

Get attribution scores as a fasta string.

static find_peaks(attrs, threshold=0.0005, min_seqlet_len=4, max_seqlet_len=25, additional_flanks=0)[source]

classmethod from_seq(inputs, tasks=None, off_tasks=None, model=0, transform='specificity', method='inputxgradient', device=None, result=None, gene='', chrom=None, start=None, end=None, strand=None, gene_mask_start=None, gene_mask_end=None, threshold=0.0005, min_seqlet_len=4, max_seqlet_len=25, additional_flanks=0)[source]

Initialize Attribution from sequence.

Parameters:
  • inputs (Union[str, Tensor, ndarray]) – Sequence to analyze: either a sequence string, or a torch.Tensor or np.ndarray of shape (4, 524288) or (5, 524288), where the fifth channel is a binary mask. If the input has only 4 channels, gene_mask_start and gene_mask_end must be provided.

  • tasks (Optional[list]) – List of cell types to analyze attributions for

  • off_tasks (Optional[list]) – List of cell types to contrast against

  • model (Union[str, int, None]) – Model to use for attribution analysis

  • transform (str) – Transformation to apply to attributions

  • method (str) – Method to use for attribution analysis

  • device (Optional[str]) – Device to use for attribution analysis

  • gene (Optional[str]) – Gene name

  • chrom (Optional[str]) – Chromosome name

  • start (Optional[int]) – Start position

  • end (Optional[int]) – End position

  • strand (Optional[str]) – Strand

  • gene_mask_start – Gene start position (start of the gene mask)

  • gene_mask_end – Gene end position (end of the gene mask)

  • threshold (Optional[float]) – Threshold for peak finding

  • min_seqlet_len (Optional[int]) – Minimum sequence length for peak finding

  • max_seqlet_len (Optional[int]) – Maximum sequence length for peak finding

  • additional_flanks (Optional[int]) – Additional flanks to add to the gene
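
The inputs description above distinguishes a 4-channel one-hot input from a 5-channel input whose extra channel is a binary mask. The sketch below shows one way such a mask row could be appended from gene_mask_start and gene_mask_end. The helper name and the half-open interval convention (mask is 1 for start <= i < end) are assumptions for illustration; the page does not state how the library builds the mask. A length of 8 stands in for 524288 to keep the example readable.

```python
# Hypothetical helper, not part of decima: appends a binary mask channel.
def add_gene_mask(one_hot, gene_mask_start, gene_mask_end):
    """Return a 5 x L matrix: the 4 one-hot rows plus a 0/1 mask row."""
    if len(one_hot) != 4:
        raise ValueError("expected a 4-channel one-hot matrix")
    length = len(one_hot[0])
    if not 0 <= gene_mask_start < gene_mask_end <= length:
        raise ValueError("mask bounds must satisfy 0 <= start < end <= length")
    # Assumed convention: 1 inside [gene_mask_start, gene_mask_end), else 0.
    mask = [1 if gene_mask_start <= i < gene_mask_end else 0 for i in range(length)]
    return [list(row) for row in one_hot] + [mask]


# Toy 4 x 8 one-hot matrix for the sequence "AAAAAAAA".
toy = [[1] * 8, [0] * 8, [0] * 8, [0] * 8]
masked = add_gene_mask(toy, gene_mask_start=2, gene_mask_end=5)
print(len(masked), masked[4])
```
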

property gene_end: int

Get the gene end position.

property gene_start: int

Get the gene start position.

peaks_to_bed()[source]

Convert peaks to bed format.

Returns:

Peaks in bed format where columns are:
  • chrom: Chromosome name

  • start: Start position in genome

  • end: End position in genome

  • name: Peak name in the format “gene@from_tss”

  • score: Score from the seqlet calling, -log10(p-value) clipped to the range 0-100

  • strand: Strand, always ‘.’

Return type:

pd.DataFrame
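
To make the score column concrete, the sketch below computes -log10(p-value) clipped to the range 0-100 and assembles one BED-style row in the column order listed above. The function names, the example coordinates, and the peak name "SPI1@-120" are made up for illustration; only the clipping rule and the column order come from the description above.

```python
import math


# Hypothetical helpers, not part of decima.
def bed_score(p_value, max_score=100.0):
    """Return -log10(p_value) clipped to [0, max_score]."""
    if p_value <= 0:
        return max_score  # log10 is undefined at 0; treat as the maximum score
    return min(max(-math.log10(p_value), 0.0), max_score)


def bed_row(chrom, start, end, name, p_value):
    """Build one row: chrom, start, end, name, score, strand ('.')."""
    return [chrom, start, end, name, bed_score(p_value), "."]


row = bed_row("chr1", 1000, 1012, "SPI1@-120", 1e-5)
print("\t".join(str(field) for field in row))
```
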

plot_peaks(overlapping_min_dist=1000, figsize=(10, 2))[source]

Plot attribution scores and highlight peaks.

Parameters:
  • overlapping_min_dist – Minimum distance between peaks to consider them overlapping

  • figsize – Figure size in inches (width, height)

Returns:

The plotted figure showing attribution scores with highlighted peaks

Return type:

plotnine.ggplot

Plot attribution scores around a relative location.

Parameters:
  • relative_loc – Position relative to TSS to center plot on

  • window – Number of bases to show on each side of center

Returns:

Attribution plot

Return type:

matplotlib.pyplot.Figure
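
The relative_loc and window parameters above describe a plot centered at a position relative to the TSS, with window bases shown on each side. A minimal sketch of that index arithmetic follows. The helper name, the tss_index argument, and the clamping to the sequence bounds are assumptions for illustration and may differ from what the library does.

```python
# Hypothetical helper, not part of decima: index arithmetic for a centered window.
def window_bounds(tss_index, relative_loc, window, seq_len):
    """Return (start, end) indices of the region centered at tss_index + relative_loc."""
    center = tss_index + relative_loc
    start = max(center - window, 0)      # clamp at the left edge (assumption)
    end = min(center + window, seq_len)  # clamp at the right edge (assumption)
    return start, end


print(window_bounds(tss_index=1000, relative_loc=-200, window=50, seq_len=5000))
```
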

save_bigwig(bigwig_path)[source]

Save attribution scores as a bigwig file.

Parameters:

bigwig_path (str) – Path to save bigwig file.

save_fasta(fasta_path)[source]

Save attribution scores as a fasta file.

save_peaks(bed_path)[source]

Save peaks to bed file.

Parameters:

bed_path (str) – Path to save bed file.

scan_motifs(motifs='hocomoco_v12', window=18, pthresh=0.0005)[source]

Scan for motifs in peak regions.

Parameters:
  • motifs (str) – Motif database to use

  • window (int) – Window size around peaks

  • pthresh (float) – P-value threshold for motif matches

Returns:

Motif scan results

Return type:

pd.DataFrame

property start: int

Get the start position.

property strand: str

Get the strand.

decima.interpret.attributions.attributions(inputs, tasks, off_tasks=None, model=0, transform='specificity', method='inputxgradient', device=None, **kwargs)[source]

Compute attributions for a gene.

Parameters:
  • inputs – One-hot encoded sequence

  • tasks – List of cell types to analyze attributions for

  • off_tasks – List of cell types to contrast against

  • model – Model to use for attribution analysis

  • transform – Transformation to apply to attributions

  • method – Method to use for attribution analysis

  • device – Device to use for attribution analysis

Returns:

Attribution analysis results for the gene and tasks

Return type:

Attribution
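
The default method is 'inputxgradient'. As a conceptual illustration of input-times-gradient in general (not of this function's internals), the sketch below uses a toy linear scorer, for which the gradient with respect to the input is simply the weight matrix. The scorer, its weights, and the helper names are all assumptions made for this example; a real model would obtain gradients through automatic differentiation.

```python
# Toy illustration of input x gradient; nothing here calls decima.
def input_x_gradient(one_hot, gradient):
    """Multiply the input by the gradient element-wise (same 4 x L shape)."""
    return [
        [x * g for x, g in zip(x_row, g_row)]
        for x_row, g_row in zip(one_hot, gradient)
    ]


def per_position(attrs):
    """Sum over the 4 base channels to get one score per position."""
    return [sum(column) for column in zip(*attrs)]


# Sequence "AC" in assumed A, C, G, T row order.
one_hot = [[1, 0], [0, 1], [0, 0], [0, 0]]
# For a linear scorer f(x) = sum(w * x), the gradient df/dx equals w.
weights = [[0.5, -1.0], [0.2, 0.3], [0.1, 0.0], [-0.4, 0.7]]

attrs = input_x_gradient(one_hot, weights)
print(per_position(attrs))
```

Because the input is one-hot, only the observed base at each position keeps a nonzero attribution.
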

decima.interpret.attributions.get_attribution_method(method)[source]

Get attribution method from string.

Parameters:

method (str) – Method to use for attribution analysis

Returns:

The attribution method corresponding to the given name

Return type:

Attribution

decima.interpret.ism module

decima.interpret.save_attributions module

decima.interpret.save_attributions.predict_save_attributions(output_dir, genes=None, seqs=None, tasks=None, off_tasks=None, model=0, metadata_anndata=None, method='inputxgradient', device=None, plot_peaks=True, plot_seqlogo=False, seqlogo_window=50, dpi=100)[source]

Generate and save attribution analysis results for a gene. This function performs attribution analysis for a given gene and cell types, saving the following output files to the specified directory:

    output_dir/
    ├── peaks.bed                  # List of attribution peaks in BED format
    ├── peaks.png                  # Plot showing peak locations
    ├── qc.log                     # QC warnings about prediction reliability
    ├── motifs.tsv                 # Detected motifs in peak regions
    ├── attributions.h5            # Raw attribution score matrix
    ├── attributions.bigwig        # Genome browser track of attribution scores
    └── attributions_seq_logos/    # Directory containing attribution plots
        └── {peak}.png             # Attribution plot for each peak region

Parameters:
  • output_dir (str) – Directory to save output files

  • genes – Gene symbols or IDs to analyze

  • tasks (Optional[List[str]]) – List of cell types to analyze attributions for

  • off_tasks (Optional[List[str]]) – Optional list of cell types to contrast against

  • model (Union[str, int, None]) – Optional model to use for attribution analysis

  • method (str) – Method to use for attribution analysis

  • device (Optional[str]) – Device to use for attribution analysis

  • dpi (int) – DPI for attribution plots.

Raises:

FileExistsError – If output directory already exists.
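
Because an existing output directory raises FileExistsError, callers may want to handle that case explicitly. The sketch below reproduces the same failure mode with the standard library only; how the function itself performs the check is not shown on this page, so the create_output_dir helper is a hypothetical stand-in rather than the library's code.

```python
import os
import tempfile


# Hypothetical stand-in for the directory check; not decima's implementation.
def create_output_dir(output_dir):
    """Create output_dir, raising FileExistsError if it already exists."""
    os.makedirs(output_dir, exist_ok=False)
    return output_dir


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "attributions_run")
    create_output_dir(out)      # first call creates the directory
    try:
        create_output_dir(out)  # second call hits the existing directory
    except FileExistsError:
        second_call_failed = True
    else:
        second_call_failed = False

print(second_call_failed)
```

Choosing a fresh directory name per run, or removing the old directory first, avoids the error.
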

Examples

>>> predict_save_attributions(
...     output_dir="output_dir",
...     genes=[
...         "SPI1",
...         "CD68",
...     ],
...     tasks="cell_type == 'classical monocyte'",
... )

Module contents