decima.interpret package

Submodules

decima.interpret.attributions module

class decima.interpret.attributions.Attribution(inputs, attrs, gene='', chrom=None, start=None, end=None, strand=None, threshold=0.0005, min_seqlet_len=4, max_seqlet_len=25, additional_flanks=0)

Bases: object

Attribution analysis results for a gene.

Parameters:

inputs (Tensor) – One-hot encoded sequence
attrs (ndarray) – Attribution scores
gene (Optional[str]) – Gene symbol or ID to analyze
chrom (Optional[str]) – Chromosome name
start (Optional[int]) – Start position
end (Optional[int]) – End position
strand (Optional[str]) – Strand
threshold (Optional[float]) – Threshold for peak finding
min_seqlet_len (Optional[int]) – Minimum seqlet length for peak finding
max_seqlet_len (Optional[int]) – Maximum seqlet length for peak finding
additional_flanks (Optional[int]) – Additional flanks to add to the gene

Returns:

Attribution analysis results for the gene and tasks

Return type:

Attribution

Examples

>>> attribution = Attribution(
...     gene="A1BG",
...     inputs=inputs,
...     attrs=attrs,
...     chrom="chr1",
...     start=100,
...     end=200,
...     strand="+",
...     threshold=5e-4,
...     min_seqlet_len=4,
...     max_seqlet_len=25,
...     additional_flanks=0,
... )
>>> attribution.plot_peaks()
>>> attribution.scan_motifs()
>>> attribution.save_bigwig(
...     "attributions.bigwig"
... )
>>> attribution.peaks_to_bed()
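
The `inputs` argument is described as a one-hot encoded sequence. As a minimal sketch of what such an encoding can look like, the snippet below builds a 4-row matrix in plain Python. The helper name `one_hot_encode` and the A, C, G, T row order are assumptions made for illustration and may not match the encoding Decima uses internally.

```python
# Hypothetical helper, not part of decima: one-hot encode a DNA string.
# Assumption: rows are ordered A, C, G, T; other characters (e.g. N)
# are left as all-zero columns.
BASES = "ACGT"


def one_hot_encode(seq):
    """Return a 4 x len(seq) nested list of 0/1 values."""
    seq = seq.upper()
    encoded = [[0] * len(seq) for _ in BASES]
    for pos, base in enumerate(seq):
        row = BASES.find(base)
        if row >= 0:  # skip characters outside A/C/G/T
            encoded[row][pos] = 1
    return encoded


example = one_hot_encode("ACGTN")
```

In practice the nested list would be converted to a tensor or array before being passed as `inputs`.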

__init__(inputs, attrs, gene='', chrom=None, start=None, end=None, strand=None, threshold=0.0005, min_seqlet_len=4, max_seqlet_len=25, additional_flanks=0)

Initialize Attribution.

Parameters:

inputs (Tensor) – One-hot encoded sequence
attrs (ndarray) – Attribution scores
gene (Optional[str]) – Gene name
chrom (Optional[str]) – Chromosome name
start (Optional[int]) – Start position
end (Optional[int]) – End position
strand (Optional[str]) – Strand
threshold (Optional[float]) – Threshold for peak finding
min_seqlet_len (Optional[int]) – Minimum seqlet length for peak finding
max_seqlet_len (Optional[int]) – Maximum seqlet length for peak finding
additional_flanks (Optional[int]) – Additional flanks to add to the gene

__repr__()

Return repr(self).

property chrom: str

Get the chromosome name.

property end: int

Get the end position.

fasta_str()

Get attribution scores as a fasta string.

static find_peaks(attrs, threshold=0.0005, min_seqlet_len=4, max_seqlet_len=25, additional_flanks=0)
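
The parameters of `find_peaks` suggest a thresholded search for short contiguous regions (seqlets). The sketch below illustrates that general idea with a simple run-length approach. It is not Decima's algorithm: the function name `call_runs`, the use of absolute values, and the half-open interval convention are all assumptions.

```python
# Illustrative sketch only: NOT decima's find_peaks implementation.
# Assumption: a "peak" is a contiguous run of positions whose absolute
# attribution is at least `threshold`, kept when its length lies in
# [min_seqlet_len, max_seqlet_len] and then widened by `additional_flanks`.

def call_runs(scores, threshold=0.0005, min_seqlet_len=4,
              max_seqlet_len=25, additional_flanks=0):
    """Return a list of (start, end) half-open intervals."""
    n = len(scores)
    peaks = []
    run_start = None
    for pos in range(n + 1):  # one extra step closes a run ending at n
        above = pos < n and abs(scores[pos]) >= threshold
        if above and run_start is None:
            run_start = pos
        elif not above and run_start is not None:
            length = pos - run_start
            if min_seqlet_len <= length <= max_seqlet_len:
                start = max(0, run_start - additional_flanks)
                end = min(n, pos + additional_flanks)
                peaks.append((start, end))
            run_start = None
    return peaks


scores = [0.0, 0.0, 0.002, 0.003, -0.001, 0.004, 0.0, 0.0006, 0.0]
peaks = call_runs(scores)
```

With the default settings, the single-position run at index 7 is dropped because it is shorter than `min_seqlet_len`.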

classmethod from_seq(inputs, tasks=None, off_tasks=None, model=0, transform='specificity', method='inputxgradient', device=None, result=None, gene='', chrom=None, start=None, end=None, strand=None, gene_mask_start=None, gene_mask_end=None, threshold=0.0005, min_seqlet_len=4, max_seqlet_len=25, additional_flanks=0)

Initialize Attribution from sequence.

Parameters:

inputs (Union[str, Tensor, ndarray]) – Sequence to analyze: either a sequence string, or a torch.Tensor or np.ndarray with shape (4, 524288) or (5, 524288), where the fifth row is a binary mask. If the input has only 4 rows, gene_mask_start and gene_mask_end must be provided.
tasks (Optional[list]) – List of cell types to analyze attributions for
off_tasks (Optional[list]) – List of cell types to contrast against
model (Union[str, int, None]) – Model to use for attribution analysis
transform (str) – Transformation to apply to attributions
method (str) – Method to use for attribution analysis
device (Optional[str]) – Device to use for attribution analysis
gene (Optional[str]) – Gene name
chrom (Optional[str]) – Chromosome name
start (Optional[int]) – Start position
end (Optional[int]) – End position
strand (Optional[str]) – Strand
gene_mask_start – Gene start position
gene_mask_end – Gene end position
threshold (Optional[float]) – Threshold for peak finding
min_seqlet_len (Optional[int]) – Minimum seqlet length for peak finding
max_seqlet_len (Optional[int]) – Maximum seqlet length for peak finding
additional_flanks (Optional[int]) – Additional flanks to add to the gene
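
To make the relationship between the 4-row and 5-row input shapes concrete, here is a minimal sketch that appends a binary gene-mask row to a 4-row one-hot matrix. The helper `add_gene_mask` is hypothetical, and treating `gene_mask_start` as inclusive and `gene_mask_end` as exclusive is an assumption.

```python
# Hypothetical helper, not part of decima: append a binary gene-mask row
# to a 4-row one-hot matrix, giving a 5-row input.
# Assumption: gene_mask_start is inclusive and gene_mask_end is exclusive.

def add_gene_mask(one_hot, gene_mask_start, gene_mask_end):
    if len(one_hot) != 4:
        raise ValueError("expected 4 rows (one per base)")
    length = len(one_hot[0])
    if not 0 <= gene_mask_start <= gene_mask_end <= length:
        raise ValueError("mask coordinates must lie within the sequence")
    mask = [1 if gene_mask_start <= i < gene_mask_end else 0
            for i in range(length)]
    # Copy the rows so the caller's matrix is left untouched.
    return [list(row) for row in one_hot] + [mask]


one_hot = [
    [1, 0, 0, 0, 1, 0],  # A
    [0, 1, 0, 0, 0, 0],  # C
    [0, 0, 1, 0, 0, 1],  # G
    [0, 0, 0, 1, 0, 0],  # T
]
masked = add_gene_mask(one_hot, gene_mask_start=2, gene_mask_end=5)
```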

property gene_end: int

Get the gene end position.

property gene_start: int

Get the gene start position.

peaks_to_bed()

Convert peaks to bed format.

Returns:

Peaks in bed format where columns are:

chrom: Chromosome name
start: Start position in genome
end: End position in genome
name: Peak name in format “gene@from_tss”
score: -log10(p-value) from seqlet calling, clipped to the range 0-100
strand: Strand, always ‘.’

Return type:

pd.DataFrame
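
The column layout above can be sketched as a small function that builds one row. This is an illustration rather than Decima's code: the helper `bed_row` is hypothetical, and reading `from_tss` as the peak's offset relative to the TSS and `pvalue` as the seqlet-calling p-value are assumptions.

```python
# Hypothetical helper, not part of decima: build one six-column BED row.
# Assumptions: `from_tss` is the peak's offset relative to the TSS, and
# `pvalue` is the seqlet-calling p-value behind the score column.
import math


def bed_row(chrom, start, end, gene, from_tss, pvalue):
    if pvalue <= 0:
        score = 100.0  # -log10(0) is infinite, so use the upper bound
    else:
        score = min(max(-math.log10(pvalue), 0.0), 100.0)
    name = f"{gene}@{from_tss}"
    return (chrom, start, end, name, score, ".")


row = bed_row("chr1", 1200, 1212, "SPI1", -350, 1e-6)
```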

plot_peaks(overlapping_min_dist=1000, figsize=(10, 2))

Plot attribution scores and highlight peaks.

Parameters:

overlapping_min_dist – Minimum distance between peaks to consider them overlapping
figsize – Figure size in inches (width, height)

Returns:

The plotted figure showing attribution scores with highlighted peaks

Return type:

plotnine.ggplot

plot_seqlogo(relative_loc=0, window=50, figsize=(10, 2))

Plot attribution scores around a relative location.

Parameters:

relative_loc – Position relative to TSS to center plot on
window – Number of bases to show on each side of center
figsize – Figure size in inches (width, height)

Returns:

Attribution plot

Return type:

matplotlib.pyplot.Figure
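
As a rough sketch of how `relative_loc` and `window` could translate into array indices, the snippet below computes the plotted range. The helper `seqlogo_bounds`, the `tss_index` argument, and the clipping at the array bounds are assumptions for illustration only.

```python
# Hypothetical helper, not part of decima: turn relative_loc and window
# into array indices. Assumption: `tss_index` is the TSS position within
# the attribution array, and the window is clipped at the array bounds.

def seqlogo_bounds(tss_index, seq_len, relative_loc=0, window=50):
    center = tss_index + relative_loc
    start = max(0, center - window)
    end = min(seq_len, center + window)
    return start, end


bounds = seqlogo_bounds(tss_index=1000, seq_len=5000,
                        relative_loc=-200, window=50)
```

Under these assumptions, `window=50` shows 100 bases in total unless the range is clipped at either end of the sequence.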

save_bigwig(bigwig_path)

Save attribution scores as a bigwig file.

Parameters:

bigwig_path (str) – Path to save bigwig file.

save_fasta(fasta_path)

Save attribution scores as a fasta file.

save_peaks(bed_path)

Save peaks to bed file.

Parameters:

bed_path (str) – Path to save bed file.

scan_motifs(motifs='hocomoco_v12', window=18, pthresh=0.0005)

Scan for motifs in peak regions.

Parameters:

motifs (str) – Motif database to use
window (int) – Window size around peaks
pthresh (float) – P-value threshold for motif matches

Returns:

Motif scan results

Return type:

pd.DataFrame

property start: int

Get the start position.

property strand: str

Get the strand.

decima.interpret.attributions.attributions(inputs, tasks, off_tasks=None, model=0, transform='specificity', method='inputxgradient', device=None, **kwargs)

Compute attributions for a gene.

Parameters:

gene – Gene symbol or ID to analyze
tasks – List of cell types to analyze attributions for
off_tasks – List of cell types to contrast against
model – Model to use for attribution analysis
device – Device to use for attribution analysis
inputs – One-hot encoded sequence
transform – Transformation to apply to attributions
method – Method to use for attribution analysis

Returns:

Attribution analysis results for the gene and tasks

Return type:

Attribution
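
The default `method` is 'inputxgradient' and the default `transform` is 'specificity'. The toy sketch below shows the input-times-gradient idea on a linear model, where the gradient of each task output with respect to the input is simply that task's weight vector. It is not Decima's implementation: the function names, the weights, and the definition of the contrast as the mean over `tasks` minus the mean over `off_tasks` are all assumptions.

```python
# Toy sketch of input-x-gradient, NOT decima's implementation.
# Assumptions: a linear model per task (so the gradient of each output
# with respect to the input equals that task's weight vector), and a
# "specificity"-style contrast defined as mean(on tasks) - mean(off tasks).

def mean_vector(vectors):
    """Element-wise mean of equally sized lists."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def input_x_gradient(x, weights, tasks, off_tasks=None):
    grad = mean_vector([weights[t] for t in tasks])
    if off_tasks:
        off = mean_vector([weights[t] for t in off_tasks])
        grad = [g - o for g, o in zip(grad, off)]
    # Element-wise product of the input with the gradient.
    return [xi * gi for xi, gi in zip(x, grad)]


weights = {
    "monocyte": [0.5, -1.0, 2.0],  # made-up weights for illustration
    "t_cell": [0.5, 1.0, 0.0],
}
x = [1.0, 0.0, 1.0]
attrs = input_x_gradient(x, weights, tasks=["monocyte"], off_tasks=["t_cell"])
```

In this toy case the first position contributes equally to both tasks, so its contrasted attribution is zero, while the third position carries the task-specific signal.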

decima.interpret.attributions.get_attribution_method(method)

Get attribution method from string.

Parameters:

method (str) – Method to use for attribution analysis

Returns:

Attribution analysis results for the gene and tasks

Return type:

Attribution

decima.interpret.ism module

decima.interpret.save_attributions module

decima.interpret.save_attributions.predict_save_attributions(output_dir, genes=None, seqs=None, tasks=None, off_tasks=None, model=0, metadata_anndata=None, method='inputxgradient', device=None, plot_peaks=True, plot_seqlogo=False, seqlogo_window=50, dpi=100)

Generate and save attribution analysis results for one or more genes. This function performs attribution analysis for the given genes and cell types and saves the following output files to the specified directory:

output_dir/
├── peaks.bed                 # List of attribution peaks in BED format
├── peaks.png                 # Plot showing peak locations
├── qc.log                    # QC warnings about prediction reliability
├── motifs.tsv                # Detected motifs in peak regions
├── attributions.h5           # Raw attribution score matrix
├── attributions.bigwig       # Genome browser track of attribution scores
└── attributions_seq_logos/   # Directory containing attribution plots
    └── {peak}.png            # Attribution plot for each peak region

Parameters:

output_dir (str) – Directory to save output files
genes – Gene symbols or IDs to analyze
tasks (Optional[List[str]]) – List of cell types to analyze attributions for
off_tasks (Optional[List[str]]) – Optional list of cell types to contrast against
model (Union[str, int, None]) – Optional model to use for attribution analysis
method (str) – Method to use for attribution analysis
device (Optional[str]) – Device to use for attribution analysis
dpi (int) – DPI for attribution plots.

Raises:

FileExistsError – If output directory already exists.

Examples

>>> predict_save_attributions(
...     output_dir="output_dir",
...     genes=[
...         "SPI1",
...         "CD68",
...     ],
...     tasks="cell_type == 'classical monocyte'",
... )
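
Because the function raises FileExistsError when the output directory already exists, callers may want to handle that case explicitly. The sketch below demonstrates the general "fail if the directory exists" pattern with the standard library only. The helper `make_output_dir` is hypothetical, and Decima may implement the check differently.

```python
# Sketch of the "fail if the directory exists" pattern using only the
# standard library; decima may implement this check differently.
import os
import tempfile


def make_output_dir(output_dir):
    # exist_ok=False makes os.makedirs raise FileExistsError when the
    # directory is already present.
    os.makedirs(output_dir, exist_ok=False)
    return output_dir


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "output_dir")
    make_output_dir(target)  # first call creates the directory
    try:
        make_output_dir(target)  # second call hits the existing directory
        second_call_failed = False
    except FileExistsError:
        second_call_failed = True
```

A caller that wants to rerun an analysis would therefore need to choose a fresh `output_dir` or remove the old one first.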

Module contents