decima.interpret package
Submodules
decima.interpret.attributions module
- class decima.interpret.attributions.Attribution(inputs, attrs, gene='', chrom=None, start=None, end=None, strand=None, threshold=0.0005, min_seqlet_len=4, max_seqlet_len=25, additional_flanks=0)
Bases: object
Attribution analysis results for a gene.
- Parameters:
inputs (Tensor) – One-hot encoded sequence
attrs (ndarray) – Attribution scores
gene – Gene name
min_seqlet_len (Optional[int]) – Minimum sequence length for peak finding
max_seqlet_len (Optional[int]) – Maximum sequence length for peak finding
additional_flanks (Optional[int]) – Additional flanks to add to the gene
- Returns:
Attribution analysis results for the gene and tasks
- Return type:
Examples
>>> attribution = Attribution(
...     gene="A1BG",
...     inputs=inputs,
...     attrs=attrs,
...     chrom="chr1",
...     start=100,
...     end=200,
...     strand="+",
...     threshold=5e-4,
...     min_seqlet_len=4,
...     max_seqlet_len=25,
...     additional_flanks=0,
... )
>>> attribution.plot_peaks()
>>> attribution.scan_motifs()
>>> attribution.save_bigwig("attributions.bigwig")
>>> attribution.peaks_to_bed()
- __init__(inputs, attrs, gene='', chrom=None, start=None, end=None, strand=None, threshold=0.0005, min_seqlet_len=4, max_seqlet_len=25, additional_flanks=0)
Initialize Attribution.
- Parameters:
inputs (Tensor) – One-hot encoded sequence
attrs (ndarray) – Attribution scores
min_seqlet_len (Optional[int]) – Minimum sequence length for peak finding
max_seqlet_len (Optional[int]) – Maximum sequence length for peak finding
additional_flanks (Optional[int]) – Additional flanks to add to the gene
- static find_peaks(attrs, threshold=0.0005, min_seqlet_len=4, max_seqlet_len=25, additional_flanks=0)
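find_peaks is listed here without a description. To give a feel for how length bounds and flanks of this kind can shape the called intervals, the sketch below keeps contiguous runs of high-scoring positions. It is a simplified stand-in and not Decima's implementation: the helper name call_runs is hypothetical, and the cutoff is applied to raw absolute scores purely for illustration, since this section does not state how threshold is interpreted.

```python
def call_runs(scores, cutoff, min_len=4, max_len=25, flank=0):
    """Return half-open (start, end) intervals of consecutive positions whose
    absolute score exceeds `cutoff` (assumed non-negative).

    Runs shorter than `min_len` or longer than `max_len` are dropped, and each
    kept run is widened by `flank` positions on both sides, clipped to the
    bounds of the input.
    """
    runs, start = [], None
    # A trailing 0.0 sentinel closes a run that reaches the end of the input.
    for i, s in enumerate(list(scores) + [0.0]):
        above = abs(s) > cutoff
        if above and start is None:
            start = i
        elif not above and start is not None:
            if min_len <= i - start <= max_len:
                runs.append((max(0, start - flank), min(len(scores), i + flank)))
            start = None
    return runs


# Toy attribution track: one run of four positions and one isolated position.
scores = [0.0, 0.0, 0.9, 0.8, 0.7, 0.9, 0.0, 0.0, 0.6, 0.0]
peaks = call_runs(scores, cutoff=0.5, min_len=4, max_len=25, flank=1)
print(peaks)
```

In this toy track the single-position run is discarded by the minimum length, and the four-position run is extended by one base on each side.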
- classmethod from_seq(inputs, tasks=None, off_tasks=None, model=0, transform='specificity', method='inputxgradient', device=None, result=None, gene='', chrom=None, start=None, end=None, strand=None, gene_mask_start=None, gene_mask_end=None, threshold=0.0005, min_seqlet_len=4, max_seqlet_len=25, additional_flanks=0)
Initialize Attribution from sequence.
- Parameters:
inputs (Union[str, Tensor, ndarray]) – Sequence to analyze: either a sequence string, or a torch.Tensor or np.ndarray with shape (4, 524288) or (5, 524288), where the fifth channel is a binary mask. If the input has only 4 channels, gene_mask_start and gene_mask_end must be provided.
tasks (Optional[list]) – List of cell types to analyze attributions for
off_tasks (Optional[list]) – List of cell types to contrast against
model (Union[str, int, None]) – Model to use for attribution analysis
transform (str) – Transformation to apply to attributions
device (Optional[str]) – Device to use for attribution analysis
gene_start – Gene start position
gene_end – Gene end position
min_seqlet_len (Optional[int]) – Minimum sequence length for peak finding
max_seqlet_len (Optional[int]) – Maximum sequence length for peak finding
additional_flanks (Optional[int]) – Additional flanks to add to the gene
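The two accepted array layouts for inputs can be illustrated with a short sequence. The sketch below builds four one-hot rows plus a fifth binary mask row from a string and a mask interval. The helper name encode_with_mask, the A, C, G, T row order, the half-open mask interval, and the all-zero encoding of N are assumptions made for illustration; the source only states the expected shapes.

```python
def encode_with_mask(seq, gene_mask_start, gene_mask_end, alphabet="ACGT"):
    """Return a list of 5 rows: 4 one-hot rows (one per letter of `alphabet`)
    followed by a binary mask row that is 1.0 inside
    [gene_mask_start, gene_mask_end) and 0.0 elsewhere.

    Bases outside `alphabet` (for example N) become all-zero columns.
    """
    seq = seq.upper()
    one_hot = [[1.0 if base == letter else 0.0 for base in seq] for letter in alphabet]
    mask = [1.0 if gene_mask_start <= i < gene_mask_end else 0.0 for i in range(len(seq))]
    return one_hot + [mask]


# A 5-base toy sequence instead of the full 524288-base input.
encoded = encode_with_mask("ACGTN", gene_mask_start=1, gene_mask_end=3)
for row in encoded:
    print(row)
```

A real input would have 524288 columns rather than 5, and would typically be a torch.Tensor or np.ndarray rather than nested lists.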
- peaks_to_bed()
Convert peaks to bed format.
- Returns:
- Peaks in bed format where columns are:
chrom: Chromosome name
start: Start position in genome
end: End position in genome
name: Peak name in format “gene@from_tss”
score: Score (-log10(p-value)) clipped to 0-100 based on the seqlet calling
strand: Strand == ‘.’
- Return type:
pd.DataFrame
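The column layout above can be made concrete with a small sketch that builds one BED row from a peak. The helper name peak_to_bed_row is hypothetical, and the exact form of the "from_tss" part of the name (a signed offset here) is an assumption; the score clipping follows the column description.

```python
import math


def peak_to_bed_row(chrom, start, end, gene, from_tss, p_value):
    """Build one BED-style row: (chrom, start, end, name, score, strand).

    The score is -log10(p_value) clipped to the range 0-100, and the strand
    is always '.'.
    """
    # A p-value of 0 would give an infinite score, so it maps to the upper clip.
    score = -math.log10(p_value) if p_value > 0 else 100.0
    score = min(max(score, 0.0), 100.0)
    return (chrom, start, end, f"{gene}@{from_tss}", score, ".")


row = peak_to_bed_row("chr1", 150, 160, "A1BG", -50, 1e-6)
print(row)
```

Collecting such rows into a pandas DataFrame with the six column names listed above would give a table in the same layout as the documented return value.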
- plot_peaks(overlapping_min_dist=1000, figsize=(10, 2))
Plot attribution scores and highlight peaks.
- Parameters:
overlapping_min_dist – Minimum distance between peaks to consider them overlapping
figsize – Figure size in inches (width, height)
- Returns:
The plotted figure showing attribution scores with highlighted peaks
- Return type:
plotnine.ggplot
- plot_seqlogo(relative_loc=0, window=50, figsize=(10, 2))
Plot attribution scores around a relative location.
- Parameters:
relative_loc – Position relative to TSS to center plot on
window – Number of bases to show on each side of center
- Returns:
Attribution plot
- Return type:
matplotlib.pyplot.Figure
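The relationship between relative_loc and window can be sketched as index arithmetic. The helper window_bounds and its tss_index argument are hypothetical, and clipping to the sequence bounds is an assumption; the source only says that the plot is centered relative to the TSS and shows window bases on each side.

```python
def window_bounds(tss_index, relative_loc=0, window=50, seq_len=524288):
    """Return the half-open (start, end) slice centered `relative_loc` bases
    from the TSS, spanning `window` bases on each side, clipped to the
    sequence."""
    center = tss_index + relative_loc
    return max(0, center - window), min(seq_len, center + window)


# 200 bases upstream of a TSS located at index 1000, 50 bases on each side.
bounds = window_bounds(1000, relative_loc=-200, window=50)
print(bounds)
```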
- save_bigwig(bigwig_path)
Save attribution scores as a bigwig file.
- Parameters:
bigwig_path (
str
) – Path to save bigwig file.
- save_peaks(bed_path)
Save peaks to bed file.
- Parameters:
bed_path (
str
) – Path to save bed file.
- decima.interpret.attributions.attributions(inputs, tasks, off_tasks=None, model=0, transform='specificity', method='inputxgradient', device=None, **kwargs)
Compute attributions for a gene.
- Parameters:
gene – Gene symbol or ID to analyze
tasks – List of cell types to analyze attributions for
off_tasks – List of cell types to contrast against
model – Model to use for attribution analysis
device – Device to use for attribution analysis
inputs – One-hot encoded sequence
transform – Transformation to apply to attributions
method – Method to use for attribution analysis
- Returns:
Attribution analysis results for the gene and tasks
- Return type:
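As a rough illustration of the default method='inputxgradient', the sketch below multiplies each input value by the gradient of a scalar output with respect to that value. Everything in it is a toy assumption: the linear "model" in weights, the helper names, the reading of the specificity transform as mean prediction over tasks minus mean prediction over off_tasks, and the finite-difference gradient, which stands in for the automatic differentiation a real model would presumably use.

```python
def input_x_gradient(f, x, eps=1e-6):
    """Approximate x[i] * df/dx[i] for every i using central finite
    differences (suitable for toy inputs only)."""
    attrs = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grad = (f(hi) - f(lo)) / (2 * eps)
        attrs.append(x[i] * grad)
    return attrs


# Toy model: each task is a linear readout of a flattened 3-value input.
weights = {"on_task": [0.5, -1.0, 2.0], "off_task": [0.1, 0.3, 0.0]}


def specificity(x, on=("on_task",), off=("off_task",)):
    """Mean prediction over `on` tasks minus mean prediction over `off` tasks."""
    pred = {t: sum(w * v for w, v in zip(ws, x)) for t, ws in weights.items()}
    return sum(pred[t] for t in on) / len(on) - sum(pred[t] for t in off) / len(off)


x = [1.0, 0.0, 1.0]
attrs = input_x_gradient(specificity, x)
print(attrs)
```

Positions where the input is zero receive zero attribution regardless of the gradient, which is why input x gradient on a one-hot sequence only scores the base that is actually present at each position.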
decima.interpret.ism module
decima.interpret.save_attributions module
- decima.interpret.save_attributions.predict_save_attributions(output_dir, genes=None, seqs=None, tasks=None, off_tasks=None, model=0, metadata_anndata=None, method='inputxgradient', device=None, plot_peaks=True, plot_seqlogo=False, seqlogo_window=50, dpi=100)
Generate and save attribution analysis results for a gene. This function performs attribution analysis for a given gene and cell types, saving the following output files to the specified directory:
output_dir/
├── peaks.bed                 # List of attribution peaks in BED format
├── peaks.png                 # Plot showing peak locations
├── qc.log                    # QC warnings about prediction reliability
├── motifs.tsv                # Detected motifs in peak regions
├── attributions.h5           # Raw attribution score matrix
├── attributions.bigwig       # Genome browser track of attribution scores
└── attributions_seq_logos/   # Directory containing attribution plots
    └── {peak}.png            # Attribution plot for each peak region
- Parameters:
output_dir (str) – Directory to save output files
gene – Gene symbol or ID to analyze
tasks (Optional[List[str]]) – List of cell types to analyze attributions for
off_tasks (Optional[List[str]]) – Optional list of cell types to contrast against
model (Union[str, int, None]) – Optional model to use for attribution analysis
method (str) – Method to use for attribution analysis
device (Optional[str]) – Device to use for attribution analysis
dpi (int) – DPI for attribution plots.
- Raises:
FileExistsError – If output directory already exists.
Examples:
>>> predict_save_attributions(
...     output_dir="output_dir",
...     genes=[
...         "SPI1",
...         "CD68",
...     ],
...     tasks="cell_type == 'classical monocyte'",
... )
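Because the function raises FileExistsError when the output directory already exists, callers may want to handle that case explicitly. The sketch below reproduces the same behavior with the standard library only. The helper prepare_output_dir is hypothetical and is not part of Decima; it only shows how a create-or-fail directory check behaves on a second call.

```python
import tempfile
from pathlib import Path


def prepare_output_dir(output_dir):
    """Create `output_dir`, raising FileExistsError if it already exists."""
    path = Path(output_dir)
    path.mkdir(parents=True, exist_ok=False)
    return path


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "attributions_run"
    created = prepare_output_dir(target)  # first call creates the directory
    try:
        prepare_output_dir(target)  # second call hits the existing directory
        second_call_failed = False
    except FileExistsError:
        second_call_failed = True

print(second_call_failed)
```

Choosing a fresh directory name per run, or removing the previous results first, avoids the error.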