decima.interpret package
Submodules
decima.interpret.attributer module
- class decima.interpret.attributer.DecimaAttributer(model, tasks, off_tasks=None, method='inputxgradient', transform='specificity')
Bases: object
DecimaAttributer class for attribution analysis.
- Parameters:
model – Model to attribute.
tasks – Tasks to attribute.
off_tasks – Off tasks to attribute.
method (str) – Method to use for attribution analysis. Available options: "saliency", "inputxgradient", "integratedgradients".
transform – Transform to use for attribution analysis.
Examples
>>> attributer = DecimaAttributer(
...     model,
...     tasks,
...     off_tasks,
...     method,
...     transform,
... )
>>> attributer.attribute(
...     inputs
... )
- attribute(inputs, **kwargs)
Attribute inputs.
- Parameters:
inputs – Inputs to attribute.
**kwargs – Additional arguments to pass to the attribution method.
- Returns:
Attribution analysis results for the gene and tasks
- Return type:
torch.Tensor
- classmethod load_decima_attributer(model_name, tasks, off_tasks=None, method='inputxgradient', transform='specificity', device='cpu')
Load DecimaAttributer.
- Parameters:
model_name – Model name to load.
tasks – Tasks to attribute.
off_tasks – Off tasks to attribute.
method (str) – Method to use for attribution analysis. Available options: "saliency", "inputxgradient", "integratedgradients".
transform – Transform to use for attribution analysis.
device – Device to use for attribution analysis.
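The three method names correspond to common gradient-based attribution techniques. The following is a minimal, dependency-free sketch of what "saliency" and "inputxgradient" compute on a toy linear model. It is not Decima's implementation: the toy weights, the finite-difference gradient, and the reading of the "specificity" transform as "mean on-task output minus mean off-task output" are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical toy model: 3 tasks, each a linear function of 3 input features.
WEIGHTS = [
    [1.0, 2.0, 0.0],  # task 0 (used as the on task)
    [0.0, 1.0, 0.0],  # task 1 (used as an off task)
    [0.0, 1.0, 2.0],  # task 2 (used as an off task)
]


def toy_model(x: Sequence[float]) -> List[float]:
    """Return one output per task."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in WEIGHTS]


def specificity(outputs: Sequence[float], on_idx: Sequence[int], off_idx: Sequence[int]) -> float:
    """Assumed transform: mean on-task output minus mean off-task output."""
    on = sum(outputs[i] for i in on_idx) / len(on_idx)
    off = sum(outputs[i] for i in off_idx) / len(off_idx)
    return on - off


def numerical_gradient(f: Callable[[Sequence[float]], float], x: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Central finite differences; real frameworks use automatic differentiation."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def saliency(f, x):
    """Saliency: the gradient of the scalar target with respect to the input (signed here)."""
    return numerical_gradient(f, x)


def input_x_gradient(f, x):
    """Input x gradient: the gradient multiplied elementwise by the input."""
    return [xi * gi for xi, gi in zip(x, numerical_gradient(f, x))]


def target(x):
    """Scalar target: task 0 contrasted against tasks 1 and 2."""
    return specificity(toy_model(x), on_idx=[0], off_idx=[1, 2])
```

With `x = [1.0, 0.0, 1.0]`, the saliency is nonzero at every position, while input x gradient is zero wherever the input itself is zero. This is one reason input x gradient is often convenient for one-hot encoded sequences.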
decima.interpret.attributions module
The attributions module predicts attributions and performs recursive seqlet calling.
Examples
>>> predict_save_attributions(
...     output_prefix="output_prefix",
...     tasks=[
...         "agg1",
...         "agg2",
...     ],
...     off_tasks=[
...         "agg3",
...         "agg4",
...     ],
... )
>>> recursive_seqlet_calling(
...     output_prefix="output_prefix",
...     attributions="attributions.h5",
...     tasks=[
...         "agg1",
...         "agg2",
...     ],
...     off_tasks=[
...         "agg3",
...         "agg4",
...     ],
... )
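The two calls above can be chained so that the second step consumes the file written by the first. The sketch below assumes that the path returned by predict_save_attributions can be passed directly as the attributions argument of recursive_seqlet_calling; the wrapper name attribute_then_call_seqlets is hypothetical and not part of decima.

```python
def attribute_then_call_seqlets(output_prefix, tasks, off_tasks=None, genes=None):
    """Predict attributions, then call seqlets on the saved attribution file."""
    # Imported lazily so that defining this helper does not require decima.
    from decima.interpret.attributions import (
        predict_save_attributions,
        recursive_seqlet_calling,
    )

    # Documented return value: path to the attribution file.
    attributions_path = predict_save_attributions(
        output_prefix=output_prefix,
        tasks=tasks,
        off_tasks=off_tasks,
        genes=genes,
    )

    # Assumption: the returned path is accepted as `attributions`.
    recursive_seqlet_calling(
        output_prefix=output_prefix,
        attributions=str(attributions_path),
        tasks=tasks,
        off_tasks=off_tasks,
        genes=genes,
    )
    return attributions_path
```

Passing the same tasks, off_tasks and genes to both steps keeps the seqlet calls consistent with the attributions they are derived from.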
- decima.interpret.attributions.plot_attributions(output_prefix, genes=None, metadata_anndata=None, tss_distance=None, seqlogo_window=50, agg_func='mean', custom_genome=False, dpi=100)
Plot attributions.
- Parameters:
output_prefix (str) – Prefix for the output files.
genes (Union[str, List[str], None]) – Genes to plot attributions for. If not provided, all genes will be used.
tss_distance (Optional[int]) – TSS distance used when plotting attributions.
seqlogo_window (int) – Seqlogo window.
agg_func (Optional[str]) – Aggregation function used to combine attributions across replicates. Available options: 'mean', 'max'.
custom_genome (bool) – Whether a custom genome is used. If True, bigwig files are generated with each gene as a different chromosome.
dpi (int) – DPI for attribution plots.
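To make the agg_func options concrete, here is a minimal sketch of position-wise aggregation across replicates. The function name aggregate_replicates and the list-of-lists input are assumptions for illustration, as is taking 'max' to mean the largest signed value at each position. Decima's own aggregation may differ in both respects.

```python
from statistics import mean
from typing import List, Sequence


def aggregate_replicates(replicates: Sequence[Sequence[float]], agg_func: str = "mean") -> List[float]:
    """Combine per-replicate attribution tracks of equal length, position by position."""
    if agg_func not in {"mean", "max"}:
        raise ValueError(f"Unsupported agg_func: {agg_func!r}")
    reducer = mean if agg_func == "mean" else max
    # zip(*replicates) yields, for each position, the scores from all replicates.
    return [reducer(position_scores) for position_scores in zip(*replicates)]
```

For two replicates `[[1.0, -2.0], [3.0, 4.0]]`, 'mean' gives `[2.0, 1.0]` and 'max' gives `[3.0, 4.0]`.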
- decima.interpret.attributions.predict_attributions_seqlet_calling(output_prefix, genes=None, seqs=None, tasks=None, off_tasks=None, model='ensemble', metadata_anndata=None, method='inputxgradient', transform='specificity', num_workers=2, tss_distance=None, batch_size=1, top_n_markers=None, device=None, threshold=0.0005, min_seqlet_len=4, max_seqlet_len=25, additional_flanks=0, pattern_type='both', meme_motif_db='hocomoco_v13', genome='hg38')
Generate and save attribution analysis results for a gene. This function performs attribution analysis for a given gene and cell types, saving the following output files to the specified directory:
- Output files:
├── {output_prefix}.attributions.h5 # Raw attribution score matrix per gene.
├── {output_prefix}.attributions.bigwig # Genome browser track of attribution as bigwig file.
├── {output_prefix}.seqlets.bed # List of attribution peaks in BED format.
├── {output_prefix}.motifs.tsv # Detected motifs in peak regions.
└── {output_prefix}.warnings.qc.log # QC warnings about prediction reliability.
- Parameters:
output_prefix – Prefix for the output files.
genes – Gene symbols or IDs to analyze.
tasks (Optional[List[str]]) – Cell types to analyze attributions for, given either as a list of task names or as a query string that filters cell types (e.g. "cell_type == 'classical monocyte'"). If not provided, all tasks will be analyzed.
off_tasks (Optional[List[str]]) – Optional cell types to contrast against, given either as a list of task names or as a query string that filters cell types (e.g. "cell_type == 'classical monocyte'"). If not provided, all tasks will be used as off tasks.
model (Union[int, str, None]) – Optional model to use for attribution analysis, given either as a replicate number or as a path to the model.
method (str) – Method to use for attribution analysis. Available options: "saliency", "inputxgradient", "integratedgradients".
device (Optional[str]) – Device to use for attribution analysis (e.g. 'cuda', 'cpu'). If not provided, the best available device will be used automatically.
dpi – DPI for attribution plots.
- Raises:
FileExistsError – If output directory already exists.
Examples
>>> predict_attributions_seqlet_calling(
...     output_prefix="output_prefix",
...     genes=[
...         "SPI1",
...         "CD68",
...     ],
...     tasks="cell_type == 'classical monocyte'",
... )
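Because every output file shares the same prefix, the expected paths can be derived from output_prefix alone. The helper names below (expected_outputs, missing_outputs) are hypothetical and not part of decima; the suffixes are taken from the output file listing above.

```python
from pathlib import Path
from typing import Dict, List

# Suffixes from the documented output file listing.
OUTPUT_SUFFIXES = {
    "attributions": ".attributions.h5",
    "bigwig": ".attributions.bigwig",
    "seqlets": ".seqlets.bed",
    "motifs": ".motifs.tsv",
    "qc_log": ".warnings.qc.log",
}


def expected_outputs(output_prefix: str) -> Dict[str, Path]:
    """Map a short label to the path each output file is expected to have."""
    return {name: Path(f"{output_prefix}{suffix}") for name, suffix in OUTPUT_SUFFIXES.items()}


def missing_outputs(output_prefix: str) -> List[str]:
    """Return the labels of expected outputs that do not exist on disk."""
    return [name for name, path in expected_outputs(output_prefix).items() if not path.exists()]
```

A check such as `missing_outputs("results/spi1")` after a run gives a quick way to confirm that all five files were written.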
- decima.interpret.attributions.predict_save_attributions(output_prefix, tasks=None, off_tasks=None, model='ensemble', metadata_anndata=None, method='inputxgradient', transform='specificity', batch_size=1, genes=None, seqs=None, top_n_markers=None, bigwig=True, correct_grad_bigwig=True, num_workers=4, device=None, genome='hg38')
Generate and save attribution analysis results for a gene.
- Parameters:
output_prefix (str) – Prefix for the output files where attribution results will be saved.
tasks (Optional[List[str]]) – Tasks to attribute, given either as a list of task names or as a query string that filters cell types (e.g. "cell_type == 'classical monocyte'"). If not provided, all tasks will be analyzed.
off_tasks (Optional[List[str]]) – Off tasks to contrast against, given either as a list of task names or as a query string that filters cell types (e.g. "cell_type == 'classical monocyte'"). If not provided, no contrast will be performed.
model (Optional[int]) – Model to use for attribution. Default is 'ensemble'. Can be a replicate number (0-3) or a path to a custom model.
metadata_anndata (Optional[str]) – Path to the metadata anndata file, or a DecimaResult object. If not provided, the default metadata will be downloaded from wandb.
method (str) – Method to use for attribution analysis. Available options: "saliency", "inputxgradient", "integratedgradients". Default is "inputxgradient".
transform (str) – Transform to use for attribution analysis. Available options: "specificity", "aggregate". Default is "specificity".
batch_size (int) – Batch size for attribution analysis. Default is 1. Increasing the batch size may speed up computation but requires more memory.
genes (Optional[List[str]]) – Genes to attribute, given as a list of gene symbols or IDs. If not provided, all genes will be used.
seqs (Union[str, DataFrame, ndarray, Tensor, None]) – Sequences to attribute. Can be a path to a fasta file, a DataFrame, or a numpy array or torch tensor. Mutually exclusive with the genes parameter.
top_n_markers (Optional[int]) – Number of top marker genes to attribute. If not provided, genes will be used. Useful for analyzing only the most important marker genes.
bigwig (bool) – Whether to save attribution scores as a bigwig file. Default is True. Bigwig files can be loaded in genome browsers for visualization.
correct_grad_bigwig (bool) – Whether to apply gradient correction to the bigwig file for better visualization. Default is True.
num_workers (int) – Number of workers for attribution analysis. Default is 4. Increasing the number of workers will speed up the process.
device (Optional[str]) – Device to use for attribution analysis (e.g. 'cuda', 'cpu'). If not provided, the best available device will be used automatically.
genome (str) – Genome to use for attribution analysis. Default is "hg38". Can be a genome name or a path to a custom genome fasta file.
- Returns:
Path to the attribution file.
Examples
>>> predict_save_attributions(
...     output_prefix="output_prefix",
...     tasks=[
...         "task1",
...         "task2",
...     ],
...     off_tasks=[
...         "task3",
...         "task4",
...     ],
...     model=0,
...     metadata_anndata="metadata_anndata.h5ad",
...     method="inputxgradient",
...     transform="specificity",
...     batch_size=1,
...     genes=[
...         "gene1",
...         "gene2",
...     ],
...     top_n_markers=10,
...     bigwig=True,
...     correct_grad_bigwig=True,
...     num_workers=4,
...     device="cuda",
...     genome="hg38",
... )
- decima.interpret.attributions.recursive_seqlet_calling(output_prefix, attributions, tasks=None, off_tasks=None, tss_distance=None, metadata_anndata=None, genes=None, top_n_markers=None, num_workers=4, agg_func='mean', threshold=0.0005, min_seqlet_len=4, max_seqlet_len=25, additional_flanks=0, pattern_type='both', custom_genome=False, meme_motif_db='hocomoco_v13')
Recursive seqlet calling for attribution analysis.
- Parameters:
output_prefix (str) – Prefix for the output files where seqlet calling results will be saved.
attributions (Union[str, List[str]]) – Attribution file(s) to use for recursive seqlet calling, generated by the decima attributions-predict or decima attributions commands. Can be a single file path or a list of attribution files.
tasks (Optional[List[str]]) – Tasks to analyze, given either as a list of task names or as a query string that filters cell types (e.g. "cell_type == 'classical monocyte'"). If not provided, all tasks will be analyzed.
off_tasks (Optional[List[str]]) – Off tasks to contrast against, given either as a list of task names or as a query string that filters cell types (e.g. "cell_type == 'classical monocyte'"). If not provided, no contrast will be performed.
tss_distance (Optional[int]) – TSS distance that controls the size of the genomic window analyzed around the TSS. Default is the full context size of decima, 524288.
metadata_anndata (Optional[str]) – Path to the metadata anndata file, or a DecimaResult object. If not provided, the metadata from the attribution files will be used.
genes (Optional[List[str]]) – Genes to analyze, given as a list of gene symbols or IDs. If not provided, all genes will be used.
top_n_markers (Optional[int]) – Number of top marker genes to analyze for the specified tasks. If not provided, genes will be used.
num_workers (int) – Number of workers for recursive seqlet calling. Default is 4. Increasing the number of workers will speed up the process but requires more memory.
agg_func (Optional[str]) – Aggregation function that determines how attribution scores are aggregated across replicates. Available options: 'mean', 'max'. Default is 'mean'.
threshold (float) – P-value threshold for recursive seqlet calling. Default is 5e-4. Lower values result in more stringent peak calling and fewer detected seqlets.
min_seqlet_len (int) – Minimum seqlet length. Default is 4. Shorter sequences will be filtered out from the analysis.
max_seqlet_len (int) – Maximum seqlet length. Default is 25. Longer sequences will be truncated or filtered based on the algorithm.
additional_flanks (int) – Number of base pairs by which seqlet regions are extended on each side. Default is 0.
pattern_type (str) – Whether to consider positive peaks, negative peaks, or both. Available options: 'both', 'pos', 'neg'. Default is 'both'.
custom_genome (bool) – Custom genome flag. Default is False. If True, bigwig files will be generated with each gene as a different chromosome for custom sequences.
meme_motif_db (str) – MEME motif database to use for downstream motif enrichment analysis. Default is 'hocomoco_v13'.
Examples
>>> recursive_seqlet_calling(
...     output_prefix="output_prefix",
...     attributions="attributions.h5",
...     tasks=[
...         "task1",
...         "task2",
...     ],
...     off_tasks=[
...         "task3",
...         "task4",
...     ],
... )
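The length, flank and sign parameters are easier to follow on a small example. The sketch below is not the recursive seqlet calling algorithm itself: it takes already-called candidate seqlets and only illustrates how min_seqlet_len, max_seqlet_len, additional_flanks and pattern_type could shape the final list. The Seqlet class, the filter_and_extend function, and the choice to drop (rather than truncate) over-long seqlets are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Seqlet:
    start: int    # 0-based, inclusive
    end: int      # exclusive
    score: float  # summed attribution; the sign marks positive or negative peaks


def filter_and_extend(
    seqlets: Sequence[Seqlet],
    seq_len: int,
    min_seqlet_len: int = 4,
    max_seqlet_len: int = 25,
    additional_flanks: int = 0,
    pattern_type: str = "both",
) -> List[Seqlet]:
    """Keep seqlets within the length bounds and sign filter, then add flanks."""
    if pattern_type not in {"both", "pos", "neg"}:
        raise ValueError(f"Unsupported pattern_type: {pattern_type!r}")
    kept = []
    for s in seqlets:
        length = s.end - s.start
        if length < min_seqlet_len or length > max_seqlet_len:
            continue  # assumption: out-of-range seqlets are dropped
        if pattern_type == "pos" and s.score <= 0:
            continue
        if pattern_type == "neg" and s.score >= 0:
            continue
        # Extend on both sides, clipped to the sequence boundaries.
        start = max(0, s.start - additional_flanks)
        end = min(seq_len, s.end + additional_flanks)
        kept.append(Seqlet(start, end, s.score))
    return kept
```

For example, with `additional_flanks=2` and `pattern_type="pos"`, a candidate `Seqlet(20, 28, 0.9)` becomes `Seqlet(18, 30, 0.9)`, while a 3 bp candidate and a negative-scoring candidate are both removed.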
decima.interpret.modisco module
The modisco module performs MoDISco motif clustering from attributions.
Examples
>>> predict_save_modisco_attributions(
...     output_prefix="output_prefix",
...     tasks=[
...         "agg1",
...         "agg2",
...     ],
...     off_tasks=[
...         "agg3",
...         "agg4",
...     ],
... )
- decima.interpret.modisco.modisco(output_prefix, tasks=None, off_tasks=None, model='ensemble', tss_distance=1000, metadata_anndata=None, genes=None, top_n_markers=None, correct_grad=True, num_workers=4, genome='hg38', method='saliency', transform='specificity', batch_size=2, device=None, sliding_window_size=21, flank_size=10, min_metacluster_size=100, weak_threshold_for_counting_sign=0.8, max_seqlets_per_metacluster=20000, target_seqlet_fdr=0.2, min_passing_windows_frac=0.03, max_passing_windows_frac=0.2, n_leiden_runs=16, n_leiden_iterations=-1, min_overlap_while_sliding=0.7, nearest_neighbors_to_compute=500, affmat_correlation_threshold=0.15, tsne_perplexity=10.0, frac_support_to_trim_to=0.2, min_num_to_trim_to=30, trim_to_window_size=30, initial_flank_to_add=10, final_flank_to_add=0, prob_and_pertrack_sim_merge_thresholds=[(0.8, 0.8), (0.5, 0.85), (0.2, 0.9)], prob_and_pertrack_sim_dealbreaker_thresholds=[(0.4, 0.75), (0.2, 0.8), (0.1, 0.85), (0.0, 0.9)], subcluster_perplexity=50, merging_max_seqlets_subsample=300, final_min_cluster_size=20, min_ic_in_window=0.6, min_ic_windowsize=6, ppm_pseudocount=0.001, stranded=False, pattern_type='both', img_path_suffix='', meme_motif_db='hocomoco_v13', is_writing_tomtom_matrix=False, top_n_matches=3, trim_threshold=0.3, trim_min_length=3, tomtomlite=False, seqlet_motif_trim_threshold=0.2)
Perform modisco motif clustering from attributions.
- Parameters:
output_prefix (str) – Path prefix under which all modisco output files will be written.
tasks (Optional[List[str]]) – Tasks to analyze, given either as a list of task names or as a query string that filters cell types (e.g. "cell_type == 'classical monocyte'"). If not provided, all tasks will be analyzed.
off_tasks (Optional[List[str]]) – Off tasks to contrast against, given either as a list of task names or as a query string that filters cell types (e.g. "cell_type == 'classical monocyte'"). If not provided, no contrast will be performed.
model (Union[str, int, None]) – Model to use for attribution analysis. Default is 'ensemble'. Can be a replicate number (0-3) or a path to a custom model.
tss_distance (int) – Distance from the TSS within which seqlets are called. Default is 1000. Controls the size of the genomic window around the TSS used for seqlet detection. If set to the full context size of decima (524288), the entire accessible region is analyzed.
metadata_anndata (Optional[str]) – Path to the metadata anndata file, or a DecimaResult object. If not provided, the default metadata will be downloaded from wandb.
genes (Optional[List[str]]) – Genes to analyze, given as a list of gene symbols or IDs. If not provided, all genes will be used.
top_n_markers (Optional[int]) – Number of top marker genes to analyze for the specified tasks. If not provided, all markers will be analyzed.
correct_grad (bool) – Whether to apply gradient correction for better attribution quality. Default is True.
num_workers (int) – Number of workers for parallel processing. Default is 4. Increasing the number of workers will speed up the process but requires more memory.
genome (str) – Genome reference to use. Default is "hg38". Can be a genome name or a path to a custom genome fasta file.
method (str) – Method to use for attribution analysis. Default is "saliency". Available options: "saliency", "inputxgradient", "integratedgradients". For MoDISco, "saliency" is often preferred for pattern discovery.
transform (Optional[str]) – Transform to use for attribution analysis. Default is "specificity". Available options: "specificity", "aggregate". The specificity transform is recommended for MoDISco to highlight cell-type-specific patterns.
batch_size (int) – Batch size for attribution analysis. Default is 2. Increasing the batch size may speed up computation but requires more memory.
device (Optional[str]) – Device to use for computation (e.g. 'cuda', 'cpu'). If not provided, the best available device will be used automatically.
sliding_window_size (int) – Sliding window size.
flank_size (int) – Flank size.
min_metacluster_size (int) – Min metacluster size.
weak_threshold_for_counting_sign (float) – Weak threshold for counting sign.
max_seqlets_per_metacluster (int) – Max seqlets per metacluster.
target_seqlet_fdr (float) – Target seqlet FDR.
min_passing_windows_frac (float) – Min passing windows fraction.
max_passing_windows_frac (float) – Max passing windows fraction.
n_leiden_runs (int) – Number of Leiden runs.
n_leiden_iterations (int) – Number of Leiden iterations.
min_overlap_while_sliding (float) – Min overlap while sliding.
nearest_neighbors_to_compute (int) – Nearest neighbors to compute.
affmat_correlation_threshold (float) – Affmat correlation threshold.
tsne_perplexity (float) – TSNE perplexity.
frac_support_to_trim_to (float) – Frac support to trim to.
min_num_to_trim_to (int) – Min num to trim to.
trim_to_window_size (int) – Trim to window size.
initial_flank_to_add (int) – Initial flank to add.
final_flank_to_add (int) – Final flank to add.
prob_and_pertrack_sim_merge_thresholds (List[Tuple[float, float]]) – Prob and pertrack sim merge thresholds.
prob_and_pertrack_sim_dealbreaker_thresholds (List[Tuple[float, float]]) – Prob and pertrack sim dealbreaker thresholds.
subcluster_perplexity (int) – Subcluster perplexity.
merging_max_seqlets_subsample (int) – Merging max seqlets subsample.
final_min_cluster_size (int) – Final min cluster size.
min_ic_in_window (float) – Min IC in window.
min_ic_windowsize (int) – Min IC windowsize.
ppm_pseudocount (float) – PPM pseudocount.
stranded (bool) – Stranded.
pattern_type (str) – Pattern type.
is_writing_tomtom_matrix (bool) – Whether to write tomtom matrix.
top_n_matches (int) – Top n matches.
trim_threshold (float) – Trim threshold.
trim_min_length (int) – Trim min length.
tomtomlite (bool) – Whether to use tomtomlite.
seqlet_motif_trim_threshold (float) – Seqlet motif trim threshold.
- decima.interpret.modisco.modisco_patterns(output_prefix, attributions, tasks=None, off_tasks=None, tss_distance=10000, metadata_anndata=None, genes=None, top_n_markers=None, correct_grad=True, num_workers=4, sliding_window_size=20, flank_size=10, min_metacluster_size=100, weak_threshold_for_counting_sign=0.8, max_seqlets_per_metacluster=20000, target_seqlet_fdr=0.2, min_passing_windows_frac=0.03, max_passing_windows_frac=0.2, n_leiden_runs=16, n_leiden_iterations=-1, min_overlap_while_sliding=0.7, nearest_neighbors_to_compute=500, affmat_correlation_threshold=0.15, tsne_perplexity=10.0, frac_support_to_trim_to=0.2, min_num_to_trim_to=30, trim_to_window_size=30, initial_flank_to_add=10, final_flank_to_add=0, prob_and_pertrack_sim_merge_thresholds=[(0.8, 0.8), (0.5, 0.85), (0.2, 0.9)], prob_and_pertrack_sim_dealbreaker_thresholds=[(0.4, 0.75), (0.2, 0.8), (0.1, 0.85), (0.0, 0.9)], subcluster_perplexity=50, merging_max_seqlets_subsample=300, final_min_cluster_size=20, min_ic_in_window=0.6, min_ic_windowsize=6, ppm_pseudocount=0.001, stranded=False, pattern_type='both')
Perform TF-MoDISco pattern discovery and motif clustering from attribution data.
This function runs the core TF-MoDISco algorithm to discover recurring patterns (motifs) in attribution data by clustering similar seqlets and identifying consensus motifs.
- Parameters:
output_prefix (str) – Prefix for the output files where MoDISco results will be saved. Results will be saved as "{output_prefix}.modisco.h5".
attributions (Union[str, List[str]]) – Path to an attribution file, or a list of attribution files, containing attribution scores computed in a previous analysis.
tasks (Optional[List[str]]) – Tasks to analyze, given either as a list of task names or as a query string that filters cell types (e.g. "cell_type == 'classical monocyte'"). If not provided, all tasks will be analyzed.
off_tasks (Optional[List[str]]) – Off tasks to contrast against, given either as a list of task names or as a query string that filters cell types (e.g. "cell_type == 'classical monocyte'"). If not provided, all tasks will be used as off tasks.
tss_distance (int) – Distance from the TSS to analyze for pattern discovery. Default is 10000. Controls the size of the genomic window around the TSS used for seqlet detection and motif discovery.
metadata_anndata (Optional[str]) – Name of the model, path to the metadata anndata file, or a DecimaResult object. If not provided, the compatible metadata of the saved attribution files will be used.
genes (Optional[List[str]]) – Genes to analyze for pattern discovery, given as a list of gene symbols or IDs. If not provided, all genes will be used.
top_n_markers (Optional[int]) – Number of top marker genes to analyze for the specified tasks. If not provided, all markers will be analyzed.
correct_grad (bool) – Whether to apply gradient correction before pattern discovery for better attribution quality. Default is True.
num_workers (int) – Number of workers for parallel processing. Default is 4. Increasing the number of workers will speed up the process but requires more memory.
sliding_window_size (int) – Sliding window size.
flank_size (int) – Flank size.
min_metacluster_size (int) – Min metacluster size.
weak_threshold_for_counting_sign (float) – Weak threshold for counting sign.
max_seqlets_per_metacluster (int) – Max seqlets per metacluster.
target_seqlet_fdr (float) – Target seqlet FDR.
min_passing_windows_frac (float) – Min passing windows fraction.
max_passing_windows_frac (float) – Max passing windows fraction.
n_leiden_runs (int) – Number of Leiden runs.
n_leiden_iterations (int) – Number of Leiden iterations.
min_overlap_while_sliding (float) – Min overlap while sliding.
nearest_neighbors_to_compute (int) – Nearest neighbors to compute.
affmat_correlation_threshold (float) – Affmat correlation threshold.
tsne_perplexity (float) – TSNE perplexity.
frac_support_to_trim_to (float) – Frac support to trim to.
min_num_to_trim_to (int) – Min num to trim to.
trim_to_window_size (int) – Trim to window size.
initial_flank_to_add (int) – Initial flank to add.
final_flank_to_add (int) – Final flank to add.
prob_and_pertrack_sim_merge_thresholds (List[Tuple[float, float]]) – Prob and pertrack sim merge thresholds.
prob_and_pertrack_sim_dealbreaker_thresholds (List[Tuple[float, float]]) – Prob and pertrack sim dealbreaker thresholds.
subcluster_perplexity (int) – Subcluster perplexity.
merging_max_seqlets_subsample (int) – Merging max seqlets subsample.
final_min_cluster_size (int) – Final min cluster size.
min_ic_in_window (float) – Min IC in window.
min_ic_windowsize (int) – Min IC windowsize.
ppm_pseudocount (float) – PPM pseudocount.
stranded (bool) – Stranded.
pattern_type (str) – Pattern type.
Examples
>>> modisco_patterns(
...     output_prefix="output_prefix",
...     attributions="attributions.h5",
...     tasks=[
...         "agg1",
...         "agg2",
...     ],
...     off_tasks=[
...         "agg3",
...         "agg4",
...     ],
... )
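The sliding_window_size and flank_size parameters describe a window-scoring step that can be sketched in a few lines. The code below only illustrates the general idea of summing attribution scores over fixed-size windows and extending the best window by a flank. It does not reproduce TF-MoDISco's seqlet identification, which also involves null distributions and an FDR target (target_seqlet_fdr). The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def sliding_window_sums(scores: Sequence[float], window_size: int) -> List[float]:
    """Sum of scores in every window of length window_size (stride 1)."""
    if window_size <= 0 or window_size > len(scores):
        return []
    total = sum(scores[:window_size])
    sums = [total]
    for i in range(window_size, len(scores)):
        # Slide by one position: add the entering score, drop the leaving one.
        total += scores[i] - scores[i - window_size]
        sums.append(total)
    return sums


def top_window(scores: Sequence[float], window_size: int, flank_size: int) -> Tuple[int, int]:
    """Return (start, end) of the highest-magnitude window, extended by flank_size."""
    sums = sliding_window_sums(scores, window_size)
    if not sums:
        raise ValueError("window_size must be between 1 and len(scores)")
    best = max(range(len(sums)), key=lambda i: abs(sums[i]))
    start = max(0, best - flank_size)
    end = min(len(scores), best + window_size + flank_size)
    return start, end
```

For the track `[0, 0, 1, 2, 1, 0, 0, 0]` with a window of 3, the window sums are `[1, 3, 4, 3, 1, 0]`, so the best window starts at position 2. With a flank of 1 the reported interval is `(1, 6)`.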
- decima.interpret.modisco.modisco_reports(output_prefix, modisco_h5, meme_motif_db='hocomoco_v13', img_path_suffix='', is_writing_tomtom_matrix=False, top_n_matches=3, trim_threshold=0.3, trim_min_length=3, tomtomlite=False, num_workers=4)
Generate comprehensive HTML reports and motif comparisons from MoDISco results.
This function takes MoDISco pattern discovery results and generates detailed HTML reports including motif visualizations, database comparisons, and statistical summaries.
- Parameters:
output_prefix (str) – Prefix for the output report files where results will be saved. A "_report" suffix will be added to create the output directory.
modisco_h5 (str) – Path to the MoDISco HDF5 file containing discovered patterns and motifs from a previous MoDISco analysis.
meme_motif_db (Union[Path, str, None]) – MEME motif database used for motif comparison and annotation. Default is "hocomoco_v13". Can be a database name or a path to a custom MEME format database.
img_path_suffix (Optional[str]) – Optional suffix to add to image file paths for organizational purposes. Default is "".
is_writing_tomtom_matrix (bool) – Whether to write the TOMTOM comparison matrix. Default is False. If True, outputs a detailed comparison matrix between discovered and database motifs for downstream analysis.
top_n_matches (int) – Number of top database matches to report for each discovered motif in the HTML output. Default is 3.
trim_threshold (float) – Threshold for determining where to trim motif boundaries, based on information content, when generating logos. Default is 0.3.
trim_min_length (int) – Minimum number of positions to retain when trimming motifs, to ensure meaningful motif representations. Default is 3.
tomtomlite (bool) – Whether to use TOMTOM lite mode. Default is False. If True, uses a faster but less comprehensive version of TOMTOM for motif comparison.
num_workers (int) – Number of workers for parallel processing. Default is 4. Increasing the number of workers will speed up report generation but requires more memory.
Examples
>>> modisco_reports(
...     output_prefix="output_prefix",
...     modisco_h5="modisco.h5",
...     meme_motif_db="hocomoco_v13",
...     img_path_suffix="",
... )
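One common way to apply a trim threshold to a motif is to keep the span between the first and last positions whose score reaches a fraction of the maximum. The sketch below follows that idea. It is an assumption about how trim_threshold and trim_min_length interact, not a description of the code decima or its dependencies actually run. In particular, the per-position score used here (sum of absolute values) and the choice to return None for motifs shorter than trim_min_length are illustrative.

```python
from typing import Optional, Sequence, Tuple


def trim_motif(
    matrix: Sequence[Sequence[float]],
    trim_threshold: float = 0.3,
    trim_min_length: int = 3,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the trimmed motif, or None if it is too short.

    matrix holds one row per motif position and one column per nucleotide.
    """
    scores = [sum(abs(v) for v in position) for position in matrix]
    if not scores or max(scores) == 0:
        return None
    cutoff = max(scores) * trim_threshold
    passing = [i for i, s in enumerate(scores) if s >= cutoff]
    start, end = passing[0], passing[-1] + 1  # end is exclusive
    if end - start < trim_min_length:
        return None  # assumption: motifs trimmed below the minimum are skipped
    return start, end
```

Raising trim_threshold shortens the retained core, while trim_min_length guards against motifs that would be trimmed down to one or two positions.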
- decima.interpret.modisco.modisco_seqlet_bed(output_prefix, modisco_h5, metadata_anndata=None, trim_threshold=0.2)
Extract seqlet locations from MoDISco results and save as BED format file.
This function processes MoDISco pattern discovery results to extract the genomic coordinates of discovered seqlets (sequence motifs) and outputs them in standard BED format for downstream analysis and visualization in genome browsers.
- Parameters:
output_prefix (str) – Prefix for the output BED file where seqlet coordinates will be saved. The output will be saved as "{output_prefix}.seqlets.bed".
modisco_h5 (str) – Path to the MoDISco HDF5 file containing discovered patterns and seqlet information from a previous MoDISco analysis.
metadata_anndata (Optional[str]) – Path to the metadata anndata file, or a DecimaResult object. Default is None. Required for mapping seqlet coordinates to genomic positions. If not provided, relative coordinates will be used.
trim_threshold (float) – Threshold for determining seqlet boundaries based on contribution scores. Default is 0.2. Lower values result in longer seqlets.
Examples
>>> modisco_seqlet_bed(
...     output_prefix="my_analysis",
...     modisco_h5="my_analysis.modisco.h5",
...     metadata_anndata="metadata.h5ad",
...     trim_threshold=0.15,
... )
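The distinction between relative and genomic coordinates can be illustrated with plain interval arithmetic. The sketch below is not decima's implementation: the function names, the column layout, and the assumption that minus-strand sequences are stored reverse-complemented (so relative offsets count back from the window end) are all hypothetical. BED intervals are 0-based and half-open. Strict BED expects an integer score, so the float score used here is also an assumption.

```python
from typing import Tuple


def to_genomic(rel_start: int, rel_end: int, window_start: int, window_end: int, strand: str = "+") -> Tuple[int, int]:
    """Map a seqlet interval relative to an input window onto genomic coordinates."""
    if not 0 <= rel_start < rel_end <= window_end - window_start:
        raise ValueError("relative interval falls outside the window")
    if strand == "-":
        # Assumed: the window was reverse-complemented, so offsets run from the end.
        return window_end - rel_end, window_end - rel_start
    return window_start + rel_start, window_start + rel_end


def seqlet_to_bed_line(chrom: str, start: int, end: int, name: str, score: float = 0.0, strand: str = ".") -> str:
    """Format one seqlet as a tab-separated, six-column BED line."""
    if start < 0 or end <= start:
        raise ValueError("BED intervals need 0 <= start < end")
    if strand not in {"+", "-", "."}:
        raise ValueError(f"Unsupported strand: {strand!r}")
    return "\t".join([chrom, str(start), str(end), name, f"{score:.4f}", strand])
```

For a window spanning 1000 to 2000, a seqlet at relative positions 10 to 20 maps to 1010 to 1020 on the plus strand and to 1980 to 1990 on the minus strand.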
- decima.interpret.modisco.predict_save_modisco_attributions(output_prefix, tasks=None, off_tasks=None, model='ensemble', metadata_anndata=None, method='saliency', transform='specificity', batch_size=1, genes=None, top_n_markers=None, bigwig=True, correct_grad_bigwig=True, num_workers=4, device=None, genome='hg38')
Generate and save attribution analysis results optimized for MoDISco motif discovery.
This function performs attribution analysis for specified genes and cell types, generating attribution scores that will be used downstream for MoDISco pattern discovery and motif analysis.
- Parameters:
output_prefix (str) – Prefix for the output files where attribution results will be saved.
tasks (Optional[List[str]]) – Tasks to analyze, given either as a list of task names or as a query string that filters cell types (e.g. "cell_type == 'classical monocyte'"). If not provided, all tasks will be analyzed.
off_tasks (Optional[List[str]]) – Off tasks to contrast against, given either as a list of task names or as a query string that filters cell types (e.g. "cell_type == 'classical monocyte'"). If not provided, no contrast will be performed.
model (Union[str, int, None]) – Model to use for attribution analysis. Default is 'ensemble'. Can be a replicate number (0-3) or a path to a custom model.
metadata_anndata (Optional[str]) – Path to the metadata anndata file, or a DecimaResult object. If not provided, the default metadata will be downloaded from wandb.
method (str) – Method to use for attribution analysis. Default is "saliency". Available options: "saliency", "inputxgradient", "integratedgradients". For MoDISco, "saliency" is often preferred for pattern discovery.
transform (str) – Transform to use for attribution analysis. Default is "specificity". Available options: "specificity", "aggregate". The specificity transform is recommended for MoDISco to highlight cell-type-specific patterns.
batch_size (int) – Batch size for attribution analysis. Default is 1. Increasing the batch size may speed up computation but requires more memory.
genes (Optional[List[str]]) – Genes to analyze, given as a list of gene symbols or IDs. If not provided, all genes will be used.
top_n_markers (Optional[int]) – Number of top marker genes to analyze for the specified tasks. If not provided, all markers will be analyzed.
bigwig (bool) – Whether to save attribution scores as a bigwig file. Default is True. Bigwig files can be loaded in genome browsers for visualization.
correct_grad_bigwig (bool) – Whether to apply gradient correction to the bigwig file for better visualization quality. Default is True.
num_workers (int) – Number of workers for attribution analysis. Default is 4. Increasing the number of workers will speed up the process but requires more memory.
device (Optional[str]) – Device to use for attribution analysis (e.g. 'cuda', 'cpu'). If not provided, the best available device will be used automatically.
genome (str) – Genome to use for attribution analysis. Default is "hg38". Can be a genome name or a path to a custom genome fasta file.
Examples
>>> predict_save_modisco_attributions(
...     output_prefix="output_prefix",
...     tasks="cell_type == 'classical monocyte'",
... )