Attribution and Motif Detection with Decima
This documentation demonstrates how to use Decima’s attribution analysis capabilities to identify important regulatory regions in genomic sequences and discover transcription factor binding motifs within those regions. Attribution analysis helps reveal which parts of the DNA sequence most strongly influence gene expression predictions, while motif scanning can identify specific transcription factor binding sites in these regions of interest.
CLI API
Let’s look at a simple example using Decima’s CLI API to analyze the SPI1 and BRD3 genes. SPI1 is a key transcription factor in myeloid cell development. We’ll examine its regulation across different monocyte and macrophage cell types where it is known to be important.
! decima attributions --help
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'repr' attribute with value False was provided to the `Field()` function, which has no effect in the context it was used. 'repr' is field-specific metadata, and can only be attached to a model field using `Annotated` metadata or by assignment. This may have happened because an `Annotated` type alias using the `type` statement was used, or if the `Field()` function was attached to a single member of a union type.
warnings.warn(
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'frozen' attribute with value True was provided to the `Field()` function, which has no effect in the context it was used. 'frozen' is field-specific metadata, and can only be attached to a model field using `Annotated` metadata or by assignment. This may have happened because an `Annotated` type alias using the `type` statement was used, or if the `Field()` function was attached to a single member of a union type.
warnings.warn(
Usage: decima attributions [OPTIONS]
Generate and save attribution analysis results for a gene or a set of
sequences and perform seqlet calling on the attributions.
Output files:
├── {output_prefix}.attributions.h5 # Raw attribution score matrix
per gene.
├── {output_prefix}.attributions.bigwig # Genome browser track of
attribution as bigwig file.
├── {output_prefix}.seqlets.bed # List of attribution peaks in
BED format.
├── {output_prefix}.motifs.tsv # Detected motifs in peak
regions.
└── {output_prefix}.warnings.qc.log # QC warnings about prediction
reliability.
Examples:
>>> decima attributions -o output_prefix -g SPI1
>>> decima attributions -o output_prefix -g SPI1,CD68 --tasks "cell_type
== 'classical monocyte'" --device 0
>>> decima attributions -o output_prefix --seqs tests/data/seqs.fasta
--tasks "cell_type == 'classical monocyte'" --device 0
Options:
-o, --output-prefix TEXT Prefix path to the output files [required]
-g, --genes TEXT Comma-separated list of gene symbols or IDs
to analyze.
--seqs TEXT Path to a file containing sequences to
analyze
--tasks TEXT Query string to filter cell types to analyze
attributions for (e.g. 'cell_type ==
'classical monocyte'')
--off-tasks TEXT Optional query string to filter cell types
to contrast against.
--model TEXT Model to use for attribution analysis either
replicate number or path to the model.
[default: ensemble]
--metadata TEXT Path to the metadata anndata file or name of
the model. If not provided, the compabilite
metadata for the model will be used.
--method TEXT Method to use for attribution analysis.
--transform [specificity|aggregate]
Transform to use for attribution analysis.
--num-workers INTEGER Number of workers for attribution analysis.
--tss-distance INTEGER TSS distance for attribution analysis.
--batch-size INTEGER Batch size for attribution analysis.
--top-n-markers INTEGER Top n markers to predict. If not provided,
all markers will be predicted.
--threshold FLOAT Threshold for attribution analysis.
--min-seqlet-len INTEGER Minimum length for seqlet calling.
--max-seqlet-len INTEGER Maximum length for seqlet calling.
--additional-flanks INTEGER Additional flanks for seqlet calling.
--pattern-type [both|pos|neg] Type of pattern to call.
--meme-motif-db TEXT Path to the MEME motif database. [default:
hocomoco_v13]
--device TEXT Device to use for attribution analysis.
--genome TEXT Genome name or path to the genome fasta
file. [default: hg38]
--help Show this message and exit.
The following decima command computes attributions for two genes: --genes "SPI1,BRD3" selects SPI1 and BRD3; --tasks "cell_type == 'classical monocyte'" restricts the analysis to classical monocytes; --model v1_rep0 selects a single model replicate; and --output-prefix example/output_classical_monoctypes sets the prefix of the output files. You can also pass --off-tasks, a query that selects the cell types used as a contrast group when analyzing cell type specificity. If you do not pass --off-tasks, all other tasks are used as the contrast group. If you do not pass --tasks, all available cells are used for the attribution calculation.
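The task selection can be pictured as two pandas queries over the cell metadata. The sketch below uses a toy table and a hypothetical helper, `split_tasks`; the task identifiers are made up, and the logic only mirrors what the option descriptions and the `off_tasks` warning state. It is not Decima's internal implementation.

```python
import pandas as pd

# Toy stand-in for the cell metadata table; the real table has many more
# rows and columns.
cells = pd.DataFrame(
    {
        "cell_type": ["classical monocyte", "classical monocyte", "macrophage", "T cell"],
        "tissue": ["blood", "lung", "lung", "blood"],
    },
    index=["agg_0", "agg_1", "agg_2", "agg_3"],  # hypothetical task identifiers
)


def split_tasks(metadata, tasks=None, off_tasks=None):
    """Return (on_tasks, off_tasks) index labels for a pair of query strings."""
    # Without a tasks query, every available task is selected.
    on = metadata.index if tasks is None else metadata.query(tasks).index
    # Without an off_tasks query, all tasks outside the selection form the contrast group.
    if off_tasks is None:
        off = metadata.index.difference(on)
    else:
        off = metadata.query(off_tasks).index
    return list(on), list(off)


on, off = split_tasks(cells, tasks="cell_type == 'classical monocyte'")
print(on)   # ['agg_0', 'agg_1']
print(off)  # ['agg_2', 'agg_3']
```

Passing an explicit `off_tasks` query, for example `"cell_type == 'macrophage'"`, narrows the contrast group to that selection.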
! decima attributions --model v1_rep0 --genes "SPI1,BRD3" --tasks "cell_type == 'classical monocyte'" --output-prefix example/output_classical_monoctypes
decima - INFO - Using device: 0
decima - INFO - Loading model v1_rep0 and metadata to compute attributions...
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'rep0:latest', 720.03MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.6 (445.8MB/s)
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:07.1 (437.5MB/s)
/home/celikm5/Projects/decima/src/decima/interpret/attributer.py:66: UserWarning: `off_tasks` is not provided. Using all other tasks as off_tasks.
Computing attributions...: 0%| | 0/2 [00:00<?, ?it/s]
Computing attributions...: 50%|█████████ | 1/2 [00:02<00:02, 2.23s/it]
decima - WARNING - Gene BRD3 has low correlation with the model. Pearson: 0.3440624267844621. Be careful with the predictions of the model for this gene. Check `DecimaResult.load().gene_metadata['pearson']` to see the correlation of the gene with the model.
Computing attributions...: 100%|██████████████████| 2/2 [00:02<00:00, 1.33s/it]
Computing attributions...: 100%|██████████████████| 2/2 [00:02<00:00, 1.49s/it]
decima - INFO - Loading model and metadata to compute attributions...
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:02.0 (1528.2MB/s)
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:02.0 (1574.5MB/s)
Computing recursive seqlet calling...: 0%| | 0/2 [00:00<?, ?it/s]
Computing recursive seqlet calling...: 100%|█████| 2/2 [00:00<00:00, 459.62it/s]
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/memelite/fimo.py:406: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
! ls example/output_classical_monoctypes*
example/output_classical_monoctypes_0.attributions.bigwig
example/output_classical_monoctypes_0.attributions.h5
example/output_classical_monoctypes_0.warnings.qc.log
example/output_classical_monoctypes_1.attributions.bigwig
example/output_classical_monoctypes_1.attributions.h5
example/output_classical_monoctypes_1.warnings.qc.log
example/output_classical_monoctypes.attributions.bigwig
example/output_classical_monoctypes.attributions.h5
example/output_classical_monoctypes.motifs.tsv
example/output_classical_monoctypes.seqlets.bed
example/output_classical_monoctypes.warnings.qc.log
example/output_classical_monoctypes_plots:
BRD3.peaks.png BRD3_seqlogos SPI1.peaks.png SPI1_seqlogos
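A quick way to confirm that a run produced every file listed in the help text is to build the expected paths from the output prefix. This is a minimal sketch: the suffixes are taken from the `--help` output, while the helper names `expected_outputs` and `missing_outputs` are hypothetical.

```python
from pathlib import Path

# Suffixes listed in the `decima attributions --help` text.
OUTPUT_SUFFIXES = (
    ".attributions.h5",
    ".attributions.bigwig",
    ".seqlets.bed",
    ".motifs.tsv",
    ".warnings.qc.log",
)


def expected_outputs(output_prefix):
    """Map each documented suffix to the path it should have for a prefix."""
    return {suffix: Path(f"{output_prefix}{suffix}") for suffix in OUTPUT_SUFFIXES}


def missing_outputs(output_prefix):
    """Return the documented output files that do not exist on disk."""
    return [p for p in expected_outputs(output_prefix).values() if not p.exists()]


paths = expected_outputs("example/output_classical_monoctypes")
print(paths[".seqlets.bed"].name)  # output_classical_monoctypes.seqlets.bed
print(len(missing_outputs("example/output_classical_monoctypes")), "file(s) missing")
```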
import h5py
with h5py.File("example/output_classical_monoctypes.attributions.h5", "r") as f:
print(f["genes"][:])
print(f["sequence"][:].shape)
print(f["attribution"][:].shape)
[b'SPI1' b'BRD3']
(2, 524288)
(2, 4, 524288)
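The shapes above suggest one attribution value per gene, per base channel, per position. The sketch below shows one way to reduce such an array to a single score per position. It runs on synthetic arrays with the same layout but a short length, and it assumes that the `sequence` dataset stores an integer in 0..3 that indexes the channel axis. That encoding is an assumption, not something the output above confirms, so check it against your file before relying on it.

```python
import numpy as np


def observed_base_attribution(attribution, sequence):
    """Pick, at every position, the attribution of the base present in the sequence.

    attribution: float array of shape (genes, 4, length)
    sequence:    integer array of shape (genes, length) with values 0..3
                 (ASSUMPTION: each value indexes the channel axis)
    """
    idx = sequence[:, None, :].astype(np.intp)  # shape (genes, 1, length)
    picked = np.take_along_axis(attribution, idx, axis=1)  # shape (genes, 1, length)
    return picked[:, 0, :]  # shape (genes, length)


# Synthetic data shaped like the HDF5 datasets, with length 10 instead of 524288.
rng = np.random.default_rng(0)
attribution = rng.normal(size=(2, 4, 10))
sequence = rng.integers(0, 4, size=(2, 10))

scores = observed_base_attribution(attribution, sequence)
print(scores.shape)  # (2, 10)

# Indices of the three positions with the largest absolute score for the first gene.
top = np.argsort(-np.abs(scores[0]))[:3]
print(top)
```

With a real file, `attribution` and `sequence` would be the arrays read through `h5py` as in the cell above. Values outside 0..3 (for example a code for ambiguous bases, if the file uses one) would need to be masked first.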
! head example/output_classical_monoctypes.seqlets.bed | column -t -s $'\t'
chr11 47152015 47152022 neg.SPI1@-29953 3.31894 . -0.41688821464776993
chr11 47160163 47160167 neg.SPI1@-21805 3.33346 . -0.2519867978990078
chr11 47160309 47160319 neg.SPI1@-21659 3.36046 . -0.7482765801250935
chr11 47160323 47160333 neg.SPI1@-21645 3.35597 . -0.9791450921911746
chr11 47165053 47165060 neg.SPI1@-16915 3.35769 . -0.46799773909151554
chr11 47165593 47165606 pos.SPI1@-16375 3.41662 . 1.8055671770125628
chr11 47165642 47165653 pos.SPI1@-16326 3.37915 . 1.059839816763997
chr11 47165653 47165664 neg.SPI1@-16315 3.64033 . -1.4299414344131947
chr11 47165664 47165670 neg.SPI1@-16304 3.36339 . -0.4015889251604676
chr11 47165690 47165703 pos.SPI1@-16278 3.3902 . 1.1793060060590506
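The seqlet names in the fourth column appear to follow the pattern `<pos|neg>.<gene>@<offset>`. The sketch below parses three records copied from the output above using only the standard library. The field names `pattern`, `offset`, and `value` are hypothetical labels, and reading the number after `@` as a position relative to the TSS is an inference from the output, not a documented guarantee.

```python
import re

# Three records copied from the output above, re-joined with tabs.
bed_text = "\n".join(
    [
        "chr11\t47152015\t47152022\tneg.SPI1@-29953\t3.31894\t.\t-0.41688821464776993",
        "chr11\t47165593\t47165606\tpos.SPI1@-16375\t3.41662\t.\t1.8055671770125628",
        "chr11\t47165653\t47165664\tneg.SPI1@-16315\t3.64033\t.\t-1.4299414344131947",
    ]
)

# ASSUMPTION: names look like "<pos|neg>.<gene>@<signed integer>".
NAME_PATTERN = re.compile(r"^(pos|neg)\.(.+)@(-?\d+)$")


def parse_seqlets(text):
    """Parse tab-separated seqlet records into a list of dictionaries."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        chrom, start, end, name, score, strand, value = line.split("\t")
        match = NAME_PATTERN.match(name)
        if match is None:
            raise ValueError(f"unexpected seqlet name: {name!r}")
        sign, gene, offset = match.groups()
        records.append(
            {
                "chrom": chrom,
                "start": int(start),
                "end": int(end),
                "pattern": sign,
                "gene": gene,
                "offset": int(offset),
                "score": float(score),
                "value": float(value),  # seventh column; its meaning is not stated in the help text
            }
        )
    return records


seqlets = parse_seqlets(bed_text)
positive = [s for s in seqlets if s["pattern"] == "pos"]
print(len(seqlets), len(positive))  # 3 1

# start - offset is the same for all three records, consistent with a shared anchor.
anchors = {s["start"] - s["offset"] for s in seqlets}
print(anchors)  # {47181968}
```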
! tail example/output_classical_monoctypes.motifs.tsv | column -t -s $'\t'
ZNF507.H13CORE.0.I.B pos.BRD3@1374 165221 165230 + 9.005010962486267 0.0004997253417968743 CTCCTTCCC 0.0001575700912831558 -0.0002066142760199578 1381
PPARA.H13CORE.1.P.B pos.BRD3@-61106 102728 102737 + 8.097402691841125 0.0004997253417968743 AAGAGGTGA 0.0009877644590435214 0.0027158458094883164 -61112
ZNF507.H13CORE.0.I.B neg.BRD3@1388 165221 165230 + 9.005010962486267 0.0004997253417968743 CTCCTTCCC 0.0001575700912831558 -0.0002066142760199578 1381
ARNT.H13CORE.0.P.B pos.BRD3@26580 190423 190432 + 8.84844446182251 0.0004997253417968755 GGACGTGTT 0.0001840576308798821 -0.00032131952384467405 26583
ZN394.H13CORE.0.P.C pos.BRD3@573 164409 164428 + 6.16656231880188 0.0004997442047169893 GCCGCCGGAGCCGCGAGGC 0.0016583528068670268 0.003810155552068253 569
ZNF30.H13CORE.0.P.C neg.BRD3@291 164117 164140 - 7.218540787696838 0.0004997881442250214 CGGGCGCCGAGCCCCGCCCCCGC -0.0007085893436021212 -0.001097117428680497 277
NR1H4.H13CORE.1.P.B pos.BRD3@194003 357832 357850 - 7.282587647438049 0.0004999181110179031 CCTTGGAGGCAGTGACTC 0.0006487710052169859 0.0014354905397428942 193992
CGGBP1.H13CORE.0.PSGIB.A neg.BRD3@614 164462 164473 - 9.182251572608948 0.0004999637603759763 GGGGCGGCGGG 4.89058068276129e-05 0.000644476072918215 622
KLF7.H13CORE.0.P.B neg.BRD3@-102394 61443 61453 - 15.217368483543396 CCCCGCCCCC -0.0013154596599633805 -0.0038895047854197937 -102397
KLF7.H13CORE.0.P.B neg.BRD3@291 164128 164138 - 15.217368483543396 CCCCGCCCCC -0.0009656054913648404 -0.0033124240223047752 288
The QC file ({output_prefix}.warnings.qc.log) is a quality control log that contains warnings about prediction reliability. Specifically, it warns when a gene has a low correlation with the model’s predictions (Pearson correlation < 0.7).
! head example/output_classical_monoctypes.warnings.qc.log
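The threshold rule described above is easy to reproduce on your own correlation values. The sketch below uses a plain dictionary and a hypothetical helper, `low_correlation_genes`. The BRD3 value comes from the warning in the log, while the second entry is invented for illustration.

```python
def low_correlation_genes(pearson_by_gene, threshold=0.7):
    """Return the genes whose Pearson correlation falls below the threshold."""
    return {gene: r for gene, r in pearson_by_gene.items() if r < threshold}


pearson_by_gene = {
    "BRD3": 0.3440624267844621,  # value reported in the WARNING line of the log
    "GENE_X": 0.85,  # hypothetical gene and value, for illustration only
}

flagged = low_correlation_genes(pearson_by_gene)
print(flagged)  # {'BRD3': 0.3440624267844621}
```

The log message points to `DecimaResult.load().gene_metadata['pearson']` as the place to look up the real per-gene correlations.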
CLI Subcommands
The Decima CLI supports running the attribution analysis pipeline step by step using dedicated subcommands. This modular approach allows you to execute each stage of the workflow independently, such as:
- Generating model predictions for selected genes and cell types (attributions-predict).
- Calling significant seqlets from the attributions (attributions-recursive-seqlet-calling).
- Visualizing the results and motif logos (attributions-plot).

By chaining these subcommands, you can customize, debug, or parallelize each step of the analysis as needed.
The following cells run the attributions-predict subcommand once per model replicate (v1_rep0 and v1_rep1) for the genes SPI1 and BRD3 in cells where the cell_type is ‘classical monocyte’. Each run saves its results under its own output prefix.
! decima attributions-predict --model v1_rep0 --genes "SPI1,BRD3" --tasks "cell_type == 'classical monocyte'" --output-prefix example/output_classical_monoctypes_0
decima - INFO - Using device: 0
decima - INFO - Loading model v1_rep0 and metadata to compute attributions...
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'rep0:latest', 720.03MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.4 (524.8MB/s)
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:02.0 (1580.3MB/s)
/home/celikm5/Projects/decima/src/decima/interpret/attributer.py:66: UserWarning: `off_tasks` is not provided. Using all other tasks as off_tasks.
Computing attributions...: 0%| | 0/2 [00:00<?, ?it/s]
Computing attributions...: 50%|█████████ | 1/2 [00:01<00:01, 1.33s/it]
decima - WARNING - Gene BRD3 has low correlation with the model. Pearson: 0.3440624267844621. Be careful with the predictions of the model for this gene. Check `DecimaResult.load().gene_metadata['pearson']` to see the correlation of the gene with the model.
Computing attributions...: 100%|██████████████████| 2/2 [00:02<00:00, 1.04it/s]
Computing attributions...: 100%|██████████████████| 2/2 [00:02<00:00, 1.04s/it]
! decima attributions-predict --model v1_rep1 --genes "SPI1,BRD3" --tasks "cell_type == 'classical monocyte'" --output-prefix example/output_classical_monoctypes_1
decima - INFO - Using device: 0
decima - INFO - Loading model v1_rep1 and metadata to compute attributions...
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'rep1:latest', 720.03MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:00.9 (803.4MB/s)
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.9 (1614.1MB/s)
/home/celikm5/Projects/decima/src/decima/interpret/attributer.py:66: UserWarning: `off_tasks` is not provided. Using all other tasks as off_tasks.
Computing attributions...: 0%| | 0/2 [00:00<?, ?it/s]
Computing attributions...: 50%|█████████ | 1/2 [00:01<00:01, 1.37s/it]
decima - WARNING - Gene BRD3 has low correlation with the model. Pearson: 0.3440624267844621. Be careful with the predictions of the model for this gene. Check `DecimaResult.load().gene_metadata['pearson']` to see the correlation of the gene with the model.
Computing attributions...: 100%|██████████████████| 2/2 [00:02<00:00, 1.01it/s]
Computing attributions...: 100%|██████████████████| 2/2 [00:02<00:00, 1.07s/it]
This cell runs the recursive seqlet calling step of the Decima attribution pipeline. It takes the attribution files produced by the two models (replicates 0 and 1) for the genes SPI1 and BRD3 in classical monocytes and calls significant seqlets (regions with high attribution).
! decima attributions-recursive-seqlet-calling --attributions "example/output_classical_monoctypes_0.attributions.h5,example/output_classical_monoctypes_1.attributions.h5" --output-prefix example/output_classical_monoctypes
decima - INFO - Loading model and metadata to compute attributions...
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:02.0 (1584.0MB/s)
decima - INFO - No genes provided, using all 2 genes in the attribution files.
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.8 (1691.6MB/s)
Computing recursive seqlet calling...: 0%| | 0/2 [00:00<?, ?it/s]
Computing recursive seqlet calling...: 100%|█████| 2/2 [00:00<00:00, 855.46it/s]
The following cell runs the Decima CLI to generate plots for the attributions and discovered seqlets. It uses the output prefix from previous steps and generates visualizations for the specified genes (SPI1, BRD3), highlighting motif locations within 500bp of the transcription start site (TSS).
! decima attributions-plot --output-prefix example/output_classical_monoctypes -g "SPI1,BRD3" --tss-distance 500
Plotting attributions...: 0%| | 0/2 [00:00<?, ?it/s]
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.9 (1666.6MB/s)
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/plotnine/ggplot.py:630: PlotnineWarning: Saving 10 x 2 in image.
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/plotnine/ggplot.py:631: PlotnineWarning: Filename: example/output_classical_monoctypes_plots/SPI1.peaks.png
Plotting attributions...: 50%|█████████▌ | 1/2 [00:12<00:12, 12.95s/it]
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.9 (1648.9MB/s)
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/plotnine/ggplot.py:630: PlotnineWarning: Saving 10 x 2 in image.
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/plotnine/ggplot.py:631: PlotnineWarning: Filename: example/output_classical_monoctypes_plots/BRD3.peaks.png
Plotting attributions...: 100%|███████████████████| 2/2 [00:23<00:00, 11.76s/it]
Plotting attributions...: 100%|███████████████████| 2/2 [00:23<00:00, 11.94s/it]
from IPython.display import Image
Image("example/output_classical_monoctypes_plots/SPI1_seqlogos/SPI1@267.png")
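The three subcommands shown in this section can also be assembled from Python, which is convenient when the same steps are repeated for several gene sets. The sketch below only builds and prints the command lines, using the flags that appear in the cells above. The helper `build_pipeline` and the `<prefix>_<i>` naming of per-replicate outputs are illustrative choices that mirror this example, not part of Decima.

```python
import shlex


def build_pipeline(genes, tasks, prefix, models=("v1_rep0", "v1_rep1"), tss_distance=500):
    """Return the argument lists for predict (per model), seqlet calling, and plotting."""
    gene_arg = ",".join(genes)
    commands = []
    attribution_files = []
    for i, model in enumerate(models):
        rep_prefix = f"{prefix}_{i}"  # one output prefix per replicate
        commands.append(
            ["decima", "attributions-predict", "--model", model, "--genes", gene_arg,
             "--tasks", tasks, "--output-prefix", rep_prefix]
        )
        attribution_files.append(f"{rep_prefix}.attributions.h5")
    commands.append(
        ["decima", "attributions-recursive-seqlet-calling",
         "--attributions", ",".join(attribution_files), "--output-prefix", prefix]
    )
    commands.append(
        ["decima", "attributions-plot", "--output-prefix", prefix,
         "-g", gene_arg, "--tss-distance", str(tss_distance)]
    )
    return commands


commands = build_pipeline(
    genes=["SPI1", "BRD3"],
    tasks="cell_type == 'classical monocyte'",
    prefix="example/output_classical_monoctypes",
)
for cmd in commands:
    print(shlex.join(cmd))
    # To execute: import subprocess; subprocess.run(cmd, check=True)
```

Because the predict steps are independent of each other, they are the natural candidates for running in parallel before the seqlet calling step.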
Querying Cells
To obtain attributions, cells of interest must be selected using the query API. We support Pandas’ query API functionality on the cell metadata DataFrame. Here are examples of how to write queries:
! decima query-cell --help
Usage: decima query-cell [OPTIONS] [QUERY]
Query a cell using query string
Examples:
>>> decima query-cell 'cell_type == "classical monocyte"' ...
>>> decima query-cell 'cell_type == "classical monocyte" and disease ==
"healthy" and tissue == "blood"' ...
>>> decima query-cell 'cell_type.str.contains("monocyte") and disease ==
"healthy"' ...
Options:
--metadata TEXT Path to the metadata anndata file or name of the model.
Default: ensemble.
--help Show this message and exit.
Query cells of type “classical monocyte” using Pandas query syntax: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html
! decima query-cell 'cell_type == "classical monocyte"' | column -t -s $'\t'
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.8 (1700.8MB/s)
cell_type tissue organ disease study dataset region subregion celltype_coarse n_cells total_counts n_genes size_factor train_pearson val_pearson test_pearson
agg_4705 classical monocyte alveolar system lung COVID-19 GSE155249 scimilarity nan nan 7244 26544273.0 15325 34749.092791034054 0.946616874183219 0.8437000068912937 0.8506571540216992
agg_4706 classical monocyte alveolar system lung healthy GSE155249 scimilarity nan nan 72 218105.0 9142 30484.31888978114 0.9102228263646758 0.8083487523192785 0.8047828694155461
agg_4707 classical monocyte ampulla of uterine tube fallopian tube healthy fc77d2ae-247d-44d7-aa24-3f4859254c2c scimilarity nan nan 78 550950.0 9639 30719.377971431015 0.9077670011915634 0.8045070167513724 0.7896845423359651
agg_4708 classical monocyte aorta vasculature Abdominal Aortic Aneurysm GSE166676 scimilarity nan nan 432 1091075.0 11192 32981.443348717905 0.9389265854768138 0.8357299205241656 0.830575965756882
agg_4709 classical monocyte aorta vasculature healthy GSE166676 scimilarity nan nan 25 162858.0 8859 31216.275954364824 0.8819013257206973 0.7821403055329706 0.7646999711802146
agg_4710 classical monocyte apex of heart heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 397 1226515.0 12369 32022.563851814968 0.9469178617442242 0.8326145310572417 0.8365506153530168
agg_4711 classical monocyte blood blood COVID-19 03f821b4-87be-4ff4-b65a-b5fc00061da7 scimilarity nan nan 17462 78882609.0 15711 33080.17541357136 0.9536210517623883 0.8539379752673611 0.8485800714004562
agg_4712 classical monocyte blood blood COVID-19 7d7cabfd-1d1f-40af-96b7-26a0825a306d scimilarity nan nan 141914 659177004.0 15175 32282.29230923367 0.9575085257562032 0.8570056960758388 0.8507770895392532
agg_4713 classical monocyte blood blood COVID-19 GSE154567 scimilarity nan nan 8613 40239000.0 16023 34450.628692057784 0.9605287026438375 0.8551842525202262 0.8491670381176235
agg_4714 classical monocyte blood blood COVID-19 GSE158034 scimilarity nan nan 35 91390.0 7372 27446.425385592618 0.8476138307738649 0.7606096001026369 0.7256993661246048
agg_4715 classical monocyte blood blood COVID-19 GSE161918 scimilarity nan nan 163244 1023475761.0 15929 31151.84947891148 0.9361102289470363 0.8261601916328626 0.8168421752771801
agg_4716 classical monocyte blood blood COVID-19 GSE163668 scimilarity nan nan 8399 55036800.0 15792 33644.10626885235 0.9571638472088531 0.8529462339847145 0.8506441761860545
agg_4717 classical monocyte blood blood COVID-19 GSE166992 scimilarity nan nan 2238 12283507.0 14186 32596.636302802952 0.9567710210039843 0.8531485173368566 0.8416173697367906
agg_4718 classical monocyte blood blood COVID-19 ddfad306-714d-4cc0-9985-d9072820c530 scimilarity nan nan 61002 230056884.0 16484 32520.14418628346 0.9487335053479237 0.8533686239486711 0.8454541123444707
agg_4719 classical monocyte blood blood COVID-19 eb735cc9-d0a7-48fa-b255-db726bf365af scimilarity nan nan 19777 105875381.0 15812 32330.088619084574 0.9558882745902155 0.8545238316898663 0.8468877639468763
agg_4720 classical monocyte blood blood HIV enteropathy GSE157829 scimilarity nan nan 491 1449812.0 12290 33110.90004135926 0.9412108394642186 0.8352699509238034 0.8345507070277177
agg_4721 classical monocyte blood blood Myelofibrosis GSE117824 scimilarity nan nan 357 1492491.0 11548 32726.985198452294 0.9446223529088382 0.8417521390049872 0.8328218073658378
agg_4722 classical monocyte blood blood NA GSE132950 scimilarity nan nan 146 784054.0 10913 30417.15641845661 0.9276395863920666 0.8264978172767997 0.8176327551177259
agg_4723 classical monocyte blood blood NA GSE135325 scimilarity nan nan 232 633533.0 11129 31159.105128910356 0.9369963391148282 0.8254811186623798 0.8207578599532835
agg_4724 classical monocyte blood blood NA GSE150233 scimilarity nan nan 1141 2453545.0 12228 32204.245569759012 0.9354773292749718 0.8333534679658088 0.8202743631285762
agg_4725 classical monocyte blood blood NA GSE151310 scimilarity nan nan 48 151358.0 8028 27001.118740317568 0.8873812787091045 0.7886356061906991 0.766461694552445
agg_4726 classical monocyte blood blood NA GSE164378 scimilarity nan nan 54305 476237982.0 17463 34023.11682209347 0.9636663701487779 0.856267291847072 0.8496477594095655
agg_4727 classical monocyte blood blood NA GSE164402 scimilarity nan nan 6577 33889420.0 14992 33855.14311643263 0.9502216042319906 0.846017695872854 0.8447747394204608
agg_4728 classical monocyte blood blood Sezary's disease GSE122703 scimilarity nan nan 35 148650.0 8487 29592.979037498706 0.8928094999389883 0.7911806688728295 0.7911936593448785
agg_4729 classical monocyte blood blood dengue disease GSE145307 scimilarity nan nan 785 7639702.0 13722 33610.52078618725 0.9561427618691068 0.8544883780028308 0.8514781068765508
agg_4730 classical monocyte blood blood dengue disease GSE154386 scimilarity nan nan 19173 143929741.0 16877 34242.50262506596 0.9586193824399223 0.8509705295166231 0.8546685528097621
agg_4731 classical monocyte blood blood drug hypersensitivity syndrome GSE132802 scimilarity nan nan 1269 7314697.0 13270 32574.34811388645 0.9570929839341253 0.8466339050741839 0.8442788242172967
agg_4732 classical monocyte blood blood fibrosis GSE136103 scimilarity nan nan 1774 5003888.0 13389 31155.271000486402 0.9562933985421416 0.8435982250231042 0.8386367834560556
agg_4733 classical monocyte blood blood healthy 03f821b4-87be-4ff4-b65a-b5fc00061da7 scimilarity nan nan 32464 109280914.0 16158 33646.02843110038 0.9568728712803031 0.8545324533535094 0.8487445540580735
agg_4734 classical monocyte blood blood healthy 436154da-bcf1-4130-9c8b-120ff9a888f2 scimilarity nan nan 76800 206490628.0 16683 30736.453324546856 0.955313650235467 0.8494127267867799 0.84210567312908
agg_4735 classical monocyte blood blood healthy 5d445965-6f1a-4b68-ba3a-b8f765155d3a scimilarity nan nan 1044 3638976.0 12306 30542.8977364245 0.9465384578921433 0.851991795617166 0.8308191715256182
agg_4736 classical monocyte blood blood healthy DS000010023 scimilarity nan nan 243 362606.0 8414 25201.261024165953 0.8865446516098515 0.7621391376670988 0.7681818769796727
agg_4737 classical monocyte blood blood healthy GSE122703 scimilarity nan nan 18 83417.0 7546 28194.725173315186 0.859225612396475 0.7640890699253299 0.7443681539461423
agg_4738 classical monocyte blood blood healthy GSE130117 scimilarity nan nan 2017 7130588.0 13535 33078.53692191542 0.9553673450160365 0.851402109626239 0.8385936100758409
agg_4739 classical monocyte blood blood healthy GSE132802 scimilarity nan nan 1601 9955248.0 13132 32063.630951743195 0.9478882791739611 0.8391025143866828 0.8303465877530952
agg_4740 classical monocyte blood blood healthy GSE139324 scimilarity nan nan 2333 8331045.0 13985 31135.881287246768 0.9608208780142045 0.8473885992448625 0.8432790193723467
agg_4741 classical monocyte blood blood healthy GSE145809 scimilarity nan nan 69 245221.0 8962 29135.67197629852 0.8825701041728526 0.7811799267734735 0.7818647625179129
agg_4742 classical monocyte blood blood healthy GSE149313 scimilarity nan nan 2420 6974751.0 13143 29560.496854576566 0.9574598613513423 0.8505290963248237 0.8379199735887167
agg_4743 classical monocyte blood blood healthy GSE153421 scimilarity nan nan 3691 15561725.0 14569 34377.465875728165 0.9636686704566925 0.8576434473562725 0.8511814190737197
agg_4744 classical monocyte blood blood healthy GSE156989 scimilarity nan nan 13554 160011485.0 16915 34135.439844737564 0.9640667421350761 0.8577967800377495 0.8517975138366085
agg_4745 classical monocyte blood blood healthy GSE157829 scimilarity nan nan 1619 6957811.0 13507 30199.39288988673 0.9484019976492215 0.8436979316400604 0.8347196616710685
agg_4746 classical monocyte blood blood healthy GSE159113 scimilarity nan nan 1025 6298250.0 12083 27477.50809897617 0.9078020151513733 0.8121457150205226 0.7980372877810575
agg_4747 classical monocyte blood blood healthy GSE161329 scimilarity nan nan 5654 25653579.0 14349 28848.0539929647 0.9549801428956252 0.8450430950674043 0.8406188789518544
agg_4748 classical monocyte blood blood healthy GSE161738 scimilarity nan nan 2676 13801473.0 12825 33337.477050230416 0.9541962906717452 0.8512846409758499 0.8485408028961247
agg_4749 classical monocyte blood blood healthy GSE163668 scimilarity nan nan 2644 10486314.0 14049 33786.96584264489 0.9597801578342394 0.8560775485935677 0.8512149509551471
agg_4750 classical monocyte blood blood healthy GSE166992 scimilarity nan nan 7501 28033216.0 15079 33455.367364577316 0.9622273594219685 0.8558958139235102 0.8495571689751152
agg_4751 classical monocyte blood blood healthy GSE167363 scimilarity nan nan 3135 14722635.0 14375 29977.24002819913 0.942417448875388 0.8368071803109702 0.8258536430202982
agg_4752 classical monocyte blood blood healthy GSE168710 scimilarity nan nan 16484 104881872.0 16223 34107.336261357574 0.9398282119039322 0.8424821834537695 0.8372971004604842
agg_4753 classical monocyte blood blood healthy GSE168732 scimilarity nan nan 770 2548822.0 12508 33411.30103713399 0.9552513581030765 0.8508279875038706 0.847461536110767
agg_4754 classical monocyte blood blood healthy b0cf0afa-ec40-4d65-b570-ed4ceacc6813 scimilarity nan nan 40975 300555227.0 15784 35938.85772500803 0.9622425892039956 0.853424173800979 0.8508714303589978
agg_4755 classical monocyte blood blood healthy ddfad306-714d-4cc0-9985-d9072820c530 scimilarity nan nan 8827 36073928.0 15131 33208.591584008376 0.9546118779961532 0.8543086616569785 0.8462739374830107
agg_4756 classical monocyte blood blood intracranial hypotension GSE138266 scimilarity nan nan 2503 9675804.0 14485 30160.767605621222 0.9452052724479383 0.8423537848756032 0.8326629487875993
agg_4757 classical monocyte blood blood mucocutaneous lymph node syndrome GSE168732 scimilarity nan nan 5745 25930751.0 14822 33366.18751424575 0.9564515409367231 0.8556431530577528 0.8540185868162636
agg_4758 classical monocyte blood blood multiple sclerosis GSE138266 scimilarity nan nan 3988 13926825.0 14991 31442.03464388843 0.9522779953120408 0.847799219646348 0.8382058078578654
agg_4759 classical monocyte blood blood non-alcoholic fatty liver disease GSE136103 scimilarity nan nan 8306 29424841.0 15410 32004.200489375227 0.9619190264492873 0.8478709124980346 0.8451242436776344
agg_4760 classical monocyte blood blood rheumatoid arthritis GSE159117 scimilarity nan nan 834 4637566.0 12079 31364.847230552205 0.9356058277176598 0.8232134520999813 0.8176333921128414
agg_4761 classical monocyte blood blood septic shock GSE167363 scimilarity nan nan 3860 51041813.0 15830 31688.79561595612 0.948652055959824 0.8541736693569211 0.8427375237296424
agg_4762 classical monocyte blood blood systemic lupus erythematosus 436154da-bcf1-4130-9c8b-120ff9a888f2 scimilarity nan nan 200468 516575809.0 16896 30011.373010792136 0.9562030923644677 0.8480520393236465 0.844052374052952
agg_4763 classical monocyte blood blood systemic lupus erythematosus GSE142016 scimilarity nan nan 8268 22146620.0 14873 30889.72528098081 0.9588937962174496 0.8480448150326806 0.8395981302528143
agg_4764 classical monocyte blood blood systemic lupus erythematosus GSE153765 scimilarity nan nan 42 109982.0 7500 27335.812367044335 0.8566470004710053 0.7665719714665945 0.7374607536624445
agg_4765 classical monocyte blood blood systemic lupus erythematosus GSE156989 scimilarity nan nan 30367 310637290.0 17082 33485.308563356346 0.9623402060903008 0.8532487075078466 0.8473526649094757
agg_4766 classical monocyte blood blood thrombocytopenia GSE149313 scimilarity nan nan 2724 15059814.0 14328 30599.80301260898 0.9543473550386421 0.8520722945129096 0.8417995728182829
agg_4767 classical monocyte bone bone Langerhans Cell Histiocytosis GSE133704 scimilarity nan nan 439 1404680.0 11388 30817.807833507268 0.9358504157466769 0.830348562008033 0.826566269904344
agg_4769 classical monocyte bone marrow bone marrow NA GSE162692 scimilarity nan nan 1234 4757721.0 13466 31707.380952189662 0.953852620063789 0.8503857428588029 0.8377674131460707
agg_4770 classical monocyte bone marrow bone marrow essential thrombocythemia GSE117824 scimilarity nan nan 1649 7825780.0 13487 32454.468003620656 0.9503614540027875 0.8479601234457582 0.838408163377457
agg_4772 classical monocyte bone marrow bone marrow healthy GSE132509 scimilarity nan nan 610 2315570.0 12950 31768.06513427212 0.95159369508558 0.8517118261701931 0.836658919433696
agg_4773 classical monocyte bone marrow bone marrow healthy GSE154109 scimilarity nan nan 531 1431388.0 11793 31377.450948003392 0.9490546933955852 0.8431566630120637 0.8370883160295727
agg_4774 classical monocyte bone marrow bone marrow healthy GSE163278 scimilarity nan nan 1119 3970394.0 13361 32081.93302956569 0.9620394897163868 0.8531148861215617 0.8426785397396367
agg_4775 classical monocyte bone marrow bone marrow healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 151 8025584.0 12883 29444.78352863075 0.8440608178227607 0.740574910750325 0.7417328577454956
agg_4776 classical monocyte bone marrow bone marrow monoclonal gammopathy GSE163278 scimilarity nan nan 1010 3124102.0 12959 30344.72094719757 0.9581906874137958 0.8503948562041261 0.8391672246132192
agg_4777 classical monocyte breast breast healthy GSE164898 scimilarity nan nan 136 641471.0 12971 34463.52724138501 0.9163324788498406 0.8116274576633968 0.7978555908123931
agg_4778 classical monocyte breast breast healthy c9706a92-0e5f-46c1-96d8-20e42467f287 scimilarity nan nan 98 1444245.0 13491 30678.263421880285 0.9165520953567395 0.8162053142576849 0.7994301225229256
agg_4779 classical monocyte bronchus airway COVID-19 03f821b4-87be-4ff4-b65a-b5fc00061da7 scimilarity nan nan 104 270108.0 8816 27933.501359354133 0.8884582873892427 0.7928474093439662 0.786626157931558
agg_4780 classical monocyte bronchus airway COVID-19 GSE168215 scimilarity nan nan 90 217928.0 8444 27417.845704249117 0.880738805029126 0.7855823736726739 0.7798557380738542
agg_4782 classical monocyte bronchus airway healthy GSE158127 scimilarity nan nan 158 1158198.0 12643 34764.50196701077 0.9364512338084163 0.8259291909369686 0.8266638555276521
agg_4783 classical monocyte cardiac muscle of left ventricle heart healthy GSE156703 scimilarity nan nan 13 116181.0 9463 35695.66320276271 0.8542740960069863 0.7515621053395214 0.7561639038477878
agg_4784 classical monocyte carotid artery segment vasculature atherosclerosis GSE155512 scimilarity nan nan 58 515211.0 10839 32837.84505237503 0.9343565353773022 0.8190650931322585 0.8202426969221358
agg_4785 classical monocyte caudate lobe of liver liver healthy 44531dd9-1388-4416-a117-af0a99de2294 scimilarity nan nan 238 730016.0 11505 31342.386314731422 0.9217674983890346 0.8140551552218395 0.8040417787954989
agg_4786 classical monocyte cortex of kidney kidney healthy 120e86b4-1195-48c5-845b-b98054105eec scimilarity nan nan 79 323010.0 10939 32378.76683324232 0.9028856137251035 0.7978822439778066 0.7839454035009307
agg_4787 classical monocyte cortex of kidney kidney healthy a98b828a-622a-483a-80e0-15703678befd scimilarity nan nan 91 477355.0 10898 32358.068865763344 0.9328436291917394 0.8237810319569842 0.8195391931798526
agg_4789 classical monocyte digestive tract gut healthy DS000011665 scimilarity nan nan 347 1679116.0 12155 33347.55517047197 0.9422556928648441 0.8417267634096297 0.84018452536733
agg_4790 classical monocyte exocrine pancreas pancreas healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 821 7824069.0 14847 36135.64587109593 0.9493709998172055 0.8440837716099078 0.8410246939313819
agg_4791 classical monocyte fallopian tube fallopian tube healthy fc77d2ae-247d-44d7-aa24-3f4859254c2c scimilarity nan nan 131 734093.0 11240 33115.640434504094 0.9359103734457376 0.8339026306142181 0.8225901799509813
agg_4792 classical monocyte fimbria of uterine tube fallopian tube healthy fc77d2ae-247d-44d7-aa24-3f4859254c2c scimilarity nan nan 34 209362.0 7663 27635.733860508328 0.8560416684135254 0.7382997749370328 0.7459235366949488
agg_4794 classical monocyte gingiva mouth periodontitis GSE152042 scimilarity nan nan 198 879477.0 11312 32107.813302914532 0.9416262876541264 0.8333723697695014 0.8279530215775117
agg_4795 classical monocyte head of femur bone healthy GSE169396 scimilarity nan nan 450 3669304.0 13216 33082.323604222154 0.9529417082022753 0.8522346343107771 0.8359032081996703
agg_4797 classical monocyte heart left ventricle heart NA ENCODE scimilarity nan nan 50 138407.30523254164 11428 41105.63651890687 0.8614790128015358 0.7843765107548858 0.7889308929671582
agg_4798 classical monocyte heart left ventricle heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 192 585001.0 11159 31422.874036870588 0.9363985001217598 0.8226438123601741 0.8283173244446851
agg_4799 classical monocyte heart right ventricle heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 316 936263.0 11904 32002.67227624691 0.9425900348990802 0.8306128730459813 0.8308116461021977
agg_4800 classical monocyte ileum gut Crohn's disease 17481d16-ee44-49e5-bcf0-28c0780d8c4a scimilarity nan nan 76 311515.0 9984 29687.611679190355 0.9103310804624617 0.8063080284939284 0.7916226068478351
agg_4801 classical monocyte ileum gut Crohn's disease DS000011665 scimilarity nan nan 119 298206.0 8021 26013.286557459236 0.880272438867354 0.7572099232100128 0.7459713937143965
agg_4802 classical monocyte inferior nasal concha bone chronic rhinosinusitis with nasal polyps GSE156285 scimilarity nan nan 241 1048981.0 12463 35082.083353928334 0.9475981193848912 0.8330375982148592 0.833114773003936
agg_4803 classical monocyte interventricular septum heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 442 1322725.0 12418 32235.5434681197 0.94751399102473 0.8340623939411483 0.8353858365852226
agg_4804 classical monocyte isthmus of fallopian tube fallopian tube healthy fc77d2ae-247d-44d7-aa24-3f4859254c2c scimilarity nan nan 62 318330.0 8642 29768.198512126502 0.8846027131590791 0.784220371073871 0.7739277341448829
agg_4805 classical monocyte kidney kidney NA GSE145927 scimilarity nan nan 1789 6957949.0 15145 34853.400215205314 0.9586386429540185 0.8579622487738222 0.8524123833700197
agg_4806 classical monocyte kidney kidney acute kidney failure bcb61471-2a44-4d00-a0af-ff085512674c scimilarity nan nan 587 1589224.0 12335 32471.78147434854 0.9513927341618255 0.84278469020191 0.8402011848101985
agg_4807 classical monocyte kidney kidney chronic kidney disease bcb61471-2a44-4d00-a0af-ff085512674c scimilarity nan nan 134 440788.0 10831 32410.84600407974 0.9323603662663799 0.8153954443953603 0.8190416636069936
agg_4808 classical monocyte kidney kidney healthy 120e86b4-1195-48c5-845b-b98054105eec scimilarity nan nan 762 4034828.0 14946 34015.00816823295 0.9520640694425091 0.848205266473767 0.836085027208869
agg_4809 classical monocyte kidney kidney healthy DS000010415 scimilarity nan nan 55 127079.0 8055 27206.419037355434 0.8216238135756493 0.7570847030479543 0.72524174726152
agg_4810 classical monocyte kidney kidney healthy GSE140989 scimilarity nan nan 174 563438.0 11016 29887.593299155575 0.914390252807459 0.8069762795735104 0.8086759072557722
agg_4811 classical monocyte left cardiac atrium heart NA ENCODE scimilarity nan nan 59 225070.96128814947 12831 43727.214042795575 0.8938217272345973 0.8048974803645641 0.8168195119621123
agg_4812 classical monocyte left cardiac atrium heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 450 1446734.0 12669 32433.20393285361 0.9467532324494885 0.8378571513042986 0.8360845068688169
agg_4813 classical monocyte left lung lung NA ENCODE scimilarity nan nan 16 40636.89964582212 7786 32627.503591080356 0.7956982592153874 0.7257891637293196 0.6945729831399207
agg_4814 classical monocyte liver liver Alagille syndrome GSE163650 scimilarity nan nan 92 188745.0 7678 24676.271768005652 0.8778490663838523 0.7727787177978541 0.7475297818459176
agg_4815 classical monocyte liver liver Biliary atresia GSE163650 scimilarity nan nan 367 1615410.0 11423 31280.031237098276 0.9308940943983487 0.8275363405345841 0.8098394949074608
agg_4816 classical monocyte liver liver fibrosis GSE136103 scimilarity nan nan 1053 5824229.0 14559 31950.801484792315 0.9632028296773183 0.8486875622591808 0.8450295232730788
agg_4817 classical monocyte liver liver healthy GSE136103 scimilarity nan nan 2036 7668787.0 14818 32446.71686472893 0.9614468191909097 0.8455504517344443 0.8464065028454753
agg_4818 classical monocyte liver liver healthy GSE159977 scimilarity nan nan 584 4703990.0 13644 34306.87421529999 0.9580119653448788 0.8463006395755477 0.8456137609888349
agg_4819 classical monocyte liver liver healthy GSE163650 scimilarity nan nan 440 4840312.0 12272 31180.407161439263 0.9312379663603724 0.8198899526213368 0.8071334502842269
agg_4820 classical monocyte liver liver non-alcoholic fatty liver disease GSE136103 scimilarity nan nan 675 3625081.0 13858 31875.772311414476 0.9607644852684971 0.8451153856892568 0.8434410362324897
agg_4821 classical monocyte liver liver non-alcoholic steatohepatitis GSE159977 scimilarity nan nan 818 5328417.0 13712 34244.79288663241 0.9625204736588413 0.8495046961098656 0.843851264103893
agg_4822 classical monocyte lower lobe of left lung lung NA ENCODE scimilarity nan nan 119 332235.9213328175 13992 45607.18635453728 0.9075609223976155 0.8257092626936027 0.8254709531577699
agg_4823 classical monocyte lower lobe of lung lung healthy GSE169471 scimilarity nan nan 305 1224338.0 11342 28255.985922767635 0.9404343350984603 0.8261785132237449 0.8150341534919611
agg_4824 classical monocyte lung lung COVID-19 GSE145926 scimilarity nan nan 6755 29326462.0 15670 32143.238185602037 0.9347325303698273 0.8382075353537076 0.8350131792189084
agg_4825 classical monocyte lung lung COVID-19 GSE149878 scimilarity nan nan 1388 17477477.0 15453 32118.37391645824 0.9547396360778304 0.8488321261303191 0.8334226404565429
agg_4826 classical monocyte lung lung COVID-19 covid scimilarity nan nan 87 182436.0 8979 30922.571470240666 0.8944321462094855 0.7906545756360502 0.7941388516891991
agg_4827 classical monocyte lung lung Idiopathic pulmonary arterial hypertension GSE169471 scimilarity nan nan 338 1099281.0 11441 29048.65706098467 0.9394268881459132 0.8205211245968264 0.8008915371926529
agg_4828 classical monocyte lung lung NA GSE122960 scimilarity nan nan 2035 4747594.0 13592 31170.75054081323 0.947029489389696 0.8239671657460403 0.8229589179837169
agg_4829 classical monocyte lung lung NA GSE150708 scimilarity nan nan 1711 18922764.0 15768 34651.58457127426 0.9197817700449655 0.8258478966731367 0.8332402246613281
agg_4830 classical monocyte lung lung NA GSE159354 scimilarity nan nan 804 1319717.0 12267 30466.67987009986 0.9289888078126158 0.8179900347657229 0.8046761734394422
agg_4831 classical monocyte lung lung chronic obstructive pulmonary disease DS000011735 scimilarity nan nan 1757 5736362.0 16385 37750.32029026267 0.8922650938609307 0.8174310943962452 0.7970204631208431
agg_4832 classical monocyte lung lung healthy 5d445965-6f1a-4b68-ba3a-b8f765155d3a scimilarity nan nan 1254 5397217.0 13298 31013.01576531177 0.9490555294982109 0.8457060773457411 0.8329330004620054
agg_4833 classical monocyte lung lung healthy DS000011735 scimilarity nan nan 4653 16523051.0 17066 37593.98867222708 0.8985708908575675 0.8260014412964971 0.8051199142820423
agg_4834 classical monocyte lung lung healthy GSE128033 scimilarity nan nan 1047 3646581.0 13185 29331.2035341724 0.9513045959002788 0.837770557527153 0.8238987539695043
agg_4835 classical monocyte lung lung healthy GSE128169 scimilarity nan nan 1732 11798577.0 15051 32941.755873862814 0.9636814596401634 0.8539834251349044 0.8457626672394015
agg_4836 classical monocyte lung lung healthy GSE132771 scimilarity nan nan 1601 4614408.0 13275 29761.03818692572 0.9531291409651284 0.8457644131717956 0.8310980214000753
agg_4837 classical monocyte lung lung healthy GSE169471 scimilarity nan nan 498 1613976.0 11886 28956.213107123967 0.9433400854993297 0.8271067022019086 0.815595680192867
agg_4838 classical monocyte lung lung hypersensitivity pneumonitis GSE122960 scimilarity nan nan 374 1513589.0 11850 30594.494180377842 0.9436379625667726 0.8236248274374875 0.8201281668004226
agg_4839 classical monocyte lung lung idiopathic pulmonary fibrosis DS000011735 scimilarity nan nan 3273 11098539.0 16692 36983.80245044498 0.9002060376489591 0.825628591643822 0.8057309447193657
agg_4840 classical monocyte lung lung idiopathic pulmonary fibrosis GSE122960 scimilarity nan nan 795 2302741.0 12763 31758.949309599942 0.9481965424789537 0.8315974088368188 0.8291678346888611
agg_4841 classical monocyte lung lung idiopathic pulmonary fibrosis GSE128033 scimilarity nan nan 264 892053.0 10997 28549.410927787198 0.9388857876621541 0.8259212982973368 0.8088642639807162
agg_4842 classical monocyte lung lung idiopathic pulmonary fibrosis GSE132771 scimilarity nan nan 562 1301612.0 12354 30446.385456748263 0.9469963680992495 0.8353213795556867 0.820474733820443
agg_4844 classical monocyte lung lung idiopathic pulmonary fibrosis GSE143706 scimilarity nan nan 28 77859.0 5999 21933.076720558005 0.8162248508692449 0.6872290766118224 0.6675933221673609
agg_4845 classical monocyte lung lung idiopathic pulmonary fibrosis GSE146981 scimilarity nan nan 28 77859.0 5999 21933.076720558005 0.8151661654118738 0.6848887765520302 0.667713152235204
agg_4846 classical monocyte lung lung idiopathic pulmonary fibrosis GSE159354 scimilarity nan nan 963 1825354.0 12518 29731.366588446697 0.9431677835768482 0.8335606700375219 0.8147956336542009
agg_4847 classical monocyte lung lung interstitial lung disease GSE122960 scimilarity nan nan 255 622277.0 10480 29149.028142322823 0.9283467350849584 0.8054016349869099 0.8007862785744999
agg_4848 classical monocyte lung lung interstitial lung disease GSE128169 scimilarity nan nan 697 1972432.0 12243 29254.839846468705 0.9423878335093786 0.8300974626358707 0.8196994274197087
agg_4849 classical monocyte lung lung scleroderma GSE128169 scimilarity nan nan 108 906362.0 11850 32908.696557354284 0.9438117150692724 0.8371885386238901 0.8225520820106291
agg_4850 classical monocyte lung lung scleroderma GSE132771 scimilarity nan nan 98 335776.0 9515 28212.056049440183 0.9149834322044056 0.8124364794406282 0.7889831369493125
agg_4851 classical monocyte lung lung systemic scleroderma;interstitial lung disease GSE159354 scimilarity nan nan 680 1244200.0 11364 27669.832360293723 0.9311681193836778 0.8218066897336025 0.801632654842017
agg_4852 classical monocyte lung parenchyma lung COVID-19 GSE158127 scimilarity nan nan 1028 2949423.0 13468 33486.58561312063 0.9573331305866764 0.8476148647425674 0.8402420746316328
agg_4853 classical monocyte lung parenchyma lung healthy GSE158127 scimilarity nan nan 791 2735456.0 13260 33646.87553319058 0.9544351950657847 0.8399981618864122 0.8327646420552953
agg_4854 classical monocyte lymph node lymph node Langerhans Cell Histiocytosis GSE133704 scimilarity nan nan 41 112531.0 7250 25404.282603262254 0.8424914182716917 0.7490029862443883 0.7315420690462492
agg_4855 classical monocyte mesenteric artery vasculature healthy GSE156341 scimilarity nan nan 49 408553.0 10083 30979.99239432764 0.9337200372314851 0.8169964046619824 0.8163109060481625
agg_4856 classical monocyte mesenteric artery vasculature type II diabetes mellitus GSE156341 scimilarity nan nan 107 869426.0 11343 33124.102127533 0.9473048055491107 0.8341783608252021 0.8308049817701406
agg_4857 classical monocyte mesenteric lymph node lymph node healthy 7681c7d7-0168-4892-a547-6f02a6430ace scimilarity nan nan 23 211416.0 9219 31018.02041830142 0.9058403141644794 0.7914556298280883 0.7867496166249129
agg_4858 classical monocyte muscle tissue muscle healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 1800 50754027.0 16141 33323.26969602708 0.9316672745867468 0.8348295016474899 0.8219918807835366
agg_4859 classical monocyte nasal cavity airway COVID-19 03f821b4-87be-4ff4-b65a-b5fc00061da7 scimilarity nan nan 2907 12130592.0 15268 34541.15937505348 0.9398365208964624 0.8399826209015534 0.8406896100498221
agg_4860 classical monocyte nasal cavity airway healthy 03f821b4-87be-4ff4-b65a-b5fc00061da7 scimilarity nan nan 129 319231.0 10961 33948.695520725145 0.908031594986554 0.802885512146251 0.7998873890770253
agg_4861 classical monocyte nasopharynx airway nasopharyngeal neoplasm GSE150825 scimilarity nan nan 248 919710.0 11652 32725.826204558824 0.9483756316738621 0.8416477499125612 0.8434913191320811
agg_4862 classical monocyte nose airway chronic rhinosinusitis with nasal polyps GSE156285 scimilarity nan nan 89 407982.0 10658 32874.53627471163 0.9356986809715955 0.8282644612568757 0.8193450383260489
agg_4863 classical monocyte olfactory epithelium airway NA GSE139522 scimilarity nan nan 152 645745.0 11496 32760.770927681508 0.9344670047519139 0.8326335387583638 0.8215291009295241
agg_4864 classical monocyte omental fat pad peritoneum obesity GSE163830 scimilarity nan nan 248 603440.0 11376 33014.67391159742 0.9276969765366693 0.8135481953870003 0.8105166731324462
agg_4865 classical monocyte omentum peritoneum NA GSE151889 scimilarity nan nan 106 233037.0 9833 30451.216970905818 0.9023265636700606 0.794410906944691 0.7787417127899898
agg_4868 classical monocyte peritoneum peritoneum NA GSE130888 scimilarity nan nan 20547 75515682.0 16606 31611.601467579523 0.9640573553126023 0.8513803623213743 0.8467189186177884
agg_4869 classical monocyte peritoneum peritoneum healthy GSE130888 scimilarity nan nan 297 509237.0 11456 32213.169493243313 0.9218188146119148 0.8055575625334659 0.8039145027939639
agg_4870 classical monocyte prostate gland prostate healthy GSE145843 scimilarity nan nan 24 87555.0 5997 21432.99958526179 0.816337811891654 0.7246769991610698 0.6943045232082502
agg_4871 classical monocyte prostate gland prostate healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 220 2445216.0 12460 33911.259081133416 0.9301914834403244 0.8259560877683964 0.824440772955729
agg_4872 classical monocyte renal medulla kidney healthy 120e86b4-1195-48c5-845b-b98054105eec scimilarity nan nan 21 101089.0 7354 26191.96315382653 0.8491245141516279 0.751292403843078 0.7310820608427115
agg_4873 classical monocyte respiratory airway airway COVID-19 29f92179-ca10-4309-a32b-d383d80347c1 scimilarity nan nan 24222 187246624.0 17810 38621.12130270373 0.911673853186318 0.8025805020768422 0.8054824859649656
agg_4874 classical monocyte respiratory tract epithelium airway NA GSE139522 scimilarity nan nan 69 371203.0 11152 33714.18718588416 0.9189481336748234 0.8044925205508522 0.8036843863420214
agg_4875 classical monocyte right cardiac atrium heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 311 977934.0 11891 31570.89048372305 0.9357294628028995 0.8260631626226992 0.8224053236502149
agg_4876 classical monocyte sigmoid colon gut ulcerative colitis DS000010618 scimilarity nan nan 56 157830.0 7772 25795.237651990203 0.8795503731261843 0.7579451181338609 0.7495331251552053
agg_4877 classical monocyte spleen spleen HIV infection GSE148796 scimilarity nan nan 48 118392.0 7120 25206.082214526155 0.8589535988735265 0.7723626781751426 0.7544543625246387
agg_4878 classical monocyte spleen spleen healthy 4d74781b-8186-4c9a-b659-ff4dc4601d91 scimilarity nan nan 2166 7905128.0 13952 30832.98149016589 0.957626764427953 0.8498277489734691 0.8368540073560422
agg_4879 classical monocyte spleen spleen healthy GSE148796 scimilarity nan nan 49 99684.0 6785 24492.260723886982 0.8477188947186437 0.7504685014947085 0.7295165069095441
agg_4880 classical monocyte spleen spleen healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 3483 86078540.0 17775 36266.36759486 0.9100465322095738 0.8097019968114763 0.8081945289287479
agg_4881 classical monocyte subcutaneous adipose tissue adipose healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 2019 27914261.0 15739 35142.992419516115 0.9421666034710219 0.8357406603138717 0.8249091266671545
agg_4882 classical monocyte synovial fluid synovial joint juvenile idiopathic arthritis GSE160097 scimilarity nan nan 68 430194.0 9771 30748.449183442222 0.9095023395695137 0.8036624580038914 0.8039477211085259
agg_4883 classical monocyte synovial fluid synovial joint psoriatic arthritis GSE161500 scimilarity nan nan 675 4107697.0 12980 33508.9642503863 0.9516464026609945 0.8477381221507095 0.8460202124356817
agg_4884 classical monocyte tertiary ovarian follicle ovary NA GSE146512 scimilarity nan nan 100 296748.0 10411 33084.00694991347 0.9139378500397654 0.8064064097295065 0.8023402394768804
agg_4885 classical monocyte testis testis NA GSE153819 scimilarity nan nan 17 97625.0 7958 29072.297180853668 0.855345538109863 0.764463114789657 0.7520891122089112
agg_4886 classical monocyte thoracic lymph node lymph node healthy 62ef75e4-cbea-454e-a0ce-998ec40223d3 scimilarity nan nan 20194 147502356.0 17156 34278.43204322054 0.9584542376154844 0.8568164088302802 0.8468420651789338
agg_4887 classical monocyte thymus thymus healthy 62ef75e4-cbea-454e-a0ce-998ec40223d3 scimilarity nan nan 487 2692133.0 12969 33549.45213075861 0.9497232200872254 0.8549259958290932 0.843613914494647
agg_4888 classical monocyte thymus thymus healthy 83ed3be8-4cb9-43e6-9aaa-3fbbf5d1bd3a scimilarity nan nan 27 80042.0 6527 23800.833698344715 0.8448423801602987 0.7347769941739797 0.7181319535525903
agg_4889 classical monocyte thymus thymus healthy de13e3e2-23b6-40ed-a413-e9e12d7d3910 scimilarity nan nan 52 298983.0 9582 30002.684399867492 0.9175531872094376 0.8185046437128072 0.8176606503789746
agg_4890 classical monocyte tonsil tonsil healthy GSE119506 scimilarity nan nan 321 1114339.0 11546 30516.473940957327 0.936749794981687 0.8307356996245999 0.8215841775914924
agg_4893 classical monocyte trachea airway healthy 03f821b4-87be-4ff4-b65a-b5fc00061da7 scimilarity nan nan 126 457580.0 11135 33734.327765812595 0.9299687996532033 0.8279876048002456 0.8230768581762299
agg_4894 classical monocyte trachea airway healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 130 8245622.0 13663 32584.555010387805 0.859406194907081 0.7522520630806427 0.7550758397192211
agg_4895 classical monocyte transition zone of prostate prostate prostatic hypertrophy 4b54248f-2165-477c-a027-dd55082e8818 scimilarity nan nan 520 2949099.0 13618 29095.077520604846 0.9205242051339535 0.807648434373933 0.7862293702463532
agg_4896 classical monocyte transverse colon gut healthy 62ef75e4-cbea-454e-a0ce-998ec40223d3 scimilarity nan nan 503 2932999.0 12868 33265.22762476361 0.9476555210791636 0.8481166846094836 0.8392740144102075
agg_4897 classical monocyte tympanic membrane ear NA GSE128892 scimilarity nan nan 33 153723.0 7971 26217.848855873086 0.8515891731868822 0.7526405790618411 0.7328434865867175
agg_4899 classical monocyte upper lobe of lung lung healthy GSE169471 scimilarity nan nan 180 594059.0 10222 27881.011904781462 0.9248153838159024 0.8106326147324702 0.8003438801026445
agg_4900 classical monocyte urine urinary healthy GSE165396 scimilarity nan nan 20 109197.0 7299 25505.26771828214 0.8530560359297166 0.7505269381795711 0.7350625083451162
agg_4901 classical monocyte uterus uterus healthy 32f2fd23-ec74-486f-9544-e5b2f41725f5 scimilarity nan nan 18 189472.0 9397 30891.07905591756 0.8870309576225303 0.7796319609750195 0.7827462282311286
agg_4902 classical monocyte vasculature vasculature healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 14537 224206261.0 17922 36651.90273209977 0.9438232475974557 0.8425086108952623 0.8343079266995497
agg_4903 classical monocyte visceral fat adipose obesity GSE128518 scimilarity nan nan 74 196657.0 9100 28836.082756573305 0.8890453654508667 0.7810993233599736 0.7788838086708725
Query cells that:
- have "monocyte" in their cell type name (cell_type.str.contains("monocyte"))
- come from healthy samples (disease == "healthy")
! decima query-cell 'cell_type.str.contains("monocyte") and disease == "healthy"' | column -t -s $'\t'
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.9 (1657.8MB/s)
cell_type tissue organ disease study dataset region subregion celltype_coarse n_cells total_counts n_genes size_factor train_pearson val_pearson test_pearson
agg_4706 classical monocyte alveolar system lung healthy GSE155249 scimilarity nan nan 72 218105.0 9142 30484.31888978114 0.9102228263646758 0.8083487523192785 0.8047828694155461
agg_4707 classical monocyte ampulla of uterine tube fallopian tube healthy fc77d2ae-247d-44d7-aa24-3f4859254c2c scimilarity nan nan 78 550950.0 9639 30719.377971431015 0.9077670011915634 0.8045070167513724 0.7896845423359651
agg_4709 classical monocyte aorta vasculature healthy GSE166676 scimilarity nan nan 25 162858.0 8859 31216.275954364824 0.8819013257206973 0.7821403055329706 0.7646999711802146
agg_4710 classical monocyte apex of heart heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 397 1226515.0 12369 32022.563851814968 0.9469178617442242 0.8326145310572417 0.8365506153530168
agg_4733 classical monocyte blood blood healthy 03f821b4-87be-4ff4-b65a-b5fc00061da7 scimilarity nan nan 32464 109280914.0 16158 33646.02843110038 0.9568728712803031 0.8545324533535094 0.8487445540580735
agg_4734 classical monocyte blood blood healthy 436154da-bcf1-4130-9c8b-120ff9a888f2 scimilarity nan nan 76800 206490628.0 16683 30736.453324546856 0.955313650235467 0.8494127267867799 0.84210567312908
agg_4735 classical monocyte blood blood healthy 5d445965-6f1a-4b68-ba3a-b8f765155d3a scimilarity nan nan 1044 3638976.0 12306 30542.8977364245 0.9465384578921433 0.851991795617166 0.8308191715256182
agg_4736 classical monocyte blood blood healthy DS000010023 scimilarity nan nan 243 362606.0 8414 25201.261024165953 0.8865446516098515 0.7621391376670988 0.7681818769796727
agg_4737 classical monocyte blood blood healthy GSE122703 scimilarity nan nan 18 83417.0 7546 28194.725173315186 0.859225612396475 0.7640890699253299 0.7443681539461423
agg_4738 classical monocyte blood blood healthy GSE130117 scimilarity nan nan 2017 7130588.0 13535 33078.53692191542 0.9553673450160365 0.851402109626239 0.8385936100758409
agg_4739 classical monocyte blood blood healthy GSE132802 scimilarity nan nan 1601 9955248.0 13132 32063.630951743195 0.9478882791739611 0.8391025143866828 0.8303465877530952
agg_4740 classical monocyte blood blood healthy GSE139324 scimilarity nan nan 2333 8331045.0 13985 31135.881287246768 0.9608208780142045 0.8473885992448625 0.8432790193723467
agg_4741 classical monocyte blood blood healthy GSE145809 scimilarity nan nan 69 245221.0 8962 29135.67197629852 0.8825701041728526 0.7811799267734735 0.7818647625179129
agg_4742 classical monocyte blood blood healthy GSE149313 scimilarity nan nan 2420 6974751.0 13143 29560.496854576566 0.9574598613513423 0.8505290963248237 0.8379199735887167
agg_4743 classical monocyte blood blood healthy GSE153421 scimilarity nan nan 3691 15561725.0 14569 34377.465875728165 0.9636686704566925 0.8576434473562725 0.8511814190737197
agg_4744 classical monocyte blood blood healthy GSE156989 scimilarity nan nan 13554 160011485.0 16915 34135.439844737564 0.9640667421350761 0.8577967800377495 0.8517975138366085
agg_4745 classical monocyte blood blood healthy GSE157829 scimilarity nan nan 1619 6957811.0 13507 30199.39288988673 0.9484019976492215 0.8436979316400604 0.8347196616710685
agg_4746 classical monocyte blood blood healthy GSE159113 scimilarity nan nan 1025 6298250.0 12083 27477.50809897617 0.9078020151513733 0.8121457150205226 0.7980372877810575
agg_4747 classical monocyte blood blood healthy GSE161329 scimilarity nan nan 5654 25653579.0 14349 28848.0539929647 0.9549801428956252 0.8450430950674043 0.8406188789518544
agg_4748 classical monocyte blood blood healthy GSE161738 scimilarity nan nan 2676 13801473.0 12825 33337.477050230416 0.9541962906717452 0.8512846409758499 0.8485408028961247
agg_4749 classical monocyte blood blood healthy GSE163668 scimilarity nan nan 2644 10486314.0 14049 33786.96584264489 0.9597801578342394 0.8560775485935677 0.8512149509551471
agg_4750 classical monocyte blood blood healthy GSE166992 scimilarity nan nan 7501 28033216.0 15079 33455.367364577316 0.9622273594219685 0.8558958139235102 0.8495571689751152
agg_4751 classical monocyte blood blood healthy GSE167363 scimilarity nan nan 3135 14722635.0 14375 29977.24002819913 0.942417448875388 0.8368071803109702 0.8258536430202982
agg_4752 classical monocyte blood blood healthy GSE168710 scimilarity nan nan 16484 104881872.0 16223 34107.336261357574 0.9398282119039322 0.8424821834537695 0.8372971004604842
agg_4753 classical monocyte blood blood healthy GSE168732 scimilarity nan nan 770 2548822.0 12508 33411.30103713399 0.9552513581030765 0.8508279875038706 0.847461536110767
agg_4754 classical monocyte blood blood healthy b0cf0afa-ec40-4d65-b570-ed4ceacc6813 scimilarity nan nan 40975 300555227.0 15784 35938.85772500803 0.9622425892039956 0.853424173800979 0.8508714303589978
agg_4755 classical monocyte blood blood healthy ddfad306-714d-4cc0-9985-d9072820c530 scimilarity nan nan 8827 36073928.0 15131 33208.591584008376 0.9546118779961532 0.8543086616569785 0.8462739374830107
agg_4772 classical monocyte bone marrow bone marrow healthy GSE132509 scimilarity nan nan 610 2315570.0 12950 31768.06513427212 0.95159369508558 0.8517118261701931 0.836658919433696
agg_4773 classical monocyte bone marrow bone marrow healthy GSE154109 scimilarity nan nan 531 1431388.0 11793 31377.450948003392 0.9490546933955852 0.8431566630120637 0.8370883160295727
agg_4774 classical monocyte bone marrow bone marrow healthy GSE163278 scimilarity nan nan 1119 3970394.0 13361 32081.93302956569 0.9620394897163868 0.8531148861215617 0.8426785397396367
agg_4775 classical monocyte bone marrow bone marrow healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 151 8025584.0 12883 29444.78352863075 0.8440608178227607 0.740574910750325 0.7417328577454956
agg_4777 classical monocyte breast breast healthy GSE164898 scimilarity nan nan 136 641471.0 12971 34463.52724138501 0.9163324788498406 0.8116274576633968 0.7978555908123931
agg_4778 classical monocyte breast breast healthy c9706a92-0e5f-46c1-96d8-20e42467f287 scimilarity nan nan 98 1444245.0 13491 30678.263421880285 0.9165520953567395 0.8162053142576849 0.7994301225229256
agg_4782 classical monocyte bronchus airway healthy GSE158127 scimilarity nan nan 158 1158198.0 12643 34764.50196701077 0.9364512338084163 0.8259291909369686 0.8266638555276521
agg_4783 classical monocyte cardiac muscle of left ventricle heart healthy GSE156703 scimilarity nan nan 13 116181.0 9463 35695.66320276271 0.8542740960069863 0.7515621053395214 0.7561639038477878
agg_4785 classical monocyte caudate lobe of liver liver healthy 44531dd9-1388-4416-a117-af0a99de2294 scimilarity nan nan 238 730016.0 11505 31342.386314731422 0.9217674983890346 0.8140551552218395 0.8040417787954989
agg_4786 classical monocyte cortex of kidney kidney healthy 120e86b4-1195-48c5-845b-b98054105eec scimilarity nan nan 79 323010.0 10939 32378.76683324232 0.9028856137251035 0.7978822439778066 0.7839454035009307
agg_4787 classical monocyte cortex of kidney kidney healthy a98b828a-622a-483a-80e0-15703678befd scimilarity nan nan 91 477355.0 10898 32358.068865763344 0.9328436291917394 0.8237810319569842 0.8195391931798526
agg_4789 classical monocyte digestive tract gut healthy DS000011665 scimilarity nan nan 347 1679116.0 12155 33347.55517047197 0.9422556928648441 0.8417267634096297 0.84018452536733
agg_4790 classical monocyte exocrine pancreas pancreas healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 821 7824069.0 14847 36135.64587109593 0.9493709998172055 0.8440837716099078 0.8410246939313819
agg_4791 classical monocyte fallopian tube fallopian tube healthy fc77d2ae-247d-44d7-aa24-3f4859254c2c scimilarity nan nan 131 734093.0 11240 33115.640434504094 0.9359103734457376 0.8339026306142181 0.8225901799509813
agg_4792 classical monocyte fimbria of uterine tube fallopian tube healthy fc77d2ae-247d-44d7-aa24-3f4859254c2c scimilarity nan nan 34 209362.0 7663 27635.733860508328 0.8560416684135254 0.7382997749370328 0.7459235366949488
agg_4795 classical monocyte head of femur bone healthy GSE169396 scimilarity nan nan 450 3669304.0 13216 33082.323604222154 0.9529417082022753 0.8522346343107771 0.8359032081996703
agg_4798 classical monocyte heart left ventricle heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 192 585001.0 11159 31422.874036870588 0.9363985001217598 0.8226438123601741 0.8283173244446851
agg_4799 classical monocyte heart right ventricle heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 316 936263.0 11904 32002.67227624691 0.9425900348990802 0.8306128730459813 0.8308116461021977
agg_4803 classical monocyte interventricular septum heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 442 1322725.0 12418 32235.5434681197 0.94751399102473 0.8340623939411483 0.8353858365852226
agg_4804 classical monocyte isthmus of fallopian tube fallopian tube healthy fc77d2ae-247d-44d7-aa24-3f4859254c2c scimilarity nan nan 62 318330.0 8642 29768.198512126502 0.8846027131590791 0.784220371073871 0.7739277341448829
agg_4808 classical monocyte kidney kidney healthy 120e86b4-1195-48c5-845b-b98054105eec scimilarity nan nan 762 4034828.0 14946 34015.00816823295 0.9520640694425091 0.848205266473767 0.836085027208869
agg_4809 classical monocyte kidney kidney healthy DS000010415 scimilarity nan nan 55 127079.0 8055 27206.419037355434 0.8216238135756493 0.7570847030479543 0.72524174726152
agg_4810 classical monocyte kidney kidney healthy GSE140989 scimilarity nan nan 174 563438.0 11016 29887.593299155575 0.914390252807459 0.8069762795735104 0.8086759072557722
agg_4812 classical monocyte left cardiac atrium heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 450 1446734.0 12669 32433.20393285361 0.9467532324494885 0.8378571513042986 0.8360845068688169
agg_4817 classical monocyte liver liver healthy GSE136103 scimilarity nan nan 2036 7668787.0 14818 32446.71686472893 0.9614468191909097 0.8455504517344443 0.8464065028454753
agg_4818 classical monocyte liver liver healthy GSE159977 scimilarity nan nan 584 4703990.0 13644 34306.87421529999 0.9580119653448788 0.8463006395755477 0.8456137609888349
agg_4819 classical monocyte liver liver healthy GSE163650 scimilarity nan nan 440 4840312.0 12272 31180.407161439263 0.9312379663603724 0.8198899526213368 0.8071334502842269
agg_4823 classical monocyte lower lobe of lung lung healthy GSE169471 scimilarity nan nan 305 1224338.0 11342 28255.985922767635 0.9404343350984603 0.8261785132237449 0.8150341534919611
agg_4832 classical monocyte lung lung healthy 5d445965-6f1a-4b68-ba3a-b8f765155d3a scimilarity nan nan 1254 5397217.0 13298 31013.01576531177 0.9490555294982109 0.8457060773457411 0.8329330004620054
agg_4833 classical monocyte lung lung healthy DS000011735 scimilarity nan nan 4653 16523051.0 17066 37593.98867222708 0.8985708908575675 0.8260014412964971 0.8051199142820423
agg_4834 classical monocyte lung lung healthy GSE128033 scimilarity nan nan 1047 3646581.0 13185 29331.2035341724 0.9513045959002788 0.837770557527153 0.8238987539695043
agg_4835 classical monocyte lung lung healthy GSE128169 scimilarity nan nan 1732 11798577.0 15051 32941.755873862814 0.9636814596401634 0.8539834251349044 0.8457626672394015
agg_4836 classical monocyte lung lung healthy GSE132771 scimilarity nan nan 1601 4614408.0 13275 29761.03818692572 0.9531291409651284 0.8457644131717956 0.8310980214000753
agg_4837 classical monocyte lung lung healthy GSE169471 scimilarity nan nan 498 1613976.0 11886 28956.213107123967 0.9433400854993297 0.8271067022019086 0.815595680192867
agg_4853 classical monocyte lung parenchyma lung healthy GSE158127 scimilarity nan nan 791 2735456.0 13260 33646.87553319058 0.9544351950657847 0.8399981618864122 0.8327646420552953
agg_4855 classical monocyte mesenteric artery vasculature healthy GSE156341 scimilarity nan nan 49 408553.0 10083 30979.99239432764 0.9337200372314851 0.8169964046619824 0.8163109060481625
agg_4857 classical monocyte mesenteric lymph node lymph node healthy 7681c7d7-0168-4892-a547-6f02a6430ace scimilarity nan nan 23 211416.0 9219 31018.02041830142 0.9058403141644794 0.7914556298280883 0.7867496166249129
agg_4858 classical monocyte muscle tissue muscle healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 1800 50754027.0 16141 33323.26969602708 0.9316672745867468 0.8348295016474899 0.8219918807835366
agg_4860 classical monocyte nasal cavity airway healthy 03f821b4-87be-4ff4-b65a-b5fc00061da7 scimilarity nan nan 129 319231.0 10961 33948.695520725145 0.908031594986554 0.802885512146251 0.7998873890770253
agg_4869 classical monocyte peritoneum peritoneum healthy GSE130888 scimilarity nan nan 297 509237.0 11456 32213.169493243313 0.9218188146119148 0.8055575625334659 0.8039145027939639
agg_4870 classical monocyte prostate gland prostate healthy GSE145843 scimilarity nan nan 24 87555.0 5997 21432.99958526179 0.816337811891654 0.7246769991610698 0.6943045232082502
agg_4871 classical monocyte prostate gland prostate healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 220 2445216.0 12460 33911.259081133416 0.9301914834403244 0.8259560877683964 0.824440772955729
agg_4872 classical monocyte renal medulla kidney healthy 120e86b4-1195-48c5-845b-b98054105eec scimilarity nan nan 21 101089.0 7354 26191.96315382653 0.8491245141516279 0.751292403843078 0.7310820608427115
agg_4875 classical monocyte right cardiac atrium heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 311 977934.0 11891 31570.89048372305 0.9357294628028995 0.8260631626226992 0.8224053236502149
agg_4878 classical monocyte spleen spleen healthy 4d74781b-8186-4c9a-b659-ff4dc4601d91 scimilarity nan nan 2166 7905128.0 13952 30832.98149016589 0.957626764427953 0.8498277489734691 0.8368540073560422
agg_4879 classical monocyte spleen spleen healthy GSE148796 scimilarity nan nan 49 99684.0 6785 24492.260723886982 0.8477188947186437 0.7504685014947085 0.7295165069095441
agg_4880 classical monocyte spleen spleen healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 3483 86078540.0 17775 36266.36759486 0.9100465322095738 0.8097019968114763 0.8081945289287479
agg_4881 classical monocyte subcutaneous adipose tissue adipose healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 2019 27914261.0 15739 35142.992419516115 0.9421666034710219 0.8357406603138717 0.8249091266671545
agg_4886 classical monocyte thoracic lymph node lymph node healthy 62ef75e4-cbea-454e-a0ce-998ec40223d3 scimilarity nan nan 20194 147502356.0 17156 34278.43204322054 0.9584542376154844 0.8568164088302802 0.8468420651789338
agg_4887 classical monocyte thymus thymus healthy 62ef75e4-cbea-454e-a0ce-998ec40223d3 scimilarity nan nan 487 2692133.0 12969 33549.45213075861 0.9497232200872254 0.8549259958290932 0.843613914494647
agg_4888 classical monocyte thymus thymus healthy 83ed3be8-4cb9-43e6-9aaa-3fbbf5d1bd3a scimilarity nan nan 27 80042.0 6527 23800.833698344715 0.8448423801602987 0.7347769941739797 0.7181319535525903
agg_4889 classical monocyte thymus thymus healthy de13e3e2-23b6-40ed-a413-e9e12d7d3910 scimilarity nan nan 52 298983.0 9582 30002.684399867492 0.9175531872094376 0.8185046437128072 0.8176606503789746
agg_4890 classical monocyte tonsil tonsil healthy GSE119506 scimilarity nan nan 321 1114339.0 11546 30516.473940957327 0.936749794981687 0.8307356996245999 0.8215841775914924
agg_4893 classical monocyte trachea airway healthy 03f821b4-87be-4ff4-b65a-b5fc00061da7 scimilarity nan nan 126 457580.0 11135 33734.327765812595 0.9299687996532033 0.8279876048002456 0.8230768581762299
agg_4894 classical monocyte trachea airway healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 130 8245622.0 13663 32584.555010387805 0.859406194907081 0.7522520630806427 0.7550758397192211
agg_4896 classical monocyte transverse colon gut healthy 62ef75e4-cbea-454e-a0ce-998ec40223d3 scimilarity nan nan 503 2932999.0 12868 33265.22762476361 0.9476555210791636 0.8481166846094836 0.8392740144102075
agg_4899 classical monocyte upper lobe of lung lung healthy GSE169471 scimilarity nan nan 180 594059.0 10222 27881.011904781462 0.9248153838159024 0.8106326147324702 0.8003438801026445
agg_4900 classical monocyte urine urinary healthy GSE165396 scimilarity nan nan 20 109197.0 7299 25505.26771828214 0.8530560359297166 0.7505269381795711 0.7350625083451162
agg_4901 classical monocyte uterus uterus healthy 32f2fd23-ec74-486f-9544-e5b2f41725f5 scimilarity nan nan 18 189472.0 9397 30891.07905591756 0.8870309576225303 0.7796319609750195 0.7827462282311286
agg_4902 classical monocyte vasculature vasculature healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 14537 224206261.0 17922 36651.90273209977 0.9438232475974557 0.8425086108952623 0.8343079266995497
agg_6287 intermediate monocyte head of femur bone healthy GSE169396 scimilarity nan nan 102 191075.0 7853 26179.035956297153 0.8330771518439726 0.7503209876663273 0.7113081302663875
agg_6289 intermediate monocyte lung lung healthy 5d445965-6f1a-4b68-ba3a-b8f765155d3a scimilarity nan nan 178 1172582.0 11314 30515.569680815937 0.9409051040470379 0.8435279441582394 0.8229749449946738
agg_6290 intermediate monocyte spleen spleen healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 60 252220.0 8236 26163.955486425446 0.836498746238646 0.7439980946911184 0.7168040239671246
agg_6291 intermediate monocyte thymus thymus healthy GSE159745 scimilarity nan nan 29 82115.0 5987 22540.234815420707 0.7817665678679913 0.7043553158094606 0.6760717657015229
agg_6292 intermediate monocyte vasculature vasculature healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 162 1525515.0 11980 32186.1781582846 0.9269571292216415 0.8224395619954739 0.8055668264019443
agg_7919 non-classical monocyte apex of heart heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 158 675731.0 11222 30876.899222310585 0.9301440448035162 0.8258961291954683 0.824159827752181
agg_7939 non-classical monocyte blood blood healthy 03f821b4-87be-4ff4-b65a-b5fc00061da7 scimilarity nan nan 7359 36418526.0 14851 33258.733090217436 0.9519276590597959 0.8501060197631037 0.850960653386464
agg_7940 non-classical monocyte blood blood healthy 436154da-bcf1-4130-9c8b-120ff9a888f2 scimilarity nan nan 14619 54479703.0 15493 30191.499157063707 0.9490211472741342 0.8448211022633122 0.8418061408505518
agg_7941 non-classical monocyte blood blood healthy 5d445965-6f1a-4b68-ba3a-b8f765155d3a scimilarity nan nan 195 1239240.0 11002 30223.31815928311 0.9379359624101378 0.8397815574875395 0.8299059656859994
agg_7942 non-classical monocyte blood blood healthy GSE130117 scimilarity nan nan 322 1200290.0 11182 31409.27333671809 0.9418345510352945 0.8405863822794143 0.8304108431330093
agg_7943 non-classical monocyte blood blood healthy GSE132802 scimilarity nan nan 102 726593.0 10232 30435.311954214158 0.9262718659348109 0.8186907385315048 0.8147462064075649
agg_7944 non-classical monocyte blood blood healthy GSE134004 scimilarity nan nan 21 123360.0 6990 24135.62187154626 0.8660944307658819 0.7608357820845066 0.7535994530765779
agg_7945 non-classical monocyte blood blood healthy GSE139324 scimilarity nan nan 435 2395489.0 12428 30987.28210955269 0.9487843521961865 0.8347185640893492 0.8386196076774184
agg_7946 non-classical monocyte blood blood healthy GSE149313 scimilarity nan nan 567 2445950.0 11737 29144.914765151065 0.9480753865144765 0.8398428245307203 0.8354975068540325
agg_7947 non-classical monocyte blood blood healthy GSE153421 scimilarity nan nan 441 2114891.0 11966 32647.54233438482 0.9524227157990666 0.8453840087915955 0.8445318485082767
agg_7948 non-classical monocyte blood blood healthy GSE156989 scimilarity nan nan 3151 40420221.0 15662 32233.769948686153 0.9558775883405131 0.8527879494870606 0.8461332941066178
agg_7949 non-classical monocyte blood blood healthy GSE157829 scimilarity nan nan 144 890675.0 10657 29172.377435644317 0.9302786443476869 0.8283014100006509 0.8220722486967346
agg_7950 non-classical monocyte blood blood healthy GSE161329 scimilarity nan nan 1118 7175719.0 12865 29244.857112487658 0.9476682128749678 0.8393859845018986 0.8392730294908749
agg_7951 non-classical monocyte blood blood healthy GSE161738 scimilarity nan nan 1497 12143757.0 12362 32632.110821778042 0.9476207444575895 0.8463853459866758 0.8490136875251815
agg_7952 non-classical monocyte blood blood healthy GSE163668 scimilarity nan nan 323 1716760.0 11769 32812.39505818172 0.9472316303194352 0.8438698542111769 0.8424970169398833
agg_7953 non-classical monocyte blood blood healthy GSE166992 scimilarity nan nan 1613 7143035.0 13383 32605.389353410996 0.9532209129181096 0.8486599938147087 0.8451557766892072
agg_7954 non-classical monocyte blood blood healthy GSE167363 scimilarity nan nan 458 3094035.0 12228 29678.03739631962 0.9435654504659038 0.8379948419319221 0.8241961683725367
agg_7955 non-classical monocyte blood blood healthy GSE168710 scimilarity nan nan 75 701113.0 10776 32559.49721301115 0.9233871475920297 0.8250528447177075 0.8185333840139741
agg_7956 non-classical monocyte blood blood healthy GSE168732 scimilarity nan nan 229 1242404.0 11269 31965.299177754878 0.9416742339845935 0.8377482237588458 0.8415500681607002
agg_7957 non-classical monocyte blood blood healthy b0cf0afa-ec40-4d65-b570-ed4ceacc6813 scimilarity nan nan 5897 43180935.0 14970 35595.18649267558 0.9543288699936718 0.8496535044071925 0.8536782531653531
agg_7970 non-classical monocyte bone marrow bone marrow healthy GSE132509 scimilarity nan nan 28 86280.0 7085 25870.53624683356 0.8392355979135352 0.755387271124993 0.7303317003908475
agg_7971 non-classical monocyte bone marrow bone marrow healthy GSE154109 scimilarity nan nan 50 289907.0 9147 28526.85661530531 0.9070952567396775 0.808361820498357 0.7964598438037417
agg_7972 non-classical monocyte bone marrow bone marrow healthy GSE163278 scimilarity nan nan 127 682328.0 10897 31202.529821716787 0.9358619814006104 0.8294675777784289 0.8230468472234642
agg_7974 non-classical monocyte breast breast healthy GSE164898 scimilarity nan nan 54 120275.0 7652 26423.010551181265 0.8534343500588628 0.755424451856796 0.7230125883881353
agg_7976 non-classical monocyte cortex of kidney kidney healthy 120e86b4-1195-48c5-845b-b98054105eec scimilarity nan nan 63 401864.0 11464 32990.96311584531 0.9014527631813947 0.7931942866954178 0.7865930763896118
agg_7977 non-classical monocyte cortex of kidney kidney healthy a98b828a-622a-483a-80e0-15703678befd scimilarity nan nan 161 772141.0 11062 31056.660350613587 0.9346806312815913 0.8317520409566359 0.8313984606487961
agg_7979 non-classical monocyte fallopian tube fallopian tube healthy fc77d2ae-247d-44d7-aa24-3f4859254c2c scimilarity nan nan 16 222500.0 7755 27898.660349927846 0.8621134765305163 0.7601890611687369 0.744729424431635
agg_7980 non-classical monocyte fimbria of uterine tube fallopian tube healthy fc77d2ae-247d-44d7-aa24-3f4859254c2c scimilarity nan nan 28 244266.0 7735 27678.814270356754 0.8734809587140066 0.7648153554084077 0.76608428730196
agg_7982 non-classical monocyte heart left ventricle heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 57 213565.0 9322 29851.355460106315 0.8978296307325292 0.7939956162210045 0.776120799756924
agg_7983 non-classical monocyte heart right ventricle heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 124 613752.0 10880 30311.360727579897 0.9311389328985725 0.8253171421863371 0.8219320702682233
agg_7985 non-classical monocyte interventricular septum heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 144 524658.0 10853 31170.857886799495 0.9241302459076691 0.8138209035301205 0.8224136632163046
agg_7986 non-classical monocyte isthmus of fallopian tube fallopian tube healthy fc77d2ae-247d-44d7-aa24-3f4859254c2c scimilarity nan nan 12 86198.0 5668 23558.81110396413 0.793224614700874 0.7114672551137675 0.6825985710920001
agg_7990 non-classical monocyte kidney kidney healthy 120e86b4-1195-48c5-845b-b98054105eec scimilarity nan nan 214 1788808.0 13749 33794.71179753717 0.9324162382479619 0.8250902105825786 0.818301334304562
agg_7991 non-classical monocyte kidney kidney healthy GSE140989 scimilarity nan nan 473 1769375.0 13008 31797.13462972748 0.9190676320157254 0.8182361375222441 0.8172110487458264
agg_7992 non-classical monocyte left cardiac atrium heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 82 357018.0 10073 29995.2881818953 0.916928150799629 0.8074046233049013 0.808753490409063
agg_7995 non-classical monocyte liver liver healthy GSE136103 scimilarity nan nan 423 2383574.0 13122 31645.435149503457 0.9524231389227664 0.8376748793878739 0.8415142758287522
agg_7996 non-classical monocyte liver liver healthy GSE159977 scimilarity nan nan 473 4877555.0 13370 33200.636271185205 0.9498326621251537 0.8392029926705806 0.8469259328198118
agg_7997 non-classical monocyte liver liver healthy GSE163650 scimilarity nan nan 10 96148.0 6782 24958.605923030595 0.844996457443227 0.7380858347163884 0.7209649786125333
agg_8007 non-classical monocyte lung lung healthy 5d445965-6f1a-4b68-ba3a-b8f765155d3a scimilarity nan nan 576 3313836.0 12566 30969.255628748215 0.9438879246302309 0.8444989044016609 0.8358111974870654
agg_8008 non-classical monocyte lung lung healthy DS000011735 scimilarity nan nan 169 779577.0 12885 36247.76234710614 0.8808886302153663 0.8102555656654346 0.7948721237235893
agg_8009 non-classical monocyte lung lung healthy GSE128033 scimilarity nan nan 79 343546.0 9330 27406.22701580364 0.9035509606166174 0.7968173004217675 0.7863212826693502
agg_8010 non-classical monocyte lung lung healthy GSE128169 scimilarity nan nan 276 2433151.0 12769 32027.902610765417 0.9547437308289124 0.8449880167209973 0.8438649522788478
agg_8011 non-classical monocyte lung lung healthy GSE132771 scimilarity nan nan 37 151922.0 7860 26295.743163049112 0.8838770944570796 0.790431305730734 0.7607028661311058
agg_8012 non-classical monocyte lung lung healthy GSE169471 scimilarity nan nan 27 103204.0 6854 23947.65597317191 0.8348392917103126 0.7488344663849291 0.7237935010019484
agg_8024 non-classical monocyte lung parenchyma lung healthy GSE158127 scimilarity nan nan 309 1788828.0 12136 32078.56380429425 0.9418105467635777 0.8295808338281161 0.831837004521098
agg_8026 non-classical monocyte muscle tissue muscle healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 321 23778338.0 13169 27977.08919973388 0.8814221140322411 0.7869360982600341 0.7677337716492793
agg_8034 non-classical monocyte right cardiac atrium heart healthy b52eb423-5d0d-4645-b217-e1c6d38b2e72 scimilarity nan nan 70 408965.0 10146 29746.121058681525 0.9252128588702657 0.8186384457806745 0.8103901461358153
agg_8036 non-classical monocyte spleen spleen healthy 4d74781b-8186-4c9a-b659-ff4dc4601d91 scimilarity nan nan 336 1586973.0 11934 30580.798338873254 0.9471354298985436 0.8378586626071394 0.8322814736104384
agg_8039 non-classical monocyte thoracic lymph node lymph node healthy 62ef75e4-cbea-454e-a0ce-998ec40223d3 scimilarity nan nan 1950 18888557.0 15221 33581.76389036113 0.9559331607078543 0.8537562341521224 0.8469325123405803
agg_8040 non-classical monocyte thymus thymus healthy 62ef75e4-cbea-454e-a0ce-998ec40223d3 scimilarity nan nan 68 441502.0 10301 31247.452676451114 0.917407487628822 0.8223254514108821 0.8197270854172879
agg_8041 non-classical monocyte trachea airway healthy 03f821b4-87be-4ff4-b65a-b5fc00061da7 scimilarity nan nan 12 97840.0 7863 29204.083428972655 0.8702163087134754 0.7685799000235117 0.7716054237714325
agg_8042 non-classical monocyte trachea airway healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 16 534304.0 8970 24807.157880956805 0.8232311272852844 0.7180336962121132 0.7003798955542161
agg_8044 non-classical monocyte transverse colon gut healthy 62ef75e4-cbea-454e-a0ce-998ec40223d3 scimilarity nan nan 135 908452.0 11380 32804.296806012215 0.933818206129888 0.8353788654625934 0.8345447172426862
agg_8045 non-classical monocyte upper lobe of lung lung healthy GSE169471 scimilarity nan nan 32 143341.0 7269 24395.316476390286 0.8673226900308589 0.7523857428733156 0.7403920706432944
agg_8046 non-classical monocyte vasculature vasculature healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan 987 10267209.0 15019 35019.47682838501 0.9379668399645168 0.8304128458333673 0.8283221722893935
This query selects cells that are:
classical monocytes (cell_type == “classical monocyte”)
from healthy donors (disease == “healthy”)
from blood tissue (tissue == “blood”)
! decima query-cell 'cell_type == "classical monocyte" and disease == "healthy" and tissue == "blood"' | column -t -s $'\t'
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.9 (1663.1MB/s)
cell_type tissue organ disease study dataset region subregion celltype_coarse n_cells total_counts n_genes size_factor train_pearson val_pearson test_pearson
agg_4733 classical monocyte blood blood healthy 03f821b4-87be-4ff4-b65a-b5fc00061da7 scimilarity nan nan 32464 109280914.0 16158 33646.02843110038 0.9568728712803031 0.8545324533535094 0.8487445540580735
agg_4734 classical monocyte blood blood healthy 436154da-bcf1-4130-9c8b-120ff9a888f2 scimilarity nan nan 76800 206490628.0 16683 30736.453324546856 0.955313650235467 0.8494127267867799 0.84210567312908
agg_4735 classical monocyte blood blood healthy 5d445965-6f1a-4b68-ba3a-b8f765155d3a scimilarity nan nan 1044 3638976.0 12306 30542.8977364245 0.9465384578921433 0.851991795617166 0.8308191715256182
agg_4736 classical monocyte blood blood healthy DS000010023 scimilarity nan nan 243 362606.0 8414 25201.261024165953 0.8865446516098515 0.7621391376670988 0.7681818769796727
agg_4737 classical monocyte blood blood healthy GSE122703 scimilarity nan nan 18 83417.0 7546 28194.725173315186 0.859225612396475 0.7640890699253299 0.7443681539461423
agg_4738 classical monocyte blood blood healthy GSE130117 scimilarity nan nan 2017 7130588.0 13535 33078.53692191542 0.9553673450160365 0.851402109626239 0.8385936100758409
agg_4739 classical monocyte blood blood healthy GSE132802 scimilarity nan nan 1601 9955248.0 13132 32063.630951743195 0.9478882791739611 0.8391025143866828 0.8303465877530952
agg_4740 classical monocyte blood blood healthy GSE139324 scimilarity nan nan 2333 8331045.0 13985 31135.881287246768 0.9608208780142045 0.8473885992448625 0.8432790193723467
agg_4741 classical monocyte blood blood healthy GSE145809 scimilarity nan nan 69 245221.0 8962 29135.67197629852 0.8825701041728526 0.7811799267734735 0.7818647625179129
agg_4742 classical monocyte blood blood healthy GSE149313 scimilarity nan nan 2420 6974751.0 13143 29560.496854576566 0.9574598613513423 0.8505290963248237 0.8379199735887167
agg_4743 classical monocyte blood blood healthy GSE153421 scimilarity nan nan 3691 15561725.0 14569 34377.465875728165 0.9636686704566925 0.8576434473562725 0.8511814190737197
agg_4744 classical monocyte blood blood healthy GSE156989 scimilarity nan nan 13554 160011485.0 16915 34135.439844737564 0.9640667421350761 0.8577967800377495 0.8517975138366085
agg_4745 classical monocyte blood blood healthy GSE157829 scimilarity nan nan 1619 6957811.0 13507 30199.39288988673 0.9484019976492215 0.8436979316400604 0.8347196616710685
agg_4746 classical monocyte blood blood healthy GSE159113 scimilarity nan nan 1025 6298250.0 12083 27477.50809897617 0.9078020151513733 0.8121457150205226 0.7980372877810575
agg_4747 classical monocyte blood blood healthy GSE161329 scimilarity nan nan 5654 25653579.0 14349 28848.0539929647 0.9549801428956252 0.8450430950674043 0.8406188789518544
agg_4748 classical monocyte blood blood healthy GSE161738 scimilarity nan nan 2676 13801473.0 12825 33337.477050230416 0.9541962906717452 0.8512846409758499 0.8485408028961247
agg_4749 classical monocyte blood blood healthy GSE163668 scimilarity nan nan 2644 10486314.0 14049 33786.96584264489 0.9597801578342394 0.8560775485935677 0.8512149509551471
agg_4750 classical monocyte blood blood healthy GSE166992 scimilarity nan nan 7501 28033216.0 15079 33455.367364577316 0.9622273594219685 0.8558958139235102 0.8495571689751152
agg_4751 classical monocyte blood blood healthy GSE167363 scimilarity nan nan 3135 14722635.0 14375 29977.24002819913 0.942417448875388 0.8368071803109702 0.8258536430202982
agg_4752 classical monocyte blood blood healthy GSE168710 scimilarity nan nan 16484 104881872.0 16223 34107.336261357574 0.9398282119039322 0.8424821834537695 0.8372971004604842
agg_4753 classical monocyte blood blood healthy GSE168732 scimilarity nan nan 770 2548822.0 12508 33411.30103713399 0.9552513581030765 0.8508279875038706 0.847461536110767
agg_4754 classical monocyte blood blood healthy b0cf0afa-ec40-4d65-b570-ed4ceacc6813 scimilarity nan nan 40975 300555227.0 15784 35938.85772500803 0.9622425892039956 0.853424173800979 0.8508714303589978
agg_4755 classical monocyte blood blood healthy ddfad306-714d-4cc0-9985-d9072820c530 scimilarity nan nan 8827 36073928.0 15131 33208.591584008376 0.9546118779961532 0.8543086616569785 0.8462739374830107
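The filter expression passed to query-cell reads like a pandas `DataFrame.query` string. As a minimal sketch, assuming the CLI applies the same semantics to the cell metadata table, the toy example below runs the same expression on a small hand-made DataFrame. The rows and the `toy_*` index labels are invented for illustration and are not Decima data.

```python
import pandas as pd

# Toy metadata table (invented rows, not taken from the Decima metadata).
cells = pd.DataFrame(
    {
        "cell_type": [
            "classical monocyte",
            "classical monocyte",
            "non-classical monocyte",
            "classical monocyte",
        ],
        "disease": ["healthy", "COVID-19", "healthy", "healthy"],
        "tissue": ["blood", "blood", "blood", "lung"],
    },
    index=["toy_0", "toy_1", "toy_2", "toy_3"],
)

# Same expression as in the CLI call above; all three conditions must hold.
query = 'cell_type == "classical monocyte" and disease == "healthy" and tissue == "blood"'
selected = cells.query(query)
print(selected)
```

Only the row that satisfies every condition is kept, which is why each table row above shows the same cell type, disease and tissue.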
Attribution calling with custom genes and sequences¶
In this section, we demonstrate how to call attributions using custom gene sequences. You can provide your own FASTA file containing sequences of interest and run attribution analysis for any set of genes or genomic regions, using the Decima command-line interface. The following examples show how to inspect your FASTA file, run attributions, and explore the output files. The FASTA header line for each sequence contains the gene name and the coordinates of the masked region used for attribution analysis. For example, in the header:
CD68|gene_mask_start=163840|gene_mask_end=166460
“CD68” is the gene name, while “gene_mask_start” and “gene_mask_end” give the start and end positions (relative to the input sequence) of the region that is masked and analyzed for attributions.
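To make the header layout concrete, here is a small sketch that splits such a header into the gene name and its integer coordinates. The helper `parse_fasta_header` is hypothetical and not part of Decima; it only assumes the `NAME|key=value|key=value` layout shown above.

```python
def parse_fasta_header(header: str) -> dict:
    """Split a 'NAME|key=value|key=value' FASTA header into its parts."""
    header = header.lstrip(">").strip()
    name, *fields = header.split("|")
    record = {"gene": name}
    for field in fields:
        # Each remaining field is expected to look like 'key=integer'.
        key, _, value = field.partition("=")
        record[key] = int(value)
    return record


record = parse_fasta_header(">CD68|gene_mask_start=163840|gene_mask_end=166460")
# Difference between the two coordinates given in the header.
mask_length = record["gene_mask_end"] - record["gene_mask_start"]
print(record["gene"], mask_length)
```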
! cat ../../tests/data/seqs.fasta | cut -c 1-200
! decima attributions --model v1_rep0 --seqs ../../tests/data/seqs.fasta --tasks "cell_type == 'classical monocyte'" --output-prefix example/output_custom_seqs
decima - INFO - Using device: 0
decima - INFO - Loading model v1_rep0 and metadata to compute attributions...
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'rep0:latest', 720.03MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:00.9 (837.1MB/s)
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:02.0 (1562.9MB/s)
/home/celikm5/Projects/decima/src/decima/interpret/attributer.py:66: UserWarning: `off_tasks` is not provided. Using all other tasks as off_tasks.
Computing attributions...: 0%| | 0/2 [00:00<?, ?it/s]
Computing attributions...: 50%|█████████ | 1/2 [00:01<00:01, 1.38s/it]
Computing attributions...: 100%|██████████████████| 2/2 [00:02<00:00, 1.00it/s]
Computing attributions...: 100%|██████████████████| 2/2 [00:02<00:00, 1.08s/it]
decima - INFO - Saving sequences...
Saving sequences...: 0it [00:00, ?it/s]
Saving sequences...: 2it [00:00, 10965.50it/s]
decima - INFO - Loading model and metadata to compute attributions...
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:02.2 (1445.0MB/s)
decima - INFO - No genes provided, using all 2 genes in the attribution files.
Computing recursive seqlet calling...: 0%| | 0/2 [00:00<?, ?it/s]
Computing recursive seqlet calling...: 100%|█████| 2/2 [00:00<00:00, 597.44it/s]
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/memelite/fimo.py:406: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
The output for custom sequences also includes a seqs.fasta file containing the custom sequences. To visualize the predictions in IGV, first load this FASTA file and its index (.fai) into IGV, then load the bigwig and bed files.
! ls example/output_custom_seqs*
example/output_custom_seqs.attributions.bigwig
example/output_custom_seqs.attributions.h5
example/output_custom_seqs.motifs.tsv
example/output_custom_seqs.seqlets.bed
example/output_custom_seqs.seqs.fasta
example/output_custom_seqs.seqs.fasta.fai
example/output_custom_seqs.warnings.qc.log
Python User API¶
! ls example/output_classical_monoctypes.*
example/output_classical_monoctypes.attributions.bigwig
example/output_classical_monoctypes.attributions.h5
example/output_classical_monoctypes.motifs.tsv
example/output_classical_monoctypes.seqlets.bed
example/output_classical_monoctypes.warnings.qc.log
from decima.interpret.attributions import AttributionResult
with AttributionResult("example/output_classical_monoctypes.attributions.h5") as ar:
    seqs, attrs = ar.load(["SPI1"])
print("seqs:", seqs)
print("attrs:", attrs)
Loading attributions and sequences...: 0%| | 0/1 [00:00<?, ?it/s]
Loading attributions and sequences...: 100%|██████████| 1/1 [00:00<00:00, 432.31it/s]
seqs: [[[0. 0. 0. ... 0. 0. 0.]
[0. 0. 1. ... 0. 0. 0.]
[0. 1. 0. ... 1. 0. 1.]
[1. 0. 0. ... 0. 1. 0.]]]
attrs: [[[-5.10058962e-05 -3.67399698e-05 7.25216159e-06 ... -1.40011580e-05
-5.10658174e-06 -6.25329176e-06]
[-5.10058962e-05 -3.67399698e-05 -2.17564848e-05 ... -1.40011580e-05
-5.10658174e-06 -6.25329176e-06]
[-5.10058962e-05 1.10219909e-04 7.25216159e-06 ... 4.20034739e-05
-5.10658174e-06 1.87598753e-05]
[ 1.53017689e-04 -3.67399698e-05 7.25216159e-06 ... -1.40011580e-05
1.53197452e-05 -6.25329176e-06]]]
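The printed arrays suggest a layout of (genes, 4, sequence length), with one row per nucleotide channel. The sketch below, on small invented arrays of that shape, shows one common way to work with such output: decoding the one-hot sequence and keeping only the attribution of the base actually present at each position. The channel order A, C, G, T is an assumption here and should be checked against the encoder you use.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed channel order, not confirmed by the output above

# Toy stand-ins shaped like the arrays above: (genes, 4, length).
seqs = np.array([[[1, 0, 0, 0, 1],
                  [0, 1, 0, 0, 0],
                  [0, 0, 1, 0, 0],
                  [0, 0, 0, 1, 0]]], dtype=float)
attrs = np.array([[[0.5, -0.1, -0.1, -0.1, 0.2],
                   [-0.1, 0.3, -0.1, -0.1, 0.0],
                   [-0.1, -0.1, -0.4, -0.1, 0.0],
                   [-0.1, -0.1, -0.1, 0.1, 0.0]]])

# Recover the nucleotide string of the first gene from its one-hot channels.
decoded = "".join(ALPHABET[i] for i in seqs[0].argmax(axis=0))

# Multiply by the one-hot matrix so only the observed base contributes,
# then collapse the channel axis to get one score per position.
per_position = (seqs[0] * attrs[0]).sum(axis=0)

print(decoded)
print(per_position)
```

Note that `argmax` maps an all-zero column (for example an ambiguous base) to the first channel, so real sequences with such columns need separate handling.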
Let’s look at a simple example using Decima’s Python API to analyze the SPI1 gene, which is a key transcription factor in myeloid cell development. We’ll examine its regulation across different monocyte and macrophage cell types where it is known to be important.
First, we choose the cells we are interested in:
with AttributionResult("example/output_classical_monoctypes.attributions.h5") as ar:
    attribution = ar.load_attribution("SPI1")
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.8 (1731.2MB/s)
import matplotlib.pyplot as plt
attribution.plot_seqlogo(relative_loc=291)
plt.show()
import torch
from decima import predict_attributions_seqlet_calling
device = "cuda" if torch.cuda.is_available() else "cpu"
%matplotlib inline
spi1_cell_types = [
    "classical monocyte",
    "intermediate monocyte",
    "non-classical monocyte",
    "alveolar macrophage",
    "macrophage",
]
predict_attributions_seqlet_calling(
    output_prefix="example/attrs_SP1I_monoctypes",
    genes=["SPI1"],
    tasks=f"cell_type in {spi1_cell_types}",
    device=device,
)
wandb: Downloading large artifact 'rep0:latest', 720.03MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:00.6 (1180.1MB/s)
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.8 (1694.4MB/s)
/home/celikm5/Projects/decima/src/decima/interpret/attributer.py:66: UserWarning: `off_tasks` is not provided. Using all other tasks as off_tasks.
Computing attributions...: 0%| | 0/1 [00:00<?, ?it/s]
Computing attributions...: 100%|██████████| 1/1 [00:01<00:00, 1.51s/it]
Computing attributions...: 100%|██████████| 1/1 [00:01<00:00, 1.55s/it]
wandb: Downloading large artifact 'rep1:latest', 720.03MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.6 (442.2MB/s)
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.8 (1701.1MB/s)
/home/celikm5/Projects/decima/src/decima/interpret/attributer.py:66: UserWarning: `off_tasks` is not provided. Using all other tasks as off_tasks.
Computing attributions...: 0%| | 0/1 [00:00<?, ?it/s]
Computing attributions...: 100%|██████████| 1/1 [00:00<00:00, 1.10it/s]
Computing attributions...: 100%|██████████| 1/1 [00:00<00:00, 1.04it/s]
wandb: Downloading large artifact 'rep2:latest', 720.03MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.9 (387.4MB/s)
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.9 (1684.4MB/s)
/home/celikm5/Projects/decima/src/decima/interpret/attributer.py:66: UserWarning: `off_tasks` is not provided. Using all other tasks as off_tasks.
Computing attributions...: 0%| | 0/1 [00:00<?, ?it/s]
Computing attributions...: 100%|██████████| 1/1 [00:00<00:00, 1.07it/s]
Computing attributions...: 100%|██████████| 1/1 [00:00<00:00, 1.01it/s]
wandb: Downloading large artifact 'rep3:latest', 720.03MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.8 (402.4MB/s)
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.9 (1651.1MB/s)
/home/celikm5/Projects/decima/src/decima/interpret/attributer.py:66: UserWarning: `off_tasks` is not provided. Using all other tasks as off_tasks.
Computing attributions...: 0%| | 0/1 [00:00<?, ?it/s]
Computing attributions...: 100%|██████████| 1/1 [00:00<00:00, 1.08it/s]
Computing attributions...: 100%|██████████| 1/1 [00:00<00:00, 1.02it/s]
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.8 (1737.2MB/s)
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.9 (1658.9MB/s)
Computing recursive seqlet calling...: 0%| | 0/1 [00:00<?, ?it/s]
Computing recursive seqlet calling...: 100%|██████████| 1/1 [00:01<00:00, 1.08s/it]
Computing recursive seqlet calling...: 100%|██████████| 1/1 [00:01<00:00, 1.08s/it]
As with the command line, you can use the predict_save_attributions and recursive_seqlet_calling functions to compute attributions and call seqlets step by step.
Custom Sequences¶
Attributions for custom sequences can be calculated by passing a DataFrame with the columns seq, gene_mask_start and gene_mask_end. The index of the DataFrame is used as the gene names.
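As a sketch of the expected layout, the toy DataFrame below has the three columns named above and gene names in the index. The gene names, sequences and coordinates are invented and far shorter than real inputs, which would presumably need to match the model context size. The sanity check on the mask coordinates is a suggestion, not something Decima is known to require.

```python
import pandas as pd

# Toy table with the three expected columns; the index holds the gene names.
toy_seqs = pd.DataFrame(
    {
        "seq": ["ACGTACGTACGT", "TTGACCATGGCA"],
        "gene_mask_start": [2, 4],
        "gene_mask_end": [8, 10],
    },
    index=["geneA", "geneB"],
)

# Optional sanity check: each mask should lie inside its own sequence.
masks_ok = (
    (toy_seqs["gene_mask_start"] >= 0)
    & (toy_seqs["gene_mask_start"] < toy_seqs["gene_mask_end"])
    & (toy_seqs["gene_mask_end"] <= toy_seqs["seq"].str.len())
).all()
print(toy_seqs)
print("masks inside sequences:", masks_ok)
```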
import pandas as pd
df_seqs = pd.read_csv("../../tests/data/seqs.csv", index_col=0)
df_seqs
predict_attributions_seqlet_calling(
    output_prefix="example/attrs_custom_seqs_monoctypes",
    seqs=df_seqs,  # <-- custom sequences
    tasks=f"cell_type in {spi1_cell_types}",
    device=device,
)
! ls example/attrs_custom_seqs_monoctypes*
import random
import torch
from grelu.sequence.format import strings_to_one_hot
from decima.constants import DECIMA_CONTEXT_SIZE
DECIMA_CONTEXT_SIZE
524288
seqs = torch.cat(
    [
        strings_to_one_hot(
            ["".join(random.choice(["A", "T", "C", "G"]) for _ in range(DECIMA_CONTEXT_SIZE))]
        ),  # one-hot encoded sequence
        torch.ones(1, 1, DECIMA_CONTEXT_SIZE),  # binary mask for the gene
    ],
    dim=1,
)
seqs.shape
torch.Size([1, 5, 524288])
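The example above marks the whole sequence as gene by using a mask of ones. As a minimal sketch of the same five-channel layout with a mask that covers only part of the sequence, the NumPy code below builds a tiny input by hand. The sequence, the coordinates, the A, C, G, T channel order and the half-open interval convention are all assumptions made for illustration.

```python
import numpy as np

seq = "ACGTTGCA"                        # toy sequence, far shorter than the real context
gene_mask_start, gene_mask_end = 2, 6   # hypothetical coordinates, treated as half-open

alphabet = "ACGT"  # assumed channel order
one_hot = np.zeros((4, len(seq)), dtype=np.float32)
for pos, base in enumerate(seq):
    one_hot[alphabet.index(base), pos] = 1.0

# Fifth channel: 1 inside the gene region, 0 elsewhere.
mask = np.zeros((1, len(seq)), dtype=np.float32)
mask[0, gene_mask_start:gene_mask_end] = 1.0

# Stack the channels and add a leading batch axis: (batch, 5, length).
toy_input = np.concatenate([one_hot, mask], axis=0)[None]
print(toy_input.shape)
```

An array built this way can be converted with `torch.from_numpy` when a tensor is needed.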
predict_attributions_seqlet_calling(
    output_prefix="example/attrs_custom_tensors_monoctypes",
    # Custom sequences as a torch.Tensor of shape (batch_size, 5, seq_len):
    # the second dimension holds the one-hot encoded sequence and the binary gene mask.
    seqs=seqs,
    tasks=f"cell_type in {spi1_cell_types}",
    device=device,
    model=0,
    threshold=1e-6,
)
wandb: Downloading large artifact 'rep0:latest', 720.03MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:00.6 (1145.4MB/s)
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.8 (1748.9MB/s)
/home/celikm5/Projects/decima/src/decima/interpret/attributer.py:66: UserWarning: `off_tasks` is not provided. Using all other tasks as off_tasks.
Computing attributions...: 0%| | 0/1 [00:00<?, ?it/s]
Computing attributions...: 100%|██████████| 1/1 [00:00<00:00, 1.08it/s]
Computing attributions...: 100%|██████████| 1/1 [00:00<00:00, 1.02it/s]
Saving sequences...: 0it [00:00, ?it/s]
Saving sequences...: 1it [00:00, 8525.01it/s]
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.7 (1786.9MB/s)
Computing recursive seqlet calling...: 0%| | 0/1 [00:00<?, ?it/s]
Computing recursive seqlet calling...: 100%|██████████| 1/1 [00:00<00:00, 1403.25it/s]
! ls example/attrs_custom_tensors_monoctypes*
example/attrs_custom_tensors_monoctypes.attributions.bigwig
example/attrs_custom_tensors_monoctypes.attributions.h5
example/attrs_custom_tensors_monoctypes.motifs.tsv
example/attrs_custom_tensors_monoctypes.seqlets.bed
example/attrs_custom_tensors_monoctypes.seqs.fasta
example/attrs_custom_tensors_monoctypes.seqs.fasta.fai
example/attrs_custom_tensors_monoctypes.warnings.qc.log
Advanced Developer API¶
DecimaResult provides a unified interface for working with Decima results in anndata format. It contains an AnnData structure storing cell x gene expression data and metadata. Through DecimaResult, users can load pre-trained models, compute attributions to understand genomic regulation, and analyze results through visualizations or export to genomic file formats. The object provides convenient access to cell and gene annotations through its metadata properties.
from decima import DecimaResult
result = DecimaResult.load()
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:01.9 (1639.9MB/s)
result.cell_metadata.query("cell_type.str.endswith('macrophage')")
| | cell_type | tissue | organ | disease | study | dataset | region | subregion | celltype_coarse | n_cells | total_counts | n_genes | size_factor | train_pearson | val_pearson | test_pearson |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| agg_4063 | alveolar macrophage | alveolar system | lung | COVID-19 | GSE155249 | scimilarity | nan | nan | NaN | 1453 | 8.001524e+06 | 14711 | 36293.472025 | 0.943059 | 0.837210 | 0.849998 |
| agg_4064 | alveolar macrophage | alveolar system | lung | healthy | GSE155249 | scimilarity | nan | nan | NaN | 1279 | 7.598244e+06 | 13673 | 34158.514496 | 0.932819 | 0.831024 | 0.843684 |
| agg_4065 | alveolar macrophage | left lung | lung | NA | ENCODE | scimilarity | nan | nan | NaN | 405 | 3.000961e+06 | 16595 | 46501.375857 | 0.936081 | 0.847924 | 0.845485 |
| agg_4066 | alveolar macrophage | lingula of left lung | lung | healthy | a3ffde6c-7ad2-498a-903c-d58e732f7470 | scimilarity | nan | nan | NaN | 854 | 1.713753e+06 | 15110 | 42773.009735 | 0.893927 | 0.806000 | 0.804835 |
| agg_4067 | alveolar macrophage | lower lobe of left lung | lung | NA | ENCODE | scimilarity | nan | nan | NaN | 763 | 1.344798e+07 | 17973 | 49020.804487 | 0.940586 | 0.854680 | 0.863014 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| agg_6644 | macrophage | uterus | uterus | healthy | 32f2fd23-ec74-486f-9544-e5b2f41725f5 | scimilarity | nan | nan | NaN | 425 | 4.340830e+06 | 15233 | 36624.136739 | 0.954753 | 0.850247 | 0.843175 |
| agg_6645 | macrophage | uterus | uterus | healthy | e5f58829-1a66-40b5-a624-9046778e74f5 | scimilarity | nan | nan | NaN | 231 | 3.007554e+07 | 14787 | 27615.762157 | 0.839476 | 0.730554 | 0.719085 |
| agg_6646 | macrophage | vasculature | vasculature | healthy | e5f58829-1a66-40b5-a624-9046778e74f5 | scimilarity | nan | nan | NaN | 12497 | 4.040685e+08 | 18199 | 36829.498964 | 0.938862 | 0.836819 | 0.833474 |
| agg_6647 | macrophage | visceral fat | adipose | obesity | GSE128518 | scimilarity | nan | nan | NaN | 729 | 2.078431e+06 | 13760 | 34188.716187 | 0.941596 | 0.827360 | 0.823912 |
| agg_6648 | macrophage | white adipose tissue | adipose | NA | GSE128890 | scimilarity | nan | nan | NaN | 45 | 1.381560e+05 | 8257 | 27604.748095 | 0.859386 | 0.745328 | 0.745539 |
325 rows × 16 columns
The results and metadata are stored in AnnData format, which you can access directly if needed, but most operations are supported by the DecimaResult object itself.
result.anndata
AnnData object with n_obs × n_vars = 8856 × 18457
obs: 'cell_type', 'tissue', 'organ', 'disease', 'study', 'dataset', 'region', 'subregion', 'celltype_coarse', 'n_cells', 'total_counts', 'n_genes', 'size_factor', 'train_pearson', 'val_pearson', 'test_pearson'
var: 'chrom', 'start', 'end', 'strand', 'gene_type', 'frac_nan', 'mean_counts', 'n_tracks', 'gene_start', 'gene_end', 'gene_length', 'gene_mask_start', 'gene_mask_end', 'frac_N', 'fold', 'dataset', 'gene_id', 'pearson', 'size_factor_pearson', 'ensembl_canonical_tss'
layers: 'preds', 'v1_rep0', 'v1_rep1', 'v1_rep2', 'v1_rep3'
These are the cell metadata contained in the Decima object.
result.cell_metadata
| | cell_type | tissue | organ | disease | study | dataset | region | subregion | celltype_coarse | n_cells | total_counts | n_genes | size_factor | train_pearson | val_pearson | test_pearson |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| agg_0 | Amygdala excitatory | Amygdala_Amygdala | CNS | healthy | jhpce#tran2021 | brain_atlas | Amygdala | Amygdala | NaN | 331 | 1.592883e+07 | 17000 | 41431.465186 | 0.942459 | 0.841377 | 0.865640 |
| agg_1 | Amygdala excitatory | Amygdala_Basolateral nuclear group (BLN) - lat... | CNS | healthy | SCR_016152 | brain_atlas | Amygdala | Basolateral nuclear group (BLN) - lateral nucl... | NaN | 11369 | 2.952133e+08 | 18080 | 40765.341481 | 0.943098 | 0.838936 | 0.861092 |
| agg_2 | Amygdala excitatory | Amygdala_Bed nucleus of stria terminalis and n... | CNS | healthy | SCR_016152 | brain_atlas | Amygdala | Bed nucleus of stria terminalis and nearby - BNST | NaN | 139 | 2.593231e+06 | 15418 | 42556.387020 | 0.952170 | 0.854544 | 0.866654 |
| agg_3 | Amygdala excitatory | Amygdala_Central nuclear group - CEN | CNS | healthy | SCR_016152 | brain_atlas | Amygdala | Central nuclear group - CEN | NaN | 3892 | 9.946371e+07 | 17959 | 42884.641430 | 0.959744 | 0.863585 | 0.881554 |
| agg_4 | Amygdala excitatory | Amygdala_Corticomedial nuclear group (CMN) - a... | CNS | healthy | SCR_016152 | brain_atlas | Amygdala | Corticomedial nuclear group (CMN) - anterior c... | NaN | 2945 | 1.281619e+08 | 17885 | 41816.741933 | 0.951365 | 0.854304 | 0.868902 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| agg_9533 | vascular associated smooth muscle cell | upper lobe of right lung | lung | NA | ENCODE | scimilarity | nan | nan | NaN | 21 | 3.483375e+04 | 8515 | 35404.911768 | 0.735213 | 0.665647 | 0.654491 |
| agg_9535 | vascular associated smooth muscle cell | urinary bladder | urinary | healthy | GSE129845 | scimilarity | nan | nan | NaN | 24 | 8.498500e+04 | 7337 | 26189.415789 | 0.809852 | 0.690022 | 0.656160 |
| agg_9536 | vascular associated smooth muscle cell | uterus | uterus | NA | ENCODE | scimilarity | nan | nan | NaN | 272 | 5.700762e+05 | 14769 | 44938.403867 | 0.915329 | 0.808941 | 0.839993 |
| agg_9537 | vascular associated smooth muscle cell | uterus | uterus | healthy | e5f58829-1a66-40b5-a624-9046778e74f5 | scimilarity | nan | nan | NaN | 472 | 1.089170e+07 | 14514 | 30145.422152 | 0.852339 | 0.717682 | 0.727469 |
| agg_9538 | vascular associated smooth muscle cell | vasculature | vasculature | healthy | e5f58829-1a66-40b5-a624-9046778e74f5 | scimilarity | nan | nan | NaN | 1853 | 5.992697e+07 | 16764 | 36464.273371 | 0.909855 | 0.780413 | 0.796351 |
8856 rows × 16 columns
Similarly, these are the gene metadata contained in the Decima object.
result.gene_metadata
| | chrom | start | end | strand | gene_type | frac_nan | mean_counts | n_tracks | gene_start | gene_end | gene_length | gene_mask_start | gene_mask_end | frac_N | fold | dataset | gene_id | pearson | size_factor_pearson | ensembl_canonical_tss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| STRADA | chr17 | 63381538 | 63905826 | - | protein_coding | 0.000000 | 2.208074 | 7616 | 63682336 | 63741986 | 59650 | 163840 | 223490 | 0.000000 | ['fold1'] | train | ENSG00000266173 | 0.469923 | 0.476627 | 63741799.0 |
| ETV4 | chr17 | 43219172 | 43743460 | - | protein_coding | 0.030873 | 0.925863 | 5004 | 43527844 | 43579620 | 51776 | 163840 | 215616 | 0.000000 | ['fold1'] | train | ENSG00000175832 | 0.738092 | 0.613281 | 43546340.0 |
| USP25 | chr21 | 15566185 | 16090473 | + | protein_coding | 0.000000 | 3.650355 | 8604 | 15730025 | 15880069 | 150044 | 163840 | 313884 | 0.000000 | ['fold6'] | train | ENSG00000155313 | 0.905222 | 0.784446 | 15729982.0 |
| ZSWIM5 | chr1 | 44945761 | 45470049 | - | protein_coding | 0.000620 | 2.190115 | 6123 | 45016399 | 45306209 | 289810 | 163840 | 453650 | 0.000000 | ['fold5'] | train | ENSG00000162415 | 0.961772 | 0.795131 | 45206605.0 |
| C21orf58 | chr21 | 45963427 | 46487715 | - | protein_coding | 0.000791 | 1.650467 | 7354 | 46300181 | 46323875 | 23694 | 163840 | 187534 | 0.000000 | ['fold6'] | train | ENSG00000160298 | 0.645268 | 0.412368 | 46323870.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| NPDC1 | chr9 | 136685731 | 137210019 | - | protein_coding | 0.000000 | 2.625285 | 7852 | 137039463 | 137046179 | 6716 | 163840 | 170556 | 0.000000 | ['fold3'] | test | ENSG00000107281 | 0.316322 | 0.178204 | 137046177.0 |
| ZNF425 | chr7 | 148765876 | 149290164 | - | protein_coding | 0.001048 | 1.292957 | 6511 | 149102784 | 149126324 | 23540 | 163840 | 187380 | 0.000000 | ['fold7'] | train | ENSG00000204947 | 0.821292 | 0.737081 | 149126324.0 |
| COL5A1 | chr9 | 134477934 | 135002222 | + | protein_coding | 0.002159 | 1.492664 | 6209 | 134641774 | 134844843 | 203069 | 163840 | 366909 | 0.000000 | ['fold3'] | test | ENSG00000130635 | 0.766624 | 0.456999 | 134641803.0 |
| BRD3 | chr9 | 133708087 | 134232375 | - | protein_coding | 0.000000 | 3.190450 | 8675 | 134030305 | 134068535 | 38230 | 163840 | 202070 | 0.004662 | ['fold3'] | test | ENSG00000169925 | 0.344062 | 0.280283 | 134068026.0 |
| EVI5L | chr19 | 7666393 | 8190681 | + | protein_coding | 0.000000 | 1.959605 | 7570 | 7830233 | 7864976 | 34743 | 163840 | 198583 | 0.000000 | ['fold3'] | test | ENSG00000142459 | 0.810152 | 0.704828 | 7830218.0 |
18457 rows × 20 columns
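The coordinate columns in this table appear to follow a simple pattern: every window shown spans 524,288 bp (`end - start`), and `gene_mask_start` is 163,840 for every gene. The sketch below reproduces `gene_mask_start` and `gene_mask_end` from the other columns. It assumes, based only on the rows above and not on Decima's source, that window coordinates run in the direction of transcription, so minus-strand genes are measured back from the window end. `gene_mask` is a hypothetical helper, not part of the Decima API.

```python
WINDOW = 524_288      # end - start for every row shown above
UPSTREAM = 163_840    # gene_mask_start for every row shown above


def gene_mask(start, end, strand, gene_start, gene_end):
    """Return (mask_start, mask_end) in window coordinates.

    Assumption: window coordinates run 5'->3' along the gene's strand,
    so a minus-strand gene is measured back from the window end.
    """
    assert end - start == WINDOW
    if strand == "+":
        mask_start = gene_start - start
    else:
        mask_start = end - gene_end
    return mask_start, mask_start + (gene_end - gene_start)


# Values copied from the STRADA (-) and USP25 (+) rows of the table.
strada = gene_mask(63381538, 63905826, "-", 63682336, 63741986)
usp25 = gene_mask(15566185, 16090473, "+", 15730025, 15880069)
print(strada)  # (163840, 223490)
print(usp25)   # (163840, 313884)
```

Both results match the `gene_mask_start` and `gene_mask_end` values listed for those genes, which supports the assumed layout for these rows.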
You can also access the genes and cells:
result.genes
Index(['STRADA', 'ETV4', 'USP25', 'ZSWIM5', 'C21orf58', 'MIR497HG', 'CFAP74',
'GSE1', 'LPP', 'CLK1',
...
'STRIP2', 'TNFRSF1A', 'RBM14-RBM4', 'C1orf21', 'LINC00639', 'NPDC1',
'ZNF425', 'COL5A1', 'BRD3', 'EVI5L'],
dtype='object', length=18457)
Cell indices can also be accessed:
result.cells
Index(['agg_0', 'agg_1', 'agg_2', 'agg_3', 'agg_4', 'agg_5', 'agg_6', 'agg_7',
'agg_8', 'agg_9',
...
'agg_9528', 'agg_9529', 'agg_9530', 'agg_9531', 'agg_9532', 'agg_9533',
'agg_9535', 'agg_9536', 'agg_9537', 'agg_9538'],
dtype='object', length=8856)
Predicted expression for a specific gene can be accessed:
result.predicted_expression_matrix(genes=["SPI1"])
| | SPI1 |
|---|---|
| agg_0 | 0.256442 |
| agg_1 | 0.221014 |
| agg_2 | 0.179371 |
| agg_3 | 0.219646 |
| agg_4 | 0.217516 |
| ... | ... |
| agg_9533 | 0.493780 |
| agg_9535 | 0.292091 |
| agg_9536 | 0.370765 |
| agg_9537 | 0.168036 |
| agg_9538 | 0.239733 |
8856 rows × 1 columns
Or for all the genes:
result.predicted_expression_matrix()
| | STRADA | ETV4 | USP25 | ZSWIM5 | C21orf58 | MIR497HG | CFAP74 | GSE1 | LPP | CLK1 | ... | STRIP2 | TNFRSF1A | RBM14-RBM4 | C1orf21 | LINC00639 | NPDC1 | ZNF425 | COL5A1 | BRD3 | EVI5L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| agg_0 | 2.973438 | 1.845565 | 4.592531 | 5.099802 | 1.774879 | 0.356812 | 2.590836 | 4.629774 | 4.897171 | 3.326940 | ... | 2.836060 | 0.297015 | 1.883849 | 4.293593 | 1.463565 | 3.183534 | 2.340202 | 2.374942 | 2.911916 | 3.230072 |
| agg_1 | 2.954213 | 1.896726 | 4.688557 | 5.510440 | 1.666929 | 0.352725 | 2.292625 | 4.459535 | 4.915286 | 3.192858 | ... | 3.125704 | 0.242543 | 1.908177 | 4.439424 | 1.236739 | 3.494824 | 2.425672 | 2.054568 | 2.713408 | 3.491463 |
| agg_2 | 2.938851 | 2.197247 | 4.861410 | 5.617520 | 1.773381 | 0.380867 | 2.394917 | 4.415038 | 4.836399 | 3.390717 | ... | 3.082098 | 0.263285 | 2.006456 | 4.383455 | 1.208590 | 4.013819 | 2.408381 | 2.297343 | 2.892222 | 3.695785 |
| agg_3 | 3.045972 | 2.138573 | 4.863791 | 5.273604 | 1.760097 | 0.463555 | 2.391702 | 3.940975 | 4.857763 | 3.410926 | ... | 2.882890 | 0.290327 | 1.922963 | 4.550189 | 1.430520 | 3.693118 | 2.297103 | 2.121887 | 2.626117 | 3.223912 |
| agg_4 | 3.025518 | 2.019096 | 4.602948 | 5.257001 | 1.755338 | 0.382190 | 2.432810 | 4.392480 | 4.959488 | 3.250500 | ... | 3.082296 | 0.258540 | 2.038277 | 4.464807 | 1.249043 | 3.665800 | 2.400820 | 2.255862 | 2.925619 | 3.471005 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| agg_9533 | 2.333562 | 0.633322 | 4.675825 | 2.793023 | 0.752030 | 0.692083 | 0.503531 | 4.327948 | 6.903193 | 3.695593 | ... | 0.549795 | 2.270181 | 1.563218 | 4.395422 | 0.550088 | 1.330252 | 1.044471 | 3.759369 | 2.491346 | 1.872717 |
| agg_9535 | 0.835037 | 0.358773 | 1.964896 | 0.307449 | 0.337240 | 0.834196 | 0.093885 | 1.853794 | 3.700790 | 4.467302 | ... | 0.176885 | 1.370898 | 1.022708 | 3.400267 | 0.052162 | 1.908870 | 0.253417 | 1.448111 | 1.622033 | 1.064292 |
| agg_9536 | 3.008039 | 1.209324 | 4.798392 | 3.931870 | 1.401328 | 1.638555 | 0.969720 | 4.779201 | 6.631931 | 4.127797 | ... | 1.174298 | 1.870530 | 2.506874 | 5.151776 | 0.967644 | 1.809947 | 2.205356 | 4.244005 | 2.974467 | 2.659873 |
| agg_9537 | 1.241936 | 0.455059 | 2.919995 | 0.571672 | 0.486448 | 1.175586 | 0.145397 | 2.412148 | 4.759118 | 4.913945 | ... | 0.371035 | 1.361073 | 1.668085 | 4.005738 | 0.078611 | 1.571750 | 0.508187 | 2.067150 | 2.323764 | 1.429850 |
| agg_9538 | 1.715507 | 0.700955 | 3.044732 | 0.858696 | 0.903406 | 1.763168 | 0.215304 | 2.604478 | 4.549708 | 4.839124 | ... | 0.594310 | 1.801298 | 2.075996 | 3.933860 | 0.165590 | 1.970268 | 0.993521 | 2.232347 | 2.473388 | 1.902884 |
8856 rows × 18457 columns
result.load_model(device=device)
wandb: WARNING A graphql request initiated by the public wandb API timed out (timeout=19 sec). Create a new API with an integer timeout larger than 19, e.g., `api = wandb.Api(timeout=29)` to increase the graphql timeout.
wandb: Downloading large artifact 'rep0:latest', 720.03MB. 1 files...
wandb: 1 of 1 files downloaded.
Done. 00:00:00.7 (1008.5MB/s)
DecimaResult(anndata=AnnData object with n_obs × n_vars = 8856 × 18457
obs: 'cell_type', 'tissue', 'organ', 'disease', 'study', 'dataset', 'region', 'subregion', 'celltype_coarse', 'n_cells', 'total_counts', 'n_genes', 'size_factor', 'train_pearson', 'val_pearson', 'test_pearson'
var: 'chrom', 'start', 'end', 'strand', 'gene_type', 'frac_nan', 'mean_counts', 'n_tracks', 'gene_start', 'gene_end', 'gene_length', 'gene_mask_start', 'gene_mask_end', 'frac_N', 'fold', 'dataset', 'gene_id', 'pearson', 'size_factor_pearson', 'ensembl_canonical_tss'
layers: 'preds', 'v1_rep0', 'v1_rep1', 'v1_rep2', 'v1_rep3')
Compute attributions for the SPI1 gene.
This takes around 10 seconds on a GPU and around 5 minutes on a CPU.
attrs = result.attributions(
    gene="SPI1",
    tasks=result.query_cells(f"cell_type in {spi1_cell_types}"),
    off_tasks=result.query_cells(f'organ == "blood" and cell_type not in {spi1_cell_types}'),
)
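The `tasks` and `off_tasks` arguments suggest that attributions are computed for a contrast between on-target cells and off-target cells rather than for raw expression. The toy sketch below shows one plausible form of such a contrast: the mean prediction over `tasks` minus the mean over `off_tasks`. This is an assumption made for illustration; how Decima actually aggregates the two groups is not shown in this document, and `specificity` here is a hypothetical function with made-up numbers.

```python
from statistics import mean


def specificity(preds, on_idx, off_idx):
    """Mean prediction over on-target cells minus mean over off-target cells.

    Assumed definition for illustration only; not taken from Decima.
    """
    on = mean(preds[i] for i in on_idx)
    off = mean(preds[i] for i in off_idx)
    return on - off


# Toy predictions for one gene across five pseudobulk cells (made-up numbers).
preds = [3.0, 2.5, 0.5, 1.0, 0.0]
score = specificity(preds, on_idx=[0, 1], off_idx=[2, 3, 4])
print(score)  # 2.25
```

Under this reading, a sequence feature gets a high attribution only if it raises expression in the selected cell types more than in the background cell types.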
Attributions can be visualized and processed with the attributions object:
attrs.peaks
| | peak | start | end | attribution | p-value | from_tss |
|---|---|---|---|---|---|---|
| 0 | pos.SPI1@37 | 163877 | 163902 | 12.817252 | 2.186883e-11 | 37 |
| 1 | pos.SPI1@-121 | 163719 | 163744 | 5.595659 | 1.899081e-05 | -121 |
| 2 | pos.SPI1@-57 | 163783 | 163803 | 9.307484 | 3.054640e-05 | -57 |
| 3 | pos.SPI1@62 | 163902 | 163909 | 1.281183 | 3.068997e-05 | 62 |
| 4 | pos.SPI1@-79 | 163761 | 163765 | 0.833269 | 6.109865e-05 | -79 |
| ... | ... | ... | ... | ... | ... | ... |
| 72 | neg.SPI1@443 | 164283 | 164293 | -0.717349 | 4.916059e-04 | 443 |
| 73 | neg.SPI1@23600 | 187440 | 187445 | -0.267438 | 4.916059e-04 | 23600 |
| 74 | neg.SPI1@32783 | 196623 | 196630 | -0.461813 | 4.918151e-04 | 32783 |
| 75 | neg.SPI1@1735 | 165575 | 165592 | -1.437498 | 4.918151e-04 | 1735 |
| 76 | neg.SPI1@31668 | 195508 | 195512 | -0.213403 | 4.918151e-04 | 31668 |
135 rows × 6 columns
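The peak names in this table encode the sign of the attribution, the gene, and the distance from the TSS. In the rows shown, `from_tss` equals `start - 163840`, the same offset that appears as `gene_mask_start` in the gene metadata. The sketch below parses a peak name and recomputes that distance. `describe_peak` and `TSS_OFFSET` are hypothetical names, and the offset is inferred from the tables rather than read from Decima.

```python
TSS_OFFSET = 163_840  # assumed window position of the TSS (matches gene_mask_start)


def describe_peak(name, start):
    """Split a peak name like 'pos.SPI1@37' and recompute its TSS distance."""
    sign, rest = name.split(".", 1)
    gene, offset = rest.rsplit("@", 1)
    return {
        "sign": sign,
        "gene": gene,
        "from_tss": int(offset),
        "recomputed": start - TSS_OFFSET,
    }


# Name and start copied from the first row of the table above.
peak = describe_peak("pos.SPI1@37", 163877)
print(peak)
```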
attrs.peaks_to_bed()
| | chrom | start | end | name | score | strand | attribution |
|---|---|---|---|---|---|---|---|
| 38 | chr11 | 47216350 | 47216357 | pos.SPI1@162219 | 3.33494 | . | 0.543797 |
| 49 | chr11 | 47257597 | 47257605 | pos.SPI1@120971 | 3.31931 | . | 0.680714 |
| 65 | chr11 | 47257633 | 47257637 | neg.SPI1@120939 | 3.31455 | . | -0.221530 |
| 63 | chr11 | 47257734 | 47257739 | neg.SPI1@120837 | 3.32086 | . | -0.273840 |
| 43 | chr11 | 47345731 | 47345736 | neg.SPI1@32840 | 3.35317 | . | -0.298483 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 69 | chr11 | 47395483 | 47395492 | neg.SPI1@-16916 | 3.31094 | . | -0.567760 |
| 39 | chr11 | 47400211 | 47400221 | neg.SPI1@-21645 | 3.35527 | . | -0.900000 |
| 37 | chr11 | 47400225 | 47400235 | neg.SPI1@-21659 | 3.35844 | . | -0.729126 |
| 68 | chr11 | 47400376 | 47400382 | neg.SPI1@-21806 | 3.31094 | . | -0.329538 |
| 58 | chr11 | 47400703 | 47400709 | neg.SPI1@-22133 | 3.33067 | . | -0.325769 |
135 rows × 7 columns
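The BED coordinates above are genomic, whereas `attrs.peaks` reports positions within the 524,288 bp input window. For the rows shown, the two are consistent with a minus-strand window spanning chr11:47018128-47542416. The sketch below performs that conversion. The window bounds are inferred from the tables by arithmetic, not read from Decima, and `to_genomic` is a hypothetical helper.

```python
WINDOW_START = 47_018_128  # inferred from the tables above, not read from Decima
WINDOW_END = 47_542_416    # WINDOW_START + 524_288


def to_genomic(rel_start, rel_end, strand="-"):
    """Map a window-relative interval to genomic coordinates."""
    if strand == "+":
        return WINDOW_START + rel_start, WINDOW_START + rel_end
    # Minus strand: positions are counted back from the window end,
    # so the interval's start and end swap roles.
    return WINDOW_END - rel_end, WINDOW_END - rel_start


# neg.SPI1@-16916 is 9 bp long and starts 16,916 bp before window position 163,840.
rel_start = 163_840 - 16_916
interval = to_genomic(rel_start, rel_start + 9)
print(interval)  # (47395483, 47395492)
```

The result matches the `start` and `end` listed for `neg.SPI1@-16916` in the BED table.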
import matplotlib.pyplot as plt
attrs.plot_seqlogo(relative_loc=-45)
plt.show()
This command takes around 1 minute and detects motifs in the attributions using FIMO. The motifs are ranked by their attribution scores:
df_motifs = attrs.scan_motifs()
df_motifs
| | motif | peak | start | end | strand | score | p-value | matched_seq | site_attr_score | motif_attr_score | from_tss |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3874 | ZNF746.H13CORE.0.PSG.A | neg.SPI1@1917 | 165744 | 165770 | - | 26.355452 | 7.435330e-10 | AGGGAGGAGGGAGGAAGGTGGGAGGA | -0.010775 | -0.016253 | 1904 |
| 3453 | ZN263.H13CORE.1.P.B | neg.SPI1@1898 | 165732 | 165753 | + | 24.008722 | 1.311946e-09 | GGGGAGGAGGACAGGGAGGAG | -0.006567 | -0.016637 | 1892 |
| 781 | ZN479.H13CORE.0.P.C | neg.SPI1@-174 | 163668 | 163686 | - | 22.937369 | 2.837623e-09 | GCCCCCAAAGTCATCCCT | -0.007155 | -0.013835 | -172 |
| 1036 | ZNF746.H13CORE.0.PSG.A | neg.SPI1@-191 | 163639 | 163665 | + | 24.462995 | 3.833248e-09 | TCTCCCTCCCATCCTCCCTCCCCAGC | -0.002449 | -0.001297 | -201 |
| 3545 | ZNF746.H13CORE.0.PSG.A | neg.SPI1@1898 | 165732 | 165758 | - | 23.523391 | 7.853286e-09 | GGGGAGGAGGACAGGGAGGAGGGAGG | -0.005327 | -0.010747 | 1892 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1088 | CREB3.H13CORE.0.SM.B | neg.SPI1@-21 | 163819 | 163833 | + | 1.754682 | 4.999340e-04 | GCGGTGATGTCACC | -0.206348 | -0.585193 | -21 |
| 2067 | RXRB.H13CORE.2.PS.A | neg.SPI1@1182 | 165019 | 165030 | - | 12.213856 | NaN | CCATGACCTCT | -0.008323 | -0.024233 | 1179 |
| 2913 | KLF7.H13CORE.0.P.B | neg.SPI1@1813 | 165662 | 165672 | + | 15.217368 | NaN | GGGGGCGGGG | 0.008973 | 0.025625 | 1822 |
| 2986 | KLF7.H13CORE.0.P.B | neg.SPI1@1832 | 165662 | 165672 | + | 15.217368 | NaN | GGGGGCGGGG | 0.008973 | 0.025625 | 1822 |
| 7451 | KLF7.H13CORE.0.P.B | pos.SPI1@1820 | 165662 | 165672 | + | 15.217368 | NaN | GGGGGCGGGG | 0.008973 | 0.025625 | 1822 |
8556 rows × 11 columns
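Some hits in this table have a `NaN` p-value, and a strong sequence match does not necessarily carry a large attribution. One way to shortlist hits is sketched below with plain Python and a few rows copied from the table: it drops hits without a p-value and ranks the rest by absolute `site_attr_score`. The choice of ranking key is an illustration, not a recommendation from the Decima authors.

```python
import math

# A few hits copied from the table above: (motif, p_value, site_attr_score).
hits = [
    ("ZNF746.H13CORE.0.PSG.A", 7.435330e-10, -0.010775),
    ("ZN263.H13CORE.1.P.B", 1.311946e-09, -0.006567),
    ("CREB3.H13CORE.0.SM.B", 4.999340e-04, -0.206348),
    ("RXRB.H13CORE.2.PS.A", float("nan"), -0.008323),
    ("KLF7.H13CORE.0.P.B", float("nan"), 0.008973),
]

# Drop hits without a p-value, then rank by absolute site attribution.
valid = [h for h in hits if not math.isnan(h[1])]
ranked = sorted(valid, key=lambda h: abs(h[2]), reverse=True)

for motif, p_value, attr in ranked:
    print(f"{motif}\t{p_value:.2e}\t{attr:+.6f}")
```

On the real DataFrame, the first step corresponds to `df_motifs.dropna(subset=["p-value"])`.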
If you only need the attribution tensor for a one-hot encoded input sequence, prepare the input yourself and call the attributer directly:
import torch

# Stack the 4 one-hot channels and the gene mask, then add a batch dimension.
one_hot_seq, gene_mask = result.prepare_one_hot("SPI1")
inputs = torch.vstack([one_hot_seq, gene_mask]).unsqueeze(0)
inputs.shape  # (batch_size, 5, seq_len)
torch.Size([1, 5, 524288])
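The shape above has five channels: four for the one-hot encoded bases and one for the gene mask. The toy sketch below builds the same layout for an 8 bp sequence, with plain Python lists standing in for tensors. The `ACGT` channel order, the all-zero encoding of `N`, and a mask that is 1 inside the gene interval are assumptions for illustration; `encode` is a hypothetical function, not Decima's `prepare_one_hot`.

```python
BASES = "ACGT"  # channel order is an assumption for illustration


def encode(seq, mask_start, mask_end):
    """Return 5 rows of length len(seq): four one-hot rows plus a gene mask."""
    one_hot = [[1.0 if base == b else 0.0 for base in seq] for b in BASES]
    gene_mask = [1.0 if mask_start <= i < mask_end else 0.0 for i in range(len(seq))]
    return one_hot + [gene_mask]


channels = encode("ACGTNACG", mask_start=2, mask_end=6)
shape = (len(channels), len(channels[0]))
print(shape)  # (5, 8)
```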
from decima.interpret.attributer import DecimaAttributer
attributer = DecimaAttributer(
    model=result.model,
    tasks=result.query_cells(f"cell_type in {spi1_cell_types}"),
    off_tasks=result.query_cells(f'organ == "blood" and cell_type not in {spi1_cell_types}'),
    transform="specificity",
    method="inputxgradient",
)
attrs = attributer.attribute(inputs=inputs)
attrs # (batch_size, 4, seq_len) gene mask is removed from final attributions
tensor([[[-0.0000e+00, 0.0000e+00, -0.0000e+00, ..., -0.0000e+00,
0.0000e+00, 0.0000e+00],
[-0.0000e+00, -0.0000e+00, -2.6888e-05, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[-0.0000e+00, 1.2651e-04, 0.0000e+00, ..., 3.7016e-05,
-0.0000e+00, 1.5136e-05],
[ 1.7333e-04, -0.0000e+00, 0.0000e+00, ..., -0.0000e+00,
1.2473e-05, -0.0000e+00]]], device='cuda:0')
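Most entries in the tensor above are zero, with one nonzero value per position. That is what the input × gradient method produces for one-hot input: each gradient is multiplied by the input at the same channel and position, so only the base actually present in the sequence keeps its gradient. A minimal pure-Python sketch with made-up gradients follows; the row order and the numbers are illustrative, not taken from the model.

```python
def input_x_gradient(one_hot, grads):
    """Multiply each gradient by the one-hot input at the same channel/position."""
    return [[x * g for x, g in zip(x_row, g_row)] for x_row, g_row in zip(one_hot, grads)]


# Sequence "TGC" one-hot encoded with rows A, C, G, T (row order is illustrative).
one_hot = [
    [0.0, 0.0, 0.0],  # A
    [0.0, 0.0, 1.0],  # C
    [0.0, 1.0, 0.0],  # G
    [1.0, 0.0, 0.0],  # T
]
# Made-up gradients of the model output with respect to every input entry.
grads = [
    [0.3, -0.2, 0.1],
    [-0.1, 0.4, -0.5],
    [0.2, 0.6, 0.0],
    [0.7, -0.3, 0.2],
]
attr = input_x_gradient(one_hot, grads)
nonzero_per_position = [sum(1 for row in attr if row[i] != 0.0) for i in range(3)]
print(nonzero_per_position)  # [1, 1, 1]
```

Multiplying a zero input by a negative gradient gives a signed zero, which is why `-0.0000e+00` appears in the printed tensor.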