Attribution and Motifs Detection with Decima

This documentation demonstrates how to use Decima’s attribution analysis capabilities to identify important regulatory regions in genomic sequences and discover transcription factor binding motifs within those regions. Attribution analysis helps reveal which parts of the DNA sequence most strongly influence gene expression predictions, while motif scanning can identify specific transcription factor binding sites in these regions of interest.

CLI API

Let’s look at a simple example using Decima’s CLI API to analyze the SPI1 and BRD3 genes. SPI1 is a key transcription factor in myeloid cell development. We’ll examine its regulation across different monocyte and macrophage cell types where it is known to be important.

! decima attributions --help
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'repr' attribute with value False was provided to the `Field()` function, which has no effect in the context it was used. 'repr' is field-specific metadata, and can only be attached to a model field using `Annotated` metadata or by assignment. This may have happened because an `Annotated` type alias using the `type` statement was used, or if the `Field()` function was attached to a single member of a union type.
  warnings.warn(
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'frozen' attribute with value True was provided to the `Field()` function, which has no effect in the context it was used. 'frozen' is field-specific metadata, and can only be attached to a model field using `Annotated` metadata or by assignment. This may have happened because an `Annotated` type alias using the `type` statement was used, or if the `Field()` function was attached to a single member of a union type.
  warnings.warn(
Usage: decima attributions [OPTIONS]

  Generate and save attribution analysis results for a gene or a set of
  sequences and perform seqlet calling on the attributions.

  Output files:

      ├── {output_prefix}.attributions.h5      # Raw attribution score matrix
      per gene.

      ├── {output_prefix}.attributions.bigwig  # Genome browser track of
      attribution as bigwig file.

      ├── {output_prefix}.seqlets.bed          # List of attribution peaks in
      BED format.

      ├── {output_prefix}.motifs.tsv           # Detected motifs in peak
      regions.

      └── {output_prefix}.warnings.qc.log      # QC warnings about prediction
      reliability.

  Examples:

      >>> decima attributions -o output_prefix -g SPI1

      >>> decima attributions -o output_prefix -g SPI1,CD68 --tasks "cell_type
      == 'classical monocyte'" --device 0

      >>> decima attributions -o output_prefix --seqs tests/data/seqs.fasta
      --tasks "cell_type == 'classical monocyte'" --device 0

Options:
  -o, --output-prefix TEXT        Prefix path to the output files  [required]
  -g, --genes TEXT                Comma-separated list of gene symbols or IDs
                                  to analyze.
  --seqs TEXT                     Path to a file containing sequences to
                                  analyze
  --tasks TEXT                    Query string to filter cell types to analyze
                                  attributions for (e.g. 'cell_type ==
                                  'classical monocyte'')
  --off-tasks TEXT                Optional query string to filter cell types
                                  to contrast against.
  --model TEXT                    Model to use for attribution analysis either
                                  replicate number or path to the model.
                                  [default: ensemble]
  --metadata TEXT                 Path to the metadata anndata file or name of
                                  the model. If not provided, the compabilite
                                  metadata for the model will be used.
  --method TEXT                   Method to use for attribution analysis.
  --transform [specificity|aggregate]
                                  Transform to use for attribution analysis.
  --num-workers INTEGER           Number of workers for attribution analysis.
  --tss-distance INTEGER          TSS distance for attribution analysis.
  --batch-size INTEGER            Batch size for attribution analysis.
  --top-n-markers INTEGER         Top n markers to predict. If not provided,
                                  all markers will be predicted.
  --threshold FLOAT               Threshold for attribution analysis.
  --min-seqlet-len INTEGER        Minimum length for seqlet calling.
  --max-seqlet-len INTEGER        Maximum length for seqlet calling.
  --additional-flanks INTEGER     Additional flanks for seqlet calling.
  --pattern-type [both|pos|neg]   Type of pattern to call.
  --meme-motif-db TEXT            Path to the MEME motif database.  [default:
                                  hocomoco_v13]
  --device TEXT                   Device to use for attribution analysis.
  --genome TEXT                   Genome name or path to the genome fasta
                                  file.  [default: hg38]
  --help                          Show this message and exit.

The following decima command computes attributions for the selected genes: --model v1_rep0 selects a single model replicate instead of the default ensemble; --genes "SPI1,BRD3" restricts the analysis to SPI1 and BRD3; --tasks "cell_type == 'classical monocyte'" restricts it to classical monocytes; and --output-prefix example/output_classical_monoctypes sets the prefix under which the result files are written. You can also pass --off-tasks, a query that selects the cell types used as the contrast group when analyzing cell type specificity. If --off-tasks is omitted, all other tasks are used as the contrast group. If --tasks is omitted, all available cells are used for the attribution calculation.

! decima attributions --model v1_rep0 --genes "SPI1,BRD3" --tasks "cell_type == 'classical monocyte'" --output-prefix example/output_classical_monoctypes
decima - INFO - Using device: 0
decima - INFO - Loading model v1_rep0 and metadata to compute attributions...
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'rep0:latest', 720.03MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:01.6 (445.8MB/s)
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:07.1 (437.5MB/s)
/home/celikm5/Projects/decima/src/decima/interpret/attributer.py:66: UserWarning: `off_tasks` is not provided. Using all other tasks as off_tasks.
Computing attributions...:   0%|                          | 0/2 [00:00<?, ?it/s]
Computing attributions...:  50%|█████████         | 1/2 [00:02<00:02,  2.23s/it]
decima - WARNING - Gene BRD3 has low correlation with the model. Pearson: 0.3440624267844621. Be careful with the predictions of the model for this gene. Check `DecimaResult.load().gene_metadata['pearson']` to see the correlation of the gene with the model.

Computing attributions...: 100%|██████████████████| 2/2 [00:02<00:00,  1.33s/it]
Computing attributions...: 100%|██████████████████| 2/2 [00:02<00:00,  1.49s/it]
decima - INFO - Loading model and metadata to compute attributions...
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:02.0 (1528.2MB/s)
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:02.0 (1574.5MB/s)
Computing recursive seqlet calling...:   0%|              | 0/2 [00:00<?, ?it/s]
Computing recursive seqlet calling...: 100%|█████| 2/2 [00:00<00:00, 459.62it/s]
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/memelite/fimo.py:406: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
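
The selection made by --tasks and --off-tasks can be pictured with a small pandas sketch. This is only an illustration on a toy metadata table: the `cells` DataFrame, its rows, and the `split_tasks` helper are hypothetical and are not part of Decima. The default for the contrast group follows the warning logged above ("Using all other tasks as off_tasks").

```python
import pandas as pd

# Toy stand-in for the cell metadata table (hypothetical rows and task names).
cells = pd.DataFrame(
    {
        "cell_type": ["classical monocyte", "classical monocyte", "macrophage", "B cell"],
        "tissue": ["blood", "lung", "lung", "blood"],
    },
    index=["task_0", "task_1", "task_2", "task_3"],
)


def split_tasks(metadata, tasks=None, off_tasks=None):
    """Return (on, off) task names selected by pandas query strings (hypothetical helper)."""
    # Without a --tasks style query, every task is selected.
    on = metadata.index if tasks is None else metadata.query(tasks).index
    if off_tasks is None:
        # Mirrors the logged warning: all other tasks become the contrast group.
        off = metadata.index.difference(on)
    else:
        off = metadata.query(off_tasks).index
    return list(on), list(off)


on, off = split_tasks(cells, tasks="cell_type == 'classical monocyte'")
print("tasks:", on)
print("off-tasks:", off)

# An explicit contrast group, analogous to passing --off-tasks.
on_b, off_b = split_tasks(
    cells,
    tasks="cell_type == 'classical monocyte'",
    off_tasks="cell_type == 'macrophage'",
)
print("explicit off-tasks:", off_b)
```

Passing an explicit contrast query narrows the comparison to the cell types you care about instead of everything else in the metadata.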



! ls example/output_classical_monoctypes*
example/output_classical_monoctypes_0.attributions.bigwig
example/output_classical_monoctypes_0.attributions.h5
example/output_classical_monoctypes_0.warnings.qc.log
example/output_classical_monoctypes_1.attributions.bigwig
example/output_classical_monoctypes_1.attributions.h5
example/output_classical_monoctypes_1.warnings.qc.log
example/output_classical_monoctypes.attributions.bigwig
example/output_classical_monoctypes.attributions.h5
example/output_classical_monoctypes.motifs.tsv
example/output_classical_monoctypes.seqlets.bed
example/output_classical_monoctypes.warnings.qc.log

example/output_classical_monoctypes_plots:
BRD3.peaks.png	BRD3_seqlogos  SPI1.peaks.png  SPI1_seqlogos
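
The attribution file can be inspected with h5py. It stores the gene names, the input sequences, and the attribution scores:

import h5py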

with h5py.File("example/output_classical_monoctypes.attributions.h5", "r") as f:
    print(f["genes"][:])
    print(f["sequence"][:].shape)
    print(f["attribution"][:].shape)
[b'SPI1' b'BRD3']
(2, 524288)
(2, 4, 524288)
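
The shapes printed above suggest one attribution value per base and position for each gene, that is (genes, 4, sequence length). Below is a minimal sketch of how such an array could be reduced to a per-position track and ranked. It uses a small synthetic array rather than the real file, and summing over the base axis is an assumption about how to aggregate, not something stated by Decima. The `top_positions` helper is hypothetical.

```python
import numpy as np


def top_positions(attr, k=5):
    """Return the k positions with the largest absolute summed attribution.

    attr has shape (4, length): one row per base, one column per position.
    Summing over the base axis is an illustrative choice (assumption).
    """
    per_position = attr.sum(axis=0)                    # shape: (length,)
    order = np.argsort(-np.abs(per_position))[:k]      # strongest first
    return order, per_position[order]


# Synthetic single-gene attribution matrix with two planted peaks.
attr = np.zeros((4, 1000))
attr[2, 300] = 1.8    # positive contribution at position 300
attr[0, 700] = -1.4   # negative contribution at position 700

positions, scores = top_positions(attr, k=2)
print(positions, scores)
```

With the real file, `attr` would be one slice of `f["attribution"]`, for example the slice for the first gene.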
! head  example/output_classical_monoctypes.seqlets.bed | column -t -s $'\t' 
chr11  47152015  47152022  neg.SPI1@-29953  3.31894  .  -0.41688821464776993
chr11  47160163  47160167  neg.SPI1@-21805  3.33346  .  -0.2519867978990078
chr11  47160309  47160319  neg.SPI1@-21659  3.36046  .  -0.7482765801250935
chr11  47160323  47160333  neg.SPI1@-21645  3.35597  .  -0.9791450921911746
chr11  47165053  47165060  neg.SPI1@-16915  3.35769  .  -0.46799773909151554
chr11  47165593  47165606  pos.SPI1@-16375  3.41662  .  1.8055671770125628
chr11  47165642  47165653  pos.SPI1@-16326  3.37915  .  1.059839816763997
chr11  47165653  47165664  neg.SPI1@-16315  3.64033  .  -1.4299414344131947
chr11  47165664  47165670  neg.SPI1@-16304  3.36339  .  -0.4015889251604676
chr11  47165690  47165703  pos.SPI1@-16278  3.3902   .  1.1793060060590506
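
The seqlet file can also be loaded into pandas for filtering. The sketch below parses three of the rows shown above. The column names are assumptions, since the file has no header, in particular the meaning of the fifth and seventh columns. The name field appears to encode the sign of the seqlet, the gene, and an offset after the `@`, which is read here as a position relative to the TSS (also an assumption).

```python
import io

import pandas as pd

# Three rows copied from the output above (tab-separated, no header).
bed_text = (
    "chr11\t47152015\t47152022\tneg.SPI1@-29953\t3.31894\t.\t-0.41688821464776993\n"
    "chr11\t47165593\t47165606\tpos.SPI1@-16375\t3.41662\t.\t1.8055671770125628\n"
    "chr11\t47165653\t47165664\tneg.SPI1@-16315\t3.64033\t.\t-1.4299414344131947\n"
)

# Assumed column names; the file itself does not label its columns.
columns = ["chrom", "start", "end", "name", "score", "strand", "attribution"]
seqlets = pd.read_csv(io.StringIO(bed_text), sep="\t", header=None, names=columns)

# Split names such as "neg.SPI1@-29953" into sign, gene, and offset.
parts = seqlets["name"].str.extract(
    r"^(?P<sign>pos|neg)\.(?P<gene>[^@]+)@(?P<offset>-?\d+)$"
)
seqlets = seqlets.join(parts)
seqlets["offset"] = seqlets["offset"].astype(int)
seqlets["length"] = seqlets["end"] - seqlets["start"]

# Seqlet with the largest absolute value in the last column.
strongest = seqlets.loc[seqlets["attribution"].abs().idxmax()]
print(seqlets[["name", "sign", "gene", "offset", "length"]])
print("strongest:", strongest["name"])
```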
! tail example/output_classical_monoctypes.motifs.tsv | column -t -s $'\t' 
ZNF507.H13CORE.0.I.B      pos.BRD3@1374     165221  165230  +  9.005010962486267   0.0004997253417968743  CTCCTTCCC                0.0001575700912831558   -0.0002066142760199578   1381
PPARA.H13CORE.1.P.B       pos.BRD3@-61106   102728  102737  +  8.097402691841125   0.0004997253417968743  AAGAGGTGA                0.0009877644590435214   0.0027158458094883164    -61112
ZNF507.H13CORE.0.I.B      neg.BRD3@1388     165221  165230  +  9.005010962486267   0.0004997253417968743  CTCCTTCCC                0.0001575700912831558   -0.0002066142760199578   1381
ARNT.H13CORE.0.P.B        pos.BRD3@26580    190423  190432  +  8.84844446182251    0.0004997253417968755  GGACGTGTT                0.0001840576308798821   -0.00032131952384467405  26583
ZN394.H13CORE.0.P.C       pos.BRD3@573      164409  164428  +  6.16656231880188    0.0004997442047169893  GCCGCCGGAGCCGCGAGGC      0.0016583528068670268   0.003810155552068253     569
ZNF30.H13CORE.0.P.C       neg.BRD3@291      164117  164140  -  7.218540787696838   0.0004997881442250214  CGGGCGCCGAGCCCCGCCCCCGC  -0.0007085893436021212  -0.001097117428680497    277
NR1H4.H13CORE.1.P.B       pos.BRD3@194003   357832  357850  -  7.282587647438049   0.0004999181110179031  CCTTGGAGGCAGTGACTC       0.0006487710052169859   0.0014354905397428942    193992
CGGBP1.H13CORE.0.PSGIB.A  neg.BRD3@614      164462  164473  -  9.182251572608948   0.0004999637603759763  GGGGCGGCGGG              4.89058068276129e-05    0.000644476072918215     622
KLF7.H13CORE.0.P.B        neg.BRD3@-102394  61443   61453   -  15.217368483543396                         CCCCGCCCCC               -0.0013154596599633805  -0.0038895047854197937   -102397
KLF7.H13CORE.0.P.B        neg.BRD3@291      164128  164138  -  15.217368483543396                         CCCCGCCCCC               -0.0009656054913648404  -0.0033124240223047752   288

The QC file ({output_prefix}.warnings.qc.log) is a quality control log that contains warnings about prediction reliability for genes. Specifically, it warns when a gene has a low correlation with the model’s predictions (Pearson correlation < 0.7).

! head example/output_classical_monoctypes.warnings.qc.log
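
The same check can be sketched in Python on a table of per-gene correlations. The warning in the log points to `DecimaResult.load().gene_metadata['pearson']` as the source of these values; the sketch below uses a toy DataFrame in its place. The BRD3 value is taken from the log above, the SPI1 value is a made-up placeholder, and the `low_confidence_genes` helper is hypothetical.

```python
import pandas as pd

PEARSON_THRESHOLD = 0.7  # threshold quoted in the text above

# Toy stand-in for the gene metadata table.
gene_metadata = pd.DataFrame(
    {"pearson": [0.3440624267844621, 0.9]},  # BRD3 from the log; SPI1 is a placeholder
    index=["BRD3", "SPI1"],
)


def low_confidence_genes(metadata, threshold=PEARSON_THRESHOLD):
    """Return the genes whose Pearson correlation falls below the threshold."""
    return metadata.index[metadata["pearson"] < threshold].tolist()


flagged = low_confidence_genes(gene_metadata)
print(flagged)
```

Running such a check before an analysis makes it easy to set aside genes whose attributions deserve extra caution.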

CLI Subcommands

The Decima CLI also supports running the attribution analysis pipeline step by step through dedicated subcommands. This modular approach lets you execute each stage of the workflow independently:

  1. Computing attributions for selected genes and cell types (attributions-predict).

  2. Calling significant seqlets from the attributions (attributions-recursive-seqlet-calling).

  3. Visualizing the results and motif logos (attributions-plot).

By chaining these subcommands, you can customize, debug, or parallelize each step of the analysis as needed.
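
The three steps can also be chained from a script. The sketch below only assembles the command lines and prints them, using the subcommands and flags that appear in this document. The `build_pipeline` helper and the `_0`, `_1` prefix naming are illustrative assumptions, not part of Decima.

```python
import shlex


def build_pipeline(genes, tasks, prefix, models=("v1_rep0", "v1_rep1"), tss_distance=500):
    """Assemble the step-by-step decima commands as argument lists (hypothetical helper)."""
    gene_arg = ",".join(genes)
    commands, h5_files = [], []

    # Step 1: one attributions-predict run per model replicate.
    for i, model in enumerate(models):
        rep_prefix = f"{prefix}_{i}"
        commands.append([
            "decima", "attributions-predict",
            "--model", model,
            "--genes", gene_arg,
            "--tasks", tasks,
            "--output-prefix", rep_prefix,
        ])
        h5_files.append(f"{rep_prefix}.attributions.h5")

    # Step 2: seqlet calling on the attribution files from all replicates.
    commands.append([
        "decima", "attributions-recursive-seqlet-calling",
        "--attributions", ",".join(h5_files),
        "--output-prefix", prefix,
    ])

    # Step 3: plots for the same genes.
    commands.append([
        "decima", "attributions-plot",
        "--output-prefix", prefix,
        "-g", gene_arg,
        "--tss-distance", str(tss_distance),
    ])
    return commands


commands = build_pipeline(
    genes=["SPI1", "BRD3"],
    tasks="cell_type == 'classical monocyte'",
    prefix="example/output_classical_monoctypes",
)
for cmd in commands:
    print(shlex.join(cmd))  # to execute instead: subprocess.run(cmd, check=True)
```

Keeping the commands as argument lists avoids shell-quoting problems with the query string, and the per-replicate step 1 commands can be dispatched in parallel.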

The following commands run the attributions-predict subcommand once per model replicate (v1_rep0 and v1_rep1), computing attributions for the genes SPI1 and BRD3 in cells whose cell_type is 'classical monocyte'. Each run saves its results under its own output prefix.

! decima attributions-predict --model v1_rep0 --genes "SPI1,BRD3" --tasks "cell_type == 'classical monocyte'" --output-prefix example/output_classical_monoctypes_0
decima - INFO - Using device: 0
decima - INFO - Loading model v1_rep0 and metadata to compute attributions...
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'rep0:latest', 720.03MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:01.4 (524.8MB/s)
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:02.0 (1580.3MB/s)
/home/celikm5/Projects/decima/src/decima/interpret/attributer.py:66: UserWarning: `off_tasks` is not provided. Using all other tasks as off_tasks.

Computing attributions...:   0%|                          | 0/2 [00:00<?, ?it/s]
Computing attributions...:  50%|█████████         | 1/2 [00:01<00:01,  1.33s/it]
decima - WARNING - Gene BRD3 has low correlation with the model. Pearson: 0.3440624267844621. Be careful with the predictions of the model for this gene. Check `DecimaResult.load().gene_metadata['pearson']` to see the correlation of the gene with the model.

Computing attributions...: 100%|██████████████████| 2/2 [00:02<00:00,  1.04it/s]
Computing attributions...: 100%|██████████████████| 2/2 [00:02<00:00,  1.04s/it]

! decima attributions-predict --model v1_rep1 --genes "SPI1,BRD3" --tasks "cell_type == 'classical monocyte'" --output-prefix example/output_classical_monoctypes_1
decima - INFO - Using device: 0
decima - INFO - Loading model v1_rep1 and metadata to compute attributions...
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'rep1:latest', 720.03MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:00.9 (803.4MB/s)
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:01.9 (1614.1MB/s)
/home/celikm5/Projects/decima/src/decima/interpret/attributer.py:66: UserWarning: `off_tasks` is not provided. Using all other tasks as off_tasks.

Computing attributions...:   0%|                          | 0/2 [00:00<?, ?it/s]
Computing attributions...:  50%|█████████         | 1/2 [00:01<00:01,  1.37s/it]
decima - WARNING - Gene BRD3 has low correlation with the model. Pearson: 0.3440624267844621. Be careful with the predictions of the model for this gene. Check `DecimaResult.load().gene_metadata['pearson']` to see the correlation of the gene with the model.

Computing attributions...: 100%|██████████████████| 2/2 [00:02<00:00,  1.01it/s]
Computing attributions...: 100%|██████████████████| 2/2 [00:02<00:00,  1.07s/it]

The next command runs the recursive seqlet calling step of the Decima attribution pipeline. It takes the attribution files produced by the two models (replicates 0 and 1) for the genes SPI1 and BRD3 in classical monocytes and calls significant seqlets (regions with high attribution).

! decima attributions-recursive-seqlet-calling --attributions "example/output_classical_monoctypes_0.attributions.h5,example/output_classical_monoctypes_1.attributions.h5" --output-prefix example/output_classical_monoctypes
decima - INFO - Loading model and metadata to compute attributions...
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:02.0 (1584.0MB/s)
decima - INFO - No genes provided, using all 2 genes in the attribution files.
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:01.8 (1691.6MB/s)
Computing recursive seqlet calling...:   0%|              | 0/2 [00:00<?, ?it/s]
Computing recursive seqlet calling...: 100%|█████| 2/2 [00:00<00:00, 855.46it/s]



The following command generates plots for the attributions and the discovered seqlets. It uses the output prefix from the previous steps and produces visualizations for the specified genes (SPI1, BRD3), highlighting motif locations within 500 bp of the transcription start site (TSS).

! decima attributions-plot --output-prefix example/output_classical_monoctypes -g "SPI1,BRD3" --tss-distance 500
Plotting attributions...:   0%|                           | 0/2 [00:00<?, ?it/s]
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:01.9 (1666.6MB/s)
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/plotnine/ggplot.py:630: PlotnineWarning: Saving 10 x 2 in image.
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/plotnine/ggplot.py:631: PlotnineWarning: Filename: example/output_classical_monoctypes_plots/SPI1.peaks.png
Plotting attributions...:  50%|█████████▌         | 1/2 [00:12<00:12, 12.95s/it]
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:01.9 (1648.9MB/s)
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/plotnine/ggplot.py:630: PlotnineWarning: Saving 10 x 2 in image.
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/plotnine/ggplot.py:631: PlotnineWarning: Filename: example/output_classical_monoctypes_plots/BRD3.peaks.png
Plotting attributions...: 100%|███████████████████| 2/2 [00:23<00:00, 11.76s/it]
Plotting attributions...: 100%|███████████████████| 2/2 [00:23<00:00, 11.94s/it]

from IPython.display import Image

Image("example/output_classical_monoctypes_plots/SPI1_seqlogos/SPI1@267.png")
[Figure: sequence logo for the seqlet SPI1@267]

Querying Cells

Attributions are computed for a selected set of cells, which you choose with a query string. Queries use the pandas DataFrame.query syntax and are evaluated on the cell metadata DataFrame. The help text of the query-cell command lists example queries:

! decima query-cell --help
Usage: decima query-cell [OPTIONS] [QUERY]

  Query a cell using query string

  Examples:

      >>> decima query-cell 'cell_type == "classical monocyte"'     ...

      >>> decima query-cell 'cell_type == "classical monocyte" and disease ==
      "healthy" and tissue == "blood"'     ...

      >>> decima query-cell 'cell_type.str.contains("monocyte") and disease ==
      "healthy"'     ...

Options:
  --metadata TEXT  Path to the metadata anndata file or name of the model.
                   Default: ensemble.
  --help           Show this message and exit.
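
The query strings in the help text are ordinary pandas `DataFrame.query` expressions. The sketch below runs the same three expressions on a small toy metadata table to show which rows each one selects. The table contents, the `toy_*` index labels, and the `select_cells` helper are hypothetical and exist only for illustration; they are not part of Decima, and the real metadata has many more columns.

```python
import pandas as pd

# Toy stand-in for the cell metadata table (hypothetical values).
meta = pd.DataFrame(
    {
        "cell_type": [
            "classical monocyte",
            "classical monocyte",
            "non-classical monocyte",
            "macrophage",
        ],
        "tissue": ["blood", "lung", "blood", "lung"],
        "disease": ["healthy", "COVID-19", "healthy", "healthy"],
    },
    index=["toy_0", "toy_1", "toy_2", "toy_3"],
)


def select_cells(metadata: pd.DataFrame, query: str) -> pd.DataFrame:
    """Hypothetical helper: return the rows of `metadata` matching `query`.

    engine="python" keeps the sketch independent of the optional numexpr
    package and evaluates string methods such as `.str.contains` in Python.
    """
    return metadata.query(query, engine="python")


# Exact match on a single column.
exact = select_cells(meta, 'cell_type == "classical monocyte"')

# Several conditions combined with `and`.
combined = select_cells(
    meta,
    'cell_type == "classical monocyte" and disease == "healthy" and tissue == "blood"',
)

# Substring match through the pandas string accessor.
substring = select_cells(
    meta, 'cell_type.str.contains("monocyte") and disease == "healthy"'
)

for name, result in [("exact", exact), ("combined", combined), ("substring", substring)]:
    print(name, list(result.index))
```

Note that the substring query also matches "non-classical monocyte", so an exact comparison with `==` is the safer choice when only one cell type is wanted.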

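The following command selects every entry whose cell_type is “classical monocyte”. The query string uses pandas query syntax (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html), and piping the tab-separated output through `column -t` aligns the columns for display: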

! decima query-cell 'cell_type == "classical monocyte"' | column -t -s $'\t'
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:01.8 (1700.8MB/s)
          cell_type           tissue                            organ           disease                                         study                                 dataset      region  subregion  celltype_coarse  n_cells  total_counts        n_genes  size_factor         train_pearson       val_pearson         test_pearson
agg_4705  classical monocyte  alveolar system                   lung            COVID-19                                        GSE155249                             scimilarity  nan     nan                         7244     26544273.0          15325    34749.092791034054  0.946616874183219   0.8437000068912937  0.8506571540216992
agg_4706  classical monocyte  alveolar system                   lung            healthy                                         GSE155249                             scimilarity  nan     nan                         72       218105.0            9142     30484.31888978114   0.9102228263646758  0.8083487523192785  0.8047828694155461
agg_4707  classical monocyte  ampulla of uterine tube           fallopian tube  healthy                                         fc77d2ae-247d-44d7-aa24-3f4859254c2c  scimilarity  nan     nan                         78       550950.0            9639     30719.377971431015  0.9077670011915634  0.8045070167513724  0.7896845423359651
agg_4708  classical monocyte  aorta                             vasculature     Abdominal Aortic Aneurysm                       GSE166676                             scimilarity  nan     nan                         432      1091075.0           11192    32981.443348717905  0.9389265854768138  0.8357299205241656  0.830575965756882
agg_4709  classical monocyte  aorta                             vasculature     healthy                                         GSE166676                             scimilarity  nan     nan                         25       162858.0            8859     31216.275954364824  0.8819013257206973  0.7821403055329706  0.7646999711802146
agg_4710  classical monocyte  apex of heart                     heart           healthy                                         b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         397      1226515.0           12369    32022.563851814968  0.9469178617442242  0.8326145310572417  0.8365506153530168
agg_4711  classical monocyte  blood                             blood           COVID-19                                        03f821b4-87be-4ff4-b65a-b5fc00061da7  scimilarity  nan     nan                         17462    78882609.0          15711    33080.17541357136   0.9536210517623883  0.8539379752673611  0.8485800714004562
agg_4712  classical monocyte  blood                             blood           COVID-19                                        7d7cabfd-1d1f-40af-96b7-26a0825a306d  scimilarity  nan     nan                         141914   659177004.0         15175    32282.29230923367   0.9575085257562032  0.8570056960758388  0.8507770895392532
agg_4713  classical monocyte  blood                             blood           COVID-19                                        GSE154567                             scimilarity  nan     nan                         8613     40239000.0          16023    34450.628692057784  0.9605287026438375  0.8551842525202262  0.8491670381176235
agg_4714  classical monocyte  blood                             blood           COVID-19                                        GSE158034                             scimilarity  nan     nan                         35       91390.0             7372     27446.425385592618  0.8476138307738649  0.7606096001026369  0.7256993661246048
agg_4715  classical monocyte  blood                             blood           COVID-19                                        GSE161918                             scimilarity  nan     nan                         163244   1023475761.0        15929    31151.84947891148   0.9361102289470363  0.8261601916328626  0.8168421752771801
agg_4716  classical monocyte  blood                             blood           COVID-19                                        GSE163668                             scimilarity  nan     nan                         8399     55036800.0          15792    33644.10626885235   0.9571638472088531  0.8529462339847145  0.8506441761860545
agg_4717  classical monocyte  blood                             blood           COVID-19                                        GSE166992                             scimilarity  nan     nan                         2238     12283507.0          14186    32596.636302802952  0.9567710210039843  0.8531485173368566  0.8416173697367906
agg_4718  classical monocyte  blood                             blood           COVID-19                                        ddfad306-714d-4cc0-9985-d9072820c530  scimilarity  nan     nan                         61002    230056884.0         16484    32520.14418628346   0.9487335053479237  0.8533686239486711  0.8454541123444707
agg_4719  classical monocyte  blood                             blood           COVID-19                                        eb735cc9-d0a7-48fa-b255-db726bf365af  scimilarity  nan     nan                         19777    105875381.0         15812    32330.088619084574  0.9558882745902155  0.8545238316898663  0.8468877639468763
agg_4720  classical monocyte  blood                             blood           HIV enteropathy                                 GSE157829                             scimilarity  nan     nan                         491      1449812.0           12290    33110.90004135926   0.9412108394642186  0.8352699509238034  0.8345507070277177
agg_4721  classical monocyte  blood                             blood           Myelofibrosis                                   GSE117824                             scimilarity  nan     nan                         357      1492491.0           11548    32726.985198452294  0.9446223529088382  0.8417521390049872  0.8328218073658378
agg_4722  classical monocyte  blood                             blood           NA                                              GSE132950                             scimilarity  nan     nan                         146      784054.0            10913    30417.15641845661   0.9276395863920666  0.8264978172767997  0.8176327551177259
agg_4723  classical monocyte  blood                             blood           NA                                              GSE135325                             scimilarity  nan     nan                         232      633533.0            11129    31159.105128910356  0.9369963391148282  0.8254811186623798  0.8207578599532835
agg_4724  classical monocyte  blood                             blood           NA                                              GSE150233                             scimilarity  nan     nan                         1141     2453545.0           12228    32204.245569759012  0.9354773292749718  0.8333534679658088  0.8202743631285762
agg_4725  classical monocyte  blood                             blood           NA                                              GSE151310                             scimilarity  nan     nan                         48       151358.0            8028     27001.118740317568  0.8873812787091045  0.7886356061906991  0.766461694552445
agg_4726  classical monocyte  blood                             blood           NA                                              GSE164378                             scimilarity  nan     nan                         54305    476237982.0         17463    34023.11682209347   0.9636663701487779  0.856267291847072   0.8496477594095655
agg_4727  classical monocyte  blood                             blood           NA                                              GSE164402                             scimilarity  nan     nan                         6577     33889420.0          14992    33855.14311643263   0.9502216042319906  0.846017695872854   0.8447747394204608
agg_4728  classical monocyte  blood                             blood           Sezary's disease                                GSE122703                             scimilarity  nan     nan                         35       148650.0            8487     29592.979037498706  0.8928094999389883  0.7911806688728295  0.7911936593448785
agg_4729  classical monocyte  blood                             blood           dengue disease                                  GSE145307                             scimilarity  nan     nan                         785      7639702.0           13722    33610.52078618725   0.9561427618691068  0.8544883780028308  0.8514781068765508
agg_4730  classical monocyte  blood                             blood           dengue disease                                  GSE154386                             scimilarity  nan     nan                         19173    143929741.0         16877    34242.50262506596   0.9586193824399223  0.8509705295166231  0.8546685528097621
agg_4731  classical monocyte  blood                             blood           drug hypersensitivity syndrome                  GSE132802                             scimilarity  nan     nan                         1269     7314697.0           13270    32574.34811388645   0.9570929839341253  0.8466339050741839  0.8442788242172967
agg_4732  classical monocyte  blood                             blood           fibrosis                                        GSE136103                             scimilarity  nan     nan                         1774     5003888.0           13389    31155.271000486402  0.9562933985421416  0.8435982250231042  0.8386367834560556
agg_4733  classical monocyte  blood                             blood           healthy                                         03f821b4-87be-4ff4-b65a-b5fc00061da7  scimilarity  nan     nan                         32464    109280914.0         16158    33646.02843110038   0.9568728712803031  0.8545324533535094  0.8487445540580735
agg_4734  classical monocyte  blood                             blood           healthy                                         436154da-bcf1-4130-9c8b-120ff9a888f2  scimilarity  nan     nan                         76800    206490628.0         16683    30736.453324546856  0.955313650235467   0.8494127267867799  0.84210567312908
agg_4735  classical monocyte  blood                             blood           healthy                                         5d445965-6f1a-4b68-ba3a-b8f765155d3a  scimilarity  nan     nan                         1044     3638976.0           12306    30542.8977364245    0.9465384578921433  0.851991795617166   0.8308191715256182
agg_4736  classical monocyte  blood                             blood           healthy                                         DS000010023                           scimilarity  nan     nan                         243      362606.0            8414     25201.261024165953  0.8865446516098515  0.7621391376670988  0.7681818769796727
agg_4737  classical monocyte  blood                             blood           healthy                                         GSE122703                             scimilarity  nan     nan                         18       83417.0             7546     28194.725173315186  0.859225612396475   0.7640890699253299  0.7443681539461423
agg_4738  classical monocyte  blood                             blood           healthy                                         GSE130117                             scimilarity  nan     nan                         2017     7130588.0           13535    33078.53692191542   0.9553673450160365  0.851402109626239   0.8385936100758409
agg_4739  classical monocyte  blood                             blood           healthy                                         GSE132802                             scimilarity  nan     nan                         1601     9955248.0           13132    32063.630951743195  0.9478882791739611  0.8391025143866828  0.8303465877530952
agg_4740  classical monocyte  blood                             blood           healthy                                         GSE139324                             scimilarity  nan     nan                         2333     8331045.0           13985    31135.881287246768  0.9608208780142045  0.8473885992448625  0.8432790193723467
agg_4741  classical monocyte  blood                             blood           healthy                                         GSE145809                             scimilarity  nan     nan                         69       245221.0            8962     29135.67197629852   0.8825701041728526  0.7811799267734735  0.7818647625179129
agg_4742  classical monocyte  blood                             blood           healthy                                         GSE149313                             scimilarity  nan     nan                         2420     6974751.0           13143    29560.496854576566  0.9574598613513423  0.8505290963248237  0.8379199735887167
agg_4743  classical monocyte  blood                             blood           healthy                                         GSE153421                             scimilarity  nan     nan                         3691     15561725.0          14569    34377.465875728165  0.9636686704566925  0.8576434473562725  0.8511814190737197
agg_4744  classical monocyte  blood                             blood           healthy                                         GSE156989                             scimilarity  nan     nan                         13554    160011485.0         16915    34135.439844737564  0.9640667421350761  0.8577967800377495  0.8517975138366085
agg_4745  classical monocyte  blood                             blood           healthy                                         GSE157829                             scimilarity  nan     nan                         1619     6957811.0           13507    30199.39288988673   0.9484019976492215  0.8436979316400604  0.8347196616710685
agg_4746  classical monocyte  blood                             blood           healthy                                         GSE159113                             scimilarity  nan     nan                         1025     6298250.0           12083    27477.50809897617   0.9078020151513733  0.8121457150205226  0.7980372877810575
agg_4747  classical monocyte  blood                             blood           healthy                                         GSE161329                             scimilarity  nan     nan                         5654     25653579.0          14349    28848.0539929647    0.9549801428956252  0.8450430950674043  0.8406188789518544
agg_4748  classical monocyte  blood                             blood           healthy                                         GSE161738                             scimilarity  nan     nan                         2676     13801473.0          12825    33337.477050230416  0.9541962906717452  0.8512846409758499  0.8485408028961247
agg_4749  classical monocyte  blood                             blood           healthy                                         GSE163668                             scimilarity  nan     nan                         2644     10486314.0          14049    33786.96584264489   0.9597801578342394  0.8560775485935677  0.8512149509551471
agg_4750  classical monocyte  blood                             blood           healthy                                         GSE166992                             scimilarity  nan     nan                         7501     28033216.0          15079    33455.367364577316  0.9622273594219685  0.8558958139235102  0.8495571689751152
agg_4751  classical monocyte  blood                             blood           healthy                                         GSE167363                             scimilarity  nan     nan                         3135     14722635.0          14375    29977.24002819913   0.942417448875388   0.8368071803109702  0.8258536430202982
agg_4752  classical monocyte  blood                             blood           healthy                                         GSE168710                             scimilarity  nan     nan                         16484    104881872.0         16223    34107.336261357574  0.9398282119039322  0.8424821834537695  0.8372971004604842
agg_4753  classical monocyte  blood                             blood           healthy                                         GSE168732                             scimilarity  nan     nan                         770      2548822.0           12508    33411.30103713399   0.9552513581030765  0.8508279875038706  0.847461536110767
agg_4754  classical monocyte  blood                             blood           healthy                                         b0cf0afa-ec40-4d65-b570-ed4ceacc6813  scimilarity  nan     nan                         40975    300555227.0         15784    35938.85772500803   0.9622425892039956  0.853424173800979   0.8508714303589978
agg_4755  classical monocyte  blood                             blood           healthy                                         ddfad306-714d-4cc0-9985-d9072820c530  scimilarity  nan     nan                         8827     36073928.0          15131    33208.591584008376  0.9546118779961532  0.8543086616569785  0.8462739374830107
agg_4756  classical monocyte  blood                             blood           intracranial hypotension                        GSE138266                             scimilarity  nan     nan                         2503     9675804.0           14485    30160.767605621222  0.9452052724479383  0.8423537848756032  0.8326629487875993
agg_4757  classical monocyte  blood                             blood           mucocutaneous lymph node syndrome               GSE168732                             scimilarity  nan     nan                         5745     25930751.0          14822    33366.18751424575   0.9564515409367231  0.8556431530577528  0.8540185868162636
agg_4758  classical monocyte  blood                             blood           multiple sclerosis                              GSE138266                             scimilarity  nan     nan                         3988     13926825.0          14991    31442.03464388843   0.9522779953120408  0.847799219646348   0.8382058078578654
agg_4759  classical monocyte  blood                             blood           non-alcoholic fatty liver disease               GSE136103                             scimilarity  nan     nan                         8306     29424841.0          15410    32004.200489375227  0.9619190264492873  0.8478709124980346  0.8451242436776344
agg_4760  classical monocyte  blood                             blood           rheumatoid arthritis                            GSE159117                             scimilarity  nan     nan                         834      4637566.0           12079    31364.847230552205  0.9356058277176598  0.8232134520999813  0.8176333921128414
agg_4761  classical monocyte  blood                             blood           septic shock                                    GSE167363                             scimilarity  nan     nan                         3860     51041813.0          15830    31688.79561595612   0.948652055959824   0.8541736693569211  0.8427375237296424
agg_4762  classical monocyte  blood                             blood           systemic lupus erythematosus                    436154da-bcf1-4130-9c8b-120ff9a888f2  scimilarity  nan     nan                         200468   516575809.0         16896    30011.373010792136  0.9562030923644677  0.8480520393236465  0.844052374052952
agg_4763  classical monocyte  blood                             blood           systemic lupus erythematosus                    GSE142016                             scimilarity  nan     nan                         8268     22146620.0          14873    30889.72528098081   0.9588937962174496  0.8480448150326806  0.8395981302528143
agg_4764  classical monocyte  blood                             blood           systemic lupus erythematosus                    GSE153765                             scimilarity  nan     nan                         42       109982.0            7500     27335.812367044335  0.8566470004710053  0.7665719714665945  0.7374607536624445
agg_4765  classical monocyte  blood                             blood           systemic lupus erythematosus                    GSE156989                             scimilarity  nan     nan                         30367    310637290.0         17082    33485.308563356346  0.9623402060903008  0.8532487075078466  0.8473526649094757
agg_4766  classical monocyte  blood                             blood           thrombocytopenia                                GSE149313                             scimilarity  nan     nan                         2724     15059814.0          14328    30599.80301260898   0.9543473550386421  0.8520722945129096  0.8417995728182829
agg_4767  classical monocyte  bone                              bone            Langerhans Cell Histiocytosis                   GSE133704                             scimilarity  nan     nan                         439      1404680.0           11388    30817.807833507268  0.9358504157466769  0.830348562008033   0.826566269904344
agg_4769  classical monocyte  bone marrow                       bone marrow     NA                                              GSE162692                             scimilarity  nan     nan                         1234     4757721.0           13466    31707.380952189662  0.953852620063789   0.8503857428588029  0.8377674131460707
agg_4770  classical monocyte  bone marrow                       bone marrow     essential thrombocythemia                       GSE117824                             scimilarity  nan     nan                         1649     7825780.0           13487    32454.468003620656  0.9503614540027875  0.8479601234457582  0.838408163377457
agg_4772  classical monocyte  bone marrow                       bone marrow     healthy                                         GSE132509                             scimilarity  nan     nan                         610      2315570.0           12950    31768.06513427212   0.95159369508558    0.8517118261701931  0.836658919433696
agg_4773  classical monocyte  bone marrow                       bone marrow     healthy                                         GSE154109                             scimilarity  nan     nan                         531      1431388.0           11793    31377.450948003392  0.9490546933955852  0.8431566630120637  0.8370883160295727
agg_4774  classical monocyte  bone marrow                       bone marrow     healthy                                         GSE163278                             scimilarity  nan     nan                         1119     3970394.0           13361    32081.93302956569   0.9620394897163868  0.8531148861215617  0.8426785397396367
agg_4775  classical monocyte  bone marrow                       bone marrow     healthy                                         e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         151      8025584.0           12883    29444.78352863075   0.8440608178227607  0.740574910750325   0.7417328577454956
agg_4776  classical monocyte  bone marrow                       bone marrow     monoclonal gammopathy                           GSE163278                             scimilarity  nan     nan                         1010     3124102.0           12959    30344.72094719757   0.9581906874137958  0.8503948562041261  0.8391672246132192
agg_4777  classical monocyte  breast                            breast          healthy                                         GSE164898                             scimilarity  nan     nan                         136      641471.0            12971    34463.52724138501   0.9163324788498406  0.8116274576633968  0.7978555908123931
agg_4778  classical monocyte  breast                            breast          healthy                                         c9706a92-0e5f-46c1-96d8-20e42467f287  scimilarity  nan     nan                         98       1444245.0           13491    30678.263421880285  0.9165520953567395  0.8162053142576849  0.7994301225229256
agg_4779  classical monocyte  bronchus                          airway          COVID-19                                        03f821b4-87be-4ff4-b65a-b5fc00061da7  scimilarity  nan     nan                         104      270108.0            8816     27933.501359354133  0.8884582873892427  0.7928474093439662  0.786626157931558
agg_4780  classical monocyte  bronchus                          airway          COVID-19                                        GSE168215                             scimilarity  nan     nan                         90       217928.0            8444     27417.845704249117  0.880738805029126   0.7855823736726739  0.7798557380738542
agg_4782  classical monocyte  bronchus                          airway          healthy                                         GSE158127                             scimilarity  nan     nan                         158      1158198.0           12643    34764.50196701077   0.9364512338084163  0.8259291909369686  0.8266638555276521
agg_4783  classical monocyte  cardiac muscle of left ventricle  heart           healthy                                         GSE156703                             scimilarity  nan     nan                         13       116181.0            9463     35695.66320276271   0.8542740960069863  0.7515621053395214  0.7561639038477878
agg_4784  classical monocyte  carotid artery segment            vasculature     atherosclerosis                                 GSE155512                             scimilarity  nan     nan                         58       515211.0            10839    32837.84505237503   0.9343565353773022  0.8190650931322585  0.8202426969221358
agg_4785  classical monocyte  caudate lobe of liver             liver           healthy                                         44531dd9-1388-4416-a117-af0a99de2294  scimilarity  nan     nan                         238      730016.0            11505    31342.386314731422  0.9217674983890346  0.8140551552218395  0.8040417787954989
agg_4786  classical monocyte  cortex of kidney                  kidney          healthy                                         120e86b4-1195-48c5-845b-b98054105eec  scimilarity  nan     nan                         79       323010.0            10939    32378.76683324232   0.9028856137251035  0.7978822439778066  0.7839454035009307
agg_4787  classical monocyte  cortex of kidney                  kidney          healthy                                         a98b828a-622a-483a-80e0-15703678befd  scimilarity  nan     nan                         91       477355.0            10898    32358.068865763344  0.9328436291917394  0.8237810319569842  0.8195391931798526
agg_4789  classical monocyte  digestive tract                   gut             healthy                                         DS000011665                           scimilarity  nan     nan                         347      1679116.0           12155    33347.55517047197   0.9422556928648441  0.8417267634096297  0.84018452536733
agg_4790  classical monocyte  exocrine pancreas                 pancreas        healthy                                         e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         821      7824069.0           14847    36135.64587109593   0.9493709998172055  0.8440837716099078  0.8410246939313819
agg_4791  classical monocyte  fallopian tube                    fallopian tube  healthy                                         fc77d2ae-247d-44d7-aa24-3f4859254c2c  scimilarity  nan     nan                         131      734093.0            11240    33115.640434504094  0.9359103734457376  0.8339026306142181  0.8225901799509813
agg_4792  classical monocyte  fimbria of uterine tube           fallopian tube  healthy                                         fc77d2ae-247d-44d7-aa24-3f4859254c2c  scimilarity  nan     nan                         34       209362.0            7663     27635.733860508328  0.8560416684135254  0.7382997749370328  0.7459235366949488
agg_4794  classical monocyte  gingiva                           mouth           periodontitis                                   GSE152042                             scimilarity  nan     nan                         198      879477.0            11312    32107.813302914532  0.9416262876541264  0.8333723697695014  0.8279530215775117
agg_4795  classical monocyte  head of femur                     bone            healthy                                         GSE169396                             scimilarity  nan     nan                         450      3669304.0           13216    33082.323604222154  0.9529417082022753  0.8522346343107771  0.8359032081996703
agg_4797  classical monocyte  heart left ventricle              heart           NA                                              ENCODE                                scimilarity  nan     nan                         50       138407.30523254164  11428    41105.63651890687   0.8614790128015358  0.7843765107548858  0.7889308929671582
agg_4798  classical monocyte  heart left ventricle              heart           healthy                                         b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         192      585001.0            11159    31422.874036870588  0.9363985001217598  0.8226438123601741  0.8283173244446851
agg_4799  classical monocyte  heart right ventricle             heart           healthy                                         b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         316      936263.0            11904    32002.67227624691   0.9425900348990802  0.8306128730459813  0.8308116461021977
agg_4800  classical monocyte  ileum                             gut             Crohn's disease                                 17481d16-ee44-49e5-bcf0-28c0780d8c4a  scimilarity  nan     nan                         76       311515.0            9984     29687.611679190355  0.9103310804624617  0.8063080284939284  0.7916226068478351
agg_4801  classical monocyte  ileum                             gut             Crohn's disease                                 DS000011665                           scimilarity  nan     nan                         119      298206.0            8021     26013.286557459236  0.880272438867354   0.7572099232100128  0.7459713937143965
agg_4802  classical monocyte  inferior nasal concha             bone            chronic rhinosinusitis with nasal polyps        GSE156285                             scimilarity  nan     nan                         241      1048981.0           12463    35082.083353928334  0.9475981193848912  0.8330375982148592  0.833114773003936
agg_4803  classical monocyte  interventricular septum           heart           healthy                                         b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         442      1322725.0           12418    32235.5434681197    0.94751399102473    0.8340623939411483  0.8353858365852226
agg_4804  classical monocyte  isthmus of fallopian tube         fallopian tube  healthy                                         fc77d2ae-247d-44d7-aa24-3f4859254c2c  scimilarity  nan     nan                         62       318330.0            8642     29768.198512126502  0.8846027131590791  0.784220371073871   0.7739277341448829
agg_4805  classical monocyte  kidney                            kidney          NA                                              GSE145927                             scimilarity  nan     nan                         1789     6957949.0           15145    34853.400215205314  0.9586386429540185  0.8579622487738222  0.8524123833700197
agg_4806  classical monocyte  kidney                            kidney          acute kidney failure                            bcb61471-2a44-4d00-a0af-ff085512674c  scimilarity  nan     nan                         587      1589224.0           12335    32471.78147434854   0.9513927341618255  0.84278469020191    0.8402011848101985
agg_4807  classical monocyte  kidney                            kidney          chronic kidney disease                          bcb61471-2a44-4d00-a0af-ff085512674c  scimilarity  nan     nan                         134      440788.0            10831    32410.84600407974   0.9323603662663799  0.8153954443953603  0.8190416636069936
agg_4808  classical monocyte  kidney                            kidney          healthy                                         120e86b4-1195-48c5-845b-b98054105eec  scimilarity  nan     nan                         762      4034828.0           14946    34015.00816823295   0.9520640694425091  0.848205266473767   0.836085027208869
agg_4809  classical monocyte  kidney                            kidney          healthy                                         DS000010415                           scimilarity  nan     nan                         55       127079.0            8055     27206.419037355434  0.8216238135756493  0.7570847030479543  0.72524174726152
agg_4810  classical monocyte  kidney                            kidney          healthy                                         GSE140989                             scimilarity  nan     nan                         174      563438.0            11016    29887.593299155575  0.914390252807459   0.8069762795735104  0.8086759072557722
agg_4811  classical monocyte  left cardiac atrium               heart           NA                                              ENCODE                                scimilarity  nan     nan                         59       225070.96128814947  12831    43727.214042795575  0.8938217272345973  0.8048974803645641  0.8168195119621123
agg_4812  classical monocyte  left cardiac atrium               heart           healthy                                         b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         450      1446734.0           12669    32433.20393285361   0.9467532324494885  0.8378571513042986  0.8360845068688169
agg_4813  classical monocyte  left lung                         lung            NA                                              ENCODE                                scimilarity  nan     nan                         16       40636.89964582212   7786     32627.503591080356  0.7956982592153874  0.7257891637293196  0.6945729831399207
agg_4814  classical monocyte  liver                             liver           Alagille syndrome                               GSE163650                             scimilarity  nan     nan                         92       188745.0            7678     24676.271768005652  0.8778490663838523  0.7727787177978541  0.7475297818459176
agg_4815  classical monocyte  liver                             liver           Biliary atresia                                 GSE163650                             scimilarity  nan     nan                         367      1615410.0           11423    31280.031237098276  0.9308940943983487  0.8275363405345841  0.8098394949074608
agg_4816  classical monocyte  liver                             liver           fibrosis                                        GSE136103                             scimilarity  nan     nan                         1053     5824229.0           14559    31950.801484792315  0.9632028296773183  0.8486875622591808  0.8450295232730788
agg_4817  classical monocyte  liver                             liver           healthy                                         GSE136103                             scimilarity  nan     nan                         2036     7668787.0           14818    32446.71686472893   0.9614468191909097  0.8455504517344443  0.8464065028454753
agg_4818  classical monocyte  liver                             liver           healthy                                         GSE159977                             scimilarity  nan     nan                         584      4703990.0           13644    34306.87421529999   0.9580119653448788  0.8463006395755477  0.8456137609888349
agg_4819  classical monocyte  liver                             liver           healthy                                         GSE163650                             scimilarity  nan     nan                         440      4840312.0           12272    31180.407161439263  0.9312379663603724  0.8198899526213368  0.8071334502842269
agg_4820  classical monocyte  liver                             liver           non-alcoholic fatty liver disease               GSE136103                             scimilarity  nan     nan                         675      3625081.0           13858    31875.772311414476  0.9607644852684971  0.8451153856892568  0.8434410362324897
agg_4821  classical monocyte  liver                             liver           non-alcoholic steatohepatitis                   GSE159977                             scimilarity  nan     nan                         818      5328417.0           13712    34244.79288663241   0.9625204736588413  0.8495046961098656  0.843851264103893
agg_4822  classical monocyte  lower lobe of left lung           lung            NA                                              ENCODE                                scimilarity  nan     nan                         119      332235.9213328175   13992    45607.18635453728   0.9075609223976155  0.8257092626936027  0.8254709531577699
agg_4823  classical monocyte  lower lobe of lung                lung            healthy                                         GSE169471                             scimilarity  nan     nan                         305      1224338.0           11342    28255.985922767635  0.9404343350984603  0.8261785132237449  0.8150341534919611
agg_4824  classical monocyte  lung                              lung            COVID-19                                        GSE145926                             scimilarity  nan     nan                         6755     29326462.0          15670    32143.238185602037  0.9347325303698273  0.8382075353537076  0.8350131792189084
agg_4825  classical monocyte  lung                              lung            COVID-19                                        GSE149878                             scimilarity  nan     nan                         1388     17477477.0          15453    32118.37391645824   0.9547396360778304  0.8488321261303191  0.8334226404565429
agg_4826  classical monocyte  lung                              lung            COVID-19                                        covid                                 scimilarity  nan     nan                         87       182436.0            8979     30922.571470240666  0.8944321462094855  0.7906545756360502  0.7941388516891991
agg_4827  classical monocyte  lung                              lung            Idiopathic pulmonary arterial hypertension      GSE169471                             scimilarity  nan     nan                         338      1099281.0           11441    29048.65706098467   0.9394268881459132  0.8205211245968264  0.8008915371926529
agg_4828  classical monocyte  lung                              lung            NA                                              GSE122960                             scimilarity  nan     nan                         2035     4747594.0           13592    31170.75054081323   0.947029489389696   0.8239671657460403  0.8229589179837169
agg_4829  classical monocyte  lung                              lung            NA                                              GSE150708                             scimilarity  nan     nan                         1711     18922764.0          15768    34651.58457127426   0.9197817700449655  0.8258478966731367  0.8332402246613281
agg_4830  classical monocyte  lung                              lung            NA                                              GSE159354                             scimilarity  nan     nan                         804      1319717.0           12267    30466.67987009986   0.9289888078126158  0.8179900347657229  0.8046761734394422
agg_4831  classical monocyte  lung                              lung            chronic obstructive pulmonary disease           DS000011735                           scimilarity  nan     nan                         1757     5736362.0           16385    37750.32029026267   0.8922650938609307  0.8174310943962452  0.7970204631208431
agg_4832  classical monocyte  lung                              lung            healthy                                         5d445965-6f1a-4b68-ba3a-b8f765155d3a  scimilarity  nan     nan                         1254     5397217.0           13298    31013.01576531177   0.9490555294982109  0.8457060773457411  0.8329330004620054
agg_4833  classical monocyte  lung                              lung            healthy                                         DS000011735                           scimilarity  nan     nan                         4653     16523051.0          17066    37593.98867222708   0.8985708908575675  0.8260014412964971  0.8051199142820423
agg_4834  classical monocyte  lung                              lung            healthy                                         GSE128033                             scimilarity  nan     nan                         1047     3646581.0           13185    29331.2035341724    0.9513045959002788  0.837770557527153   0.8238987539695043
agg_4835  classical monocyte  lung                              lung            healthy                                         GSE128169                             scimilarity  nan     nan                         1732     11798577.0          15051    32941.755873862814  0.9636814596401634  0.8539834251349044  0.8457626672394015
agg_4836  classical monocyte  lung                              lung            healthy                                         GSE132771                             scimilarity  nan     nan                         1601     4614408.0           13275    29761.03818692572   0.9531291409651284  0.8457644131717956  0.8310980214000753
agg_4837  classical monocyte  lung                              lung            healthy                                         GSE169471                             scimilarity  nan     nan                         498      1613976.0           11886    28956.213107123967  0.9433400854993297  0.8271067022019086  0.815595680192867
agg_4838  classical monocyte  lung                              lung            hypersensitivity pneumonitis                    GSE122960                             scimilarity  nan     nan                         374      1513589.0           11850    30594.494180377842  0.9436379625667726  0.8236248274374875  0.8201281668004226
agg_4839  classical monocyte  lung                              lung            idiopathic pulmonary fibrosis                   DS000011735                           scimilarity  nan     nan                         3273     11098539.0          16692    36983.80245044498   0.9002060376489591  0.825628591643822   0.8057309447193657
agg_4840  classical monocyte  lung                              lung            idiopathic pulmonary fibrosis                   GSE122960                             scimilarity  nan     nan                         795      2302741.0           12763    31758.949309599942  0.9481965424789537  0.8315974088368188  0.8291678346888611
agg_4841  classical monocyte  lung                              lung            idiopathic pulmonary fibrosis                   GSE128033                             scimilarity  nan     nan                         264      892053.0            10997    28549.410927787198  0.9388857876621541  0.8259212982973368  0.8088642639807162
agg_4842  classical monocyte  lung                              lung            idiopathic pulmonary fibrosis                   GSE132771                             scimilarity  nan     nan                         562      1301612.0           12354    30446.385456748263  0.9469963680992495  0.8353213795556867  0.820474733820443
agg_4844  classical monocyte  lung                              lung            idiopathic pulmonary fibrosis                   GSE143706                             scimilarity  nan     nan                         28       77859.0             5999     21933.076720558005  0.8162248508692449  0.6872290766118224  0.6675933221673609
agg_4845  classical monocyte  lung                              lung            idiopathic pulmonary fibrosis                   GSE146981                             scimilarity  nan     nan                         28       77859.0             5999     21933.076720558005  0.8151661654118738  0.6848887765520302  0.667713152235204
agg_4846  classical monocyte  lung                              lung            idiopathic pulmonary fibrosis                   GSE159354                             scimilarity  nan     nan                         963      1825354.0           12518    29731.366588446697  0.9431677835768482  0.8335606700375219  0.8147956336542009
agg_4847  classical monocyte  lung                              lung            interstitial lung disease                       GSE122960                             scimilarity  nan     nan                         255      622277.0            10480    29149.028142322823  0.9283467350849584  0.8054016349869099  0.8007862785744999
agg_4848  classical monocyte  lung                              lung            interstitial lung disease                       GSE128169                             scimilarity  nan     nan                         697      1972432.0           12243    29254.839846468705  0.9423878335093786  0.8300974626358707  0.8196994274197087
agg_4849  classical monocyte  lung                              lung            scleroderma                                     GSE128169                             scimilarity  nan     nan                         108      906362.0            11850    32908.696557354284  0.9438117150692724  0.8371885386238901  0.8225520820106291
agg_4850  classical monocyte  lung                              lung            scleroderma                                     GSE132771                             scimilarity  nan     nan                         98       335776.0            9515     28212.056049440183  0.9149834322044056  0.8124364794406282  0.7889831369493125
agg_4851  classical monocyte  lung                              lung            systemic scleroderma;interstitial lung disease  GSE159354                             scimilarity  nan     nan                         680      1244200.0           11364    27669.832360293723  0.9311681193836778  0.8218066897336025  0.801632654842017
agg_4852  classical monocyte  lung parenchyma                   lung            COVID-19                                        GSE158127                             scimilarity  nan     nan                         1028     2949423.0           13468    33486.58561312063   0.9573331305866764  0.8476148647425674  0.8402420746316328
agg_4853  classical monocyte  lung parenchyma                   lung            healthy                                         GSE158127                             scimilarity  nan     nan                         791      2735456.0           13260    33646.87553319058   0.9544351950657847  0.8399981618864122  0.8327646420552953
agg_4854  classical monocyte  lymph node                        lymph node      Langerhans Cell Histiocytosis                   GSE133704                             scimilarity  nan     nan                         41       112531.0            7250     25404.282603262254  0.8424914182716917  0.7490029862443883  0.7315420690462492
agg_4855  classical monocyte  mesenteric artery                 vasculature     healthy                                         GSE156341                             scimilarity  nan     nan                         49       408553.0            10083    30979.99239432764   0.9337200372314851  0.8169964046619824  0.8163109060481625
agg_4856  classical monocyte  mesenteric artery                 vasculature     type II diabetes mellitus                       GSE156341                             scimilarity  nan     nan                         107      869426.0            11343    33124.102127533     0.9473048055491107  0.8341783608252021  0.8308049817701406
agg_4857  classical monocyte  mesenteric lymph node             lymph node      healthy                                         7681c7d7-0168-4892-a547-6f02a6430ace  scimilarity  nan     nan                         23       211416.0            9219     31018.02041830142   0.9058403141644794  0.7914556298280883  0.7867496166249129
agg_4858  classical monocyte  muscle tissue                     muscle          healthy                                         e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         1800     50754027.0          16141    33323.26969602708   0.9316672745867468  0.8348295016474899  0.8219918807835366
agg_4859  classical monocyte  nasal cavity                      airway          COVID-19                                        03f821b4-87be-4ff4-b65a-b5fc00061da7  scimilarity  nan     nan                         2907     12130592.0          15268    34541.15937505348   0.9398365208964624  0.8399826209015534  0.8406896100498221
agg_4860  classical monocyte  nasal cavity                      airway          healthy                                         03f821b4-87be-4ff4-b65a-b5fc00061da7  scimilarity  nan     nan                         129      319231.0            10961    33948.695520725145  0.908031594986554   0.802885512146251   0.7998873890770253
agg_4861  classical monocyte  nasopharynx                       airway          nasopharyngeal neoplasm                         GSE150825                             scimilarity  nan     nan                         248      919710.0            11652    32725.826204558824  0.9483756316738621  0.8416477499125612  0.8434913191320811
agg_4862  classical monocyte  nose                              airway          chronic rhinosinusitis with nasal polyps        GSE156285                             scimilarity  nan     nan                         89       407982.0            10658    32874.53627471163   0.9356986809715955  0.8282644612568757  0.8193450383260489
agg_4863  classical monocyte  olfactory epithelium              airway          NA                                              GSE139522                             scimilarity  nan     nan                         152      645745.0            11496    32760.770927681508  0.9344670047519139  0.8326335387583638  0.8215291009295241
agg_4864  classical monocyte  omental fat pad                   peritoneum      obesity                                         GSE163830                             scimilarity  nan     nan                         248      603440.0            11376    33014.67391159742   0.9276969765366693  0.8135481953870003  0.8105166731324462
agg_4865  classical monocyte  omentum                           peritoneum      NA                                              GSE151889                             scimilarity  nan     nan                         106      233037.0            9833     30451.216970905818  0.9023265636700606  0.794410906944691   0.7787417127899898
agg_4868  classical monocyte  peritoneum                        peritoneum      NA                                              GSE130888                             scimilarity  nan     nan                         20547    75515682.0          16606    31611.601467579523  0.9640573553126023  0.8513803623213743  0.8467189186177884
agg_4869  classical monocyte  peritoneum                        peritoneum      healthy                                         GSE130888                             scimilarity  nan     nan                         297      509237.0            11456    32213.169493243313  0.9218188146119148  0.8055575625334659  0.8039145027939639
agg_4870  classical monocyte  prostate gland                    prostate        healthy                                         GSE145843                             scimilarity  nan     nan                         24       87555.0             5997     21432.99958526179   0.816337811891654   0.7246769991610698  0.6943045232082502
agg_4871  classical monocyte  prostate gland                    prostate        healthy                                         e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         220      2445216.0           12460    33911.259081133416  0.9301914834403244  0.8259560877683964  0.824440772955729
agg_4872  classical monocyte  renal medulla                     kidney          healthy                                         120e86b4-1195-48c5-845b-b98054105eec  scimilarity  nan     nan                         21       101089.0            7354     26191.96315382653   0.8491245141516279  0.751292403843078   0.7310820608427115
agg_4873  classical monocyte  respiratory airway                airway          COVID-19                                        29f92179-ca10-4309-a32b-d383d80347c1  scimilarity  nan     nan                         24222    187246624.0         17810    38621.12130270373   0.911673853186318   0.8025805020768422  0.8054824859649656
agg_4874  classical monocyte  respiratory tract epithelium      airway          NA                                              GSE139522                             scimilarity  nan     nan                         69       371203.0            11152    33714.18718588416   0.9189481336748234  0.8044925205508522  0.8036843863420214
agg_4875  classical monocyte  right cardiac atrium              heart           healthy                                         b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         311      977934.0            11891    31570.89048372305   0.9357294628028995  0.8260631626226992  0.8224053236502149
agg_4876  classical monocyte  sigmoid colon                     gut             ulcerative colitis                              DS000010618                           scimilarity  nan     nan                         56       157830.0            7772     25795.237651990203  0.8795503731261843  0.7579451181338609  0.7495331251552053
agg_4877  classical monocyte  spleen                            spleen          HIV infection                                   GSE148796                             scimilarity  nan     nan                         48       118392.0            7120     25206.082214526155  0.8589535988735265  0.7723626781751426  0.7544543625246387
agg_4878  classical monocyte  spleen                            spleen          healthy                                         4d74781b-8186-4c9a-b659-ff4dc4601d91  scimilarity  nan     nan                         2166     7905128.0           13952    30832.98149016589   0.957626764427953   0.8498277489734691  0.8368540073560422
agg_4879  classical monocyte  spleen                            spleen          healthy                                         GSE148796                             scimilarity  nan     nan                         49       99684.0             6785     24492.260723886982  0.8477188947186437  0.7504685014947085  0.7295165069095441
agg_4880  classical monocyte  spleen                            spleen          healthy                                         e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         3483     86078540.0          17775    36266.36759486      0.9100465322095738  0.8097019968114763  0.8081945289287479
agg_4881  classical monocyte  subcutaneous adipose tissue       adipose         healthy                                         e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         2019     27914261.0          15739    35142.992419516115  0.9421666034710219  0.8357406603138717  0.8249091266671545
agg_4882  classical monocyte  synovial fluid                    synovial joint  juvenile idiopathic arthritis                   GSE160097                             scimilarity  nan     nan                         68       430194.0            9771     30748.449183442222  0.9095023395695137  0.8036624580038914  0.8039477211085259
agg_4883  classical monocyte  synovial fluid                    synovial joint  psoriatic arthritis                             GSE161500                             scimilarity  nan     nan                         675      4107697.0           12980    33508.9642503863    0.9516464026609945  0.8477381221507095  0.8460202124356817
agg_4884  classical monocyte  tertiary ovarian follicle         ovary           NA                                              GSE146512                             scimilarity  nan     nan                         100      296748.0            10411    33084.00694991347   0.9139378500397654  0.8064064097295065  0.8023402394768804
agg_4885  classical monocyte  testis                            testis          NA                                              GSE153819                             scimilarity  nan     nan                         17       97625.0             7958     29072.297180853668  0.855345538109863   0.764463114789657   0.7520891122089112
agg_4886  classical monocyte  thoracic lymph node               lymph node      healthy                                         62ef75e4-cbea-454e-a0ce-998ec40223d3  scimilarity  nan     nan                         20194    147502356.0         17156    34278.43204322054   0.9584542376154844  0.8568164088302802  0.8468420651789338
agg_4887  classical monocyte  thymus                            thymus          healthy                                         62ef75e4-cbea-454e-a0ce-998ec40223d3  scimilarity  nan     nan                         487      2692133.0           12969    33549.45213075861   0.9497232200872254  0.8549259958290932  0.843613914494647
agg_4888  classical monocyte  thymus                            thymus          healthy                                         83ed3be8-4cb9-43e6-9aaa-3fbbf5d1bd3a  scimilarity  nan     nan                         27       80042.0             6527     23800.833698344715  0.8448423801602987  0.7347769941739797  0.7181319535525903
agg_4889  classical monocyte  thymus                            thymus          healthy                                         de13e3e2-23b6-40ed-a413-e9e12d7d3910  scimilarity  nan     nan                         52       298983.0            9582     30002.684399867492  0.9175531872094376  0.8185046437128072  0.8176606503789746
agg_4890  classical monocyte  tonsil                            tonsil          healthy                                         GSE119506                             scimilarity  nan     nan                         321      1114339.0           11546    30516.473940957327  0.936749794981687   0.8307356996245999  0.8215841775914924
agg_4893  classical monocyte  trachea                           airway          healthy                                         03f821b4-87be-4ff4-b65a-b5fc00061da7  scimilarity  nan     nan                         126      457580.0            11135    33734.327765812595  0.9299687996532033  0.8279876048002456  0.8230768581762299
agg_4894  classical monocyte  trachea                           airway          healthy                                         e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         130      8245622.0           13663    32584.555010387805  0.859406194907081   0.7522520630806427  0.7550758397192211
agg_4895  classical monocyte  transition zone of prostate       prostate        prostatic hypertrophy                           4b54248f-2165-477c-a027-dd55082e8818  scimilarity  nan     nan                         520      2949099.0           13618    29095.077520604846  0.9205242051339535  0.807648434373933   0.7862293702463532
agg_4896  classical monocyte  transverse colon                  gut             healthy                                         62ef75e4-cbea-454e-a0ce-998ec40223d3  scimilarity  nan     nan                         503      2932999.0           12868    33265.22762476361   0.9476555210791636  0.8481166846094836  0.8392740144102075
agg_4897  classical monocyte  tympanic membrane                 ear             NA                                              GSE128892                             scimilarity  nan     nan                         33       153723.0            7971     26217.848855873086  0.8515891731868822  0.7526405790618411  0.7328434865867175
agg_4899  classical monocyte  upper lobe of lung                lung            healthy                                         GSE169471                             scimilarity  nan     nan                         180      594059.0            10222    27881.011904781462  0.9248153838159024  0.8106326147324702  0.8003438801026445
agg_4900  classical monocyte  urine                             urinary         healthy                                         GSE165396                             scimilarity  nan     nan                         20       109197.0            7299     25505.26771828214   0.8530560359297166  0.7505269381795711  0.7350625083451162
agg_4901  classical monocyte  uterus                            uterus          healthy                                         32f2fd23-ec74-486f-9544-e5b2f41725f5  scimilarity  nan     nan                         18       189472.0            9397     30891.07905591756   0.8870309576225303  0.7796319609750195  0.7827462282311286
agg_4902  classical monocyte  vasculature                       vasculature     healthy                                         e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         14537    224206261.0         17922    36651.90273209977   0.9438232475974557  0.8425086108952623  0.8343079266995497
agg_4903  classical monocyte  visceral fat                      adipose         obesity                                         GSE128518                             scimilarity  nan     nan                         74       196657.0            9100     28836.082756573305  0.8890453654508667  0.7810993233599736  0.7788838086708725

Query cells that:

  • have “monocyte” in their cell type name (cell_type.str.contains("monocyte"))

  • are from healthy donors (disease == "healthy")
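
The two conditions above are joined into a single filter expression whose shape matches a pandas `DataFrame.query` string. The sketch below applies the same expression to a small hand-made table to show how the conditions combine. The table contents and row labels are illustrative only and are not Decima metadata, and whether `decima query-cell` evaluates the expression through pandas in exactly this way is an assumption.

```python
import pandas as pd

# Toy stand-in for the cell metadata table (illustrative values, not Decima data).
cells = pd.DataFrame(
    {
        "cell_type": [
            "classical monocyte",
            "classical monocyte",
            "non-classical monocyte",
            "macrophage",
        ],
        "disease": ["healthy", "obesity", "healthy", "healthy"],
    },
    index=["cell_a", "cell_b", "cell_c", "cell_d"],
)

# Same expression string as the one passed on the command line.
expr = 'cell_type.str.contains("monocyte") and disease == "healthy"'

# engine="python" is passed explicitly so that the .str accessor is evaluated
# by pandas itself rather than by the optional numexpr backend.
selected = cells.query(expr, engine="python")
print(selected)
```

Only rows that satisfy both conditions are kept: the monocyte from an obesity sample is dropped by the disease filter, and the healthy macrophage is dropped by the cell type filter.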

! decima query-cell 'cell_type.str.contains("monocyte") and disease == "healthy"' | column -t -s $'\t'
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:01.9 (1657.8MB/s)
          cell_type               tissue                            organ           disease  study                                 dataset      region  subregion  celltype_coarse  n_cells  total_counts  n_genes  size_factor         train_pearson       val_pearson         test_pearson
agg_4706  classical monocyte      alveolar system                   lung            healthy  GSE155249                             scimilarity  nan     nan                         72       218105.0      9142     30484.31888978114   0.9102228263646758  0.8083487523192785  0.8047828694155461
agg_4707  classical monocyte      ampulla of uterine tube           fallopian tube  healthy  fc77d2ae-247d-44d7-aa24-3f4859254c2c  scimilarity  nan     nan                         78       550950.0      9639     30719.377971431015  0.9077670011915634  0.8045070167513724  0.7896845423359651
agg_4709  classical monocyte      aorta                             vasculature     healthy  GSE166676                             scimilarity  nan     nan                         25       162858.0      8859     31216.275954364824  0.8819013257206973  0.7821403055329706  0.7646999711802146
agg_4710  classical monocyte      apex of heart                     heart           healthy  b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         397      1226515.0     12369    32022.563851814968  0.9469178617442242  0.8326145310572417  0.8365506153530168
agg_4733  classical monocyte      blood                             blood           healthy  03f821b4-87be-4ff4-b65a-b5fc00061da7  scimilarity  nan     nan                         32464    109280914.0   16158    33646.02843110038   0.9568728712803031  0.8545324533535094  0.8487445540580735
agg_4734  classical monocyte      blood                             blood           healthy  436154da-bcf1-4130-9c8b-120ff9a888f2  scimilarity  nan     nan                         76800    206490628.0   16683    30736.453324546856  0.955313650235467   0.8494127267867799  0.84210567312908
agg_4735  classical monocyte      blood                             blood           healthy  5d445965-6f1a-4b68-ba3a-b8f765155d3a  scimilarity  nan     nan                         1044     3638976.0     12306    30542.8977364245    0.9465384578921433  0.851991795617166   0.8308191715256182
agg_4736  classical monocyte      blood                             blood           healthy  DS000010023                           scimilarity  nan     nan                         243      362606.0      8414     25201.261024165953  0.8865446516098515  0.7621391376670988  0.7681818769796727
agg_4737  classical monocyte      blood                             blood           healthy  GSE122703                             scimilarity  nan     nan                         18       83417.0       7546     28194.725173315186  0.859225612396475   0.7640890699253299  0.7443681539461423
agg_4738  classical monocyte      blood                             blood           healthy  GSE130117                             scimilarity  nan     nan                         2017     7130588.0     13535    33078.53692191542   0.9553673450160365  0.851402109626239   0.8385936100758409
agg_4739  classical monocyte      blood                             blood           healthy  GSE132802                             scimilarity  nan     nan                         1601     9955248.0     13132    32063.630951743195  0.9478882791739611  0.8391025143866828  0.8303465877530952
agg_4740  classical monocyte      blood                             blood           healthy  GSE139324                             scimilarity  nan     nan                         2333     8331045.0     13985    31135.881287246768  0.9608208780142045  0.8473885992448625  0.8432790193723467
agg_4741  classical monocyte      blood                             blood           healthy  GSE145809                             scimilarity  nan     nan                         69       245221.0      8962     29135.67197629852   0.8825701041728526  0.7811799267734735  0.7818647625179129
agg_4742  classical monocyte      blood                             blood           healthy  GSE149313                             scimilarity  nan     nan                         2420     6974751.0     13143    29560.496854576566  0.9574598613513423  0.8505290963248237  0.8379199735887167
agg_4743  classical monocyte      blood                             blood           healthy  GSE153421                             scimilarity  nan     nan                         3691     15561725.0    14569    34377.465875728165  0.9636686704566925  0.8576434473562725  0.8511814190737197
agg_4744  classical monocyte      blood                             blood           healthy  GSE156989                             scimilarity  nan     nan                         13554    160011485.0   16915    34135.439844737564  0.9640667421350761  0.8577967800377495  0.8517975138366085
agg_4745  classical monocyte      blood                             blood           healthy  GSE157829                             scimilarity  nan     nan                         1619     6957811.0     13507    30199.39288988673   0.9484019976492215  0.8436979316400604  0.8347196616710685
agg_4746  classical monocyte      blood                             blood           healthy  GSE159113                             scimilarity  nan     nan                         1025     6298250.0     12083    27477.50809897617   0.9078020151513733  0.8121457150205226  0.7980372877810575
agg_4747  classical monocyte      blood                             blood           healthy  GSE161329                             scimilarity  nan     nan                         5654     25653579.0    14349    28848.0539929647    0.9549801428956252  0.8450430950674043  0.8406188789518544
agg_4748  classical monocyte      blood                             blood           healthy  GSE161738                             scimilarity  nan     nan                         2676     13801473.0    12825    33337.477050230416  0.9541962906717452  0.8512846409758499  0.8485408028961247
agg_4749  classical monocyte      blood                             blood           healthy  GSE163668                             scimilarity  nan     nan                         2644     10486314.0    14049    33786.96584264489   0.9597801578342394  0.8560775485935677  0.8512149509551471
agg_4750  classical monocyte      blood                             blood           healthy  GSE166992                             scimilarity  nan     nan                         7501     28033216.0    15079    33455.367364577316  0.9622273594219685  0.8558958139235102  0.8495571689751152
agg_4751  classical monocyte      blood                             blood           healthy  GSE167363                             scimilarity  nan     nan                         3135     14722635.0    14375    29977.24002819913   0.942417448875388   0.8368071803109702  0.8258536430202982
agg_4752  classical monocyte      blood                             blood           healthy  GSE168710                             scimilarity  nan     nan                         16484    104881872.0   16223    34107.336261357574  0.9398282119039322  0.8424821834537695  0.8372971004604842
agg_4753  classical monocyte      blood                             blood           healthy  GSE168732                             scimilarity  nan     nan                         770      2548822.0     12508    33411.30103713399   0.9552513581030765  0.8508279875038706  0.847461536110767
agg_4754  classical monocyte      blood                             blood           healthy  b0cf0afa-ec40-4d65-b570-ed4ceacc6813  scimilarity  nan     nan                         40975    300555227.0   15784    35938.85772500803   0.9622425892039956  0.853424173800979   0.8508714303589978
agg_4755  classical monocyte      blood                             blood           healthy  ddfad306-714d-4cc0-9985-d9072820c530  scimilarity  nan     nan                         8827     36073928.0    15131    33208.591584008376  0.9546118779961532  0.8543086616569785  0.8462739374830107
agg_4772  classical monocyte      bone marrow                       bone marrow     healthy  GSE132509                             scimilarity  nan     nan                         610      2315570.0     12950    31768.06513427212   0.95159369508558    0.8517118261701931  0.836658919433696
agg_4773  classical monocyte      bone marrow                       bone marrow     healthy  GSE154109                             scimilarity  nan     nan                         531      1431388.0     11793    31377.450948003392  0.9490546933955852  0.8431566630120637  0.8370883160295727
agg_4774  classical monocyte      bone marrow                       bone marrow     healthy  GSE163278                             scimilarity  nan     nan                         1119     3970394.0     13361    32081.93302956569   0.9620394897163868  0.8531148861215617  0.8426785397396367
agg_4775  classical monocyte      bone marrow                       bone marrow     healthy  e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         151      8025584.0     12883    29444.78352863075   0.8440608178227607  0.740574910750325   0.7417328577454956
agg_4777  classical monocyte      breast                            breast          healthy  GSE164898                             scimilarity  nan     nan                         136      641471.0      12971    34463.52724138501   0.9163324788498406  0.8116274576633968  0.7978555908123931
agg_4778  classical monocyte      breast                            breast          healthy  c9706a92-0e5f-46c1-96d8-20e42467f287  scimilarity  nan     nan                         98       1444245.0     13491    30678.263421880285  0.9165520953567395  0.8162053142576849  0.7994301225229256
agg_4782  classical monocyte      bronchus                          airway          healthy  GSE158127                             scimilarity  nan     nan                         158      1158198.0     12643    34764.50196701077   0.9364512338084163  0.8259291909369686  0.8266638555276521
agg_4783  classical monocyte      cardiac muscle of left ventricle  heart           healthy  GSE156703                             scimilarity  nan     nan                         13       116181.0      9463     35695.66320276271   0.8542740960069863  0.7515621053395214  0.7561639038477878
agg_4785  classical monocyte      caudate lobe of liver             liver           healthy  44531dd9-1388-4416-a117-af0a99de2294  scimilarity  nan     nan                         238      730016.0      11505    31342.386314731422  0.9217674983890346  0.8140551552218395  0.8040417787954989
agg_4786  classical monocyte      cortex of kidney                  kidney          healthy  120e86b4-1195-48c5-845b-b98054105eec  scimilarity  nan     nan                         79       323010.0      10939    32378.76683324232   0.9028856137251035  0.7978822439778066  0.7839454035009307
agg_4787  classical monocyte      cortex of kidney                  kidney          healthy  a98b828a-622a-483a-80e0-15703678befd  scimilarity  nan     nan                         91       477355.0      10898    32358.068865763344  0.9328436291917394  0.8237810319569842  0.8195391931798526
agg_4789  classical monocyte      digestive tract                   gut             healthy  DS000011665                           scimilarity  nan     nan                         347      1679116.0     12155    33347.55517047197   0.9422556928648441  0.8417267634096297  0.84018452536733
agg_4790  classical monocyte      exocrine pancreas                 pancreas        healthy  e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         821      7824069.0     14847    36135.64587109593   0.9493709998172055  0.8440837716099078  0.8410246939313819
agg_4791  classical monocyte      fallopian tube                    fallopian tube  healthy  fc77d2ae-247d-44d7-aa24-3f4859254c2c  scimilarity  nan     nan                         131      734093.0      11240    33115.640434504094  0.9359103734457376  0.8339026306142181  0.8225901799509813
agg_4792  classical monocyte      fimbria of uterine tube           fallopian tube  healthy  fc77d2ae-247d-44d7-aa24-3f4859254c2c  scimilarity  nan     nan                         34       209362.0      7663     27635.733860508328  0.8560416684135254  0.7382997749370328  0.7459235366949488
agg_4795  classical monocyte      head of femur                     bone            healthy  GSE169396                             scimilarity  nan     nan                         450      3669304.0     13216    33082.323604222154  0.9529417082022753  0.8522346343107771  0.8359032081996703
agg_4798  classical monocyte      heart left ventricle              heart           healthy  b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         192      585001.0      11159    31422.874036870588  0.9363985001217598  0.8226438123601741  0.8283173244446851
agg_4799  classical monocyte      heart right ventricle             heart           healthy  b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         316      936263.0      11904    32002.67227624691   0.9425900348990802  0.8306128730459813  0.8308116461021977
agg_4803  classical monocyte      interventricular septum           heart           healthy  b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         442      1322725.0     12418    32235.5434681197    0.94751399102473    0.8340623939411483  0.8353858365852226
agg_4804  classical monocyte      isthmus of fallopian tube         fallopian tube  healthy  fc77d2ae-247d-44d7-aa24-3f4859254c2c  scimilarity  nan     nan                         62       318330.0      8642     29768.198512126502  0.8846027131590791  0.784220371073871   0.7739277341448829
agg_4808  classical monocyte      kidney                            kidney          healthy  120e86b4-1195-48c5-845b-b98054105eec  scimilarity  nan     nan                         762      4034828.0     14946    34015.00816823295   0.9520640694425091  0.848205266473767   0.836085027208869
agg_4809  classical monocyte      kidney                            kidney          healthy  DS000010415                           scimilarity  nan     nan                         55       127079.0      8055     27206.419037355434  0.8216238135756493  0.7570847030479543  0.72524174726152
agg_4810  classical monocyte      kidney                            kidney          healthy  GSE140989                             scimilarity  nan     nan                         174      563438.0      11016    29887.593299155575  0.914390252807459   0.8069762795735104  0.8086759072557722
agg_4812  classical monocyte      left cardiac atrium               heart           healthy  b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         450      1446734.0     12669    32433.20393285361   0.9467532324494885  0.8378571513042986  0.8360845068688169
agg_4817  classical monocyte      liver                             liver           healthy  GSE136103                             scimilarity  nan     nan                         2036     7668787.0     14818    32446.71686472893   0.9614468191909097  0.8455504517344443  0.8464065028454753
agg_4818  classical monocyte      liver                             liver           healthy  GSE159977                             scimilarity  nan     nan                         584      4703990.0     13644    34306.87421529999   0.9580119653448788  0.8463006395755477  0.8456137609888349
agg_4819  classical monocyte      liver                             liver           healthy  GSE163650                             scimilarity  nan     nan                         440      4840312.0     12272    31180.407161439263  0.9312379663603724  0.8198899526213368  0.8071334502842269
agg_4823  classical monocyte      lower lobe of lung                lung            healthy  GSE169471                             scimilarity  nan     nan                         305      1224338.0     11342    28255.985922767635  0.9404343350984603  0.8261785132237449  0.8150341534919611
agg_4832  classical monocyte      lung                              lung            healthy  5d445965-6f1a-4b68-ba3a-b8f765155d3a  scimilarity  nan     nan                         1254     5397217.0     13298    31013.01576531177   0.9490555294982109  0.8457060773457411  0.8329330004620054
agg_4833  classical monocyte      lung                              lung            healthy  DS000011735                           scimilarity  nan     nan                         4653     16523051.0    17066    37593.98867222708   0.8985708908575675  0.8260014412964971  0.8051199142820423
agg_4834  classical monocyte      lung                              lung            healthy  GSE128033                             scimilarity  nan     nan                         1047     3646581.0     13185    29331.2035341724    0.9513045959002788  0.837770557527153   0.8238987539695043
agg_4835  classical monocyte      lung                              lung            healthy  GSE128169                             scimilarity  nan     nan                         1732     11798577.0    15051    32941.755873862814  0.9636814596401634  0.8539834251349044  0.8457626672394015
agg_4836  classical monocyte      lung                              lung            healthy  GSE132771                             scimilarity  nan     nan                         1601     4614408.0     13275    29761.03818692572   0.9531291409651284  0.8457644131717956  0.8310980214000753
agg_4837  classical monocyte      lung                              lung            healthy  GSE169471                             scimilarity  nan     nan                         498      1613976.0     11886    28956.213107123967  0.9433400854993297  0.8271067022019086  0.815595680192867
agg_4853  classical monocyte      lung parenchyma                   lung            healthy  GSE158127                             scimilarity  nan     nan                         791      2735456.0     13260    33646.87553319058   0.9544351950657847  0.8399981618864122  0.8327646420552953
agg_4855  classical monocyte      mesenteric artery                 vasculature     healthy  GSE156341                             scimilarity  nan     nan                         49       408553.0      10083    30979.99239432764   0.9337200372314851  0.8169964046619824  0.8163109060481625
agg_4857  classical monocyte      mesenteric lymph node             lymph node      healthy  7681c7d7-0168-4892-a547-6f02a6430ace  scimilarity  nan     nan                         23       211416.0      9219     31018.02041830142   0.9058403141644794  0.7914556298280883  0.7867496166249129
agg_4858  classical monocyte      muscle tissue                     muscle          healthy  e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         1800     50754027.0    16141    33323.26969602708   0.9316672745867468  0.8348295016474899  0.8219918807835366
agg_4860  classical monocyte      nasal cavity                      airway          healthy  03f821b4-87be-4ff4-b65a-b5fc00061da7  scimilarity  nan     nan                         129      319231.0      10961    33948.695520725145  0.908031594986554   0.802885512146251   0.7998873890770253
agg_4869  classical monocyte      peritoneum                        peritoneum      healthy  GSE130888                             scimilarity  nan     nan                         297      509237.0      11456    32213.169493243313  0.9218188146119148  0.8055575625334659  0.8039145027939639
agg_4870  classical monocyte      prostate gland                    prostate        healthy  GSE145843                             scimilarity  nan     nan                         24       87555.0       5997     21432.99958526179   0.816337811891654   0.7246769991610698  0.6943045232082502
agg_4871  classical monocyte      prostate gland                    prostate        healthy  e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         220      2445216.0     12460    33911.259081133416  0.9301914834403244  0.8259560877683964  0.824440772955729
agg_4872  classical monocyte      renal medulla                     kidney          healthy  120e86b4-1195-48c5-845b-b98054105eec  scimilarity  nan     nan                         21       101089.0      7354     26191.96315382653   0.8491245141516279  0.751292403843078   0.7310820608427115
agg_4875  classical monocyte      right cardiac atrium              heart           healthy  b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         311      977934.0      11891    31570.89048372305   0.9357294628028995  0.8260631626226992  0.8224053236502149
agg_4878  classical monocyte      spleen                            spleen          healthy  4d74781b-8186-4c9a-b659-ff4dc4601d91  scimilarity  nan     nan                         2166     7905128.0     13952    30832.98149016589   0.957626764427953   0.8498277489734691  0.8368540073560422
agg_4879  classical monocyte      spleen                            spleen          healthy  GSE148796                             scimilarity  nan     nan                         49       99684.0       6785     24492.260723886982  0.8477188947186437  0.7504685014947085  0.7295165069095441
agg_4880  classical monocyte      spleen                            spleen          healthy  e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         3483     86078540.0    17775    36266.36759486      0.9100465322095738  0.8097019968114763  0.8081945289287479
agg_4881  classical monocyte      subcutaneous adipose tissue       adipose         healthy  e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         2019     27914261.0    15739    35142.992419516115  0.9421666034710219  0.8357406603138717  0.8249091266671545
agg_4886  classical monocyte      thoracic lymph node               lymph node      healthy  62ef75e4-cbea-454e-a0ce-998ec40223d3  scimilarity  nan     nan                         20194    147502356.0   17156    34278.43204322054   0.9584542376154844  0.8568164088302802  0.8468420651789338
agg_4887  classical monocyte      thymus                            thymus          healthy  62ef75e4-cbea-454e-a0ce-998ec40223d3  scimilarity  nan     nan                         487      2692133.0     12969    33549.45213075861   0.9497232200872254  0.8549259958290932  0.843613914494647
agg_4888  classical monocyte      thymus                            thymus          healthy  83ed3be8-4cb9-43e6-9aaa-3fbbf5d1bd3a  scimilarity  nan     nan                         27       80042.0       6527     23800.833698344715  0.8448423801602987  0.7347769941739797  0.7181319535525903
agg_4889  classical monocyte      thymus                            thymus          healthy  de13e3e2-23b6-40ed-a413-e9e12d7d3910  scimilarity  nan     nan                         52       298983.0      9582     30002.684399867492  0.9175531872094376  0.8185046437128072  0.8176606503789746
agg_4890  classical monocyte      tonsil                            tonsil          healthy  GSE119506                             scimilarity  nan     nan                         321      1114339.0     11546    30516.473940957327  0.936749794981687   0.8307356996245999  0.8215841775914924
agg_4893  classical monocyte      trachea                           airway          healthy  03f821b4-87be-4ff4-b65a-b5fc00061da7  scimilarity  nan     nan                         126      457580.0      11135    33734.327765812595  0.9299687996532033  0.8279876048002456  0.8230768581762299
agg_4894  classical monocyte      trachea                           airway          healthy  e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         130      8245622.0     13663    32584.555010387805  0.859406194907081   0.7522520630806427  0.7550758397192211
agg_4896  classical monocyte      transverse colon                  gut             healthy  62ef75e4-cbea-454e-a0ce-998ec40223d3  scimilarity  nan     nan                         503      2932999.0     12868    33265.22762476361   0.9476555210791636  0.8481166846094836  0.8392740144102075
agg_4899  classical monocyte      upper lobe of lung                lung            healthy  GSE169471                             scimilarity  nan     nan                         180      594059.0      10222    27881.011904781462  0.9248153838159024  0.8106326147324702  0.8003438801026445
agg_4900  classical monocyte      urine                             urinary         healthy  GSE165396                             scimilarity  nan     nan                         20       109197.0      7299     25505.26771828214   0.8530560359297166  0.7505269381795711  0.7350625083451162
agg_4901  classical monocyte      uterus                            uterus          healthy  32f2fd23-ec74-486f-9544-e5b2f41725f5  scimilarity  nan     nan                         18       189472.0      9397     30891.07905591756   0.8870309576225303  0.7796319609750195  0.7827462282311286
agg_4902  classical monocyte      vasculature                       vasculature     healthy  e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         14537    224206261.0   17922    36651.90273209977   0.9438232475974557  0.8425086108952623  0.8343079266995497
agg_6287  intermediate monocyte   head of femur                     bone            healthy  GSE169396                             scimilarity  nan     nan                         102      191075.0      7853     26179.035956297153  0.8330771518439726  0.7503209876663273  0.7113081302663875
agg_6289  intermediate monocyte   lung                              lung            healthy  5d445965-6f1a-4b68-ba3a-b8f765155d3a  scimilarity  nan     nan                         178      1172582.0     11314    30515.569680815937  0.9409051040470379  0.8435279441582394  0.8229749449946738
agg_6290  intermediate monocyte   spleen                            spleen          healthy  e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         60       252220.0      8236     26163.955486425446  0.836498746238646   0.7439980946911184  0.7168040239671246
agg_6291  intermediate monocyte   thymus                            thymus          healthy  GSE159745                             scimilarity  nan     nan                         29       82115.0       5987     22540.234815420707  0.7817665678679913  0.7043553158094606  0.6760717657015229
agg_6292  intermediate monocyte   vasculature                       vasculature     healthy  e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         162      1525515.0     11980    32186.1781582846    0.9269571292216415  0.8224395619954739  0.8055668264019443
agg_7919  non-classical monocyte  apex of heart                     heart           healthy  b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         158      675731.0      11222    30876.899222310585  0.9301440448035162  0.8258961291954683  0.824159827752181
agg_7939  non-classical monocyte  blood                             blood           healthy  03f821b4-87be-4ff4-b65a-b5fc00061da7  scimilarity  nan     nan                         7359     36418526.0    14851    33258.733090217436  0.9519276590597959  0.8501060197631037  0.850960653386464
agg_7940  non-classical monocyte  blood                             blood           healthy  436154da-bcf1-4130-9c8b-120ff9a888f2  scimilarity  nan     nan                         14619    54479703.0    15493    30191.499157063707  0.9490211472741342  0.8448211022633122  0.8418061408505518
agg_7941  non-classical monocyte  blood                             blood           healthy  5d445965-6f1a-4b68-ba3a-b8f765155d3a  scimilarity  nan     nan                         195      1239240.0     11002    30223.31815928311   0.9379359624101378  0.8397815574875395  0.8299059656859994
agg_7942  non-classical monocyte  blood                             blood           healthy  GSE130117                             scimilarity  nan     nan                         322      1200290.0     11182    31409.27333671809   0.9418345510352945  0.8405863822794143  0.8304108431330093
agg_7943  non-classical monocyte  blood                             blood           healthy  GSE132802                             scimilarity  nan     nan                         102      726593.0      10232    30435.311954214158  0.9262718659348109  0.8186907385315048  0.8147462064075649
agg_7944  non-classical monocyte  blood                             blood           healthy  GSE134004                             scimilarity  nan     nan                         21       123360.0      6990     24135.62187154626   0.8660944307658819  0.7608357820845066  0.7535994530765779
agg_7945  non-classical monocyte  blood                             blood           healthy  GSE139324                             scimilarity  nan     nan                         435      2395489.0     12428    30987.28210955269   0.9487843521961865  0.8347185640893492  0.8386196076774184
agg_7946  non-classical monocyte  blood                             blood           healthy  GSE149313                             scimilarity  nan     nan                         567      2445950.0     11737    29144.914765151065  0.9480753865144765  0.8398428245307203  0.8354975068540325
agg_7947  non-classical monocyte  blood                             blood           healthy  GSE153421                             scimilarity  nan     nan                         441      2114891.0     11966    32647.54233438482   0.9524227157990666  0.8453840087915955  0.8445318485082767
agg_7948  non-classical monocyte  blood                             blood           healthy  GSE156989                             scimilarity  nan     nan                         3151     40420221.0    15662    32233.769948686153  0.9558775883405131  0.8527879494870606  0.8461332941066178
agg_7949  non-classical monocyte  blood                             blood           healthy  GSE157829                             scimilarity  nan     nan                         144      890675.0      10657    29172.377435644317  0.9302786443476869  0.8283014100006509  0.8220722486967346
agg_7950  non-classical monocyte  blood                             blood           healthy  GSE161329                             scimilarity  nan     nan                         1118     7175719.0     12865    29244.857112487658  0.9476682128749678  0.8393859845018986  0.8392730294908749
agg_7951  non-classical monocyte  blood                             blood           healthy  GSE161738                             scimilarity  nan     nan                         1497     12143757.0    12362    32632.110821778042  0.9476207444575895  0.8463853459866758  0.8490136875251815
agg_7952  non-classical monocyte  blood                             blood           healthy  GSE163668                             scimilarity  nan     nan                         323      1716760.0     11769    32812.39505818172   0.9472316303194352  0.8438698542111769  0.8424970169398833
agg_7953  non-classical monocyte  blood                             blood           healthy  GSE166992                             scimilarity  nan     nan                         1613     7143035.0     13383    32605.389353410996  0.9532209129181096  0.8486599938147087  0.8451557766892072
agg_7954  non-classical monocyte  blood                             blood           healthy  GSE167363                             scimilarity  nan     nan                         458      3094035.0     12228    29678.03739631962   0.9435654504659038  0.8379948419319221  0.8241961683725367
agg_7955  non-classical monocyte  blood                             blood           healthy  GSE168710                             scimilarity  nan     nan                         75       701113.0      10776    32559.49721301115   0.9233871475920297  0.8250528447177075  0.8185333840139741
agg_7956  non-classical monocyte  blood                             blood           healthy  GSE168732                             scimilarity  nan     nan                         229      1242404.0     11269    31965.299177754878  0.9416742339845935  0.8377482237588458  0.8415500681607002
agg_7957  non-classical monocyte  blood                             blood           healthy  b0cf0afa-ec40-4d65-b570-ed4ceacc6813  scimilarity  nan     nan                         5897     43180935.0    14970    35595.18649267558   0.9543288699936718  0.8496535044071925  0.8536782531653531
agg_7970  non-classical monocyte  bone marrow                       bone marrow     healthy  GSE132509                             scimilarity  nan     nan                         28       86280.0       7085     25870.53624683356   0.8392355979135352  0.755387271124993   0.7303317003908475
agg_7971  non-classical monocyte  bone marrow                       bone marrow     healthy  GSE154109                             scimilarity  nan     nan                         50       289907.0      9147     28526.85661530531   0.9070952567396775  0.808361820498357   0.7964598438037417
agg_7972  non-classical monocyte  bone marrow                       bone marrow     healthy  GSE163278                             scimilarity  nan     nan                         127      682328.0      10897    31202.529821716787  0.9358619814006104  0.8294675777784289  0.8230468472234642
agg_7974  non-classical monocyte  breast                            breast          healthy  GSE164898                             scimilarity  nan     nan                         54       120275.0      7652     26423.010551181265  0.8534343500588628  0.755424451856796   0.7230125883881353
agg_7976  non-classical monocyte  cortex of kidney                  kidney          healthy  120e86b4-1195-48c5-845b-b98054105eec  scimilarity  nan     nan                         63       401864.0      11464    32990.96311584531   0.9014527631813947  0.7931942866954178  0.7865930763896118
agg_7977  non-classical monocyte  cortex of kidney                  kidney          healthy  a98b828a-622a-483a-80e0-15703678befd  scimilarity  nan     nan                         161      772141.0      11062    31056.660350613587  0.9346806312815913  0.8317520409566359  0.8313984606487961
agg_7979  non-classical monocyte  fallopian tube                    fallopian tube  healthy  fc77d2ae-247d-44d7-aa24-3f4859254c2c  scimilarity  nan     nan                         16       222500.0      7755     27898.660349927846  0.8621134765305163  0.7601890611687369  0.744729424431635
agg_7980  non-classical monocyte  fimbria of uterine tube           fallopian tube  healthy  fc77d2ae-247d-44d7-aa24-3f4859254c2c  scimilarity  nan     nan                         28       244266.0      7735     27678.814270356754  0.8734809587140066  0.7648153554084077  0.76608428730196
agg_7982  non-classical monocyte  heart left ventricle              heart           healthy  b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         57       213565.0      9322     29851.355460106315  0.8978296307325292  0.7939956162210045  0.776120799756924
agg_7983  non-classical monocyte  heart right ventricle             heart           healthy  b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         124      613752.0      10880    30311.360727579897  0.9311389328985725  0.8253171421863371  0.8219320702682233
agg_7985  non-classical monocyte  interventricular septum           heart           healthy  b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         144      524658.0      10853    31170.857886799495  0.9241302459076691  0.8138209035301205  0.8224136632163046
agg_7986  non-classical monocyte  isthmus of fallopian tube         fallopian tube  healthy  fc77d2ae-247d-44d7-aa24-3f4859254c2c  scimilarity  nan     nan                         12       86198.0       5668     23558.81110396413   0.793224614700874   0.7114672551137675  0.6825985710920001
agg_7990  non-classical monocyte  kidney                            kidney          healthy  120e86b4-1195-48c5-845b-b98054105eec  scimilarity  nan     nan                         214      1788808.0     13749    33794.71179753717   0.9324162382479619  0.8250902105825786  0.818301334304562
agg_7991  non-classical monocyte  kidney                            kidney          healthy  GSE140989                             scimilarity  nan     nan                         473      1769375.0     13008    31797.13462972748   0.9190676320157254  0.8182361375222441  0.8172110487458264
agg_7992  non-classical monocyte  left cardiac atrium               heart           healthy  b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         82       357018.0      10073    29995.2881818953    0.916928150799629   0.8074046233049013  0.808753490409063
agg_7995  non-classical monocyte  liver                             liver           healthy  GSE136103                             scimilarity  nan     nan                         423      2383574.0     13122    31645.435149503457  0.9524231389227664  0.8376748793878739  0.8415142758287522
agg_7996  non-classical monocyte  liver                             liver           healthy  GSE159977                             scimilarity  nan     nan                         473      4877555.0     13370    33200.636271185205  0.9498326621251537  0.8392029926705806  0.8469259328198118
agg_7997  non-classical monocyte  liver                             liver           healthy  GSE163650                             scimilarity  nan     nan                         10       96148.0       6782     24958.605923030595  0.844996457443227   0.7380858347163884  0.7209649786125333
agg_8007  non-classical monocyte  lung                              lung            healthy  5d445965-6f1a-4b68-ba3a-b8f765155d3a  scimilarity  nan     nan                         576      3313836.0     12566    30969.255628748215  0.9438879246302309  0.8444989044016609  0.8358111974870654
agg_8008  non-classical monocyte  lung                              lung            healthy  DS000011735                           scimilarity  nan     nan                         169      779577.0      12885    36247.76234710614   0.8808886302153663  0.8102555656654346  0.7948721237235893
agg_8009  non-classical monocyte  lung                              lung            healthy  GSE128033                             scimilarity  nan     nan                         79       343546.0      9330     27406.22701580364   0.9035509606166174  0.7968173004217675  0.7863212826693502
agg_8010  non-classical monocyte  lung                              lung            healthy  GSE128169                             scimilarity  nan     nan                         276      2433151.0     12769    32027.902610765417  0.9547437308289124  0.8449880167209973  0.8438649522788478
agg_8011  non-classical monocyte  lung                              lung            healthy  GSE132771                             scimilarity  nan     nan                         37       151922.0      7860     26295.743163049112  0.8838770944570796  0.790431305730734   0.7607028661311058
agg_8012  non-classical monocyte  lung                              lung            healthy  GSE169471                             scimilarity  nan     nan                         27       103204.0      6854     23947.65597317191   0.8348392917103126  0.7488344663849291  0.7237935010019484
agg_8024  non-classical monocyte  lung parenchyma                   lung            healthy  GSE158127                             scimilarity  nan     nan                         309      1788828.0     12136    32078.56380429425   0.9418105467635777  0.8295808338281161  0.831837004521098
agg_8026  non-classical monocyte  muscle tissue                     muscle          healthy  e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         321      23778338.0    13169    27977.08919973388   0.8814221140322411  0.7869360982600341  0.7677337716492793
agg_8034  non-classical monocyte  right cardiac atrium              heart           healthy  b52eb423-5d0d-4645-b217-e1c6d38b2e72  scimilarity  nan     nan                         70       408965.0      10146    29746.121058681525  0.9252128588702657  0.8186384457806745  0.8103901461358153
agg_8036  non-classical monocyte  spleen                            spleen          healthy  4d74781b-8186-4c9a-b659-ff4dc4601d91  scimilarity  nan     nan                         336      1586973.0     11934    30580.798338873254  0.9471354298985436  0.8378586626071394  0.8322814736104384
agg_8039  non-classical monocyte  thoracic lymph node               lymph node      healthy  62ef75e4-cbea-454e-a0ce-998ec40223d3  scimilarity  nan     nan                         1950     18888557.0    15221    33581.76389036113   0.9559331607078543  0.8537562341521224  0.8469325123405803
agg_8040  non-classical monocyte  thymus                            thymus          healthy  62ef75e4-cbea-454e-a0ce-998ec40223d3  scimilarity  nan     nan                         68       441502.0      10301    31247.452676451114  0.917407487628822   0.8223254514108821  0.8197270854172879
agg_8041  non-classical monocyte  trachea                           airway          healthy  03f821b4-87be-4ff4-b65a-b5fc00061da7  scimilarity  nan     nan                         12       97840.0       7863     29204.083428972655  0.8702163087134754  0.7685799000235117  0.7716054237714325
agg_8042  non-classical monocyte  trachea                           airway          healthy  e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         16       534304.0      8970     24807.157880956805  0.8232311272852844  0.7180336962121132  0.7003798955542161
agg_8044  non-classical monocyte  transverse colon                  gut             healthy  62ef75e4-cbea-454e-a0ce-998ec40223d3  scimilarity  nan     nan                         135      908452.0      11380    32804.296806012215  0.933818206129888   0.8353788654625934  0.8345447172426862
agg_8045  non-classical monocyte  upper lobe of lung                lung            healthy  GSE169471                             scimilarity  nan     nan                         32       143341.0      7269     24395.316476390286  0.8673226900308589  0.7523857428733156  0.7403920706432944
agg_8046  non-classical monocyte  vasculature                       vasculature     healthy  e5f58829-1a66-40b5-a624-9046778e74f5  scimilarity  nan     nan                         987      10267209.0    15019    35019.47682838501   0.9379668399645168  0.8304128458333673  0.8283221722893935

This query selects cells that are:

  • classical monocytes (cell_type == "classical monocyte")

  • from healthy donors (disease == "healthy")

  • from blood tissue (tissue == "blood")

! decima query-cell 'cell_type == "classical monocyte" and disease == "healthy" and tissue == "blood"' | column -t -s $'\t'
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:01.9 (1663.1MB/s)
          cell_type           tissue  organ  disease  study                                 dataset      region  subregion  celltype_coarse  n_cells  total_counts  n_genes  size_factor         train_pearson       val_pearson         test_pearson
agg_4733  classical monocyte  blood   blood  healthy  03f821b4-87be-4ff4-b65a-b5fc00061da7  scimilarity  nan     nan                         32464    109280914.0   16158    33646.02843110038   0.9568728712803031  0.8545324533535094  0.8487445540580735
agg_4734  classical monocyte  blood   blood  healthy  436154da-bcf1-4130-9c8b-120ff9a888f2  scimilarity  nan     nan                         76800    206490628.0   16683    30736.453324546856  0.955313650235467   0.8494127267867799  0.84210567312908
agg_4735  classical monocyte  blood   blood  healthy  5d445965-6f1a-4b68-ba3a-b8f765155d3a  scimilarity  nan     nan                         1044     3638976.0     12306    30542.8977364245    0.9465384578921433  0.851991795617166   0.8308191715256182
agg_4736  classical monocyte  blood   blood  healthy  DS000010023                           scimilarity  nan     nan                         243      362606.0      8414     25201.261024165953  0.8865446516098515  0.7621391376670988  0.7681818769796727
agg_4737  classical monocyte  blood   blood  healthy  GSE122703                             scimilarity  nan     nan                         18       83417.0       7546     28194.725173315186  0.859225612396475   0.7640890699253299  0.7443681539461423
agg_4738  classical monocyte  blood   blood  healthy  GSE130117                             scimilarity  nan     nan                         2017     7130588.0     13535    33078.53692191542   0.9553673450160365  0.851402109626239   0.8385936100758409
agg_4739  classical monocyte  blood   blood  healthy  GSE132802                             scimilarity  nan     nan                         1601     9955248.0     13132    32063.630951743195  0.9478882791739611  0.8391025143866828  0.8303465877530952
agg_4740  classical monocyte  blood   blood  healthy  GSE139324                             scimilarity  nan     nan                         2333     8331045.0     13985    31135.881287246768  0.9608208780142045  0.8473885992448625  0.8432790193723467
agg_4741  classical monocyte  blood   blood  healthy  GSE145809                             scimilarity  nan     nan                         69       245221.0      8962     29135.67197629852   0.8825701041728526  0.7811799267734735  0.7818647625179129
agg_4742  classical monocyte  blood   blood  healthy  GSE149313                             scimilarity  nan     nan                         2420     6974751.0     13143    29560.496854576566  0.9574598613513423  0.8505290963248237  0.8379199735887167
agg_4743  classical monocyte  blood   blood  healthy  GSE153421                             scimilarity  nan     nan                         3691     15561725.0    14569    34377.465875728165  0.9636686704566925  0.8576434473562725  0.8511814190737197
agg_4744  classical monocyte  blood   blood  healthy  GSE156989                             scimilarity  nan     nan                         13554    160011485.0   16915    34135.439844737564  0.9640667421350761  0.8577967800377495  0.8517975138366085
agg_4745  classical monocyte  blood   blood  healthy  GSE157829                             scimilarity  nan     nan                         1619     6957811.0     13507    30199.39288988673   0.9484019976492215  0.8436979316400604  0.8347196616710685
agg_4746  classical monocyte  blood   blood  healthy  GSE159113                             scimilarity  nan     nan                         1025     6298250.0     12083    27477.50809897617   0.9078020151513733  0.8121457150205226  0.7980372877810575
agg_4747  classical monocyte  blood   blood  healthy  GSE161329                             scimilarity  nan     nan                         5654     25653579.0    14349    28848.0539929647    0.9549801428956252  0.8450430950674043  0.8406188789518544
agg_4748  classical monocyte  blood   blood  healthy  GSE161738                             scimilarity  nan     nan                         2676     13801473.0    12825    33337.477050230416  0.9541962906717452  0.8512846409758499  0.8485408028961247
agg_4749  classical monocyte  blood   blood  healthy  GSE163668                             scimilarity  nan     nan                         2644     10486314.0    14049    33786.96584264489   0.9597801578342394  0.8560775485935677  0.8512149509551471
agg_4750  classical monocyte  blood   blood  healthy  GSE166992                             scimilarity  nan     nan                         7501     28033216.0    15079    33455.367364577316  0.9622273594219685  0.8558958139235102  0.8495571689751152
agg_4751  classical monocyte  blood   blood  healthy  GSE167363                             scimilarity  nan     nan                         3135     14722635.0    14375    29977.24002819913   0.942417448875388   0.8368071803109702  0.8258536430202982
agg_4752  classical monocyte  blood   blood  healthy  GSE168710                             scimilarity  nan     nan                         16484    104881872.0   16223    34107.336261357574  0.9398282119039322  0.8424821834537695  0.8372971004604842
agg_4753  classical monocyte  blood   blood  healthy  GSE168732                             scimilarity  nan     nan                         770      2548822.0     12508    33411.30103713399   0.9552513581030765  0.8508279875038706  0.847461536110767
agg_4754  classical monocyte  blood   blood  healthy  b0cf0afa-ec40-4d65-b570-ed4ceacc6813  scimilarity  nan     nan                         40975    300555227.0   15784    35938.85772500803   0.9622425892039956  0.853424173800979   0.8508714303589978
agg_4755  classical monocyte  blood   blood  healthy  ddfad306-714d-4cc0-9985-d9072820c530  scimilarity  nan     nan                         8827     36073928.0    15131    33208.591584008376  0.9546118779961532  0.8543086616569785  0.8462739374830107
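
The filter expression passed to query-cell reads like a pandas `DataFrame.query` string. As a minimal sketch, assuming the expression is evaluated that way against the cell metadata table, the same filter can be tried on a toy table. The row labels and the three-column table below are hypothetical and stand in for the real metadata:

```python
import pandas as pd

# Toy stand-in for the cell metadata table (hypothetical rows and labels);
# only the three columns used by the filter are included.
cells = pd.DataFrame(
    {
        "cell_type": ["classical monocyte", "classical monocyte", "non-classical monocyte"],
        "disease": ["healthy", "healthy", "healthy"],
        "tissue": ["blood", "lung", "blood"],
    },
    index=["agg_a", "agg_b", "agg_c"],
)

# Same expression as in the command above.
expr = 'cell_type == "classical monocyte" and disease == "healthy" and tissue == "blood"'
selected = cells.query(expr)

print(selected.index.tolist())
```

Only the row that satisfies all three conditions is kept, which matches the table above, where every row is a classical monocyte from healthy blood.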

Attribution calling with custom genes and sequences

In this section, we demonstrate how to call attributions using custom gene sequences. You can provide your own FASTA file containing sequences of interest and run attribution analysis for any set of genes or genomic regions, using the Decima command-line interface. The following examples show how to inspect your FASTA file, run attributions, and explore the output files. The FASTA header line for each sequence contains the gene name and the coordinates of the masked region used for attribution analysis. For example, in the header:

CD68|gene_mask_start=163840|gene_mask_end=166460

“CD68” is the gene name, and “gene_mask_start” and “gene_mask_end” give the start and end positions, relative to the input sequence, of the region that was masked and analyzed for attributions.
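
The header layout described above can be read with a few lines of Python. This is only a sketch of the format as shown in the example header. `parse_header` is a hypothetical helper, not part of Decima, and it assumes every field after the gene name is an integer `key=value` pair:

```python
def parse_header(header: str) -> dict:
    """Split 'NAME|key=value|...' into a dict with integer coordinates."""
    # Drop a leading '>' if the raw FASTA header line is passed in.
    name, *fields = header.lstrip(">").strip().split("|")
    record = {"gene": name}
    for field in fields:
        key, value = field.split("=", 1)
        record[key] = int(value)
    return record


record = parse_header("CD68|gene_mask_start=163840|gene_mask_end=166460")
mask_length = record["gene_mask_end"] - record["gene_mask_start"]

print(record["gene"], mask_length)
```

For the example header, the masked region spans 2620 positions of the input sequence.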

! cat ../../tests/data/seqs.fasta | cut -c 1-200
! decima attributions --model v1_rep0 --seqs ../../tests/data/seqs.fasta --tasks "cell_type == 'classical monocyte'" --output-prefix example/output_custom_seqs
decima - INFO - Using device: 0
decima - INFO - Loading model v1_rep0 and metadata to compute attributions...
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'rep0:latest', 720.03MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:00.9 (837.1MB/s)
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:02.0 (1562.9MB/s)
/home/celikm5/Projects/decima/src/decima/interpret/attributer.py:66: UserWarning: `off_tasks` is not provided. Using all other tasks as off_tasks.
Computing attributions...:   0%|                          | 0/2 [00:00<?, ?it/s]
Computing attributions...:  50%|█████████         | 1/2 [00:01<00:01,  1.38s/it]
Computing attributions...: 100%|██████████████████| 2/2 [00:02<00:00,  1.00it/s]
Computing attributions...: 100%|██████████████████| 2/2 [00:02<00:00,  1.08s/it]
decima - INFO - Saving sequences...

Saving sequences...: 0it [00:00, ?it/s]
Saving sequences...: 2it [00:00, 10965.50it/s]
decima - INFO - Loading model and metadata to compute attributions...
wandb: Downloading large artifact 'metadata:latest', 3122.32MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:02.2 (1445.0MB/s)
decima - INFO - No genes provided, using all 2 genes in the attribution files.

Computing recursive seqlet calling...:   0%|              | 0/2 [00:00<?, ?it/s]
Computing recursive seqlet calling...: 100%|█████| 2/2 [00:00<00:00, 597.44it/s]
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/memelite/fimo.py:406: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.

The output for custom sequences also includes a seqs.fasta file containing the custom sequences. To visualize the results in IGV, first load this FASTA file and its index (.fai) into IGV, then load the bigwig and bed files.

! ls example/output_custom_seqs*
example/output_custom_seqs.attributions.bigwig
example/output_custom_seqs.attributions.h5
example/output_custom_seqs.motifs.tsv
example/output_custom_seqs.seqlets.bed
example/output_custom_seqs.seqs.fasta
example/output_custom_seqs.seqs.fasta.fai
example/output_custom_seqs.warnings.qc.log
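
Before opening the results in IGV, it can help to check that every expected file exists. The sketch below builds the file names from the output prefix, using only the suffixes that appear in the listing above. `expected_outputs` is a hypothetical helper, not a Decima function:

```python
from pathlib import Path

# Suffixes taken from the listing above; the two FASTA entries are only
# written for custom-sequence runs.
COMMON_SUFFIXES = [
    ".attributions.bigwig",
    ".attributions.h5",
    ".motifs.tsv",
    ".seqlets.bed",
    ".warnings.qc.log",
]
CUSTOM_SEQ_SUFFIXES = [".seqs.fasta", ".seqs.fasta.fai"]


def expected_outputs(prefix: str, custom_seqs: bool = False) -> list:
    """Return the output paths expected for a given --output-prefix."""
    suffixes = COMMON_SUFFIXES + (CUSTOM_SEQ_SUFFIXES if custom_seqs else [])
    return [Path(prefix + suffix) for suffix in suffixes]


paths = expected_outputs("example/output_custom_seqs", custom_seqs=True)
missing = [p for p in paths if not p.exists()]
print(f"{len(paths) - len(missing)} of {len(paths)} expected files found")
```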

Python User API

! ls example/output_classical_monoctypes.*
example/output_classical_monoctypes.attributions.bigwig
example/output_classical_monoctypes.attributions.h5
example/output_classical_monoctypes.motifs.tsv
example/output_classical_monoctypes.seqlets.bed
example/output_classical_monoctypes.warnings.qc.log
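from decima.interpret.attributions import AttributionResult

# Open the attribution file written by the CLI run and load the one-hot
# encoded sequence and the attribution scores for SPI1.
with AttributionResult("example/output_classical_monoctypes.attributions.h5") as ar:
    seqs, attrs = ar.load(["SPI1"])
    print("seqs:", seqs)
    print("attrs:", attrs)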
Loading attributions and sequences...:   0%|          | 0/1 [00:00<?, ?it/s]
Loading attributions and sequences...: 100%|██████████| 1/1 [00:00<00:00, 432.31it/s]

seqs: [[[0. 0. 0. ... 0. 0. 0.]
  [0. 0. 1. ... 0. 0. 0.]
  [0. 1. 0. ... 1. 0. 1.]
  [1. 0. 0. ... 0. 1. 0.]]]
attrs: [[[-5.10058962e-05 -3.67399698e-05  7.25216159e-06 ... -1.40011580e-05
   -5.10658174e-06 -6.25329176e-06]
  [-5.10058962e-05 -3.67399698e-05 -2.17564848e-05 ... -1.40011580e-05
   -5.10658174e-06 -6.25329176e-06]
  [-5.10058962e-05  1.10219909e-04  7.25216159e-06 ...  4.20034739e-05
   -5.10658174e-06  1.87598753e-05]
  [ 1.53017689e-04 -3.67399698e-05  7.25216159e-06 ... -1.40011580e-05
    1.53197452e-05 -6.25329176e-06]]]
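
Both arrays printed above have one row per nucleotide channel and one column per position. A common way to summarize them is to decode the one-hot sequence and keep only the attribution of the base actually present at each position. The sketch below does this on toy arrays filled with the first three columns of the output above. The channel order `ACGT` is an assumption and is not confirmed by the output:

```python
import numpy as np

BASES = "ACGT"  # assumed channel order (not confirmed by the output above)

# Toy arrays shaped (genes, 4, positions), copied from the first three
# columns printed above.
seqs = np.array([[[0, 0, 0],
                  [0, 0, 1],
                  [0, 1, 0],
                  [1, 0, 0]]], dtype=float)
attrs = np.array([[[-5.10058962e-05, -3.67399698e-05,  7.25216159e-06],
                   [-5.10058962e-05, -3.67399698e-05, -2.17564848e-05],
                   [-5.10058962e-05,  1.10219909e-04,  7.25216159e-06],
                   [ 1.53017689e-04, -3.67399698e-05,  7.25216159e-06]]])

# Decode the one-hot sequence by taking the active channel at each position.
decoded = "".join(BASES[i] for i in seqs[0].argmax(axis=0))

# Zero out all channels except the observed base, then sum over channels.
per_base = (seqs * attrs).sum(axis=1)

print(decoded)
print(per_base.shape)
```

In these columns the four channel values at each position sum to zero, so the observed base carries the largest magnitude and the projection keeps the most informative value per position.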

Let’s look at a simple example using Decima’s Python API to analyze the SPI1 gene, which encodes a key transcription factor in myeloid cell development. We’ll examine its regulation across different monocyte and macrophage cell types where it is known to be important.

First, we load the SPI1 attributions computed earlier for classical monocytes:

with AttributionResult("example/output_classical_monoctypes.attributions.h5") as ar:
    attribution = ar.load_attribution("SPI1")
import matplotlib.pyplot as plt

attribution.plot_seqlogo(relative_loc=291)
plt.show()
attribution.plot_peaks()
import torch
from decima import predict_attributions_seqlet_calling

device = "cuda" if torch.cuda.is_available() else "cpu"

%matplotlib inline
spi1_cell_types = [
    "classical monocyte",
    "intermediate monocyte",
    "non-classical monocyte",
    "alveolar macrophage",
    "macrophage",
]
predict_attributions_seqlet_calling(
    output_prefix="example/attrs_SP1I_monoctypes",
    genes=["SPI1"],
    tasks=f"cell_type in {spi1_cell_types}",
    device=device,
)
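
The `tasks` argument above is an ordinary Python f-string, so it may help to see what it expands to. The sketch below prints the resulting expression and applies the same membership test to a few made-up metadata rows in plain Python. That Decima evaluates this string with a pandas-style query on the cell metadata is an assumption based on the `query_cells` and `cell_metadata.query` calls used elsewhere on this page, and the `rows` list is hypothetical.

```python
spi1_cell_types = [
    "classical monocyte",
    "intermediate monocyte",
    "non-classical monocyte",
    "alveolar macrophage",
    "macrophage",
]

# The list's repr is embedded verbatim, giving a pandas-style "in" expression.
tasks_query = f"cell_type in {spi1_cell_types}"
print(tasks_query)

# Hypothetical metadata rows (illustrative only, not real Decima records).
rows = [
    {"cell": "agg_a", "cell_type": "macrophage"},
    {"cell": "agg_b", "cell_type": "Amygdala excitatory"},
    {"cell": "agg_c", "cell_type": "classical monocyte"},
]

# Plain-Python equivalent of the membership test expressed by the query.
selected = [row["cell"] for row in rows if row["cell_type"] in spi1_cell_types]
print(selected)
```

Because the list is rendered with its quotes and brackets intact, no manual string escaping is needed when the cell type names contain spaces or hyphens.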
/home/celikm5/Projects/decima/src/decima/interpret/attributer.py:66: UserWarning: `off_tasks` is not provided. Using all other tasks as off_tasks.

As with the command line, you can call the predict_save_attributions and recursive_seqlet_calling functions to compute attributions and call seqlets step by step.

Custom Sequences

Attributions for custom sequences can be calculated by passing a DataFrame with the columns seq, gene_mask_start, and gene_mask_end. The index of the DataFrame is used as the gene names.
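
As a rough illustration of that layout, the sketch below writes a small table with the three required columns and the gene names in the first (index) column, using only the standard library. The names `toy_gene_1` and `toy_gene_2`, the random sequences, and the mask coordinates are made up. The sequence length of 524288 and the mask start of 163840 mirror values shown elsewhere on this page, but whether Decima requires exactly these is an assumption.

```python
import csv
import io
import random

CONTEXT_SIZE = 524288  # value of DECIMA_CONTEXT_SIZE printed elsewhere on this page
MASK_START = 163840    # gene_mask_start seen in the gene metadata (assumed typical)


def random_dna(length, rng):
    """Return a random DNA string of the given length."""
    return "".join(rng.choices("ACGT", k=length))


def build_rows(gene_lengths, seed=0):
    """Build one row per hypothetical gene: name, seq, gene_mask_start, gene_mask_end."""
    rng = random.Random(seed)
    rows = []
    for name, gene_length in gene_lengths.items():
        rows.append(
            {
                "gene": name,
                "seq": random_dna(CONTEXT_SIZE, rng),
                "gene_mask_start": MASK_START,
                # Clip so the mask never extends past the end of the sequence.
                "gene_mask_end": min(MASK_START + gene_length, CONTEXT_SIZE),
            }
        )
    return rows


rows = build_rows({"toy_gene_1": 20_000, "toy_gene_2": 500_000})

# Write to an in-memory buffer; use open("seqs.csv", "w", newline="") to save a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["gene", "seq", "gene_mask_start", "gene_mask_end"])
writer.writeheader()
writer.writerows(rows)

header = buffer.getvalue().splitlines()[0]
print(header)
```

A file written this way can be loaded with `pd.read_csv(path, index_col=0)`, which turns the first column into the index holding the gene names.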

import pandas as pd

df_seqs = pd.read_csv("../tests/data/seqs.csv", index_col=0)
df_seqs
predict_attributions_seqlet_calling(
    output_prefix="example/attrs_custom_seqs_monoctypes",
    seqs=df_seqs,  # <-- custom sequences
    tasks=f"cell_type in {spi1_cell_types}",
    device=device,
)
! ls example/attrs_custom_seqs_monoctypes*
import random
import torch
from grelu.sequence.format import strings_to_one_hot
from decima.constants import DECIMA_CONTEXT_SIZE

DECIMA_CONTEXT_SIZE
524288
seqs = torch.cat(
    [
        strings_to_one_hot(
            ["".join(random.choice(["A", "T", "C", "G"]) for _ in range(DECIMA_CONTEXT_SIZE))]
        ),  # one-hot encoded sequence
        torch.ones(1, 1, DECIMA_CONTEXT_SIZE),  # binary mask for the gene
    ],
    dim=1,
)
seqs.shape
torch.Size([1, 5, 524288])
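
The example above marks the whole sequence as gene by using a mask of ones. To make the five-channel layout concrete, here is a small pure-Python sketch that one-hot encodes a short toy sequence and appends a mask that is 1 only inside a gene interval. The A, C, G, T channel order and the half-open interval convention are assumptions, and `encode_with_mask` is a hypothetical helper, not part of Decima or gReLU.

```python
BASES = "ACGT"  # assumed channel order for the four one-hot rows


def encode_with_mask(seq, mask_start, mask_end):
    """Return 5 rows: four one-hot channels plus a binary gene-mask channel.

    The mask is 1 for positions in the half-open interval [mask_start, mask_end).
    """
    one_hot = [[1.0 if base == b else 0.0 for base in seq] for b in BASES]
    mask = [1.0 if mask_start <= i < mask_end else 0.0 for i in range(len(seq))]
    return one_hot + [mask]


channels = encode_with_mask("ACGTAC", mask_start=2, mask_end=5)
for row in channels:
    print(row)
```

Wrapping the result as `torch.tensor([channels])` would give a tensor of shape (1, 5, seq_len), the same layout as `seqs` above.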
predict_attributions_seqlet_calling(
    output_prefix="example/attrs_custom_tensors_monoctypes",
    # Custom sequences as a torch.Tensor of shape (batch_size, 5, seq_len):
    # channels 0-3 hold the one-hot encoded sequence, channel 4 the binary gene mask.
    seqs=seqs,
    tasks=f"cell_type in {spi1_cell_types}",
    device=device,
    model=0,
    threshold=1e-6,
)
! ls example/attrs_custom_tensors_monoctypes*
example/attrs_custom_tensors_monoctypes.attributions.bigwig
example/attrs_custom_tensors_monoctypes.attributions.h5
example/attrs_custom_tensors_monoctypes.motifs.tsv
example/attrs_custom_tensors_monoctypes.seqlets.bed
example/attrs_custom_tensors_monoctypes.seqs.fasta
example/attrs_custom_tensors_monoctypes.seqs.fasta.fai
example/attrs_custom_tensors_monoctypes.warnings.qc.log

Advanced Developer API

DecimaResult provides a unified interface for working with Decima results in anndata format. It contains an AnnData structure storing cell x gene expression data and metadata. Through DecimaResult, users can load pre-trained models, compute attributions to understand genomic regulation, and analyze results through visualizations or export to genomic file formats. The object provides convenient access to cell and gene annotations through its metadata properties.

from decima import DecimaResult

result = DecimaResult.load()
result.cell_metadata.query("cell_type.str.endswith('macrophage')")
cell_type tissue organ disease study dataset region subregion celltype_coarse n_cells total_counts n_genes size_factor train_pearson val_pearson test_pearson
agg_4063 alveolar macrophage alveolar system lung COVID-19 GSE155249 scimilarity nan nan NaN 1453 8.001524e+06 14711 36293.472025 0.943059 0.837210 0.849998
agg_4064 alveolar macrophage alveolar system lung healthy GSE155249 scimilarity nan nan NaN 1279 7.598244e+06 13673 34158.514496 0.932819 0.831024 0.843684
agg_4065 alveolar macrophage left lung lung NA ENCODE scimilarity nan nan NaN 405 3.000961e+06 16595 46501.375857 0.936081 0.847924 0.845485
agg_4066 alveolar macrophage lingula of left lung lung healthy a3ffde6c-7ad2-498a-903c-d58e732f7470 scimilarity nan nan NaN 854 1.713753e+06 15110 42773.009735 0.893927 0.806000 0.804835
agg_4067 alveolar macrophage lower lobe of left lung lung NA ENCODE scimilarity nan nan NaN 763 1.344798e+07 17973 49020.804487 0.940586 0.854680 0.863014
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
agg_6644 macrophage uterus uterus healthy 32f2fd23-ec74-486f-9544-e5b2f41725f5 scimilarity nan nan NaN 425 4.340830e+06 15233 36624.136739 0.954753 0.850247 0.843175
agg_6645 macrophage uterus uterus healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan NaN 231 3.007554e+07 14787 27615.762157 0.839476 0.730554 0.719085
agg_6646 macrophage vasculature vasculature healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan NaN 12497 4.040685e+08 18199 36829.498964 0.938862 0.836819 0.833474
agg_6647 macrophage visceral fat adipose obesity GSE128518 scimilarity nan nan NaN 729 2.078431e+06 13760 34188.716187 0.941596 0.827360 0.823912
agg_6648 macrophage white adipose tissue adipose NA GSE128890 scimilarity nan nan NaN 45 1.381560e+05 8257 27604.748095 0.859386 0.745328 0.745539

325 rows × 16 columns

The results and metadata are stored in AnnData format, which you can access directly if needed, but most operations are supported by the DecimaResult object.

result.anndata
AnnData object with n_obs × n_vars = 8856 × 18457
    obs: 'cell_type', 'tissue', 'organ', 'disease', 'study', 'dataset', 'region', 'subregion', 'celltype_coarse', 'n_cells', 'total_counts', 'n_genes', 'size_factor', 'train_pearson', 'val_pearson', 'test_pearson'
    var: 'chrom', 'start', 'end', 'strand', 'gene_type', 'frac_nan', 'mean_counts', 'n_tracks', 'gene_start', 'gene_end', 'gene_length', 'gene_mask_start', 'gene_mask_end', 'frac_N', 'fold', 'dataset', 'gene_id', 'pearson', 'size_factor_pearson', 'ensembl_canonical_tss'
    layers: 'preds', 'v1_rep0', 'v1_rep1', 'v1_rep2', 'v1_rep3'

These are the cell metadata contained in the Decima object.

result.cell_metadata
cell_type tissue organ disease study dataset region subregion celltype_coarse n_cells total_counts n_genes size_factor train_pearson val_pearson test_pearson
agg_0 Amygdala excitatory Amygdala_Amygdala CNS healthy jhpce#tran2021 brain_atlas Amygdala Amygdala NaN 331 1.592883e+07 17000 41431.465186 0.942459 0.841377 0.865640
agg_1 Amygdala excitatory Amygdala_Basolateral nuclear group (BLN) - lat... CNS healthy SCR_016152 brain_atlas Amygdala Basolateral nuclear group (BLN) - lateral nucl... NaN 11369 2.952133e+08 18080 40765.341481 0.943098 0.838936 0.861092
agg_2 Amygdala excitatory Amygdala_Bed nucleus of stria terminalis and n... CNS healthy SCR_016152 brain_atlas Amygdala Bed nucleus of stria terminalis and nearby - BNST NaN 139 2.593231e+06 15418 42556.387020 0.952170 0.854544 0.866654
agg_3 Amygdala excitatory Amygdala_Central nuclear group - CEN CNS healthy SCR_016152 brain_atlas Amygdala Central nuclear group - CEN NaN 3892 9.946371e+07 17959 42884.641430 0.959744 0.863585 0.881554
agg_4 Amygdala excitatory Amygdala_Corticomedial nuclear group (CMN) - a... CNS healthy SCR_016152 brain_atlas Amygdala Corticomedial nuclear group (CMN) - anterior c... NaN 2945 1.281619e+08 17885 41816.741933 0.951365 0.854304 0.868902
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
agg_9533 vascular associated smooth muscle cell upper lobe of right lung lung NA ENCODE scimilarity nan nan NaN 21 3.483375e+04 8515 35404.911768 0.735213 0.665647 0.654491
agg_9535 vascular associated smooth muscle cell urinary bladder urinary healthy GSE129845 scimilarity nan nan NaN 24 8.498500e+04 7337 26189.415789 0.809852 0.690022 0.656160
agg_9536 vascular associated smooth muscle cell uterus uterus NA ENCODE scimilarity nan nan NaN 272 5.700762e+05 14769 44938.403867 0.915329 0.808941 0.839993
agg_9537 vascular associated smooth muscle cell uterus uterus healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan NaN 472 1.089170e+07 14514 30145.422152 0.852339 0.717682 0.727469
agg_9538 vascular associated smooth muscle cell vasculature vasculature healthy e5f58829-1a66-40b5-a624-9046778e74f5 scimilarity nan nan NaN 1853 5.992697e+07 16764 36464.273371 0.909855 0.780413 0.796351

8856 rows × 16 columns

Similarly, these are the gene metadata contained in the Decima object.

result.gene_metadata
chrom start end strand gene_type frac_nan mean_counts n_tracks gene_start gene_end gene_length gene_mask_start gene_mask_end frac_N fold dataset gene_id pearson size_factor_pearson ensembl_canonical_tss
STRADA chr17 63381538 63905826 - protein_coding 0.000000 2.208074 7616 63682336 63741986 59650 163840 223490 0.000000 ['fold1'] train ENSG00000266173 0.469923 0.476627 63741799.0
ETV4 chr17 43219172 43743460 - protein_coding 0.030873 0.925863 5004 43527844 43579620 51776 163840 215616 0.000000 ['fold1'] train ENSG00000175832 0.738092 0.613281 43546340.0
USP25 chr21 15566185 16090473 + protein_coding 0.000000 3.650355 8604 15730025 15880069 150044 163840 313884 0.000000 ['fold6'] train ENSG00000155313 0.905222 0.784446 15729982.0
ZSWIM5 chr1 44945761 45470049 - protein_coding 0.000620 2.190115 6123 45016399 45306209 289810 163840 453650 0.000000 ['fold5'] train ENSG00000162415 0.961772 0.795131 45206605.0
C21orf58 chr21 45963427 46487715 - protein_coding 0.000791 1.650467 7354 46300181 46323875 23694 163840 187534 0.000000 ['fold6'] train ENSG00000160298 0.645268 0.412368 46323870.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
NPDC1 chr9 136685731 137210019 - protein_coding 0.000000 2.625285 7852 137039463 137046179 6716 163840 170556 0.000000 ['fold3'] test ENSG00000107281 0.316322 0.178204 137046177.0
ZNF425 chr7 148765876 149290164 - protein_coding 0.001048 1.292957 6511 149102784 149126324 23540 163840 187380 0.000000 ['fold7'] train ENSG00000204947 0.821292 0.737081 149126324.0
COL5A1 chr9 134477934 135002222 + protein_coding 0.002159 1.492664 6209 134641774 134844843 203069 163840 366909 0.000000 ['fold3'] test ENSG00000130635 0.766624 0.456999 134641803.0
BRD3 chr9 133708087 134232375 - protein_coding 0.000000 3.190450 8675 134030305 134068535 38230 163840 202070 0.004662 ['fold3'] test ENSG00000169925 0.344062 0.280283 134068026.0
EVI5L chr19 7666393 8190681 + protein_coding 0.000000 1.959605 7570 7830233 7864976 34743 163840 198583 0.000000 ['fold3'] test ENSG00000142459 0.810152 0.704828 7830218.0

18457 rows × 20 columns
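
The coordinate columns in this table are related to one another, which can be checked by hand on the rows displayed. The sketch below copies three rows from the output above and tests the relations that hold for them: each window spans 524288 bp, the gene starts 163840 bp from the upstream edge of the window (matching `gene_mask_start`), and the mask length equals `gene_length`. Whether these relations hold for every gene, for example genes longer than the remaining window, is an assumption not verified here.

```python
# Rows copied from the gene metadata output above:
# (gene, start, end, strand, gene_start, gene_end, gene_length, gene_mask_start, gene_mask_end)
rows = [
    ("STRADA", 63381538, 63905826, "-", 63682336, 63741986, 59650, 163840, 223490),
    ("USP25", 15566185, 16090473, "+", 15730025, 15880069, 150044, 163840, 313884),
    ("COL5A1", 134477934, 135002222, "+", 134641774, 134844843, 203069, 163840, 366909),
]


def check_row(row):
    """Compute the derived quantities for one gene metadata row."""
    _gene, start, end, strand, gene_start, gene_end, gene_length, mask_start, mask_end = row
    # Distance from the window edge to the gene start, measured along the gene's strand.
    upstream = gene_start - start if strand == "+" else end - gene_end
    return {
        "window_size": end - start,
        "upstream": upstream,
        "upstream_equals_mask_start": upstream == mask_start,
        "gene_span_matches": gene_end - gene_start == gene_length,
        "mask_span_matches": mask_end - mask_start == gene_length,
    }


checks = {row[0]: check_row(row) for row in rows}
for gene, result in checks.items():
    print(gene, result)
```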

You can also access the genes and cells:

result.genes
Index(['STRADA', 'ETV4', 'USP25', 'ZSWIM5', 'C21orf58', 'MIR497HG', 'CFAP74',
       'GSE1', 'LPP', 'CLK1',
       ...
       'STRIP2', 'TNFRSF1A', 'RBM14-RBM4', 'C1orf21', 'LINC00639', 'NPDC1',
       'ZNF425', 'COL5A1', 'BRD3', 'EVI5L'],
      dtype='object', length=18457)

Cell indices can also be accessed:

result.cells
Index(['agg_0', 'agg_1', 'agg_2', 'agg_3', 'agg_4', 'agg_5', 'agg_6', 'agg_7',
       'agg_8', 'agg_9',
       ...
       'agg_9528', 'agg_9529', 'agg_9530', 'agg_9531', 'agg_9532', 'agg_9533',
       'agg_9535', 'agg_9536', 'agg_9537', 'agg_9538'],
      dtype='object', length=8856)

Predicted expression for a specific gene can be accessed:

result.predicted_expression_matrix(genes=["SPI1"])
SPI1
agg_0 0.256442
agg_1 0.221014
agg_2 0.179371
agg_3 0.219646
agg_4 0.217516
... ...
agg_9533 0.493780
agg_9535 0.292091
agg_9536 0.370765
agg_9537 0.168036
agg_9538 0.239733

8856 rows × 1 columns

Or for all the genes:

result.predicted_expression_matrix()
STRADA ETV4 USP25 ZSWIM5 C21orf58 MIR497HG CFAP74 GSE1 LPP CLK1 ... STRIP2 TNFRSF1A RBM14-RBM4 C1orf21 LINC00639 NPDC1 ZNF425 COL5A1 BRD3 EVI5L
agg_0 2.973438 1.845565 4.592531 5.099802 1.774879 0.356812 2.590836 4.629774 4.897171 3.326940 ... 2.836060 0.297015 1.883849 4.293593 1.463565 3.183534 2.340202 2.374942 2.911916 3.230072
agg_1 2.954213 1.896726 4.688557 5.510440 1.666929 0.352725 2.292625 4.459535 4.915286 3.192858 ... 3.125704 0.242543 1.908177 4.439424 1.236739 3.494824 2.425672 2.054568 2.713408 3.491463
agg_2 2.938851 2.197247 4.861410 5.617520 1.773381 0.380867 2.394917 4.415038 4.836399 3.390717 ... 3.082098 0.263285 2.006456 4.383455 1.208590 4.013819 2.408381 2.297343 2.892222 3.695785
agg_3 3.045972 2.138573 4.863791 5.273604 1.760097 0.463555 2.391702 3.940975 4.857763 3.410926 ... 2.882890 0.290327 1.922963 4.550189 1.430520 3.693118 2.297103 2.121887 2.626117 3.223912
agg_4 3.025518 2.019096 4.602948 5.257001 1.755338 0.382190 2.432810 4.392480 4.959488 3.250500 ... 3.082296 0.258540 2.038277 4.464807 1.249043 3.665800 2.400820 2.255862 2.925619 3.471005
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
agg_9533 2.333562 0.633322 4.675825 2.793023 0.752030 0.692083 0.503531 4.327948 6.903193 3.695593 ... 0.549795 2.270181 1.563218 4.395422 0.550088 1.330252 1.044471 3.759369 2.491346 1.872717
agg_9535 0.835037 0.358773 1.964896 0.307449 0.337240 0.834196 0.093885 1.853794 3.700790 4.467302 ... 0.176885 1.370898 1.022708 3.400267 0.052162 1.908870 0.253417 1.448111 1.622033 1.064292
agg_9536 3.008039 1.209324 4.798392 3.931870 1.401328 1.638555 0.969720 4.779201 6.631931 4.127797 ... 1.174298 1.870530 2.506874 5.151776 0.967644 1.809947 2.205356 4.244005 2.974467 2.659873
agg_9537 1.241936 0.455059 2.919995 0.571672 0.486448 1.175586 0.145397 2.412148 4.759118 4.913945 ... 0.371035 1.361073 1.668085 4.005738 0.078611 1.571750 0.508187 2.067150 2.323764 1.429850
agg_9538 1.715507 0.700955 3.044732 0.858696 0.903406 1.763168 0.215304 2.604478 4.549708 4.839124 ... 0.594310 1.801298 2.075996 3.933860 0.165590 1.970268 0.993521 2.232347 2.473388 1.902884

8856 rows × 18457 columns

result.load_model(device=device)
DecimaResult(anndata=AnnData object with n_obs × n_vars = 8856 × 18457
    obs: 'cell_type', 'tissue', 'organ', 'disease', 'study', 'dataset', 'region', 'subregion', 'celltype_coarse', 'n_cells', 'total_counts', 'n_genes', 'size_factor', 'train_pearson', 'val_pearson', 'test_pearson'
    var: 'chrom', 'start', 'end', 'strand', 'gene_type', 'frac_nan', 'mean_counts', 'n_tracks', 'gene_start', 'gene_end', 'gene_length', 'gene_mask_start', 'gene_mask_end', 'frac_N', 'fold', 'dataset', 'gene_id', 'pearson', 'size_factor_pearson', 'ensembl_canonical_tss'
    layers: 'preds', 'v1_rep0', 'v1_rep1', 'v1_rep2', 'v1_rep3')

Prepare an input for the SPI1 gene.

Computing attributions takes about 10 seconds on a GPU and about 5 minutes on a CPU.

attrs = result.attributions(
    gene="SPI1",
    tasks=result.query_cells(f"cell_type in {spi1_cell_types}"),
    off_tasks=result.query_cells(f'organ == "blood" and cell_type not in {spi1_cell_types}'),
)

Attributions can be visualized and processed with the attributions object:

attrs.peaks
    peak            start   end     attribution  p-value       from_tss
0   pos.SPI1@37     163877  163902  12.817252    2.186883e-11  37
1   pos.SPI1@-121   163719  163744  5.595659     1.899081e-05  -121
2   pos.SPI1@-57    163783  163803  9.307484     3.054640e-05  -57
3   pos.SPI1@62     163902  163909  1.281183     3.068997e-05  62
4   pos.SPI1@-79    163761  163765  0.833269     6.109865e-05  -79
... ...             ...     ...     ...          ...           ...
72  neg.SPI1@443    164283  164293  -0.717349    4.916059e-04  443
73  neg.SPI1@23600  187440  187445  -0.267438    4.916059e-04  23600
74  neg.SPI1@32783  196623  196630  -0.461813    4.918151e-04  32783
75  neg.SPI1@1735   165575  165592  -1.437498    4.918151e-04  1735
76  neg.SPI1@31668  195508  195512  -0.213403    4.918151e-04  31668

135 rows × 6 columns
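
In the rows shown above, `start` and `end` are positions within the 524288 bp input window, and `from_tss` equals `start` minus 163840, the `gene_mask_start` value listed in the gene metadata. The sketch below checks this on a few rows copied from the output and parses the sign and offset encoded in the peak name. Reading the name as `<sign>.<gene>@<offset>` is an inference from these rows rather than a documented format.

```python
# gene_mask_start shown for every displayed row of the gene metadata (assumed for SPI1 too)
GENE_MASK_START = 163840

# (peak, start, end, from_tss) copied from the attrs.peaks output above
peaks = [
    ("pos.SPI1@37", 163877, 163902, 37),
    ("pos.SPI1@-121", 163719, 163744, -121),
    ("pos.SPI1@-57", 163783, 163803, -57),
    ("neg.SPI1@443", 164283, 164293, 443),
    ("neg.SPI1@23600", 187440, 187445, 23600),
]


def parse_peak_name(name):
    """Split a name like 'pos.SPI1@37' into (sign, gene, offset)."""
    label, offset = name.split("@")
    sign, gene = label.split(".", 1)
    return sign, gene, int(offset)


parsed = [parse_peak_name(name) for name, *_ in peaks]

# Check that window position, from_tss, and the offset in the name all agree.
offsets_match = all(
    start - GENE_MASK_START == from_tss == parse_peak_name(name)[2]
    for name, start, _end, from_tss in peaks
)
print(parsed)
print(offsets_match)
```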

attrs.peaks_to_bed()
    chrom  start     end       name             score    strand  attribution
38  chr11  47216350  47216357  pos.SPI1@162219  3.33494  .       0.543797
49  chr11  47257597  47257605  pos.SPI1@120971  3.31931  .       0.680714
65  chr11  47257633  47257637  neg.SPI1@120939  3.31455  .       -0.221530
63  chr11  47257734  47257739  neg.SPI1@120837  3.32086  .       -0.273840
43  chr11  47345731  47345736  neg.SPI1@32840   3.35317  .       -0.298483
... ...    ...       ...       ...              ...      ...     ...
69  chr11  47395483  47395492  neg.SPI1@-16916  3.31094  .       -0.567760
39  chr11  47400211  47400221  neg.SPI1@-21645  3.35527  .       -0.900000
37  chr11  47400225  47400235  neg.SPI1@-21659  3.35844  .       -0.729126
68  chr11  47400376  47400382  neg.SPI1@-21806  3.31094  .       -0.329538
58  chr11  47400703  47400709  neg.SPI1@-22133  3.33067  .       -0.325769

135 rows × 7 columns

attrs.plot_peaks()
import matplotlib.pyplot as plt

attrs.plot_seqlogo(relative_loc=-45)
plt.show()

This command takes about 1 minute and detects motifs in the attributions using FIMO. The motifs are ranked by their attribution scores:

df_motifs = attrs.scan_motifs()
df_motifs
motif peak start end strand score p-value matched_seq site_attr_score motif_attr_score from_tss
3874 ZNF746.H13CORE.0.PSG.A neg.SPI1@1917 165744 165770 - 26.355452 7.435330e-10 AGGGAGGAGGGAGGAAGGTGGGAGGA -0.010775 -0.016253 1904
3453 ZN263.H13CORE.1.P.B neg.SPI1@1898 165732 165753 + 24.008722 1.311946e-09 GGGGAGGAGGACAGGGAGGAG -0.006567 -0.016637 1892
781 ZN479.H13CORE.0.P.C neg.SPI1@-174 163668 163686 - 22.937369 2.837623e-09 GCCCCCAAAGTCATCCCT -0.007155 -0.013835 -172
1036 ZNF746.H13CORE.0.PSG.A neg.SPI1@-191 163639 163665 + 24.462995 3.833248e-09 TCTCCCTCCCATCCTCCCTCCCCAGC -0.002449 -0.001297 -201
3545 ZNF746.H13CORE.0.PSG.A neg.SPI1@1898 165732 165758 - 23.523391 7.853286e-09 GGGGAGGAGGACAGGGAGGAGGGAGG -0.005327 -0.010747 1892
... ... ... ... ... ... ... ... ... ... ... ...
1088 CREB3.H13CORE.0.SM.B neg.SPI1@-21 163819 163833 + 1.754682 4.999340e-04 GCGGTGATGTCACC -0.206348 -0.585193 -21
2067 RXRB.H13CORE.2.PS.A neg.SPI1@1182 165019 165030 - 12.213856 NaN CCATGACCTCT -0.008323 -0.024233 1179
2913 KLF7.H13CORE.0.P.B neg.SPI1@1813 165662 165672 + 15.217368 NaN GGGGGCGGGG 0.008973 0.025625 1822
2986 KLF7.H13CORE.0.P.B neg.SPI1@1832 165662 165672 + 15.217368 NaN GGGGGCGGGG 0.008973 0.025625 1822
7451 KLF7.H13CORE.0.P.B pos.SPI1@1820 165662 165672 + 15.217368 NaN GGGGGCGGGG 0.008973 0.025625 1822

8556 rows × 11 columns

If you only need the attribution tensor for a one-hot encoded input sequence, prepare your input and call the attributer object directly:

one_hot_seq, gene_mask = result.prepare_one_hot("SPI1")
inputs = torch.vstack([one_hot_seq, gene_mask]).unsqueeze(0)
inputs.shape  # (batch_size, 5, seq_len)
torch.Size([1, 5, 524288])
from decima.interpret.attributer import DecimaAttributer

attributer = DecimaAttributer(
    model=result.model,
    tasks=result.query_cells(f"cell_type in {spi1_cell_types}"),
    off_tasks=result.query_cells(f'organ == "blood" and cell_type not in {spi1_cell_types}'),
    transform="specificity",
    method="inputxgradient",
)
attrs = attributer.attribute(inputs=inputs)

attrs  # (batch_size, 4, seq_len) gene mask is removed from final attributions
tensor([[[-0.0000e+00,  0.0000e+00, -0.0000e+00,  ..., -0.0000e+00,
           0.0000e+00,  0.0000e+00],
         [-0.0000e+00, -0.0000e+00, -2.6888e-05,  ...,  0.0000e+00,
           0.0000e+00,  0.0000e+00],
         [-0.0000e+00,  1.2651e-04,  0.0000e+00,  ...,  3.7016e-05,
          -0.0000e+00,  1.5136e-05],
         [ 1.7333e-04, -0.0000e+00,  0.0000e+00,  ..., -0.0000e+00,
           1.2473e-05, -0.0000e+00]]], device='cuda:0')
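
The many zeros in this tensor are what an input-times-gradient method would be expected to produce: the gradient is multiplied elementwise by the one-hot input, so only the observed base at each position can carry a nonzero value. The toy sketch below illustrates this with a linear scoring function whose gradient is simply its weight matrix. The weights, the sequence, and the A, C, G, T channel order are made up, and this is not Decima's model or its attribution code.

```python
BASES = "ACGT"  # assumed channel order


def one_hot(seq):
    """Return a 4 x len(seq) nested list with a 1.0 at each observed base."""
    return [[1.0 if base == b else 0.0 for base in seq] for b in BASES]


def input_x_gradient(x, grad):
    """Elementwise product of the input and the gradient of the output w.r.t. it."""
    return [[xi * gi for xi, gi in zip(x_row, g_row)] for x_row, g_row in zip(x, grad)]


seq = "GATC"
x = one_hot(seq)

# For a linear score f(x) = sum(w * x), the gradient df/dx is w itself.
weights = [
    [0.5, -0.2, 0.1, 0.0],   # A
    [-0.3, 0.4, 0.2, 0.7],   # C
    [0.9, 0.1, -0.6, 0.3],   # G
    [0.2, -0.8, 0.5, -0.1],  # T
]

toy_attrs = input_x_gradient(x, weights)

# Collapse to one value per position by summing over the four channels.
per_position = [sum(toy_attrs[c][i] for c in range(4)) for i in range(len(seq))]
print(per_position)
```

Summing over the channel axis in this way is a common step before plotting, since it yields a single score per base.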