Querying the gReLU model zoo on HuggingFace#

This tutorial shows how to query our public model zoo and download models and datasets. The model zoo is located at https://huggingface.co/collections/Genentech/grelu-model-zoo.

The model zoo was previously hosted on Weights & Biases (https://wandb.ai/grelu). That location is deprecated and will be removed in a future release.

import grelu.resources

List available models#

grelu.resources.list_models()
['Genentech/human-atac-catlas-model',
 'Genentech/human-chromhmm-fullstack-model',
 'Genentech/enformer-model',
 'Genentech/decima-model',
 'Genentech/borzoi-model',
 'Genentech/GM12878_dnase-model',
 'Genentech/human-mpra-gosai-2024-model']

List available datasets#

grelu.resources.list_datasets()
['Genentech/alzheimers-variant-tutorial-data',
 'Genentech/binary-atac-tutorial-data',
 'Genentech/microglia-scatac-tutorial-data',
 'Genentech/human-atac-catlas-data',
 'Genentech/human-chromhmm-fullstack-data',
 'Genentech/enformer-data',
 'Genentech/decima-data',
 'Genentech/borzoi-data',
 'Genentech/GM12878_dnase-data']
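Many entries in the two lists are paired by a simple naming convention (`-model` / `-data` suffixes). As a small illustration, using the lists shown above rather than a live query, you could match them up; note this convention is not guaranteed, and `get_datasets_by_model()` (shown later) gives the authoritative link.

```python
models = [
    "Genentech/human-atac-catlas-model",
    "Genentech/enformer-model",
]
datasets = [
    "Genentech/human-atac-catlas-data",
    "Genentech/enformer-data",
    "Genentech/binary-atac-tutorial-data",
]

# Pair each model with the dataset sharing the same stem. This relies on a
# naming convention only; use get_datasets_by_model() for the real linkage.
pairs = {
    m: m.removesuffix("-model") + "-data"
    for m in models
    if m.removesuffix("-model") + "-data" in datasets
}
print(pairs)
```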

Get information about a model#

Let’s get information about the Catlas ATAC-seq model. You can also browse it here: https://huggingface.co/Genentech/human-atac-catlas-model

grelu.resources.get_model_info(repo_id="Genentech/human-atac-catlas-model")
{'id': 'Genentech/human-atac-catlas-model',
 'tags': ['pytorch-lightning',
  'biology',
  'genomics',
  'tabular-classification',
  'dataset:Genentech/human-atac-catlas-data',
  'base_model:Genentech/enformer-model',
  'base_model:finetune:Genentech/enformer-model',
  'license:mit',
  'region:us'],
 'card_data': {'base_model': ['Genentech/enformer-model'],
  'datasets': ['Genentech/human-atac-catlas-data'],
  'eval_results': None,
  'language': None,
  'library_name': 'pytorch-lightning',
  'license': 'mit',
  'license_name': None,
  'license_link': None,
  'metrics': None,
  'model_name': None,
  'pipeline_tag': 'tabular-classification',
  'tags': ['biology', 'genomics']},
 'downloads': 0,
 'last_modified': datetime.datetime(2026, 2, 23, 21, 32, 59, tzinfo=datetime.timezone.utc),
 'files': ['.gitattributes',
  '2_train.ipynb',
  'README.md',
  'model.ckpt',
  'output.log']}

Note the ‘files’ entry: the repository contains a model checkpoint (model.ckpt), along with the training logs (output.log) and training code (2_train.ipynb).
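The ‘files’ listing can also be inspected programmatically. The snippet below reproduces the listing shown above as a plain dict so it runs without a network connection; in practice you would take it from the result of `get_model_info()`.

```python
# The 'files' entry copied from the get_model_info() output above.
info = {
    "files": [".gitattributes", "2_train.ipynb", "README.md",
              "model.ckpt", "output.log"],
}

# Pick out the checkpoint files (the weights we can load) from the rest.
ckpt_files = [f for f in info["files"] if f.endswith(".ckpt")]
print(ckpt_files)  # ['model.ckpt']
```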

Download and load a model#

We will now load the model checkpoint.

model = grelu.resources.load_model(repo_id="Genentech/human-atac-catlas-model", filename='model.ckpt')
type(model)
grelu.lightning.LightningModel

The other files in the repository, such as the training logs (output.log) and training code (2_train.ipynb), can also be downloaded.

Query model metadata (lineage)#

# Get datasets linked to a model
grelu.resources.get_datasets_by_model(repo_id="Genentech/human-atac-catlas-model")
['Genentech/human-atac-catlas-data']
# Get base models
grelu.resources.get_base_models(repo_id="Genentech/human-atac-catlas-model")
['Genentech/enformer-model']

We see that the model was trained by fine-tuning Enformer (‘Genentech/enformer-model’) on the ‘Genentech/human-atac-catlas-data’ dataset.
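These lineage queries can be chained to trace a model back through successive rounds of fine-tuning. The sketch below keeps the base-model lookup as a plain dict so it runs offline; in practice you would replace the lookup with calls to `grelu.resources.get_base_models()`.

```python
# Offline stand-in for get_base_models(): repo_id -> list of base models.
base_models = {
    "Genentech/human-atac-catlas-model": ["Genentech/enformer-model"],
    "Genentech/enformer-model": [],  # no base model recorded (assumption)
}

def lineage(repo_id: str, lookup: dict) -> list:
    """Return the chain of models from repo_id back to its original base."""
    chain = [repo_id]
    while lookup.get(chain[-1]):
        chain.append(lookup[chain[-1]][0])
    return chain

print(lineage("Genentech/human-atac-catlas-model", base_models))
# ['Genentech/human-atac-catlas-model', 'Genentech/enformer-model']
```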

Get information about a dataset#

Let’s get metadata about the Genentech/enformer-data dataset. You can also browse it here: https://huggingface.co/datasets/Genentech/enformer-data

grelu.resources.get_dataset_info("Genentech/enformer-data")
{'id': 'Genentech/enformer-data',
 'tags': ['task_categories:tabular-regression',
  'license:mit',
  'size_categories:10K<n<100K',
  'format:csv',
  'modality:tabular',
  'modality:text',
  'library:datasets',
  'library:dask',
  'library:polars',
  'library:mlcroissant',
  'region:us',
  'biology',
  'genomics'],
 'card_data': {'annotations_creators': None,
  'language_creators': None,
  'language': None,
  'license': 'mit',
  'multilinguality': None,
  'size_categories': ['10K<n<100K'],
  'source_datasets': None,
  'task_categories': ['tabular-regression'],
  'task_ids': None,
  'paperswithcode_id': None,
  'pretty_name': 'Enformer Intervals',
  'config_names': None,
  'train_eval_index': None,
  'tags': ['biology', 'genomics']},
 'downloads': 18,
 'last_modified': datetime.datetime(2026, 2, 23, 21, 50, 38, tzinfo=datetime.timezone.utc),
 'files': ['.gitattributes',
  'README.md',
  'data_human.ipynb',
  'data_mouse.ipynb',
  'human_intervals.tsv',
  'mouse_intervals.tsv']}

This repository contains the genomic intervals used to train Enformer on the human and mouse genomes, as well as the code used to generate these files.

Download and load a dataset#

Let’s download human_intervals.tsv. This function returns the local path of the downloaded file.

dataset_path = grelu.resources.download_dataset(repo_id="Genentech/enformer-data", filename='human_intervals.tsv')
print(dataset_path)
/home/lala8/.cache/huggingface/hub/datasets--Genentech--enformer-data/snapshots/886ffff993ab1adf1830f4d8fb237f692603dae6/human_intervals.tsv
import pandas as pd
pd.read_table(dataset_path).head()
chrom start end split
0 chr18 895618 1092226 train
1 chr4 113598179 113794787 train
2 chr11 18394952 18591560 train
3 chr16 85772913 85969521 train
4 chr3 158353420 158550028 train
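The ‘split’ column marks which partition each interval belongs to, so you can filter the table before building a dataset. The example below uses a tiny hand-made DataFrame mirroring the columns above (so it runs without downloading anything); the split label names used here are illustrative.

```python
import pandas as pd

# A tiny stand-in for the intervals table loaded above (same columns).
df = pd.DataFrame({
    "chrom": ["chr18", "chr4", "chr8"],
    "start": [895618, 113598179, 18576803],
    "end":   [1092226, 113794787, 18773411],
    "split": ["train", "train", "valid"],
})

# Keep only the validation intervals.
valid_df = df[df["split"] == "valid"]
print(len(valid_df))  # 1
```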

Legacy wandb access#

For legacy access to the wandb model zoo, use grelu.resources.wandb:

from grelu.resources import wandb
# wandb.projects()  # List all projects
# wandb.load_model(project="human-atac-catlas", model_name="model")