Querying the gReLU model zoo on HuggingFace#
This tutorial shows how to query our public model zoo and download models and datasets. The model zoo is located at https://huggingface.co/collections/Genentech/grelu-model-zoo.
The model zoo was previously hosted on Weights & Biases (https://wandb.ai/grelu). This access method is deprecated and will be removed in a future release.
import grelu.resources
List available models#
grelu.resources.list_models()
['Genentech/human-atac-catlas-model',
'Genentech/human-chromhmm-fullstack-model',
'Genentech/enformer-model',
'Genentech/decima-model',
'Genentech/borzoi-model',
'Genentech/GM12878_dnase-model',
'Genentech/human-mpra-gosai-2024-model']
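The returned model IDs are plain strings, so they can be filtered with ordinary Python. As a small sketch using the IDs listed above, here is how you might keep only the ATAC-seq models:

```python
# Model IDs as returned by grelu.resources.list_models() (copied from the output above)
models = [
    "Genentech/human-atac-catlas-model",
    "Genentech/human-chromhmm-fullstack-model",
    "Genentech/enformer-model",
    "Genentech/decima-model",
    "Genentech/borzoi-model",
    "Genentech/GM12878_dnase-model",
    "Genentech/human-mpra-gosai-2024-model",
]

# Keep only models whose ID mentions ATAC (case-insensitive)
atac_models = [m for m in models if "atac" in m.lower()]
print(atac_models)  # ['Genentech/human-atac-catlas-model']
```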
List available datasets#
grelu.resources.list_datasets()
['Genentech/alzheimers-variant-tutorial-data',
'Genentech/binary-atac-tutorial-data',
'Genentech/microglia-scatac-tutorial-data',
'Genentech/human-atac-catlas-data',
'Genentech/human-chromhmm-fullstack-data',
'Genentech/enformer-data',
'Genentech/decima-data',
'Genentech/borzoi-data',
'Genentech/GM12878_dnase-data']
Get information about a model#
Let’s get information about the Catlas ATAC-seq model. You can also browse it here: https://huggingface.co/Genentech/human-atac-catlas-model
grelu.resources.get_model_info(repo_id="Genentech/human-atac-catlas-model")
{'id': 'Genentech/human-atac-catlas-model',
'tags': ['pytorch-lightning',
'biology',
'genomics',
'tabular-classification',
'dataset:Genentech/human-atac-catlas-data',
'base_model:Genentech/enformer-model',
'base_model:finetune:Genentech/enformer-model',
'license:mit',
'region:us'],
'card_data': {'base_model': ['Genentech/enformer-model'],
'datasets': ['Genentech/human-atac-catlas-data'],
'eval_results': None,
'language': None,
'library_name': 'pytorch-lightning',
'license': 'mit',
'license_name': None,
'license_link': None,
'metrics': None,
'model_name': None,
'pipeline_tag': 'tabular-classification',
'tags': ['biology', 'genomics']},
'downloads': 0,
'last_modified': datetime.datetime(2026, 2, 23, 21, 32, 59, tzinfo=datetime.timezone.utc),
'files': ['.gitattributes',
'2_train.ipynb',
'README.md',
'model.ckpt',
'output.log']}
Note the ‘files’ entry: the repository contains a model checkpoint (model.ckpt), along with the training logs (output.log) and the training notebook (2_train.ipynb).
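Since the model info is returned as a plain dictionary, you can pick out files of interest programmatically. A minimal sketch over the ‘files’ list shown above:

```python
# The 'files' list from the model info dictionary above
files = [".gitattributes", "2_train.ipynb", "README.md", "model.ckpt", "output.log"]

# Find all Lightning checkpoints in the repository
checkpoints = [f for f in files if f.endswith(".ckpt")]
print(checkpoints)  # ['model.ckpt']
```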
Download and load a model#
We will now load the model checkpoint.
model = grelu.resources.load_model(repo_id="Genentech/human-atac-catlas-model", filename='model.ckpt')
type(model)
grelu.lightning.LightningModel
Training logs (output.log) and code (2_train.ipynb) can also be downloaded from the same repository.
Query model metadata (lineage)#
# Get datasets linked to a model
grelu.resources.get_datasets_by_model(repo_id="Genentech/human-atac-catlas-model")
['Genentech/human-atac-catlas-data']
# Get base models
grelu.resources.get_base_models(repo_id="Genentech/human-atac-catlas-model")
['Genentech/enformer-model']
We see that the model was trained by fine-tuning Enformer (‘Genentech/enformer-model’) on the ‘Genentech/human-atac-catlas-data’ dataset.
Get information about a dataset#
Let’s get metadata about the Genentech/enformer-data dataset. You can also browse it here: https://huggingface.co/datasets/Genentech/enformer-data
grelu.resources.get_dataset_info("Genentech/enformer-data")
{'id': 'Genentech/enformer-data',
'tags': ['task_categories:tabular-regression',
'license:mit',
'size_categories:10K<n<100K',
'format:csv',
'modality:tabular',
'modality:text',
'library:datasets',
'library:dask',
'library:polars',
'library:mlcroissant',
'region:us',
'biology',
'genomics'],
'card_data': {'annotations_creators': None,
'language_creators': None,
'language': None,
'license': 'mit',
'multilinguality': None,
'size_categories': ['10K<n<100K'],
'source_datasets': None,
'task_categories': ['tabular-regression'],
'task_ids': None,
'paperswithcode_id': None,
'pretty_name': 'Enformer Intervals',
'config_names': None,
'train_eval_index': None,
'tags': ['biology', 'genomics']},
'downloads': 18,
'last_modified': datetime.datetime(2026, 2, 23, 21, 50, 38, tzinfo=datetime.timezone.utc),
'files': ['.gitattributes',
'README.md',
'data_human.ipynb',
'data_mouse.ipynb',
'human_intervals.tsv',
'mouse_intervals.tsv']}
This dataset contains the genomic intervals used to train Enformer on the human and mouse genomes, as well as the code used to process these files.
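The ‘tags’ list encodes structured metadata as `key:value` strings. As a sketch over the literal output above (plain Python, not a gReLU API), you can parse these tags into a mapping, e.g. to read off the dataset’s modalities:

```python
from collections import defaultdict

# Tags from the dataset info dictionary above
tags = [
    "task_categories:tabular-regression",
    "license:mit",
    "size_categories:10K<n<100K",
    "format:csv",
    "modality:tabular",
    "modality:text",
    "library:datasets",
    "library:dask",
    "library:polars",
    "library:mlcroissant",
    "region:us",
    "biology",
    "genomics",
]

# Split 'key:value' tags into a key -> values mapping; bare tags go under None
parsed = defaultdict(list)
for t in tags:
    key, _, value = t.partition(":")
    if value:
        parsed[key].append(value)
    else:
        parsed[None].append(t)

print(parsed["modality"])  # ['tabular', 'text']
print(parsed[None])        # ['biology', 'genomics']
```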
Download and load a dataset#
Let’s download human_intervals.tsv. This function returns the path to the downloaded data.
dataset_path = grelu.resources.download_dataset(repo_id="Genentech/enformer-data", filename='human_intervals.tsv')
print(dataset_path)
/home/lala8/.cache/huggingface/hub/datasets--Genentech--enformer-data/snapshots/886ffff993ab1adf1830f4d8fb237f692603dae6/human_intervals.tsv
import pandas as pd
pd.read_table(dataset_path).head()
|  | chrom | start | end | split |
|---|---|---|---|---|
| 0 | chr18 | 895618 | 1092226 | train |
| 1 | chr4 | 113598179 | 113794787 | train |
| 2 | chr11 | 18394952 | 18591560 | train |
| 3 | chr16 | 85772913 | 85969521 | train |
| 4 | chr3 | 158353420 | 158550028 | train |
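Once loaded into pandas, the intervals table can be inspected like any DataFrame. As a sketch using only the five rows shown above (the real file contains many more intervals and splits), you can verify that all intervals have the same fixed length and count intervals per split:

```python
import pandas as pd

# The first five rows of human_intervals.tsv, as shown above
df = pd.DataFrame(
    {
        "chrom": ["chr18", "chr4", "chr11", "chr16", "chr3"],
        "start": [895618, 113598179, 18394952, 85772913, 158353420],
        "end": [1092226, 113794787, 18591560, 85969521, 158550028],
        "split": ["train"] * 5,
    }
)

# All intervals have the same length: Enformer uses fixed-size input windows
lengths = (df["end"] - df["start"]).unique()
print(lengths)  # [196608]

# Count intervals per split
print(df["split"].value_counts().to_dict())  # {'train': 5}
```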
Legacy wandb access#
For legacy access to the wandb model zoo, use grelu.resources.wandb:
from grelu.resources import wandb
# wandb.projects() # List all projects
# wandb.load_model(project="human-atac-catlas", model_name="model")