Querying the public gReLU model zoo on Weights and Biases (wandb)#

This tutorial shows how to programmatically query our public model zoo and download models and datasets. You can also visit the model zoo in your browser at https://wandb.ai/grelu/.

Rules#

  • wandb projects are the main storage units for datasets and the models trained on them. The guiding principle is to always preserve the links between the raw dataset, the preprocessed dataset and the models trained on them, for reproducibility, documentation and sanity checking.

  • The ideal wandb lineage is shown below. This lineage allows us to query project-model-dataset links via the API.

  • Each project contains a notebook describing the details of data preprocessing, model training and model testing (e.g. performance metrics on holdout data). For models trained by us, the training logs are also available and can be seen by visiting the model zoo website.

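[Figure: the ideal wandb lineage, linking the raw dataset, the preprocessed dataset and the trained models within a project]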

import os
import anndata
import grelu.resources

List all available projects in the zoo#

The grelu.resources module contains functions for interacting with the model zoo. First, we can list all available projects in the zoo:

grelu.resources.projects()
['alzheimers-variant-tutorial',
 'microglia-scatac-tutorial',
 'human-chromhmm-fullstack',
 'human-atac-catlas',
 'borzoi',
 'corces-microglia-scatac',
 'yeast-gpra',
 'enformer']

We will work with the ‘human-atac-catlas’ project in the rest of this tutorial.
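Because project names are plain strings, a typo only surfaces once a later query fails. The sketch below checks a name against the list of projects before it is used. The helper `check_project` is hypothetical (it is not part of gReLU), and the list of names is copied from the output above so that the example runs offline; in practice you would pass the result of grelu.resources.projects().

```python
import difflib


def check_project(name, available):
    """Return `name` if it is a known project, otherwise raise with suggestions.

    Hypothetical helper for illustration; not part of grelu.resources.
    """
    if name in available:
        return name
    # Suggest up to three similarly spelled project names.
    close = difflib.get_close_matches(name, available, n=3)
    hint = f" Did you mean: {', '.join(close)}?" if close else ""
    raise ValueError(f"Unknown project {name!r}.{hint}")


# Names copied from the listing above (stand-in for grelu.resources.projects()).
available = [
    'alzheimers-variant-tutorial',
    'microglia-scatac-tutorial',
    'human-chromhmm-fullstack',
    'human-atac-catlas',
    'borzoi',
    'corces-microglia-scatac',
    'yeast-gpra',
    'enformer',
]

checked_name = check_project('human-atac-catlas', available)
print(checked_name)
```

A misspelled name such as 'human-atac-catlass' raises a ValueError whose message suggests the closest matches.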

List all datasets and models in a project#

project_name = 'human-atac-catlas'

Individual objects such as datasets and models are stored as ‘artifacts’ under each project. Artifacts can be of different types, but the ones that we are generally interested in are “dataset” (the preprocessed dataset) and “model” (the trained model). We can search for these under the project of interest:

grelu.resources.artifacts(project_name, type_is="dataset")
['dataset']

This tells us that there is an artifact called “dataset” which is of the “dataset” type.

grelu.resources.artifacts(project_name, type_is="model")
['model']

This tells us that there is an artifact called “model” which is of the “model” type.
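The two queries above can be repeated for several projects to get an overview of the zoo. The following is a minimal sketch, not a gReLU function: `build_catalog` is a hypothetical helper that accepts the listing function as an argument, and `fake_list_artifacts` is an offline stand-in that only mimics the output shown above. It assumes the listing function can be called as `f(project, type_is=...)`, as in the calls above.

```python
def build_catalog(project_names, list_artifacts, types=("dataset", "model")):
    """Map each project name to the artifact names found for each type.

    `list_artifacts` is any callable with the calling convention
    list_artifacts(project, type_is=...), e.g. grelu.resources.artifacts.
    """
    catalog = {}
    for project in project_names:
        catalog[project] = {
            t: list(list_artifacts(project, type_is=t)) for t in types
        }
    return catalog


def fake_list_artifacts(project, type_is):
    # Offline stand-in: mimics the output shown above, where each
    # artifact is named after its type. Not real zoo data.
    return [type_is]


catalog = build_catalog(['human-atac-catlas'], fake_list_artifacts)
print(catalog)
# {'human-atac-catlas': {'dataset': ['dataset'], 'model': ['model']}}
```

With gReLU installed and network access, the same helper could be called as build_catalog(grelu.resources.projects(), grelu.resources.artifacts). This issues one query per project and type, so it may take a while for the full zoo.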

Download a dataset#

Let us now select the “dataset” artifact.

artifact = grelu.resources.get_artifact(
    name="dataset",
    project = project_name,
)
artifact
<Artifact QXJ0aWZhY3Q6ODUwODcxODM0>

We can download this artifact into a local directory.

artifact_dir = artifact.download()
artifact_dir
'/code/gReLU/docs/tutorials/artifacts/dataset:v1'
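The directory name records which version of the artifact was downloaded, which is useful to log alongside any results. Below is a small sketch that splits such a path into artifact name and version. `split_artifact_dir` is a hypothetical helper, and it assumes the directory follows the `name:version` pattern seen in the output above; other setups may name the directory differently.

```python
import os


def split_artifact_dir(path):
    """Split '<...>/<name>:<version>' into (name, version).

    Returns (basename, None) if the directory name contains no colon.
    Hypothetical helper; assumes the 'name:version' naming shown above.
    """
    base = os.path.basename(os.path.normpath(path))
    name, sep, version = base.rpartition(":")
    if not sep:
        return base, None
    return name, version


name, version = split_artifact_dir(
    '/code/gReLU/docs/tutorials/artifacts/dataset:v1'
)
print(name, version)  # dataset v1
```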

We can list the files in this directory:

os.listdir(artifact_dir)
['preprocessed.h5ad']
ad = anndata.read_h5ad(os.path.join(artifact_dir, 'preprocessed.h5ad'))
ad
AnnData object with n_obs × n_vars = 204 × 1121319
    obs: 'cell type'
    var: 'chrom', 'start', 'end', 'cre_class', 'in_fetal', 'in_adult', 'cre_module', 'width'
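The file name 'preprocessed.h5ad' was hard-coded above. If you do not know in advance what a dataset artifact contains, you can search the download directory for .h5ad files instead. The sketch below uses only the standard library; `find_h5ad` is a hypothetical helper, and the temporary directory merely stands in for a real artifact directory so that the example runs offline.

```python
import os
import tempfile


def find_h5ad(directory):
    """Return the sorted paths of all .h5ad files directly inside `directory`."""
    return sorted(
        os.path.join(directory, f)
        for f in os.listdir(directory)
        if f.endswith(".h5ad")
    )


# Stand-in for an artifact directory: one .h5ad file and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "preprocessed.h5ad"), "w").close()
    open(os.path.join(tmp, "README.txt"), "w").close()
    found = [os.path.basename(p) for p in find_h5ad(tmp)]

print(found)  # ['preprocessed.h5ad']
```

Each returned path can then be passed to anndata.read_h5ad, as in the cell above.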

We could download the trained model from the zoo in the same way. However, grelu.resources also provides a function that downloads a model from the zoo and loads it directly into memory in one step.

Download and load a model in one step#

model = grelu.resources.load_model(
    project=project_name,
    model_name='model'
) # that's it!
type(model)
grelu.lightning.LightningModel