grelu.model.models#
grelu.model.models defines complete architectures for sequence-to-function deep learning models.
All models inherit from the BaseModel class and are composed of embedding and head sections, which use classes defined in grelu.model.trunks and grelu.model.heads, respectively.
All models have a forward method (inherited from BaseModel) that takes as input a one-hot encoded sequence tensor of shape (N, 4, length) and returns a tensor of shape (N, tasks, output_length).
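The shape contract can be checked directly. A minimal sketch, assuming grelu is installed; ConvModel stands in for any BaseModel subclass, and the settings shown are illustrative only:

```python
import torch
from grelu.model.models import ConvModel

model = ConvModel(n_tasks=2)  # any model in this module follows the same contract
x = torch.zeros(8, 4, 256)    # batch of 8 one-hot encoded sequences of length 256
x[:, 0, :] = 1.0              # trivially one-hot: every position is "A"
y = model(x)                  # forward() inherited from BaseModel
print(y.shape)                # (8, 2, output_length); with the default
                              # final_pool_func="avg" the length axis is pooled
```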
Classes#
| Class | Description |
| --- | --- |
| BaseModel | Base model class |
| ConvModel | A fully convolutional model that optionally includes pooling, residual connections, batch normalization, or dilated convolutions |
| DilatedConvModel | A model architecture based on dilated convolutional layers with residual connections |
| ConvGRUModel | A model consisting of a convolutional tower followed by a bidirectional GRU layer and optional pooling |
| ConvTransformerModel | A model consisting of a convolutional tower followed by a transformer encoder layer and optional pooling |
| ConvMLPModel | A convolutional tower followed by a multilayer perceptron (MLP) |
| BorzoiModel | Model consisting of Borzoi conv and transformer layers followed by U-net upsampling and optional pooling |
| BorzoiPretrainedModel | Borzoi model with published weights (ported from Keras) |
| ExplaiNNModel | The ExplaiNN model architecture |
| EnformerModel | Enformer model architecture |
Module Contents#
- class grelu.model.models.BaseModel(embedding: torch.nn.Module, head: torch.nn.Module)[source]#
Bases:
torch.nn.Module
Base model class
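A minimal sketch of direct composition, assuming forward simply applies the head to the embedding's output; the plain nn.Conv1d modules below are hypothetical stand-ins for the trunk and head classes in grelu.model.trunks and grelu.model.heads:

```python
import torch
from torch import nn
from grelu.model.models import BaseModel

embedding = nn.Conv1d(4, 32, kernel_size=5, padding=2)  # (N, 4, L) -> (N, 32, L)
head = nn.Conv1d(32, 3, kernel_size=1)                   # (N, 32, L) -> (N, 3, L)

model = BaseModel(embedding=embedding, head=head)
y = model(torch.zeros(1, 4, 100))
print(y.shape)  # expected (1, 3, 100) if forward is head(embedding(x))
```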
- class grelu.model.models.ConvModel(n_tasks: int, stem_channels: int = 64, stem_kernel_size: int = 15, n_conv: int = 2, channel_init: int = 64, channel_mult: float = 1, kernel_size: int = 5, dilation_init: int = 1, dilation_mult: float = 1, act_func: str = 'relu', norm: bool = False, pool_func: str | None = None, pool_size: int | None = None, residual: bool = False, dropout: float = 0.0, crop_len: int = 0, final_pool_func: str = 'avg', dtype=None, device=None)[source]#
Bases:
BaseModel
A fully convolutional model that optionally includes pooling, residual connections, batch normalization, or dilated convolutions.
- Parameters:
n_tasks – Number of channels in the output
stem_channels – Number of channels in the stem
stem_kernel_size – Kernel width for the stem
n_conv – Number of convolutional blocks, not including the stem
kernel_size – Convolutional kernel width
channel_init – Initial number of channels
channel_mult – Factor by which to multiply the number of channels in each block
dilation_init – Initial dilation
dilation_mult – Factor by which to multiply the dilation in each block
act_func – Name of the activation function
pool_func – Name of the pooling function
pool_size – Width of the pooling layers
dropout – Dropout probability
norm – If True, apply batch norm
residual – If True, apply residual connection
crop_len – Number of positions to crop at either end of the output
final_pool_func – Name of the pooling function to apply to the final output. If None, no pooling will be applied at the end.
dtype – Data type for the layers.
device – Device for the layers.
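Example usage, a minimal sketch; the hyperparameters are illustrative, not recommended values:

```python
import torch
from grelu.model.models import ConvModel

# Four conv blocks after the stem, with batch norm and max pooling.
model = ConvModel(
    n_tasks=6,
    stem_channels=32,
    n_conv=4,
    channel_init=32,
    norm=True,
    pool_func="max",
    pool_size=2,
)
y = model(torch.zeros(2, 4, 512))
print(y.shape)  # (2, 6, output_length); output_length depends on pooling,
                # crop_len and final_pool_func
```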
- class grelu.model.models.DilatedConvModel(n_tasks: int, channels: int = 64, stem_kernel_size: int = 21, kernel_size: int = 3, dilation_mult: float = 2, act_func: str = 'relu', n_conv: int = 8, crop_len: str | int = 'auto', final_pool_func: str = 'avg', dtype=None, device=None)[source]#
Bases:
BaseModel
A model architecture based on dilated convolutional layers with residual connections. Inspired by the ChromBPNet model architecture.
- Parameters:
n_tasks – Number of channels in the output
channels – Number of channels for all convolutional layers
stem_kernel_size – Kernel width for the stem
n_conv – Number of convolutional blocks, not including the stem
kernel_size – Convolutional kernel width
dilation_mult – Factor by which to multiply the dilation in each block
act_func – Name of the activation function
crop_len – Number of positions to crop at either end of the output
final_pool_func – Name of the pooling function to apply to the final output. If None, no pooling will be applied at the end.
dtype – Data type for the layers.
device – Device for the layers.
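Example usage, a minimal sketch; this configuration is smaller than a typical ChromBPNet-style model:

```python
import torch
from grelu.model.models import DilatedConvModel

# Six dilated residual blocks; crop_len="auto" (the default) is assumed to trim
# positions affected by edge effects of the growing receptive field.
model = DilatedConvModel(n_tasks=1, channels=32, n_conv=6)
y = model(torch.zeros(4, 4, 1000))
print(y.shape)  # (4, 1, output_length) after cropping and final pooling
```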
- class grelu.model.models.ConvGRUModel(n_tasks: int, stem_channels: int = 16, stem_kernel_size: int = 15, n_conv: int = 2, channel_init: int = 16, channel_mult: float = 1, kernel_size: int = 5, act_func: str = 'relu', conv_norm: bool = False, pool_func: str | None = None, pool_size: int | None = None, residual: bool = False, crop_len: int = 0, n_gru: int = 1, dropout: float = 0.0, gru_norm: bool = False, final_pool_func: str = 'avg', dtype=None, device=None)[source]#
Bases:
BaseModel
A model consisting of a convolutional tower followed by a bidirectional GRU layer and optional pooling.
- Parameters:
n_tasks – Number of channels in the output
stem_channels – Number of channels in the stem
stem_kernel_size – Kernel width for the stem
n_conv – Number of convolutional blocks, not including the stem
kernel_size – Convolutional kernel width
channel_init – Initial number of channels
channel_mult – Factor by which to multiply the number of channels in each block
act_func – Name of the activation function
pool_func – Name of the pooling function
pool_size – Width of the pooling layers
conv_norm – If True, apply batch normalization in the convolutional layers.
residual – If True, apply residual connections in the convolutional layers.
crop_len – Number of positions to crop at either end of the output
n_gru – Number of GRU layers
dropout – Dropout for GRU and feed-forward layers
gru_norm – If True, include layer normalization in the feed-forward network.
final_pool_func – Name of the pooling function to apply to the final output. If None, no pooling will be applied at the end.
dtype – Data type for the layers.
device – Device for the layers.
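Example usage, a minimal sketch with illustrative hyperparameters:

```python
import torch
from grelu.model.models import ConvGRUModel

# Conv tower for local patterns, then two bidirectional GRU layers for context.
model = ConvGRUModel(
    n_tasks=3,
    n_conv=3,
    pool_func="max",
    pool_size=2,
    n_gru=2,
    dropout=0.1,
)
y = model(torch.zeros(2, 4, 512))
print(y.shape)  # (2, 3, output_length)
```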
- class grelu.model.models.ConvTransformerModel(n_tasks: int, stem_channels: int = 16, stem_kernel_size: int = 15, n_conv: int = 2, channel_init: int = 16, channel_mult: float = 1, kernel_size: int = 5, act_func: str = 'relu', norm: bool = False, pool_func: str | None = None, pool_size: int | None = None, residual: bool = False, crop_len: int = 0, n_transformers=1, key_len: int = 8, value_len: int = 8, n_heads: int = 1, n_pos_features: int = 4, pos_dropout: float = 0.0, attn_dropout: float = 0.0, ff_dropout: float = 0.0, final_pool_func: str = 'avg', dtype=None, device=None)[source]#
Bases:
BaseModel
A model consisting of a convolutional tower followed by a transformer encoder layer and optional pooling.
- Parameters:
n_tasks – Number of channels in the output
stem_channels – Number of channels in the stem
stem_kernel_size – Kernel width for the stem
n_conv – Number of convolutional blocks, not including the stem
kernel_size – Convolutional kernel width
channel_init – Initial number of channels
channel_mult – Factor by which to multiply the number of channels in each block
act_func – Name of the activation function
pool_func – Name of the pooling function
pool_size – Width of the pooling layers
norm – If True, apply batch normalization in the convolutional layers.
residual – If True, apply residual connections in the convolutional layers.
crop_len – Number of positions to crop at either end of the output
n_transformers – Number of transformer encoder layers
n_heads – Number of heads in each multi-head attention layer
n_pos_features – Number of positional embedding features
key_len – Length of the key vectors
value_len – Length of the value vectors.
pos_dropout – Dropout probability in the positional embeddings
attn_dropout – Dropout probability in the attention layer
ff_dropout – Dropout probability in the linear feed-forward layers
final_pool_func – Name of the pooling function to apply to the final output. If None, no pooling will be applied at the end.
dtype – Data type for the layers.
device – Device for the layers.
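Example usage, a minimal sketch; the attention dimensions below are illustrative and small:

```python
import torch
from grelu.model.models import ConvTransformerModel

# Conv tower followed by two transformer encoder layers with four heads each.
model = ConvTransformerModel(
    n_tasks=2,
    stem_channels=32,
    channel_init=32,
    n_transformers=2,
    n_heads=4,
    key_len=8,
    value_len=8,
)
y = model(torch.zeros(2, 4, 256))
print(y.shape)  # (2, 2, output_length)
```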
- class grelu.model.models.ConvMLPModel(seq_len: int, n_tasks: int, stem_channels: int = 16, stem_kernel_size: int = 15, n_conv: int = 2, channel_init: int = 16, channel_mult: float = 1, kernel_size: int = 5, act_func: str = 'relu', conv_norm: bool = False, pool_func: str | None = None, pool_size: int | None = None, residual: bool = True, mlp_norm: bool = False, mlp_act_func: str | None = 'relu', mlp_hidden_size: List[int] = [8], dropout: float = 0.0, dtype=None, device=None)[source]#
Bases:
BaseModel
A convolutional tower followed by a multilayer perceptron (MLP).
- Parameters:
n_tasks – Number of channels in the output
seq_len – Input length
stem_channels – Number of channels in the stem
stem_kernel_size – Kernel width for the stem
n_conv – Number of convolutional blocks, not including the stem
kernel_size – Convolutional kernel width
channel_init – Initial number of channels
channel_mult – Factor by which to multiply the number of channels in each block
act_func – Name of the activation function
pool_func – Name of the pooling function
pool_size – Width of the pooling layers
conv_norm – If True, apply batch norm in the convolutional layers
residual – If True, apply residual connection
mlp_norm – If True, apply layer norm in the MLP layers
mlp_act_func – Name of the activation function in the MLP layers
mlp_hidden_size – A list containing the dimensions for each hidden layer of the MLP.
dropout – Dropout probability for the MLP layers.
dtype – Data type for the layers.
device – Device for the layers.
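Example usage, a minimal sketch. Because the MLP consumes a flattened representation, the input length is fixed at construction time via seq_len:

```python
import torch
from grelu.model.models import ConvMLPModel

model = ConvMLPModel(seq_len=200, n_tasks=1, mlp_hidden_size=[64, 8])
y = model(torch.zeros(16, 4, 200))  # inputs must be exactly seq_len long
print(y.shape)                      # one prediction per task per sequence
```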
- class grelu.model.models.BorzoiModel(n_tasks: int, stem_channels: int = 512, stem_kernel_size: int = 15, init_channels: int = 608, channels: int = 1536, n_conv: int = 7, kernel_size: int = 5, n_transformers: int = 8, key_len: int = 64, value_len: int = 192, pos_dropout: float = 0.0, attn_dropout: float = 0.0, n_heads: int = 8, n_pos_features: int = 32, crop_len: int = 16, final_act_func: str | None = None, final_pool_func: str | None = 'avg', flash_attn=False, dtype=None, device=None)[source]#
Bases:
BaseModel
Model consisting of Borzoi conv and transformer layers followed by U-net upsampling and optional pooling.
- Parameters:
n_tasks – Number of channels in the output
stem_channels – Number of channels in the first (stem) convolutional layer
stem_kernel_size – Width of the convolutional kernel in the first (stem) convolutional layer
init_channels – Number of channels in the first convolutional block after the stem
channels – Number of channels in the output of the convolutional tower
kernel_size – Width of the convolutional kernel
n_conv – Number of convolutional/pooling blocks
n_transformers – Number of stacked transformer blocks
n_pos_features – Number of features in the positional embeddings
n_heads – Number of attention heads
key_len – Length of the key vectors
value_len – Length of the value vectors.
pos_dropout – Dropout probability in the positional embeddings
attn_dropout – Dropout probability in the attention layer
crop_len – Number of positions to crop at either end of the output
final_act_func – Name of the activation function to use in the final layer
final_pool_func – Name of the pooling function to apply to the final output. If None, no pooling will be applied at the end.
flash_attn – If True, uses Flash Attention with rotary position embeddings (RoPE). key_len, value_len, pos_dropout and n_pos_features are ignored.
dtype – Data type for the layers.
device – Device for the layers.
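Example usage, a construction-only sketch; the defaults mirror the published Borzoi architecture, which is large and is paired with very long inputs (hundreds of kilobases), so no forward pass is shown here:

```python
from grelu.model.models import BorzoiModel

# Borzoi-style model with random (untrained) weights; n_tasks sets the number
# of output tracks.
model = BorzoiModel(n_tasks=32)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
```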
- class grelu.model.models.BorzoiPretrainedModel(n_tasks: int, fold: int = 0, n_transformers: int = 8, crop_len=0, final_pool_func='avg', dtype=None, device=None)[source]#
Bases:
BaseModel
Borzoi model with published weights (ported from Keras).
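A minimal sketch of loading the published weights; it is assumed that the checkpoint is fetched on first use, that fold selects one of the published training replicates, and that n_tasks must match the published head (7,611 human tracks per the Borzoi publication; verify against the gReLU documentation):

```python
from grelu.model.models import BorzoiPretrainedModel

model = BorzoiPretrainedModel(n_tasks=7611, fold=0)  # n_tasks assumed to match
model.eval()                                         # the published human head
```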
- class grelu.model.models.ExplaiNNModel(n_tasks: int, in_len: int, channels=300, kernel_size=19, dtype=None, device=None)[source]#
Bases:
torch.nn.Module
The ExplaiNN model architecture.
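Example usage, a minimal sketch; like ConvMLPModel, the input length is fixed at construction time via in_len:

```python
import torch
from grelu.model.models import ExplaiNNModel

# 300 convolutional units of width 19 are the defaults.
model = ExplaiNNModel(n_tasks=2, in_len=200)
y = model(torch.zeros(8, 4, 200))
print(y.shape)
```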
- class grelu.model.models.EnformerModel(n_tasks: int, n_conv: int = 7, channels: int = 1536, n_transformers: int = 11, n_heads: int = 8, key_len: int = 64, attn_dropout: float = 0.05, pos_dropout: float = 0.01, ff_dropout: float = 0.4, crop_len: int = 0, final_act_func: str | None = None, final_pool_func: str | None = 'avg', dtype=None, device=None)[source]#
Bases:
BaseModel
Enformer model architecture.
- Parameters:
n_tasks – Number of tasks for the model to predict
n_conv – Number of convolutional/pooling blocks
channels – Number of output channels for the convolutional tower
n_transformers – Number of stacked transformer blocks
n_heads – Number of attention heads
key_len – Length of the key vectors
pos_dropout – Dropout probability in the positional embeddings
attn_dropout – Dropout probability in the attention layer
ff_dropout – Dropout probability in the linear feed-forward layers
crop_len – Number of positions to crop at either end of the output
final_act_func – Name of the activation function to use in the final layer
final_pool_func – Name of the pooling function to apply to the final output. If None, no pooling will be applied at the end.
dtype – Data type for the layers.
device – Device for the layers.
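Example usage, a minimal sketch; these reduced values are illustrative and do not correspond to any trained checkpoint (the published Enformer uses channels=1536 and 11 transformer blocks):

```python
import torch
from grelu.model.models import EnformerModel

model = EnformerModel(n_tasks=10, channels=768, n_transformers=2)
y = model(torch.zeros(1, 4, 2048))
print(y.shape)  # (1, 10, output_length); the conv tower pools the length axis,
                # and final_pool_func="avg" (the default) pools it further
```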