grelu.interpret.simulate

Functions

`marginalize_patterns`(→ Union[numpy.ndarray, ...)	Runs a marginalization experiment.
`marginalize_pattern_spacing`(→ Union[numpy.ndarray, ...)	Runs a marginalization experiment to predict the impact of the spacing between two patterns.
`shuffle_tiles`(→ Union[pandas.DataFrame, ...)	Performs regulatory element discovery by shuffling tiles along the input sequences.

Module Contents

grelu.interpret.simulate.marginalize_patterns

Runs a marginalization experiment.

Given a model, one or more patterns (short sequences) to insert, and a set of background sequences, returns the model's predictions before and after inserting the patterns into the dinucleotide-shuffled background sequences.

Parameters:

model – trained model of class grelu.lightning.LightningModel
patterns – a sequence or list of sequences to insert
seqs – background sequences
genome – Name of the genome to use if genomic intervals are supplied
devices – Index of device on which to run inference
num_workers – Number of workers for inference
batch_size – Batch size for inference
seed – Random seed
prediction_transform – A module to transform the model output
rc – If True, augment by reverse complementation
compare_func – Function to compare the predictions with and without the pattern. Options are “divide” or “subtract”. If not provided, the predictions before and after pattern insertion will be returned.

Returns:

preds_before – The predictions from the background sequences.
preds_after – The predictions after inserting the pattern into the background sequences.
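
The marginalization idea described above can be illustrated without a trained model. The following is a minimal sketch, not gReLU's implementation: `toy_model`, `shuffle_seq`, `insert_center` and `toy_marginalize` are hypothetical names, a plain (mononucleotide) shuffle stands in for the dinucleotide shuffle, the pattern is assumed to overwrite the central bases so that sequence length is preserved, and the comparison is assumed to be "after" relative to "before".

```python
import random


def shuffle_seq(seq, rng):
    # Plain shuffle of the bases: a simplified stand-in for the
    # dinucleotide shuffle described in the documentation.
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def insert_center(seq, pattern):
    # Assumption: the pattern overwrites the central bases, so the
    # sequence length does not change.
    start = (len(seq) - len(pattern)) // 2
    return seq[:start] + pattern + seq[start + len(pattern):]


def toy_model(seq):
    # Hypothetical scorer standing in for a trained model: GC fraction.
    return (seq.count("G") + seq.count("C")) / len(seq)


def toy_marginalize(seqs, pattern, seed=0, compare=None):
    rng = random.Random(seed)
    backgrounds = [shuffle_seq(s, rng) for s in seqs]
    before = [toy_model(b) for b in backgrounds]
    after = [toy_model(insert_center(b, pattern)) for b in backgrounds]
    if compare == "subtract":
        return [a - b for a, b in zip(after, before)]
    if compare == "divide":
        return [a / b for a, b in zip(after, before)]
    # No comparison requested: return both sets of predictions.
    return before, after


effects = toy_marginalize(["AAAATTTTAAAA"], "GGCC", compare="subtract")
```

With an A/T-only background, the toy score rises from 0 to 4/12 once the four G/C bases are inserted, so the "subtract" comparison isolates the effect of the pattern from the effect of the background.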

grelu.interpret.simulate.marginalize_pattern_spacing(model: Callable, seqs: str | Sequence | pandas.DataFrame | numpy.ndarray, fixed_pattern: str, moving_pattern: str, genome: str | None = None, stride: int = 1, n_shuffles: int = 1, rc: bool = False, seed: int = 0, devices: str | int | List[int] = 'cpu', num_workers: int = 1, batch_size: int = 64, prediction_transform: Callable | None = None, compare_func: str | Callable | None = None) → numpy.ndarray | Tuple[numpy.ndarray, numpy.ndarray]

Runs a marginalization experiment to predict the impact of the spacing between two patterns (sub-sequences). Given a model and a set of background sequences, dinucleotide-shuffles the sequences, inserts the fixed pattern into the center of each shuffled sequence, then gets the predictions from the model on inserting the moving pattern at different distances from the fixed pattern.

Parameters:

model – trained model of class grelu.lightning.LightningModel
seqs – DNA sequences as intervals, strings, integer encoded or one-hot encoded.
fixed_pattern – A subsequence to insert in the center of each background sequence.
moving_pattern – A subsequence to insert into the background sequences at different distances from fixed_pattern.
stride – Number of bases by which to shift the moving pattern.
genome – The name of the genome from which to read sequences. This is only needed if genomic intervals are supplied in seqs.
n_shuffles – Number of times to shuffle each sequence in seqs, to generate a background distribution.
rc – If True, augment by reverse complementation
seed – Seed for random number generator
devices – Index of device on which to run inference
num_workers – Number of workers for inference
batch_size – Batch size for inference
prediction_transform – A module to transform the model output
compare_func – Function to compare the predictions with and without the moving pattern. Options are “divide” or “subtract”. If not provided, the predictions without the moving pattern will be returned separately.

Returns:

preds_before – The predictions from the background sequences.
preds_after – The predictions after inserting the pattern into the background sequences.
distances – A list containing the distance of the moving pattern from the fixed pattern. Distances are the number of bases between the end of one motif and the start of the other. Negative values indicate that the moving pattern is to the left of the fixed pattern.
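
The distance convention described above can be made concrete with a small sketch. This is an illustration under stated assumptions, not gReLU's code: `spacing_positions` is a hypothetical helper, the fixed pattern is assumed to sit at the center of the sequence, and placements of the moving pattern that would overlap the fixed pattern are assumed to be skipped.

```python
def spacing_positions(seq_len, fixed_len, moving_len, stride=1):
    # Assumed placement of the fixed pattern: centered in the sequence.
    fixed_start = (seq_len - fixed_len) // 2
    fixed_end = fixed_start + fixed_len  # exclusive end

    placements = []
    for start in range(0, seq_len - moving_len + 1, stride):
        end = start + moving_len  # exclusive end
        if end <= fixed_start:
            # Moving pattern lies to the left: bases between its end and
            # the start of the fixed pattern, reported as a negative value.
            placements.append((start, -(fixed_start - end)))
        elif start >= fixed_end:
            # Moving pattern lies to the right: bases between the end of
            # the fixed pattern and its start.
            placements.append((start, start - fixed_end))
        # Placements overlapping the fixed pattern are skipped (assumption).
    return placements


distances = [d for _, d in spacing_positions(20, 4, 3, stride=2)]
print(distances)  # [-5, -3, -1, 0, 2, 4]
```

A distance of 0 means the two patterns are directly adjacent. Under this convention an adjacent placement on either side yields 0, so the sign alone does not distinguish left from right in that case.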

grelu.interpret.simulate.shuffle_tiles

Performs regulatory element discovery by shuffling tiles along the input sequences.

Parameters:

model – trained model of class grelu.lightning.LightningModel
seqs – DNA sequences as intervals, strings, integer encoded or one-hot encoded.
tile_len – Length of tile to shuffle.
stride – Distance between the start positions of successive tiles.
protect_center – Length of central region to protect
n_shuffles – Number of times to shuffle each tile.
seed – Seed for random number generator
genome – The name of the genome from which to read sequences. This is only needed if genomic intervals are supplied in seqs.
devices – Index of device on which to run inference
num_workers – Number of workers for inference
batch_size – Batch size for inference
prediction_transform – A module to transform the model output
compare_func – Function to compare the predictions after and before shuffling each tile. Options are “divide” or “subtract”. If not provided, the predictions before and after shuffling will be returned separately.

Returns:

before_preds – Model predictions on the original sequences.
after_preds – Model predictions on the sequences with shuffled tiles.
tiles – Dataframe containing the coordinates of the tiles that were shuffled.
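
How tile_len, stride and protect_center might interact can be sketched as follows. This is a hedged illustration, not gReLU's implementation: `tile_coordinates` and `shuffle_tile` are hypothetical helpers, tiles that overlap the protected central region are assumed to be skipped entirely, and a plain shuffle of the bases within the tile is used.

```python
import random


def tile_coordinates(seq_len, tile_len, stride, protect_center=0):
    # Assumed location of the protected region: centered in the sequence.
    center_start = (seq_len - protect_center) // 2
    center_end = center_start + protect_center

    tiles = []
    for start in range(0, seq_len - tile_len + 1, stride):
        end = start + tile_len  # exclusive end
        overlaps = protect_center > 0 and start < center_end and end > center_start
        if not overlaps:
            tiles.append((start, end))
    return tiles


def shuffle_tile(seq, start, end, rng):
    # Shuffle only the bases inside [start, end); the rest is untouched.
    tile = list(seq[start:end])
    rng.shuffle(tile)
    return seq[:start] + "".join(tile) + seq[end:]


seq = "AAAACGTTAAAA"
all_tiles = tile_coordinates(len(seq), tile_len=4, stride=4)
protected = tile_coordinates(len(seq), tile_len=4, stride=4, protect_center=2)
shuffled = shuffle_tile(seq, 4, 8, random.Random(0))
```

Comparing model predictions on `seq` and on each shuffled variant then indicates which tiles the prediction depends on, which is what the compare_func options summarize.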
