decima.vep package
Module contents
- decima.vep.predict_variant_effect(df_variant, output_pq=None, tasks=None, model=0, metadata_anndata=None, chunksize=10000, batch_size=8, num_workers=16, device=None, include_cols=None, gene_col=None, distance_type='tss', min_distance=0, max_distance=inf, genome='hg38')
Predict variant effects and save the predictions to a parquet file.
- Parameters:
df_variant (pd.DataFrame) – DataFrame with variant information
output_pq (str, optional) – Path to save the parquet file. Defaults to None.
tasks (str, optional) – Tasks to predict. Defaults to None.
model (int, optional) – Model to use. Defaults to 0.
metadata_anndata (str, optional) – Path to anndata file. Defaults to None.
chunksize (int, optional) – Number of variants to predict in each chunk. Defaults to 10_000.
batch_size (int, optional) – Batch size. Defaults to 8.
num_workers (int, optional) – Number of workers. Defaults to 16.
device (str, optional) – Device to use. Defaults to None.
include_cols (list, optional) – Columns to include in the output. Defaults to None.
gene_col (str, optional) – Column name for gene names. Defaults to None.
distance_type (str, optional) – Type of distance. Defaults to “tss”.
min_distance (float, optional) – Minimum distance from the end of the gene. Defaults to 0 (inclusive).
max_distance (float, optional) – Maximum distance from the TSS. Defaults to inf (exclusive).
genome (str, optional) – Genome build. Defaults to “hg38”.
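The distance bounds above can be read as a half-open window: `min_distance` is inclusive and `max_distance` is exclusive. The sketch below illustrates that reading and shows what a call with the documented keyword arguments might look like. It is not taken from the decima source code. The helper `in_distance_window`, the wrapper `run_vep_sketch`, the output file name, and the 100,000 bp cutoff are all hypothetical and chosen only for illustration. The columns that `df_variant` must contain are not listed in this section, so the caller is assumed to supply a suitable DataFrame.

```python
import math


def in_distance_window(distance, min_distance=0, max_distance=math.inf):
    """Return True if min_distance <= distance < max_distance.

    Assumption: this mirrors the documented bounds (minimum inclusive,
    maximum exclusive). It is an illustration, not decima's own code.
    """
    return min_distance <= distance < max_distance


def run_vep_sketch(df_variant, output_pq="vep_predictions.parquet"):
    """Hypothetical wrapper that calls predict_variant_effect.

    The keyword names and defaults come from the signature documented above.
    The output file name and the max_distance value are illustrative only.
    """
    # Imported lazily so in_distance_window works without decima installed.
    from decima.vep import predict_variant_effect

    return predict_variant_effect(
        df_variant,
        output_pq=output_pq,
        model=0,
        batch_size=8,
        num_workers=16,
        distance_type="tss",
        min_distance=0,        # inclusive lower bound
        max_distance=100_000,  # exclusive upper bound (illustrative value)
        genome="hg38",
    )
```

Under this reading, a variant at distance 0 passes the default filter, while a variant at exactly `max_distance` does not.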
- Return type: