decima.vep package

Module contents

decima.vep.predict_variant_effect(df_variant, output_pq=None, tasks=None, model=0, metadata_anndata=None, chunksize=10000, batch_size=8, num_workers=16, device=None, include_cols=None, gene_col=None, distance_type='tss', min_distance=0, max_distance=inf, genome='hg38')

Predict variant effects and save the results to a parquet file.

Parameters:
  • df_variant (pd.DataFrame) – DataFrame with variant information

  • output_pq (str, optional) – Path to save the parquet file. Defaults to None.

  • tasks (str, optional) – Tasks to predict. Defaults to None.

  • model (int, optional) – Model to use. Defaults to 0.

  • metadata_anndata (str, optional) – Path to anndata file. Defaults to None.

  • chunksize (int, optional) – Number of variants to predict in each chunk. Defaults to 10_000.

  • batch_size (int, optional) – Batch size. Defaults to 8.

  • num_workers (int, optional) – Number of workers. Defaults to 16.

  • device (str, optional) – Device to use. Defaults to None.

  • include_cols (list, optional) – Columns to include in the output. Defaults to None.

  • gene_col (str, optional) – Column name for gene names. Defaults to None.

  • distance_type (str, optional) – Type of distance. Defaults to “tss”.

  • min_distance (float, optional) – Minimum distance from the end of the gene (inclusive). Defaults to 0.

  • max_distance (float, optional) – Maximum distance from the TSS (exclusive). Defaults to inf.

  • genome (str, optional) – Genome build. Defaults to “hg38”.
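
The min_distance and max_distance bounds describe a half-open window: the lower bound is inclusive and the upper bound is exclusive. The sketch below only illustrates that interval logic. The helper within_distance is hypothetical and is not part of decima, and it assumes the distance has already been computed as a non-negative number.

```python
import math


def within_distance(distance, min_distance=0, max_distance=math.inf):
    """Hypothetical helper: True if min_distance <= distance < max_distance."""
    return min_distance <= distance < max_distance


# Illustrative distances only, not real data.
distances = [0, 500, 100_000, 250_000]

# A distance equal to max_distance is dropped because the upper bound is exclusive.
kept = [d for d in distances if within_distance(d, 0, 100_000)]
```

With the default max_distance of inf, no finite distance is excluded by the upper bound.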

Return type:

None
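
A minimal usage sketch, under stated assumptions. The column names of df_variant (chrom, pos, ref, alt), the variant rows, and the output file name are assumptions made for illustration; this page only says that df_variant holds variant information. The keyword arguments are taken from the signature above, and run_prediction is a hypothetical wrapper, not part of decima.

```python
import pandas as pd

# Hypothetical variant table; the column names are an assumption.
df_variant = pd.DataFrame(
    {
        "chrom": ["chr1", "chr1"],
        "pos": [1000018, 1002308],
        "ref": ["G", "T"],
        "alt": ["A", "C"],
    }
)

# Keyword arguments named as in the documented signature.
predict_kwargs = dict(
    output_pq="variant_effects.parquet",  # assumed output file name
    chunksize=10_000,
    batch_size=8,
    num_workers=4,
    max_distance=100_000,
    genome="hg38",
)


def run_prediction(df, **kwargs):
    """Hypothetical wrapper that imports decima only when it is called."""
    from decima.vep import predict_variant_effect

    # The documented return type is None; results are written to output_pq.
    predict_variant_effect(df, **kwargs)
```

Calling run_prediction(df_variant, **predict_kwargs) requires decima to be installed. Since the return type is None, the predictions would then be read back from the parquet file, for example with pd.read_parquet.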