decima.utils package

Submodules

decima.utils.dataframe module

class decima.utils.dataframe.ChunkDataFrameWriter(output_path)

Bases: object

__enter__()
__exit__(exc_type, exc_val, exc_tb)
__init__(output_path)

Initialize the chunked Parquet writer.

Parameters:

output_path (str) – Path to the output parquet file

write(chunk)

Write a DataFrame chunk to the Parquet file.

Parameters:

chunk (pd.DataFrame) – DataFrame chunk to write

Return type:

None
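The writer is meant to be used as a context manager: enter it, call write() once per chunk, and let __exit__ close the output. The sketch below shows that lifecycle with the standard library only. It is a hypothetical analogue, not decima's implementation: `CsvChunkWriter` writes CSV instead of Parquet, each chunk is a list of dict rows rather than a pd.DataFrame, and opening the file lazily on the first write (so the header comes from the first chunk) is an assumption about how a chunked writer typically fixes its schema.

```python
import csv
import os
import tempfile
from typing import List, Optional


class CsvChunkWriter:
    """Hypothetical stdlib analogue of a chunked writer (not decima's class).

    Each chunk is a list of dict rows. The file is opened lazily on the first
    non-empty write, so the column header is taken from the first chunk.
    """

    def __init__(self, output_path: str):
        self.output_path = output_path
        self._fh = None
        self._writer: Optional[csv.DictWriter] = None

    def __enter__(self) -> "CsvChunkWriter":
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        # Always release the file handle; returning None lets exceptions propagate.
        if self._fh is not None:
            self._fh.close()
            self._fh = None

    def write(self, chunk: List[dict]) -> None:
        if not chunk:
            return  # nothing to append
        if self._writer is None:
            self._fh = open(self.output_path, "w", newline="")
            self._writer = csv.DictWriter(self._fh, fieldnames=list(chunk[0]))
            self._writer.writeheader()
        # DictWriter raises ValueError if a later chunk has unexpected columns.
        self._writer.writerows(chunk)


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "chunks.csv")
    with CsvChunkWriter(out) as writer:
        writer.write([{"gene": "A", "score": 0.1}])
        writer.write([{"gene": "B", "score": 0.7}])
    with open(out) as fh:
        print(fh.read())
```

Keeping the open file inside the context manager means a single output file is appended to chunk by chunk, so the full table never has to be held in memory.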

decima.utils.dataframe.chunk_df(df, chunksize)

Split a DataFrame into consecutive chunks of chunksize rows.

Parameters:
  • df (pd.DataFrame) – Input dataframe

  • chunksize (int) – Size of each chunk

Returns:

Generator of dataframe chunks

Return type:

Generator[pd.DataFrame, None, None]
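A generator like this can be written by slicing row positions in steps of chunksize. The sketch below is a hypothetical stand-in named `chunk_rows`, not decima's code; it assumes the last chunk may be shorter than chunksize and uses `.iloc` positional slicing when given a pandas DataFrame, falling back to plain slicing for lists.

```python
from typing import Any, Iterator


def chunk_rows(data: Any, chunksize: int) -> Iterator[Any]:
    """Yield consecutive slices of at most `chunksize` rows.

    Hypothetical sketch: uses `.iloc` for pandas DataFrames and ordinary
    slicing for lists, tuples and strings. The final chunk may be shorter.
    """
    if chunksize <= 0:
        raise ValueError("chunksize must be a positive integer")
    indexer = getattr(data, "iloc", data)  # positional indexer for DataFrames
    for start in range(0, len(data), chunksize):
        yield indexer[start:start + chunksize]


if __name__ == "__main__":
    for chunk in chunk_rows(list(range(7)), 3):
        print(chunk)
```

Because it is a generator, each chunk is produced only when the caller asks for it, which pairs naturally with a writer that consumes chunks one at a time.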

decima.utils.dataframe.write_df_chunks_to_parquet(chunks, output_path)

Write DataFrame chunks to a single Parquet file.

Parameters:
  • chunks (Iterator[pd.DataFrame]) – Iterator of dataframe chunks

  • output_path (str) – Path to the output parquet file

Return type:

None

decima.utils.inject module

class decima.utils.inject.SeqBuilder(chrom, start, end, anchor, track=None)

Bases: object

Build the sequence of the region chrom:start-end with variants injected.

Parameters:
  • chrom (str) – chromosome

  • start (int) – start position

  • end (int) – end position

  • anchor (int) – anchor position

  • track (List[int]) – positions to track for shifts caused by indels.

__init__(chrom, start, end, anchor, track=None)
concat()

Concatenate the sequence pieces into the final string.

Returns:

the final sequence.

Return type:

str

inject(variant)

Inject the variant into the sequence.

Parameters:

variant (Dict) – variant to inject, in the format {"chrom": str, "pos": int, "ref": str, "alt": str}

Returns:

self
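The inject/concat pattern (record variants, return self so calls chain, then assemble the string once) can be illustrated on a plain string. The sketch below is hypothetical and much simpler than SeqBuilder: `ToySeqBuilder` takes the reference bases directly instead of fetching chrom:start-end, treats "pos" as a 0-based genomic coordinate, ignores "chrom", and assumes variants do not overlap. The real class's coordinate convention and indel tracking are not shown here.

```python
from typing import Dict, List


class ToySeqBuilder:
    """Hypothetical, simplified stand-in for SeqBuilder (not decima's code).

    Holds reference bases for a window starting at genomic coordinate `start`
    and applies non-overlapping variants with 0-based `pos` (assumption).
    """

    def __init__(self, ref_seq: str, start: int):
        self.ref_seq = ref_seq
        self.start = start
        self.variants: List[Dict] = []

    def inject(self, variant: Dict) -> "ToySeqBuilder":
        offset = variant["pos"] - self.start
        ref = variant["ref"]
        if offset < 0 or offset + len(ref) > len(self.ref_seq):
            raise ValueError(f"variant at {variant['pos']} lies outside the window")
        if self.ref_seq[offset:offset + len(ref)] != ref:
            raise ValueError(f"ref allele mismatch at {variant['pos']}")
        self.variants.append(variant)
        return self  # returning self allows chained inject(...) calls

    def concat(self) -> str:
        # Walk the variants left to right, copying reference bases between them.
        pieces, cursor = [], 0
        for v in sorted(self.variants, key=lambda v: v["pos"]):
            offset = v["pos"] - self.start
            pieces.append(self.ref_seq[cursor:offset])
            pieces.append(v["alt"])
            cursor = offset + len(v["ref"])
        pieces.append(self.ref_seq[cursor:])
        return "".join(pieces)


if __name__ == "__main__":
    seq = (
        ToySeqBuilder("ACGTACGTAC", start=100)
        .inject({"chrom": "chr1", "pos": 102, "ref": "G", "alt": "T"})
        .inject({"chrom": "chr1", "pos": 105, "ref": "CG", "alt": "C"})
        .concat()
    )
    print(seq)
```

Note that the returned string is one base shorter than the window because of the deletion; keeping the output length fixed is a separate step.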

decima.utils.inject.prepare_seq_alt_allele(gene, variants)

Prepare the sequence and alt allele for a gene.

Example

   --------------{---------}--------: ref
   *------x------{---------}--------: alt
   new sequence fetched from upstream due to deletion.

   --------------{---------}--------: ref
   --------------{----++---}----++--: alt
   4 bp cropped from downstream due to insertion.

   ^anchor

Parameters:
  • gene (GeneMetadata) – gene metadata.

  • variants (List[Dict]) – variants to inject, in the format [{"chrom": str, "pos": int, "ref": str, "alt": str}, …].

Returns:

the sequence (str) and gene mask start and end positions (int, int)

Return type:

tuple
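The diagram can be read as keeping the anchor at a fixed offset in a fixed-length window: a deletion upstream of the anchor pulls extra bases in from further upstream, and an insertion downstream of the anchor pushes bases out past the downstream edge. The sketch below implements that reading on a toy in-memory genome string. It is an interpretation, not decima's implementation: `alt_window`, the 0-based coordinates, the half-open window [start, end), and the requirement that variants do not overlap each other or the anchor are all assumptions, and "chrom" is ignored. It also does not compute the gene mask positions that prepare_seq_alt_allele returns.

```python
from typing import Dict, List


def alt_window(genome: str, start: int, end: int, anchor: int,
               variants: List[Dict]) -> str:
    """Return an alt-allele window of length end - start with `anchor` fixed.

    Hypothetical sketch: 0-based coordinates, non-overlapping variants.
    """
    alt = genome
    new_anchor = anchor
    # Apply right to left so the 0-based offsets of earlier variants stay valid.
    for v in sorted(variants, key=lambda v: v["pos"], reverse=True):
        pos, ref = v["pos"], v["ref"]
        if alt[pos:pos + len(ref)] != ref:
            raise ValueError(f"ref allele mismatch at {pos}")
        alt = alt[:pos] + v["alt"] + alt[pos + len(ref):]
        if pos + len(ref) <= anchor:
            # Variant lies fully upstream: the anchor base moves by the length change.
            new_anchor += len(v["alt"]) - len(ref)
        elif pos < anchor:
            raise ValueError("variant overlaps the anchor; not handled in this sketch")
    # Re-cut the window around the shifted anchor, preserving both flank lengths.
    left = new_anchor - (anchor - start)
    right = new_anchor + (end - anchor)
    if left < 0 or right > len(alt):
        raise ValueError("window runs off the toy genome")
    return alt[left:right]


if __name__ == "__main__":
    genome = "AAAACCCCGGGGTTTT"
    print(alt_window(genome, 4, 12, 8, []))                                # reference window
    print(alt_window(genome, 4, 12, 8, [{"pos": 5, "ref": "CC", "alt": "C"}]))   # deletion
    print(alt_window(genome, 4, 12, 8, [{"pos": 9, "ref": "G", "alt": "GTT"}]))  # insertion
```

With the deletion, one extra "A" is fetched from upstream; with the insertion, the last two "G" bases are cropped from downstream, mirroring the two cases in the diagram.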

decima.utils.io module

decima.utils.io.read_fasta_gene_mask(fasta_file)
Return type:

DataFrame

decima.utils.io.read_vcf_chunks(vcf_file, chunksize)
Return type:

Iterator[DataFrame]
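Reading a VCF in chunks keeps memory bounded for large variant files. The sketch below shows the idea with the standard library: it skips "#" header lines, takes the first five VCF columns (CHROM, POS, ID, REF, ALT), and yields lists of dicts in the variant format used by SeqBuilder.inject. It is hypothetical: `read_vcf_rows_in_chunks` yields lists of dicts rather than DataFrames, reads from an open text stream instead of a path, keeps POS exactly as written in the file (VCF positions are 1-based), and leaves multi-allelic ALT values such as "A,T" unsplit.

```python
import io
from typing import Dict, Iterator, List, TextIO


def read_vcf_rows_in_chunks(handle: TextIO, chunksize: int) -> Iterator[List[Dict]]:
    """Yield lists of at most `chunksize` variant dicts from a VCF text stream."""
    if chunksize <= 0:
        raise ValueError("chunksize must be a positive integer")
    chunk: List[Dict] = []
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        chunk.append({"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt})
        if len(chunk) == chunksize:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly shorter chunk


if __name__ == "__main__":
    vcf_text = (
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\n"
        "chr1\t100\t.\tA\tG\n"
        "chr1\t200\t.\tCT\tC\n"
    )
    for chunk in read_vcf_rows_in_chunks(io.StringIO(vcf_text), 1):
        print(chunk)
```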

decima.utils.variant module

decima.utils.variant.process_variants(variants, ad=None, min_from_end=0)

Module contents

decima.utils.get_compute_device(device=None)

Get the best available device for computation.

Parameters:

device (Optional[str]) – Optional device specification. If None, the best available device is selected automatically.

Returns:

The selected device for computation

Return type:

torch.device
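A common way to pick "the best available device" is to honor an explicit request first, then fall back through accelerators to the CPU. The sketch below separates that decision into a pure function so it can be tested without a GPU. It is hypothetical: `pick_device_name` and `get_torch_device` are illustrative names, and the CUDA, then MPS, then CPU priority is an assumption, since decima's actual order is not documented here.

```python
from typing import Optional


def pick_device_name(requested: Optional[str], cuda_available: bool,
                     mps_available: bool = False) -> str:
    """Prefer an explicit request, then CUDA, then Apple MPS, then CPU (assumed order)."""
    if requested is not None:
        return requested
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def get_torch_device(requested: Optional[str] = None):
    """Hypothetical wrapper that queries PyTorch for accelerator availability."""
    import torch  # imported lazily so pick_device_name works without torch installed

    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch releases
    mps_ok = mps is not None and mps.is_available()
    return torch.device(pick_device_name(requested, torch.cuda.is_available(), mps_ok))
```

Keeping the selection logic free of torch calls makes the fallback order easy to unit-test; only the thin wrapper touches the hardware.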