grelu.io.genome

grelu.io.genome contains a class and functions for loading genomes and related annotation files. gReLU relies on genomepy for many of these utilities. See https://vanheeringen-lab.github.io/genomepy/ for details.

Classes

CustomGenome

A custom genome object that can be used to load a genome from a file.

Functions

read_sizes(→ pandas.DataFrame)

Read the chromosome sizes file for a genome and return a dataframe of chromosome names and sizes.

get_genome(→ Union[CustomGenome, genomepy.Genome])

Install a genome from genomepy and load it as a Genome object.

read_gtf(→ pandas.DataFrame)

Install a genome annotation from genomepy and load it as a dataframe.

Module Contents

class grelu.io.genome.CustomGenome(genome: str)

A custom genome object that can be used to load a genome from a file.

Parameters:

genome – Path to the genome file.

genome
_genome
_sizes_file
get_seq(chrom: str, start: int, end: int, rc: bool = False) → str

Get the sequence for a given chromosome and interval.

property sizes_file: str
grelu.io.genome.read_sizes(genome: str = 'hg38') → pandas.DataFrame
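
The interval lookup performed by get_seq can be illustrated with a minimal, self-contained sketch. This is not gReLU's implementation: the toy genome dictionary and the get_seq_sketch function are hypothetical, and the sketch assumes 0-based, half-open coordinates and that rc=True means "return the reverse complement". The page above does not state either convention, so check the library before relying on them.

```python
# Hypothetical stand-in for a genome: chromosome name -> sequence string.
TOY_GENOME = {"chr1": "ACGTTGCAAC"}

# Complement table for DNA bases, including N and lowercase bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def get_seq_sketch(chrom: str, start: int, end: int, rc: bool = False) -> str:
    """Return the bases in [start, end) of `chrom`.

    Assumptions (not confirmed by the documentation above):
    coordinates are 0-based and half-open, and `rc` requests the
    reverse complement of the extracted interval.
    """
    chrom_seq = TOY_GENOME[chrom]
    if not 0 <= start <= end <= len(chrom_seq):
        raise ValueError(f"Interval {start}-{end} is outside {chrom}")
    seq = chrom_seq[start:end]
    if rc:
        # Complement each base, then reverse the string.
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq


forward = get_seq_sketch("chr1", 0, 3)
reverse = get_seq_sketch("chr1", 0, 3, rc=True)
```

Under these assumptions, forward holds the first three bases of the toy chromosome and reverse holds their reverse complement.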

Read the chromosome sizes file for a genome and return a dataframe of chromosome names and sizes.

Parameters:

genome – Either a genome name to load from genomepy, or the path to a chromosome sizes file.

Returns:

A dataframe containing columns “chrom” (chromosome names) and “size” (chromosome size).
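
For readers unfamiliar with chromosome sizes files, the sketch below parses one by hand. It assumes the common two-column, tab-separated layout (chromosome name, then length in bases) and uses only the standard library, so it returns a list of dictionaries rather than the pandas dataframe that read_sizes returns. The function name read_sizes_sketch and the file contents are made up for illustration.

```python
import csv
import os
import tempfile


def read_sizes_sketch(path: str) -> list:
    """Parse a two-column, tab-separated chromosome sizes file.

    Returns one dict per chromosome with the keys "chrom" and "size",
    mirroring the column names described for read_sizes.
    """
    rows = []
    with open(path, newline="") as handle:
        for record in csv.reader(handle, delimiter="\t"):
            if not record:
                continue  # skip blank lines
            rows.append({"chrom": record[0], "size": int(record[1])})
    return rows


# Write a tiny, made-up sizes file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".sizes", delete=False) as tmp:
    tmp.write("chr1\t1000\nchr2\t500\n")
    sizes_path = tmp.name

sizes = read_sizes_sketch(sizes_path)
os.remove(sizes_path)
```

With gReLU installed, the equivalent call would presumably be read_sizes with either a genome name or the path to such a file, as described in the parameter list above.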

grelu.io.genome.get_genome(genome: str, **kwargs) → CustomGenome | genomepy.Genome

Install a genome from genomepy and load it as a Genome object.

Parameters:
  • genome – Name of the genome to load from genomepy

  • **kwargs – Additional arguments to pass to genomepy.install_genome

Returns:

Genome object

grelu.io.genome.read_gtf(genome: str, features: str | List[str] | None = None) → pandas.DataFrame

Install a genome annotation from genomepy and load it as a dataframe. UCSC tools may need to be installed for this to work. See vanheeringen-lab/genomepy for details.

Parameters:
  • genome – Name of the genome to load from genomepy

  • features – A list of specific features to return, such as “exon”, “CDS” or “transcript”

Returns:

GTF annotations
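
The features argument accepts a single string, a list of strings, or None. The sketch below shows one way such a filter could behave on raw GTF lines, assuming that None means "keep every feature" and that the feature type is the third tab-separated GTF column. It is an illustration only: filter_features and the toy records are hypothetical, and read_gtf itself returns a pandas dataframe rather than lists of fields.

```python
from typing import Iterable, List, Union

# Three made-up GTF records (9 tab-separated columns each).
TOY_GTF = (
    'chr1\ttoy\tgene\t1\t100\t.\t+\t.\tgene_id "g1";\n'
    'chr1\ttoy\ttranscript\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
    'chr1\ttoy\texon\t1\t50\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
)


def filter_features(
    lines: Iterable[str],
    features: Union[str, List[str], None] = None,
) -> List[List[str]]:
    """Split GTF lines into fields and keep only the requested feature types."""
    if isinstance(features, str):
        features = [features]  # normalise a single feature name to a list
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        # Column 3 (index 2) holds the feature type, e.g. "exon".
        if features is None or fields[2] in features:
            records.append(fields)
    return records


all_rows = filter_features(TOY_GTF.splitlines())
exons = filter_features(TOY_GTF.splitlines(), "exon")
subset = filter_features(TOY_GTF.splitlines(), ["exon", "transcript"])
```

Normalising the string case up front keeps the membership test identical for all three input forms.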