grelu.sequence.metrics#

grelu.sequence.metrics contains functions to calculate metrics based on the content of a sequence.

Functions#

`gc`(→ Union[float, List[float]])	Calculate the GC fraction of the given DNA sequence(s).
`gc_distribution`(→ numpy.ndarray)	Calculate the histogram of GC content in a set of DNA sequences.

Module Contents#

grelu.sequence.metrics.gc(seqs, input_type, genome) → Union[float, List[float]]

Calculate the GC fraction of the given DNA sequence(s).

Parameters:

seqs – The DNA sequences whose GC content is to be calculated. These can be in any accepted format (intervals, strings, integer-encoded or one-hot encoded).
input_type – The format of the input sequences. Accepted values are “intervals”, “strings”, “indices” or “one_hot”. If not provided, it will be deduced from the data.
genome – Name of the genome to use if genomic intervals are provided.

Returns:

The fraction of each sequence made up of G and C bases. If a single sequence is provided, the output is a single float; if multiple sequences are provided, the output is a list of values, one for each sequence.
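The calculation described above can be sketched in plain Python for string inputs. This is an illustrative stand-in, not gReLU's implementation: the name `gc_fraction` is hypothetical, only string input is handled, and counting every position (including any ambiguous bases) in the denominator is an assumption.

```python
from typing import List, Union


def gc_fraction(seqs: Union[str, List[str]]) -> Union[float, List[float]]:
    """Hypothetical helper: fraction of G and C bases in one or more DNA strings.

    Assumption: the denominator is the full sequence length.
    """
    if isinstance(seqs, str):
        if not seqs:
            raise ValueError("Cannot compute GC fraction of an empty sequence")
        s = seqs.upper()
        return (s.count("G") + s.count("C")) / len(s)
    # A list of sequences yields one value per sequence
    return [gc_fraction(s) for s in seqs]


single = gc_fraction("ACGT")                          # one sequence -> one float
multiple = gc_fraction(["GGCC", "AATT", "ACGTAC"])    # list -> list of floats
print(single, multiple)
```

With gReLU installed, the corresponding call would be `grelu.sequence.metrics.gc(seqs)`, which additionally accepts intervals, integer-encoded and one-hot encoded inputs as described in the parameters above.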

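grelu.sequence.metrics.gc_distribution(seqs, binwidth, normalize, input_type, genome) → numpy.ndarray

Calculate the histogram of GC content in a set of DNA sequences.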

Parameters:

seqs – DNA sequences, as intervals, strings, indices or one-hot.
binwidth – Width of the bins to use when calculating the histogram. Default is 0.1.
normalize – Whether to normalize the histogram so that the values sum to 1.
input_type – The format of the input sequences. Accepted values are “intervals”, “strings”, “indices” or “one_hot”. If not provided, it will be deduced from the data.
genome – Name of the genome to use if genomic intervals are supplied.

Returns:

The histogram of GC content as an array with one entry per bin, so its length equals 1/binwidth (10 bins for the default binwidth of 0.1).
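To make the binning concrete, here is a minimal sketch of a GC-content histogram in plain Python. It is not gReLU's implementation: the name `gc_histogram` is hypothetical, it accepts only non-empty strings and returns a list rather than a `numpy.ndarray`, the `normalize=False` default is chosen for the sketch only, and placing a GC fraction of exactly 1.0 in the last bin is an assumption.

```python
from typing import List


def gc_histogram(
    seqs: List[str], binwidth: float = 0.1, normalize: bool = False
) -> List[float]:
    """Hypothetical helper: histogram of GC fractions over equal-width bins on [0, 1]."""
    n_bins = round(1 / binwidth)  # e.g. 10 bins for binwidth=0.1
    counts = [0.0] * n_bins
    for seq in seqs:
        s = seq.upper()
        frac = (s.count("G") + s.count("C")) / len(s)
        # Assumption: a fraction of exactly 1.0 falls into the last bin
        idx = min(int(frac / binwidth), n_bins - 1)
        counts[idx] += 1
    if normalize:
        # Rescale so that the bin values sum to 1
        total = sum(counts)
        counts = [c / total for c in counts]
    return counts


seqs = ["AATT", "ACGT", "GGCC", "ACGA"]  # GC fractions: 0.0, 0.5, 1.0, 0.5
raw = gc_histogram(seqs)
norm = gc_histogram(seqs, normalize=True)
print(raw)
print(norm)
```

The example sequences are chosen so that no GC fraction sits on an inexact floating-point bin edge; values such as 0.3 can land in either neighboring bin depending on how the edges are computed.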
