grelu.sequence.mutate

Functions to mutate or alter DNA sequences in various ways.

Functions

mutate(→ Union[str, numpy.ndarray])

Introduce a mutation (substitution) in one or more bases of the sequence.

insert(→ Union[str, numpy.ndarray])

Introduce an insertion in the sequence.

delete(→ Union[str, numpy.ndarray])

Introduce a deletion in the sequence.

random_mutate(→ Union[str, numpy.ndarray])

Introduce a random single-base substitution into a DNA sequence.

seq_differences(→ List[int])

List all the positions at which two sequences of equal length differ.

Module Contents

grelu.sequence.mutate.mutate(seq: str | numpy.ndarray, allele: str | int, pos: int | None = None, input_type: str | None = None) → str | numpy.ndarray

Introduce a mutation (substitution) in one or more bases of the sequence.

Parameters:
  • seq – A single DNA sequence in string or integer encoded format.

  • allele – The allele to substitute at the given position. The allele should be in the same format as the sequence.

  • pos – The start position at which to substitute the allele into the input sequence. If None, the allele will be centered in the input sequence.

  • input_type – Format of the input sequence. Accepted values are “strings” or “indices”.

Returns:

Mutated sequence in the same format as the input.

Raises:

ValueError – If the input is not a string or integer encoded DNA sequence.
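To make the substitution semantics concrete, here is a minimal, self-contained sketch that mimics the behavior described above without importing gReLU. `mutate_sketch` is a hypothetical helper, not part of the library, and the centering rule used when `pos` is None (`(len(seq) - len(allele)) // 2`) is an assumption; the library may round differently for odd-length cases.

```python
from typing import Optional, Union

import numpy as np

Seq = Union[str, np.ndarray]


def mutate_sketch(seq: Seq, allele: Union[str, int, np.ndarray],
                  pos: Optional[int] = None) -> Seq:
    """Hypothetical re-implementation of a substitution; not gReLU's code."""
    if not isinstance(seq, (str, np.ndarray)):
        raise ValueError("seq must be a string or an integer-encoded array")
    # A single integer-encoded base is treated as a length-1 allele
    if isinstance(seq, np.ndarray) and np.ndim(allele) == 0:
        allele = np.array([allele])
    n = len(allele)
    if pos is None:
        # Assumed centering rule: start so the allele sits in the middle
        pos = (len(seq) - n) // 2
    if pos < 0 or pos + n > len(seq):
        raise ValueError("allele does not fit in the sequence at this position")
    if isinstance(seq, str):
        # Strings are immutable, so rebuild around the substituted span
        return seq[:pos] + allele + seq[pos + n:]
    out = seq.copy()  # leave the caller's array untouched
    out[pos:pos + n] = allele
    return out


print(mutate_sketch("ACGTACGT", "TT", pos=2))  # ACTTACGT
print(mutate_sketch("AAAAA", "G"))  # AAGAA (centered)
print(mutate_sketch(np.array([0, 1, 2, 3]), 3, pos=0))  # [3 1 2 3]
```

Note that a substitution never changes the sequence length, which is why no `keep_len` option is needed here.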

grelu.sequence.mutate.insert(seq: str | numpy.ndarray, insert: str, pos: int | None = None, input_type: str | None = None, keep_len: bool = False, end: str = 'both') → str | numpy.ndarray

Introduce an insertion in the sequence.

Parameters:
  • seq – A single DNA sequence in string or integer encoded format.

  • insert – A sub-sequence to insert into the given sequence. The insert should be in the same format as the sequence.

  • pos – Start position at which to insert the sub-sequence into the input sequence. If None, the insert will be centered in the input sequence.

  • input_type – Format of the input sequence. Accepted values are “strings” or “indices”.

  • keep_len – Whether to trim the sequence back to its original length after insertion.

  • end – Which end of the sequence to trim, if keep_len is True. Accepted values are “left”, “right” and “both”.

Returns:

The insert-containing sequence in the same format as the input.

Raises:

ValueError – If the input is not a string or integer encoded DNA sequence.
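The interaction between `keep_len` and `end` is easiest to see on a toy sequence. The sketch below is a hypothetical stand-alone re-implementation (`insert_sketch` is not a gReLU function). It assumes that a centered insert goes at `len(seq) // 2` and that, with `end="both"`, an odd leftover base is trimmed from the right; both are assumptions rather than documented behavior.

```python
from typing import List, Optional, Union

import numpy as np

Seq = Union[str, np.ndarray]


def _concat(parts: List[Seq], like: Seq) -> Seq:
    """Join string pieces with '' or array pieces with np.concatenate."""
    return "".join(parts) if isinstance(like, str) else np.concatenate(parts)


def insert_sketch(seq: Seq, ins: Seq, pos: Optional[int] = None,
                  keep_len: bool = False, end: str = "both") -> Seq:
    """Hypothetical re-implementation of an insertion; not gReLU's code."""
    if pos is None:
        pos = len(seq) // 2  # assumed: insert at the midpoint
    out = _concat([seq[:pos], ins, seq[pos:]], seq)
    if not keep_len:
        return out
    extra = len(ins)  # number of bases to trim away to restore the length
    if end == "left":
        left = extra
    elif end == "right":
        left = 0
    elif end == "both":
        left = extra // 2  # assumed: the odd leftover base is trimmed on the right
    else:
        raise ValueError("end must be 'left', 'right' or 'both'")
    # Keep a window of the original length, starting after the left trim
    return out[left:left + len(seq)]


print(insert_sketch("AAAA", "GG"))  # AAGGAA
print(insert_sketch("AAAA", "GG", keep_len=True))  # AGGA
print(insert_sketch("AAAA", "GG", keep_len=True, end="left"))  # GGAA
print(insert_sketch("AAAA", "GG", keep_len=True, end="right"))  # AAGG
```

Trimming both ends keeps the insert near the center of the output, which is usually what you want when the model expects a fixed input length.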

grelu.sequence.mutate.delete(seq: str | numpy.ndarray, deletion_len: int = 0, pos: int | None = None, input_type: str | None = None, keep_len=False, end='both') → str | numpy.ndarray

Introduce a deletion in the sequence.

Parameters:
  • seq – A single DNA sequence in string or integer encoded format.

  • deletion_len – Number of bases to delete.

  • pos – Start position of the deletion. If None, the deletion will be centered in the input sequence.

  • input_type – Format of the input sequence. Accepted values are “strings” or “indices”.

  • keep_len – Whether to pad the sequence back to its original length with Ns after the deletion.

  • end – Which end of the sequence to pad, if keep_len is True. Accepted values are “left”, “right” and “both”.

Returns:

The deletion-containing sequence in the same format as the input.

Raises:

ValueError – If the input is not a string or integer encoded DNA sequence.
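Deletion with `keep_len=True` is the mirror image of insertion: instead of trimming, the shortened sequence is padded with Ns. The sketch below is hypothetical (`delete_sketch` is not a gReLU function). It assumes the centering rule `(len(seq) - deletion_len) // 2`, that an odd leftover N goes on the right when `end="both"`, and that N is encoded as index 4 in integer-encoded input; check these against the library before relying on them.

```python
from typing import Optional, Union

import numpy as np

Seq = Union[str, np.ndarray]
N_INDEX = 4  # assumed integer code for "N" in index-encoded sequences


def _pad(k: int, like: Seq) -> Seq:
    """Return k Ns in the same format as `like`."""
    if isinstance(like, str):
        return "N" * k
    return np.full(k, N_INDEX, dtype=like.dtype)


def delete_sketch(seq: Seq, deletion_len: int = 0, pos: Optional[int] = None,
                  keep_len: bool = False, end: str = "both") -> Seq:
    """Hypothetical re-implementation of a deletion; not gReLU's code."""
    if pos is None:
        pos = (len(seq) - deletion_len) // 2  # assumed centering rule
    if isinstance(seq, str):
        out = seq[:pos] + seq[pos + deletion_len:]
    else:
        out = np.concatenate([seq[:pos], seq[pos + deletion_len:]])
    if not keep_len:
        return out
    if end == "left":
        left = deletion_len
    elif end == "right":
        left = 0
    elif end == "both":
        left = deletion_len // 2  # assumed: the odd leftover N goes on the right
    else:
        raise ValueError("end must be 'left', 'right' or 'both'")
    right = deletion_len - left
    if isinstance(seq, str):
        return _pad(left, seq) + out + _pad(right, seq)
    return np.concatenate([_pad(left, seq), out, _pad(right, seq)])


print(delete_sketch("ACGTAC", 2))  # ACAC ("GT" removed from the center)
print(delete_sketch("ACGTAC", 2, keep_len=True))  # NACACN
print(delete_sketch("ACGTAC", 2, keep_len=True, end="left"))  # NNACAC
```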

grelu.sequence.mutate.random_mutate(seq: str | numpy.ndarray, rng: numpy.random.RandomState | None = None, pos: int | None = None, drop_ref: bool = True, input_type: str | None = None, protect: List[int] = []) → str | numpy.ndarray

Introduce a random single-base substitution into a DNA sequence.

Parameters:
  • seq – A single DNA sequence in string or integer encoded format.

  • rng – A np.random.RandomState object, for reproducibility.

  • pos – Position at which to introduce a random mutation. If None, a random position will be chosen.

  • drop_ref – If True, the reference base will be dropped from the list of possible bases at the mutated position. If False, there is a possibility that the original sequence will be returned.

  • input_type – Format of the input sequence. Accepted values are “strings” or “indices”.

  • protect – A list of positions to protect from mutation. Only used if pos is None.

Returns:

A mutated sequence in the same format as the input sequence.

Raises:

ValueError – If the input is not a string or integer encoded DNA sequence.
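The roles of `rng`, `drop_ref` and `protect` can be illustrated with a short sketch. `random_mutate_sketch` is a hypothetical helper that handles string input only and is not gReLU's implementation. It draws the position uniformly from the unprotected positions and the new base uniformly from the allowed alphabet; the real function's sampling scheme may differ.

```python
from typing import Iterable, Optional

import numpy as np

BASES = ["A", "C", "G", "T"]


def random_mutate_sketch(seq: str, rng: Optional[np.random.RandomState] = None,
                         pos: Optional[int] = None, drop_ref: bool = True,
                         protect: Iterable[int] = ()) -> str:
    """Hypothetical re-implementation for string input only; not gReLU's code."""
    rng = rng if rng is not None else np.random.RandomState()
    if pos is None:
        # Protected positions are only consulted when the position is random
        protected = set(protect)
        candidates = [i for i in range(len(seq)) if i not in protected]
        pos = int(rng.choice(candidates))
    # With drop_ref=True the reference base cannot be re-drawn,
    # so the output is guaranteed to differ from the input
    choices = [b for b in BASES if not (drop_ref and b == seq[pos])]
    new_base = str(rng.choice(choices))
    return seq[:pos] + new_base + seq[pos + 1:]


# Positions 0-2 are protected, so only position 3 can change
out = random_mutate_sketch("ACGT", rng=np.random.RandomState(0), protect=[0, 1, 2])
print(out[:3], out[3] != "T")  # ACG True
```

Passing a seeded `RandomState` makes repeated calls return the same mutation, which is the reproducibility the `rng` parameter is meant to provide.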

grelu.sequence.mutate.seq_differences(seq1: str, seq2: str, verbose: bool = True) → List[int]

List all the positions at which two sequences of equal length differ.

Parameters:
  • seq1 – The first DNA sequence as a string.

  • seq2 – The second DNA sequence as a string.

  • verbose – If True, print out the base at each differing position along with the five bases before and after it.

Returns:

A list of positions where the two sequences differ.

Raises:

AssertionError – If the two input sequences have different lengths.
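Comparing two equal-length strings reduces to a single pass with `zip`. The sketch below is a hypothetical stand-in (`seq_differences_sketch` is not the library function); the exact wording of the verbose printout is an assumption, and only the five-bases-of-context window follows the description above.

```python
from typing import List


def seq_differences_sketch(seq1: str, seq2: str, verbose: bool = True) -> List[int]:
    """Hypothetical re-implementation; the print format is an assumption."""
    assert len(seq1) == len(seq2), "sequences must have the same length"
    diffs = [i for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]
    if verbose:
        for i in diffs:
            # Show the differing base with up to 5 bases of context on each side
            lo, hi = max(0, i - 5), i + 6
            print(f"position {i}: {seq1[lo:hi]} vs {seq2[lo:hi]}")
    return diffs


print(seq_differences_sketch("ACGTACGT", "ACGAACGA", verbose=False))  # [3, 7]
```

This pairs naturally with the mutation functions: for example, the positions changed by a random substitution can be recovered by comparing the output against the original sequence.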