grelu.sequence.mutate
Functions to mutate or alter DNA sequences in various ways.
Functions
| mutate | Introduce a mutation (substitution) in one or more bases of the sequence. |
| insert | Introduce an insertion in the sequence. |
| delete | Introduce a deletion in the sequence. |
| random_mutate | Introduce a random single-base substitution into a DNA sequence. |
| seq_differences | List all the positions at which two sequences of equal length differ. |
Module Contents
- grelu.sequence.mutate.mutate(seq: str | numpy.ndarray, allele: str | int, pos: int | None = None, input_type: str | None = None) → str | numpy.ndarray
Introduce a mutation (substitution) in one or more bases of the sequence.
- Parameters:
seq – A single DNA sequence in string or integer encoded format.
allele – The allele to substitute at the given position. The allele should be in the same format as the sequence.
pos – The start position at which to substitute the allele in the input sequence. If None, the allele will be centered in the input sequence.
input_type – Format of the input sequence. Accepted values are “strings” or “indices”.
- Returns:
Mutated sequence in the same format as the input.
- Raises:
ValueError – if the input is not a string or integer encoded DNA sequence.
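The substitution described above can be illustrated with a minimal pure-Python sketch. This is not gReLU's implementation: `substitute` is a hypothetical helper that handles only string input and requires an explicit `pos`, because the exact rule used to center the allele when `pos` is None is not stated here.

```python
def substitute(seq: str, allele: str, pos: int) -> str:
    """Overwrite len(allele) bases of seq, starting at pos, with allele.

    Hypothetical stand-in for the documented behaviour; string input only.
    """
    if not isinstance(seq, str) or not isinstance(allele, str):
        raise ValueError("this sketch only handles string sequences")
    # Keep everything before pos, write the allele, then resume after it,
    # so the output has the same length as the input.
    return seq[:pos] + allele + seq[pos + len(allele):]


mutated = substitute("AAAAAA", "CG", pos=2)
print(mutated)  # AACGAA

# The corresponding library call, per the signature above, would look like:
# from grelu.sequence.mutate import mutate
# mutate("AAAAAA", allele="CG", pos=2)
```

The sequence length is unchanged, which is what distinguishes a substitution from an insertion.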
- grelu.sequence.mutate.insert(seq: str | numpy.ndarray, insert: str, pos: int | None = None, input_type: str | None = None, keep_len: bool = False, end: str = 'both') → str | numpy.ndarray
Introduce an insertion in the sequence.
- Parameters:
seq – A single DNA sequence in string or integer encoded format.
insert – A sub-sequence to insert into the given sequence. The insert should be in the same format as the sequence.
pos – Start position at which to insert the sub-sequence into the input sequence. If None, the insert will be centered in the input sequence.
input_type – Format of the input sequence. Accepted values are “strings” or “indices”.
keep_len – Whether to trim the sequence back to its original length after insertion.
end – Which end of the sequence to trim, if keep_len is True. Accepted values are “left”, “right” and “both”.
- Returns:
The insert-containing sequence in the same format as the input.
- Raises:
ValueError – if the input is not a string or integer encoded DNA sequence.
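The interaction between `keep_len` and `end` is easiest to see on a small example. The sketch below is a hypothetical pure-Python helper (`insert_sketch`), not gReLU's code, and works on strings only. How the trimmed bases are split between the two ends when `end` is "both" is an assumption: here the excess is halved, and any odd base is removed from the right.

```python
def insert_sketch(seq: str, insert: str, pos: int,
                  keep_len: bool = False, end: str = "both") -> str:
    """Insert a sub-sequence at pos, optionally trimming back to len(seq)."""
    out = seq[:pos] + insert + seq[pos:]
    if not keep_len:
        return out
    extra = len(out) - len(seq)  # number of bases to trim away
    if end == "left":
        return out[extra:]
    if end == "right":
        return out[:len(seq)]
    if end == "both":
        left = extra // 2  # assumption: odd remainder is trimmed from the right
        return out[left:left + len(seq)]
    raise ValueError("end must be 'left', 'right' or 'both'")


seq = "AAAATTTT"
print(insert_sketch(seq, "GG", pos=4))                               # AAAAGGTTTT
print(insert_sketch(seq, "GG", pos=4, keep_len=True, end="left"))   # AAGGTTTT
print(insert_sketch(seq, "GG", pos=4, keep_len=True, end="right"))  # AAAAGGTT
print(insert_sketch(seq, "GG", pos=4, keep_len=True, end="both"))   # AAAGGTTT
```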
- grelu.sequence.mutate.delete(seq: str | numpy.ndarray, deletion_len: int = 0, pos: int | None = None, input_type: str | None = None, keep_len=False, end='both') → str | numpy.ndarray
Introduce a deletion in the sequence.
- Parameters:
seq – A single DNA sequence in string or integer encoded format.
deletion_len – Number of bases to delete.
pos – Start position of the deletion. If None, the deletion will be centered in the input sequence.
input_type – Format of the input sequence. Accepted values are “strings” or “indices”.
keep_len – Whether to pad the sequence back to its original length with Ns after the deletion.
end – Which end of the sequence to pad, if keep_len is True. Accepted values are “left”, “right” and “both”.
- Returns:
The deletion-containing sequence in the same format as the input.
- Raises:
ValueError – if the input is not a string or integer encoded DNA sequence.
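Deletion mirrors insertion, except that `keep_len` pads with Ns instead of trimming. The following is a hypothetical pure-Python sketch (`delete_sketch`) for string input, not the library's implementation. As before, the split of padding for `end="both"` is an assumption: half on the left, and the remainder on the right.

```python
def delete_sketch(seq: str, deletion_len: int, pos: int,
                  keep_len: bool = False, end: str = "both") -> str:
    """Delete deletion_len bases starting at pos, optionally padding with Ns."""
    out = seq[:pos] + seq[pos + deletion_len:]
    if not keep_len:
        return out
    missing = len(seq) - len(out)  # number of Ns needed to restore the length
    if end == "left":
        return "N" * missing + out
    if end == "right":
        return out + "N" * missing
    if end == "both":
        left = missing // 2  # assumption: odd remainder is padded on the right
        return "N" * left + out + "N" * (missing - left)
    raise ValueError("end must be 'left', 'right' or 'both'")


seq = "ACGTACGT"
print(delete_sketch(seq, 2, pos=3))                               # ACGCGT
print(delete_sketch(seq, 2, pos=3, keep_len=True, end="left"))   # NNACGCGT
print(delete_sketch(seq, 2, pos=3, keep_len=True, end="right"))  # ACGCGTNN
print(delete_sketch(seq, 2, pos=3, keep_len=True, end="both"))   # NACGCGTN
```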
- grelu.sequence.mutate.random_mutate(seq: str | numpy.ndarray, rng: numpy.random.RandomState | None = None, pos: int | None = None, drop_ref: bool = True, input_type: str | None = None, protect: List[int] = []) → str | numpy.ndarray
Introduce a random single-base substitution into a DNA sequence.
- Parameters:
seq – A single DNA sequence in string or integer encoded format.
rng – np.random.RandomState object for reproducibility.
pos – Position at which to insert a random mutation. If None, a random position will be chosen.
drop_ref – If True, the reference base will be dropped from the list of possible bases at the mutated position. If False, there is a possibility that the original sequence will be returned.
input_type – Format of the input sequence. Accepted values are “strings” or “indices”.
protect – A list of positions to protect from mutation. Only needed if pos is None.
- Returns:
A mutated sequence in the same format as the input sequence.
- Raises:
ValueError – if the input is not a string or integer encoded DNA sequence.
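The roles of `drop_ref` and `protect` can be sketched in a few lines. The helper below (`random_substitution`) is hypothetical and is not gReLU's implementation: it handles strings only, assumes the alphabet is A, C, G and T, and uses the standard library's `random.Random` as a stand-in for `np.random.RandomState` so that it runs without NumPy.

```python
import random
from typing import List, Optional

BASES = "ACGT"  # assumption: standard four-base DNA alphabet


def random_substitution(seq: str, rng: Optional[random.Random] = None,
                        pos: Optional[int] = None, drop_ref: bool = True,
                        protect: Optional[List[int]] = None) -> str:
    """Substitute one base at pos, or at a random unprotected position."""
    rng = rng or random.Random()
    protected = set(protect or [])
    if pos is None:
        # Protected positions are excluded only when the position is sampled.
        candidates = [i for i in range(len(seq)) if i not in protected]
        pos = rng.choice(candidates)
    # With drop_ref=True the current base is not a candidate, so the
    # output always differs from the input.
    choices = [b for b in BASES if not (drop_ref and b == seq[pos])]
    return seq[:pos] + rng.choice(choices) + seq[pos + 1:]


seq = "ACGTACGT"
out = random_substitution(seq, rng=random.Random(0), protect=[0, 1, 2])
print(out)
```

Passing a seeded generator makes the result repeatable. With `drop_ref=False`, the sampled base may equal the original one, in which case the sequence comes back unchanged.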
- grelu.sequence.mutate.seq_differences(seq1: str, seq2: str, verbose: bool = True) → List[int]
List all the positions at which two sequences of equal length differ.
- Parameters:
seq1 – The first DNA sequence as a string.
seq2 – The second DNA sequence as a string.
verbose – If True, print out the base at each differing position along with the five bases before and after it.
- Returns:
A list of positions where the two sequences differ.
- Raises:
AssertionError – If the two input sequences have different lengths.
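A position-by-position comparison of this kind can be sketched as follows. `differing_positions` is a hypothetical helper, not the library function: positions are zero-based, and the printed layout for `verbose=True` is an illustrative assumption rather than gReLU's actual output format.

```python
from typing import List


def differing_positions(seq1: str, seq2: str, verbose: bool = True) -> List[int]:
    """Return the zero-based positions at which two equal-length strings differ."""
    assert len(seq1) == len(seq2), "sequences must have equal length"
    positions = [i for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]
    if verbose:
        for i in positions:
            start = max(i - 5, 0)  # avoid negative slice indices near the start
            # Hypothetical layout: up to five flanking bases on each side.
            print(f"{i}: {seq1[start:i]}[{seq1[i]}>{seq2[i]}]{seq1[i + 1:i + 6]}")
    return positions


print(differing_positions("ACGTACGT", "ACGAACGA", verbose=False))  # [3, 7]
```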