polygraph package¶

Submodules¶

polygraph.classifier module¶

polygraph.classifier.groupwise_svm(ad, reference_group, group_col='Group', cv=5, is_kernel=True, max_iter=1000, use_pca=False)[source]¶

Train an SVM to distinguish between each non-reference group and the reference group

Parameters:

ad (anndata.AnnData) – Anndata object containing sequence embeddings of shape (n_seqs x n_vars)
reference_group (str) – ID of group to use as reference
group_col (str) – Name of column in .obs containing group ID
cv (int) – Number of cross-validation folds
is_kernel (bool) – Whether ad.X is a symmetric kernel matrix
max_iter (int) – Maximum number of iterations for SVM
use_pca (bool) – Whether to use PCA distances

Returns:

Modified anndata object containing each sequence’s predicted label in .obs, as well as SVM performance metrics in ad.uns[“svm_performance”]

Return type:

ad (anndata.AnnData)

polygraph.embedding module¶

polygraph.evolve module¶

polygraph.evolve.evolve(start_seq, reference_seqs, iter, model, k=None, drop_last_layers=None, batch_size=512, device='cpu', task=None, alpha=3)[source]¶

Directed evolution with an additional goal to increase similarity to reference sequences.

Parameters:

start_seq (str) – Start sequence
reference_seqs (list) – Reference sequences
iter (int) – Number of iterations
model (nn.Sequential) – Torch sequential model
k (int) – k-mer length for k-mer embedding.
drop_last_layers (int) – Number of terminal layers to drop from the model for model embedding.
batch_size (int) – Batch size for inference
device (int, str) – Index of device to use for inference
task (int) – Model output head. If None, average all heads.
alpha (int) – Relative weight for similarity

Returns:

Optimized sequence

Return type:

best_seq (str)
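The idea of combining a model score with a similarity term can be sketched with a toy greedy search. This is only an illustration under stated assumptions: the objective `score + alpha * similarity`, the greedy single-base search, and the names `evolve_toy`, `score_fn` and `similarity_fn` are hypothetical and are not taken from polygraph's implementation.

```python
def evolve_toy(start_seq, n_iter, score_fn, similarity_fn, alpha=3):
    """Greedy single-base search on score + alpha * similarity.

    The objective and search strategy are assumptions made for this sketch.
    """
    best = start_seq

    def objective(seq):
        return score_fn(seq) + alpha * similarity_fn(seq)

    for _ in range(n_iter):
        # Candidates: the current sequence plus every single-base mutant.
        candidates = [best] + [
            best[:i] + base + best[i + 1:]
            for i in range(len(best))
            for base in "ACGT"
            if base != best[i]
        ]
        best = max(candidates, key=objective)
    return best


reference = "GGGG"
result = evolve_toy(
    "AAAA",
    n_iter=4,
    score_fn=lambda s: s.count("G"),  # toy stand-in for a model prediction
    similarity_fn=lambda s: sum(a == b for a, b in zip(s, reference)) / len(reference),
)
```

In this toy setting each iteration converts one more position to G, so four iterations reach the reference sequence.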

polygraph.input module¶

polygraph.input.download_gtex_tpm(download_dir='/home/runner/work/polygraph/polygraph/src/polygraph/resources/gtex')[source]¶

Download per-tissue TPM values from GTEX.

Parameters: download_dir (str) – Path to directory in which to download file
Returns: Path to downloaded local file
Return type: (str)

polygraph.input.download_jaspar(family='vertebrates', download_dir='/home/runner/work/polygraph/polygraph/src/polygraph/resources/jaspar')[source]¶

Download and read the JASPAR database of TF motifs

Parameters:

family (str) – JASPAR family. one of “fungi”, “insects”, “nematodes”, “plants”, “urochordates”, “vertebrates”
download_dir (str) – Path to directory in which to download motifs

Returns:

Path to downloaded local file

Return type:

(str)

polygraph.input.load_gtex_tpm(download_dir='/home/runner/work/polygraph/polygraph/src/polygraph/resources/gtex')[source]¶

Load per-tissue TPM values from GTEX.

Parameters: download_dir (str) – Path to directory in which to download file
Returns: TPM matrix.
Return type: (pd.DataFrame)

polygraph.input.read_meme_file(file)[source]¶

Read a motif database in MEME format

Parameters: file (str) – path to MEME file

Returns:

motifs (list) – List of pymemesuite.common.Motif objects
bg (pymemesuite.common.Background) – Background distribution

polygraph.input.read_seqs(file, sep='\t', incl_ids=False)[source]¶

Read sequences and group labels into a dataframe. This creates the input dataframe for all subsequent analyses.

Parameters:

file (str) – path to a text file containing no header. If incl_ids=True, the first column should contain IDs and the next two columns should contain sequence and group label. If incl_ids=False, the first two columns should contain sequence and group label.
sep (str) – Column separator
incl_ids (bool) – Whether the first column corresponds to sequence IDs.

Returns:

Pandas dataframe with columns Sequence and Group, and a unique index.

Return type:

df (pd.DataFrame)
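The expected file layout can be illustrated without polygraph itself. The sketch below uses only the standard library to parse headerless, tab-separated text into records with Sequence and Group fields. The helper name `parse_seq_table` and the `seq_<n>` fallback index are hypothetical and chosen for illustration only.

```python
import csv
import io


def parse_seq_table(text, sep="\t", incl_ids=False):
    """Parse headerless rows into {id: {"Sequence": ..., "Group": ...}}.

    Hypothetical helper mirroring the documented layout; not polygraph code.
    """
    rows = {}
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    for n, fields in enumerate(reader):
        if not fields:
            continue  # skip blank lines
        if incl_ids:
            # Layout: ID, sequence, group label
            seq_id, seq, group = fields[:3]
        else:
            # Layout: sequence, group label; the index here is an assumption
            seq_id = f"seq_{n}"
            seq, group = fields[:2]
        rows[seq_id] = {"Sequence": seq, "Group": group}
    return rows


example = "ACGT\tnative\nGGCC\tdesigned\n"
table = parse_seq_table(example)
```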

polygraph.likelihood module¶

class polygraph.likelihood.CharDataset(seqs)[source]¶

Bases: Dataset

encode(seq)[source]¶

polygraph.likelihood.compute_likelihood(seqs, model, batch_size=32, num_workers=1, device='cpu')[source]¶

Function to compute the log-likelihood of each sequence in the given list using the hyenaDNA model pretrained on the human genome.

Parameters:

seqs (str, list, pd.DataFrame) – DNA sequence, list of DNA sequences or a dataframe containing sequences in the column “Sequence”.
model (ConvLMHeadModel) – HyenaDNA model
batch_size (int) – Batch size for inference
num_workers (int) – Number of workers for inference dataloader
device (int, str) – Device ID for inference

Returns:

Log-likelihoods for each sequence

Return type:

LL (list)

polygraph.likelihood.load_hyenadna(hyena_path, ckpt_dir='.', model='hyenadna-small-32k-seqlen')[source]¶

Loads the pretrained hyenaDNA foundation model.

Parameters:

hyena_path (str) – Path to the cloned hyenaDNA repo. The repo must be cloned with the recurse-submodules flag. See installation instructions at https://github.com/HazyResearch/hyena-dna/tree/main.
ckpt_dir (str) – Path to directory in which to download the model
model (str) – Name of the foundation model to download. See https://github.com/HazyResearch/hyena-dna/tree/main for options.

Returns:

Pretrained HyenaDNA model

Return type:

model (ConvLMHeadModel)

polygraph.models module¶

polygraph.models.batch(sequences, batch_size)[source]¶

Pad sequences to a constant length and split them into batches to pass to a model

Parameters:

sequences (list) – List of DNA sequences
batch_size (int) – Batch size

Returns:

sequence batch generator
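A padded batch generator of this kind can be sketched in a few lines. The function name `batch_sequences`, the pad character N and right-side padding are assumptions for illustration; polygraph may pad differently.

```python
def batch_sequences(sequences, batch_size, pad_char="N"):
    """Yield lists of equal-length sequences, batch_size at a time.

    Hypothetical stand-in: right-pads every sequence to the longest length.
    """
    max_len = max(len(s) for s in sequences)
    padded = [s.ljust(max_len, pad_char) for s in sequences]
    for start in range(0, len(padded), batch_size):
        yield padded[start:start + batch_size]


batches = list(batch_sequences(["ACGT", "AC", "GGGTT"], batch_size=2))
```

Because the function is a generator, batches are produced lazily, so a model can consume one batch at a time.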

polygraph.models.cell_type_specificity(seqs, on_target_col, off_target_cols)[source]¶

Calculate cell type specificity from predicted or measured output

Parameters:

seqs (pd.DataFrame) – Dataframe containing sequence predictions
on_target_col (str) – Column containing predictions in on-target cell type
off_target_cols (list) – Columns containing predictions in off-target cell types.

Returns:

seqs with additional columns mingap, maxgap and meangap, reporting 3 measures of cell type specificity for each sequence.

Return type:

(pd.DataFrame)
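The documentation does not define the three measures. One plausible reading, shown here purely as an assumption, is that each gap is the on-target value minus the maximum, minimum or mean of the off-target values. The helper `specificity_gaps` is hypothetical.

```python
from statistics import mean


def specificity_gaps(on_target, off_targets):
    """Return (mingap, maxgap, meangap) for one sequence.

    Assumed definitions (not confirmed by the documentation):
      mingap  = on-target value minus the largest off-target value
      maxgap  = on-target value minus the smallest off-target value
      meangap = on-target value minus the mean off-target value
    """
    return (
        on_target - max(off_targets),
        on_target - min(off_targets),
        on_target - mean(off_targets),
    )


mingap, maxgap, meangap = specificity_gaps(5.0, [1.0, 2.0, 3.0])
```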

polygraph.models.enformer_embed(sequences, model, device='cpu')[source]¶

Embed a batch of sequences using pretrained or fine-tuned enformer

Parameters:

sequences (list) – List of sequences
model (Enformer) – pre-trained or fine-tuned enformer model

Returns:

np.array of shape (n_seqs x 3072)

polygraph.models.get_embeddings(seqs, model, batch_size, drop_last_layers=1, device='cpu', swapaxes=False)[source]¶

Get model embeddings for all sequences in a dataframe

Parameters:

seqs (list, pd.DataFrame) – List of sequences or dataframe containing sequences in the column “Sequence”.
model (nn.Sequential) – trained model
batch_size (int) – Batch size for inference
drop_last_layers (int) – Number of terminal layers to drop to get embeddings
device (str, int) – ID of GPU to perform inference.
swapaxes (bool) – If true, batches will be of shape (N, 4, L). Otherwise, shape will be (N, L, 4).

Returns:

np.array of shape (n_seqs x n_features)

polygraph.models.ism_score(model, seqs, batch_size, device='cpu', task=None)[source]¶

Get base-level importance scores for given sequence(s) using ISM

Parameters:

seqs (list, pd.DataFrame) – List of sequences or dataframe containing sequences in the column “Sequence”.
model (nn.Sequential) – trained model
batch_size (int) – Batch size for inference
device (str, int) – ID of GPU to perform inference.

Returns:

DataFrame of shape (n_seqs x n_outputs)

Return type:

(pd.DataFrame)

polygraph.models.load_enformer()[source]¶

Load pre-trained enformer model

Returns: Pretrained model
Return type: (Enformer)

polygraph.models.load_nucleotide_transformer(model='InstaDeepAI/nucleotide-transformer-2.5b-multi-species')[source]¶

Load pre-trained nucleotide transformer model

Parameters: model (str) – Name of pretrained model to download

Returns:

model (EsmForMaskedLM) – Pre-trained model
tokenizer – Class to convert sequences to tokens

polygraph.models.nucleotide_transformer_embed(seqs, model, tokenizer)[source]¶

Embed a batch of sequences using the pre-trained nucleotide transformer model

Parameters:

seqs (list) – List of sequences
model – pre-trained nucleotide transformer model
tokenizer – tokenizer used to convert sequences to tokens for the model

Returns:

np.array of shape (n_seqs x n_features)

polygraph.models.predict(seqs, model, batch_size, device='cpu')[source]¶

Predict sequence properties using a sequence-to-function model.

Parameters:

seqs (list, pd.DataFrame) – List of sequences or dataframe containing sequences in the column “Sequence”.
model (nn.Sequential) – trained model
batch_size (int) – Batch size for inference
device (str, int) – ID of GPU to perform inference.

Returns:

Array of shape (n_seqs x n_outputs)

Return type:

(np.array)

polygraph.models.robustness(model, seqs, batch_size, device='cpu', task=None, aggfunc='mean')[source]¶

Get robustness scores for given sequence(s) using ISM

Parameters:

seqs (list, pd.DataFrame) – List of sequences or dataframe containing sequences in the column “Sequence”.
model (nn.Sequential) – trained model
batch_size (int) – Batch size for inference
device (str, int) – ID of GPU to perform inference.
aggfunc (str) – Either ‘mean’ or ‘max’. Determines how to aggregate the effect of all possible single-base mutations.

Returns:

DataFrame of shape (n_seqs x n_outputs)

Return type:

(pd.DataFrame)

polygraph.models.sequential_embed(seqs, model, drop_last_layers, swapaxes=False, device='cpu')[source]¶

Embed a batch of sequences using a torch.nn.Sequential model

Parameters:

seqs (list) – List of sequences
model (nn.Sequential) – trained model
drop_last_layers (int) – Number of terminal layers to drop to get embeddings

Returns:

np.array of shape (n_seqs x n_features)

polygraph.motifs module¶

polygraph.motifs.get_motif_pairs(sites)[source]¶

List the pairs of motifs present in each sequence.

Parameters: sites (pd.DataFrame) – Pandas dataframe containing FIMO output.
Returns: Dataframe containing all motif pairs in each sequence with their orientation and distance.
Return type: pairs (pd.DataFrame)

polygraph.motifs.motif_frequencies(sites, normalize=False, seqs=None)[source]¶

Count frequency of occurrence of motifs in a list of sequences

Parameters:

sites (list) – Output of scan function
normalize (bool) – Whether to normalize the resulting count matrix to correct for sequence length
seqs (pd.DataFrame) – Pandas dataframe containing DNA sequences. Needed if normalize=True.

Returns:

Count matrix with rows = sequences and columns = motifs

Return type:

cts (pd.DataFrame)
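The counting step can be sketched with plain dictionaries. The example assumes each site is a record with the MotifID and SeqID fields listed for the scan output, and that normalization divides counts by sequence length. The function `count_motifs` and the `seq_lens` argument are hypothetical.

```python
from collections import Counter, defaultdict


def count_motifs(sites, seq_lens=None):
    """Build {SeqID: {MotifID: count}} from site records.

    If seq_lens is given, counts are divided by sequence length
    (an assumed form of length normalization).
    """
    counts = defaultdict(Counter)
    for site in sites:
        counts[site["SeqID"]][site["MotifID"]] += 1
    if seq_lens is None:
        return {seq: dict(c) for seq, c in counts.items()}
    return {
        seq: {motif: n / seq_lens[seq] for motif, n in c.items()}
        for seq, c in counts.items()
    }


sites = [
    {"MotifID": "M1", "SeqID": "s1"},
    {"MotifID": "M1", "SeqID": "s1"},
    {"MotifID": "M2", "SeqID": "s2"},
]
raw = count_motifs(sites)
normalized = count_motifs(sites, seq_lens={"s1": 4, "s2": 2})
```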

polygraph.motifs.motif_pair_differential_abundance(motif_pairs, seqs, reference_group, group_col='Group', max_prop_cutoff=0, min_prop_cutoff=0, ref_prop_cutoff=0)[source]¶

Compare the rate of occurrence of pairwise combinations of motifs between groups

Parameters:

motif_pairs (pd.DataFrame) – Pandas dataframe containing the output of get_motif_pairs.
seqs (pd.DataFrame) – Pandas dataframe containing sequences
reference_group (str) – ID of group to use as reference
group_col (str) – Name of column in seqs containing group IDs
max_prop_cutoff (int) – Limit to combinations with this proportion in at least one group.
min_prop_cutoff (float) – Limit to combinations with this proportion in all groups.

Returns:

Pandas dataframe containing FDR-corrected significance testing results for the occurrence of pairwise combinations between groups

Return type:

res (pd.DataFrame)

polygraph.motifs.motif_pair_differential_distance(motif_pairs, seqs, reference_group, group_col='Group', max_prop_cutoff=0, min_prop_cutoff=0, ref_prop_cutoff=0)[source]¶

Compare the distance between all motif pairs across groups.

Parameters:

motif_pairs (pd.DataFrame) – Pandas dataframe containing the output of get_motif_pairs.
seqs (pd.DataFrame) – Pandas dataframe containing sequences
reference_group (str) – ID of group to use as reference
group_col (str) – Name of column in seqs containing group IDs
max_prop_cutoff (int) – Limit to combinations with this proportion in at least one group.
min_prop_cutoff (float) – Limit to combinations with this proportion in all groups.

Returns:

Pandas dataframe containing FDR-corrected significance testing results for the distance between paired motifs, between groups

Return type:

res (pd.DataFrame)

polygraph.motifs.motif_pair_differential_orientation(motif_pairs, seqs, reference_group, group_col='Group', max_prop_cutoff=0, min_prop_cutoff=0, ref_prop_cutoff=0)[source]¶

Compare the mutual orientation of all motif pairs between groups.

Parameters:

motif_pairs (pd.DataFrame) – Pandas dataframe containing the output of get_motif_pairs.
seqs (pd.DataFrame) – Pandas dataframe containing sequences
reference_group (str) – ID of group to use as reference
group_col (str) – Name of column in seqs containing group IDs
max_prop_cutoff (int) – Limit to combinations with this proportion in at least one group.
min_prop_cutoff (float) – Limit to combinations with this proportion in all groups.

Returns:

Pandas dataframe containing FDR-corrected significance testing results for the mutual orientation of pairwise combinations between groups

Return type:

res (pd.DataFrame)

polygraph.motifs.nmf(counts, seqs, reference_group, group_col='Group', n_components=10)[source]¶

Perform NMF on motif count matrix

Parameters:

counts (pd.DataFrame) – motif count matrix where rows are sequences and columns are motifs.
seqs (pd.DataFrame) – pandas dataframe containing DNA sequences.
reference_group (str) – ID for the group to use as reference
group_col (str) – Name of the column in seqs containing group IDs
n_components (int) – Number of components or factors to extract using NMF

Returns:

W (pd.DataFrame) – Pandas dataframe of size sequences x factors, containing the contribution of each factor to each sequence.
H (pd.DataFrame) – Pandas dataframe of size factors x motifs, containing the contribution of each motif to each factor.
res (pd.DataFrame) – Pandas dataframe containing the FDR-corrected significance testing results for factor contribution between groups.

polygraph.motifs.scan(seqs, meme_file, group_col='Group', pthresh=0.001, rc=True)[source]¶

Scan a DNA sequence using motifs from a MEME file.

Parameters:

seqs (pd.DataFrame) – Dataframe containing DNA sequences
meme_file (str) – Path to MEME file
group_col (str) – Column containing group IDs
pthresh (float) – p-value cutoff for binding sites
rc (bool) – Whether to scan the sequence reverse complement as well

Returns:

pd.DataFrame containing columns ‘MotifID’, ‘SeqID’, ‘start’, ‘end’, ‘strand’.

polygraph.motifs.score_sites(sites, seqs, scores)[source]¶

Calculate the average score of each motif site given base-level importance scores.

Parameters:

sites (pd.DataFrame) – Dataframe containing site positions
seqs (pd.DataFrame) – Dataframe containing sequences
scores (np.array) – Numpy array of shape (sequences x length)

Returns:

‘sites’ dataframe with an additional column ‘score’

Return type:

sites (pd.DataFrame)
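Averaging base-level scores over a site reduces to a slice and a mean. The sketch below assumes a 0-based start and an exclusive end. The coordinate convention of the site table is not stated here, so treat that choice, and the helper name `mean_site_score`, as assumptions.

```python
def mean_site_score(scores, start, end):
    """Average base-level scores over positions start..end-1.

    Assumes 0-based, end-exclusive coordinates (not confirmed by the docs).
    """
    window = scores[start:end]
    return sum(window) / len(window)


base_scores = [0.0, 0.5, 1.5, 1.0, 0.0]
site_score = mean_site_score(base_scores, 1, 4)
```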

polygraph.sequence module¶

polygraph.sequence.ISM(seqs, drop_ref=False)[source]¶

Perform in-silico mutagenesis on given DNA sequence(s)

Parameters:

seqs (str, list, pd.DataFrame) – A DNA sequence, list of sequences or dataframe containing sequences in the column “Sequence”.
drop_ref (bool) – If True, do not return the original sequence.

Returns:

A list of all possible single-base mutated sequences derived from the original sequences.

Return type:

(list)
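In-silico mutagenesis enumerates every single-base substitution, which gives 3 mutants per position for a 4-letter alphabet. The sketch below shows that enumeration. The function name `single_base_mutants` and the output order (position by position, original first) are assumptions.

```python
def single_base_mutants(seq, drop_ref=False, alphabet="ACGT"):
    """List all single-base substitutions of seq.

    Hypothetical helper; the original sequence comes first unless
    drop_ref is True. Output order is an assumption of this sketch.
    """
    mutants = [] if drop_ref else [seq]
    for pos, ref_base in enumerate(seq):
        for alt in alphabet:
            if alt != ref_base:
                mutants.append(seq[:pos] + alt + seq[pos + 1:])
    return mutants


mutants = single_base_mutants("AC", drop_ref=True)
with_ref = single_base_mutants("AC")
```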

polygraph.sequence.bleu_similarity(seqs, reference_seqs, max_k=4)[source]¶

Calculate the BLEU similarity score between two sets of sequences.

Parameters:

seqs (list) – List of DNA sequences
reference_seqs (list) – List of DNA sequences
max_k (int) – Highest k-mer length for calculation. All k-mers of length 1 to max_k inclusive will be considered.

polygraph.sequence.fastsk(seqs, k=5, m=2)[source]¶

Compute a gapped k-mer kernel matrix for the given sequences using FastSK.

Parameters:

seqs (str, list, pd.DataFrame) – A DNA sequence, list of sequences or dataframe containing sequences in the column “Sequence”.
k (int) – k-mer length
m (int) – Number of mismatches allowed

Returns:

Array of shape (n_seqs, n_seqs) containing the gapped k-mer kernel.

Return type:

(np.array)

polygraph.sequence.gc(seqs)[source]¶

Calculate the GC fraction of a DNA sequence or list of sequences.

Parameters: seqs (str, list, pd.DataFrame) – A DNA sequence, list of sequences or dataframe containing sequences in the column “Sequence”.
Returns: The fraction of each sequence comprised of G and C bases.
Return type: (list, float)
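The GC fraction is the number of G and C bases divided by the sequence length. A minimal standard-library sketch follows; the helper name `gc_fraction` is hypothetical, and upper-casing the input is a choice made for this example.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


fractions = [gc_fraction(s) for s in ["ACGT", "GGCC", "AATT"]]
```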

polygraph.sequence.groupwise_mean_edit_dist(seqs, group_col='Group')[source]¶

Calculate average edit distances between all groups of sequences

polygraph.sequence.kmer_frequencies(seqs, k, normalize=False, genome='hg38')[source]¶

Get frequencies of all kmers of length k in a sequence or sequences.

Parameters:

seqs (str, list, pd.DataFrame) – A DNA sequence, list of sequences or dataframe containing sequences in the column “Sequence”.
k (int) – The k-mer length.
normalize (bool, optional) – Whether to normalize the k-mer counts by sequence length. Default is False.

Returns:

A dataframe of shape (kmers x sequences), containing the frequency of each k-mer in the sequence.

Return type:

(pd.DataFrame)
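Counting k-mers amounts to sliding a window of width k along the sequence. The sketch below counts only k-mers that occur, and assumes that normalization divides by the sequence length as described for `normalize`. The helper name `kmer_counts` is hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k, normalize=False):
    """Count overlapping k-mers in seq.

    With normalize=True, counts are divided by len(seq), following the
    parameter description above.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    if normalize:
        return {kmer: n / len(seq) for kmer, n in counts.items()}
    return dict(counts)


counts = kmer_counts("ACGACG", 3)
freqs = kmer_counts("ACGACG", 3, normalize=True)
```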

polygraph.sequence.kmer_positions(seq, kmer)[source]¶

Return all the locations of a given k-mer in a DNA sequence

Parameters:

seq (str) – the input DNA sequence
kmer (str) – the k-mer for which to search

Returns:

a numpy array containing the positions of the kmer

Return type:

(np.array)
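Locating a k-mer can be done with repeated calls to `str.find`. This sketch returns 0-based start positions and counts overlapping matches. Both choices, and the name `find_kmer`, are assumptions rather than documented behavior.

```python
def find_kmer(seq, kmer):
    """Return 0-based start positions of kmer in seq, overlaps included."""
    positions = []
    start = seq.find(kmer)
    while start != -1:
        positions.append(start)
        # Advance by one base so overlapping occurrences are found too.
        start = seq.find(kmer, start + 1)
    return positions


positions = find_kmer("AAAA", "AA")
```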

polygraph.sequence.min_edit_distance(seqs, reference_seqs)[source]¶

For each sequence in a list, find the smallest edit distance between that sequence and a list of reference sequences

Parameters:

seqs (list) – List of sequences
reference_seqs (list) – List of sequences

Returns:

edit distance between each sequence in seqs and its closest reference sequence
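The minimum edit distance to a reference set can be sketched with a standard dynamic-programming routine. The example assumes Levenshtein distance (unit cost for insertions, deletions and substitutions). The documentation does not say which edit distance is used, and the helper names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def min_distances(seqs, reference_seqs):
    """Smallest distance from each sequence to any reference sequence."""
    return [min(edit_distance(s, r) for r in reference_seqs) for s in seqs]


dists = min_distances(["ACGT", "AAAA"], ["ACGA", "TTTT"])
```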

polygraph.sequence.min_edit_distance_from_reference(seqs, reference_group, group_col='Group')[source]¶

For each sequence in non-reference groups, find the smallest edit distance between that sequence and the sequences in the reference group.

Parameters:

seqs (pd.DataFrame) – Dataframe containing sequences in column “Sequence”
reference_group (str) – ID for the group to use as reference
group_col (str) – Name of the column containing group IDs

Returns:

List of edit distances between each sequence and its closest reference sequence. Set to 0 for reference sequences.

Return type:

edit (np.array)

polygraph.sequence.unique_kmers(seq, k)[source]¶

Get all unique kmers of length k that are present in a DNA sequence.

Parameters:

seq (str) – the input DNA sequence
k (int) – length of k-mers to extract

Returns:

a set containing the unique kmers extracted from the sequence.

Return type:

(set)

polygraph.stats module¶

polygraph.stats.groupwise_fishers(data, reference_group, val_col, reference_val=None, group_col='Group')[source]¶

Perform Fisher’s exact test for proportions between each non-reference group and the reference group.

Parameters:

data (pd.DataFrame, anndata.AnnData) – Pandas dataframe with group IDs and values to compare, or an AnnData object containing this dataframe in .obs
val_col (str) – Name of column with values to compare
reference_group (str) – ID of group to use as reference
reference_val (str) – A specific value whose proportion is to be compared between groups
group_col (str) – Name of column containing group IDs

Returns:

Dataframe containing group proportions and FDR-corrected p-values for each group.

Return type:

(pd.DataFrame)

polygraph.stats.groupwise_mann_whitney(data, val_col, reference_group, group_col='Group')[source]¶

Compare the mean values between each non-reference group and the reference group using the Mann-Whitney U test.

Parameters:

data (pd.DataFrame, anndata.AnnData) – Pandas dataframe containing group IDs and values to compare, or an AnnData object containing this dataframe in .obs
val_col (str) – Name of column with values to compare
reference_group (str) – ID of group to use as reference
group_col (str) – Name of column containing group IDs

Returns:

Dataframe containing FDR-corrected p-values for each group.

Return type:

(pd.DataFrame)
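The FDR correction method is not named in this reference. A common choice is the Benjamini-Hochberg procedure, sketched below as an assumption about what "FDR-corrected" means here. The function `benjamini_hochberg` is a hypothetical standard-library implementation, not polygraph code.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```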

polygraph.stats.kruskal_dunn(data, val_col, group_col='Group')[source]¶

Compare the mean values between all groups using the Kruskal-Wallis test followed by Dunn’s post-hoc test

Parameters:

data (pd.DataFrame, anndata.AnnData) – Pandas dataframe with group IDs and values to compare, or an AnnData object containing this dataframe in .obs
val_col (str) – Name of column with values to compare
group_col (str) – Name of column containing group IDs

Returns:

Dictionary containing p-values for both Kruskal-Wallis and Dunn’s test.

Return type:

(dict)

polygraph.utils module¶

polygraph.utils.check_equal_lens(seqs)[source]¶

Given sequences, check whether they are all of equal length.

Parameters: seqs (list, pd.DataFrame) – Either a list of DNA sequences, or a dataframe containing DNA sequences in the column “Sequence”.
Returns: whether the sequences are all equal in length.
Return type: (bool)

polygraph.utils.get_lens(seqs)[source]¶

Calculate the lengths of given DNA sequences.

Parameters: seqs (str, list, pd.DataFrame) – A DNA sequence, list of sequences or dataframe containing sequences in the column “Sequence”.
Returns: length of each sequence
Return type: (int, list)

polygraph.utils.integer_encode(seqs)[source]¶

Encode DNA sequence(s) as a numpy array of integers.

Parameters: seqs (str, list, pd.DataFrame) – A DNA sequence, list of sequences or dataframe containing sequences in the column “Sequence”.
Returns: A 1-D or 2-D array containing the sequences encoded as integers.
Return type: (np.array)

polygraph.utils.make_ids(seqs)[source]¶

Assign a unique index to each row of a dataframe

Parameters: seqs (pd.DataFrame) – Pandas dataframe
Returns: Modified dataframe containing unique indices.
Return type: seqs (pd.DataFrame)

polygraph.utils.pad_with_Ns(seqs, seq_len=None, end='both')[source]¶

Pads a sequence with Ns at the desired end until it reaches seq_len in length.

If seq_len is not provided, it is set to the length of the longest sequence.

Parameters:

seqs (str, list, pd.DataFrame) – DNA sequence, list of sequences or dataframe containing sequences in the column “Sequence”.
seq_len (int) – Length up to which to pad each sequence

Returns:

Padded sequences of length seq_len

Return type:

(str, list)
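Padding to a common length can be sketched as follows. Only the default `end='both'` appears in the signature, so the `'left'` and `'right'` values, the placement of the extra N on the right when the padding is odd, and the helper name `pad_seq` are all assumptions.

```python
def pad_seq(seq, seq_len, end="both"):
    """Pad seq with Ns up to seq_len.

    'left' and 'right' are assumed option names; with 'both', any odd
    leftover N is placed on the right (also an assumption).
    """
    missing = max(seq_len - len(seq), 0)
    if end == "left":
        return "N" * missing + seq
    if end == "right":
        return seq + "N" * missing
    left = missing // 2
    return "N" * left + seq + "N" * (missing - left)


seqs = ["ACGT", "AC", "A"]
# When no target length is given, use the longest sequence.
target = max(len(s) for s in seqs)
padded = [pad_seq(s, target) for s in seqs]
```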

polygraph.utils.reverse_complement(seqs)[source]¶

Reverse complement DNA sequences

Parameters: seqs (str, list, pd.DataFrame) – A DNA sequence, list of sequences or dataframe containing sequences in the column “Sequence”.
Returns: reverse complemented sequences
Return type: (str, list)
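Reverse complementing swaps each base for its partner (A with T, C with G) and reverses the string. A minimal sketch using `str.translate` follows. The helper name `revcomp` is hypothetical, and leaving N unchanged is an assumption.

```python
# Translation table for complementing bases; N is left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


rc = [revcomp(s) for s in ["AACG", "GATTN"]]
```

Applying the function twice returns the original sequence, which makes a convenient sanity check.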

polygraph.visualize module¶

polygraph.visualize.boxplot(data, value_col, group_col='Group', fill_col=None)[source]¶

Plot boxplot of values in each group

Parameters:

data (pd.DataFrame, anndata.AnnData) – Pandas dataframe with group IDs and values to compare, or an AnnData object containing this dataframe in .obs
value_col (str) – Column containing values to plot
group_col (str) – Column containing group IDs
fill_col (str) – Column containing additional variable to split each group

polygraph.visualize.densityplot(data, value_col, group_col='Group')[source]¶

Plot density plot of values in each group

Parameters:

data (pd.DataFrame, anndata.AnnData) – Pandas dataframe with group IDs and values to compare, or an AnnData object containing this dataframe in .obs
value_col (str) – Column containing values to plot
group_col (str) – Column containing group IDs

polygraph.visualize.one_nn_frac_plot(ad, reference_group, group_col='Group')[source]¶

Plot a barplot showing the fraction of points in each group whose nearest neighbors are reference sequences.

Parameters:

ad (anndata.AnnData) – AnnData object containing sequence embedding.
reference_group (str) – Group to use as reference. This group will be plotted first.
group_col (str) – Column in ad.obs containing group IDs.

polygraph.visualize.pca_plot(ad, group_col='Group', components=[0, 1], size=0.1, show_ellipse=True, reference_group=None)[source]¶

Plot PCA embeddings of sequences, colored by group.

Parameters:

ad (anndata.AnnData) – AnnData object containing PCA components.
group_col (str) – Column containing group IDs.
components (list) – PCA components to plot
size (float) – Size of points
show_ellipse (bool) – Fit each group with a multivariate normal distribution and display an ellipse representing the 95% confidence level.
reference_group (str) – Group to use as reference. This group will be plotted first.

polygraph.visualize.plot_factors_nmf(H, n_features=50, **kwargs)[source]¶

Plot heatmap of contributions of features to NMF factors

Parameters:

H (pd.DataFrame) – Dataframe of shape (factors, features)
n_features (int) – Number of features to cluster
**kwargs – Additional arguments to pass to sns.clustermap

polygraph.visualize.plot_seqs_nmf(W, reorder=True)[source]¶

Plot stacked barplot of the distribution of NMF factors among sequences, split by group

Parameters:

W (pd.DataFrame) – Dataframe of shape n_seqs x (n_factors+1). The last column should contain group IDs.
reorder (bool)

polygraph.visualize.umap_plot(ad, group_col='Group', size=0.1, show_ellipse=True, reference_group=None)[source]¶

Plot UMAP embeddings of sequences, colored by group.

Parameters:

ad (anndata.AnnData) – AnnData object containing UMAP embedding.
group_col (str) – Column containing group IDs.
size (float) – Size of points
show_ellipse (bool) – Outline each group with an ellipse.
reference_group (str) – Group to use as reference. This group will be plotted first.

polygraph.visualize.upset_plot(ad, group_col='Group')[source]¶

Plot UpSet plot showing the overlap between features present in different groups.

Parameters:

ad (anndata.AnnData) – AnnData object containing sequence embedding.
group_col (str) – Column in ad.obs containing group IDs.
