Isosceles 0.2.1
Isosceles (Isoform Single-Cell and Long-read Expression Suite) is an R package dedicated to transcript detection and quantification from long read RNA-seq data, supporting both bulk and single-cell technologies.
Isosceles can be installed using the following commands:
install.packages(c("BiocManager", "devtools"))
BiocManager::install(c("scran", "scater", "uwot", "dittoSeq", "DEXSeq",
"Nebulosa", "ggbio", "BiocStyle"))
devtools::install_github("Genentech/Isosceles", dependencies = TRUE,
upgrade = TRUE, INSTALL_opts = "--install-tests")
We found that some versions of Isosceles’ dependencies don’t work together well, which might cause problems with testing the package or building the vignettes. If you encounter such issues, re-installing certain packages might be helpful:
install.packages("irlba")
devtools::install_github("powellgenomicslab/Nebulosa", upgrade = FALSE)
Load the Isosceles package:
library(Isosceles)
Isosceles requires the following input files:
gtf_to_intron_bed
function.When analyzing scRNA-Seq data, Isosceles requires its input BAM files to contain a tag storing the cell barcode (BC
tag by default, but it can be changed using the barcode_tag argument of the bam_to_tcc
function). Several pipelines are suitable for this purpose, and the most popular options are described below. Depending on the pipeline, some additional steps (such as UMI deduplication) might need to be undertaken in order to make the output BAM file compatible with Isosceles.
Sicelore is suite of tools designed to analyze long read single-cell sequencing data, utilizing Illumina short reads to guide cell barcode and UMI assignment. A tagged molecule BAM file created by following steps 1-8 of the workflow described on Sicelore’s webpage can be used directly by Isosceles. Cell barcodes are stored in the BC
tag, which is the default setting of the bam_to_tcc
function.
Sicelore v2.1 is the next release of the Sicelore suite that doesn’t rely on Illumina short read data for cell barcode and UMI assignment. A BAM file suitable for Isosceles analysis can be created by following steps 1-4b of the workflow described on the project’s webpage. As with the original version of Sicelore, cell barcodes are stored in the BC
tag.
wf-single-cell is a Nextflow-based pipeline for cell barcode and UMI identification in Oxford Nanopore long read data. Detailed instructions on running the workflow can be found on its webpage - we recommend running it with the ‘--merge_bam True’ flag in order to obtain one, merged BAM file for the processed data. As the output BAM file is not deduplicated by UMI, some additional steps need to be taken before it can be used with Isosceles.
First, aligned reads tagged with UMI sequences that are shorter than expected (and therefore interfering with UMI deduplication) need to be filtered out. This can be achieved with samtools (‘$alias’ should to be replaced with a prefix appropriate for the given analysis):
samtools view -b \
--expr 'length([UB]) == 12' \
$alias/bams/$alias.tagged.sorted.bam \
> filtered.bam
samtools index filtered.bam
In this case we used UMI length of 12 bp (appropriate for 3’ 10xv3 data), but that value might vary depending on the used protocol.
Read deduplication by UMI and mapping coordinates can be performed using UMI-tools:
umi_tools dedup \
--extract-umi-method=tag \
--method=directional \
--per-gene \
--per-cell \
--umi-tag=UB \
--cell-tag=CB \
--gene-tag=GN \
--stdin filtered.bam \
--stdout dedup.bam
samtools index dedup.bam
The deduplicated BAM file can now be used by Isosceles. As cell barcodes are stored in the CB
tag in this case, it needs to be specified using the barcode_tag argument of the bam_to_tcc
function.
Isosceles can used in of the following run modes, specified when the bam_to_tcc
function is run:
The default settings of de novo transcript detection used by the bam_to_tcc
function should work well for a eukaryotic transcriptome, but for the analysis of spike-in data, such as SIRVs, we recommend increasing the read count threshold (the min_read_count argument) to a higher value (e.g. 50).
Isosceles selects the appropriate set of transcripts to quantify, depending on the given run mode, using the genomic annotations that are prepared with the prepare_transcripts
function. This function needs a GTF file with reference annotations, and, optionally, transcript structures extracted from BAM files using the bam_to_read_structures
function (which is necessary for detecting de novo transcripts). If de novo transcript detection is enabled, the numbers of spliced reads assigned to different splicing compatibility levels (described in detail in the Isosceles paper) can be plotted using the plot_splicing_support_levels
function.
There are several important aspects of the prepare_transcripts
function to consider:
rowData(se)$compatible_tx
).The Isosceles transcript ID (e.g. ‘ISOT-2a9c-c3db-71b4-c3f2:s100628650:e100636850:AP:FL’) consists of several parts, separated by colons:
prepare_transcripts
function. The start of the bin is always used, ensuring that it’s smaller or equal to the transcript start position.prepare_transcripts
function. Unlike the start positions, the end of the bin is always used, ensuring that it’s bigger or equal to the transcript end position.The comma separated list of compatible reference transcript IDs can be found in transcript level SummarizedExperiment object’s rowData (rowData(se)$compatible_tx
).
The main output of Isosceles consists of SummarizedExperiment objects, created at various resolution levels:
bam_to_tcc
function. The values here are used to derive transcript quantification estimates via the EM algorithm. The pseudobulk_tcc
function (for summarizing data at the pseudo-bulk level), the pseudotime_tcc
function (for summarizing data at the pseudo-time window level) and the neighborhood_tcc
function (for combining counts from neighboring cells) utilize TCC SummarizedExperiment objects as both input and output.tcc_to_transcript
function, applying the EM (Expectation–Maximization) algorithm. By default, normalization is done using effective transcript lengths, but it can be turned off for UMI-based protocol data (commonly in scRNA-Seq datasets) by setting the use_length_normalization argument to FALSE. You can export transcript annotations from the SummarizedExperiment object to a GTF file using the export_gtf
function.tcc_to_gene
function. This process is typically very swift as it doesn’t necessitate running the EM algorithm.transcript_to_psi
function. This requires a transcript level SummarizedExperiment object as input.Each of these levels provides specific insights and functionalities within the Isosceles framework, catering to different analytical requirements.
Data in the SummarizedExperiment objects can be accessed using the following commands:
Many functions in the Isosceles package can make use of multiple CPU cores, specified using the ncpu argument. The following tips should help you find the optimal ncpu value for your analysis:
bam_to_read_structures
and bam_to_tcc
) can process each of them on a separate CPU core.tcc_to_transcript
function parallelizes the EM algorithm computations at the SummarizedExperiment column level (samples for bulk RNA-Seq, cells for scRNA-Seq data).transcript_to_psi
function, PSI region detection can be parallelized at the gene level.Get the input file paths for a small bulk RNA-Seq dataset:
bam_file <- system.file(
"extdata", "bulk_rnaseq.bam",
package = "Isosceles"
)
gtf_file <- system.file(
"extdata", "bulk_rnaseq.gtf",
package = "Isosceles"
)
genome_fasta_file <- system.file(
"extdata", "bulk_rnaseq.fa",
package = "Isosceles"
)
Extract read data from the BAM files:
bam_files <- c(Sample = bam_file)
bam_parsed <- bam_to_read_structures(
bam_files = bam_files
)
Prepare transcript data for the analysis:
transcript_data <- prepare_transcripts(
gtf_file = gtf_file,
genome_fasta_file = genome_fasta_file,
bam_parsed = bam_parsed,
min_bam_splice_read_count = 2,
min_bam_splice_fraction = 0.01
)
Create a TCC (Transcript Compatibility Counts) SummarizedExperiment object:
se_tcc <- bam_to_tcc(
bam_files = bam_files,
transcript_data = transcript_data,
run_mode = "de_novo_loose",
min_read_count = 1,
min_relative_expression = 0
)
se_tcc
## class: SummarizedExperiment
## dim: 26 1
## metadata(4): compatibility_matrix transcript_df
## transcript_exon_granges_list mean_read_length
## assays(3): counts tpm relative_expression
## rownames(26): 20,22,24,25,26,27 20,23,24,25,26,27,28,30 ... 69 70
## rowData names(3): ec_id gene_id gene_name
## colnames(1): Sample
## colData names(0):
Create a transcript-level SummarizedExperiment object using the EM algorithm:
se_transcript <- tcc_to_transcript(
se_tcc = se_tcc,
use_length_normalization = TRUE
)
se_transcript
## class: RangedSummarizedExperiment
## dim: 112 1
## metadata(0):
## assays(3): counts tpm relative_expression
## rownames(112): ISOT-5c13-e73a-5003-3c93:s976950:e991750:AP:FL
## ISOT-0000-0000-0000-0762:s989800:e990350:AP:FL ...
## ISOT-fd89-a94a-cc11-ef44:s72650:e75300:AP:FL
## ISOT-0000-0000-0000-0762:s30100:e30300:AP:FL
## rowData names(12): transcript_id position ... read_count
## relative_expression
## colnames(1): Sample
## colData names(0):
Create a gene-level SummarizedExperiment object:
se_gene <- tcc_to_gene(
se_tcc = se_tcc
)
se_gene
## class: SummarizedExperiment
## dim: 23 1
## metadata(0):
## assays(3): counts tpm relative_expression
## rownames(23): ENSG00000064218 ENSG00000107099 ... ENSG00000277631
## ENSG00000283921
## rowData names(2): gene_id gene_name
## colnames(1): Sample
## colData names(0):
Get the input file paths for a small scRNA-Seq dataset:
bam_file <- system.file(
"extdata", "scrnaseq.bam",
package = "Isosceles"
)
gtf_file <- system.file(
"extdata", "scrnaseq.gtf.gz",
package = "Isosceles"
)
genome_fasta_file <- system.file(
"extdata", "scrnaseq.fa.gz",
package = "Isosceles"
)
Extract read data from the BAM files:
bam_files <- c(Sample = bam_file)
bam_parsed <- bam_to_read_structures(
bam_files = bam_files
)
Prepare transcript data for the analysis:
transcript_data <- prepare_transcripts(
gtf_file = gtf_file,
genome_fasta_file = genome_fasta_file,
bam_parsed = bam_parsed,
min_bam_splice_read_count = 1,
min_bam_splice_fraction = 0.01
)
Create a TCC (Transcript Compatibility Counts) SummarizedExperiment object:
se_tcc <- bam_to_tcc(
bam_files = bam_files,
transcript_data = transcript_data,
run_mode = "de_novo_loose",
min_read_count = 1,
min_relative_expression = 0,
is_single_cell = TRUE,
barcode_tag = "BC"
)
se_tcc
## class: SummarizedExperiment
## dim: 8 57
## metadata(4): compatibility_matrix transcript_df
## transcript_exon_granges_list mean_read_length
## assays(3): counts tpm relative_expression
## rownames(8): 470 470,471,472,473,474,475 ... 476 477
## rowData names(3): ec_id gene_id gene_name
## colnames(57): Sample.CGTAGGCAGAAGGCCT Sample.CTAGCCTTCCTATTCA ...
## Sample.CCGTTCAAGAGTAAGG Sample.CGGCTAGTCCGTTGCT
## colData names(0):
Create a transcript-level SummarizedExperiment object using the EM algorithm:
se_transcript <- tcc_to_transcript(
se_tcc = se_tcc,
use_length_normalization = FALSE
)
se_transcript
## class: RangedSummarizedExperiment
## dim: 4111 57
## metadata(0):
## assays(3): counts tpm relative_expression
## rownames(4111): ISOT-7438-e6a0-1dea-329e:s120453200:e120530200:AP:FL
## ISOT-85c9-224f-fb90-8b49:s120447100:e120530200:AP:FL ...
## ISOT-0000-0000-0000-3df5:s43834750:e43836550:AP:FL
## ISOT-f090-cb81-2f39-4fa9:s156254500:e156255400:AP:FL
## rowData names(12): transcript_id position ... read_count
## relative_expression
## colnames(57): Sample.CGTAGGCAGAAGGCCT Sample.CTAGCCTTCCTATTCA ...
## Sample.CCGTTCAAGAGTAAGG Sample.CGGCTAGTCCGTTGCT
## colData names(0):
Create a gene-level SummarizedExperiment object:
se_gene <- tcc_to_gene(
se_tcc = se_tcc
)
se_gene
## class: SummarizedExperiment
## dim: 2855 57
## metadata(0):
## assays(3): counts tpm relative_expression
## rownames(2855): ENSMUSG00000000085 ENSMUSG00000000409 ...
## ENSMUSG00000111611 ENSMUSG00000111895
## rowData names(2): gene_id gene_name
## colnames(57): Sample.CGTAGGCAGAAGGCCT Sample.CTAGCCTTCCTATTCA ...
## Sample.CCGTTCAAGAGTAAGG Sample.CGGCTAGTCCGTTGCT
## colData names(0):
Create a pseudobulk TCC SummarizedExperiment object:
# Randomly choose one of two labels ('1' or '2') for each cell
set.seed(42)
cell_labels <- sample(1:2, ncol(se_tcc), replace = TRUE)
# Prepare pseudobulk TCC data for given labels
se_pseudobulk_tcc <- pseudobulk_tcc(
se_tcc = se_tcc,
cell_labels = cell_labels
)
se_pseudobulk_tcc
## class: SummarizedExperiment
## dim: 8 2
## metadata(4): compatibility_matrix transcript_df
## transcript_exon_granges_list mean_read_length
## assays(3): counts tpm relative_expression
## rownames(8): 470 470,471,472,473,474,475 ... 476 477
## rowData names(3): ec_id gene_id gene_name
## colnames(2): 1 2
## colData names(0):
Create a pseudobulk transcript-level SummarizedExperiment object using the EM algorithm:
se_pseudobulk_transcript <- tcc_to_transcript(
se_tcc = se_pseudobulk_tcc,
use_length_normalization = FALSE
)
se_pseudobulk_transcript
## class: RangedSummarizedExperiment
## dim: 4111 2
## metadata(0):
## assays(3): counts tpm relative_expression
## rownames(4111): ISOT-7438-e6a0-1dea-329e:s120453200:e120530200:AP:FL
## ISOT-85c9-224f-fb90-8b49:s120447100:e120530200:AP:FL ...
## ISOT-0000-0000-0000-3df5:s43834750:e43836550:AP:FL
## ISOT-f090-cb81-2f39-4fa9:s156254500:e156255400:AP:FL
## rowData names(12): transcript_id position ... read_count
## relative_expression
## colnames(2): 1 2
## colData names(0):
Create a pseudobulk gene-level SummarizedExperiment object:
se_pseudobulk_gene <- tcc_to_gene(
se_tcc = se_pseudobulk_tcc
)
se_pseudobulk_gene
## class: SummarizedExperiment
## dim: 2855 2
## metadata(0):
## assays(3): counts tpm relative_expression
## rownames(2855): ENSMUSG00000000085 ENSMUSG00000000409 ...
## ENSMUSG00000111611 ENSMUSG00000111895
## rowData names(2): gene_id gene_name
## colnames(2): 1 2
## colData names(0):
Isosceles also allows adding counts from neighboring cells to each cell, potentially improving the quantification results:
# Get PCA coordinates of each cell from transcript expression data
sce_transcript <- methods::as(se_transcript, "SingleCellExperiment")
sce_transcript <- scuttle::computeLibraryFactors(sce_transcript)
sce_transcript <- scuttle::logNormCounts(sce_transcript)
set.seed(42)
sce_transcript <- scater::runPCA(sce_transcript, ncomponents = 2)
pca_mat <- SingleCellExperiment::reducedDim(sce_transcript, "PCA")
# Add TCC values from two nearest cells to each cell
se_neighbor_tcc <- neighborhood_tcc(
se_tcc = se_tcc,
pca_mat = pca_mat,
k = 2
)
se_neighbor_tcc
## class: SummarizedExperiment
## dim: 8 57
## metadata(4): compatibility_matrix transcript_df
## transcript_exon_granges_list mean_read_length
## assays(3): counts tpm relative_expression
## rownames(8): 470 470,471,472,473,474,475 ... 476 477
## rowData names(3): ec_id gene_id gene_name
## colnames(57): Sample.CGTAGGCAGAAGGCCT Sample.CTAGCCTTCCTATTCA ...
## Sample.CCGTTCAAGAGTAAGG Sample.CGGCTAGTCCGTTGCT
## colData names(0):
Recalculate transcript expression values from merged data using the EM algorithm:
se_neighbor_transcript <- tcc_to_transcript(
se_tcc = se_neighbor_tcc,
use_length_normalization = FALSE
)
se_neighbor_transcript
## class: RangedSummarizedExperiment
## dim: 4111 57
## metadata(0):
## assays(3): counts tpm relative_expression
## rownames(4111): ISOT-7438-e6a0-1dea-329e:s120453200:e120530200:AP:FL
## ISOT-85c9-224f-fb90-8b49:s120447100:e120530200:AP:FL ...
## ISOT-0000-0000-0000-3df5:s43834750:e43836550:AP:FL
## ISOT-f090-cb81-2f39-4fa9:s156254500:e156255400:AP:FL
## rowData names(12): transcript_id position ... read_count
## relative_expression
## colnames(57): Sample.CGTAGGCAGAAGGCCT Sample.CTAGCCTTCCTATTCA ...
## Sample.CCGTTCAAGAGTAAGG Sample.CGGCTAGTCCGTTGCT
## colData names(0):
Load a transcript-level SummarizedExperiment object containing expression levels of 100 highly variable transcripts across 200 cells from a mix of ovarian cancer cell lines:
se <- readRDS(system.file("extdata", "se_transcript_100_hvts.rds",
package = "Isosceles"))
se
## class: RangedSummarizedExperiment
## dim: 100 200
## metadata(0):
## assays(3): counts tpm relative_expression
## rownames(100): ISOT-5434-0815-056e-2ed2:s3020300:e3022400:AP:FL
## ISOT-85d4-1297-c095-b42b:s17229250:e17237600:AP:FL ...
## ISOT-79a9-1e21-3e21-7593:s106969800:e106974800:AP:FL
## ISOT-188a-636e-edb1-889b:s29679000:e29698700:AP:FL
## rowData names(12): transcript_id position ... read_count
## relative_expression
## colnames(200): AAACGAAGTAGCCAGA AAAGAACTCGGTCACG ... TAGGTACGTCGTACAT
## TAGGTACTCTACTATC
## colData names(0):
Run a basic scRNA-Seq data analysis using Bioconductor packages:
sce <- as(se, "SingleCellExperiment")
sce <- scuttle::computeLibraryFactors(sce)
sce <- scuttle::logNormCounts(sce)
dec <- scran::modelGeneVar(sce)
top_hvgs <- scran::getTopHVGs(dec, n = 100)
set.seed(42)
sce <- scran::denoisePCA(sce, technical = dec, subset.row = top_hvgs)
set.seed(42)
sce <- scater::runUMAP(sce, dimred = "PCA")
sce
## class: SingleCellExperiment
## dim: 100 200
## metadata(0):
## assays(4): counts tpm relative_expression logcounts
## rownames(100): ISOT-5434-0815-056e-2ed2:s3020300:e3022400:AP:FL
## ISOT-85d4-1297-c095-b42b:s17229250:e17237600:AP:FL ...
## ISOT-79a9-1e21-3e21-7593:s106969800:e106974800:AP:FL
## ISOT-188a-636e-edb1-889b:s29679000:e29698700:AP:FL
## rowData names(12): transcript_id position ... read_count
## relative_expression
## colnames(200): AAACGAAGTAGCCAGA AAAGAACTCGGTCACG ... TAGGTACGTCGTACAT
## TAGGTACTCTACTATC
## colData names(1): sizeFactor
## reducedDimNames(2): PCA UMAP
## mainExpName: NULL
## altExpNames(0):
Cell clustering using k-means:
set.seed(42)
colLabels(sce) <- bluster::clusterRows(reducedDim(sce, "PCA"),
bluster::KmeansParam(3))
dittoSeq::dittoDimPlot(sce, "label", reduction.use = "UMAP",
size = 2, main = "",
do.label = TRUE, labels.highlight = TRUE,
legend.show = FALSE)
Identify marker transcripts for individual clusters:
markers <- scran::findMarkers(sce, groups = sce$label, test.type = "t",
pval.type = "all", direction = "up")
markers
## List of length 3
## names(3): 1 2 3
head(markers[[1]])
## DataFrame with 6 rows and 5 columns
## p.value FDR
## <numeric> <numeric>
## ISOT-9a12-71e0-57a3-e57d:s170864850:e170870500:AP:FL 2.29637e-40 2.29637e-38
## ISOT-0664-96b1-953c-fbcd:s40457250:e40468600:AP:FL 6.80912e-33 3.40456e-31
## ISOT-222e-275a-5543-ff6e:s39312850:e39319800:AP:FL 1.81760e-31 6.05867e-30
## ISOT-76a1-49c3-a1a7-9d07:s8377500:e8383200:AP:FL 5.55753e-27 1.38938e-25
## ISOT-1310-94db-577c-5321:s39406650:e39412550:AP:FL 3.17542e-23 6.35085e-22
## ISOT-d695-7986-8bd7-ca02:s78033850:e78040700:AP:FL 2.62126e-21 4.36877e-20
## summary.logFC logFC.2
## <numeric> <numeric>
## ISOT-9a12-71e0-57a3-e57d:s170864850:e170870500:AP:FL 2.65280 3.06150
## ISOT-0664-96b1-953c-fbcd:s40457250:e40468600:AP:FL 3.18546 3.74561
## ISOT-222e-275a-5543-ff6e:s39312850:e39319800:AP:FL 1.57421 1.57421
## ISOT-76a1-49c3-a1a7-9d07:s8377500:e8383200:AP:FL 1.70687 2.67656
## ISOT-1310-94db-577c-5321:s39406650:e39412550:AP:FL 2.04990 2.15373
## ISOT-d695-7986-8bd7-ca02:s78033850:e78040700:AP:FL 2.29017 2.16768
## logFC.3
## <numeric>
## ISOT-9a12-71e0-57a3-e57d:s170864850:e170870500:AP:FL 2.65280
## ISOT-0664-96b1-953c-fbcd:s40457250:e40468600:AP:FL 3.18546
## ISOT-222e-275a-5543-ff6e:s39312850:e39319800:AP:FL 2.25056
## ISOT-76a1-49c3-a1a7-9d07:s8377500:e8383200:AP:FL 1.70687
## ISOT-1310-94db-577c-5321:s39406650:e39412550:AP:FL 2.04990
## ISOT-d695-7986-8bd7-ca02:s78033850:e78040700:AP:FL 2.29017
Get top 3 marker transcripts for each cluster:
top_markers <- markers %>%
lapply(function(x) rownames(x)[1:3]) %>%
unlist() %>%
unname()
top_markers
## [1] "ISOT-9a12-71e0-57a3-e57d:s170864850:e170870500:AP:FL"
## [2] "ISOT-0664-96b1-953c-fbcd:s40457250:e40468600:AP:FL"
## [3] "ISOT-222e-275a-5543-ff6e:s39312850:e39319800:AP:FL"
## [4] "ISOT-a049-9476-caf6-f7ba:s114295800:e114312600:AP:FL"
## [5] "ISOT-1d28-0bb1-cc79-de6b:s209675400:e209676400:AP:FL"
## [6] "ISOT-188a-636e-edb1-889b:s29679000:e29698700:AP:FL"
## [7] "ISOT-3d99-b5d4-4c81-952b:s150338300:e150341700:AP:FL"
## [8] "ISOT-e3aa-f039-f992-8f8f:s105486300:e105488800:AP:FL"
## [9] "ISOT-6639-25b4-48c3-9465:s51166550:e51171750:AP:FL"
Top marker transcript heatmap:
dittoSeq::dittoHeatmap(sce, top_markers, annot.by = "label",
cluster_rows = TRUE, fontsize_row = 6)
for (transcript_id in top_markers) {
cat("\n\n### ", transcript_id, "\n")
umap_plot <- dittoSeq::dittoDimPlot(
sce, transcript_id, size = 2,
min.color = "#0072B2", max.color = "#F0E442",
order = "increasing"
)
print(umap_plot)
}
Load a transcript-level SingleCellExperiment object containing expression levels of transcripts of the CAV1 gene across 2,060 cells from a mix of ovarian cancer cell lines:
sce_transcript <- readRDS(system.file("extdata", "sce_transcript_cav1.rds",
package = "Isosceles"))
sce_transcript
## class: SingleCellExperiment
## dim: 6 2060
## metadata(0):
## assays(3): counts tpm relative_expression
## rownames(6): ISOT-187e-bc5a-afe9-5320:s116525000:e116559350:AP:FL
## ISOT-35a7-317f-81c6-345e:s116526250:e116559550:AP:FL ...
## ISOT-88b1-d6d5-6bc5-0a76:s116526250:e116561200:AP:FL
## ISOT-bf85-2818-1b10-afd6:s116525700:e116561200:AP:FL
## rowData names(12): transcript_id position ... read_count
## relative_expression
## colnames(2060): AAACGAAGTAGCCAGA AAAGAACTCGGTCACG ... GGGTGAACAAGGATGC
## TATCCTACATCAGCGC
## colData names(0):
## reducedDimNames(1): UMAP
## mainExpName: NULL
## altExpNames(0):
Create a PSI (Percent Spliced In) SummarizedExperiment object:
se_psi <- transcript_to_psi(sce_transcript)
se_psi
## class: RangedSummarizedExperiment
## dim: 12 2060
## metadata(0):
## assays(1): psi
## rownames(12): ENSG00000105974:7:116524750:+:TSS
## ENSG00000105974:7:116525000:+:TSS ...
## ENSG00000105974:7:116559550:+:TES ENSG00000105974:7:116561200:+:TES
## rowData names(2): gene_id type
## colnames(2060): AAACGAAGTAGCCAGA AAAGAACTCGGTCACG ... GGGTGAACAAGGATGC
## TATCCTACATCAGCGC
## colData names(0):
PSI values are calculated for the following types of regions:
PSI value heatmap for the CAV1 gene:
region_colors <- c(
TSS = "#FF83FF", CE = "#FF9289", RI = "#82B7FF",
A5 = "#00D65C", A3 = "#00DAE0", TES = "#D3BA00"
)
plot_psi_heatmap(se_psi, gene_id = "ENSG00000105974",
region_colors = region_colors)
Plot the CAV1 transcript structures:
plot_psi_regions(
se_psi = se_psi,
se_transcript = sce_transcript,
gene_id = "ENSG00000105974",
region_colors = region_colors
)
Plot the CAV1 transcript structures with introns shrinked to max 1000 bp:
plot_psi_regions(
se_psi = se_psi,
se_transcript = sce_transcript,
gene_id = "ENSG00000105974",
max_intron_length = 1000,
region_colors = region_colors
)
# Calculate UMAP coordinates based on PSI values
sce_psi <- as(se_psi, "SingleCellExperiment")
reducedDim(sce_psi, "UMAP") <- reducedDim(sce_transcript, "UMAP")
sce_psi <- sce_psi[
, colSums(assay(sce_psi, "psi")) > 0
]
# Create UMAP plots colored by PSI values for each gene region
for (region_id in rownames(sce_psi)) {
cat("\n\n### ", region_id, "\n")
umap_plot <- dittoSeq::dittoDimPlot(
sce_psi, region_id, size = 1,
min.color = "#0072B2", max.color = "#F0E442",
order = "increasing"
)
print(umap_plot)
}
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/local/lib/R/lib/libRblas.so
## LAPACK: /usr/local/lib/R/lib/libRlapack.so; LAPACK version 3.11.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C
## [4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C
## [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] Isosceles_0.2.1 SingleCellExperiment_1.24.0
## [3] SummarizedExperiment_1.32.0 Biobase_2.62.0
## [5] GenomicRanges_1.54.1 GenomeInfoDb_1.38.6
## [7] IRanges_2.36.0 S4Vectors_0.40.2
## [9] BiocGenerics_0.48.1 MatrixGenerics_1.14.0
## [11] matrixStats_1.2.0 BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] BiocIO_1.12.0 bitops_1.0-7
## [3] filelock_1.0.3 tibble_3.2.1
## [5] graph_1.80.0 XML_3.99-0.16.1
## [7] rpart_4.1.23 lifecycle_1.0.4
## [9] edgeR_4.0.15 OrganismDbi_1.44.0
## [11] ensembldb_2.26.0 lattice_0.22-5
## [13] backports_1.4.1 magrittr_2.0.3
## [15] limma_3.58.1 Hmisc_5.1-1
## [17] sass_0.4.8 rmarkdown_2.25
## [19] jquerylib_0.1.4 yaml_2.3.8
## [21] metapod_1.10.1 ggbio_1.50.0
## [23] cowplot_1.1.3 DBI_1.2.2
## [25] RColorBrewer_1.1-3 abind_1.4-5
## [27] zlibbioc_1.48.0 purrr_1.0.2
## [29] AnnotationFilter_1.26.0 biovizBase_1.50.0
## [31] RCurl_1.98-1.14 nnet_7.3-19
## [33] VariantAnnotation_1.48.1 rappdirs_0.3.3
## [35] GenomeInfoDbData_1.2.11 ggrepel_0.9.5
## [37] irlba_2.3.5.1 pheatmap_1.0.12
## [39] dqrng_0.3.2 DelayedMatrixStats_1.24.0
## [41] codetools_0.2-19 DelayedArray_0.28.0
## [43] scuttle_1.12.0 xml2_1.3.6
## [45] tidyselect_1.2.0 farver_2.1.1
## [47] ScaledMatrix_1.10.0 viridis_0.6.5
## [49] BiocFileCache_2.10.1 base64enc_0.1-3
## [51] GenomicAlignments_1.38.2 jsonlite_1.8.8
## [53] BiocNeighbors_1.20.2 Formula_1.2-5
## [55] ggridges_0.5.6 scater_1.30.1
## [57] tools_4.3.1 progress_1.2.3
## [59] Rcpp_1.0.12 glue_1.7.0
## [61] gridExtra_2.3 SparseArray_1.2.4
## [63] xfun_0.42 dplyr_1.1.4
## [65] withr_3.0.0 BiocManager_1.30.22
## [67] fastmap_1.1.1 GGally_2.2.1
## [69] bluster_1.12.0 fansi_1.0.6
## [71] digest_0.6.34 rsvd_1.0.5
## [73] R6_2.5.1 colorspace_2.1-0
## [75] dichromat_2.0-0.1 biomaRt_2.58.2
## [77] RSQLite_2.3.5 utf8_1.2.4
## [79] tidyr_1.3.1 generics_0.1.3
## [81] data.table_1.15.0 rtracklayer_1.62.0
## [83] FNN_1.1.4 prettyunits_1.2.0
## [85] httr_1.4.7 htmlwidgets_1.6.4
## [87] S4Arrays_1.2.0 ggstats_0.5.1
## [89] uwot_0.1.16 pkgconfig_2.0.3
## [91] gtable_0.3.4 blob_1.2.4
## [93] XVector_0.42.0 htmltools_0.5.7
## [95] bookdown_0.37 RBGL_1.78.0
## [97] ProtGenerics_1.34.0 scales_1.3.0
## [99] png_0.1-8 scran_1.30.2
## [101] knitr_1.45 rstudioapi_0.15.0
## [103] reshape2_1.4.4 rjson_0.2.21
## [105] checkmate_2.3.1 curl_5.2.0
## [107] cachem_1.0.8 stringr_1.5.1
## [109] parallel_4.3.1 vipor_0.4.7
## [111] foreign_0.8-84 AnnotationDbi_1.64.1
## [113] restfulr_0.0.15 pillar_1.9.0
## [115] grid_4.3.1 vctrs_0.6.5
## [117] BiocSingular_1.18.0 dbplyr_2.4.0
## [119] dittoSeq_1.14.2 beachmat_2.18.1
## [121] cluster_2.1.4 beeswarm_0.4.0
## [123] htmlTable_2.4.2 evaluate_0.23
## [125] GenomicFeatures_1.54.3 magick_2.8.2
## [127] cli_3.6.2 locfit_1.5-9.8
## [129] compiler_4.3.1 Rsamtools_2.18.0
## [131] rlang_1.1.3 crayon_1.5.2
## [133] labeling_0.4.3 plyr_1.8.9
## [135] ggbeeswarm_0.7.2 stringi_1.8.3
## [137] viridisLite_0.4.2 BiocParallel_1.36.0
## [139] assertthat_0.2.1 munsell_0.5.0
## [141] Biostrings_2.70.2 lazyeval_0.2.2
## [143] Matrix_1.6-5 BSgenome_1.70.2
## [145] hms_1.1.3 sparseMatrixStats_1.14.0
## [147] bit64_4.0.5 ggplot2_3.5.1
## [149] KEGGREST_1.42.0 statmod_1.5.0
## [151] highr_0.10 igraph_1.6.0
## [153] memoise_2.0.1 bslib_0.6.1
## [155] fastmatch_1.1-4 bit_4.0.5