Acknowledgement: this report has been inspired by
materials used in the Advanced Bioinformatics course taught
by Igor Ruiz de los Mozos at UPNA.
After completing the nf-core/rnaseq pipeline using
Salmon pseudoalignment for the analysis of 42 colorectal samples from
SRA study SRP357925 will obtain the
salmon.merged.gene.SummarizedExperiment.rds file. This file
contains gene-level quantification matrices (counts, TPMs, effective
lengths) plus sample metadata and annotations produced from the merged
Salmon quantifications.
Let’s load the
salmon.merged.gene.SummarizedExperiment.rds file. The
content of the file has a SummarizedExperiment
representation, which is an R/Bioconductor container that stores assay
matrices (e.g., counts), feature metadata, and sample metadata in one
coherent object.
# Specify a working directory
directory <- "/Volumes/DataWinBackup/00_MASTER UOC BIOINFORMATICA - TEMPORAL/TFM-UOC/Deseq2_42eocrc"
setwd(directory)
list.files()
[1] "dds_DGE_42.rds"
[2] "DE_tumor_vs_normal_42.csv"
[3] "Deseq2_42crc copy.Rmd"
[4] "Deseq2_42crc_cache"
[5] "Deseq2_42crc_files"
[6] "Deseq2_42crc-copy_cache"
[7] "Deseq2_42crc-copy.Rmd"
[8] "Deseq2_42crc.html"
[9] "Deseq2_42crc.Rmd"
[10] "DESeq2_sig_genes_normal_vs_cancer_42_eocrc_annotated.tab"
[11] "DESeq2_sig_genes_normal_vs_cancer_42_eocrc.tab"
[12] "DESeq2_sig_genes_normal_vs_cancer_44_locrc_annotated.tab"
[13] "enrichment_map.pdf"
[14] "figures"
[15] "gene_concept_network_plot.pdf"
[16] "gene.SummarizedExperiment.metadata_42.rds"
[17] "images"
[18] "Merge_GO_Gene_Set_Enrichment_Analysis_CC_.pdf"
[19] "salmon.merged.gene.SummarizedExperiment.rds"
[20] "sraruntable_srp357925_crc42_metadata.csv"
[21] "sraruntable_srp357925_crc42.csv"
[22] "tables"
[23] "tumor_vs_normal_42_dds.csv"
# Read 'se' object obtained from nfcore/rnaseq
se <- readRDS("salmon.merged.gene.SummarizedExperiment.rds")
# Display se object
print(se)
class: SummarizedExperiment
dim: 78428 42
metadata(0):
assays(5): counts counts_length_scaled counts_scaled lengths tpm
rownames(78428): ENSG00000000003.17 ENSG00000000005.6 ...
ENSG00000310592.1 ENSG00000310593.1
rowData names(3): transcript_id gene_id gene_name
colnames(42): SRR17866817 SRR17866818 ... SRR17866857 SRR17866858
colData names(4): sample fastq_1 fastq_2 strandedness
We focus on the counts assay matrix and check
information on columns and rows.
# Check the first elements of the counts matrix
head(assay(se, "counts")[, 1:2], 5)
SRR17866817 SRR17866818
ENSG00000000003.17 5220.281 2132.471
ENSG00000000005.6 40.000 18.000
ENSG00000000419.15 2176.673 670.315
ENSG00000000457.15 417.069 352.181
ENSG00000000460.18 385.301 88.000
# Collect sample information
head(colData(se)[, 2:3], 5)
DataFrame with 5 rows and 2 columns
fastq_1 fastq_2
<character> <character>
SRR17866817 data/reads/SRR178668.. data/reads/SRR178668..
SRR17866818 data/reads/SRR178668.. data/reads/SRR178668..
SRR17866819 data/reads/SRR178668.. data/reads/SRR178668..
SRR17866820 data/reads/SRR178668.. data/reads/SRR178668..
SRR17866821 data/reads/SRR178668.. data/reads/SRR178668..
# Collect gene information
head(rowData(se), 3)
DataFrame with 3 rows and 3 columns
transcript_id gene_id gene_name
<character> <character> <character>
ENSG00000000003.17 ENST00000373020.9,EN.. ENSG00000000003.17 TSPAN6
ENSG00000000005.6 ENST00000373031.5,EN.. ENSG00000000005.6 TNMD
ENSG00000000419.15 ENST00000858753.1,EN.. ENSG00000000419.15 DPM1
# Collect genomic ranges
head(rowRanges(se), 3)
NULL
Below, we can obtain the name of the 42 samples.
# Display colNames
colData(se)$sample
[1] "SRR17866817" "SRR17866818" "SRR17866819" "SRR17866820" "SRR17866821"
[6] "SRR17866822" "SRR17866823" "SRR17866824" "SRR17866825" "SRR17866826"
[11] "SRR17866827" "SRR17866828" "SRR17866829" "SRR17866830" "SRR17866831"
[16] "SRR17866832" "SRR17866833" "SRR17866834" "SRR17866835" "SRR17866836"
[21] "SRR17866837" "SRR17866838" "SRR17866839" "SRR17866840" "SRR17866841"
[26] "SRR17866842" "SRR17866843" "SRR17866844" "SRR17866845" "SRR17866846"
[31] "SRR17866847" "SRR17866848" "SRR17866849" "SRR17866850" "SRR17866851"
[36] "SRR17866852" "SRR17866853" "SRR17866854" "SRR17866855" "SRR17866856"
[41] "SRR17866857" "SRR17866858"
Since the colData field in the se object is
limited to data from the original FASTQ files, it would be useful to
supplement it with metadata from the SRA study, such as patient age and
tissue type.
In this section we describe the steps to add metadata from the SRA
study into the colData field of the
SummarizedExperiment object obtained from the
nf-core/rnaseq pipeline:
SraRunTable.csv file in Google
Sheets.csv file and rename it
sraruntable_srp357925_crc42.csvHere are the first lines of the
sraruntable_srp357925_crc42.csv file.
Run,age_at_surgery,tissue,tumor_stage,tissue_type,Sample Name,SRA Study,Assay Type,AvgSpotLen,Bases,BioProject,BioSample,Bytes,Center Name,Consent,DATASTORE filetype,DATASTORE provider,DATASTORE region,diagnosis,Experiment,Instrument,Library Name,LibraryLayout,LibrarySelection,LibrarySource,Organism,Patient_ID,Platform,ReleaseDate,create_date,version,source_name
SRR17866817,49.1,Sigmoid Colon,1,Tumor tissue,GSM5857839,SRP357925,RNA-Seq,113,3159273528,PRJNA802883,SAMN25601372,1096074179,PENN STATE COLLEGE OF MEDICINE,public,"sra,fastq,run.zq","s3,gs,ncbi","s3.us-east-1,gs.us-east1,ncbi.public",Early onset colorectal cancer,SRX14026607,Illumina MiSeq,GSM5857839,PAIRED,cDNA,TRANSCRIPTOMIC,Homo sapiens,20.327,ILLUMINA,2022-09-06T00:00:00Z,2022-02-02T21:46:00Z,1,Sigmoid Colon
SRR17866818,49.1,Sigmoid Colon,NA,Adjacent normal tissue,GSM5857838,SRP357925,RNA-Seq,113,2373327113,PRJNA802883,SAMN25601373,833169111,PENN STATE COLLEGE OF MEDICINE,public,"fastq,sra,run.zq","ncbi,s3,gs","s3.us-east-1,gs.us-east1,ncbi.public",Early onset colorectal cancer,SRX14026606,Illumina MiSeq,GSM5857838,PAIRED,cDNA,TRANSCRIPTOMIC,Homo sapiens,20.327,ILLUMINA,2022-09-06T00:00:00Z,2022-02-02T21:40:00Z,1,Sigmoid Colon
As follows, a metadata .csv file was created using
information from BioProject:
PRJNA802883. Here are the first lines of the created
sraruntable_srp357925_crc42_metadata.csv file.
key,values
abstract,"Despite a global decrease in colorectal cancer (CRC) incidence, the prevalence of early onset colorectal cancer (EOCRC), or those occurring in individuals before the age of 50..."
Accession,PRJNA802883; GEO: GSE196006
Let’s add metadata contained in the previously created files into the
se object.
After loading the .csv files, here we can see the samples metadata
represented as a dataframe named df_coldata:
# Load files
df_coldata <- read.csv("sraruntable_srp357925_crc42.csv", header = TRUE, row.names = 1,
check.names = FALSE)
# Display first rows
head(df_coldata[, 1:5], n = 3)
age_at_surgery tissue tumor_stage tissue_type
SRR17866817 49.1 Sigmoid Colon 1 Tumor tissue
SRR17866818 49.1 Sigmoid Colon NA Adjacent normal tissue
SRR17866819 44.2 Rectum 3 Tumor tissue
Sample Name
SRR17866817 GSM5857839
SRR17866818 GSM5857838
SRR17866819 GSM5857837
and below, it’s the metadata corresponding to the SRA study
represented as a dataframe named df_metadata with two
columns: key and values.
# Load files
df_metadata <- read.csv("sraruntable_srp357925_crc42_metadata.csv", header = TRUE,
check.names = FALSE)
# Display first rows
head(df_metadata, n = 3)
key
1 abstract
2 Accession
3 Data Type
values
1 Despite a global decrease in colorectal cancer (CRC) incidence, the prevalence of early onset colorectal cancer (EOCRC), or those occurring in individuals before the age of 50, has been steadily increasing over the past several decades. When compared to later onset colorectal cancers (LOCRCs) in individuals over 50, our understanding of the genetic and molecular underpinnings of EOCRCs is limited. Here, we conducted transcriptomic analysis of patient-matched normal colonic segments and tumors to identify gene expression programs involved in carcinogenesis. Amongst differentially expressed genes, we found increased expression of the c-MYC (MYC) proto-oncogene and its downstream targets in tumor samples. We identify tumors with high and low differential MYC expression, and patients with high-MYC tumors were older and overweight or obese. We also detect elevated expression of the PVT1 long-non-coding RNA (lncRNA) in most tumors and find gains in copy number for both MYC and PVT1 gene loci in 35% of tumors evaluated. Our transcriptome analysis indicates that EOCRC can be sub-classified into groups based on differential MYC expression and suggests that MYC is a critical driver of most CRCs that develop in younger patients. Overall design: Gene expression profiles of 21 surgically resected tumors and patient-matched adjacent colonic segments from early onset colorectal cancer patients were generated by deep sequencing.
2 PRJNA802883; GEO: GSE196006
3 Transcriptome or Gene expression
The lower_underscore_nospaces() function will help us
standardize the format of character-type data.
# Define the lower_underscore_nospaces() function
lower_underscore_nospaces <- function(df) {
non_numeric_col <- names(df)[!sapply(df, is.numeric)]
for (col in non_numeric_col) {
df[[col]] <- tolower(df[[col]]) # lowercase
df[[col]] <- gsub("[^a-z0-9]+", "_", df[[col]]) # replace non-alphanumeric with underscores
df[[col]] <- gsub("_$", "", df[[col]]) # remove trailing underscores
}
if (!is.null(names(df))) {
names(df) <- tolower(names(df))
names(df) <- gsub("[^a-z0-9]+", "_", names(df))
names(df) <- gsub("_$", "", names(df))
}
if (!is.null(rownames(df))) {
rownames(df) <- tolower(rownames(df))
rownames(df) <- gsub("[^a-z0-9]+", "_", rownames(df))
rownames(df) <- gsub("_$", "", rownames(df))
}
return(df)
}
# Apply lower_underscore_nospaces to all datasets
df_coldata <- lower_underscore_nospaces(df_coldata)
# Display last changes on dataframes
head(df_coldata[, 1:5], n = 3)
age_at_surgery tissue tumor_stage tissue_type
srr17866817 49.1 sigmoid_colon 1 tumor_tissue
srr17866818 49.1 sigmoid_colon NA adjacent_normal_tissue
srr17866819 44.2 rectum 3 tumor_tissue
sample_name
srr17866817 gsm5857839
srr17866818 gsm5857838
srr17866819 gsm5857837
Since the metadata(se) field in a
SummarizedExperiment object stores data as a named list,
let’s convert df_metadata to this format.
# Convert metadata dataframe into a named list
list_metadata <- as.list(df_metadata$values)
names(list_metadata) <- df_metadata$key
list_metadata
$abstract
[1] "Despite a global decrease in colorectal cancer (CRC) incidence, the prevalence of early onset colorectal cancer (EOCRC), or those occurring in individuals before the age of 50, has been steadily increasing over the past several decades. When compared to later onset colorectal cancers (LOCRCs) in individuals over 50, our understanding of the genetic and molecular underpinnings of EOCRCs is limited. Here, we conducted transcriptomic analysis of patient-matched normal colonic segments and tumors to identify gene expression programs involved in carcinogenesis. Amongst differentially expressed genes, we found increased expression of the c-MYC (MYC) proto-oncogene and its downstream targets in tumor samples. We identify tumors with high and low differential MYC expression, and patients with high-MYC tumors were older and overweight or obese. We also detect elevated expression of the PVT1 long-non-coding RNA (lncRNA) in most tumors and find gains in copy number for both MYC and PVT1 gene loci in 35% of tumors evaluated. Our transcriptome analysis indicates that EOCRC can be sub-classified into groups based on differential MYC expression and suggests that MYC is a critical driver of most CRCs that develop in younger patients. Overall design: Gene expression profiles of 21 surgically resected tumors and patient-matched adjacent colonic segments from early onset colorectal cancer patients were generated by deep sequencing."
$Accession
[1] "PRJNA802883; GEO: GSE196006"
$`Data Type`
[1] "Transcriptome or Gene expression"
$Scope
[1] "Multiisolate"
$Organism
[1] "Homo sapiens[Taxonomy ID: 9606]Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo; Homo sapiens"
$Publications
[1] "Marx OM et al., Transcriptome Analyses Identify Deregulated MYC in Early Onset Colorectal Cancer., Biomolecules, 2022 Sep 2;12(9)"
$Submission
[1] "Registration date: 2-Feb-2022 Penn State College of Medicine"
$Relevance
[1] "Medical"
Now let’s convert R regular dataframes into Bioconductor S4 DataFrame objects.
# Convert regular R dataframes to Bioconductor S4 DataFrame objects
colData <- DataFrame(df_coldata)
# Visualize data converted to S4 DataFrame S4 type
print(colData)
DataFrame with 42 rows and 31 columns
age_at_surgery tissue tumor_stage tissue_type
<numeric> <character> <integer> <character>
sample_name sra_study assay_type avgspotlen bases
<character> <character> <character> <integer> <numeric>
bioproject biosample bytes center_name
<character> <character> <numeric> <character>
consent datastore_filetype datastore_provider
<character> <character> <character>
datastore_region diagnosis experiment
<character> <character> <character>
instrument library_name librarylayout libraryselection
<character> <character> <character> <character>
librarysource organism patient_id platform
<character> <character> <numeric> <character>
releasedate create_date version source_name
<character> <character> <integer> <character>
[ reached 'max' / getOption("max.print") -- omitted 11 rows ]
and verify that their dimensions match.
# Check dimmensions of each dataset
cat("Dimensions of counts assay:", dim(assay(se, "counts")), "\n", "Dimensions of colData:",
dim(colData), "\n", "Dimensions of metadata:", length(list_metadata), "\n")
Dimensions of counts assay: 78428 42
Dimensions of colData: 42 31
Dimensions of metadata: 8
Finally, let’s update the SummarizedExperiment object to
include information on colData and metadata
fields.
# Update SE object
colData(se) <- colData
# Add metadata
metadata(se) <- list_metadata
se
class: SummarizedExperiment
dim: 78428 42
metadata(8): abstract Accession ... Submission Relevance
assays(5): counts counts_length_scaled counts_scaled lengths tpm
rownames(78428): ENSG00000000003.17 ENSG00000000005.6 ...
ENSG00000310592.1 ENSG00000310593.1
rowData names(3): transcript_id gene_id gene_name
colnames(42): srr17866817 srr17866818 ... srr17866857 srr17866858
colData names(31): age_at_surgery tissue ... version source_name
We’re on the right track. To move on, let’s include a column named
condition, which will convert the tumor_stage
column of colData into a binary factor variable with levels
normal and tumor. A good practice in R is that
the first level be the reference one (e.g. normal, control, or untreated
samples). This will come handy when conducting the differential
expression analysis with the volcano plots.
# Create new colData column named condition
colData(se)$condition <- ifelse(is.na(colData(se)$tumor_stage), "normal_eocrc", "tumor_eocrc")
colData(se)$condition <- as.factor(colData(se)$condition)
colData(se)$condition
[1] tumor_eocrc normal_eocrc tumor_eocrc normal_eocrc tumor_eocrc
[6] normal_eocrc tumor_eocrc normal_eocrc tumor_eocrc normal_eocrc
[11] tumor_eocrc normal_eocrc tumor_eocrc normal_eocrc tumor_eocrc
[16] normal_eocrc tumor_eocrc normal_eocrc tumor_eocrc normal_eocrc
[21] tumor_eocrc normal_eocrc tumor_eocrc normal_eocrc tumor_eocrc
[26] normal_eocrc tumor_eocrc normal_eocrc tumor_eocrc normal_eocrc
[31] tumor_eocrc normal_eocrc tumor_eocrc normal_eocrc tumor_eocrc
[36] normal_eocrc tumor_eocrc normal_eocrc tumor_eocrc normal_eocrc
[41] tumor_eocrc normal_eocrc
Levels: normal_eocrc tumor_eocrc
We can now save the se object for reuse, eliminating the
need to rerun this section.
# Save the updated `se` object to a new file
saveRDS(se, file = "gene.SummarizedExperiment.metadata_42.rds")
A DESeqDataSet (dds) object encapsulates all components
needed for DE analysis. It ensures statistical correctness and handles
data integrity by combining: - Count data (from assays) - Sample
metadata (from colData) - Experimental design (formula) - Gene metadata
(from rowData)
The first step to convert our se into
DESeqDataSet is to ensure that count data from the assay be
a matrix data type. Currently, assay(se) is dataframe class
containing lists.
# Check class and mode of each assay in se
sapply(assays(se), class)
counts counts_length_scaled counts_scaled
"data.frame" "data.frame" "data.frame"
lengths tpm
"data.frame" "data.frame"
sapply(assays(se), mode)
counts counts_length_scaled counts_scaled
"list" "list" "list"
lengths tpm
"list" "list"
Let’s convert assay(se, "counts") into a numeric matrix
and round count data.
# Extract the counts assay as a matrix
counts_mat <- as.matrix(assay(se, "counts"))
assay(se, "counts") <- round(counts_mat)
# Keep only the 'counts' assay
assays(se) <- assays(se)["counts"]
# Check class and mode of each assay in se
sapply(assays(se), class)
counts
[1,] "matrix"
[2,] "array"
sapply(assays(se), mode)
counts
"numeric"
Now, we’ll remove genes with less than 10 reads in all samples. This
practice will save computational resources when conducting the DE
analysis. It is preferable to run this task before converting from
se to a DESeqDataSet due to memory
constraints.
gene_count_10plus <- rowSums(assay(se)) > 10
se_filtered <- se[gene_count_10plus, ]
se_filtered
class: SummarizedExperiment
dim: 37865 42
metadata(8): abstract Accession ... Submission Relevance
assays(1): counts
rownames(37865): ENSG00000000003.17 ENSG00000000005.6 ...
ENSG00000310592.1 ENSG00000310593.1
rowData names(3): transcript_id gene_id gene_name
colnames(42): srr17866817 srr17866818 ... srr17866857 srr17866858
colData names(32): age_at_surgery tissue ... source_name condition
Finally, we construct DESeqDataSet without considering
any variable as part of the experimental design.
# Create a DESeqDataSet data object
deseq2_raw <- DESeqDataSet(se_filtered, design = ~1)
# Display first lines of DESeqDataSet data object
class(deseq2_raw)
[1] "DESeqDataSet"
attr(,"package")
[1] "DESeq2"
head(counts(deseq2_raw)[, 1:5], n = 3)
srr17866817 srr17866818 srr17866819 srr17866820 srr17866821
ENSG00000000003.17 5220 2132 503 1008 3229
ENSG00000000005.6 40 18 6 1 321
ENSG00000000419.15 2177 670 772 1078 2621
In essence, we have the same data but now it is a
DESeqDataSet data object.
In this section, we explore in more detail the content of the
DESeqDataSet data object.
# Display DESeqDataSet data object
deseq2_raw
class: DESeqDataSet
dim: 37865 42
metadata(9): abstract Accession ... Relevance version
assays(1): counts
rownames(37865): ENSG00000000003.17 ENSG00000000005.6 ...
ENSG00000310592.1 ENSG00000310593.1
rowData names(3): transcript_id gene_id gene_name
colnames(42): srr17866817 srr17866818 ... srr17866857 srr17866858
colData names(32): age_at_surgery tissue ... source_name condition
# Show first lines
head(counts(deseq2_raw)[, 1:5], n = 3)
srr17866817 srr17866818 srr17866819 srr17866820 srr17866821
ENSG00000000003.17 5220 2132 503 1008 3229
ENSG00000000005.6 40 18 6 1 321
ENSG00000000419.15 2177 670 772 1078 2621
# Validate NAs
X <- counts(deseq2_raw)
X <- as.data.frame(X) # convert to data.frame
print(table(is.na(X)))
FALSE
1590330
The dataset comprises 37.865 genes and 42 samples, for a total of 1.590.330 read counts, and contains no missing values (NAs).
To compare raw counts across 42 samples in a DESeq2 analysis, we need to account for library size differences. This is where normalization by sequencing depth comes handy to correct for differences in total read counts per sample. This is the baseline normalization for differential expression testing.
Let’s calculate the size factor per sample.
# Display raw counts
head(counts(deseq2_raw)[, 1:5], 5)
srr17866817 srr17866818 srr17866819 srr17866820 srr17866821
ENSG00000000003.17 5220 2132 503 1008 3229
ENSG00000000005.6 40 18 6 1 321
ENSG00000000419.15 2177 670 772 1078 2621
ENSG00000000457.15 417 352 355 309 448
ENSG00000000460.18 385 88 134 187 681
# Sum all counts per sample counts(deseq2_raw) is equivalent to assay(se,
# 'counts') from SummarizedExperiment print(colSums(counts(deseq2_raw))[1:5])
# Calculate the size factor and add it to the dataset It normalizes for
# sequencing depth differences to compare expression between samples fairly
# DESeq2 will automatically account for these depth differences in all
# downstream analyses using these size factors.
deseq2_raw <- estimateSizeFactors(deseq2_raw)
print(sizeFactors(deseq2_raw)[1:5])
srr17866817 srr17866818 srr17866819 srr17866820 srr17866821
1.1980151 0.6088379 0.9402656 0.9075319 0.8583443
Remember that,
The sizeFactor column was added to the coldData field of
the DESeqDataSet data object.
# Check that the sizeFactor column was added to colData
deseq2_raw
class: DESeqDataSet
dim: 37865 42
metadata(9): abstract Accession ... Relevance version
assays(1): counts
rownames(37865): ENSG00000000003.17 ENSG00000000005.6 ...
ENSG00000310592.1 ENSG00000310593.1
rowData names(3): transcript_id gene_id gene_name
colnames(42): srr17866817 srr17866818 ... srr17866857 srr17866858
colData names(33): age_at_surgery tissue ... condition sizeFactor
Now, all raw counts across 42 samples have accounted for library size differences.
# Now take a look at the sum of the total depth after normalization
# print(colSums(counts(deseq2_raw, normalized=TRUE))[1:5])
# To retrieve normalized read counts
counts_normalized <- counts(deseq2_raw, normalized = TRUE)
# Display normalized counts
head(counts_normalized[, 1:5], 5)
srr17866817 srr17866818 srr17866819 srr17866820 srr17866821
ENSG00000000003.17 4357.20708 3501.75301 534.955243 1110.70476 3761.8938
ENSG00000000005.6 33.38856 29.56452 6.381176 1.10189 373.9758
ENSG00000000419.15 1817.17238 1100.45709 821.044627 1187.83704 3053.5533
ENSG00000000457.15 348.07574 578.15059 377.552905 340.48390 521.9351
ENSG00000000460.18 321.36489 144.53765 142.512927 206.05336 793.3879
For RNA-seq counts, the expected variance grows with the mean, making low-count genes contribute excessive noise or making high-count genes dominate. Therefore, it is required to implement variance stabilization so that each gene contributes equally to overall variance patterns making sample relationships and biological signals easier to detect.
It consists of a raw counts transformation using the formula
log2(count + 1). It is useful for reducing the effect of
large counts making distributions more symmetric. Use this
transformation for a quick comparisons across samples and conducting
exploratory visualization with PCA and heatmaps.
DESeq2 offers two transformations to stabilize variance and thus transform data to become approximately homoskedastic (to see a flat trend in the meanSdPlot): the variance stabilizing transformation (VST) for negative binomial data with a dispersion-mean trend (Anders and Huber 2010), and the regularized-logarithm transformation or rlog (Love, Huber, and Anders 2014).
“VST is much faster to compute and is less sensitive to high count outliers than the rlog. Also, VST is recommended for medium-to-large datasets (n > 30). rlog tends to work well on small datasets (n < 30), potentially outperforming VST when there is a wide range of sequencing depth across samples (an order of magnitude difference.)”
# Implement VST
counts_vst <- vst(deseq2_raw, blind = TRUE)
# counts_vst is a DESeq2 object type head(assay(counts_vst), 3)
# Implement rlog
counts_rlog <- rlog(deseq2_raw, blind = TRUE)
# counts_rlog is a DESeq2 object type head(assay(counts_rlog), 3)
We’ll continue an Exploratory Data Analysis (EDA) considering variance-stabilized-transformed counts.
# Plot PCA on condition type plotPCA(counts_vst) plotPCA(counts_vst,
# intgroup=c('tissue','condition'))
# Plot PCA on tissue type plotPCA(counts_vst, intgroup='tissue')
# Plot PCA on both condition and tissue type plotPCA(counts_vst,
# intgroup=c('condition', 'tissue'))
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6
Standard deviation 74.4640 65.5317 52.96238 48.20994 39.26859 36.6645
PC7 PC8 PC9 PC10 PC11 PC12
Standard deviation 34.71566 31.20532 31.02371 29.02386 28.41309 27.97506
PC13 PC14 PC15 PC16 PC17 PC18
Standard deviation 27.76684 26.90759 26.30617 26.15321 25.27805 24.75281
PC19 PC20 PC21 PC22 PC23 PC24
Standard deviation 24.39675 24.34059 23.7550 23.19946 23.08138 22.86886
PC25 PC26 PC27 PC28 PC29 PC30
Standard deviation 22.56182 22.42150 21.81255 21.56427 21.26939 21.20621
PC31 PC32 PC33 PC34 PC35 PC36
Standard deviation 20.47515 20.16900 19.98894 19.78820 19.61382 19.56957
PC37 PC38 PC39 PC40 PC41 PC42
Standard deviation 19.30610 18.98985 18.75413 18.31402 17.54750 1.949e-13
[ reached 'max' / getOption("max.print") -- omitted 2 rows ]
PC1 and PC2 components explains approximately 26% of the total variability. We also identified 3 potential outliers from both conditions, which corresponds to rectum samples.
The Mahalanobis distance (MD) can identify outliers beyond the 99.9th
percentile. According to this criterion, only the srr17866844 sample
appears to be a potential outlier. However, we decided not to remove it
since its paired sample, srr17866843, does not represent an outlier for
the tumor_eocrc condition.
# Detect outliers in PCA
pc_scores <- pca$x[, 1:5] # Use first PCs
d <- mahalanobis(pc_scores, colMeans(pc_scores), cov(pc_scores))
# Set threshold at 99.9 percentile
thr <- quantile(d, 0.999)
# Enumerate outliers
outliers <- names(d[d > thr])
outliers
[1] "srr17866844"
Before conducting the DESeq2 analysis, let’s build the design matrix and evaluate which variables are potential confounded ones (condition, tissue, age_at_surgery).
Since each tissue has distinct transcriptional profiles and the
normal vs tumor comparison is made within the same tissue,
tissue is a potential confounded variable.
Now after the data exploration we can run the differential expression pipeline on the raw counts with a single call to the function DESeq. The null hypothesis is that there is no systematic difference between the average read count values of the different conditions for a given gene. We will calculate the fold change of read counts, assuming de differences in sequencing depth and variability. We will test normal tissue versus cancerous tissue (normal used as denominator for the fold change calculation)
# Run DGE DESeq2 pipeline Display se_filtered se_filtered
# Create a DESeqDataSet data object
dds <- DESeqDataSet(se_filtered, design = ~condition)
# Account for sequencing depth differences to compare expression between
# samples fairly
dds <- estimateSizeFactors(dds)
dds
class: DESeqDataSet
dim: 37865 42
metadata(9): abstract Accession ... Relevance version
assays(1): counts
rownames(37865): ENSG00000000003.17 ENSG00000000005.6 ...
ENSG00000310592.1 ENSG00000310593.1
rowData names(3): transcript_id gene_id gene_name
colnames(42): srr17866817 srr17866818 ... srr17866857 srr17866858
colData names(33): age_at_surgery tissue ... condition sizeFactor
# Run DGE DESeq2 pipeline
dds_DGE <- DESeq(dds)
dds_DGE
class: DESeqDataSet
dim: 37865 42
metadata(9): abstract Accession ... Relevance version
assays(6): counts mu ... replaceCounts replaceCooks
rownames(37865): ENSG00000000003.17 ENSG00000000005.6 ...
ENSG00000310592.1 ENSG00000310593.1
rowData names(26): transcript_id gene_id ... maxCooks replace
colnames(42): srr17866817 srr17866818 ... srr17866857 srr17866858
colData names(34): age_at_surgery tissue ... sizeFactor replaceable
saveRDS(dds_DGE, file = "dds_DGE_42.rds")
We proceed to extract results obtained from the DESeq2 pipeline to obtain the base means across samples, the log2 fold variation, standard errors, p-value and p-adjusted.
# Get results from the DESeq2 pipeline
dds_DGE_results <- results(dds_DGE)
head(dds_DGE_results, n = 5)
log2 fold change (MLE): condition tumor eocrc vs normal eocrc
Wald test p-value: condition tumor eocrc vs normal eocrc
DataFrame with 5 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000000003.17 2857.2111 0.161688 0.271413 0.595726 5.51358e-01
ENSG00000000005.6 51.2191 1.327558 0.446539 2.972994 2.94910e-03
ENSG00000000419.15 1670.0659 1.228388 0.221125 5.555184 2.77320e-08
ENSG00000000457.15 485.0428 -0.355986 0.147088 -2.420218 1.55112e-02
ENSG00000000460.18 223.4476 1.387779 0.206574 6.718087 1.84126e-11
padj
<numeric>
ENSG00000000003.17 6.90013e-01
ENSG00000000005.6 1.27134e-02
ENSG00000000419.15 7.86926e-07
ENSG00000000457.15 4.83895e-02
ENSG00000000460.18 1.66796e-09
# Inspect the meaning of the columns from the results object
mcols(dds_DGE_results, use.names = TRUE)
DataFrame with 6 rows and 2 columns
type description
<character> <character>
baseMean intermediate mean of normalized c..
log2FoldChange results log2 fold change (ML..
lfcSE results standard error: cond..
stat results Wald statistic: cond..
pvalue results Wald test p-value: c..
padj results BH adjusted p-values
The results object contains the following features: baseMean, log2FoldChange, lfcSE, stat, pvalue, and padj. Let’s get a summary of the results object. Now, let’s obtain a summary of the results object.
# Summary of DGE
summary(dds_DGE_results)
out of 37827 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 6942, 18%
LFC < 0 (down) : 6130, 16%
outliers [1] : 0, 0%
low counts [2] : 4438, 12%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
After filtering those genes with p-adjusted < 0.05, we obtained 10.806 significant genes.
# Filter genes with p-adjusted value < 0.05 The results object behaves like a
# dataframe
table(dds_DGE_results$padj < 0.05)
FALSE TRUE
22621 10806
and when we order those significant genes by p-adjusted value:
# Select significant genes with a p-adjusted value lower than 0.05
resSig <- subset(dds_DGE_results, padj < 0.05)
# Order significant genes by p-adjusted value
resSig <- resSig[order(resSig$padj), ]
head(resSig, n = 5)
log2 fold change (MLE): condition tumor eocrc vs normal eocrc
Wald test p-value: condition tumor eocrc vs normal eocrc
DataFrame with 5 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000105464.5 140.904 4.17270 0.354597 11.7674 5.74440e-32
ENSG00000164283.13 109.783 5.23269 0.453264 11.5445 7.87246e-31
ENSG00000175832.14 726.727 4.37081 0.398346 10.9724 5.18803e-28
ENSG00000107159.15 388.768 5.63686 0.553410 10.1857 2.29728e-24
ENSG00000163347.6 625.018 4.50301 0.441776 10.1930 2.13172e-24
padj
<numeric>
ENSG00000105464.5 1.92018e-27
ENSG00000164283.13 1.31576e-26
ENSG00000175832.14 5.78068e-24
ENSG00000107159.15 1.02868e-20
ENSG00000163347.6 1.02868e-20
Let’s check those genes with the strongest down-regulation and up-regulation, respectively.
# Order significant genes with the strongest down-regulation
head(resSig[order(resSig$log2FoldChange), ])
log2 fold change (MLE): condition tumor eocrc vs normal eocrc
Wald test p-value: condition tumor eocrc vs normal eocrc
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000113196.4 24.44350 -6.73467 1.049744 -6.41553 1.40332e-10
ENSG00000167916.5 73.04398 -6.15290 0.880134 -6.99086 2.73197e-12
ENSG00000286765.2 5.62746 -5.88870 1.159520 -5.07856 3.80300e-07
ENSG00000172410.5 2143.05931 -5.59589 0.943165 -5.93310 2.97272e-09
ENSG00000167281.21 18.20558 -5.46715 0.797529 -6.85512 7.12544e-12
ENSG00000168748.14 1718.92263 -5.26274 0.657795 -8.00059 1.23828e-15
padj
<numeric>
ENSG00000113196.4 9.03832e-09
ENSG00000167916.5 3.47230e-10
ENSG00000286765.2 7.17398e-06
ENSG00000172410.5 1.17876e-07
ENSG00000167281.21 7.70815e-10
ENSG00000168748.14 4.54858e-13
# Order significant genes with the strongest up-regulation
head(resSig[order(resSig$log2FoldChange, decreasing = TRUE), ])
log2 fold change (MLE): condition tumor eocrc vs normal eocrc
Wald test p-value: condition tumor eocrc vs normal eocrc
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000237550.8 17.3719 23.45433 2.92000 8.03231 9.56526e-16
ENSG00000184956.16 18.8421 7.59803 1.35908 5.59057 2.26321e-08
ENSG00000270372.2 16.9893 7.14581 1.22739 5.82198 5.81561e-09
ENSG00000170373.9 396.5527 7.03499 0.68932 10.20569 1.86982e-24
ENSG00000129455.17 44.2615 6.66333 1.00340 6.64077 3.12050e-11
ENSG00000108244.17 683.7777 6.42894 0.64485 9.96967 2.06915e-23
padj
<numeric>
ENSG00000237550.8 3.67515e-13
ENSG00000184956.16 6.61876e-07
ENSG00000270372.2 2.08135e-07
ENSG00000170373.9 1.02868e-20
ENSG00000129455.17 2.55034e-09
ENSG00000108244.17 6.28777e-20
# Save significant DGE results to file
write.table(resSig, file = "DESeq2_sig_genes_normal_vs_cancer_42_eocrc.tab", sep = "\t",
quote = FALSE, row.names = FALSE)
To visualize the counts for ENSG00000237550.8 gene, which showed the
lowest p-adjusted value of 3.67515e-13 and strongest up-regulation
log2FoldChange of 23.45433, we’ll use the plotCounts function over the
feature condition to get the normalized counts for this
gene.
*An MA-plot (Dudoit et al. 2002) provides a useful overview for the distribution of the estimated coefficients in the model, e.g. the comparisons of interest, across all genes. On the y-axis, the “M” stands for “minus” – subtraction of log values is equivalent to the log of the ratio – and on the x-axis, the “A” stands for “average”. This plot allows us to evaluate fold changes and the distribution around the mean expression.
Before making the MA-plot, we use the lfcShrink function to shrink the log2 fold changes for the comparison of dex treated vs untreated samples. There are three types of shrinkage estimators in DESeq2, which are covered in the DESeq2 vignette. log2 fold shrinkage remove the noise preserving the large differences.*
[1] "Intercept"
[2] "condition_tumor_eocrc_vs_normal_eocrc"
Volcano plot helps to visualize significant differential expressed genes showing the log2 fold change and the significance (p-adjusted).
To export a .tex table having the top significantly differentiated genes of the EOCRC cohort
In this section, we will add the gene symbol, gene name, KEGG pathway, and Entrez ID to our dataset to provide more informative annotations for the DESeq analysis, which currently uses ENSEMBL gene ID nomenclature (e.g., ENSG00000175832).
# Load modules and libraries
BiocManager::install("AnnotationDbi")
BiocManager::install("org.Hs.eg.db")
library("AnnotationDbi")
library("org.Hs.eg.db")
# Check all potential terms that could be retrieved with AnnotationDbi using
# mapsIds()
columns(org.Hs.eg.db)
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
[6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
[11] "GENETYPE" "GO" "GOALL" "IPI" "MAP"
[16] "OMIM" "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM"
[21] "PMID" "PROSITE" "REFSEQ" "SYMBOL" "UCSCKG"
[26] "UNIPROT"
# Review the most significant genes from our study Previously, we've saved a in
# a .tab file the significant DGE results file =
# 'DESeq2_sig_genes_normal_vs_cancer_42_eocrc.tab'
head(resSig)
log2 fold change (MLE): condition tumor eocrc vs normal eocrc
Wald test p-value: condition tumor eocrc vs normal eocrc
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000105464.5 140.904 4.17270 0.354597 11.7674 5.74440e-32
ENSG00000164283.13 109.783 5.23269 0.453264 11.5445 7.87246e-31
ENSG00000175832.14 726.727 4.37081 0.398346 10.9724 5.18803e-28
ENSG00000107159.15 388.768 5.63686 0.553410 10.1857 2.29728e-24
ENSG00000163347.6 625.018 4.50301 0.441776 10.1930 2.13172e-24
ENSG00000170373.9 396.553 7.03499 0.689320 10.2057 1.86982e-24
padj
<numeric>
ENSG00000105464.5 1.92018e-27
ENSG00000164283.13 1.31576e-26
ENSG00000175832.14 5.78068e-24
ENSG00000107159.15 1.02868e-20
ENSG00000163347.6 1.02868e-20
ENSG00000170373.9 1.02868e-20
We proceed by adding annotations starting from ENSEMBL to Entrez and from there to KEGG pathways, gene symbols and gene names. This is because, all major annotation databases (KEGG, org.Hs.eg.db, clusterProfiler) are indexed by Entrez, so doing the conversion first dramatically reduces NA values and gives more complete annotations.
# Strip version
rownames(resSig) <- sub("\\..*$", "", rownames(resSig))
# Add gene symbol column
resSig$SYMBOL = mapIds(org.Hs.eg.db, keys = rownames(resSig), column = "SYMBOL",
keytype = "ENSEMBL", multiVals = "first")
# Add gene name column
resSig$GENENAME = mapIds(org.Hs.eg.db, keys = rownames(resSig), column = "GENENAME",
keytype = "ENSEMBL", multiVals = "first")
# Add gene Entrez ID column
resSig$ENTREZID = mapIds(org.Hs.eg.db, keys = rownames(resSig), column = "ENTREZID",
keytype = "ENSEMBL", multiVals = "first")
# Add gene EGG functional pathway column
resSig$PATH = mapIds(org.Hs.eg.db, keys = rownames(resSig), column = "PATH", keytype = "ENSEMBL",
multiVals = "first")
# Add gene ENSEMBL ID column
resSig$ENSEMBL <- rownames(resSig)
# Convert to dataframe
resSig_df <- as.data.frame(resSig)
print(head(resSig_df), 5)
baseMean log2FoldChange lfcSE stat pvalue
ENSG00000105464 140.9041 4.172698 0.3545970 11.76744 5.744397e-32
ENSG00000164283 109.7826 5.232688 0.4532636 11.54447 7.872463e-31
ENSG00000175832 726.7266 4.370810 0.3983460 10.97239 5.188033e-28
ENSG00000107159 388.7679 5.636863 0.5534101 10.18569 2.297276e-24
padj SYMBOL
ENSG00000105464 1.920180e-27 GRIN2D
ENSG00000164283 1.315764e-26 ESM1
ENSG00000175832 5.780679e-24 ETV4
ENSG00000107159 1.028679e-20 CA9
GENENAME ENTREZID
ENSG00000105464 glutamate ionotropic receptor NMDA type subunit 2D 2906
ENSG00000164283 endothelial cell specific molecule 1 11082
ENSG00000175832 ETS variant transcription factor 4 2118
ENSG00000107159 carbonic anhydrase 9 768
PATH ENSEMBL
ENSG00000105464 04020 ENSG00000105464
ENSG00000164283 <NA> ENSG00000164283
ENSG00000175832 <NA> ENSG00000175832
ENSG00000107159 00910 ENSG00000107159
[ reached 'max' / getOption("max.print") -- omitted 2 rows ]
# Save data frame including annotations
write.table(resSig_df, file = ("DESeq2_sig_genes_normal_vs_cancer_42_eocrc_annotated.tab"),
row.names = F, col.names = TRUE, sep = "\t")
They often reflect: biological heterogeneity, batch effects, cell-type composition differences, high expression strength, and technical noise.
# Load modules and libraries
BiocManager::install("clusterProfiler")
BiocManager::install("ggnewscale")
library("clusterProfiler")
library("enrichplot")
library("ggnewscale")
ID Description Count
GO:0001776 GO:0001776 leukocyte homeostasis 49
GO:0002200 GO:0002200 somatic diversification of immune receptors 31
GO:0002252 GO:0002252 immune effector process 251
GO:0002253 GO:0002253 activation of immune response 222
GO:0002262 GO:0002262 myeloid cell homeostasis 81
GO:0002339 GO:0002339 B cell selection 2
GeneRatio
GO:0001776 49/8835
GO:0002200 31/8835
GO:0002252 251/8835
GO:0002253 222/8835
GO:0002262 81/8835
GO:0002339 2/8835
# Convert to dataframe
ego_df <- as.data.frame(ego)
# Visualize the last columns
head(ego_df, 5)
ID Description GeneRatio BgRatio
GO:0042254 GO:0042254 ribosome biogenesis 237/6884 318/18860
GO:0022613 GO:0022613 ribonucleoprotein complex biogenesis 322/6884 496/18860
GO:0006364 GO:0006364 rRNA processing 168/6884 223/18860
GO:0016072 GO:0016072 rRNA metabolic process 188/6884 270/18860
RichFactor FoldEnrichment zScore pvalue p.adjust
GO:0042254 0.7452830 2.041842 14.20564 5.300063e-44 3.376670e-40
GO:0022613 0.6491935 1.778587 13.32257 1.084468e-38 3.454574e-35
GO:0006364 0.7533632 2.063979 12.11773 1.673786e-32 3.554564e-29
GO:0016072 0.6962963 1.907633 11.38876 9.473546e-29 1.508899e-25
qvalue
GO:0042254 2.814612e-40
GO:0022613 2.879549e-35
GO:0006364 2.962895e-29
GO:0016072 1.257738e-25
geneID
GO:0042254 RPP40/DKC1/NOP58/NOB1/NLE1/UTP14A/RPUSD4/UTP25/NOL8/RRP9/DCAF13/RCL1/WDR43/DDX10/BRIX1/PRKDC/BOP1/MTERF3/WDR75/EXOSC2/TMA16/EXOSC3/GTPBP10/WDR12/RPF2/NPM1/GTF3A/RPS21/DDX31/BYSL/EXOSC5/NOP56/UTP4/NPM3/NAT10/RIOX2/GTPBP4/NHP2/PNO1/HEATR1/EIF1AX/PA2G4/RRP15/MRM2/EXOSC7/LYAR/FBL/EXOSC8/DDX56/UTP18/NOLC1/RAN/DIS3/DDX27/NMD3/WDR74/TRMT61B/WDR3/RRS1/AIRIM/RRP1B/GAR1/NSUN5/NOP2/DDX18/EIF2A/RPUSD1/DIMT1/XRN2/PAK1IP1/RBIS/NOP16/TFB1M/METTL5/UTP23/TRMT112/BUD23/PES1/MRPS2/IMP4/RPL14/CUL4A/RPL7A/RPL35/RIOK1/XPO1/EMG1/NOL6/RPS6/EBNA1BP2/PDCD11/SRFBP1/PIH1D2/RPL7L1/NOL10/UTP20/MALSU1/SUV39H1/RPL10L/RPL7/URB2/RPS7/ZNF593/PWP1/LTV1/RPS16/RPS19/MTG2/NOP14/RIOK3/NOM1/RRP1/RNASEL/MPHOSPH10/WDR46/GEMIN4/CHD7/RPSA/RPS15/LAS1L/MAK16/UTP15/MRTO4/RPLP0/NUDT16/CDKN2A/SIRT7/PPAN/RPL26L1/WDR18/POP7/RPS5/TFB2M/RPL35A/EXOSC4/DDX47/REXO5/TSR1/WDR36/RPL38/REXO4/RSL24D1/SNU13/RPL24/UTP6/SPOUT1/RPS13/ZNHIT6/RPS12/DDX55/RPLP0P6/NIP7/RPS15A/FASTKD2/DDX21/KRR1/RPL27/SDAD1/AATF/DDX52/ISG20/ERI1/RPS17/MYBBP1A/RPS4X/UTP11/EIF4A3/NOL11/URB1/KAT2B/GNL2/RPS27A/AFG2A/C1QBP/RPS28/RPS25/NOL7/GNL3L/GREB1L/ZNHIT3/EXOSC9/RPP30/RPUSD2/DDX28/DHX37/NOC2L/EIF6/RPS14/RPS8/RPS24/ESF1/SURF6/HEATR3/RPS23/BMS1/NOC4L/MCAT/RPL26/POP5/RRP7A/EXOSC1/TBL3/NAF1/PIN4/RPS9/GTF2H5/ABT1/DDX49/RPS27/USP36/NSA2/TENT4B/RPL5/MTREX/ZCCHC4/EIF5B/NUP88/ERCC2/DDX54/RPS3A/DHX30/CUL4B/TRMT2B/MDN1/DROSHA/LSM6/RPL11/RPSA2/MYG1/RBFA/RBM34/FAU/ISG20L2/NOA1/RPS11/ERAL1/EIF5
GO:0022613 RPP40/DKC1/NOP58/NOB1/MCTS2/NLE1/UTP14A/RPUSD4/UTP25/NOL8/RRP9/CLNS1A/DCAF13/HSP90AB1/RCL1/RUVBL1/NUFIP1/EIF2S2/WDR43/DDX10/BRIX1/PRKDC/BOP1/WDR77/MTERF3/WDR75/EXOSC2/TMA16/EXOSC3/GTPBP10/WDR12/RPF2/NPM1/GTF3A/RPS21/DDX31/BYSL/EXOSC5/SRPK1/NOP56/UTP4/NPM3/NAT10/RIOX2/SNRPF/GTPBP4/NHP2/PNO1/HEATR1/EIF3B/EIF1AX/HSP90AB3P/PA2G4/RRP15/MRM2/EXOSC7/LYAR/FBL/EXOSC8/DDX56/UTP18/NOLC1/SRSF5/SNRPE/RAN/DIS3/DDX27/ATR/STRAP/NMD3/EIF3E/WDR74/NOPCHAP1/TRMT61B/WDR3/RRS1/AIRIM/EIF3M/RRP1B/GAR1/NSUN5/PRPF19/NOP2/DDX18/SNRPB/EIF2A/SRPK2/RPUSD1/DIMT1/XRN2/PAK1IP1/RBIS/NOP16/TFB1M/EIF2S3/GEMIN6/METTL5/UTP23/TRMT112/SF3B3/BUD23/GEMIN5/SNRPC/PES1/MRPS2/IMP4/RPL14/CUL4A/EIF3H/RPL7A/RPL35/RIOK1/XPO1/SNRPD2/EMG1/NOL6/RPS6/EBNA1BP2/PDCD11/SRFBP1/PIH1D2/RPL7L1/NOL10/SNRPB2/UTP20/MALSU1/SUV39H1/RPL10L/RPL7/URB2/RPS7/ZNF593/PWP1/AAR2/LTV1/USP39/DENR/TGS1/PRKRA/SNRPG/PTGES3/RPS16/RPS19/MTG2/NOP14/RIOK3/NOM1/TAF9/RRP1/RNASEL/MPHOSPH10/WDR46/GEMIN4/CHD7/RPSA/SNRPD1/RPS15/LAS1L/MAK16/UTP15/SHQ1/MRTO4/RPLP0/NUDT16/CDKN2A/LSM2/SIRT7/TARBP2/PPAN/RPL26L1/MCTS1/RBMX2/EIF4B/WDR18/EIF3D/POP7/CELF3/RPS5/TFB2M/RPL35A/SMN1/EXOSC4/DDX47/REXO5/TSR1/WDR36/RPL38/REXO4/RSL24D1/SNU13/HSP90AA1/RPL24/UTP6/SNRPA1/SPOUT1/CELF4/RPS13/ZNHIT6/SF3A3/SNIP1/RPS12/DDX55/EIF3J/DDX20/RPLP0P6/NIP7/RPS15A/FASTKD2/DDX21/KRR1/RPL27/SDAD1/AATF/SNRPD3/RUVBL2/DDX52/AGO2/ISG20/GEMIN8/ERI1/RPS17/MYBBP1A/RPS4X/UTP11/LSM4/EIF4A3/NOL11/PRPF6/URB1/KAT2B/GNL2/RPS27A/AFG2A/C1QBP/RPS28/RPS25/NOL7/GNL3L/GREB1L/ZNHIT3/EXOSC9/RPP30/RPUSD2/DDX28/DHX37/NOC2L/PHF5A/EIF6/RBM5/RPS14/RPS8/RPS24/ESF1/SURF6/CELF6/HEATR3/PRMT5/RPS23/RNU2-1/BMS1/NOC4L/GEMIN2/RNU6-7/SF3B6/MCAT/RPL26/POP5/LUC7L2/RRP7A/EXOSC1/TBL3/NAF1/EIF3G/RPL13A/SRSF1/SRSF12/EIF3C/PIN4/RPS9/SNRNP200/GTF2H5/ABT1/NCBP1/DDX49/RPS27/USP36/CRNKL1/NSA2/TENT4B/SRSF10/RPL5/MTREX/RNVU1-4/ZCCHC4/EIF5B/CELF5/COIL/NUP88/ERCC2/DDX54/EIF3CL/RPS3A/RNU6ATAC/DHX30/CUL4B/TRMT2B/MDN1/DROSHA/DDX1/LSM6/RPL11/RPSA2/MYG1/RBFA/RBM34/FAU/ISG20L2/NOA1/RPS11/ERAL1/EIF5/RSRP1
GO:0006364 RPP40/DKC1/NOP58/NOB1/UTP14A/RPUSD4/UTP25/NOL8/RRP9/DCAF13/RCL1/WDR43/DDX10/BRIX1/PRKDC/BOP1/WDR75/EXOSC2/EXOSC3/WDR12/RPF2/RPS21/BYSL/EXOSC5/NOP56/UTP4/NPM3/NAT10/GTPBP4/NHP2/HEATR1/PA2G4/RRP15/MRM2/EXOSC7/LYAR/FBL/EXOSC8/DDX56/UTP18/NOLC1/DIS3/DDX27/WDR74/TRMT61B/WDR3/RRS1/RRP1B/GAR1/NSUN5/NOP2/DDX18/RPUSD1/DIMT1/XRN2/PAK1IP1/TFB1M/METTL5/UTP23/TRMT112/BUD23/PES1/IMP4/RPL14/RPL7A/RPL35/RIOK1/EMG1/NOL6/RPS6/EBNA1BP2/PDCD11/SRFBP1/PIH1D2/RPL7L1/NOL10/UTP20/SUV39H1/RPL7/RPS7/PWP1/RPS16/RPS19/NOP14/RIOK3/RRP1/RNASEL/MPHOSPH10/WDR46/GEMIN4/CHD7/RPS15/LAS1L/MAK16/UTP15/MRTO4/NUDT16/CDKN2A/SIRT7/PPAN/WDR18/POP7/TFB2M/RPL35A/EXOSC4/DDX47/REXO5/TSR1/WDR36/REXO4/SNU13/UTP6/SPOUT1/ZNHIT6/DDX55/DDX21/KRR1/RPL27/DDX52/ISG20/ERI1/RPS17/UTP11/EIF4A3/NOL11/URB1/KAT2B/RPS28/RPS25/NOL7/ZNHIT3/EXOSC9/RPP30/RPUSD2/DHX37/EIF6/RPS14/RPS8/RPS24/ESF1/BMS1/NOC4L/RPL26/POP5/RRP7A/EXOSC1/TBL3/NAF1/PIN4/GTF2H5/ABT1/DDX49/RPS27/USP36/NSA2/TENT4B/RPL5/MTREX/ZCCHC4/ERCC2/DDX54/TRMT2B/DROSHA/LSM6/RPL11/MYG1/RBFA/RBM34
GO:0016072 RPP40/DKC1/NOP58/NOB1/POLR1F/MYC/UTP14A/RPUSD4/UTP25/NOL8/RRP9/DCAF13/RCL1/WDR43/DDX10/BRIX1/PRKDC/BOP1/POLR1G/WDR75/EXOSC2/NIFK/EXOSC3/WDR12/RPF2/GTF3A/RPS21/BYSL/POLR1D/EXOSC5/NOP56/UTP4/NPM3/POLR1B/NAT10/GTPBP4/NHP2/HEATR1/PA2G4/RRP15/MRM2/EXOSC7/LYAR/FBL/EXOSC8/DDX56/UTP18/NOLC1/DIS3/DDX27/WDR74/TRMT61B/WDR3/RRS1/RRP1B/GAR1/NSUN5/GTF3C3/NOP2/DDX18/RPUSD1/MAPT/DIMT1/XRN2/PAK1IP1/TFB1M/METTL5/UTP23/TRMT112/BUD23/PES1/IMP4/RPL14/RPL7A/RPL35/RIOK1/NCL/POLR1A/EMG1/NOL6/RPS6/EBNA1BP2/PDCD11/SRFBP1/PIH1D2/RPL7L1/NOL10/UTP20/SUV39H1/POLR1E/RPL7/RPS7/PWP1/RPS16/RPS19/NOP14/RIOK3/RRP1/RNASEL/MPHOSPH10/WDR46/GTF3C4/GEMIN4/CHD7/RPS15/LAS1L/MAK16/UTP15/MRTO4/NUDT16/CDKN2A/SIRT7/PPAN/TCOF1/MARS1/WDR18/POP7/TFB2M/RPL35A/EXOSC4/DDX47/REXO5/TSR1/WDR36/REXO4/SNU13/UTP6/SPOUT1/ZNHIT6/TAF1B/DDX55/DDX21/KRR1/RPL27/DDX52/ISG20/ERI1/RPS17/UTP11/EIF4A3/NOL11/URB1/DDX11/KAT2B/RPS28/RPS25/NOL7/ZNHIT3/EXOSC9/RPP30/RPUSD2/DHX37/EIF6/RPS14/RPS8/RPS24/ESF1/BMS1/NOC4L/CAVIN1/RPL26/POP5/RRP7A/EXOSC1/TBL3/NAF1/PIN4/TP53/GTF2H5/ABT1/DDX49/RPS27/USP36/NSA2/TENT4B/RPL5/MTREX/ZCCHC4/ERCC2/ERN2/DDX54/TRMT2B/DROSHA/LSM6/RPL11/MYG1/RBFA/RBM34
Count
GO:0042254 237
GO:0022613 322
GO:0006364 168
GO:0016072 188
[ reached 'max' / getOption("max.print") -- omitted 1 rows ]
# Get dataframe dimmensions
dim(ego_df)
[1] 488 12
The Gene-Concept Network displays which genes are associated with which biological processes, making it easy to see gene-function relationships. Meanwhile, the enrichment map groups similar terms together. Terms that share genes are connected, making related functions easy to spot.
# Load modules and libraries
library(DOSE)
# Convert gene ID to Symbol
egox <- setReadable(ego, "org.Hs.eg.db", "ENTREZID")
# Generate geneList with Log2FoldChange data
geneList <- resSig$log2FoldChange
names(geneList) <- as.character(unique(resSig$ENTREZID))
geneList <- sort(geneList, decreasing = TRUE)
head(geneList)
338094 81502 23333 1469 4085 25984
23.454332 7.598031 7.145805 7.034988 6.663326 6.428938
# Network plot of enriched terms CategorySize can be scaled by 'pvalue' or
# 'geneNum'
p1 <- cnetplot(egox, categorySize = "pvalue", foldChange = geneList)
# Display the plot
print(p1)
# Calculate pairwise similarities of enriched terms
egox_pairwise <- pairwise_termsim(egox)
# Creates an enrichment map visualization from pairwise enrichment analysis
# results
p2 <- emapplot(egox_pairwise, layout = "kk")
# Display the plot
print(p2)
# Plot Gene-Concept Network and Enrichment Map in a PDF file
# Plot 1
pdf("gene_concept_network_plot.pdf", width = 12, height = 10)
print(p1)
dev.off()
quartz_off_screen
2
# Plot 2
pdf("enrichment_map.pdf", width = 12, height = 10)
print(p2)
dev.off()
quartz_off_screen
2
# # Start PDF device pdf(file=paste('Merge_GO_Gene_Set_Enrichment_Analysis',
# 'CC', '.pdf', sep='_'), width=25, height=10) # Create the plot
# cowplot::plot_grid(p1, p2, ncol=2, labels=LETTERS[1:3], rel_widths=c(.8, .8,
# 1.2)) # Close PDF device dev.off()
# Get results from the DESeq2 pipeline by contrast
tumor_vs_normal_42_dds <- results(dds_DGE, contrast = c("condition", "tumor_eocrc",
"normal_eocrc"))
summary(tumor_vs_normal_42_dds)
out of 37827 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 6942, 18%
LFC < 0 (down) : 6130, 16%
outliers [1] : 0, 0%
low counts [2] : 4438, 12%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
# Save results
write.csv(tumor_vs_normal_42_dds, "tumor_vs_normal_42_dds.csv")
Get differentially expressed genes by contrast
tumor_eocrc_vs_normal_eocrc:
log2 fold change (MLE): condition tumor_eocrc vs normal_eocrc
Wald test p-value: condition tumor eocrc vs normal eocrc
DataFrame with 5 rows and 5 columns
log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000105464.5 4.17270 0.354597 11.7674 5.74440e-32 1.92018e-27
ENSG00000164283.13 5.23269 0.453264 11.5445 7.87246e-31 1.31576e-26
ENSG00000175832.14 4.37081 0.398346 10.9724 5.18803e-28 5.78068e-24
ENSG00000107159.15 5.63686 0.553410 10.1857 2.29728e-24 1.02868e-20
ENSG00000163347.6 4.50301 0.441776 10.1930 2.13172e-24 1.02868e-20
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.5.1 (2025-06-13)
os macOS Ventura 13.7.3
system x86_64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Madrid
date 2026-02-03
pandoc 3.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/x86_64/ (via rmarkdown)
quarto 1.5.57 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
abind 1.4-8 2024-09-12 [1] CRAN (R 4.5.0)
affy 1.88.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
affyio 1.80.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
AnnotationDbi * 1.72.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
ape 5.8-1 2024-12-16 [1] CRAN (R 4.5.0)
apeglm * 1.32.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
aplot 0.2.9 2025-09-12 [1] CRAN (R 4.5.1)
base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.5.0)
bbmle 1.0.25.1 2023-12-09 [1] CRAN (R 4.5.0)
bdsmatrix 1.3-7 2024-03-02 [1] CRAN (R 4.5.0)
beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.5.0)
Biobase * 2.70.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
BiocGenerics * 0.56.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
BiocManager * 1.30.27 2025-11-14 [1] CRAN (R 4.5.1)
BiocParallel 1.44.0 2025-10-30 [1] Bioconductor 3.22 (R 4.5.1)
Biostrings 2.78.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
bit 4.6.0 2025-03-06 [1] CRAN (R 4.5.0)
bit64 4.6.0-1 2025-01-16 [1] CRAN (R 4.5.0)
blob 1.2.4 2023-03-17 [1] CRAN (R 4.5.0)
bslib 0.9.0 2025-01-30 [1] CRAN (R 4.5.0)
cachem 1.1.0 2024-05-16 [1] CRAN (R 4.5.0)
cli 3.6.5 2025-04-23 [1] CRAN (R 4.5.0)
cluster 2.1.8.1 2025-03-12 [1] CRAN (R 4.5.1)
clusterProfiler * 4.18.4 2025-12-15 [1] Bioconductor 3.22 (R 4.5.2)
coda 0.19-4.1 2024-01-31 [1] CRAN (R 4.5.0)
codetools 0.2-20 2024-03-31 [1] CRAN (R 4.5.1)
corrplot * 0.95 2024-10-14 [1] CRAN (R 4.5.0)
cowplot 1.2.0 2025-07-07 [1] CRAN (R 4.5.1)
crayon 1.5.3 2024-06-20 [1] CRAN (R 4.5.0)
data.table 1.17.8 2025-07-10 [1] CRAN (R 4.5.1)
DBI 1.2.3 2024-06-02 [1] CRAN (R 4.5.0)
DelayedArray 0.36.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
DESeq2 * 1.50.2 2025-11-17 [1] Bioconductor 3.22 (R 4.5.2)
devtools 2.4.6 2025-10-03 [1] CRAN (R 4.5.1)
digest 0.6.39 2025-11-19 [1] CRAN (R 4.5.1)
DOSE * 4.4.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.5.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.5.0)
emdbook 1.3.14 2025-07-23 [1] CRAN (R 4.5.1)
enrichplot * 1.30.4 2025-12-01 [1] Bioconductor 3.22 (R 4.5.2)
evaluate 1.0.5 2025-08-27 [1] CRAN (R 4.5.1)
factoextra * 1.0.7 2020-04-01 [1] CRAN (R 4.5.0)
farver 2.1.2 2024-05-13 [1] CRAN (R 4.5.0)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.5.0)
fastmatch 1.1-6 2024-12-23 [1] CRAN (R 4.5.0)
fgsea 1.36.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
fontBitstreamVera 0.1.1 2017-02-01 [1] CRAN (R 4.5.0)
fontLiberation 0.1.0 2016-10-15 [1] CRAN (R 4.5.0)
fontquiver 0.2.1 2017-02-01 [1] CRAN (R 4.5.0)
formatR 1.14 2023-01-17 [1] CRAN (R 4.5.0)
fs 1.6.6 2025-04-12 [1] CRAN (R 4.5.0)
gdtools 0.4.4 2025-10-06 [1] CRAN (R 4.5.1)
generics * 0.1.4 2025-05-09 [1] CRAN (R 4.5.0)
GenomicRanges * 1.62.1 2025-12-08 [1] Bioconductor 3.22 (R 4.5.2)
ggbeeswarm * 0.7.3 2025-11-29 [1] CRAN (R 4.5.1)
ggforce 0.5.0 2025-06-18 [1] CRAN (R 4.5.0)
ggfortify * 0.4.19 2025-07-27 [1] CRAN (R 4.5.1)
ggfun 0.2.0 2025-07-15 [1] CRAN (R 4.5.1)
ggiraph 0.9.2 2025-10-07 [1] CRAN (R 4.5.1)
ggnewscale * 0.5.2 2025-06-20 [1] CRAN (R 4.5.0)
ggplot2 * 4.0.1 2025-11-14 [1] CRAN (R 4.5.1)
ggplotify 0.1.3 2025-09-20 [1] CRAN (R 4.5.1)
ggrepel * 0.9.6 2024-09-07 [1] CRAN (R 4.5.0)
ggtangle 0.0.9 2025-11-30 [1] CRAN (R 4.5.1)
ggtree 4.0.3 2025-12-18 [1] Bioconductor 3.22 (R 4.5.2)
glue 1.8.0 2024-09-30 [1] CRAN (R 4.5.0)
GO.db 3.22.0 2025-11-21 [1] Bioconductor
GOSemSim 2.36.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
gridExtra * 2.3 2017-09-09 [1] CRAN (R 4.5.0)
gridGraphics 0.5-1 2020-12-13 [1] CRAN (R 4.5.0)
gson 0.1.0 2023-03-07 [1] CRAN (R 4.5.0)
gtable 0.3.6 2024-10-25 [1] CRAN (R 4.5.0)
hexbin * 1.28.5 2024-11-13 [1] CRAN (R 4.5.0)
htmltools 0.5.9 2025-12-04 [1] CRAN (R 4.5.1)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.5.0)
httr 1.4.7 2023-08-15 [1] CRAN (R 4.5.0)
igraph 2.2.1 2025-10-27 [1] CRAN (R 4.5.1)
IRanges * 2.44.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.5.0)
jsonlite 2.0.0 2025-03-27 [1] CRAN (R 4.5.0)
KEGGREST 1.50.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
knitr * 1.51 2025-12-20 [1] CRAN (R 4.5.1)
labeling 0.4.3 2023-08-29 [1] CRAN (R 4.5.0)
lattice 0.22-7 2025-04-02 [1] CRAN (R 4.5.1)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.5.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.5.0)
limma 3.66.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
locfit 1.5-9.12 2025-03-05 [1] CRAN (R 4.5.0)
magrittr 2.0.4 2025-09-12 [1] CRAN (R 4.5.1)
MASS 7.3-65 2025-02-28 [1] CRAN (R 4.5.1)
Matrix 1.7-4 2025-08-28 [1] CRAN (R 4.5.1)
MatrixGenerics * 1.22.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
matrixStats * 1.5.0 2025-01-07 [1] CRAN (R 4.5.0)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.5.0)
mvtnorm 1.3-3 2025-01-10 [1] CRAN (R 4.5.0)
nlme 3.1-168 2025-03-31 [1] CRAN (R 4.5.1)
numDeriv 2016.8-1.1 2019-06-06 [1] CRAN (R 4.5.0)
org.Hs.eg.db * 3.22.0 2025-11-20 [1] Bioconductor
otel 0.2.0 2025-08-29 [1] CRAN (R 4.5.1)
patchwork 1.3.2 2025-08-25 [1] CRAN (R 4.5.1)
pheatmap * 1.0.13 2025-06-05 [1] CRAN (R 4.5.0)
pillar 1.11.1 2025-09-17 [1] CRAN (R 4.5.1)
pkgbuild 1.4.8 2025-05-26 [1] CRAN (R 4.5.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.5.0)
pkgload 1.4.1 2025-09-23 [1] CRAN (R 4.5.1)
plyr 1.8.9 2023-10-02 [1] CRAN (R 4.5.0)
png 0.1-8 2022-11-29 [1] CRAN (R 4.5.0)
polyclip 1.10-7 2024-07-23 [1] CRAN (R 4.5.0)
preprocessCore 1.72.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
purrr 1.2.0 2025-11-04 [1] CRAN (R 4.5.1)
qvalue 2.42.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.5.0)
R.oo 1.27.1 2025-05-02 [1] CRAN (R 4.5.0)
R.utils 2.13.0 2025-02-24 [1] CRAN (R 4.5.0)
R6 2.6.1 2025-02-15 [1] CRAN (R 4.5.0)
rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.5.0)
RColorBrewer * 1.1-3 2022-04-03 [1] CRAN (R 4.5.0)
Rcpp 1.1.0 2025-07-02 [1] CRAN (R 4.5.1)
remotes 2.5.0 2024-03-17 [1] CRAN (R 4.5.0)
repr 1.1.7 2024-03-22 [1] CRAN (R 4.5.0)
reshape2 * 1.4.5 2025-11-12 [1] CRAN (R 4.5.1)
rlang 1.1.6 2025-04-11 [1] CRAN (R 4.5.0)
rmarkdown 2.30 2025-09-28 [1] CRAN (R 4.5.1)
RSQLite 2.4.5 2025-11-30 [1] CRAN (R 4.5.1)
rstudioapi 0.17.1 2024-10-22 [1] CRAN (R 4.5.0)
S4Arrays 1.10.1 2025-12-01 [1] Bioconductor 3.22 (R 4.5.2)
S4Vectors * 0.48.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
S7 0.2.1 2025-11-14 [1] CRAN (R 4.5.1)
sass 0.4.10 2025-04-11 [1] CRAN (R 4.5.0)
scales * 1.4.0 2025-04-24 [1] CRAN (R 4.5.0)
scatterpie 0.2.6 2025-09-12 [1] CRAN (R 4.5.1)
Seqinfo * 1.0.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
sessioninfo 1.2.3 2025-02-05 [1] CRAN (R 4.5.0)
skimr * 2.2.2 2026-01-10 [1] CRAN (R 4.5.1)
SparseArray 1.10.8 2025-12-18 [1] Bioconductor 3.22 (R 4.5.2)
statmod 1.5.1 2025-10-09 [1] CRAN (R 4.5.1)
stringi 1.8.7 2025-03-27 [1] CRAN (R 4.5.0)
stringr 1.6.0 2025-11-04 [1] CRAN (R 4.5.1)
SummarizedExperiment * 1.40.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
systemfonts 1.3.1 2025-10-01 [1] CRAN (R 4.5.1)
tibble 3.3.0 2025-06-08 [1] CRAN (R 4.5.0)
tidydr 0.0.6 2025-07-25 [1] CRAN (R 4.5.1)
tidyr * 1.3.2 2025-12-19 [1] CRAN (R 4.5.1)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.5.0)
tidytree 0.4.6 2023-12-12 [1] CRAN (R 4.5.0)
treeio 1.34.0 2025-10-30 [1] Bioconductor 3.22 (R 4.5.1)
tweenr 2.0.3 2024-02-26 [1] CRAN (R 4.5.0)
usethis 3.2.1 2025-09-06 [1] CRAN (R 4.5.1)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.5.0)
vipor 0.4.7 2023-12-18 [1] CRAN (R 4.5.0)
vsn * 3.78.1 2026-01-13 [1] https://bioc-release.r-universe.dev (R 4.5.2)
withr 3.0.2 2024-10-28 [1] CRAN (R 4.5.0)
xfun 0.55 2025-12-16 [1] CRAN (R 4.5.1)
XVector 0.50.0 2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
yaml 2.3.12 2025-12-10 [1] CRAN (R 4.5.1)
yulab.utils 0.2.3 2025-12-15 [1] CRAN (R 4.5.1)
[1] /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library
* ── Packages attached to the search path.
──────────────────────────────────────────────────────────────────────────────