Acknowledgement: this report has been inspired by materials used in the Advanced Bioinformatics course taught by Igor Ruiz de los Mozos at UPNA.

1 Data preparation

1.1 Load dataset

After completing the nf-core/rnaseq pipeline using Salmon pseudoalignment for the analysis of 42 colorectal samples from SRA study SRP357925 will obtain the salmon.merged.gene.SummarizedExperiment.rds file. This file contains gene-level quantification matrices (counts, TPMs, effective lengths) plus sample metadata and annotations produced from the merged Salmon quantifications.

Let’s load the salmon.merged.gene.SummarizedExperiment.rds file. The content of the file has a SummarizedExperiment representation, which is an R/Bioconductor container that stores assay matrices (e.g., counts), feature metadata, and sample metadata in one coherent object.

# Specify a working directory
directory <- "/Volumes/DataWinBackup/00_MASTER UOC BIOINFORMATICA - TEMPORAL/TFM-UOC/Deseq2_42eocrc"
setwd(directory)
list.files()

 [1] "dds_DGE_42.rds"                                          
 [2] "DE_tumor_vs_normal_42.csv"                               
 [3] "Deseq2_42crc copy.Rmd"                                   
 [4] "Deseq2_42crc_cache"                                      
 [5] "Deseq2_42crc_files"                                      
 [6] "Deseq2_42crc-copy_cache"                                 
 [7] "Deseq2_42crc-copy.Rmd"                                   
 [8] "Deseq2_42crc.html"                                       
 [9] "Deseq2_42crc.Rmd"                                        
[10] "DESeq2_sig_genes_normal_vs_cancer_42_eocrc_annotated.tab"
[11] "DESeq2_sig_genes_normal_vs_cancer_42_eocrc.tab"          
[12] "DESeq2_sig_genes_normal_vs_cancer_44_locrc_annotated.tab"
[13] "enrichment_map.pdf"                                      
[14] "figures"                                                 
[15] "gene_concept_network_plot.pdf"                           
[16] "gene.SummarizedExperiment.metadata_42.rds"               
[17] "images"                                                  
[18] "Merge_GO_Gene_Set_Enrichment_Analysis_CC_.pdf"           
[19] "salmon.merged.gene.SummarizedExperiment.rds"             
[20] "sraruntable_srp357925_crc42_metadata.csv"                
[21] "sraruntable_srp357925_crc42.csv"                         
[22] "tables"                                                  
[23] "tumor_vs_normal_42_dds.csv"

# Read 'se' object obtained from nfcore/rnaseq
se <- readRDS("salmon.merged.gene.SummarizedExperiment.rds")

# Display se object
print(se)

class: SummarizedExperiment 
dim: 78428 42 
metadata(0):
assays(5): counts counts_length_scaled counts_scaled lengths tpm
rownames(78428): ENSG00000000003.17 ENSG00000000005.6 ...
  ENSG00000310592.1 ENSG00000310593.1
rowData names(3): transcript_id gene_id gene_name
colnames(42): SRR17866817 SRR17866818 ... SRR17866857 SRR17866858
colData names(4): sample fastq_1 fastq_2 strandedness

We focus on the counts assay matrix and check information on columns and rows.

# Check the first elements of the counts matrix
head(assay(se, "counts")[, 1:2], 5)

                   SRR17866817 SRR17866818
ENSG00000000003.17    5220.281    2132.471
ENSG00000000005.6       40.000      18.000
ENSG00000000419.15    2176.673     670.315
ENSG00000000457.15     417.069     352.181
ENSG00000000460.18     385.301      88.000

# Collect sample information
head(colData(se)[, 2:3], 5)

DataFrame with 5 rows and 2 columns
                           fastq_1                fastq_2
                       <character>            <character>
SRR17866817 data/reads/SRR178668.. data/reads/SRR178668..
SRR17866818 data/reads/SRR178668.. data/reads/SRR178668..
SRR17866819 data/reads/SRR178668.. data/reads/SRR178668..
SRR17866820 data/reads/SRR178668.. data/reads/SRR178668..
SRR17866821 data/reads/SRR178668.. data/reads/SRR178668..

# Collect gene information
head(rowData(se), 3)

DataFrame with 3 rows and 3 columns
                            transcript_id            gene_id   gene_name
                              <character>        <character> <character>
ENSG00000000003.17 ENST00000373020.9,EN.. ENSG00000000003.17      TSPAN6
ENSG00000000005.6  ENST00000373031.5,EN..  ENSG00000000005.6        TNMD
ENSG00000000419.15 ENST00000858753.1,EN.. ENSG00000000419.15        DPM1

# Collect genomic ranges
head(rowRanges(se), 3)

NULL

Below, we can obtain the name of the 42 samples.

# Display colNames
colData(se)$sample

 [1] "SRR17866817" "SRR17866818" "SRR17866819" "SRR17866820" "SRR17866821"
 [6] "SRR17866822" "SRR17866823" "SRR17866824" "SRR17866825" "SRR17866826"
[11] "SRR17866827" "SRR17866828" "SRR17866829" "SRR17866830" "SRR17866831"
[16] "SRR17866832" "SRR17866833" "SRR17866834" "SRR17866835" "SRR17866836"
[21] "SRR17866837" "SRR17866838" "SRR17866839" "SRR17866840" "SRR17866841"
[26] "SRR17866842" "SRR17866843" "SRR17866844" "SRR17866845" "SRR17866846"
[31] "SRR17866847" "SRR17866848" "SRR17866849" "SRR17866850" "SRR17866851"
[36] "SRR17866852" "SRR17866853" "SRR17866854" "SRR17866855" "SRR17866856"
[41] "SRR17866857" "SRR17866858"

Since the colData field in the se object is limited to data from the original FASTQ files, it would be useful to supplement it with metadata from the SRA study, such as patient age and tissue type.

1.2 Preparation of .csv files

In this section we describe the steps to add metadata from the SRA study into the colData field of the SummarizedExperiment object obtained from the nf-core/rnaseq pipeline:

Go to SRA study SRP357925
Download metadata of 42 runs
Open the downloaded SraRunTable.csv file in Google Sheets
Convert the spreadsheet into a table
Order data by run number SRR17866817, SRR17866818, SRR17866819…
Download it as .csv file and rename it sraruntable_srp357925_crc42.csv

Here are the first lines of the sraruntable_srp357925_crc42.csv file.

Run,age_at_surgery,tissue,tumor_stage,tissue_type,Sample Name,SRA Study,Assay Type,AvgSpotLen,Bases,BioProject,BioSample,Bytes,Center Name,Consent,DATASTORE filetype,DATASTORE provider,DATASTORE region,diagnosis,Experiment,Instrument,Library Name,LibraryLayout,LibrarySelection,LibrarySource,Organism,Patient_ID,Platform,ReleaseDate,create_date,version,source_name
SRR17866817,49.1,Sigmoid Colon,1,Tumor tissue,GSM5857839,SRP357925,RNA-Seq,113,3159273528,PRJNA802883,SAMN25601372,1096074179,PENN STATE COLLEGE OF MEDICINE,public,"sra,fastq,run.zq","s3,gs,ncbi","s3.us-east-1,gs.us-east1,ncbi.public",Early onset colorectal cancer,SRX14026607,Illumina MiSeq,GSM5857839,PAIRED,cDNA,TRANSCRIPTOMIC,Homo sapiens,20.327,ILLUMINA,2022-09-06T00:00:00Z,2022-02-02T21:46:00Z,1,Sigmoid Colon
SRR17866818,49.1,Sigmoid Colon,NA,Adjacent normal tissue,GSM5857838,SRP357925,RNA-Seq,113,2373327113,PRJNA802883,SAMN25601373,833169111,PENN STATE COLLEGE OF MEDICINE,public,"fastq,sra,run.zq","ncbi,s3,gs","s3.us-east-1,gs.us-east1,ncbi.public",Early onset colorectal cancer,SRX14026606,Illumina MiSeq,GSM5857838,PAIRED,cDNA,TRANSCRIPTOMIC,Homo sapiens,20.327,ILLUMINA,2022-09-06T00:00:00Z,2022-02-02T21:40:00Z,1,Sigmoid Colon

As follows, a metadata .csv file was created using information from BioProject: PRJNA802883. Here are the first lines of the created sraruntable_srp357925_crc42_metadata.csv file.

key,values
abstract,"Despite a global decrease in colorectal cancer (CRC) incidence, the prevalence of early onset colorectal cancer (EOCRC), or those occurring in individuals before the age of 50..."
Accession,PRJNA802883; GEO: GSE196006

1.3 Update SummarizedExperiment object

Let’s add metadata contained in the previously created files into the se object.

After loading the .csv files, here we can see the samples metadata represented as a dataframe named df_coldata:

# Load files
df_coldata <- read.csv("sraruntable_srp357925_crc42.csv", header = TRUE, row.names = 1,
    check.names = FALSE)

# Display first rows
head(df_coldata[, 1:5], n = 3)

            age_at_surgery        tissue tumor_stage            tissue_type
SRR17866817           49.1 Sigmoid Colon           1           Tumor tissue
SRR17866818           49.1 Sigmoid Colon          NA Adjacent normal tissue
SRR17866819           44.2        Rectum           3           Tumor tissue
            Sample Name
SRR17866817  GSM5857839
SRR17866818  GSM5857838
SRR17866819  GSM5857837

and below, it’s the metadata corresponding to the SRA study represented as a dataframe named df_metadata with two columns: key and values.

# Load files
df_metadata <- read.csv("sraruntable_srp357925_crc42_metadata.csv", header = TRUE,
    check.names = FALSE)

# Display first rows
head(df_metadata, n = 3)

        key
1  abstract
2 Accession
3 Data Type
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        values
1 Despite a global decrease in colorectal cancer (CRC) incidence, the prevalence of early onset colorectal cancer (EOCRC), or those occurring in individuals before the age of 50, has been steadily increasing over the past several decades. When compared to later onset colorectal cancers (LOCRCs) in individuals over 50, our understanding of the genetic and molecular underpinnings of EOCRCs is limited. Here, we conducted transcriptomic analysis of patient-matched normal colonic segments and tumors to identify gene expression programs involved in carcinogenesis. Amongst differentially expressed genes, we found increased expression of the c-MYC (MYC) proto-oncogene and its downstream targets in tumor samples. We identify tumors with high and low differential MYC expression, and patients with high-MYC tumors were older and overweight or obese. We also detect elevated expression of the PVT1 long-non-coding RNA (lncRNA) in most tumors and find gains in copy number for both MYC and PVT1 gene loci in 35% of tumors evaluated. Our transcriptome analysis indicates that EOCRC can be sub-classified into groups based on differential MYC expression and suggests that MYC is a critical driver of most CRCs that develop in younger patients. Overall design: Gene expression profiles of 21 surgically resected tumors and patient-matched adjacent colonic segments from early onset colorectal cancer patients were generated by deep sequencing.
2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  PRJNA802883; GEO: GSE196006
3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Transcriptome or Gene expression

The lower_underscore_nospaces() function will help us standardize the format of character-type data.

# Define the lower_underscore_nospaces() function
lower_underscore_nospaces <- function(df) {
    non_numeric_col <- names(df)[!sapply(df, is.numeric)]
    for (col in non_numeric_col) {
        df[[col]] <- tolower(df[[col]])  # lowercase
        df[[col]] <- gsub("[^a-z0-9]+", "_", df[[col]])  # replace non-alphanumeric with underscores
        df[[col]] <- gsub("_$", "", df[[col]])  # remove trailing underscores
    }
    if (!is.null(names(df))) {
        names(df) <- tolower(names(df))
        names(df) <- gsub("[^a-z0-9]+", "_", names(df))
        names(df) <- gsub("_$", "", names(df))
    }
    if (!is.null(rownames(df))) {
        rownames(df) <- tolower(rownames(df))
        rownames(df) <- gsub("[^a-z0-9]+", "_", rownames(df))
        rownames(df) <- gsub("_$", "", rownames(df))
    }
    return(df)
}

# Apply lower_underscore_nospaces to all datasets
df_coldata <- lower_underscore_nospaces(df_coldata)

# Display last changes on dataframes
head(df_coldata[, 1:5], n = 3)

            age_at_surgery        tissue tumor_stage            tissue_type
srr17866817           49.1 sigmoid_colon           1           tumor_tissue
srr17866818           49.1 sigmoid_colon          NA adjacent_normal_tissue
srr17866819           44.2        rectum           3           tumor_tissue
            sample_name
srr17866817  gsm5857839
srr17866818  gsm5857838
srr17866819  gsm5857837

Since the metadata(se) field in a SummarizedExperiment object stores data as a named list, let’s convert df_metadata to this format.

# Convert metadata dataframe into a named list
list_metadata <- as.list(df_metadata$values)
names(list_metadata) <- df_metadata$key
list_metadata

$abstract
[1] "Despite a global decrease in colorectal cancer (CRC) incidence, the prevalence of early onset colorectal cancer (EOCRC), or those occurring in individuals before the age of 50, has been steadily increasing over the past several decades. When compared to later onset colorectal cancers (LOCRCs) in individuals over 50, our understanding of the genetic and molecular underpinnings of EOCRCs is limited. Here, we conducted transcriptomic analysis of patient-matched normal colonic segments and tumors to identify gene expression programs involved in carcinogenesis. Amongst differentially expressed genes, we found increased expression of the c-MYC (MYC) proto-oncogene and its downstream targets in tumor samples. We identify tumors with high and low differential MYC expression, and patients with high-MYC tumors were older and overweight or obese. We also detect elevated expression of the PVT1 long-non-coding RNA (lncRNA) in most tumors and find gains in copy number for both MYC and PVT1 gene loci in 35% of tumors evaluated. Our transcriptome analysis indicates that EOCRC can be sub-classified into groups based on differential MYC expression and suggests that MYC is a critical driver of most CRCs that develop in younger patients. Overall design: Gene expression profiles of 21 surgically resected tumors and patient-matched adjacent colonic segments from early onset colorectal cancer patients were generated by deep sequencing."

$Accession
[1] "PRJNA802883; GEO: GSE196006"

$`Data Type`
[1] "Transcriptome or Gene expression"

$Scope
[1] "Multiisolate"

$Organism
[1] "Homo sapiens[Taxonomy ID: 9606]Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo; Homo sapiens"

$Publications
[1] "Marx OM et al., Transcriptome Analyses Identify Deregulated MYC in Early Onset Colorectal Cancer., Biomolecules, 2022 Sep 2;12(9)"

$Submission
[1] "Registration date: 2-Feb-2022 Penn State College of Medicine"

$Relevance
[1] "Medical"

Now let’s convert R regular dataframes into Bioconductor S4 DataFrame objects.

# Convert regular R dataframes to Bioconductor S4 DataFrame objects
colData <- DataFrame(df_coldata)

# Visualize data converted to S4 DataFrame S4 type
print(colData)

DataFrame with 42 rows and 31 columns
            age_at_surgery          tissue tumor_stage            tissue_type
                 <numeric>     <character>   <integer>            <character>
            sample_name   sra_study  assay_type avgspotlen       bases
            <character> <character> <character>  <integer>   <numeric>
             bioproject    biosample      bytes            center_name
            <character>  <character>  <numeric>            <character>
                consent datastore_filetype datastore_provider
            <character>        <character>        <character>
                  datastore_region              diagnosis  experiment
                       <character>            <character> <character>
                instrument library_name librarylayout libraryselection
               <character>  <character>   <character>      <character>
             librarysource     organism patient_id    platform
               <character>  <character>  <numeric> <character>
                     releasedate          create_date   version     source_name
                     <character>          <character> <integer>     <character>
 [ reached 'max' / getOption("max.print") -- omitted 11 rows ]

and verify that their dimensions match.

# Check dimmensions of each dataset
cat("Dimensions of counts assay:", dim(assay(se, "counts")), "\n", "Dimensions of colData:",
    dim(colData), "\n", "Dimensions of metadata:", length(list_metadata), "\n")

Dimensions of counts assay: 78428 42 
 Dimensions of colData: 42 31 
 Dimensions of metadata: 8

Finally, let’s update the SummarizedExperiment object to include information on colData and metadata fields.

# Update SE object
colData(se) <- colData

# Add metadata
metadata(se) <- list_metadata
se

class: SummarizedExperiment 
dim: 78428 42 
metadata(8): abstract Accession ... Submission Relevance
assays(5): counts counts_length_scaled counts_scaled lengths tpm
rownames(78428): ENSG00000000003.17 ENSG00000000005.6 ...
  ENSG00000310592.1 ENSG00000310593.1
rowData names(3): transcript_id gene_id gene_name
colnames(42): srr17866817 srr17866818 ... srr17866857 srr17866858
colData names(31): age_at_surgery tissue ... version source_name

We’re on the right track. To move on, let’s include a column named condition, which will convert the tumor_stage column of colData into a binary factor variable with levels normal and tumor. A good practice in R is that the first level be the reference one (e.g. normal, control, or untreated samples). This will come handy when conducting the differential expression analysis with the volcano plots.

# Create new colData column named condition
colData(se)$condition <- ifelse(is.na(colData(se)$tumor_stage), "normal_eocrc", "tumor_eocrc")
colData(se)$condition <- as.factor(colData(se)$condition)
colData(se)$condition

 [1] tumor_eocrc  normal_eocrc tumor_eocrc  normal_eocrc tumor_eocrc 
 [6] normal_eocrc tumor_eocrc  normal_eocrc tumor_eocrc  normal_eocrc
[11] tumor_eocrc  normal_eocrc tumor_eocrc  normal_eocrc tumor_eocrc 
[16] normal_eocrc tumor_eocrc  normal_eocrc tumor_eocrc  normal_eocrc
[21] tumor_eocrc  normal_eocrc tumor_eocrc  normal_eocrc tumor_eocrc 
[26] normal_eocrc tumor_eocrc  normal_eocrc tumor_eocrc  normal_eocrc
[31] tumor_eocrc  normal_eocrc tumor_eocrc  normal_eocrc tumor_eocrc 
[36] normal_eocrc tumor_eocrc  normal_eocrc tumor_eocrc  normal_eocrc
[41] tumor_eocrc  normal_eocrc
Levels: normal_eocrc tumor_eocrc

We can now save the se object for reuse, eliminating the need to rerun this section.

# Save the updated `se` object to a new file
saveRDS(se, file = "gene.SummarizedExperiment.metadata_42.rds")

1.4 Create a DESeqDataSet data object

A DESeqDataSet (dds) object encapsulates all components needed for DE analysis. It ensures statistical correctness and handles data integrity by combining: - Count data (from assays) - Sample metadata (from colData) - Experimental design (formula) - Gene metadata (from rowData)

The first step to convert our se into DESeqDataSet is to ensure that count data from the assay be a matrix data type. Currently, assay(se) is dataframe class containing lists.

# Check class and mode of each assay in se
sapply(assays(se), class)

              counts counts_length_scaled        counts_scaled 
        "data.frame"         "data.frame"         "data.frame" 
             lengths                  tpm 
        "data.frame"         "data.frame"

sapply(assays(se), mode)

              counts counts_length_scaled        counts_scaled 
              "list"               "list"               "list" 
             lengths                  tpm 
              "list"               "list"

Let’s convert assay(se, "counts") into a numeric matrix and round count data.

# Extract the counts assay as a matrix
counts_mat <- as.matrix(assay(se, "counts"))
assay(se, "counts") <- round(counts_mat)

# Keep only the 'counts' assay
assays(se) <- assays(se)["counts"]

# Check class and mode of each assay in se
sapply(assays(se), class)

     counts  
[1,] "matrix"
[2,] "array"

sapply(assays(se), mode)

   counts 
"numeric"

Now, we’ll remove genes with less than 10 reads in all samples. This practice will save computational resources when conducting the DE analysis. It is preferable to run this task before converting from se to a DESeqDataSet due to memory constraints.

gene_count_10plus <- rowSums(assay(se)) > 10
se_filtered <- se[gene_count_10plus, ]
se_filtered

class: SummarizedExperiment 
dim: 37865 42 
metadata(8): abstract Accession ... Submission Relevance
assays(1): counts
rownames(37865): ENSG00000000003.17 ENSG00000000005.6 ...
  ENSG00000310592.1 ENSG00000310593.1
rowData names(3): transcript_id gene_id gene_name
colnames(42): srr17866817 srr17866818 ... srr17866857 srr17866858
colData names(32): age_at_surgery tissue ... source_name condition

Finally, we construct DESeqDataSet without considering any variable as part of the experimental design.

# Create a DESeqDataSet data object
deseq2_raw <- DESeqDataSet(se_filtered, design = ~1)

# Display first lines of DESeqDataSet data object
class(deseq2_raw)

[1] "DESeqDataSet"
attr(,"package")
[1] "DESeq2"

head(counts(deseq2_raw)[, 1:5], n = 3)

                   srr17866817 srr17866818 srr17866819 srr17866820 srr17866821
ENSG00000000003.17        5220        2132         503        1008        3229
ENSG00000000005.6           40          18           6           1         321
ENSG00000000419.15        2177         670         772        1078        2621

In essence, we have the same data but now it is a DESeqDataSet data object.

2 Exploratory Data Analyisis (EDA)

In this section, we explore in more detail the content of the DESeqDataSet data object.

# Display DESeqDataSet data object
deseq2_raw

class: DESeqDataSet 
dim: 37865 42 
metadata(9): abstract Accession ... Relevance version
assays(1): counts
rownames(37865): ENSG00000000003.17 ENSG00000000005.6 ...
  ENSG00000310592.1 ENSG00000310593.1
rowData names(3): transcript_id gene_id gene_name
colnames(42): srr17866817 srr17866818 ... srr17866857 srr17866858
colData names(32): age_at_surgery tissue ... source_name condition

# Show first lines
head(counts(deseq2_raw)[, 1:5], n = 3)

                   srr17866817 srr17866818 srr17866819 srr17866820 srr17866821
ENSG00000000003.17        5220        2132         503        1008        3229
ENSG00000000005.6           40          18           6           1         321
ENSG00000000419.15        2177         670         772        1078        2621

# Validate NAs
X <- counts(deseq2_raw)
X <- as.data.frame(X)  # convert to data.frame 
print(table(is.na(X)))


  FALSE 
1590330

The dataset comprises 37.865 genes and 42 samples, for a total of 1.590.330 read counts, and contains no missing values (NAs).

2.1 Normalization by sequencing depth

To compare raw counts across 42 samples in a DESeq2 analysis, we need to account for library size differences. This is where normalization by sequencing depth comes handy to correct for differences in total read counts per sample. This is the baseline normalization for differential expression testing.

Let’s calculate the size factor per sample.

# Display raw counts
head(counts(deseq2_raw)[, 1:5], 5)

                   srr17866817 srr17866818 srr17866819 srr17866820 srr17866821
ENSG00000000003.17        5220        2132         503        1008        3229
ENSG00000000005.6           40          18           6           1         321
ENSG00000000419.15        2177         670         772        1078        2621
ENSG00000000457.15         417         352         355         309         448
ENSG00000000460.18         385          88         134         187         681

# Sum all counts per sample counts(deseq2_raw) is equivalent to assay(se,
# 'counts') from SummarizedExperiment print(colSums(counts(deseq2_raw))[1:5])

# Calculate the size factor and add it to the dataset It normalizes for
# sequencing depth differences to compare expression between samples fairly
# DESeq2 will automatically account for these depth differences in all
# downstream analyses using these size factors.
deseq2_raw <- estimateSizeFactors(deseq2_raw)
print(sizeFactors(deseq2_raw)[1:5])

srr17866817 srr17866818 srr17866819 srr17866820 srr17866821 
  1.1980151   0.6088379   0.9402656   0.9075319   0.8583443

Remember that,

Size factor = 1.0: Average sequencing depth
Size factor < 1.0: Lower than average sequencing depth
Size factor > 1.0: Higher than average sequencing depth

The sizeFactor column was added to the coldData field of the DESeqDataSet data object.

# Check that the sizeFactor column was added to colData
deseq2_raw

class: DESeqDataSet 
dim: 37865 42 
metadata(9): abstract Accession ... Relevance version
assays(1): counts
rownames(37865): ENSG00000000003.17 ENSG00000000005.6 ...
  ENSG00000310592.1 ENSG00000310593.1
rowData names(3): transcript_id gene_id gene_name
colnames(42): srr17866817 srr17866818 ... srr17866857 srr17866858
colData names(33): age_at_surgery tissue ... condition sizeFactor

Now, all raw counts across 42 samples have accounted for library size differences.

# Now take a look at the sum of the total depth after normalization
# print(colSums(counts(deseq2_raw, normalized=TRUE))[1:5])

# To retrieve normalized read counts
counts_normalized <- counts(deseq2_raw, normalized = TRUE)

# Display normalized counts
head(counts_normalized[, 1:5], 5)

                   srr17866817 srr17866818 srr17866819 srr17866820 srr17866821
ENSG00000000003.17  4357.20708  3501.75301  534.955243  1110.70476   3761.8938
ENSG00000000005.6     33.38856    29.56452    6.381176     1.10189    373.9758
ENSG00000000419.15  1817.17238  1100.45709  821.044627  1187.83704   3053.5533
ENSG00000000457.15   348.07574   578.15059  377.552905   340.48390    521.9351
ENSG00000000460.18   321.36489   144.53765  142.512927   206.05336    793.3879

2.2 Data transformation

For RNA-seq counts, the expected variance grows with the mean, making low-count genes contribute excessive noise or making high-count genes dominate. Therefore, it is required to implement variance stabilization so that each gene contributes equally to overall variance patterns making sample relationships and biological signals easier to detect.

2.2.1 Log2 transformation

It consists of a raw counts transformation using the formula log2(count + 1). It is useful for reducing the effect of large counts making distributions more symmetric. Use this transformation for a quick comparisons across samples and conducting exploratory visualization with PCA and heatmaps.

2.2.2 Variance Stabilizing Transformation (VST) and Regularized-logarithm Transformation (rlog)

DESeq2 offers two transformations to stabilize variance and thus transform data to become approximately homoskedastic (to see a flat trend in the meanSdPlot): the variance stabilizing transformation (VST) for negative binomial data with a dispersion-mean trend (Anders and Huber 2010), and the regularized-logarithm transformation or rlog (Love, Huber, and Anders 2014).

“VST is much faster to compute and is less sensitive to high count outliers than the rlog. Also, VST is recommended for medium-to-large datasets (n > 30). rlog tends to work well on small datasets (n < 30), potentially outperforming VST when there is a wide range of sequencing depth across samples (an order of magnitude difference.)”

# Implement VST
counts_vst <- vst(deseq2_raw, blind = TRUE)
# counts_vst is a DESeq2 object type head(assay(counts_vst), 3)

# Implement rlog
counts_rlog <- rlog(deseq2_raw, blind = TRUE)
# counts_rlog is a DESeq2 object type head(assay(counts_rlog), 3)

2.3 PCA & Hierarchical Clustering

We’ll continue an Exploratory Data Analysis (EDA) considering variance-stabilized-transformed counts.

# Plot PCA on condition type plotPCA(counts_vst) plotPCA(counts_vst,
# intgroup=c('tissue','condition'))

# Plot PCA on tissue type plotPCA(counts_vst, intgroup='tissue')

# Plot PCA on both condition and tissue type plotPCA(counts_vst,
# intgroup=c('condition', 'tissue'))

Importance of components:
                           PC1     PC2      PC3      PC4      PC5     PC6
Standard deviation     74.4640 65.5317 52.96238 48.20994 39.26859 36.6645
                            PC7      PC8      PC9     PC10     PC11     PC12
Standard deviation     34.71566 31.20532 31.02371 29.02386 28.41309 27.97506
                           PC13     PC14     PC15     PC16     PC17     PC18
Standard deviation     27.76684 26.90759 26.30617 26.15321 25.27805 24.75281
                           PC19     PC20    PC21     PC22     PC23     PC24
Standard deviation     24.39675 24.34059 23.7550 23.19946 23.08138 22.86886
                           PC25     PC26     PC27     PC28     PC29     PC30
Standard deviation     22.56182 22.42150 21.81255 21.56427 21.26939 21.20621
                           PC31     PC32     PC33     PC34     PC35     PC36
Standard deviation     20.47515 20.16900 19.98894 19.78820 19.61382 19.56957
                           PC37     PC38     PC39     PC40     PC41      PC42
Standard deviation     19.30610 18.98985 18.75413 18.31402 17.54750 1.949e-13
 [ reached 'max' / getOption("max.print") -- omitted 2 rows ]

PC1 and PC2 components explains approximately 26% of the total variability. We also identified 3 potential outliers from both conditions, which corresponds to rectum samples.

The Mahalanobis distance (MD) can identify outliers beyond the 99.9th percentile. According to this criterion, only the srr17866844 sample appears to be a potential outlier. However, we decided not to remove it since its paired sample, srr17866843, does not represent an outlier for the tumor_eocrc condition.

# Detect outliers in PCA
pc_scores <- pca$x[, 1:5]  # Use first PCs
d <- mahalanobis(pc_scores, colMeans(pc_scores), cov(pc_scores))

# Set threshold at 99.9 percentile
thr <- quantile(d, 0.999)

# Enumerate outliers
outliers <- names(d[d > thr])
outliers

[1] "srr17866844"

Before conducting the DESeq2 analysis, let’s build the design matrix and evaluate which variables are potential confounded ones (condition, tissue, age_at_surgery).

Since each tissue has distinct transcriptional profiles and the normal vs tumor comparison is made within the same tissue, tissue is a potential confounded variable.

3 Differential gene expression with DESeq2

Now after the data exploration we can run the differential expression pipeline on the raw counts with a single call to the function DESeq. The null hypothesis is that there is no systematic difference between the average read count values of the different conditions for a given gene. We will calculate the fold change of read counts, assuming de differences in sequencing depth and variability. We will test normal tissue versus cancerous tissue (normal used as denominator for the fold change calculation)

# Run DGE DESeq2 pipeline Display se_filtered se_filtered

# Create a DESeqDataSet data object
dds <- DESeqDataSet(se_filtered, design = ~condition)

# Account for sequencing depth differences to compare expression between
# samples fairly
dds <- estimateSizeFactors(dds)
dds

class: DESeqDataSet 
dim: 37865 42 
metadata(9): abstract Accession ... Relevance version
assays(1): counts
rownames(37865): ENSG00000000003.17 ENSG00000000005.6 ...
  ENSG00000310592.1 ENSG00000310593.1
rowData names(3): transcript_id gene_id gene_name
colnames(42): srr17866817 srr17866818 ... srr17866857 srr17866858
colData names(33): age_at_surgery tissue ... condition sizeFactor

# Run DGE DESeq2 pipeline
dds_DGE <- DESeq(dds)
dds_DGE

class: DESeqDataSet 
dim: 37865 42 
metadata(9): abstract Accession ... Relevance version
assays(6): counts mu ... replaceCounts replaceCooks
rownames(37865): ENSG00000000003.17 ENSG00000000005.6 ...
  ENSG00000310592.1 ENSG00000310593.1
rowData names(26): transcript_id gene_id ... maxCooks replace
colnames(42): srr17866817 srr17866818 ... srr17866857 srr17866858
colData names(34): age_at_surgery tissue ... sizeFactor replaceable

saveRDS(dds_DGE, file = "dds_DGE_42.rds")

We proceed to extract results obtained from the DESeq2 pipeline to obtain the base means across samples, the log2 fold variation, standard errors, p-value and p-adjusted.

# Get results from the DESeq2 pipeline
dds_DGE_results <- results(dds_DGE)
head(dds_DGE_results, n = 5)

log2 fold change (MLE): condition tumor eocrc vs normal eocrc 
Wald test p-value: condition tumor eocrc vs normal eocrc 
DataFrame with 5 rows and 6 columns
                    baseMean log2FoldChange     lfcSE      stat      pvalue
                   <numeric>      <numeric> <numeric> <numeric>   <numeric>
ENSG00000000003.17 2857.2111       0.161688  0.271413  0.595726 5.51358e-01
ENSG00000000005.6    51.2191       1.327558  0.446539  2.972994 2.94910e-03
ENSG00000000419.15 1670.0659       1.228388  0.221125  5.555184 2.77320e-08
ENSG00000000457.15  485.0428      -0.355986  0.147088 -2.420218 1.55112e-02
ENSG00000000460.18  223.4476       1.387779  0.206574  6.718087 1.84126e-11
                          padj
                     <numeric>
ENSG00000000003.17 6.90013e-01
ENSG00000000005.6  1.27134e-02
ENSG00000000419.15 7.86926e-07
ENSG00000000457.15 4.83895e-02
ENSG00000000460.18 1.66796e-09

# Inspect the meaning of the columns from the results object
mcols(dds_DGE_results, use.names = TRUE)

DataFrame with 6 rows and 2 columns
                       type            description
                <character>            <character>
baseMean       intermediate mean of normalized c..
log2FoldChange      results log2 fold change (ML..
lfcSE               results standard error: cond..
stat                results Wald statistic: cond..
pvalue              results Wald test p-value: c..
padj                results   BH adjusted p-values

The results object contains the following features: baseMean, log2FoldChange, lfcSE, stat, pvalue, and padj. Let’s get a summary of the results object. Now, let’s obtain a summary of the results object.

# Summary of DGE
summary(dds_DGE_results)


out of 37827 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 6942, 18%
LFC < 0 (down)     : 6130, 16%
outliers [1]       : 0, 0%
low counts [2]     : 4438, 12%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

After filtering those genes with p-adjusted < 0.05, we obtained 10.806 significant genes.

# Filter genes with p-adjusted value < 0.05 The results object behaves like a
# dataframe
table(dds_DGE_results$padj < 0.05)


FALSE  TRUE 
22621 10806

and when we order those significant genes by p-adjusted value:

# Select significant genes with a p-adjusted value lower than 0.05
resSig <- subset(dds_DGE_results, padj < 0.05)

# Order significant genes by p-adjusted value
resSig <- resSig[order(resSig$padj), ]

head(resSig, n = 5)

log2 fold change (MLE): condition tumor eocrc vs normal eocrc 
Wald test p-value: condition tumor eocrc vs normal eocrc 
DataFrame with 5 rows and 6 columns
                    baseMean log2FoldChange     lfcSE      stat      pvalue
                   <numeric>      <numeric> <numeric> <numeric>   <numeric>
ENSG00000105464.5    140.904        4.17270  0.354597   11.7674 5.74440e-32
ENSG00000164283.13   109.783        5.23269  0.453264   11.5445 7.87246e-31
ENSG00000175832.14   726.727        4.37081  0.398346   10.9724 5.18803e-28
ENSG00000107159.15   388.768        5.63686  0.553410   10.1857 2.29728e-24
ENSG00000163347.6    625.018        4.50301  0.441776   10.1930 2.13172e-24
                          padj
                     <numeric>
ENSG00000105464.5  1.92018e-27
ENSG00000164283.13 1.31576e-26
ENSG00000175832.14 5.78068e-24
ENSG00000107159.15 1.02868e-20
ENSG00000163347.6  1.02868e-20

Let’s check those genes with the strongest down-regulation and up-regulation, respectively.

# Order significant genes with the strongest down-regulation
head(resSig[order(resSig$log2FoldChange), ])

log2 fold change (MLE): condition tumor eocrc vs normal eocrc 
Wald test p-value: condition tumor eocrc vs normal eocrc 
DataFrame with 6 rows and 6 columns
                     baseMean log2FoldChange     lfcSE      stat      pvalue
                    <numeric>      <numeric> <numeric> <numeric>   <numeric>
ENSG00000113196.4    24.44350       -6.73467  1.049744  -6.41553 1.40332e-10
ENSG00000167916.5    73.04398       -6.15290  0.880134  -6.99086 2.73197e-12
ENSG00000286765.2     5.62746       -5.88870  1.159520  -5.07856 3.80300e-07
ENSG00000172410.5  2143.05931       -5.59589  0.943165  -5.93310 2.97272e-09
ENSG00000167281.21   18.20558       -5.46715  0.797529  -6.85512 7.12544e-12
ENSG00000168748.14 1718.92263       -5.26274  0.657795  -8.00059 1.23828e-15
                          padj
                     <numeric>
ENSG00000113196.4  9.03832e-09
ENSG00000167916.5  3.47230e-10
ENSG00000286765.2  7.17398e-06
ENSG00000172410.5  1.17876e-07
ENSG00000167281.21 7.70815e-10
ENSG00000168748.14 4.54858e-13

# Order significant genes with the strongest up-regulation
head(resSig[order(resSig$log2FoldChange, decreasing = TRUE), ])

log2 fold change (MLE): condition tumor eocrc vs normal eocrc 
Wald test p-value: condition tumor eocrc vs normal eocrc 
DataFrame with 6 rows and 6 columns
                    baseMean log2FoldChange     lfcSE      stat      pvalue
                   <numeric>      <numeric> <numeric> <numeric>   <numeric>
ENSG00000237550.8    17.3719       23.45433   2.92000   8.03231 9.56526e-16
ENSG00000184956.16   18.8421        7.59803   1.35908   5.59057 2.26321e-08
ENSG00000270372.2    16.9893        7.14581   1.22739   5.82198 5.81561e-09
ENSG00000170373.9   396.5527        7.03499   0.68932  10.20569 1.86982e-24
ENSG00000129455.17   44.2615        6.66333   1.00340   6.64077 3.12050e-11
ENSG00000108244.17  683.7777        6.42894   0.64485   9.96967 2.06915e-23
                          padj
                     <numeric>
ENSG00000237550.8  3.67515e-13
ENSG00000184956.16 6.61876e-07
ENSG00000270372.2  2.08135e-07
ENSG00000170373.9  1.02868e-20
ENSG00000129455.17 2.55034e-09
ENSG00000108244.17 6.28777e-20

# Save significant DGE results to file
write.table(resSig, file = "DESeq2_sig_genes_normal_vs_cancer_42_eocrc.tab", sep = "\t",
    quote = FALSE, row.names = FALSE)

3.0.1 Visualization of results

3.0.1.1 Individual gene count plot

To visualize the counts for ENSG00000237550.8 gene, which showed the lowest p-adjusted value of 3.67515e-13 and strongest up-regulation log2FoldChange of 23.45433, we’ll use the plotCounts function over the feature condition to get the normalized counts for this gene.

3.0.1.2 Histogram of p-value frecuencies

3.0.1.3 MA-plot

*An MA-plot (Dudoit et al. 2002) provides a useful overview for the distribution of the estimated coefficients in the model, e.g. the comparisons of interest, across all genes. On the y-axis, the “M” stands for “minus” – subtraction of log values is equivalent to the log of the ratio – and on the x-axis, the “A” stands for “average”. This plot allows us to evaluate fold changes and the distribution around the mean expression.

Before making the MA-plot, we use the lfcShrink function to shrink the log2 fold changes for the comparison of dex treated vs untreated samples. There are three types of shrinkage estimators in DESeq2, which are covered in the DESeq2 vignette. log2 fold shrinkage remove the noise preserving the large differences.*

[1] "Intercept"                            
[2] "condition_tumor_eocrc_vs_normal_eocrc"

3.0.1.4 Volcano plot

Volcano plot helps to visualize significant differential expressed genes showing the log2 fold change and the significance (p-adjusted).

To export a .tex table having the top significantly differentiated genes of the EOCRC cohort

4 Annotation of results

In this section, we will add the gene symbol, gene name, KEGG pathway, and Entrez ID to our dataset to provide more informative annotations for the DESeq analysis, which currently uses ENSEMBL gene ID nomenclature (e.g., ENSG00000175832).

# Load modules and libraries
BiocManager::install("AnnotationDbi")
BiocManager::install("org.Hs.eg.db")
library("AnnotationDbi")
library("org.Hs.eg.db")

# Check all potential terms that could be retrieved with AnnotationDbi using
# mapsIds()
columns(org.Hs.eg.db)

 [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
 [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"    
[11] "GENETYPE"     "GO"           "GOALL"        "IPI"          "MAP"         
[16] "OMIM"         "ONTOLOGY"     "ONTOLOGYALL"  "PATH"         "PFAM"        
[21] "PMID"         "PROSITE"      "REFSEQ"       "SYMBOL"       "UCSCKG"      
[26] "UNIPROT"

# Review the most significant genes from our study Previously, we've saved a in
# a .tab file the significant DGE results file =
# 'DESeq2_sig_genes_normal_vs_cancer_42_eocrc.tab'
head(resSig)

log2 fold change (MLE): condition tumor eocrc vs normal eocrc 
Wald test p-value: condition tumor eocrc vs normal eocrc 
DataFrame with 6 rows and 6 columns
                    baseMean log2FoldChange     lfcSE      stat      pvalue
                   <numeric>      <numeric> <numeric> <numeric>   <numeric>
ENSG00000105464.5    140.904        4.17270  0.354597   11.7674 5.74440e-32
ENSG00000164283.13   109.783        5.23269  0.453264   11.5445 7.87246e-31
ENSG00000175832.14   726.727        4.37081  0.398346   10.9724 5.18803e-28
ENSG00000107159.15   388.768        5.63686  0.553410   10.1857 2.29728e-24
ENSG00000163347.6    625.018        4.50301  0.441776   10.1930 2.13172e-24
ENSG00000170373.9    396.553        7.03499  0.689320   10.2057 1.86982e-24
                          padj
                     <numeric>
ENSG00000105464.5  1.92018e-27
ENSG00000164283.13 1.31576e-26
ENSG00000175832.14 5.78068e-24
ENSG00000107159.15 1.02868e-20
ENSG00000163347.6  1.02868e-20
ENSG00000170373.9  1.02868e-20

We proceed by adding annotations starting from ENSEMBL to Entrez and from there to KEGG pathways, gene symbols and gene names. This is because, all major annotation databases (KEGG, org.Hs.eg.db, clusterProfiler) are indexed by Entrez, so doing the conversion first dramatically reduces NA values and gives more complete annotations.

# Strip version
rownames(resSig) <- sub("\\..*$", "", rownames(resSig))

# Add gene symbol column
resSig$SYMBOL = mapIds(org.Hs.eg.db, keys = rownames(resSig), column = "SYMBOL",
    keytype = "ENSEMBL", multiVals = "first")

# Add gene name column
resSig$GENENAME = mapIds(org.Hs.eg.db, keys = rownames(resSig), column = "GENENAME",
    keytype = "ENSEMBL", multiVals = "first")

# Add gene Entrez ID column
resSig$ENTREZID = mapIds(org.Hs.eg.db, keys = rownames(resSig), column = "ENTREZID",
    keytype = "ENSEMBL", multiVals = "first")

# Add gene EGG functional pathway column
resSig$PATH = mapIds(org.Hs.eg.db, keys = rownames(resSig), column = "PATH", keytype = "ENSEMBL",
    multiVals = "first")

# Add gene ENSEMBL ID column
resSig$ENSEMBL <- rownames(resSig)

# Convert to dataframe
resSig_df <- as.data.frame(resSig)
print(head(resSig_df), 5)

                baseMean log2FoldChange     lfcSE     stat       pvalue
ENSG00000105464 140.9041       4.172698 0.3545970 11.76744 5.744397e-32
ENSG00000164283 109.7826       5.232688 0.4532636 11.54447 7.872463e-31
ENSG00000175832 726.7266       4.370810 0.3983460 10.97239 5.188033e-28
ENSG00000107159 388.7679       5.636863 0.5534101 10.18569 2.297276e-24
                        padj SYMBOL
ENSG00000105464 1.920180e-27 GRIN2D
ENSG00000164283 1.315764e-26   ESM1
ENSG00000175832 5.780679e-24   ETV4
ENSG00000107159 1.028679e-20    CA9
                                                          GENENAME ENTREZID
ENSG00000105464 glutamate ionotropic receptor NMDA type subunit 2D     2906
ENSG00000164283               endothelial cell specific molecule 1    11082
ENSG00000175832                 ETS variant transcription factor 4     2118
ENSG00000107159                               carbonic anhydrase 9      768
                 PATH         ENSEMBL
ENSG00000105464 04020 ENSG00000105464
ENSG00000164283  <NA> ENSG00000164283
ENSG00000175832  <NA> ENSG00000175832
ENSG00000107159 00910 ENSG00000107159
 [ reached 'max' / getOption("max.print") -- omitted 2 rows ]

# Save data frame including annotations
write.table(resSig_df, file = ("DESeq2_sig_genes_normal_vs_cancer_42_eocrc_annotated.tab"),
    row.names = F, col.names = TRUE, sep = "\t")

They often reflect: biological heterogeneity, batch effects, cell-type composition differences, high expression strength, and technical noise.

4.1 Functional Gene Ontology (GO) Analysis

# Load modules and libraries
BiocManager::install("clusterProfiler")
BiocManager::install("ggnewscale")

library("clusterProfiler")
library("enrichplot")
library("ggnewscale")

                   ID                                 Description Count
GO:0001776 GO:0001776                       leukocyte homeostasis    49
GO:0002200 GO:0002200 somatic diversification of immune receptors    31
GO:0002252 GO:0002252                     immune effector process   251
GO:0002253 GO:0002253               activation of immune response   222
GO:0002262 GO:0002262                    myeloid cell homeostasis    81
GO:0002339 GO:0002339                            B cell selection     2
           GeneRatio
GO:0001776   49/8835
GO:0002200   31/8835
GO:0002252  251/8835
GO:0002253  222/8835
GO:0002262   81/8835
GO:0002339    2/8835

# Convert to dataframe
ego_df <- as.data.frame(ego)

# Visualize the last columns
head(ego_df, 5)

                   ID                          Description GeneRatio   BgRatio
GO:0042254 GO:0042254                  ribosome biogenesis  237/6884 318/18860
GO:0022613 GO:0022613 ribonucleoprotein complex biogenesis  322/6884 496/18860
GO:0006364 GO:0006364                      rRNA processing  168/6884 223/18860
GO:0016072 GO:0016072               rRNA metabolic process  188/6884 270/18860
           RichFactor FoldEnrichment   zScore       pvalue     p.adjust
GO:0042254  0.7452830       2.041842 14.20564 5.300063e-44 3.376670e-40
GO:0022613  0.6491935       1.778587 13.32257 1.084468e-38 3.454574e-35
GO:0006364  0.7533632       2.063979 12.11773 1.673786e-32 3.554564e-29
GO:0016072  0.6962963       1.907633 11.38876 9.473546e-29 1.508899e-25
                 qvalue
GO:0042254 2.814612e-40
GO:0022613 2.879549e-35
GO:0006364 2.962895e-29
GO:0016072 1.257738e-25
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            geneID
GO:0042254                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                RPP40/DKC1/NOP58/NOB1/NLE1/UTP14A/RPUSD4/UTP25/NOL8/RRP9/DCAF13/RCL1/WDR43/DDX10/BRIX1/PRKDC/BOP1/MTERF3/WDR75/EXOSC2/TMA16/EXOSC3/GTPBP10/WDR12/RPF2/NPM1/GTF3A/RPS21/DDX31/BYSL/EXOSC5/NOP56/UTP4/NPM3/NAT10/RIOX2/GTPBP4/NHP2/PNO1/HEATR1/EIF1AX/PA2G4/RRP15/MRM2/EXOSC7/LYAR/FBL/EXOSC8/DDX56/UTP18/NOLC1/RAN/DIS3/DDX27/NMD3/WDR74/TRMT61B/WDR3/RRS1/AIRIM/RRP1B/GAR1/NSUN5/NOP2/DDX18/EIF2A/RPUSD1/DIMT1/XRN2/PAK1IP1/RBIS/NOP16/TFB1M/METTL5/UTP23/TRMT112/BUD23/PES1/MRPS2/IMP4/RPL14/CUL4A/RPL7A/RPL35/RIOK1/XPO1/EMG1/NOL6/RPS6/EBNA1BP2/PDCD11/SRFBP1/PIH1D2/RPL7L1/NOL10/UTP20/MALSU1/SUV39H1/RPL10L/RPL7/URB2/RPS7/ZNF593/PWP1/LTV1/RPS16/RPS19/MTG2/NOP14/RIOK3/NOM1/RRP1/RNASEL/MPHOSPH10/WDR46/GEMIN4/CHD7/RPSA/RPS15/LAS1L/MAK16/UTP15/MRTO4/RPLP0/NUDT16/CDKN2A/SIRT7/PPAN/RPL26L1/WDR18/POP7/RPS5/TFB2M/RPL35A/EXOSC4/DDX47/REXO5/TSR1/WDR36/RPL38/REXO4/RSL24D1/SNU13/RPL24/UTP6/SPOUT1/RPS13/ZNHIT6/RPS12/DDX55/RPLP0P6/NIP7/RPS15A/FASTKD2/DDX21/KRR1/RPL27/SDAD1/AATF/DDX52/ISG20/ERI1/RPS17/MYBBP1A/RPS4X/UTP11/EIF4A3/NOL11/URB1/KAT2B/GNL2/RPS27A/AFG2A/C1QBP/RPS28/RPS25/NOL7/GNL3L/GREB1L/ZNHIT3/EXOSC9/RPP30/RPUSD2/DDX28/DHX37/NOC2L/EIF6/RPS14/RPS8/RPS24/ESF1/SURF6/HEATR3/RPS23/BMS1/NOC4L/MCAT/RPL26/POP5/RRP7A/EXOSC1/TBL3/NAF1/PIN4/RPS9/GTF2H5/ABT1/DDX49/RPS27/USP36/NSA2/TENT4B/RPL5/MTREX/ZCCHC4/EIF5B/NUP88/ERCC2/DDX54/RPS3A/DHX30/CUL4B/TRMT2B/MDN1/DROSHA/LSM6/RPL11/RPSA2/MYG1/RBFA/RBM34/FAU/ISG20L2/NOA1/RPS11/ERAL1/EIF5
GO:0022613 RPP40/DKC1/NOP58/NOB1/MCTS2/NLE1/UTP14A/RPUSD4/UTP25/NOL8/RRP9/CLNS1A/DCAF13/HSP90AB1/RCL1/RUVBL1/NUFIP1/EIF2S2/WDR43/DDX10/BRIX1/PRKDC/BOP1/WDR77/MTERF3/WDR75/EXOSC2/TMA16/EXOSC3/GTPBP10/WDR12/RPF2/NPM1/GTF3A/RPS21/DDX31/BYSL/EXOSC5/SRPK1/NOP56/UTP4/NPM3/NAT10/RIOX2/SNRPF/GTPBP4/NHP2/PNO1/HEATR1/EIF3B/EIF1AX/HSP90AB3P/PA2G4/RRP15/MRM2/EXOSC7/LYAR/FBL/EXOSC8/DDX56/UTP18/NOLC1/SRSF5/SNRPE/RAN/DIS3/DDX27/ATR/STRAP/NMD3/EIF3E/WDR74/NOPCHAP1/TRMT61B/WDR3/RRS1/AIRIM/EIF3M/RRP1B/GAR1/NSUN5/PRPF19/NOP2/DDX18/SNRPB/EIF2A/SRPK2/RPUSD1/DIMT1/XRN2/PAK1IP1/RBIS/NOP16/TFB1M/EIF2S3/GEMIN6/METTL5/UTP23/TRMT112/SF3B3/BUD23/GEMIN5/SNRPC/PES1/MRPS2/IMP4/RPL14/CUL4A/EIF3H/RPL7A/RPL35/RIOK1/XPO1/SNRPD2/EMG1/NOL6/RPS6/EBNA1BP2/PDCD11/SRFBP1/PIH1D2/RPL7L1/NOL10/SNRPB2/UTP20/MALSU1/SUV39H1/RPL10L/RPL7/URB2/RPS7/ZNF593/PWP1/AAR2/LTV1/USP39/DENR/TGS1/PRKRA/SNRPG/PTGES3/RPS16/RPS19/MTG2/NOP14/RIOK3/NOM1/TAF9/RRP1/RNASEL/MPHOSPH10/WDR46/GEMIN4/CHD7/RPSA/SNRPD1/RPS15/LAS1L/MAK16/UTP15/SHQ1/MRTO4/RPLP0/NUDT16/CDKN2A/LSM2/SIRT7/TARBP2/PPAN/RPL26L1/MCTS1/RBMX2/EIF4B/WDR18/EIF3D/POP7/CELF3/RPS5/TFB2M/RPL35A/SMN1/EXOSC4/DDX47/REXO5/TSR1/WDR36/RPL38/REXO4/RSL24D1/SNU13/HSP90AA1/RPL24/UTP6/SNRPA1/SPOUT1/CELF4/RPS13/ZNHIT6/SF3A3/SNIP1/RPS12/DDX55/EIF3J/DDX20/RPLP0P6/NIP7/RPS15A/FASTKD2/DDX21/KRR1/RPL27/SDAD1/AATF/SNRPD3/RUVBL2/DDX52/AGO2/ISG20/GEMIN8/ERI1/RPS17/MYBBP1A/RPS4X/UTP11/LSM4/EIF4A3/NOL11/PRPF6/URB1/KAT2B/GNL2/RPS27A/AFG2A/C1QBP/RPS28/RPS25/NOL7/GNL3L/GREB1L/ZNHIT3/EXOSC9/RPP30/RPUSD2/DDX28/DHX37/NOC2L/PHF5A/EIF6/RBM5/RPS14/RPS8/RPS24/ESF1/SURF6/CELF6/HEATR3/PRMT5/RPS23/RNU2-1/BMS1/NOC4L/GEMIN2/RNU6-7/SF3B6/MCAT/RPL26/POP5/LUC7L2/RRP7A/EXOSC1/TBL3/NAF1/EIF3G/RPL13A/SRSF1/SRSF12/EIF3C/PIN4/RPS9/SNRNP200/GTF2H5/ABT1/NCBP1/DDX49/RPS27/USP36/CRNKL1/NSA2/TENT4B/SRSF10/RPL5/MTREX/RNVU1-4/ZCCHC4/EIF5B/CELF5/COIL/NUP88/ERCC2/DDX54/EIF3CL/RPS3A/RNU6ATAC/DHX30/CUL4B/TRMT2B/MDN1/DROSHA/DDX1/LSM6/RPL11/RPSA2/MYG1/RBFA/RBM34/FAU/ISG20L2/NOA1/RPS11/ERAL1/EIF5/RSRP1
GO:0006364                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             RPP40/DKC1/NOP58/NOB1/UTP14A/RPUSD4/UTP25/NOL8/RRP9/DCAF13/RCL1/WDR43/DDX10/BRIX1/PRKDC/BOP1/WDR75/EXOSC2/EXOSC3/WDR12/RPF2/RPS21/BYSL/EXOSC5/NOP56/UTP4/NPM3/NAT10/GTPBP4/NHP2/HEATR1/PA2G4/RRP15/MRM2/EXOSC7/LYAR/FBL/EXOSC8/DDX56/UTP18/NOLC1/DIS3/DDX27/WDR74/TRMT61B/WDR3/RRS1/RRP1B/GAR1/NSUN5/NOP2/DDX18/RPUSD1/DIMT1/XRN2/PAK1IP1/TFB1M/METTL5/UTP23/TRMT112/BUD23/PES1/IMP4/RPL14/RPL7A/RPL35/RIOK1/EMG1/NOL6/RPS6/EBNA1BP2/PDCD11/SRFBP1/PIH1D2/RPL7L1/NOL10/UTP20/SUV39H1/RPL7/RPS7/PWP1/RPS16/RPS19/NOP14/RIOK3/RRP1/RNASEL/MPHOSPH10/WDR46/GEMIN4/CHD7/RPS15/LAS1L/MAK16/UTP15/MRTO4/NUDT16/CDKN2A/SIRT7/PPAN/WDR18/POP7/TFB2M/RPL35A/EXOSC4/DDX47/REXO5/TSR1/WDR36/REXO4/SNU13/UTP6/SPOUT1/ZNHIT6/DDX55/DDX21/KRR1/RPL27/DDX52/ISG20/ERI1/RPS17/UTP11/EIF4A3/NOL11/URB1/KAT2B/RPS28/RPS25/NOL7/ZNHIT3/EXOSC9/RPP30/RPUSD2/DHX37/EIF6/RPS14/RPS8/RPS24/ESF1/BMS1/NOC4L/RPL26/POP5/RRP7A/EXOSC1/TBL3/NAF1/PIN4/GTF2H5/ABT1/DDX49/RPS27/USP36/NSA2/TENT4B/RPL5/MTREX/ZCCHC4/ERCC2/DDX54/TRMT2B/DROSHA/LSM6/RPL11/MYG1/RBFA/RBM34
GO:0016072                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    RPP40/DKC1/NOP58/NOB1/POLR1F/MYC/UTP14A/RPUSD4/UTP25/NOL8/RRP9/DCAF13/RCL1/WDR43/DDX10/BRIX1/PRKDC/BOP1/POLR1G/WDR75/EXOSC2/NIFK/EXOSC3/WDR12/RPF2/GTF3A/RPS21/BYSL/POLR1D/EXOSC5/NOP56/UTP4/NPM3/POLR1B/NAT10/GTPBP4/NHP2/HEATR1/PA2G4/RRP15/MRM2/EXOSC7/LYAR/FBL/EXOSC8/DDX56/UTP18/NOLC1/DIS3/DDX27/WDR74/TRMT61B/WDR3/RRS1/RRP1B/GAR1/NSUN5/GTF3C3/NOP2/DDX18/RPUSD1/MAPT/DIMT1/XRN2/PAK1IP1/TFB1M/METTL5/UTP23/TRMT112/BUD23/PES1/IMP4/RPL14/RPL7A/RPL35/RIOK1/NCL/POLR1A/EMG1/NOL6/RPS6/EBNA1BP2/PDCD11/SRFBP1/PIH1D2/RPL7L1/NOL10/UTP20/SUV39H1/POLR1E/RPL7/RPS7/PWP1/RPS16/RPS19/NOP14/RIOK3/RRP1/RNASEL/MPHOSPH10/WDR46/GTF3C4/GEMIN4/CHD7/RPS15/LAS1L/MAK16/UTP15/MRTO4/NUDT16/CDKN2A/SIRT7/PPAN/TCOF1/MARS1/WDR18/POP7/TFB2M/RPL35A/EXOSC4/DDX47/REXO5/TSR1/WDR36/REXO4/SNU13/UTP6/SPOUT1/ZNHIT6/TAF1B/DDX55/DDX21/KRR1/RPL27/DDX52/ISG20/ERI1/RPS17/UTP11/EIF4A3/NOL11/URB1/DDX11/KAT2B/RPS28/RPS25/NOL7/ZNHIT3/EXOSC9/RPP30/RPUSD2/DHX37/EIF6/RPS14/RPS8/RPS24/ESF1/BMS1/NOC4L/CAVIN1/RPL26/POP5/RRP7A/EXOSC1/TBL3/NAF1/PIN4/TP53/GTF2H5/ABT1/DDX49/RPS27/USP36/NSA2/TENT4B/RPL5/MTREX/ZCCHC4/ERCC2/ERN2/DDX54/TRMT2B/DROSHA/LSM6/RPL11/MYG1/RBFA/RBM34
           Count
GO:0042254   237
GO:0022613   322
GO:0006364   168
GO:0016072   188
 [ reached 'max' / getOption("max.print") -- omitted 1 rows ]

# Get dataframe dimmensions
dim(ego_df)

[1] 488  12

4.2 Gene-Concept Network and Enrichment Map

The Gene-Concept Network displays which genes are associated with which biological processes, making it easy to see gene-function relationships. Meanwhile, the enrichment map groups similar terms together. Terms that share genes are connected, making related functions easy to spot.

# Load modules and libraries
library(DOSE)

# Convert gene ID to Symbol
egox <- setReadable(ego, "org.Hs.eg.db", "ENTREZID")

# Generate geneList with Log2FoldChange data
geneList <- resSig$log2FoldChange
names(geneList) <- as.character(unique(resSig$ENTREZID))
geneList <- sort(geneList, decreasing = TRUE)

head(geneList)

   338094     81502     23333      1469      4085     25984 
23.454332  7.598031  7.145805  7.034988  6.663326  6.428938

# Network plot of enriched terms CategorySize can be scaled by 'pvalue' or
# 'geneNum'
p1 <- cnetplot(egox, categorySize = "pvalue", foldChange = geneList)

# Display the plot
print(p1)

# Calculate pairwise similarities of enriched terms
egox_pairwise <- pairwise_termsim(egox)

# Creates an enrichment map visualization from pairwise enrichment analysis
# results
p2 <- emapplot(egox_pairwise, layout = "kk")

# Display the plot
print(p2)

# Plot Gene-Concept Network and Enrichment Map in a PDF file

# Plot 1
pdf("gene_concept_network_plot.pdf", width = 12, height = 10)
print(p1)
dev.off()

quartz_off_screen 
                2

# Plot 2
pdf("enrichment_map.pdf", width = 12, height = 10)
print(p2)
dev.off()

quartz_off_screen 
                2

# # Start PDF device pdf(file=paste('Merge_GO_Gene_Set_Enrichment_Analysis',
# 'CC', '.pdf', sep='_'), width=25, height=10) # Create the plot
# cowplot::plot_grid(p1, p2, ncol=2, labels=LETTERS[1:3], rel_widths=c(.8, .8,
# 1.2)) # Close PDF device dev.off()

5 Comparison of results using Venn diagrams

# Get results from the DESeq2 pipeline by contrast
tumor_vs_normal_42_dds <- results(dds_DGE, contrast = c("condition", "tumor_eocrc",
    "normal_eocrc"))

summary(tumor_vs_normal_42_dds)


out of 37827 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 6942, 18%
LFC < 0 (down)     : 6130, 16%
outliers [1]       : 0, 0%
low counts [2]     : 4438, 12%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

# Save results
write.csv(tumor_vs_normal_42_dds, "tumor_vs_normal_42_dds.csv")

Get differentially expressed genes by contrast tumor_eocrc_vs_normal_eocrc:

log2 fold change (MLE): condition tumor_eocrc vs normal_eocrc 
Wald test p-value: condition tumor eocrc vs normal eocrc 
DataFrame with 5 rows and 5 columns
                   log2FoldChange     lfcSE      stat      pvalue        padj
                        <numeric> <numeric> <numeric>   <numeric>   <numeric>
ENSG00000105464.5         4.17270  0.354597   11.7674 5.74440e-32 1.92018e-27
ENSG00000164283.13        5.23269  0.453264   11.5445 7.87246e-31 1.31576e-26
ENSG00000175832.14        4.37081  0.398346   10.9724 5.18803e-28 5.78068e-24
ENSG00000107159.15        5.63686  0.553410   10.1857 2.29728e-24 1.02868e-20
ENSG00000163347.6         4.50301  0.441776   10.1930 2.13172e-24 1.02868e-20

6 R Session Information

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.1 (2025-06-13)
 os       macOS Ventura 13.7.3
 system   x86_64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Madrid
 date     2026-02-03
 pandoc   3.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/x86_64/ (via rmarkdown)
 quarto   1.5.57 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package              * version    date (UTC) lib source
 abind                  1.4-8      2024-09-12 [1] CRAN (R 4.5.0)
 affy                   1.88.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 affyio                 1.80.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 AnnotationDbi        * 1.72.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 ape                    5.8-1      2024-12-16 [1] CRAN (R 4.5.0)
 apeglm               * 1.32.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 aplot                  0.2.9      2025-09-12 [1] CRAN (R 4.5.1)
 base64enc              0.1-3      2015-07-28 [1] CRAN (R 4.5.0)
 bbmle                  1.0.25.1   2023-12-09 [1] CRAN (R 4.5.0)
 bdsmatrix              1.3-7      2024-03-02 [1] CRAN (R 4.5.0)
 beeswarm               0.4.0      2021-06-01 [1] CRAN (R 4.5.0)
 Biobase              * 2.70.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 BiocGenerics         * 0.56.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 BiocManager          * 1.30.27    2025-11-14 [1] CRAN (R 4.5.1)
 BiocParallel           1.44.0     2025-10-30 [1] Bioconductor 3.22 (R 4.5.1)
 Biostrings             2.78.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 bit                    4.6.0      2025-03-06 [1] CRAN (R 4.5.0)
 bit64                  4.6.0-1    2025-01-16 [1] CRAN (R 4.5.0)
 blob                   1.2.4      2023-03-17 [1] CRAN (R 4.5.0)
 bslib                  0.9.0      2025-01-30 [1] CRAN (R 4.5.0)
 cachem                 1.1.0      2024-05-16 [1] CRAN (R 4.5.0)
 cli                    3.6.5      2025-04-23 [1] CRAN (R 4.5.0)
 cluster                2.1.8.1    2025-03-12 [1] CRAN (R 4.5.1)
 clusterProfiler      * 4.18.4     2025-12-15 [1] Bioconductor 3.22 (R 4.5.2)
 coda                   0.19-4.1   2024-01-31 [1] CRAN (R 4.5.0)
 codetools              0.2-20     2024-03-31 [1] CRAN (R 4.5.1)
 corrplot             * 0.95       2024-10-14 [1] CRAN (R 4.5.0)
 cowplot                1.2.0      2025-07-07 [1] CRAN (R 4.5.1)
 crayon                 1.5.3      2024-06-20 [1] CRAN (R 4.5.0)
 data.table             1.17.8     2025-07-10 [1] CRAN (R 4.5.1)
 DBI                    1.2.3      2024-06-02 [1] CRAN (R 4.5.0)
 DelayedArray           0.36.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 DESeq2               * 1.50.2     2025-11-17 [1] Bioconductor 3.22 (R 4.5.2)
 devtools               2.4.6      2025-10-03 [1] CRAN (R 4.5.1)
 digest                 0.6.39     2025-11-19 [1] CRAN (R 4.5.1)
 DOSE                 * 4.4.0      2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 dplyr                * 1.1.4      2023-11-17 [1] CRAN (R 4.5.0)
 ellipsis               0.3.2      2021-04-29 [1] CRAN (R 4.5.0)
 emdbook                1.3.14     2025-07-23 [1] CRAN (R 4.5.1)
 enrichplot           * 1.30.4     2025-12-01 [1] Bioconductor 3.22 (R 4.5.2)
 evaluate               1.0.5      2025-08-27 [1] CRAN (R 4.5.1)
 factoextra           * 1.0.7      2020-04-01 [1] CRAN (R 4.5.0)
 farver                 2.1.2      2024-05-13 [1] CRAN (R 4.5.0)
 fastmap                1.2.0      2024-05-15 [1] CRAN (R 4.5.0)
 fastmatch              1.1-6      2024-12-23 [1] CRAN (R 4.5.0)
 fgsea                  1.36.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 fontBitstreamVera      0.1.1      2017-02-01 [1] CRAN (R 4.5.0)
 fontLiberation         0.1.0      2016-10-15 [1] CRAN (R 4.5.0)
 fontquiver             0.2.1      2017-02-01 [1] CRAN (R 4.5.0)
 formatR                1.14       2023-01-17 [1] CRAN (R 4.5.0)
 fs                     1.6.6      2025-04-12 [1] CRAN (R 4.5.0)
 gdtools                0.4.4      2025-10-06 [1] CRAN (R 4.5.1)
 generics             * 0.1.4      2025-05-09 [1] CRAN (R 4.5.0)
 GenomicRanges        * 1.62.1     2025-12-08 [1] Bioconductor 3.22 (R 4.5.2)
 ggbeeswarm           * 0.7.3      2025-11-29 [1] CRAN (R 4.5.1)
 ggforce                0.5.0      2025-06-18 [1] CRAN (R 4.5.0)
 ggfortify            * 0.4.19     2025-07-27 [1] CRAN (R 4.5.1)
 ggfun                  0.2.0      2025-07-15 [1] CRAN (R 4.5.1)
 ggiraph                0.9.2      2025-10-07 [1] CRAN (R 4.5.1)
 ggnewscale           * 0.5.2      2025-06-20 [1] CRAN (R 4.5.0)
 ggplot2              * 4.0.1      2025-11-14 [1] CRAN (R 4.5.1)
 ggplotify              0.1.3      2025-09-20 [1] CRAN (R 4.5.1)
 ggrepel              * 0.9.6      2024-09-07 [1] CRAN (R 4.5.0)
 ggtangle               0.0.9      2025-11-30 [1] CRAN (R 4.5.1)
 ggtree                 4.0.3      2025-12-18 [1] Bioconductor 3.22 (R 4.5.2)
 glue                   1.8.0      2024-09-30 [1] CRAN (R 4.5.0)
 GO.db                  3.22.0     2025-11-21 [1] Bioconductor
 GOSemSim               2.36.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 gridExtra            * 2.3        2017-09-09 [1] CRAN (R 4.5.0)
 gridGraphics           0.5-1      2020-12-13 [1] CRAN (R 4.5.0)
 gson                   0.1.0      2023-03-07 [1] CRAN (R 4.5.0)
 gtable                 0.3.6      2024-10-25 [1] CRAN (R 4.5.0)
 hexbin               * 1.28.5     2024-11-13 [1] CRAN (R 4.5.0)
 htmltools              0.5.9      2025-12-04 [1] CRAN (R 4.5.1)
 htmlwidgets            1.6.4      2023-12-06 [1] CRAN (R 4.5.0)
 httr                   1.4.7      2023-08-15 [1] CRAN (R 4.5.0)
 igraph                 2.2.1      2025-10-27 [1] CRAN (R 4.5.1)
 IRanges              * 2.44.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 jquerylib              0.1.4      2021-04-26 [1] CRAN (R 4.5.0)
 jsonlite               2.0.0      2025-03-27 [1] CRAN (R 4.5.0)
 KEGGREST               1.50.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 knitr                * 1.51       2025-12-20 [1] CRAN (R 4.5.1)
 labeling               0.4.3      2023-08-29 [1] CRAN (R 4.5.0)
 lattice                0.22-7     2025-04-02 [1] CRAN (R 4.5.1)
 lazyeval               0.2.2      2019-03-15 [1] CRAN (R 4.5.0)
 lifecycle              1.0.4      2023-11-07 [1] CRAN (R 4.5.0)
 limma                  3.66.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 locfit                 1.5-9.12   2025-03-05 [1] CRAN (R 4.5.0)
 magrittr               2.0.4      2025-09-12 [1] CRAN (R 4.5.1)
 MASS                   7.3-65     2025-02-28 [1] CRAN (R 4.5.1)
 Matrix                 1.7-4      2025-08-28 [1] CRAN (R 4.5.1)
 MatrixGenerics       * 1.22.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 matrixStats          * 1.5.0      2025-01-07 [1] CRAN (R 4.5.0)
 memoise                2.0.1      2021-11-26 [1] CRAN (R 4.5.0)
 mvtnorm                1.3-3      2025-01-10 [1] CRAN (R 4.5.0)
 nlme                   3.1-168    2025-03-31 [1] CRAN (R 4.5.1)
 numDeriv               2016.8-1.1 2019-06-06 [1] CRAN (R 4.5.0)
 org.Hs.eg.db         * 3.22.0     2025-11-20 [1] Bioconductor
 otel                   0.2.0      2025-08-29 [1] CRAN (R 4.5.1)
 patchwork              1.3.2      2025-08-25 [1] CRAN (R 4.5.1)
 pheatmap             * 1.0.13     2025-06-05 [1] CRAN (R 4.5.0)
 pillar                 1.11.1     2025-09-17 [1] CRAN (R 4.5.1)
 pkgbuild               1.4.8      2025-05-26 [1] CRAN (R 4.5.0)
 pkgconfig              2.0.3      2019-09-22 [1] CRAN (R 4.5.0)
 pkgload                1.4.1      2025-09-23 [1] CRAN (R 4.5.1)
 plyr                   1.8.9      2023-10-02 [1] CRAN (R 4.5.0)
 png                    0.1-8      2022-11-29 [1] CRAN (R 4.5.0)
 polyclip               1.10-7     2024-07-23 [1] CRAN (R 4.5.0)
 preprocessCore         1.72.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 purrr                  1.2.0      2025-11-04 [1] CRAN (R 4.5.1)
 qvalue                 2.42.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 R.methodsS3            1.8.2      2022-06-13 [1] CRAN (R 4.5.0)
 R.oo                   1.27.1     2025-05-02 [1] CRAN (R 4.5.0)
 R.utils                2.13.0     2025-02-24 [1] CRAN (R 4.5.0)
 R6                     2.6.1      2025-02-15 [1] CRAN (R 4.5.0)
 rappdirs               0.3.3      2021-01-31 [1] CRAN (R 4.5.0)
 RColorBrewer         * 1.1-3      2022-04-03 [1] CRAN (R 4.5.0)
 Rcpp                   1.1.0      2025-07-02 [1] CRAN (R 4.5.1)
 remotes                2.5.0      2024-03-17 [1] CRAN (R 4.5.0)
 repr                   1.1.7      2024-03-22 [1] CRAN (R 4.5.0)
 reshape2             * 1.4.5      2025-11-12 [1] CRAN (R 4.5.1)
 rlang                  1.1.6      2025-04-11 [1] CRAN (R 4.5.0)
 rmarkdown              2.30       2025-09-28 [1] CRAN (R 4.5.1)
 RSQLite                2.4.5      2025-11-30 [1] CRAN (R 4.5.1)
 rstudioapi             0.17.1     2024-10-22 [1] CRAN (R 4.5.0)
 S4Arrays               1.10.1     2025-12-01 [1] Bioconductor 3.22 (R 4.5.2)
 S4Vectors            * 0.48.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 S7                     0.2.1      2025-11-14 [1] CRAN (R 4.5.1)
 sass                   0.4.10     2025-04-11 [1] CRAN (R 4.5.0)
 scales               * 1.4.0      2025-04-24 [1] CRAN (R 4.5.0)
 scatterpie             0.2.6      2025-09-12 [1] CRAN (R 4.5.1)
 Seqinfo              * 1.0.0      2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 sessioninfo            1.2.3      2025-02-05 [1] CRAN (R 4.5.0)
 skimr                * 2.2.2      2026-01-10 [1] CRAN (R 4.5.1)
 SparseArray            1.10.8     2025-12-18 [1] Bioconductor 3.22 (R 4.5.2)
 statmod                1.5.1      2025-10-09 [1] CRAN (R 4.5.1)
 stringi                1.8.7      2025-03-27 [1] CRAN (R 4.5.0)
 stringr                1.6.0      2025-11-04 [1] CRAN (R 4.5.1)
 SummarizedExperiment * 1.40.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 systemfonts            1.3.1      2025-10-01 [1] CRAN (R 4.5.1)
 tibble                 3.3.0      2025-06-08 [1] CRAN (R 4.5.0)
 tidydr                 0.0.6      2025-07-25 [1] CRAN (R 4.5.1)
 tidyr                * 1.3.2      2025-12-19 [1] CRAN (R 4.5.1)
 tidyselect             1.2.1      2024-03-11 [1] CRAN (R 4.5.0)
 tidytree               0.4.6      2023-12-12 [1] CRAN (R 4.5.0)
 treeio                 1.34.0     2025-10-30 [1] Bioconductor 3.22 (R 4.5.1)
 tweenr                 2.0.3      2024-02-26 [1] CRAN (R 4.5.0)
 usethis                3.2.1      2025-09-06 [1] CRAN (R 4.5.1)
 vctrs                  0.6.5      2023-12-01 [1] CRAN (R 4.5.0)
 vipor                  0.4.7      2023-12-18 [1] CRAN (R 4.5.0)
 vsn                  * 3.78.1     2026-01-13 [1] https://bioc-release.r-universe.dev (R 4.5.2)
 withr                  3.0.2      2024-10-28 [1] CRAN (R 4.5.0)
 xfun                   0.55       2025-12-16 [1] CRAN (R 4.5.1)
 XVector                0.50.0     2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 yaml                   2.3.12     2025-12-10 [1] CRAN (R 4.5.1)
 yulab.utils            0.2.3      2025-12-15 [1] CRAN (R 4.5.1)

 [1] /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────

Differential Gene Expression of 42 colorectal cancer samples in R

Salomón Marquez

2026-02-03