The content of this R notebook file includes all the codes requied for the Alternative PolyAdenylation (APA) analysis using 3p-Seq data. Refer to ‘PART 2- Alternative polyadenylation (APA) analysis using 3p-seq’ of the Protocol section. NOTE: To proceed with each step below to get the results, type the bash commands on Linux command-line then press Enter, or type the R codes in R console then press Enter.
Ensure the related tools and packages used for APA analysis are installed correctly to guarantee the successful running of the following pipeline. As the analysis is required to be performed both on the Linux command-line and in the R console, and perl is expected to be used for data processing, ensure all of the environments are prepared at the beginning.
condaconda is a popular and flexible package manager that allows convenient installation of packages with their dependencies across all platforms. To assist the further installation of related bash tools/packages required in this analysis, install conda at first by following the installing instruction of Anaconda (conda package manager) document at https://docs.anaconda.com/anaconda/install/. Ensure the system requirements and the installer are selected properly.
condaInstall all the required packages in this analysis using conda on Linux command-line.
conda install -c daler sratoolkit
conda install -c conda-forge parallel
conda install -c bioconda bowtie sratoolkit cutadapt samtools bedtools deeptools
Install all the R packages required in this analysis in R console if not installed.
bioc_packages <- c("DEXSeq", "Rsubread", "EnhancedVolcano", "edgeR", "limma", "GenomicRanges")
packages <- c("tidyverse", "BiocManager", "openxlsx")
#Install if not already installed
installed_packages <- packages %in% rownames(installed.packages())
installed_bioc_packages <- bioc_packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
install.packages(packages[!installed_packages], dependencies = TRUE)
BiocManager::install(packages[!installed_bioc_packages], dependencies = TRUE)
}
Create the project directory APA_analysis and the raw data directory raw_data.
# Create and open the project directory
PROJ_DIR=APA_analysis/
mkdir $PROJ_DIR
cd $PROJ_DIR
# Create the raw data directory
RAW_DATA=raw_data
mkdir -p $RAW_DATA
Given the Sequence Read Archive [SRA] Accession Ids (1553129 to 1553136) in sequence, download the raw data in parallel from SRA using the prefetch command from SRA toolkit (v2.10.8) and the parallel command from GNU parallel utility. Save the rsa files in the directory raw_data.
# Open the raw_data directory for raw data storage
cd $RAW_DATA
# Download SRA files using accession numbers
seq 1553129 1553136 | parallel prefetch SRR{}
Extract fastq.gz files from the archive using the fastq-dump command from the SRA toolkit. Ensure all the paths and names of the rsa files are listed correctly.
# Convert sra files to fastq files for alignment
parallel -j 3 fastq-dump --gzip --skip-technical --read-filter pass --dumpbase --split-e --clip --origfmt {} ::: SRR1553129/SRR1553129.sra SRR1553130/SRR1553130.sra SRR1553131/SRR1553131.sra SRR1553132/SRR1553132.sra SRR1553133/SRR1553133.sra SRR1553134/SRR1553134.sra SRR1553135/SRR1553135.sra SRR1553136/SRR1553136.sra
Unzip the compressed fastq.gz files in parallel using the command gunzip to obtain fastq files.
# Unzip the fastq.gz data to gain fastq files
parallel “gunzip {}” :::$RAW_DATA/*.fastq.gz
# Return to the project directory for reference downloading
cd ..
Download the reference genome and annotations for Mouse (Genome assembly mm10) in the project directory using the command wget, then decompress the genome annotation file using the command unzip. Download the reference file for the mouse chromosome sizes from UCSC using the command wget. The reference genome file and its annotations (the bowtie index of mm10) here are saved in the directory mm10, and the the reference file for the mouse chromosome sizes is named as mm10.chrom.sizes.txt.
# Download the bowtie index of mm10
wget https://genome-idx.s3.amazonaws.com/bt/mm10.zip
unzip mm10.zip
# Download the mm10.chrom.sizes
wget https://hgdownload-test.gi.ucsc.edu/goldenPath/mm10/bigZips/mm10.chrom.sizes
To remove the influence of adapter sequence on short reads (~35 bases) gained in polyA-seq, perform Illumina adapter trimming with the command cutadapt. Save the results in the newly created directory adapter.trimming.results.
TRIM_DIR=adapter.trimming.results
mkdir $TRIM_DIR
for fq in $RAW_DATA/*.fastq
do
OUTPUT=$(basename ${fq}| sed 's/_pass.fastq//g');
cutadapt -a "AGATCGGAAGAGC" -m 15 -n 3 ${fq} | fastx_reverse_complement -i > $TRIM_DIR\/${OUTPUT}_trimmed.fastq
done
Ensure the read sequence used for genome alignment is consistent with the sense strand sequence. According to the description of library preparation steps in the original article, polyA-seq was performed on the antisense strand. Therefore, after adapter trimming, perform reverse complementary operation using the fastx_reverse_complement command from FASTX toolkit to obtain the sense strand sequence.
TRIMRC_DIR=trimmed.rc.data
mkdir $TRIMRC_DIR
for fq in $RAW_DATA/*.fastq
do
OUTPUT=$(basename ${fq}| sed 's/_pass.fastq//g');
fastx_reverse_complement -i $TRIM_DIR\/${OUTPUT}_trimmed.fastq -o $TRIMRC_DIR\/${OUTPUT}_trimmed_rc.fastq
done
NOTE: This step is specific to the PolyA-seq assay used.
Map reads to mouse genome assembly using bowtie aligner and save the aligned SAM files in the directory bamfiles.
SAM_DIR=samfiles
mkdir $SAM_DIR
for fq in $RAW_DATA/*.fastq
do
OUTPUT=$(basename ${fq}| sed 's/_pass.fastq//g');
bowtie -n 2 -k 100 --best --strata -p 10 -x mm10/genome --un unmapped_out $TRIMRC_DIR\/${OUTPUT}_trimmed_rc.fastq -S > $SAM_DIR\/${OUTPUT}.sam
done
To prepare for genome coverage visualization, convert SAM files to BAM files, then sort and index BAM files for each sample using samtools commands before proceeding to further steps. Save the results in the directory bamfiles by typing the following commands and press Enter.
BAM_DIR=bamfiles
mkdir $BAM_DIR
for fq in $RAW_DATA/*.fastq
do
OUTPUT=$(basename ${fq}| sed 's/_pass.fastq//g');
samtools view -bS $SAM_DIR\/${OUTPUT}.sam -o $BAM_DIR\/${OUTPUT}.bam
samtools sort $BAM_DIR\/${OUTPUT}.bam -o $BAM_DIR\/${OUTPUT}.sorted.bam
samtools index $BAM_DIR\/${OUTPUT}.sorted.bam
done
To generate BigWig files for the read coverage visualization from BAM files, firstly apply genomeCoverageBed command from bedtools to generate bedgraph files, and use the parameter split and strand to split files based on their strand. Save the results in the directory bedgraphfiles by typing the following commands and press Enter.
BDG_DIR=bedgraphfiles
mkdir $BDG_DIR
for fq in $RAW_DATA/*.fastq
do
OUTPUT=$(basename ${fq}| sed 's/_pass.fastq//g');
# Generate genomeCoverageBed files
genomeCoverageBed -split -strand + -bga -ibam $BAM_DIR\/${OUTPUT}.sorted.bam | sort -k 1,1 -k2,2n > $BDG_DIR\/${OUTPUT}.bedgraph_fwd
genomeCoverageBed -split -strand - -bga -ibam $BAM_DIR\/${OUTPUT}.sorted.bam | sort -k 1,1 -k2,2n > $BDG_DIR\/${OUTPUT}.bedgraph_rev
done
If the counts are used for the read coverage visualization, the height of read peaks is influenced by library size. Therefore, normalization of read counts by sequencing depth/library size is a critical step in the read coverage visualization. Calculate the library size of each sample using the samtools command, and save the results as txt files in the directory librarysizefiles. Then use perl -lane command to calculate the counts per million (CPM) number normalized by the sample library size. Save the results in the directory bedgraphfiles.norm by typing the following commands and press Enter.
LBSIZE_DIR=librarysizefiles
mkdir $LBSIZE_DIR
BDGNORM_DIR=bedgraphfiles.norm
mkdir $BDGNORM_DIR
for fq in $RAW_DATA/*.fastq
do
OUTPUT=$(basename ${fq}| sed 's/_pass.fastq//g');
# Calculate the library size
samtools view -F 0x105 -c $BAM_DIR\/${OUTPUT}.sorted.bam > $LBSIZE_DIR\/${OUTPUT}.temp
# Generate the normalized genomeCoverageBed files for the coverage visualisation to decide the flanking regions included in the annotation file
libsize=$(cat $LBSIZE_DIR\/${OUTPUT}.temp)
perl -lane '$xx=$F[3]*1e6/'$libsize'; print "$F[0]\t$F[1]\t$F[2]\t$xx"' $BDG_DIR\/${OUTPUT}.bedgraph_fwd > $BDGNORM_DIR\/${OUTPUT}.bedgraph_fwd_norm
perl -lane '$xx=$F[3]*1e6/'$libsize'; print "$F[0]\t$F[1]\t$F[2]\t$xx"' $BDG_DIR\/${OUTPUT}.bedgraph_rev > $BDGNORM_DIR\/${OUTPUT}.bedgraph_rev_norm
done
Generate BigWig files using the UCSC wigToBigWig command, and save the results in the directory wigToBigWig.
BWNORM_DIR=normalized.BigWigfiles
mkdir $BWNORM_DIR
for fq in $RAW_DATA/*.fastq
do
OUTPUT=$(basename ${fq}| sed 's/_pass.fastq//g');
# Generate bigwig files for the check in the genome browser/IGV
wigToBigWig $BDGNORM_DIR\/${OUTPUT}.bedgraph_fwd_norm mm10.chrom.sizes.txt $BWNORM_DIR\/${OUTPUT}.fwd.bw
wigToBigWig $BDGNORM_DIR\/${OUTPUT}.bedgraph_rev_norm mm10.chrom.sizes.txt $BWNORM_DIR\/${OUTPUT}.rev.bw
done
NOTE: The processing steps of the pA sites annotation file is preformed firstly in R console, then on Linux command-line for the read coverage visualzation at pA sites and the generation of final annotation file.
Download the pA site annotation file “atlas.clusters.2.0.GRCm38.96.tsv” from the PolyASite 2.0 database at https://polyasite.unibas.ch. To find the link for the download, select ‘Atlas’ on the top bar of the main website, then click the ‘Atlas BED file’ under the subtitle ‘Mus musculus: v2.0 (GRCm38.96)’. The tsv file is expected to start downloading automatically.
Type the following commands in R console and press Enter to load the required libraries in R.
library(magrittr)
As the packages used here for APA analysis are designed for differential exon usage and splicing analysis, the format of the annotations needs to be modified in two aspects for the adaption to APA analysis: 1) Change the format of unique feature IDs (namely unique pA site cluster IDs) from ‘group ID: feature ID: strand’ to group ID: feature ID’ (namely ‘gene ID: pA site cluster ID’). For the convenience of downstream analysis and to avoid duplicates in the pA site cluster IDs, remove the last common in each original cluster ID and replace the strand information + and - with letters F (Forward) and R(Reverse) individually; 2) Add ‘chr’ before each chromosome number; In addition to the changes above, for the visualization of reads coverage and pA cleavage sites: 3) Replace the pA site coordinates by their corresponding cleavage sites.
anno$name <- anno$strand %>% replace(. %in% "+", "F") %>% # Replace the strand information '+' with 'F'
replace(. %in% "-", "R") %>% # Replace the strand information '-' with 'R'
paste0(anno$gene_name, ":", anno$rep, .) # Generate new unique pA site cluster IDs
anno$chrom <- paste0("chr", anno$chrom) # change the format of chromosome column for downstream analysis
# Replace coordinates of pA sites with their corresponding cleavage sites
anno$chromStart <- anno$rep -1
anno$chromEnd <- anno$rep
To prepare for the read coverage visualization at pA sites, split the processed annotation file into two sperate files according to the strand information, then save the two files separatly in bed format as ‘pA_annotation.fwd.bed’ and ‘pA_annotation.rev.bed’ in the current working directory.
# Generate annotation files for each strand for genomeCoverage plotting
anno %T>%
{
anno.fwd = .[.$strand %in% "+", ]
write.table(anno.fwd, file = "pA_annotation.fwd.bed", row.names = FALSE, col.names = FALSE, sep = "\t", quote = FALSE)
} %>% {
anno.rev <- .[.$strand %in% "-", ]
write.table(anno.rev, file = "pA_annotation.rev.bed", row.names = FALSE, col.names = FALSE, sep = "\t", quote = FALSE)
}
Save the processed annotation file in bed format for further modification after pA sites read coverage visualization.
anno %>% write.table(file = "pA_annotation.bed", row.names = FALSE, col.names = FALSE, sep = "\t", quote = FALSE)
According to the 3p-seq protocol, the reads are expected to be mapped to the region around, rather than specifically at pA cleavage sites. Therefore, visualization to check the peak dispersion of mapped reads at pA sites to optimize the annotation file for this data is a critical step in the annotation file processing.
Visualize the peaks of aligned reads at pA cleavage sites using computeMatrix and plotHeatmap from deepTools by typing the following commands on Linux command-line and press Enter. Remeber to scale the distance upstream (-b) and downstream (-a) of the cleavage sites if the peak cannot be visualized completely. Save matrix results in the created directory compute.matrix.results, and the plots in pdf in the created directory plotHeatmap.results.
CM_DIR=compute.matrix.results
mkdir $CM_DIR
HM_DIR=plotHeatmap.results
mkdir $HM_DIR
for fq in $RAW_DATA/*.fastq
do
OUTPUT=$(basename ${fq}| sed 's/_pass.fastq//g');
# Visualizing the read coverage at pA sites
computeMatrix reference-point -S $BWNORM_DIR\/${OUTPUT}.fwd.bw -R pA_annotation.fwd.bed -a 100 -b 100 -o $CM_DIR\/${OUTPUT}.fwd.ud100.gz --referencePoint center --sortRegions no
plotHeatmap -m $CM_DIR\/${OUTPUT}.fwd.ud100.gz -o $HM_DIR\/${OUTPUT}.fwd.ud100.pdf
computeMatrix reference-point -S $BWNORM_DIR\/${OUTPUT}.rev.bw -R pA_annotation.rev.bed -a 100 -b 100 -o $CM_DIR\/${OUTPUT}.rev.ud100.gz --referencePoint center --sortRegions no
plotHeatmap -m $CM_DIR\/${OUTPUT}.rev.ud100.gz -o $HM_DIR\/${OUTPUT}.rev.ud100.pdf
done
NOTE: Through the visualization of read coverage at pA sites, it was found that the peaks of the mapped reads were mainly dispersed within ~60 bp upstream of the cleavage sites (Refer to the supplementary figure “merge.heatmap.png”). Therefore, coordinates of pA sites from the annotation file are supposed to be extended to 60 bp upstream of their cleavage sites in the analysis of the PolyA-seq data. Remember that depending on the specific 3’ end sequencing protocol used, this step will need to be optimized for assays other than PolyA-seq.
Extend the coordinates of pA sites to 60 bp upstream of their cleavage sites in the annotation file by typing the following command on the Linux command-line. Download the chromosome size annotation file from UCSC using the command wget. Save the processed annotation file in bed format for downstream analysis as “flanking60added.pA_annotation.bed”.
# Download the chromosome size annotation file 'mm10.chrom.sizes.txt'
wget https://hgdownload-test.gi.ucsc.edu/goldenPath/mm10/bigZips/mm10.chrom.sizes
# Change the added value based on the genomeCov results
bedtools slop -i pA_annotation.bed -g mm10.chrom.sizes.txt -l 60 -r 0 -s > flanking60added.pA_annotation.bed
NOTE: All the steps in the following sections are perfomed in R console.
Type the following commands in R console and press Enter to load the required libraries in R.
c("Rsubread","tidyverse") %>% lapply(library, character.only = TRUE) %>% invisible
Warning message:
In readChar(file, size, TRUE) : truncating string with embedded nuls
Load the pA site annotation information from the bed file using the read.table function, then select eight columns used for the differential APA analysis, including unique pA site cluster ID, chromosome name, start and end positions of the pA site cluster, strand on which the cluster is encoded, the overlapping gene ensembles and their corresponding gene symbols, and the representative pA site of the cluster. NOTE: Before running this step, ensure the path of the bed file is correct as the input of the read.table function.
anno <- read.table(file = "/Users/evelynzheng/3PSeq_files/flanking60added.pA_annotation.bed", stringsAsFactors = FALSE, check.names = FALSE, header = FALSE, sep = "")
colnames(anno) <- c("Chr", "Start", "End", "GeneID", "Score", "Strand", "repID", "Anno", "Symbol", "Ensembl")
anno <- dplyr::select(anno, GeneID, Chr, Start, End, Strand, Ensembl, Symbol, repID)
feactureCounts to acquire the raw read countsRead all bam files obtained as input for featureCounts() from Rsubread package to acquire the raw counts of reads mapped to each annotated pA site, the format of which is expected to be a matrix of counts associated with each feature with rows representing exons(features) and columns representing samples. Save the count table as the RData file “APA_countData.RData” for APA analysis using different tools. NOTE: Before running this step, ensure the folder containing bam files is in the current working directory.
# As the running of featureCounts requires around 25 G memory, during the generation of this Rmd file, this step was running on server and the result was saved as "countData.RData" and loaded in the next R session.
countData <- dir("bamfiles", pattern="sorted.bam$", full.names = TRUE) %>%
# Read all bam files as input for featureCounts
featureCounts(annot.ext = anno, isGTFAnnotationFile = FALSE, minMQS = 0, useMetaFeatures = TRUE,
allowMultiOverlap = TRUE, largestOverlap = TRUE, strandSpecific = 1,
countMultiMappingReads = TRUE, primaryOnly = TRUE, isPairedEnd = FALSE, nthreads = 12) %T>%
save(file = "APA_countData.RData")
NOTE: Be conscious to change any of the parameters listed in the featureCounts function. Modify the strandSpecific parameter to ensure it is consistent with the sequencing direction of the 3’ end sequencing assay used (empirically, visualizing the data in a genome browser over genes on plus and minus strands will clarify this).
As filtering of the raw read counts can significantly improve the statistical robustness in differential pA site usage tests, apply read counts filtering on two levels: 1) On gene-level, remove genes with only one pA site, on which differential pA site usage cannot be defined; 2) On gene-level, remove genes annotated as NA, which will hinder the downstream processing and analysis; 3) On pA site-level, apply non-specific filtering based on coverage: filter counts by cpm (counts per million) less than 1 in x out of n samples, where x is the minimum number of replicates in any condition. n = 8 and x = 2 for this example data. NOTE: Before running this step, ensure the path of the RData file is correct as the input of the load function.
load(file = "/Users/evelynzheng/3PSeq_files/APA_countData.RData") # Skip this step if it is already loaded
# Non-specific filtering: Remove the pA sites not differentially expressed in the samples
countData <- countData$counts %>% as.data.frame %>% .[rowSums(edgeR::cpm(.) > 1) >= 2, ]
anno %<>% .[.$GeneID %in% rownames(countData), ]
# Remove genes with only 1 site and NA in geneIDs
dnsites <- anno %>% group_by(Symbol) %>% summarise(nsites=n()) %>% filter(nsites > 1 & !is.na(Symbol))
anno %<>% filter(Symbol %in% dnsites$Symbol)
countData %<>% .[rownames(.) %in% anno$GeneID, ]
NOTE: As a contrast matrix cannot be defined for the DEXSeq pipeline, the differential APA analysis of each two experimental conditions has to be developed separatly. Here the differential APA analysis of the condition wild type (WT) and Mbnl 1/2 double knockout (DKO) is developed as an example to explain the procedure. The analysis of the contrast pairs WT and KD, WT and Ctrl are attached at the end.
Type the following commands in R console and press Enter to load the required libraries in R.
c("DEXSeq", "GenomicRanges") %>% lapply(library, character.only = TRUE) %>% invisible
Create the sampleTable, which consists of details of each sample with its library-type and condition. Ensure the order of row names is consistent with the order of bam file names used by featureCounts to count the reads.
sampleTable1 <- data.frame(row.names = c("WT_1","WT_2","DKO_1","DKO_2"),
condition = c(rep("WT", 2), rep("DKO", 2)),
libType = rep("single-end", 4))
pA site information in the form of GRanges objects (https://bioconductor.org/packages/release/bioc/html/GenomicRanges.html) is required as an input to create DEXSeq object. Extract the annotation from the processed annotation object to create PASinfo object for the preparation of DEXSeq object construction.
# Prepare the GRanges object for DEXSeqDataSet object construction
PASinfo <- GRanges(seqnames = anno$Chr,
ranges = IRanges(start = anno$Start, end = anno$End),
strand = Rle(anno$Strand))
mcols(PASinfo)$PASID <- anno$repID
mcols(PASinfo)$GeneEns <- anno$Ensembl
mcols(PASinfo)$GeneID <- anno$Symbol
# Prepare the new feature IDs, replace the strand information with letters to match the current pA site clusterID
new.featureID <- anno$Strand %>% as.character %>% replace(. %in% "+", "F") %>% replace(. %in% "-", "R") %>% paste0(as.character(anno$repID), .)
Create the DEXSeq object using the function DEXSeqDataSet to collect: 1) the filtered read counts generated in step 3; 2) sample information from the sampleTable; 3) design matrix which is generated based on the sampleData and a model formula notation for the differential APA testing; 4) feature information including pA site IDs and their corresponding gene symbols.
countData1 <- dplyr::select(countData, SRR1553129.sorted.bam, SRR1553130.sorted.bam,
SRR1553131.sorted.bam, SRR1553132.sorted.bam) # Select the read counts of the condtion WT and DKO
colnames(countData1) <- rownames(sampleTable1) # Rename the columns of countData using sample names in sampleTable
dxd1 <- DEXSeqDataSet(countData = countData1,
sampleData = sampleTable1,
design = ~ sample + exon + condition:exon,
featureID = new.featureID,
groupID = anno$Symbol,
featureRanges = PASinfo)
converting counts to integer mode
some variables in design formula are characters, converting to factors
NOTE: This step might take some time. Once the object is created, explore the columns for initial understanding of data.
# Explore/Inspect the DEXSeq object
head(counts(dxd1), 5)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
Akap12:4359096F 74 110 21 29 492 671 203 201
Akap12:4359468F 492 671 203 201 74 110 21 29
Lats1:7716088F 46 82 41 25 283 416 320 257
Lats1:7716124F 85 108 82 66 244 390 279 216
Lats1:7716458F 198 308 238 191 131 190 123 91
colData(dxd1)
DataFrame with 8 rows and 4 columns
sample condition libType exon
<factor> <factor> <character> <factor>
1 WT_1 WT single-end this
2 WT_2 WT single-end this
3 DKO_1 DKO single-end this
4 DKO_2 DKO single-end this
5 WT_1 WT single-end others
6 WT_2 WT single-end others
7 DKO_1 DKO single-end others
8 DKO_2 DKO single-end others
split(seq_len(ncol(dxd1)), colData(dxd1)$exon)
$others
[1] 5 6 7 8
$this
[1] 1 2 3 4
#head(featureCounts(dxd1), 5)
head(rowRanges(dxd1), 3)
GRanges object with 3 ranges and 7 metadata columns:
seqnames ranges strand | PASID GeneEns GeneID featureID
<Rle> <IRanges> <Rle> | <integer> <character> <character> <character>
Akap12:4359096F chr10 4359035-4359096 + | 4359096 ENSMUSG00000038587 Akap12 4359096F
Akap12:4359468F chr10 4359407-4359468 + | 4359468 ENSMUSG00000038587 Akap12 4359468F
Lats1:7716088F chr10 7716027-7716088 + | 7716088 ENSMUSG00000040021 Lats1 7716088F
groupID exonBaseMean exonBaseVar
<character> <numeric> <numeric>
Akap12:4359096F Akap12 58.50 1723.0
Akap12:4359468F Akap12 391.75 53347.6
Lats1:7716088F Lats1 48.50 579.0
-------
seqinfo: 21 sequences from an unspecified genome; no seqlengths
sampleAnnotation(dxd1)
DataFrame with 4 rows and 3 columns
sample condition libType
<factor> <factor> <character>
1 WT_1 WT single-end
2 WT_2 WT single-end
3 DKO_1 DKO single-end
4 DKO_2 DKO single-end
Define the contrast pair in the following analysis through defining the levels of conditions in DEXSeq object. If not defined, the conditions saved in DEXSeq object will be transferred to alphabetical order-based factors, and the contrast pairs will be “‘the condition with higher level’ - ‘the condition with lower level’”.
dxd1$condition <- factor(dxd1$condition, levels = c("WT", "DKO")) # The contrast pair will be "DKO - WT"
Similar to RNA-seq data, for 3’ end sequencing data perform normalization between samples (column-wise median of ratios for each sample) using the estimateSizeFactors function, and estimate the variation of the data using with the estimateDispersions function, then visualize the dispersion estimation result using the plotDispEsts function.
dxd1 %<>% estimateSizeFactors %>% estimateDispersions %T>% plotDispEsts # Visualization of the dispersion estimation result
Test for differential pA site usage for each gene and generate the results using the function testForDEU, then estimate the fold change of pA site usage using the function estimateExonFoldChanges. Check the results using the function DEXSeqResults and set ‘FDR < 10%’ as the criterion for significantly differential pA sites.
dxd1 %<>% testForDEU %>% estimateExonFoldChanges(fitExpToVar = "condition") #Estimate fold changes
dxr1 <- DEXSeqResults(dxd1)
dxr1
LRT p-value: full vs reduced
DataFrame with 13283 rows and 12 columns
groupID featureID exonBaseMean dispersion stat pvalue padj WT
<character> <character> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
Akap12:4359096F Akap12 4359096F 58.4225 0.00798561 1.3333075 0.2482177 0.3425646 11.1258
Akap12:4359468F Akap12 4359468F 391.4266 0.00697381 1.3993786 0.2368276 0.3295738 21.5595
Lats1:7716088F Lats1 7716088F 47.9757 0.01022535 4.6834917 0.0304538 0.0586427 11.4030
Lats1:7716124F Lats1 7716124F 84.7371 0.00775833 0.0257107 0.8726089 0.9083749 13.4728
Lats1:7716458F Lats1 7716458F 231.1985 0.00454191 2.8345060 0.0922592 0.1519879 18.9816
... ... ... ... ... ... ... ... ...
Msl3:168654204R Msl3 168654204R 56.3907 0.0721362 0.961646 3.26773e-01 4.26271e-01 10.9169
Hccs:169311537R Hccs 169311537R 292.6900 0.0134943 20.084690 7.40874e-06 3.36906e-05 18.5133
Hccs:169311802R Hccs 169311802R 37.9711 0.0160051 18.511783 1.68857e-05 7.23621e-05 11.4216
Ddx3y:1260772R Ddx3y 1260772R 626.7269 0.0205045 0.687067 4.07164e-01 5.07970e-01 25.3131
Ddx3y:1260874R Ddx3y 1260874R 69.9770 0.0218784 0.656277 4.17877e-01 5.18608e-01 11.9097
DKO log2fold_DKO_WT genomicData countData
<numeric> <numeric> <GRanges> <matrix>
Akap12:4359096F 10.29634 -0.26470603 chr10:4359035-4359096:+ 74:110:21:...
Akap12:4359468F 21.96409 0.09049584 chr10:4359407-4359468:+ 492:671:203:...
Lats1:7716088F 9.66163 -0.56319310 chr10:7716027-7716088:+ 46:82:41:...
Lats1:7716124F 13.48425 0.00315177 chr10:7716063-7716124:+ 85:108:82:...
Lats1:7716458F 19.64339 0.15382436 chr10:7716397-7716458:+ 198:308:238:...
... ... ... ... ...
Msl3:168654204R 12.05918 0.347584 chrX:168654203-168654264:- 34:68:42:...
Hccs:169311537R 20.53603 0.468533 chrX:169311536-169311597:- 117:125:521:...
Hccs:169311802R 8.13324 -1.129432 chrX:169311801-169311862:- 39:27:43:...
Ddx3y:1260772R 24.81621 -0.107380 chrY:1260771-1260832:- 782:1010:366:...
Ddx3y:1260874R 12.52243 0.179070 chrY:1260873-1260934:- 71:119:37:...
#dxr <- na.omit(dxr)
mcols(dxr1)$description
[1] "group/gene identifier"
[2] "feature/exon identifier"
[3] "mean of the counts across samples in each feature/exon"
[4] "exon dispersion estimate"
[5] "LRT statistic: full vs reduced"
[6] "LRT p-value: full vs reduced"
[7] "BH adjusted p-values"
[8] "exon usage coefficient"
[9] "exon usage coefficient"
[10] "relative exon usage fold change"
[11] "GRanges object of the coordinates of the exon/feature"
[12] "matrix of integer counts, of each column containing a sample"
## ----tallyPASs------------------------------------------------------------
table(dxr1$padj < 0.1) # Check the number of differential pA sites (FDR < 0.1)
FALSE TRUE
5782 7501
## ----tallyGenes------------------------------------------------------------
table(tapply(dxr1$padj < 0.1, dxr1$groupID, any)) # Check the number of gene overlapped with differential pA site clusters (FDR < 0.1)
FALSE TRUE
1436 3152
# Select the top 100 significant differntial pA site usage
topdiff.PAS <- dxr1 %>% as.data.frame %>% rownames_to_column %>% arrange(padj) %$% groupID[1:100]
head(topdiff.PAS)
[1] "S100a7a" "Kcnq1ot1" "Ppp2ca" "Tpm1" "Wwtr1" "S100a7a"
Use the plotDEXSeq function for visualization of differential pA usage result. Use different gene names to explore the results:
plotDEXSeq(dxr1, "S100a7a", legend = TRUE, expression = FALSE, splicing = TRUE, cex.axis = 1.2, cex = 1.3, lwd = 2)
To display differential ployA usage use splicing = TRUE:
plotDEXSeq(dxr1, "Fosl2", legend = TRUE, expression = FALSE, splicing = TRUE, cex.axis = 1.2, cex = 1.3, lwd = 2)
To display both experssion and differential ployA usage use splicing = TRUE and expression = TRUE:
plotDEXSeq(dxr1, "Papola", legend = TRUE, expression = FALSE, splicing = TRUE, cex.axis = 1.2, cex = 1.3, lwd = 2)
Generate plots for all the top 300 significant genes using perGeneQValue function:
dxr1 %<>% .[!is.na(.$padj), ]
dgene <- data.frame(perGeneQValue=perGeneQValue(dxr1)) %>% rownames_to_column("groupID")
dePAS1 <- dxr1 %>% data.frame() %>%
dplyr::select(-matches("dispersion|stat|countData|genomicData")) %>%
inner_join(dgene) %>% arrange(perGeneQValue) %>% distinct()
Joining, by = "groupID"
# Save the significant differential pA sites (order by perGeneQValue)
"openxlsx" %>% lapply(library, character.only = TRUE) %>% invisible
dePAS_sig1 <- dePAS1 %>% filter(padj < 0.1) %T>%
write.xlsx(file = "3PSeq_DEXSeq_significant_genes.xlsx", sheetName = "DKO vs WT", firstRow = TRUE, headerStyle = createStyle(textDecoration = "BOLD", halign = "center"))
Check the information of the top 300 differntial pA sites.
DT::datatable(dePAS_sig1[1:300, ])
Generate plots of the top 300 differntial pA sites.
pdf(file= "3PSeq_DEXSeq_significant_genes.DKO_WT.top300.pdf")
sig_genes <- dePAS_sig1 %>% filter(perGeneQValue <= 0.01) %$% groupID %>% unique %>% .[1:300]
for (geneid in sig_genes) {
nn <- dxr1 %>% .[.$groupID %in% geneid, ] %>% nrow
if (nn > 1) {
plotDEXSeq(dxr1, geneid, legend = TRUE, expression = FALSE, splicing = TRUE, cex.axis = 1.2, cex = 1.3, lwd = 2)
}
}
dev.off()
null device
1
Use the EnhancedVolcano package to visualise differential pA site usage of the contrast pair.
"EnhancedVolcano" %>% lapply(library, character.only = TRUE) %>% invisible
EnhancedVolcano(dePAS_sig1, lab = dePAS_sig1$groupID, x = 'log2fold_DKO_WT', y = 'pvalue', title = 'Volcano Plot',
subtitle = 'DKO vs WT', FCcutoff = 1, labSize = 4, legendPosition = "right",
caption = bquote(~Log[2]~ "Fold change cutoff, 1; FDR 10%"))
NOTE: EnhancedVolcano package was used to generate the above plot, you will need to install it if not already installed.
countData2 <- dplyr::select(countData, SRR1553129.sorted.bam, SRR1553130.sorted.bam,
SRR1553133.sorted.bam, SRR1553134.sorted.bam) # Select the read counts of the condtion WT and KD
sampleTable2 <- data.frame(row.names = c("WT_1", "WT_2", "KD_1", "KD_2"),
condition = c(rep("WT", 2), rep("KD", 2)),
libType = rep("single-end", 4))
dxd2 <- DEXSeqDataSet(countData = countData2,
sampleData = sampleTable2,
design = ~ sample + exon + condition:exon,
featureID = new.featureID,
groupID = anno$Symbol,
featureRanges = PASinfo)
converting counts to integer mode
some variables in design formula are characters, converting to factors
dxd2$condition <- factor(dxd2$condition, levels = c("WT", "KD"))
dxd2 %<>% estimateSizeFactors %>% estimateDispersions %>% testForDEU %>% estimateExonFoldChanges(fitExpToVar = "condition")
dxr2 <- DEXSeqResults(dxd2)
mcols(dxr2)$description
[1] "group/gene identifier"
[2] "feature/exon identifier"
[3] "mean of the counts across samples in each feature/exon"
[4] "exon dispersion estimate"
[5] "LRT statistic: full vs reduced"
[6] "LRT p-value: full vs reduced"
[7] "BH adjusted p-values"
[8] "exon usage coefficient"
[9] "exon usage coefficient"
[10] "relative exon usage fold change"
[11] "GRanges object of the coordinates of the exon/feature"
[12] "matrix of integer counts, of each column containing a sample"
## ----tallyPASs------------------------------------------------------------
table(dxr2$padj < 0.1) # Check the number of differential pA sites (FDR < 0.1)
FALSE TRUE
4745 8533
## ----tallyGenes------------------------------------------------------------
table(tapply(dxr2$padj < 0.1, dxr2$groupID, any)) # Check the number of gene overlapped with differential pA site clusters (FDR < 0.1)
FALSE TRUE
1047 3538
# Selection and visualization of the results
topdiff.PAS2 <- dxr2 %>% as.data.frame %>% rownames_to_column %>% arrange(padj) %$% groupID[1:100]
head(topdiff.PAS2)
[1] "Tpm1" "Peg3" "Peg3" "Rragc" "Rragc" "Mbd2"
plotDEXSeq(dxr2, "Tpm1", legend = TRUE, expression = FALSE, splicing = TRUE, cex.axis = 1.2, cex = 1.3, lwd = 2)
dxr2 %<>% .[!is.na(.$padj), ]
dgene2 <- data.frame(perGeneQValue=perGeneQValue(dxr2)) %>% rownames_to_column("groupID")
dePAS2 <- dxr2 %>% data.frame() %>%
dplyr::select(-matches("dispersion|stat|countData|genomicData")) %>%
inner_join(dgene2) %>% arrange(perGeneQValue) %>% distinct()
Joining, by = "groupID"
# Save the significant differential pA sites (order by perGeneQValue)
add.xlsx.sheet <- function(fileName, sheetName, data){
wb <- loadWorkbook(fileName)
addWorksheet(wb, sheetName = sheetName)
freezePane(wb, sheet = sheetName, firstRow = TRUE)
writeData(wb, sheet = sheetName, data, headerStyle = createStyle(textDecoration = "BOLD", halign = "center"))
saveWorkbook(wb, file = fileName, overwrite = TRUE)
}
dePAS_sig2 <- dePAS2 %>% filter(padj < 0.1) %T>%
add.xlsx.sheet(file = "3PSeq_DEXSeq_significant_genes.xlsx", "KD vs WT", .)
DT::datatable(dePAS_sig2[1:300, ])
EnhancedVolcano(dePAS_sig2, lab = dePAS_sig2$groupID, x = 'log2fold_KD_WT', y = 'pvalue', title = 'Volcano Plot',
subtitle = 'KD vs WT', FCcutoff = 1, labSize = 4, legendPosition = "right",
caption = bquote(~Log[2]~ "Fold change cutoff, 1; FDR 10%"))
countData3 <- dplyr::select(countData, SRR1553129.sorted.bam, SRR1553130.sorted.bam,
SRR1553135.sorted.bam, SRR1553136.sorted.bam) # Select the read counts of the condtion WT and KD
sampleTable3 <- data.frame(row.names = c("WT_1", "WT_2", "Ctrl_1", "Ctrl_2"),
condition = c(rep("WT", 2), rep("Ctrl", 2)),
libType = rep("single-end", 4))
dxd3 <- DEXSeqDataSet(countData = countData3,
sampleData = sampleTable3,
design = ~ sample + exon + condition:exon,
featureID = new.featureID,
groupID = anno$Symbol,
featureRanges = PASinfo)
converting counts to integer mode
some variables in design formula are characters, converting to factors
dxd3$condition <- factor(dxd3$condition, levels = c("WT", "Ctrl"))
dxd3 %<>% estimateSizeFactors %>% estimateDispersions %>% testForDEU %>% estimateExonFoldChanges(fitExpToVar = "condition")
dxr3 <- DEXSeqResults(dxd3)
mcols(dxr3)$description
[1] "group/gene identifier"
[2] "feature/exon identifier"
[3] "mean of the counts across samples in each feature/exon"
[4] "exon dispersion estimate"
[5] "LRT statistic: full vs reduced"
[6] "LRT p-value: full vs reduced"
[7] "BH adjusted p-values"
[8] "exon usage coefficient"
[9] "exon usage coefficient"
[10] "relative exon usage fold change"
[11] "GRanges object of the coordinates of the exon/feature"
[12] "matrix of integer counts, of each column containing a sample"
## ----tallyPASs------------------------------------------------------------
table(dxr3$padj < 0.1) # Check the number of differential pA sites (FDR < 0.1)
FALSE TRUE
8232 5048
## ----tallyGenes------------------------------------------------------------
table(tapply(dxr3$padj < 0.1, dxr3$groupID, any)) # Check the number of gene overlapped with differential pA site clusters (FDR < 0.1)
FALSE TRUE
2193 2393
# Selection and visualization of the results
topdiff.PAS3 <- dxr3 %>% as.data.frame %>% rownames_to_column %>% arrange(padj) %$% groupID[1:100]
head(topdiff.PAS3)
[1] "Tpm1" "Smc6" "Tpm1" "Smc6" "Gm13341" "Zranb2"
plotDEXSeq(dxr3, "Tpm1", legend = TRUE, expression = FALSE, splicing = TRUE, cex.axis = 1.2, cex = 1.3, lwd = 2)
dxr3 %<>% .[!is.na(.$padj), ]
dgene3 <- data.frame(perGeneQValue=perGeneQValue(dxr3)) %>% rownames_to_column("groupID")
dePAS3 <- dxr3 %>% data.frame() %>%
dplyr::select(-matches("dispersion|stat|countData|genomicData")) %>%
inner_join(dgene3) %>% arrange(perGeneQValue) %>% distinct()
Joining, by = "groupID"
# Save the significant differential pA sites (order by perGeneQValue)
dePAS_sig3 <- dePAS3 %>% filter(padj < 0.1) %T>%
add.xlsx.sheet(file = "3PSeq_DEXSeq_significant_genes.xlsx", "Ctrl vs WT", .)
DT::datatable(dePAS_sig3[1:300, ])
EnhancedVolcano(dePAS_sig3, lab = dePAS_sig3$groupID, x = 'log2fold_Ctrl_WT', y = 'pvalue', title = 'Volcano Plot',
subtitle = 'Ctrl vs WT', FCcutoff = 1, labSize = 4, legendPosition = "right",
caption = bquote(~Log[2]~ "Fold change cutoff, 1; FDR 10%"))
Type the following commands in R console and press Enter to load the required libraries in R.
c("limma", "edgeR") %>% lapply(library, character.only = TRUE) %>% invisible
Create the sampleTable, which consists of details of each sample with its library-type and condition. Ensure the order of row names is consistent with the order of bam file names used by featureCounts to count the reads. Remember that as the contrast pairs can be defined directly in the diffSplice function, the sampleTable created in this step includes all the read counts of all conditions.
sampleTable <- data.frame(row.names = c("WT_1","WT_2","DKO_1","DKO_2", "KD_1", "KD_2", "Ctrl_1", "Ctrl_2"),
condition = c(rep("WT", 2), rep("DKO", 2), rep("KD", 2), rep("Ctrl", 2)),
libType = rep("single-end", 4))
Create the DGEList using the function DGEList to collect the filtered read counts generated in step 3. Remember that as the contrast pairs can be defined directly in the next step, the DGEList object created in this step include all the read counts of all conditions. Then develop processing steps of the DGEList object includes: 1) Normalization of the read counts across samples using Trimmed Meand of M values (TMM normalization) by the function calcNormFactors; 2) Differential expression testing across all conditions using the function voomWithQualityWeights; 3) Linear modelling based on the count data using the function limFit; 4) Computing moderated statistical results using the function eBayes given the linear model fit from the limFit.
fit <- {
dge <- DGEList(counts = countData) %>% calcNormFactors
Treat <- factor(sampleTable$condition, levels = c("WT", "DKO", "KD", "Ctrl"))
design <- model.matrix(~0 + Treat)
colnames(design) <- levels(Treat)
voomWithQualityWeights(dge, design, plot = FALSE)
} %>% lmFit(design) %>% eBayes
Define the contrasts of interest for differential pA usage analysis using makeContrasts function. Then process the DGEList object fit generated in the previous step using the function contrasts.fit and eBayes to acquire the differential expression results of the contrast pairs.
contrast.matrix <- makeContrasts(DKO_vs_WT = DKO-WT, KD_vs_WT = KD-WT, Ctrl_vs_WT = Ctrl-WT, levels = design)
fit2 <- fit %>% contrasts.fit(contrast.matrix) %>% eBayes
summary(decideTests(fit2))
DKO_vs_WT KD_vs_WT Ctrl_vs_WT
Down 4107 3643 1161
NotSig 4953 6038 10393
Up 4223 3602 1729
Develop the differential pA site usage analysis for each defined contrasct pair on the fitted model using the function diffSplice.
ex <- diffSplice(fit2, geneid = anno$Symbol, exonid = new.featureID)
Total number of exons: 13283
Total number of genes: 4588
Number of genes with 1 exon: 0
Mean number of exons in a gene: 3
Max number of exons in a gene: 27
#Check the top significant results with topSplice
topSplice(ex)
NOTE: This step uses the PAS information object created in 4.1.3. Ensure it is loaded in the current working environment.
Check the information of the top 300 differntial pA sites.
sig1 <-topSplice(ex, n = Inf, FDR = 0.1, coef = 1, test= "t", sort.by = "logFC") # Save the results sorted by logFC with FDR < 0.1
DT::datatable(sig1[1:300, ])
Check the information of 300 genes with the top differntial APA.
sig1.genes <-topSplice(ex, n = Inf, FDR = 0.1, coef = 1, test= "simes") # Save the results with FDR < 0.1
DT::datatable(sig1.genes[1:300, ])
Visualize the differential APA result using the plotSplice function and volcano plots.
pdf(file = "3PSeq_diffSplice_plotSplice_DKO_WT.pdf")
plotSplice(ex, coef = 1, geneid = "S100a7a", FDR = 0.1)
plotSplice(ex, coef = 1, geneid = "Tpm1", FDR = 0.1)
plotSplice(ex, coef = 1, geneid = "Smc6", FDR = 0.1)
dev.off()
null device
1
pdf(file = "3PSeq_diffSplice_Volcano_DKO_WT.pdf", height = 10, width = 12)
EnhancedVolcano(sig1, lab = sig1$GeneID, xlab = bquote(~Log[2]~ 'fold change'),
x = 'logFC', y = 'P.Value', title = 'Volcano Plot', subtitle = 'DKO vs WT',
# pCutoff = 10e-16,
FCcutoff = 1, labSize = 8, legendPosition = "right")
dev.off()
null device
1
# Save the list of top differential pA sites at FDR 10%
write.xlsx(sig1, file = "3PSeq_diffSplice_significant_genes.xlsx", sheetName = "DKO vs WT (pA site-level)",
firstRow = TRUE, headerStyle = createStyle(textDecoration = "BOLD", halign = "center"))
# Save the list of genes with top differential APA at FDR 10%
add.xlsx.sheet(file = "3PSeq_diffSplice_significant_genes.xlsx", "DKO vs WT (gene-level)", sig1.genes)
Check the information of the top 300 differntial pA sites.
sig2 <-topSplice(ex, n = Inf, FDR = 0.1, coef = 2, test= "t", sort.by = "logFC")
DT::datatable(sig2[1:300, ])
Check the information of 300 genes with the top differntial APA.
sig2.genes <-topSplice(ex, n = Inf, FDR = 0.1, coef = 2, test= "simes") # Save the results with FDR < 0.1
DT::datatable(sig2.genes[1:300, ])
Visualize the differential APA result using the plotSplice function and volcano plots.
pdf(file = "3PSeq_diffSplice_plotSplice_KD_WT.pdf")
plotSplice(ex, coef = 2, geneid = "Mbd2", FDR = 0.1)
plotSplice(ex, coef = 2, geneid = "Ssbp1", FDR = 0.1)
plotSplice(ex, coef = 2, geneid = "Pfn1", FDR = 0.1)
dev.off()
null device
1
pdf(file = "3PSeq_diffSplice_Volcano_KD_WT.pdf", height = 10, width = 12)
EnhancedVolcano(sig2, lab = sig2$GeneID, xlab = bquote(~Log[2]~ 'fold change'),
x = 'logFC', y = 'P.Value', title = 'Volcano Plot', subtitle = 'KD vs WT',
# pCutoff = 10e-16,
FCcutoff = 1, labSize = 8, legendPosition = "right")
dev.off()
null device
1
#Save the list of top differential pA sites at FDR 10%
add.xlsx.sheet(file = "3PSeq_diffSplice_significant_genes.xlsx", "KD vs WT (pA site-level)", sig2)
# Save the list of genes with top differential APA at FDR 10%
add.xlsx.sheet(file = "3PSeq_diffSplice_significant_genes.xlsx", "KD vs WT (gene-level)", sig2.genes)
Check the information of the top 300 differntial pA sites.
sig3 <-topSplice(ex, n = Inf, FDR = 0.1, coef = 3, test= "t", sort.by = "logFC")
DT::datatable(sig3[1:300, ])
Check the information of 300 genes with the top differntial APA.
sig3.genes <-topSplice(ex, n = Inf, FDR = 0.1, coef = 3, test= "simes") # Save the results with FDR < 0.1
DT::datatable(sig3.genes[1:300, ])
Visualize the differential APA result using the plotSplice function and volcano plots.
pdf(file = "3PSeq_diffSplice_plotSplice_Ctrl_WT.pdf")
plotSplice(ex, coef = 3, geneid = "Mbd2", FDR = 0.1)
plotSplice(ex, coef = 3, geneid = "Ift81", FDR = 0.1)
plotSplice(ex, coef = 3, geneid = "Rps24", FDR = 0.1)
dev.off()
null device
1
pdf(file = "3PSeq_diffSplice_Volcano_Ctrl_WT.pdf", height = 10, width = 12)
EnhancedVolcano(sig3, lab = sig3$GeneID, xlab = bquote(~Log[2]~ 'fold change'),
x = 'logFC', y = 'P.Value', title = 'Volcano Plot', subtitle = 'Ctrl vs WT',
# pCutoff = 10e-16,
FCcutoff = 1, labSize = 8, legendPosition = "right")
dev.off()
null device
1
#Save the list of top differential pA sites at FDR 10%
add.xlsx.sheet(file = "3PSeq_diffSplice_significant_genes.xlsx", "Ctrl vs WT (pA site-level)", sig3)
# Save the list of genes with top differential APA at FDR 10%
add.xlsx.sheet(file = "3PSeq_diffSplice_significant_genes.xlsx", "Ctrl vs WT (gene-level)", sig3.genes)
print(sessionInfo())
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] EnhancedVolcano_1.8.0 ggrepel_0.9.1 openxlsx_4.2.3
[4] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.6
[7] purrr_0.3.4 readr_1.4.0 tidyr_1.1.3
[10] tibble_3.1.1 ggplot2_3.3.3 tidyverse_1.3.1
[13] Rsubread_2.4.3 magrittr_2.0.1 DEXSeq_1.36.0
[16] RColorBrewer_1.1-2 AnnotationDbi_1.52.0 DESeq2_1.30.1
[19] SummarizedExperiment_1.20.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.7
[22] IRanges_2.24.1 S4Vectors_0.28.1 MatrixGenerics_1.2.1
[25] matrixStats_0.58.0 Biobase_2.50.0 BiocGenerics_0.36.1
[28] BiocParallel_1.24.1 edgeR_3.32.1 limma_3.46.0
loaded via a namespace (and not attached):
[1] readxl_1.3.1 backports_1.2.1 BiocFileCache_1.14.0 splines_4.0.3
[5] crosstalk_1.1.1 digest_0.6.27 htmltools_0.5.1.1 fansi_0.4.2
[9] memoise_2.0.0 Biostrings_2.58.0 annotate_1.68.0 modelr_0.1.8
[13] extrafont_0.17 extrafontdb_1.0 askpass_1.1 prettyunits_1.1.1
[17] colorspace_2.0-1 blob_1.2.1 rvest_1.0.0 rappdirs_0.3.3
[21] haven_2.4.1 xfun_0.22 crayon_1.4.1 RCurl_1.98-1.3
[25] jsonlite_1.7.2 genefilter_1.72.1 survival_3.2-11 glue_1.4.2
[29] gtable_0.3.0 zlibbioc_1.36.0 XVector_0.30.0 DelayedArray_0.16.3
[33] proj4_1.0-10.1 Rttf2pt1_1.3.8 maps_3.3.0 scales_1.1.1
[37] DBI_1.1.1 Rcpp_1.0.6 xtable_1.8-4 progress_1.2.2
[41] bit_4.0.4 DT_0.18 htmlwidgets_1.5.3 httr_1.4.2
[45] ellipsis_0.3.2 farver_2.1.0 pkgconfig_2.0.3 XML_3.99-0.6
[49] sass_0.3.1 dbplyr_2.1.1 locfit_1.5-9.4 utf8_1.2.1
[53] labeling_0.4.2 tidyselect_1.1.1 rlang_0.4.11 munsell_0.5.0
[57] cellranger_1.1.0 tools_4.0.3 cachem_1.0.4 cli_2.5.0
[61] generics_0.1.0 RSQLite_2.2.7 broom_0.7.6 evaluate_0.14
[65] fastmap_1.1.0 yaml_2.2.1 knitr_1.33 bit64_4.0.5
[69] fs_1.5.0 zip_2.1.1 ash_1.0-15 ggrastr_0.2.3
[73] xml2_1.3.2 biomaRt_2.46.3 compiler_4.0.3 rstudioapi_0.13
[77] beeswarm_0.3.1 curl_4.3.1 reprex_2.0.0 statmod_1.4.35
[81] geneplotter_1.68.0 bslib_0.2.4 stringi_1.5.3 ggalt_0.4.0
[85] lattice_0.20-44 Matrix_1.3-3 vctrs_0.3.8 pillar_1.6.0
[89] lifecycle_1.0.0 BiocManager_1.30.13 jquerylib_0.1.4 bitops_1.0-7
[93] R6_2.5.0 hwriter_1.3.2 KernSmooth_2.23-20 vipor_0.4.5
[97] MASS_7.3-54 assertthat_0.2.1 openssl_1.4.4 withr_2.4.2
[101] Rsamtools_2.6.0 GenomeInfoDbData_1.2.4 hms_1.0.0 grid_4.0.3
[105] rmarkdown_2.8 lubridate_1.7.10 ggbeeswarm_0.6.0