Differential expression using Deseq2
Often the goal of a RNA-seq type experiment is to find differentially expressed genes. Below I give guidelines for calling differential expression.
Idea behind differental expression programs
Ìý
Imagine you do RNA-seq on 6 samples that are all biological replicated of each other. When you analyze them, you split them into two groups. Then you draw each gene as a dot on a graph where the x axis is expression level and the y axis is log fold change. What do you expext? What do you get? (for example you can use #data from SRP221750)
Ìý
Step 0) Prepare
- If you have never used R before, you will need to learn about R. I like this set of .
- To work today, you need to install .
- Within Rstudio you will need to install the following:
- install.packages("ggplot2")
- install.packages("tidyr")
- install.packages("BiocManager")
- BiocManager::install("DESeq2")
- BiocManager::install("vsn")
- Within Rstudio you will need to install the following:
Ìý
Ìý
PREPARE -get this data if you don't have your own
- Download the : (You must log in with a colorado.edu address).
- Unpack any gzÌý data:
- tar -zxvf archive.tar.gz
Ìý
Step 1) Count the reads over each genes.
Tools for counting reads
- I will show you: Rsubreads
- Other tools you could use: Bedtools coverage or multi-cov (comand line), htseq (python)
- Assumptions of counting programs to be aware of
- Are low quality reads counted?
- Are multi-mapped reads counted?
- How are spliced reads handled?
- How is paired end data handled?
Input:
- Mapped reads file (geneally bam or cram and the index files for those files)
- Regions to count file (geneally gtf or bed)
Output
- expression object (we will save as RData file)
Method
Create a R script that looks like this: Or run each of these commands on your command line.
Step 2) Calculate differential expression
To get the data I use in this example download the files from link.
The major steps for differeatal expression are to normalize the data, determine where the differenal line will be, and call the differnetal expressed genes. How each of these steps is done varies from program to program.
Tools
I will teach you deseq2. However, I also recomend and edgeR or bayseq. bayseq is great for complicated patterns of anaysis, but not as good for cutoff anaysis.
Ìý
Input:
- list of samples you want to keep (the ones that looked ok on quality control)
- Coverage$counts from the RData file in step1
Output
- list of genes that are diffentailly expressed via adjusted p-value
- normalized count object inside "DESeqDataSet"
- estimates of dispersion of the data
- basemean expression of each gene
Additional Methods
This is the link for the Deseq2 script I am using.
This is another Deseq script that shows:
- how you can use alternative size factors if you know the size factors might be affected by the data in some way
- how to compare multiple things at once with a function
Design terms information:
- Imagine you have 3 biological replicates (repA, repB, repC) of RNA-seq between two people (person1 and person2). Imagine that the three replicates don't look very similar because of batch effects. Your metadata file should have one column with the replicate number and one column for the person. Your designs could be the following
- ~rep + person #this would tell you genes that go the same direction (up or down-regulated) in the three replicates
- ~rep + person + rep:person #this would show results for genes that are different between person1 and person2 theÌýreference levelÌý in this case repA------- person_1vs2 is just the results for replicate A
- It would also get you two interaction terms repB.person2 repC:person2 which would tell you relative to person_1vs2Ìý what is happening in repB and repC
Ìý
Other resources
http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
https://github.com/lpantano/DEGreport
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0190152
https://rstudio-pubs-static.s3.amazonaws.com/329027_593046fb6d7a427da6b2c538caf601e1.html#the-condition-effect-for-genotype-i-the-main-effect
Ìý
Ìý