bigdata

Differential expression using DESeq2

Anonymous — Mon, 14 Oct 2019 15:45:13 +0000

Often the goal of an RNA-seq experiment is to find differentially expressed genes. Below I give guidelines for calling differential expression.

Idea behind differential expression programs

Imagine you do RNA-seq on 6 samples that are all biological replicates of each other. When you analyze them, you split them into two groups. Then you draw each gene as a dot on a graph where the x axis is expression level and the y axis is log fold change between the groups. What do you expect? What do you get? (For example, you can use data from SRP221750.)
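To check your intuition before touching real data, here is a minimal simulation in Python, assuming every gene's counts are Poisson-distributed around one true mean shared by all six samples. Real biological replicates are usually more variable than Poisson (overdispersed), so treat this as a best case. The number of genes, the 1 to 1000 range of means, the 3 vs 3 split, the pseudocount of 0.5 and the cutoff of 30 reads are arbitrary choices for illustration.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """One Poisson(lam) draw: count unit-rate exponential arrivals that
    happen before time lam. Simple and exact, but O(lam), so demo-sized only."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_replicates(n_genes=300, n_samples=6, seed=1):
    """Counts for samples that are all replicates of each other: each gene
    has one true mean (log-uniform between 1 and 1000) shared by every sample."""
    rng = random.Random(seed)
    genes = []
    for _ in range(n_genes):
        mu = math.exp(rng.uniform(0.0, math.log(1000)))
        genes.append([poisson(rng, mu) for _ in range(n_samples)])
    return genes


def ma_points(genes, pseudocount=0.5):
    """Split six samples 3 vs 3 and return (mean count, log2 fold change)
    per gene: the x and y of the plot described above."""
    points = []
    for counts in genes:
        group1 = statistics.mean(counts[:3]) + pseudocount
        group2 = statistics.mean(counts[3:]) + pseudocount
        points.append((statistics.mean(counts), math.log2(group1 / group2)))
    return points


points = ma_points(simulate_replicates())
low = [lfc for mean, lfc in points if mean < 30]
high = [lfc for mean, lfc in points if mean >= 30]
print("mean log2FC, all genes:", round(statistics.mean([lfc for _, lfc in points]), 3))
print("SD of log2FC, mean count < 30: ", round(statistics.stdev(low), 3))
print("SD of log2FC, mean count >= 30:", round(statistics.stdev(high), 3))
```

Because no gene truly differs, the fold changes center on zero, but they fan out at low expression: a gene with only a handful of reads can show a large log fold change purely by chance. That funnel shape is what differential expression programs have to account for.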

Step 0) Prepare

If you have never used R before, you will need to learn some R first. I like this set of lessons.
To work today, you need to install R and RStudio.
- Within RStudio you will need to install the following:
  - install.packages("ggplot2")
  - install.packages("tidyr")
  - install.packages("BiocManager")
  - BiocManager::install("DESeq2")
  - BiocManager::install("vsn")

Prepare: get this data if you don't have your own

Download the data (you must log in with a colorado.edu address).
Unpack any .tar.gz archives:
tar -zxvf archive.tar.gz
A single gzipped file that is not a tar archive can be unpacked with: gunzip file.gz

Step 1) Count the reads over each gene

Tools for counting reads

I will show you: Rsubread (its featureCounts function)
Other tools you could use: bedtools coverage or multicov (command line), HTSeq (Python)
Assumptions of counting programs to be aware of:
- Are low quality reads counted?
- Are multi-mapped reads counted?
- How are spliced reads handled?
- How is paired end data handled?
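To make those questions concrete, here is a toy read counter in Python. It is not how Rsubread, bedtools or HTSeq are implemented; the read tuples, gene coordinates, the mapping-quality cutoff of 10 and the "skip reads that overlap two genes" rule are hypothetical choices, made so that each assumption becomes a visible, switchable parameter. Spliced reads and paired-end pairs are not modeled at all, which is exactly the kind of limitation to look for in a real tool's documentation.

```python
from collections import Counter

# Hypothetical genes: name -> (chromosome, start, end), half-open coordinates
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 450, 900),
    "geneC": ("chr2", 0, 300),
}


def count_reads(reads, genes, min_mapq=10, count_multimappers=False):
    """Count reads per gene under explicit, adjustable assumptions.

    Each read is a hypothetical tuple (chrom, start, end, mapq, n_hits),
    where n_hits is how many places the read aligned to (like the NH tag).
    A read is counted only if it overlaps exactly one gene; reads that
    overlap two or more genes are treated as ambiguous and skipped.
    """
    counts = Counter({name: 0 for name in genes})
    skipped = Counter()
    for chrom, start, end, mapq, n_hits in reads:
        if mapq < min_mapq:
            skipped["low_quality"] += 1
            continue
        if n_hits > 1 and not count_multimappers:
            skipped["multi_mapped"] += 1
            continue
        hits = [name for name, (g_chrom, g_start, g_end) in genes.items()
                if g_chrom == chrom and start < g_end and end > g_start]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            skipped["ambiguous"] += 1
        else:
            skipped["no_feature"] += 1
    return counts, skipped


reads = [
    ("chr1", 120, 220, 60, 1),   # inside geneA only
    ("chr1", 460, 560, 60, 1),   # overlaps geneA and geneB: ambiguous
    ("chr1", 700, 800, 3, 1),    # low mapping quality
    ("chr2", 10, 110, 60, 2),    # aligned to two places
    ("chr3", 10, 110, 60, 1),    # no gene there
]
counts, skipped = count_reads(reads, GENES)
print(dict(counts))
print(dict(skipped))
```

Flipping count_multimappers to True changes geneC from 0 to 1 reads, which is why it matters to know what your counting tool does by default.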

Input:

Mapped reads file (generally BAM or CRAM, plus the index files for those files)
Regions to count file (generally GTF or BED)

Output

an expression object (we will save it as an .RData file)

Method

Create an R script that looks like this, or run each of these commands one at a time in the R console.

Step 2) Calculate differential expression

To get the data I use in this example, download the files from this link.

The major steps for differential expression are to normalize the data, determine where the cutoff between "differential" and "not differential" will be, and call the differentially expressed genes. How each of these steps is done varies from program to program.
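The normalization step is easiest to see with numbers. DESeq2's default normalization is the median-of-ratios method: each sample's size factor is the median, over genes, of that sample's count divided by the gene's geometric mean across all samples (in R, DESeq() does this for you through estimateSizeFactors()). Below is a small pure-Python sketch of the idea, not DESeq2's own code; the toy counts are made up.

```python
import math
import statistics


def median_of_ratios_size_factors(counts):
    """Size factors by the median-of-ratios method.

    counts: list of rows, one per gene, each a list of raw counts per sample.
    Genes with a zero in any sample are skipped, because their geometric
    mean is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) == 0:
            continue
        log_geo_mean = statistics.mean([math.log(c) for c in row])
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(statistics.median(r)) for r in log_ratios]


def normalize(counts, size_factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / s for c, s in zip(row, size_factors)] for row in counts]


# Toy data: sample 2 was sequenced exactly twice as deeply as sample 1
counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 5],      # contains a zero, so it is ignored when estimating size factors
]
sf = median_of_ratios_size_factors(counts)
print([round(s, 3) for s in sf])   # → [0.707, 1.414]
print(normalize(counts, sf))
```

Because the median ignores the few genes that really change, a handful of strongly differentially expressed genes cannot drag the size factors around; that robustness is the reason to use a median rather than, say, total read count.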

Tools

I will teach you DESeq2. However, I also recommend edgeR or baySeq. baySeq is great for complicated patterns of analysis, but not as good for cutoff-based analysis.

Input:

list of samples you want to keep (the ones that looked OK in quality control)
Coverage$counts from the .RData file saved in Step 1
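In R these two inputs are combined by keeping only the QC-passing columns (samples) of the count matrix. The sketch below shows the same bookkeeping in Python with a hypothetical dict-of-dicts layout and hypothetical sample names. The extra step of dropping genes with fewer than 10 total reads is a common, optional pre-filter; the threshold is an assumption, not something DESeq2 requires.

```python
def subset_counts(counts, keep_samples, min_total=10):
    """Keep only the samples that passed QC, then drop genes whose total
    count across those samples is below min_total.

    counts: dict mapping gene -> dict mapping sample name -> raw count
    (a hypothetical stand-in for the genes x samples Coverage$counts matrix).
    """
    known = {sample for row in counts.values() for sample in row}
    missing = set(keep_samples) - known
    if missing:
        raise ValueError(f"samples not found in count table: {sorted(missing)}")
    subset = {}
    for gene, row in counts.items():
        kept = {sample: row[sample] for sample in keep_samples}
        if sum(kept.values()) >= min_total:
            subset[gene] = kept
    return subset


counts = {
    "geneA": {"s1": 50, "s2": 40, "s3": 900, "s4": 45},
    "geneB": {"s1": 1, "s2": 0, "s3": 30, "s4": 2},
}
# Suppose sample s3 failed quality control
filtered = subset_counts(counts, ["s1", "s2", "s4"])
print(filtered)   # → {'geneA': {'s1': 50, 's2': 40, 's4': 45}}
```

Note that geneB only passed the filter because of the bad sample; removing samples first and filtering second keeps the failed sample from influencing which genes are tested.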

Output

list of genes that are differentially expressed, by adjusted p-value
normalized counts, stored with the "DESeqDataSet" object
estimates of dispersion (replicate-to-replicate variability) for each gene
baseMean (the mean of the normalized counts across all samples) of each gene
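Calling genes by adjusted p-value matters because of multiple testing: with 20,000 genes, a raw cutoff of p < 0.05 would flag about 1,000 genes by chance alone even if nothing changed. DESeq2's results() adjusts with the Benjamini-Hochberg procedure by default (pAdjustMethod = "BH"). Here is a small Python sketch of that procedure; the gene names, p-values and the 0.05 cutoff are made up, and DESeq2's independent filtering, which can set padj to NA for low-count genes, is not reproduced.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values. None entries are passed
    through unchanged, mimicking genes whose p-value is NA."""
    indexed = [(p, i) for i, p in enumerate(pvalues) if p is not None]
    m = len(indexed)
    adjusted = [None] * len(pvalues)
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase
    # as the raw p-values get smaller
    for rank, (p, i) in zip(range(m, 0, -1), sorted(indexed, reverse=True)):
        running_min = min(running_min, p * m / rank)
        adjusted[i] = running_min
    return adjusted


genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
pvals = [0.01, 0.04, 0.03, 0.20, None]
padj = benjamini_hochberg(pvals)
alpha = 0.05  # assumed cutoff; choose yours before looking at the results
print([g for g, q in zip(genes, padj) if q is not None and q < alpha])   # → ['geneA']
```

geneB and geneC both have raw p-values below 0.05 but adjusted values of about 0.053, so they drop out once the number of tests is taken into account.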

Additional Methods

This is the link for the DESeq2 script I am using.

This is another DESeq2 script that shows:

- how you can use alternative size factors if you know the size factors might be affected by the data in some way
- how to compare multiple things at once with a function
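Comparing multiple things at once usually means listing every contrast and applying the same function to each one. The Python sketch below builds that list; the factor name condition, its levels and the describe() helper are hypothetical, and the earlier level in each pair is treated as the denominator (the reference). In R, each tuple would become one results(dds, contrast = c(factor, numerator, denominator)) call inside a function or loop.

```python
from itertools import combinations


def pairwise_contrasts(factor, levels):
    """All pairwise comparisons between the levels of one factor, as
    (factor, numerator, denominator) tuples. The level listed first in
    each pair is used as the denominator."""
    return [(factor, later, earlier) for earlier, later in combinations(levels, 2)]


def compare_all(contrasts, compare):
    """Apply one comparison function to every contrast; `compare` is a
    stand-in for whatever your script does per comparison."""
    return {f"{num}_vs_{den}": compare(factor, num, den)
            for factor, num, den in contrasts}


def describe(factor, num, den):
    # Hypothetical stand-in: just spell out the R call this comparison needs
    return f'results(dds, contrast = c("{factor}", "{num}", "{den}"))'


contrasts = pairwise_contrasts("condition", ["control", "drugA", "drugB"])
for name, call in compare_all(contrasts, describe).items():
    print(name, "->", call)
```

Writing the comparison once as a function and looping over the contrast list keeps every comparison identical, which is harder to guarantee when results() calls are copied and edited by hand.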

Design terms information:

Imagine you have 3 biological replicates (repA, repB, repC) of RNA-seq from two people (person1 and person2), and that the three replicates don't look very similar because of batch effects. Your metadata file should have one column for the replicate and one column for the person. Your design could be one of the following:
- ~rep + person: this accounts for the replicate (batch) differences and tells you which genes go the same direction (up- or down-regulated) between the two people in the three replicates.
- ~rep + person + rep:person: here the main person effect (person_1vs2) only describes the difference between person1 and person2 in the reference level, repA.
  - It also gives you two interaction terms, repB.person2 and repC.person2, which tell you how the person difference in repB and in repC departs from person_1vs2.
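It can help to see which samples each coefficient touches. The Python sketch below writes out the treatment-coded design matrix (the coding R's model.matrix() uses by default) for these two formulas, for a hypothetical sample sheet with exactly one sample per replicate and person. The column names are simplified and will not match DESeq2's resultsNames() exactly.

```python
from itertools import product

# Hypothetical sample sheet: 3 replicates x 2 people, reference levels first
REPS = ["repA", "repB", "repC"]
PEOPLE = ["person1", "person2"]
samples = [{"rep": r, "person": p} for r, p in product(REPS, PEOPLE)]


def design_matrix(samples, interaction=False):
    """Treatment-coded design matrix: repA and person1 are the reference
    levels, so they get no column of their own."""
    names = ["Intercept", "repB", "repC", "person2"]
    if interaction:
        names += ["repB:person2", "repC:person2"]
    rows = []
    for s in samples:
        ind = {
            "Intercept": 1,
            "repB": int(s["rep"] == "repB"),
            "repC": int(s["rep"] == "repC"),
            "person2": int(s["person"] == "person2"),
        }
        # An interaction column is the product of its two indicator columns
        ind["repB:person2"] = ind["repB"] * ind["person2"]
        ind["repC:person2"] = ind["repC"] * ind["person2"]
        rows.append([ind[n] for n in names])
    return names, rows


for interaction in (False, True):
    names, rows = design_matrix(samples, interaction)
    print(" ".join(f"{n:>13}" for n in names))
    for s, row in zip(samples, rows):
        print(" ".join(f"{v:>13}" for v in row), s["rep"], s["person"])
    print()
```

With ~rep + person, the person2 column is 1 for every person2 sample regardless of replicate, so one person effect is estimated from all three replicates. With the interaction, a person2 sample in repB gets person2 plus repB:person2, so person2 alone describes repA. Note that with one sample per replicate and person, as in this toy sheet, the interaction model has as many coefficients as samples, leaving no residual degrees of freedom to estimate dispersion from.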

Other resources

http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

https://github.com/lpantano/DEGreport

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0190152

https://rstudio-pubs-static.s3.amazonaws.com/329027_593046fb6d7a427da6b2c538caf601e1.html#the-condition-effect-for-genotype-i-the-main-effect

Graphing tools

Anonymous — Thu, 28 Feb 2019 19:20:40 +0000

When I first got into bioinformatics, one of the things I needed to learn quickly was how to graph very large tables of data. Below are some of my favorite websites for learning how to graph big data.

A great website with instructions on using R:

https://datacarpentry.org/R-ecology-lesson/index.html

Some great websites with instructions on using Python for big data and graphing:

Use this to learn the Python package pandas (like Excel for big data):

https://nikgrozev.com/2015/12/27/pandas-in-jupyter-quickstart-and-useful-snippets/

Fancier plotting can be achieved with Plotly or Bokeh:

https://plot.ly/pandas/

https://bokeh.pydata.org/en/latest/

Specific to those in BioFrontiers on the compute cluster Fiji:

To use RStudio on Fiji, use a web browser to go to fiji-viz.colorado.edu and click on RStudio.

Setting up a FANCY Jupyter notebook on Fiji.

Step 0:

Log into Fiji on the command line and follow the instructions here:

http://bficores.colorado.edu/biofrontiers-it/cluster-computing/fiji/jupyterhub

Step 1:

Also run these lines on the command line:

pip3 install hide_code
pip3 install plotly
pip3 install ipywidgets
pip3 install jupyter_contrib_nbextensions
pip3 install jupyter_nbextensions_configurator
jupyter contrib nbextension install --user

jupyter nbextension install --user --py widgetsnbextension

jupyter nbextension enable --user --py widgetsnbextension

Then log off Fiji on the command line!

Step 2:

Start a new server on fiji-viz.colorado.edu

Step 3:

Start a new notebook

Step 4: Use the Control Panel button to log off the server.

Step 5: Restart the server. On the home page you will now have a new tab called Nbextensions; click it and turn on the extensions you want.

Step 6: Start a new notebook with the "New" button. Name it by clicking on its name.

BTW, if you want to do the R lesson in a Jupyter notebook instead, the starting file is here:

import pandas
df = pandas.read_csv("https://raw.githubusercontent.com/kbroman/kbroman.github.io/master/datacarp/portal_data_joined.csv")

Finding the number of unique items in a column

Anonymous — Wed, 14 Mar 2018 15:37:02 +0000

Often, to check the contents of a tab-delimited file, we want to know how many unique things there are in a particular column. Below I give you instructions for checking this using the command line and the Python package pandas.

If you want to check the number of unique things on the command line or in a shell script:

How many unique things are in column 1 of a file named input_file?

# outputs a count of the unique things in a column
cut -f 1 input_file | sort | uniq | wc -l

# outputs each of the unique things and how many of each there are
cut -f 1 input_file | sort | uniq -c
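If you would rather stay in Python but cannot load pandas, the standard library's collections.Counter does the same job. This is a minimal sketch assuming a plain tab-delimited file with no quoting; the function name column_counts and the small test file are made up for illustration.

```python
import os
import tempfile
from collections import Counter


def column_counts(path, column=1, delimiter="\t"):
    """Count each distinct value in one column of a delimited text file,
    like `cut -f 1 file | sort | uniq -c`. `column` is 1-based, as in cut.
    A header line, if present, is counted as a value too (as with cut);
    lines with too few fields are skipped."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split(delimiter)
            if len(fields) >= column:
                counts[fields[column - 1]] += 1
    return counts


# Build a small, hypothetical tab-delimited file to try it on
with tempfile.NamedTemporaryFile("w", suffix=".bed", delete=False) as tmp:
    tmp.write("chr1\t10\t20\nchr2\t5\t9\nchr1\t30\t40\n")
counts = column_counts(tmp.name)
os.remove(tmp.name)

print(len(counts))                    # like ... | sort | uniq | wc -l → 2
for value, n in sorted(counts.items()):
    print(n, value)                   # like ... | sort | uniq -c
```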

If you want to check the number of unique things using the Python package pandas:

# My file's first line is a header: chr, start, stop, name, score. I want to know how many unique chromosomes there are, or how many lines have each of the chromosomes.

# First open Python by typing python (if you are on Fiji you must also run module load python/2.7.3/pandas)

# outputs a count of the unique things in a column

import pandas
df = pandas.read_csv("allmu.bed", sep="\t")
print(df["chr"].nunique())

# outputs each of the unique things and how many of each there are

import pandas
df = pandas.read_csv("allmu.bed", sep="\t")
print(df["chr"].value_counts())

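# If your file has NO header line, give pandas the column names yourself
import pandas

df = pandas.read_csv("allmu.bed", names=["chr", "start", "stop", "name", "score"], sep="\t")
# value_counts() sorts by count; sort_index() re-sorts the result by chromosome name
print(df["chr"].value_counts().sort_index())

print(df["chr"].nunique())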
