<span>Differential expression using DESeq2</span> <span><span>Anonymous (not verified)</span></span> <span><time datetime="2019-10-14T09:45:13-06:00" title="Monday, October 14, 2019 - 09:45">Mon, 10/14/2019 - 09:45</time> </span> <div role="contentinfo" class="container ucb-article-tags" itemprop="keywords"> <span class="visually-hidden">Tags:</span> <div class="ucb-article-tag-icon" aria-hidden="true"> <i class="fa-solid fa-tags"></i> </div> <a href="/lab/allen/taxonomy/term/10" hreflang="en">bigdata</a> </div> <div class="ucb-article-content ucb-striped-content"> <div class="container"> <div class="paragraph paragraph--type--article-content paragraph--view-mode--default"> <div class="ucb-article-content-media ucb-article-content-media-above"> <div> <div class="paragraph paragraph--type--media paragraph--view-mode--default"> </div> </div> </div> <div class="ucb-article-text d-flex align-items-center" itemprop="articleBody"> <div><p>Often the goal of an RNA-seq experiment is to find differentially expressed genes. Below I give guidelines for calling differential expression.</p> <h3>Idea behind differential expression programs</h3> <p>Imagine you do RNA-seq on 6 samples that are all biological replicates of each other. When you analyze them, you split them into two groups. Then you draw each gene as a dot on a graph where the x axis is expression level and the y axis is log fold change. What do you expect? What do you get? (For example, you can use the data from <strong>SRP221750</strong>.)</p> <h3>Step 0) Prepare</h3> <ul> <li>If you have never used R before, you will need to learn about R. 
I like this set of <a href="https://datacarpentry.org/ecology-workshop/" rel="nofollow">lessons</a>.</li> <li>To work today, you need to install <a href="https://datacarpentry.org/R-ecology-lesson/#setup_instructions" rel="nofollow">RStudio</a>. <ul> <li>Within RStudio you will need to install the following: <ul> <li>install.packages("ggplot2")</li> <li>install.packages("tidyr")</li> <li>install.packages("BiocManager")</li> <li>BiocManager::install("DESeq2")</li> <li>BiocManager::install("vsn")</li> </ul> </li> </ul> </li> </ul> <p>PREPARE: get this data if you don't have your own</p> <ol> <li>Download the <a href="https://drive.google.com/drive/u/1/folders/1fJOyfyqoKziTg2XvVcIahrdjFDXqgzZr" rel="nofollow">data</a> (you must log in with a colorado.edu address).</li> <li>Unpack any .tar.gz data: <ol> <li><span>tar -zxvf </span>archive.tar.gz</li> </ol> </li> </ol> <h3>Step 1) Count the reads over each gene</h3> <h5>Tools for counting reads</h5> <ul> <li>I will show you: Rsubread</li> <li>Other tools you could use: bedtools coverage or multicov (command line), HTSeq (Python)</li> <li>Assumptions of counting programs to be aware of <ul> <li>Are low quality reads counted?</li> <li>Are multi-mapped reads counted?</li> <li>How are spliced reads handled?</li> <li>How is paired-end data handled?</li> </ul> </li> </ul> <h5>Input:</h5> <ol> <li>Mapped reads file (generally BAM or CRAM, plus the index files for those files)</li> <li>Regions-to-count file (generally GTF or BED)</li> </ol> <h5>Output</h5> <ol> <li>Expression object (we will save it as an RData file)</li> </ol> <h5>Method</h5> <p>Create an R script that looks like <a href="/lab/allen/node/119/attachment" rel="nofollow">this</a>, or run each of these commands on your command line.</p> <h3>Step 2) Calculate differential expression</h3> <p>To get the data I use in this example download the files from <a 
href="https://drive.google.com/drive/folders/1fJOyfyqoKziTg2XvVcIahrdjFDXqgzZr?usp=sharing" rel="nofollow">this</a> link.</p> <p>The major steps for differential expression are to normalize the data, determine where the differential line will be, and call the differentially expressed genes. How each of these steps is done varies from program to program.</p> <h5>Tools</h5> <p>I will teach you DESeq2. However, I also recommend edgeR or baySeq. baySeq is great for complicated patterns of analysis, but not as good for cutoff analysis.</p> <h5>Input:</h5> <ol> <li>List of samples you want to keep (the ones that looked OK on quality control)</li> <li>Coverage$counts from the RData file in Step 1</li> </ol> <h5>Output</h5> <ol> <li><span><span>List of genes that are differentially expressed, based on adjusted p-value</span></span></li> <li>Normalized count object inside "<span><span>DESeqDataSet"</span></span></li> <li><span><span>Estimates of dispersion of the data</span></span></li> <li>baseMean expression of each gene</li> </ol> <h5>Additional Methods</h5> <p>This is the <a href="/lab/allen/node/121/attachment" rel="nofollow">link for the DESeq2 script</a> I am using.</p> <p>This is another DESeq2 <a href="/lab/allen/node/125/attachment" rel="nofollow">script</a> that shows:</p> <ul> <li>how you can use alternative size factors if you know the size factors might be affected by the data in some way</li> <li>how to compare multiple things at once with a function</li> </ul> <p>Design terms information:</p> <ul> <li>Imagine you have 3 biological replicates (repA, repB, repC) of RNA-seq between two people (person1 and person2). Imagine that the three replicates don't look very similar because of batch effects. Your metadata file should have one column with the replicate number and one column for the person. 
Your designs could be the following: <ul> <li>~rep + person #this would tell you genes that go the same direction (up- or down-regulated) in the three replicates</li> <li>~rep + person + rep:person #this would show results for genes that are different between person1 and person2 in the <em>reference level</em> of rep (in this case repA), so person_1vs2 is just the results for replicate A <ul> <li>It would also give you two interaction terms, repB.person2 and repC.person2, which would tell you what is happening in repB and repC relative to person_1vs2</li> </ul> </li> </ul> </li> </ul> <h2>Other resources</h2> <p>http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html</p> <p>https://github.com/lpantano/DEGreport</p> <p>https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0190152</p> <p>https://rstudio-pubs-static.s3.amazonaws.com/329027_593046fb6d7a427da6b2c538caf601e1.html#the-condition-effect-for-genotype-i-the-main-effect</p></div> </div> </div> </div> </div> <span>Graphing tools</span> <span><span>Anonymous (not verified)</span></span> <span><time datetime="2019-02-28T12:20:40-07:00" title="Thursday, February 28, 2019 - 12:20">Thu, 02/28/2019 - 12:20</time> </span> <div role="contentinfo" class="container ucb-article-categories" itemprop="about"> <span class="visually-hidden">Categories:</span> <div class="ucb-article-category-icon" aria-hidden="true"> <i class="fa-solid fa-folder-open"></i> </div> <a href="/lab/allen/taxonomy/term/13"> data </a> </div> <div 
role="contentinfo" class="container ucb-article-tags" itemprop="keywords"> <span class="visually-hidden">Tags:</span> <div class="ucb-article-tag-icon" aria-hidden="true"> <i class="fa-solid fa-tags"></i> </div> <a href="/lab/allen/taxonomy/term/10" hreflang="en">bigdata</a> </div> <div class="ucb-article-content ucb-striped-content"> <div class="container"> <div class="paragraph paragraph--type--article-content paragraph--view-mode--default"> <div class="ucb-article-content-media ucb-article-content-media-above"> <div> <div class="paragraph paragraph--type--media paragraph--view-mode--default"> </div> </div> </div> <div class="ucb-article-text d-flex align-items-center" itemprop="articleBody"> <div><p>When I first got into bioinformatics, one of the things I needed to learn quickly was how to graph very large tables of data. Below are some of my favorite websites for learning how to graph big data.</p> <p><strong>A great website with instructions on using R</strong></p> <p><a href="https://datacarpentry.org/R-ecology-lesson/index.html" rel="nofollow" target="_blank"><span><span>https://datacarpentry.org/R-ecology-lesson/index.html</span></span></a></p> <p><strong>Some great websites with instructions on using Python for big data and graphing</strong></p> <p>Use this to learn the Python package pandas (like Excel for big data).</p> <p><a href="https://nikgrozev.com/2015/12/27/pandas-in-jupyter-quickstart-and-useful-snippets/" rel="nofollow" target="_blank"><span><span>https://nikgrozev.com/2015/12/27/pandas-in-jupyter-quickstart-and-useful-snippets/</span></span></a></p> <p>Fancier plotting can be achieved with Plotly or Bokeh.</p> <p><a 
href="https://plot.ly/pandas/" rel="nofollow" target="_blank"><span><span>https://plot.ly/pandas/</span></span></a></p> <p><a href="https://bokeh.pydata.org/en/latest/" rel="nofollow" target="_blank"><span><span>https://bokeh.pydata.org/en/latest/</span></span></a></p> <p><strong>Specific to those in BioFrontiers on the compute cluster Fiji:</strong></p> <p>To use RStudio on Fiji, use a web browser to go to fiji-viz.colorado.edu and click on RStudio.</p> <p><strong>Setting up a FANCY Jupyter notebook on Fiji</strong></p> <p>Step 0:</p> <p>Log into Fiji on the command line and do this:</p> <p><a href="http://bficores.colorado.edu/biofrontiers-it/cluster-computing/fiji/jupyterhub" rel="nofollow">http://bficores.colorado.edu/biofrontiers-it/cluster-computing/fiji/jupyterhub</a></p> <p>Step 1:</p> <p>Also run these lines on the command line:</p> <p>pip3 install hide_code<br> <span>pip3 install plotly</span><br> <span>pip3 install ipywidgets</span><br> <span>pip3 install jupyter_contrib_nbextensions</span><br> pip3 install jupyter_nbextensions_configurator<br> jupyter contrib nbextension install --user</p> <p>jupyter nbextension install --user --py widgetsnbextension</p> <p>jupyter nbextension enable --user --py widgetsnbextension</p> <p>Then log off Fiji on the command line!</p> <p>Step 2:</p> <p>Start a new server on fiji-viz.colorado.edu</p> <p>Step 3:</p> <p>Start a new notebook</p> <p>Step 4: Use the control panel button to log off the server</p> <p>Step 5: Restart the server. On the home page you will have a new tab called Nbextensions; click that and turn on the extensions you want</p> <p>Step 6: Start a new notebook with the "New" button. 
Name it by clicking on the name.</p> <p>BTW, if you want to do the R lesson in the Jupyter notebook instead, the starting file is here:</p> <p>#df=pandas.read_csv("https://raw.githubusercontent.com/kbroman/kbroman.github.io/master/datacarp/portal_data_joined.csv")</p></div> </div> </div> </div> </div> <span>Finding the number of unique items in a column</span> <span><span>Anonymous (not verified)</span></span> <span><time datetime="2018-03-14T09:37:02-06:00" title="Wednesday, March 14, 2018 - 09:37">Wed, 03/14/2018 - 09:37</time> </span> <div> <div class="imageMediaStyle focal_image_wide"> <img loading="lazy" src="/lab/allen/sites/default/files/styles/focal_image_wide/public/article-thumbnail/one-of-a-kind-clipart-12.jpg?h=edb3c8fc&amp;itok=iA3dDvZy" width="1200" height="600" alt="unique person"> </div> </div> <div role="contentinfo" class="container ucb-article-tags" itemprop="keywords"> <span class="visually-hidden">Tags:</span> <div class="ucb-article-tag-icon" aria-hidden="true"> <i class="fa-solid fa-tags"></i> </div> <a href="/lab/allen/taxonomy/term/10" hreflang="en">bigdata</a> </div> <div class="ucb-article-content ucb-striped-content"> <div class="container"> <div class="paragraph paragraph--type--article-content paragraph--view-mode--default"> <div class="ucb-article-content-media ucb-article-content-media-above"> <div> <div class="paragraph paragraph--type--media paragraph--view-mode--default"> </div> </div> </div> <div class="ucb-article-text d-flex align-items-center" itemprop="articleBody"> <div><p>Often to check the content of a tab-delimited file we want to know 
how many unique things there are in a particular column. Below I give you instructions for checking this using the command line and the Python package pandas.</p><h2>If you want to check for the number of unique things on the command line or in a shell script</h2><p>How many unique things are in column 1 of a file named &lt;input_file&gt;?</p><p><em># outputs a count of the unique things in a column<br>cut -f 1 input_file | sort | uniq | wc -l</em></p><p><em># outputs each of the unique things and how many of each there are<br>cut -f 1 input_file | sort | uniq -c</em></p><h2>If you want to check for the number of unique things using the Python package pandas</h2><p># My file's first line is chr, start, stop, name, score. I want to know how many unique chromosomes there are or how many lines have each of the chromosomes.</p><p><em>#</em> First open Python by typing python (if you are on Fiji you must also module load python/2.7.3/pandas)</p><p><em># outputs a count of the unique things in a column</em></p><p>import pandas<br>df = pandas.read_csv("allmu.bed", sep="\t")<br>print(df["chr"].nunique())</p><p><em># outputs each of the unique things and how many of each there are</em></p><p>import pandas<br>df = pandas.read_csv("allmu.bed", sep="\t")<br>print(df["chr"].value_counts())</p><p><em># if your file has no header line, give the column names yourself with names=</em></p><p>import pandas</p><p>df = pandas.read_csv("allmu.bed", names=["chr", "start", "stop", "name", "score"], sep="\t")<br>print(df["chr"].value_counts().sort_values())</p><p>print(df["chr"].nunique())</p></div> </div> </div> </div> </div>