= RNA-Seq Analysis Pipeline =



The pipeline contains the following steps:

FastQC
Provides an overview of sequence quality for the raw sequence data.

http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

FastQ Screen
Screens the FASTQ files against adapter, vector, and other contaminant sequences.

http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/

TopHat
Aligns reads and generates “accepted_hits.bam” per subject (along with insertion, deletion, and junction BED files).

http://tophat.cbcb.umd.edu/

GSNAP
Genomic Short-read Nucleotide Alignment Program

http://research-pub.gene.com/gmap/

RNA-SeQC
Built on the GATK and Picard APIs. Calculates a series of QC metrics on the BAM file.

http://www.broadinstitute.org/cancer/cga/rna-seqc

HTSeq
Generates a count table per individual.

http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html

Reads “accepted_hits.bam” and the annotation data (genes.gtf) and generates a “count_table” per subject.
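The counting step above can be sketched as a small Python wrapper around the `htseq-count` command-line tool. This is only an illustration: the function names and the output path are hypothetical, and strandedness and sort-order options are left at the tool's defaults, which may not match a given library.

```python
import subprocess


def htseq_count_command(bam_path, gtf_path):
    """Build the htseq-count argument list for one subject's BAM file."""
    # "-f bam" tells htseq-count that the alignment input is BAM rather than SAM.
    return ["htseq-count", "-f", "bam", bam_path, gtf_path]


def run_htseq_count(bam_path, gtf_path, out_path):
    """Run htseq-count and redirect its standard output to a count table file."""
    with open(out_path, "w") as out:
        subprocess.run(htseq_count_command(bam_path, gtf_path), stdout=out, check=True)
```

Calling `run_htseq_count("accepted_hits.bam", "genes.gtf", "count_table.txt")` once per subject would produce one count table each, assuming `htseq-count` is installed and on the `PATH`.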

Paste all count tables
Combines the per-subject count tables into “smri-count-table-final.txt”.
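A minimal Python sketch of this merging step is shown below. It assumes each count table is a two-column, tab-separated file (feature ID, count) as written by htseq-count. The function names and the `feature_id` header are illustrative, not part of the original pipeline. Unlike a plain `paste`, it checks that every table lists the same features before joining them.

```python
import csv

# Summary rows that htseq-count appends to its output
# (newer releases prefix them with "__", older ones do not).
SUMMARY_ROWS = {"no_feature", "ambiguous", "too_low_aQual",
                "not_aligned", "alignment_not_unique"}


def read_count_table(path):
    """Read one count table into a {feature_id: count} dictionary."""
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            feature_id, count = row[0], row[1]
            if feature_id.startswith("__") or feature_id in SUMMARY_ROWS:
                continue  # skip summary rows, keep only real features
            counts[feature_id] = int(count)
    return counts


def merge_count_tables(tables):
    """Merge {subject: {feature_id: count}} into a header and sorted rows."""
    if not tables:
        raise ValueError("no count tables given")
    subjects = sorted(tables)
    features = sorted(tables[subjects[0]])
    for subject in subjects:
        if set(tables[subject]) != set(features):
            raise ValueError(f"feature set of {subject} differs from {subjects[0]}")
    header = ["feature_id"] + subjects
    rows = [[f] + [tables[s][f] for s in subjects] for f in features]
    return header, rows


def write_merged_table(header, rows, out_path):
    """Write the merged table as tab-separated text."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)
```

For example, reading each subject's table with `read_count_table`, passing the resulting dictionary to `merge_count_tables`, and writing the result with `write_merged_table(header, rows, "smri-count-table-final.txt")` would yield one column per subject.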

Differential Expression Analysis
Pairwise comparisons between BP, SCZ, MD, and controls

DESeq2
DE analysis based on the negative binomial distribution

http://www.bioconductor.org/packages/2.13/bioc/html/DESeq2.html

Uses “smri-count-table-final.txt” as input.

edgeR
Uses empirical Bayes estimation and exact tests based on the negative binomial distribution

http://www.bioconductor.org/packages/2.13/bioc/html/edgeR.html

Uses “smri-count-table-final.txt” as input.
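Both DESeq2 and edgeR model read counts with a negative binomial distribution, in which the variance grows as mean + dispersion × mean². The Python sketch below only illustrates that distribution. It is not the estimation or testing procedure of either package, and the function names are chosen here for illustration.

```python
import math


def nb_variance(mean, dispersion):
    """Variance of a negative binomial count with the given mean and dispersion."""
    return mean + dispersion * mean ** 2


def nb_pmf(count, mean, dispersion):
    """Probability of observing `count` reads under the negative binomial model."""
    if mean <= 0 or dispersion <= 0:
        raise ValueError("mean and dispersion must be positive")
    size = 1.0 / dispersion          # the "size" parameter is the inverse dispersion
    prob = size / (size + mean)      # success probability in the classic parameterization
    log_pmf = (math.lgamma(count + size) - math.lgamma(count + 1) - math.lgamma(size)
               + size * math.log(prob) + count * math.log(1.0 - prob))
    return math.exp(log_pmf)
```

With a mean of 10 and a dispersion of 0.5, `nb_variance(10, 0.5)` gives 60, six times the variance a Poisson model would assume for the same mean. This overdispersion is the reason count-based tools prefer the negative binomial.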

Cufflinks workflow
http://cufflinks.cbcb.umd.edu/howitworks.html

Cufflinks
Calculates transcript abundances in Fragments Per Kilobase of exon per Million fragments mapped (FPKM). It reads the BAM file and generates an output file of transcript FPKM values per subject (transcripts.gtf), along with gene and isoform FPKM tracking files.

http://cufflinks.cbcb.umd.edu/manual.html#cufflinks
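The FPKM unit itself can be illustrated with a short Python sketch. This is the plain definition of the unit. Cufflinks estimates abundances with a statistical model that assigns fragments to isoforms, so its reported values will generally differ from this naive calculation. The function and parameter names are chosen here for illustration.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling factors.
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)
```

For instance, 500 fragments on a 2,000 bp transcript in a library of 20 million mapped fragments gives `fpkm(500, 2000, 20_000_000)`, which is 12.5.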

Cuffmerge
Merges all “transcripts.gtf” files and generates a single “final transcriptome assembly” GTF file per pairwise comparison (15 BP and 15 controls, for example).

http://cufflinks.cbcb.umd.edu/manual.html#cuffcompare

Cuffdiff
Reads all BAM files (15 BP and 15 controls, for example) along with the “final transcriptome assembly” GTF file, and uses the Cufflinks transcript quantification engine to calculate gene and transcript expression levels in more than one condition and test them for significant differences. It generates differential expression analysis results for CDS, gene, isoform, splicing, and TSS (transcription start site).

http://cufflinks.cbcb.umd.edu/howitworks.html#diff

CummeRbund
Supports exploration and visualization of Cufflinks high-throughput RNA-Seq data.

http://compbio.mit.edu/cummeRbund/index.html

= Other resources =
 * http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html
 * http://www.nature.com/nmeth/journal/v8/n6/full/nmeth.1613.html
 * http://seqanswers.com/wiki/How-to/RNASeq_analysis
 * http://nar.oxfordjournals.org/content/early/2011/12/22/nar.gkr1248.long
 * http://bowtie-bio.sourceforge.net/myrna/index.shtml
 * http://code.google.com/p/iqseq/
 * http://www.broadinstitute.org/cancer/software/genepattern/modules/RNA-seq
 * Framework from USC
 * https://www.msi.umn.edu/tutorial
 * http://genome.ucsc.edu/encode/protocols/dataStandards/RNA_standards_v1_2011_May.pdf
 * http://code.google.com/p/altanalyze/wiki/Tutorial_AltExpression_RNASeq
 * https://www.msi.umn.edu/support/materials.html
 * Pathway Analysis for High-Throughput Genomics Studies
