Yamauchi's site

Trying StringTie for RNA-seq.

Installation

git clone https://github.com/gpertea/stringtie
cd stringtie
make release

which stringtie
/usr/local/bin/stringtie

While we are at it, install gffcompare as well

git clone https://github.com/gpertea/gffcompare
cd gffcompare
make
sudo ln -s ....../gffcompare/gffcompare /usr/local/bin/gffcompare  # make does not install it

Running

be sure to run HISAT2 with the --dta option for alignment, or your results will suffer.

cd ~/src/RNAseq-Saccha/Saccha
ls -lt *.bam
SRR453571.tagged.bam
 ...
SRR453566.tagged.bam

# The input BAM file must be sorted
samtools sort -@ 16 -o SRR453566.tagged.sorted.bam SRR453566.tagged.bam

stringtie SRR453566.tagged.sorted.bam -o SRR453566.gtf -p 16 -G s288c_e.gff -l SRR453566

stringtie SRR453566.tagged.sorted.bam -o SRR453566.gtf -p 16 -G s288c_e.gff -l SRR453566 -A SRR453566.abund.tab
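The two steps above (sort, then StringTie) have to be repeated for every sample. A minimal sketch of such a loop follows, assuming all six BAM files follow the SRR4535xx.tagged.bam naming shown by ls. The names DRY_RUN, run and process_samples are introduced here for illustration only; by default the commands are printed rather than executed.

```shell
#!/bin/bash
# Sketch: sort + StringTie for every sample.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}
threads=16
samples=(SRR453566 SRR453567 SRR453568 SRR453569 SRR453570 SRR453571)

# Hypothetical helper: print the command in dry-run mode, otherwise run it
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

process_samples() {
  local s
  for s in "${samples[@]}"; do
    # StringTie needs a coordinate-sorted BAM
    run samtools sort -@ "$threads" -o "$s.tagged.sorted.bam" "$s.tagged.bam"
    # Same options as the single-sample command, including the -A abundance table
    run stringtie "$s.tagged.sorted.bam" -o "$s.gtf" -p "$threads" -G s288c_e.gff -l "$s" -A "$s.abund.tab"
  done
}

process_samples
```

Checking the printed commands once before setting DRY_RUN=0 avoids overwriting existing GTF files by mistake.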

Merging the per-sample GTF files

stringtie --merge -p 8 -G s288c_e.gff -o SRR453566-71_merged.gtf SRR453566.gtf SRR453567.gtf SRR453568.gtf SRR453569.gtf SRR453570.gtf SRR453571.gtf
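With more samples the merge command line gets long. stringtie --merge also accepts a text file listing the GTF files, one per line, so the list can be generated from the sample IDs. A small sketch, where the list file name gtf_list.txt and the temporary directory are assumptions made here, and the final command is only printed:

```shell
#!/bin/bash
# Sketch: build the list of per-sample GTF files for stringtie --merge
samples=(SRR453566 SRR453567 SRR453568 SRR453569 SRR453570 SRR453571)

workdir=$(mktemp -d)              # scratch directory, used only for this sketch
gtf_list="$workdir/gtf_list.txt"  # hypothetical file name

# printf repeats the format once per array element: one GTF name per line
printf '%s.gtf\n' "${samples[@]}" > "$gtf_list"
cat "$gtf_list"

# Printed, not executed, so the sketch runs without StringTie installed
echo stringtie --merge -p 8 -G s288c_e.gff -o SRR453566-71_merged.gtf "$gtf_list"
```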

While we are at it

gffcompare -r s288c_e.gff -G -o SRR453566-71_diff SRR453566-71_merged.gtf

Trying Salmon

Salmon provides accurate, fast, and bias-aware transcript expression estimates using dual-phase inference

Salmon provides accurate, fast, and bias-aware transcript expression estimates using dual-phase inference(PDF)

Trying Kallisto

Near-optimal RNA-Seq quantification(Nature) Near-optimal RNA-Seq quantification Near-optimal RNA-Seq quantification(PDF)

Kallisto About

Kallisto - macでインフォマティクス ("Informatics on a Mac" blog)

kallisto (A. thaliana, paired-end RNA-Seq) | Quantifying transcripts from A. thaliana paired-end reads with kallisto

RNA-seq analysis pipeline using Kallisto – Osaka University Faculty of Medicine Python Club <===

Building the index

RNA-seq analysis pipeline using Kallisto
Downloading the reference
kallisto pseudoaligns reads to transcripts, so cDNA sequences are used as the reference. Here the human transcript sequences from GENCODE were used.

$ wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.transcripts.fa.gz

>>> A (reference?) cDNA is required

Salmon (A. thaliana, paired-end RNA-Seq) says:
If no reference sequences for the transcripts (cDNA) exist, but whole-genome sequence data (FASTA) and annotation data (GFF/GTF) are available, the gffread command included in Cufflinks can create the transcript reference sequences (FASTA).
So it says, but if this merely cuts sequences out of the genome data, the result should not be the real thing, since mRNA undergoes splicing, editing and so on??

"Obtaining transcript sequences (transcripts fasta) with gffread – bioinfo-Dojo" says:
Extracting transcript sequences with gffread. Prepare the genome sequence (genome.fa) and the position information (a GFF3 or GTF file), then run gffread. It has many options; the example below uses -w to extract exon-based transcripts (that is, to generate the fasta file).
    # Extract transcript sequences with gffread from a GFF3 file
    $ gffread -w transcripts.fa -g genome.fa transcripts.gff3
So this looks fine: with -w the transcript sequence is built from the exons.

gffread -w s288c_transcript.fa -g s288c.fna s288c_e.gff

Building the index

kallisto index -i s288c.ix ../s288c_transcript.fa

This creates the index file s288c.ix, which is used for the following steps.

Quant

The data are paired-end, so two input files are given. For single-end data, add the --single option; in that case -l (the estimated average fragment length, not the read length) and -s (its standard deviation) must also be given.

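#!/bin/bash
# Quantify each sample against the s288c.ix index; results go to a directory named after the sample.
id=(SRR453566 SRR453567 SRR453568 SRR453569 SRR453570 SRR453571)
for item in "${id[@]}"
do
  echo "start mapping ${item} with Kallisto"
  # Paired-end: pass both FASTQ files of the current sample
  kallisto quant -i s288c.ix -o "${item}" -l 101 -s 15 -b 100 "../paired_${item}_1.trim.fastq" "../paired_${item}_2.trim.fastq"
  # Single-end alternative (-l and -s are required with --single):
  #kallisto quant -i s288c.ix -o "${item}" --single -l 101 -s 15 -b 100 "../${item}.fastq"
done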

The result files are written under the directory SRR453566 (or ...): abundance.h5, abundance.tsv and run_info.json. The tpm values are in the abundance files.

target_id       length  eff_length      est_counts      tpm
rna0    363     263     1.08821 0.921395
rna1    228     128     0       0
rna2    1782    1682    0       0
rna3    387     287     0       0
rna4    381     281     0       0
rna5    381     281     0       0
rna6    285     185     5       6.01846
rna7    291     191     8       9.32704
rna8    3969    3869    78.6869 4.52888
rna9    1374    1274    732.109 127.966
rna10   1254    1154    589     113.657
rna11   1149    1049    967     205.276
rna12   639     539     123     50.8164
rna13   1509    1409    173     27.3415
rna14   2643    2543    396     34.6766
rna15   543     443     83      41.7217
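Since abundance.tsv is a plain tab-separated table, a quick look at the most highly expressed transcripts needs nothing more than awk and sort. A minimal sketch, assuming the column layout shown above; the temporary file stands in for SRR453566/abundance.tsv, it contains only a few of the rows above, and the cutoff of 100 TPM is an arbitrary value chosen for illustration.

```shell
#!/bin/bash
# Hypothetical stand-in for SRR453566/abundance.tsv: header plus five rows from the table above
tsv=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\n' \
  target_id length eff_length est_counts tpm \
  rna8  3969 3869 78.6869 4.52888 \
  rna9  1374 1274 732.109 127.966 \
  rna10 1254 1154 589     113.657 \
  rna11 1149 1049 967     205.276 \
  rna12 639  539  123     50.8164 > "$tsv"

min_tpm=100   # arbitrary cutoff chosen for illustration

# Skip the header (NR > 1), keep rows whose tpm (column 5) reaches the cutoff,
# then sort by tpm in descending numeric order
high=$(awk -F'\t' -v min="$min_tpm" 'NR > 1 && $5 + 0 >= min { print $1 "\t" $5 }' "$tsv" \
       | sort -t $'\t' -k2,2nr)
printf '%s\n' "$high"
```

For the real data, replace "$tsv" with the path to a sample's abundance.tsv.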

Only RNA IDs are listed, so do the corresponding genes have to be looked up from them?
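One way to get from RNA IDs to genes is to read the mapping from the annotation itself. The sketch below assumes the GFF3 has mRNA features whose attributes contain ID=rnaN and Parent=geneN. That matches the rna0, rna1, ... IDs in the table, but it is an assumption about s288c_e.gff, and the two records used here are made up for illustration.

```shell
#!/bin/bash
# Hypothetical two-gene GFF3 excerpt (coordinates and IDs are invented)
gff=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  chrI example gene 100 462 . - . 'ID=gene0' \
  chrI example mRNA 100 462 . - . 'ID=rna0;Parent=gene0' \
  chrI example gene 800 1027 . + . 'ID=gene1' \
  chrI example mRNA 800 1027 . + . 'ID=rna1;Parent=gene1' > "$gff"

tx2gene=$(mktemp)   # output: transcript ID <TAB> gene ID

# For every mRNA feature, pull ID= and Parent= out of the attribute column (9)
awk -F'\t' '$3 == "mRNA" {
  id = ""; parent = ""
  n = split($9, attrs, ";")
  for (i = 1; i <= n; i++) {
    if (attrs[i] ~ /^ID=/)     id = substr(attrs[i], 4)       # text after "ID="
    if (attrs[i] ~ /^Parent=/) parent = substr(attrs[i], 8)   # text after "Parent="
  }
  if (id != "" && parent != "") print id "\t" parent
}' "$gff" > "$tx2gene"

cat "$tx2gene"
```

A two-column table of this kind (transcript, gene) is the form that tximport expects for its tx2gene argument.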

kallisto (A. thaliana, paired-end RNA-Seq) | Quantifying transcripts from A. thaliana paired-end reads with kallisto

RNA-seq analysis pipeline using Kallisto – Osaka University Faculty of Medicine Python Club

Both of these say that the next step is tximport followed by DESeq2/edgeR.

"tximport | an R package that bridges RSEM/kallisto/Salmon expression data to edgeR/DESeq2 and similar tools" says:
Gene expression levels: the expression data that kallisto outputs are transcript-level values. For a gene-level analysis, these transcript-level values have to be converted to gene-level values. This conversion can be done with the summarizeToGene function of the tximport package.

When running summarizeToGene, countsFromAbundance has to be specified in order to obtain count data. Two scaling methods can be given to countsFromAbundance, scaledTPM and lengthScaledTPM, and either one is fine.
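To make the idea of gene-level conversion concrete, the sketch below adds up the TPM of all transcripts that belong to the same gene. This is a plain awk illustration, not tximport: it does none of the countsFromAbundance scaling, and the gene assignments (geneA, geneB) are invented here, as are the temporary files.

```shell
#!/bin/bash
# Hypothetical transcript-to-gene table: two transcripts of geneA, one of geneB
tx2gene=$(mktemp)
printf '%s\t%s\n' rna9 geneA rna10 geneA rna11 geneB > "$tx2gene"

# Stand-in for abundance.tsv, using three rows from the kallisto table above
abundance=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\n' \
  target_id length eff_length est_counts tpm \
  rna9  1374 1274 732.109 127.966 \
  rna10 1254 1154 589     113.657 \
  rna11 1149 1049 967     205.276 > "$abundance"

# First file: remember the gene of each transcript.
# Second file: skip the header and add each transcript's tpm to its gene.
gene_tpm=$(awk -F'\t' '
  NR == FNR { gene[$1] = $2; next }
  FNR > 1 && ($1 in gene) { tpm[gene[$1]] += $5 }
  END { for (g in tpm) printf "%s\t%.3f\n", g, tpm[g] }
' "$tx2gene" "$abundance" | sort)
printf '%s\n' "$gene_tpm"
```

TPM can be added across the transcripts of a gene directly; estimated counts are where the scaling options of tximport come in, which is why the real analysis should go through summarizeToGene.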

tximport, an R package that bridges RSEM/kallisto/Salmon expression data to edgeR/DESeq2 and similar tools


An example using salmon

RNA-seq differential expression analysis: using salmon and TCC-GUI - Qiita


Last-modified: 2019-06-29 (Sat) 08:52:53 (1369d)