Python Bio/Tools/RNAseq-StringTie (Pythonバイオ/ツール/RNAseq-StringTie)
https://pepper.is.sci.toho-u.ac.jp:443/pepper/index.php?Python%A5%D0%A5%A4%A5%AA%2F%A5%C4%A1%BC%A5%EB%2FRNAseq-StringTie
git clone https://github.com/gpertea/stringtie
cd stringtie
make release
which stringtie
# -> /usr/local/bin/stringtie
While we're at it:
git clone https://github.com/gpertea/gffcompare
cd gffcompare
make
sudo ln -s ....../gffcompare/gffcompare /usr/local/bin/gffcompare   # make does not install
Be sure to run HISAT2 with the --dta option for alignment, or your results will suffer.
cd ~/src/RNAseq-Saccha/Saccha
ls -lt *.bam
# -> SRR453571.tagged.bam
# -> ...
# -> SRR453566.tagged.bam
# The input BAM files must be sorted
samtools sort -@ 16 -o SRR453566.tagged.sorted.bam SRR453566.tagged.bam
stringtie SRR453566.tagged.sorted.bam -o SRR453566.gtf -p 16 -G s288c_e.gff -l SRR453566
# Same run, additionally writing gene abundances with -A
stringtie SRR453566.tagged.sorted.bam -o SRR453566.gtf -p 16 -G s288c_e.gff -l SRR453566 -A SRR453566.abund.tab
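The commands above process a single run (SRR453566). A minimal sketch of repeating the same two steps for all six runs could look like the following. The `run` helper and the `DRY_RUN` switch are hypothetical additions for illustration (with the default `DRY_RUN=1` the commands are only printed, not executed), and the `<run>.tagged.bam` naming is assumed from the listing above.

```shell
#!/bin/bash
# Sketch: sort each BAM and run StringTie once per sample.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
THREADS=16
REF_GFF=s288c_e.gff

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sort one sample's BAM, then assemble it with StringTie (with gene abundances via -A).
process_sample() {
    local id="$1"
    run samtools sort -@ "$THREADS" -o "${id}.tagged.sorted.bam" "${id}.tagged.bam"
    run stringtie "${id}.tagged.sorted.bam" -o "${id}.gtf" -p "$THREADS" \
        -G "$REF_GFF" -l "$id" -A "${id}.abund.tab"
}

for id in SRR453566 SRR453567 SRR453568 SRR453569 SRR453570 SRR453571; do
    process_sample "$id"
done
```

Running it once in dry-run mode makes it easy to check the generated file names before the real run with `DRY_RUN=0`.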
stringtie --merge -p 8 -G s288c_e.gff -o SRR453566-71_merged.gtf SRR453566.gtf SRR453567.gtf SRR453568.gtf SRR453569.gtf SRR453570.gtf SRR453571.gtf
While we're at it:
gffcompare -r s288c_e.gff -G -o SRR453566-71_diff SRR453566-71_merged.gtf
Near-optimal RNA-Seq quantification (Nature)
Near-optimal RNA-Seq quantification
Near-optimal RNA-Seq quantification (PDF)
Kallisto - Informatics on Mac (macでインフォマティクス)
RNA-seq analysis pipeline using Kallisto – Osaka University Faculty of Medicine Python Club <===
RNA-seq analysis pipeline using Kallisto
Downloading the reference
kallisto pseudo-aligns reads to transcripts, so cDNA is used as the reference. Here, the human transcript sequences from GENCODE were used.
$ wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.transcripts.fa.gz
>>> A cDNA (reference?) is required.
According to "Salmon (A. thaliana, paired-end RNA-Seq)":
If no reference sequence for the transcripts (cDNA) exists, one can be created in FASTA format with the gffread command included in Cufflinks, given the whole-genome sequence (FASTA) and the annotation data (GFF/GTF).
That is what it says, but if this merely cuts sequences out of the genome, the result should not be the real transcript, since mRNA undergoes splicing, editing, and so on??
According to "Obtaining transcript sequences (transcripts FASTA) with gffread – bioinfo-Dojo":
Extracting transcript sequences with gffread
Prepare the genome sequence (genome.fa) and the position information (a GFF3 or GTF file), then run gffread. It has a rich set of options; the example below specifies -w to extract transcripts based on exons (generating a FASTA file).
    # Extract transcript sequences with gffread and a GFF3 file
    $ gffread -w transcripts.fa -g genome.fa transcripts.gff3
So, since -w assembles each transcript from its exons, this should be fine.
gffread -w s288c_transcript.fa -g s288c.fna s288c_e.gff
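As a quick sanity check on the generated FASTA, one could compare the number of sequences it contains with the number of transcript-like features in the annotation. This is only a sketch: the two helper functions are hypothetical, and treating every feature whose type ends in "RNA" (or equals "transcript") as a transcript is an assumption about the GFF. The two numbers are roughly comparable but not guaranteed to be equal.

```shell
#!/bin/bash
# Count FASTA records (header lines start with ">").
count_fasta_records() {
    grep -c '^>' "$1"
}

# Count transcript-like features in a GFF3 file.
# Assumption: column 3 ends in "RNA" (mRNA, tRNA, ...) or equals "transcript".
count_gff_transcripts() {
    awk -F'\t' '$0 !~ /^#/ && ($3 ~ /RNA$/ || $3 == "transcript") { n++ }
                END { print n + 0 }' "$1"
}

# Only report when the files from the gffread step are present.
if [ -f s288c_transcript.fa ] && [ -f s288c_e.gff ]; then
    echo "FASTA records:   $(count_fasta_records s288c_transcript.fa)"
    echo "GFF transcripts: $(count_gff_transcripts s288c_e.gff)"
fi
```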
Build the index
kallisto index -i s288c.ix ../s288c_transcript.fa
This produced the index file s288c.ix, which is used for the quantification below.
The data are paired-end, so two input files are given per sample. For single-end data, add the --single option; in that case -l (estimated mean fragment length, not read length) and -s (its standard deviation) must also be given.
#!/bin/bash
id=(SRR453566 SRR453567 SRR453568 SRR453569 SRR453570 SRR453571)
for item in "${id[@]}"
do
    echo "start mapping ${item} with Kallisto"
    # Paired-end: pass both FASTQ files of the current sample.
    # -l/-s are only required with --single; for paired-end data kallisto estimates them when omitted.
    kallisto quant -i s288c.ix -o "${item}" -l 101 -s 15 -b 100 "../paired_${item}_1.trim.fastq" "../paired_${item}_2.trim.fastq"
    # Single-end alternative:
    #kallisto quant -i s288c.ix -o "${item}" --single -l 101 -s 15 -b 100 "../${item}.fastq"
done
The result files abundance.h5, abundance.tsv, and run_info.json are written under the directory SRR453566 (or ...). The abundance file contains the TPM values.
target_id  length  eff_length  est_counts  tpm
rna0       363     263         1.08821     0.921395
rna1       228     128         0           0
rna2       1782    1682        0           0
rna3       387     287         0           0
rna4       381     281         0           0
rna5       381     281         0           0
rna6       285     185         5           6.01846
rna7       291     191         8           9.32704
rna8       3969    3869        78.6869     4.52888
rna9       1374    1274        732.109     127.966
rna10      1254    1154        589         113.657
rna11      1149    1049        967         205.276
rna12      639     539         123         50.8164
rna13      1509    1409        173         27.3415
rna14      2643    2543        396         34.6766
rna15      543     443         83          41.7217
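To get a first look at such a file, the TPM column can be pulled out and ranked. The following is a minimal sketch, assuming the tab-separated layout shown above (target_id in column 1, tpm in column 5) and a `sort` that supports `-g` (GNU and BSD sort do); the function name `top_tpm` is hypothetical.

```shell
#!/bin/bash
# List the N most highly expressed transcripts (by TPM) in a kallisto abundance.tsv.
# $1: path to abundance.tsv, $2: number of rows to show (default 10)
top_tpm() {
    local tab
    tab="$(printf '\t')"
    # Skip the header, keep target_id and tpm, sort numerically by tpm (descending).
    awk -F'\t' 'NR > 1 { print $1 "\t" $5 }' "$1" |
        LC_ALL=C sort -t "$tab" -k2,2gr |
        head -n "${2:-10}"
}

# Example call, only when the kallisto output exists.
if [ -f SRR453566/abundance.tsv ]; then
    top_tpm SRR453566/abundance.tsv 5
fi
```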
Only RNA IDs are given, so do the corresponding genes need to be looked up from them?
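One way to look up the genes is to build a transcript-to-gene table from the same GFF that was passed to gffread. The sketch below is an assumption-laden illustration, not a verified recipe for this annotation: it assumes GFF3-style attributes in column 9, with `ID=` holding the transcript ID (e.g. rna0) and `Parent=` holding the gene ID, and it treats features whose type ends in "RNA" or equals "transcript" as transcripts. The function name `tx2gene` and the output file name are hypothetical.

```shell
#!/bin/bash
# Print "transcript_id<TAB>gene_id" for every transcript-like feature in a GFF3 file.
tx2gene() {
    awk -F'\t' '
        $0 !~ /^#/ && ($3 ~ /RNA$/ || $3 == "transcript") {
            id = ""; parent = ""
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^ID=/)     id = substr(attrs[i], 4)      # text after "ID="
                if (attrs[i] ~ /^Parent=/) parent = substr(attrs[i], 8)  # text after "Parent="
            }
            if (id != "" && parent != "") print id "\t" parent
        }' "$1"
}

# Example call, only when the annotation is present.
if [ -f s288c_e.gff ]; then
    tx2gene s288c_e.gff > tx2gene.tsv
fi
```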
RNA-seq analysis pipeline using Kallisto – Osaka University Faculty of Medicine Python Club
Each of these sources says that the next step is tximport, followed by DESeq2/edgeR.
According to "tximport | An R package that bridges expression data from RSEM/kallisto/Salmon to edgeR/DESeq2 and similar tools":
Gene expression levels
The expression data output by kallisto are transcript-level values. For a gene-level analysis, these transcript-level values have to be converted into gene-level values. The conversion can be done with the summarizeToGene function of the tximport package.
When running summarizeToGene, countsFromAbundance must be specified in order to obtain count data. Two scaling methods can be given to countsFromAbundance, scaledTPM and lengthScaledTPM; either one is fine.
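To illustrate what the gene-level conversion means for abundances, here is a rough shell analogue that sums transcript TPM per gene. It is only a sketch and not a replacement for tximport, which also handles estimated counts, effective lengths, and the countsFromAbundance scaling. The function name `gene_tpm` and the two-column mapping file `tx2gene.tsv` (transcript ID, gene ID, tab-separated) are assumptions.

```shell
#!/bin/bash
# Sum transcript-level TPM into gene-level TPM.
# $1: mapping file (transcript_id<TAB>gene_id), $2: kallisto abundance.tsv
gene_tpm() {
    awk -F'\t' '
        NR == FNR { gene[$1] = $2; next }                  # first file: load the mapping
        FNR > 1 && ($1 in gene) { sum[gene[$1]] += $5 }    # second file: add tpm (column 5)
        END { for (g in sum) print g "\t" sum[g] }
    ' "$1" "$2" | LC_ALL=C sort
}

# Example call, only when both inputs exist.
if [ -f tx2gene.tsv ] && [ -f SRR453566/abundance.tsv ]; then
    gene_tpm tx2gene.tsv SRR453566/abundance.tsv > SRR453566/gene_tpm.tsv
fi
```

Transcripts that are missing from the mapping file are silently skipped in this sketch, so it is worth checking that the mapping covers all target IDs.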
tximport: an R package that bridges expression data from RSEM/kallisto/Salmon to edgeR/DESeq2 and similar tools