Python¥Ð¥¤¥ª?¡¡Python¥Ð¥¤¥ª/¥Ä¡¼¥ë?
4135¡¡¡¡¡¡2019-04-20 (ÅÚ) 14:22:38
ȯ¸½Î̲òÀÏ | RNA-Seq ¤òÍøÍѤ·¤¿È¯¸½ÊÑÆ°°äÅÁ»Ò¤Î¸¡½Ð BioPapyrus¥µ¥¤¥È¤Ç¤Î¤Þ¤È¤á
source("http://bioconductor.org/biocLite.R") biocLite("ballgown")
»È¤ª¤¦¡¡¡¥Ç¡¼¥¿¤Î¥í¡¼¥É
library(ballgown)
Ê̤λñÎÁ¤Ë¤è¤ë¤È°Ê²¼¤âƳÆþ¡Ê¾ÜºṲ̀³Îǧ¡Ë
library(RSkittleBrewer) library(genefilter) library(dplyr)
ballgown¤ò»È¤¦¡£
¥Ç¡¼¥¿¹½Â¤¤¬¼¡¤Î¤è¤¦¤Ê·Á¤Ë¤Ê¤Ã¤Æ¤¤¤ë¤È¤¤¤¦Á°Äó
extdata/ sample01/ e2t.ctab e_data.ctab i2t.ctab i_data.ctab t_data.ctab sample02/ e2t.ctab e_data.ctab i2t.ctab i_data.ctab t_data.ctab ... sample20/ e2t.ctab e_data.ctab i2t.ctab i_data.ctab t_data.ctab
¤³¤³¤Ç¤Ïstringtie¤ò½èÍý¤·¤¿»þ¤Ë¡¢
$ stringtie -e -B -p 16 -G s288c_e.gff -o ballgown/SRR453566/SRR453566.gtf SRR453566.sorted.bam $ stringtie -e -B -p 16 -G s288c_e.gff -o ballgown/SRR453567/SRR453567.gtf SRR453567.sorted.bam
¤È¤·¤¿¤Î¤Ç¡¢²¼¤Î¤è¤¦¤Ë¤Ê¤Ã¤Æ¤¤¤ë¡£¥Þ¥Ë¥å¥¢¥ëAccessing assembly data»²¾È¡£
ballgown/ SRR453566/ SRR453566.gtf e2t.ctab e_data.ctab i2t.ctab i_data.ctab t_data.ctab SRR453567/ SRR453567.gtf e2t.ctab e_data.ctab i2t.ctab i_data.ctab t_data.ctab
¤³¤ì¤òballgown¤Ë¿©¤ï¤»¤ë¡£
bg = ballgown(dataDir="~/src/RNAseq-Saccha/Saccha/ballgown", samplePattern='SRR', meas='all')
dataDir¤Ï¥Ç¡¼¥¿¤ÎÃÖ¤¤¤Æ¤¢¤ë¥Ç¥£¥ì¥¯¥È¥ê¡Êballgown¡Ë¡¢samplePattern¤ÏÃæ¤Î¥Ç¥£¥ì¥¯¥È¥êÃæ¤Î¥µ¥ó¥×¥ë¤Î¶¦ÄÌÀÜÆ¬¼¡£¤³¤³¤Ç¤ÏSRRxxxxx¤Ê¤Î¤ÇSRR¤Ë¤·¤¿¡£meas¤ÏÉÔÌÀ¡£
¤³¤Î½èÍý¤¬½ª¤ï¤ë¤È¡¢bg¤¬»È¤¨¤ë¤è¤¦¤Ë¤Ê¤ë¡£bgÃæ¤Îstructure¤Ë¤Ï¡¢Exon, intron, and transcript structures¤¬¤¢¤ë¡£¤½¤ì¤¾¤ì¤ò¼è½Ð¤·¤Æ¤ß¤ë¤Ë¤Ï¡¢
structure(bg)$exon GRanges object with 6801 ranges and 2 metadata columns: seqnames ranges strand | id transcripts <Rle> <IRanges> <Rle> | <integer> <character> [1] NC_001133.9 1807-2169 - | 1 1 [2] NC_001133.9 2480-2707 + | 2 2 [3] NC_001133.9 7235-9016 - | 3 3 [4] NC_001133.9 11565-11951 - | 4 4 [5] NC_001133.9 12046-12426 + | 5 5 ... ... ... ... . ... ... [6797] NC_001224.1 78089-78162 - | 6797 6441 [6798] NC_001224.1 78533-78608 + | 6798 6442 [6799] NC_001224.1 79213-80022 + | 6799 6443 [6800] NC_001224.1 85035-85112 + | 6800 6444 [6801] NC_001224.1 85295-85777 + | 6801 6445 ------- seqinfo: 17 sequences from an unspecified genome; no seqlengths
structure(bg)$trans GRangesList object of length 6445: $1 GRanges object with 1 range and 2 metadata columns: seqnames ranges strand | id transcripts <Rle> <IRanges> <Rle> | <integer> <character> [1] NC_001133.9 1807-2169 - | 1 1 $2 GRanges object with 1 range and 2 metadata columns: seqnames ranges strand | id transcripts [1] NC_001133.9 2480-2707 + | 2 2 $3 GRanges object with 1 range and 2 metadata columns: seqnames ranges strand | id transcripts [1] NC_001133.9 7235-9016 - | 3 3 ... <6442 more elements> ------- seqinfo: 17 sequences from an unspecified genome; no seqlengths
¤Ê¤É¡£
¼¡¤Ë¡¢expr¥¹¥í¥Ã¥È¤ò¼è½Ð¤¹¡£t/e/i/g¤ËÂФ·¤Æ¡¢texpr, eexpr, iexpr, gexpr¤¬Âбþ¤·¡¢ ¤½¤ì¤¾¤ì¤Ë¼è¤ê½Ð¤·¤¿¤¤¤â¤Î¤òtexpr(bg, 'FPKM')¤Î¤è¤¦¤Ë»ØÄꤹ¤ë¡£
¶ñÂÎŪ¤Ë¤Ï¡¢¼¡¤Î¤è¤¦¤Ê¤â¤Î¤¬¼è¤ê½Ð¤»¤ë¡£
transcript_fpkm = texpr(bg, 'FPKM') transcript_cov = texpr(bg, 'cov') whole_tx_table = texpr(bg, 'all') exon_mcov = eexpr(bg, 'mcov') junction_rcount = iexpr(bg) whole_intron_table = iexpr(bg, 'all') gene_expression = gexpr(bg)
¤¿¤È¤¨¤Ð¡¢transcript_fpkm¤Ï
>transcript_fpkm FPKM.SRR453566 FPKM.SRR453567 1 0.251292 1.106585 2 0.000000 0.000000 3 0.000000 0.000000 4 0.071457 0.042556 5 1.218477 2.062575 6 0.000000 0.000000 7 0.000000 0.000000 8 0.000000 0.000000 9 1.410877 1.360651 10 9.473732 8.717489 11 18.461433 12.996510 12 197.994659 219.502182 13 95.170197 101.183212 14 35.876900 39.537365 15 23.731741 23.713844 ... [ reached getOption("max.print") -- 5945 ¹Ô¤ò̵»ë¤·¤Þ¤·¤¿ ]
¤È¤Ê¤ê¡¢gene_expression¤Ï
> gene_expression FPKM.SRR453566 FPKM.SRR453567 gene_0001 0.251292 1.106585 gene_0002 0.000000 0.000000 gene_0003 0.000000 0.000000 gene_0004 0.071457 0.042556 gene_0005 1.218477 2.062575 gene_0006 0.000000 0.000000 gene_0007 0.000000 0.000000 gene_0008 0.000000 0.000000 gene_0009 1.410877 1.360651 gene_0010 9.473732 8.717489 gene_0011 18.461433 12.996510 gene_0012 197.994659 219.502182 gene_0013 95.170197 101.183212 gene_0014 35.876900 39.537365 gene_0015 23.731741 23.713844 ... [ reached getOption("max.print") -- 5945 ¹Ô¤ò̵»ë¤·¤Þ¤·¤¿ ]
¤Î¤è¤¦¤Ê·ë²Ì¤¬ÆÀ¤é¤ì¤ë¡£
index¥¹¥í¥Ã¥È¤Ï¡¢¤â¤¦¾¯¤·ÊÙ¶¯É¬Íס£indexes(bg)¤ËÂФ·¤Æ¡¢indexes(bg)$e2t, indexes(bg)$i2t, indexes(bg)$t2g¤Ê¤É¤¬²Äǽ¡£¡Ê¥Æ¡¼¥Ö¥ë¤ò¸«¤Æ¤¤¤ë¤À¤±¡©¡Ë
> indexes(bg)$e2t e_id t_id 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15
> indexes(bg)$i2t i_id t_id 1 1 41 2 2 66 3 3 68 4 4 71 5 5 86 6 6 104 7 7 117 8 8 124 9 9 129 10 10 154 11 11 155 12 12 163 13 13 173 14 14 188 15 15 189
> indexes(bg)$t2g t_id g_id 1 1 gene_0001 2 2 gene_0002 3 3 gene_0003 4 4 gene_0004 5 5 gene_0005 6 6 gene_0006 7 7 gene_0007 8 8 gene_0008 9 9 gene_0009 10 10 gene_0010 11 11 gene_0011 12 12 gene_0012 13 13 gene_0013 14 14 gene_0014 15 15 gene_0015
¤¢¤È¡¢¥Þ¥Ë¥å¥¢¥ë¤Ë¤è¤ë¤È¡¢phenotype¾ðÊó¤ò¸«¤ë¥³¥ó¥Ý¡¼¥Í¥ó¥ÈpData¤¬¤¢¤ë¡£ pData¤Î¾ðÊó¤Ï¼ê¤Çºî¤ëɬÍפ¬¤¢¤ê¡¢¤¤¤í¤¤¤í¤È¡Êµ½Ò½çÈ֤Ȥ«¤Î¡ËÀßÄêÀ©¸Â¤¬¤¢¤ë¤é¤·¤¤¡£
ÉÁ²è¤Ë¤Ä¤¤¤Æ¤Ï¡¢¥Þ¥Ë¥å¥¢¥ë Plotting transcript structures¤Ë¤è¤ë¤È¡¢¼¡¤ÎÄ̤ꡣ
plotTranscripts¤ò»È¤¦¤È¡¢ÉÁ²è¤µ¤ì¤ë¡£
plotTranscripts(gene='XLOC_000454', gown=bg, samples='sample12', meas='FPKM', colorby='transcript', main='transcripts from gene XLOC_000454: sample 12, FPKM')