Python¥Ð¥¤¥ª?¡¡Python¥Ð¥¤¥ª/¥Ä¡¼¥ë?
4135¡¡¡¡¡¡2019-04-20 (ÅÚ) 14:22:38

ȯ¸½²òÀÏ

ȯ¸½Î̲òÀÏ | RNA-Seq ¤òÍøÍѤ·¤¿È¯¸½ÊÑÆ°°äÅÁ»Ò¤Î¸¡½Ð BioPapyrus¥µ¥¤¥È¤Ç¤Î¤Þ¤È¤á

Àµµ¬²½¤ÎÀâÌÀ




Ballgown

R¾å¤Ç¤Î¥¤¥ó¥¹¥È¡¼¥ë

source("http://bioconductor.org/biocLite.R")
biocLite("ballgown")

»È¤ª¤¦­¡¡¡¥Ç¡¼¥¿¤Î¥í¡¼¥É

library(ballgown)

Ê̤λñÎÁ¤Ë¤è¤ë¤È°Ê²¼¤âƳÆþ¡Ê¾ÜºṲ̀³Îǧ¡Ë

library(RSkittleBrewer)
library(genefilter)
library(dplyr)

ballgown¤ò»È¤¦¡£

¥Ç¡¼¥¿¹½Â¤¤¬¼¡¤Î¤è¤¦¤Ê·Á¤Ë¤Ê¤Ã¤Æ¤¤¤ë¤È¤¤¤¦Á°Äó

extdata/
   sample01/
       e2t.ctab
       e_data.ctab
       i2t.ctab
       i_data.ctab
       t_data.ctab
   sample02/
       e2t.ctab
       e_data.ctab
       i2t.ctab
       i_data.ctab
       t_data.ctab
   ...
   sample20/
       e2t.ctab
       e_data.ctab
       i2t.ctab
       i_data.ctab
       t_data.ctab

¤³¤³¤Ç¤Ïstringtie¤ò½èÍý¤·¤¿»þ¤Ë¡¢

$ stringtie -e -B -p 16 -G s288c_e.gff -o ballgown/SRR453566/SRR453566.gtf SRR453566.sorted.bam
$ stringtie -e -B -p 16 -G s288c_e.gff -o ballgown/SRR453567/SRR453567.gtf SRR453567.sorted.bam

¤È¤·¤¿¤Î¤Ç¡¢²¼¤Î¤è¤¦¤Ë¤Ê¤Ã¤Æ¤¤¤ë¡£¥Þ¥Ë¥å¥¢¥ëAccessing assembly data»²¾È¡£

ballgown/
   SRR453566/
       SRR453566.gtf
       e2t.ctab
       e_data.ctab
       i2t.ctab
       i_data.ctab
       t_data.ctab
   SRR453567/
       SRR453567.gtf
       e2t.ctab
       e_data.ctab
       i2t.ctab
       i_data.ctab
       t_data.ctab

¤³¤ì¤òballgown¤Ë¿©¤ï¤»¤ë¡£

bg = ballgown(dataDir="~/src/RNAseq-Saccha/Saccha/ballgown", samplePattern='SRR', meas='all')

dataDir¤Ï¥Ç¡¼¥¿¤ÎÃÖ¤¤¤Æ¤¢¤ë¥Ç¥£¥ì¥¯¥È¥ê¡Êballgown¡Ë¡¢samplePattern¤ÏÃæ¤Î¥Ç¥£¥ì¥¯¥È¥êÃæ¤Î¥µ¥ó¥×¥ë¤Î¶¦ÄÌÀÜÆ¬¼­¡£¤³¤³¤Ç¤ÏSRRxxxxx¤Ê¤Î¤ÇSRR¤Ë¤·¤¿¡£meas¤ÏÉÔÌÀ¡£

¤³¤Î½èÍý¤¬½ª¤ï¤ë¤È¡¢bg¤¬»È¤¨¤ë¤è¤¦¤Ë¤Ê¤ë¡£bgÃæ¤Îstructure¤Ë¤Ï¡¢Exon, intron, and transcript structures¤¬¤¢¤ë¡£¤½¤ì¤¾¤ì¤ò¼è½Ð¤·¤Æ¤ß¤ë¤Ë¤Ï¡¢

structure(bg)$exon
GRanges object with 6801 ranges and 2 metadata columns:
            seqnames      ranges strand |        id transcripts
               <Rle>   <IRanges>  <Rle> | <integer> <character>
     [1] NC_001133.9   1807-2169      - |         1           1
     [2] NC_001133.9   2480-2707      + |         2           2
     [3] NC_001133.9   7235-9016      - |         3           3
     [4] NC_001133.9 11565-11951      - |         4           4
     [5] NC_001133.9 12046-12426      + |         5           5
     ...         ...         ...    ... .       ...         ...
  [6797] NC_001224.1 78089-78162      - |      6797        6441
  [6798] NC_001224.1 78533-78608      + |      6798        6442
  [6799] NC_001224.1 79213-80022      + |      6799        6443
  [6800] NC_001224.1 85035-85112      + |      6800        6444
  [6801] NC_001224.1 85295-85777      + |      6801        6445
  -------
  seqinfo: 17 sequences from an unspecified genome; no seqlengths
structure(bg)$trans
GRangesList object of length 6445:
$1 
GRanges object with 1 range and 2 metadata columns:
         seqnames    ranges strand |        id transcripts
            <Rle> <IRanges>  <Rle> | <integer> <character>
  [1] NC_001133.9 1807-2169      - |         1           1

$2 
GRanges object with 1 range and 2 metadata columns:
         seqnames    ranges strand | id transcripts
  [1] NC_001133.9 2480-2707      + |  2           2

$3 
GRanges object with 1 range and 2 metadata columns:
         seqnames    ranges strand | id transcripts
  [1] NC_001133.9 7235-9016      - |  3           3

...
<6442 more elements>
-------
seqinfo: 17 sequences from an unspecified genome; no seqlengths

¤Ê¤É¡£

¼¡¤Ë¡¢expr¥¹¥í¥Ã¥È¤ò¼è½Ð¤¹¡£t/e/i/g¤ËÂФ·¤Æ¡¢texpr, eexpr, iexpr, gexpr¤¬Âбþ¤·¡¢ ¤½¤ì¤¾¤ì¤Ë¼è¤ê½Ð¤·¤¿¤¤¤â¤Î¤òtexpr(bg, 'FPKM')¤Î¤è¤¦¤Ë»ØÄꤹ¤ë¡£

¶ñÂÎŪ¤Ë¤Ï¡¢¼¡¤Î¤è¤¦¤Ê¤â¤Î¤¬¼è¤ê½Ð¤»¤ë¡£

transcript_fpkm = texpr(bg, 'FPKM')
transcript_cov = texpr(bg, 'cov')
whole_tx_table = texpr(bg, 'all')
exon_mcov = eexpr(bg, 'mcov')
junction_rcount = iexpr(bg)
whole_intron_table = iexpr(bg, 'all')
gene_expression = gexpr(bg)

¤¿¤È¤¨¤Ð¡¢transcript_fpkm¤Ï

>transcript_fpkm
     FPKM.SRR453566 FPKM.SRR453567
1          0.251292       1.106585
2          0.000000       0.000000
3          0.000000       0.000000
4          0.071457       0.042556
5          1.218477       2.062575
6          0.000000       0.000000
7          0.000000       0.000000
8          0.000000       0.000000
9          1.410877       1.360651
10         9.473732       8.717489
11        18.461433      12.996510
12       197.994659     219.502182
13        95.170197     101.183212
14        35.876900      39.537365
15        23.731741      23.713844
...
 [ reached getOption("max.print") -- 5945 ¹Ô¤ò̵»ë¤·¤Þ¤·¤¿ ] 

¤È¤Ê¤ê¡¢gene_expression¤Ï

> gene_expression
          FPKM.SRR453566 FPKM.SRR453567
gene_0001       0.251292       1.106585
gene_0002       0.000000       0.000000
gene_0003       0.000000       0.000000
gene_0004       0.071457       0.042556
gene_0005       1.218477       2.062575
gene_0006       0.000000       0.000000
gene_0007       0.000000       0.000000
gene_0008       0.000000       0.000000
gene_0009       1.410877       1.360651
gene_0010       9.473732       8.717489
gene_0011      18.461433      12.996510
gene_0012     197.994659     219.502182
gene_0013      95.170197     101.183212
gene_0014      35.876900      39.537365
gene_0015      23.731741      23.713844
...
 [ reached getOption("max.print") -- 5945 ¹Ô¤ò̵»ë¤·¤Þ¤·¤¿ ] 

¤Î¤è¤¦¤Ê·ë²Ì¤¬ÆÀ¤é¤ì¤ë¡£

index¥¹¥í¥Ã¥È¤Ï¡¢¤â¤¦¾¯¤·ÊÙ¶¯É¬Íס£indexes(bg)¤ËÂФ·¤Æ¡¢indexes(bg)$e2t, indexes(bg)$i2t, indexes(bg)$t2g¤Ê¤É¤¬²Äǽ¡£¡Ê¥Æ¡¼¥Ö¥ë¤ò¸«¤Æ¤¤¤ë¤À¤±¡©¡Ë

> indexes(bg)$e2t
     e_id t_id
1       1    1
2       2    2
3       3    3
4       4    4
5       5    5
6       6    6
7       7    7
8       8    8
9       9    9
10     10   10
11     11   11
12     12   12
13     13   13
14     14   14
15     15   15
> indexes(bg)$i2t
    i_id t_id
1      1   41
2      2   66
3      3   68
4      4   71
5      5   86
6      6  104
7      7  117
8      8  124
9      9  129
10    10  154
11    11  155
12    12  163
13    13  173
14    14  188
15    15  189
> indexes(bg)$t2g
     t_id      g_id
1       1 gene_0001
2       2 gene_0002
3       3 gene_0003
4       4 gene_0004
5       5 gene_0005
6       6 gene_0006
7       7 gene_0007
8       8 gene_0008
9       9 gene_0009
10     10 gene_0010
11     11 gene_0011
12     12 gene_0012
13     13 gene_0013
14     14 gene_0014
15     15 gene_0015

¤¢¤È¡¢¥Þ¥Ë¥å¥¢¥ë¤Ë¤è¤ë¤È¡¢phenotype¾ðÊó¤ò¸«¤ë¥³¥ó¥Ý¡¼¥Í¥ó¥ÈpData¤¬¤¢¤ë¡£ pData¤Î¾ðÊó¤Ï¼ê¤Çºî¤ëɬÍפ¬¤¢¤ê¡¢¤¤¤í¤¤¤í¤È¡Êµ­½Ò½çÈ֤Ȥ«¤Î¡ËÀßÄêÀ©¸Â¤¬¤¢¤ë¤é¤·¤¤¡£

ÉÁ²è¤Ë¤Ä¤¤¤Æ¤Ï¡¢¥Þ¥Ë¥å¥¢¥ë Plotting transcript structures¤Ë¤è¤ë¤È¡¢¼¡¤ÎÄ̤ꡣ

plotTranscripts¤ò»È¤¦¤È¡¢ÉÁ²è¤µ¤ì¤ë¡£

plotTranscripts(gene='XLOC_000454', gown=bg, samples='sample12', 
   meas='FPKM', colorby='transcript', 
   main='transcripts from gene XLOC_000454: sample 12, FPKM')

¥È¥Ã¥×   ÊÔ½¸ Åà·ë º¹Ê¬ ¥Ð¥Ã¥¯¥¢¥Ã¥× źÉÕ Ê£À½ ̾Á°Êѹ¹ ¥ê¥í¡¼¥É   ¿·µ¬ °ìÍ÷ ñ¸ì¸¡º÷ ºÇ½ª¹¹¿·   ¥Ø¥ë¥×   ºÇ½ª¹¹¿·¤ÎRSS
Last-modified: 2019-04-20 (ÅÚ) 14:22:38 (1378d)