
*Expression analysis [#q2eb5d88]

[[Expression analysis | detecting differentially expressed genes with RNA-Seq:https://bi.biopapyrus.jp/rnaseq/analysis/]]: an overview on the BioPapyrus site

**Notes on normalization [#qae69fd7]
-[[RNA-Seq | gene expression analysis:https://bi.biopapyrus.jp/rnaseq/]] ⇒ [[FPKM / RPKM | expression levels from RNA-Seq read counts corrected for transcript length and other factors:https://bi.biopapyrus.jp/rnaseq/analysis/normalizaiton/fpkm.html]]: this page includes a way to calculate FPKM in R.
-[[Gene expression analysis with next-generation sequencers | PictBio:https://www.pictbio.com/tips/2554.html]]

-[[Functional Genomics (Lecture 6):http://www.iu.a.u-tokyo.ac.jp/~kadota/20110929_kadota.pdf]] Experiments comparing the normalization methods, by Prof. Kadota (2011/09/29)~
RPM (Reads per million mapped reads) ⇒ RPKM (Reads per kilobase of exon per million mapped reads) / TMM normalization / the Kadota method

-[[2011 Illumina RNA-Seq session 3 (PDF):https://jp.illumina.com/content/dam/illumina-marketing/apac/japan/documents/pdf/2011_illumina_rna-seq_session3.pdf]] (2011/11/17)

-[[Question: What is the reason why we usually use normalized values from RNA-Seq (FPKM, RPKM, etc.) ?:https://www.biostars.org/p/270537/]] (BioStars)~
R(F)PKM/TPM values are used to normalize read counts by library size (total number of reads you have in a given RNAseq experiment) and the length of the feature (gene/transcript). But remember that commonly used software for differential expression analysis (DESEQ2/EdgeR) are using raw counts instead of normalized values (they do their internal normalization steps).

-[[An RNA-Seq data analysis procedure a bit more complicated than microarrays | Subio:https://www.subioplatform.com/ja/info_technical/293/an-rna-seq-data-analysis-procedure-a-bit-more-complicated-than-microarrays]]

-[[An integrative method to normalize RNA-Seq data:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4067528/]](2014/6/14)~
Since RNA-Seq emergence, a number of normalization methods have been developed to address one or two of the different biases [1-12,14]. Our aim was to develop an integrated method able to correct all these sources of bias.
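
The RPM and RPKM definitions listed above (and TPM, mentioned in the BioStars answer) reduce to simple arithmetic. The following is a minimal Python sketch using the usual textbook formulas; the function names and toy numbers are hypothetical, it takes the sum of the supplied counts as the library size (in practice the total number of mapped reads is used), and it does not cover TMM or the Kadota method, which involve additional steps.

```python
def rpm(counts):
    """Reads per million: scale each count by the library size."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def rpkm(counts, lengths_bp):
    """Reads per kilobase per million: RPM divided by feature length in kb.

    Counting fragments instead of reads (paired-end data) gives FPKM.
    """
    return [r / (length / 1e3) for r, length in zip(rpm(counts), lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: divide by length first, then rescale to 1e6."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


if __name__ == "__main__":
    counts = [100, 300, 600]       # hypothetical read counts for three genes
    lengths = [1000, 2000, 3000]   # hypothetical feature lengths in bp
    print("RPM ", rpm(counts))
    print("RPKM", rpkm(counts, lengths))
    print("TPM ", tpm(counts, lengths))
```

Because TPM divides by length before rescaling, its values always sum to one million within a sample; RPKM/FPKM sums vary from sample to sample.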

-----------------------------------
~
~

**Ballgown [#u8416342]
[[Installation in R:https://github.com/alyssafrazee/ballgown#installation]]
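 # biocLite() has been replaced by BiocManager in current Bioconductor releases
 if (!requireNamespace("BiocManager", quietly = TRUE))
     install.packages("BiocManager")
 BiocManager::install("ballgown")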


Using it (1): [[Loading data:https://github.com/alyssafrazee/ballgown#loading-data-into-r]]

 library(ballgown)
According to another source, the following are also loaded (details not yet checked):
 library(RSkittleBrewer)
 library(genefilter)
 library(dplyr)

Now use ballgown.

This assumes the data are laid out as follows:
 extdata/
    sample01/
        e2t.ctab
        e_data.ctab
        i2t.ctab
        i_data.ctab
        t_data.ctab
    sample02/
        e2t.ctab
        e_data.ctab
        i2t.ctab
        i_data.ctab
        t_data.ctab
    ...
    sample20/
        e2t.ctab
        e_data.ctab
        i2t.ctab
        i_data.ctab
        t_data.ctab
Here, StringTie was run as
 $ stringtie -e -B -p 16 -G s288c_e.gff -o ballgown/SRR453566/SRR453566.gtf SRR453566.sorted.bam
 $ stringtie -e -B -p 16 -G s288c_e.gff -o ballgown/SRR453567/SRR453567.gtf SRR453567.sorted.bam
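so the output directory looks like the following (-B writes the Ballgown .ctab tables into the same directory as the -o GTF, and -e restricts estimation to the reference transcripts given with -G). See [[the manual, Accessing assembly data:https://github.com/alyssafrazee/ballgown#accessing-assembly-data]].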

 ballgown/
    SRR453566/
        SRR453566.gtf
        e2t.ctab
        e_data.ctab
        i2t.ctab
        i_data.ctab
        t_data.ctab
    SRR453567/
        SRR453567.gtf
        e2t.ctab
        e_data.ctab
        i2t.ctab
        i_data.ctab
        t_data.ctab

Feed this to ballgown:
 bg = ballgown(dataDir="~/src/RNAseq-Saccha/Saccha/ballgown", samplePattern='SRR', meas='all')
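dataDir is the directory holding the data (ballgown here). samplePattern picks out the sample subdirectories inside it; the manual treats it as a regular expression, and since the directories here are named SRRxxxxx I used SRR. meas chooses which expression measurements to load ('rcount', 'ucount', 'mrcount', 'cov', 'cov_sd', 'mcov', 'mcov_sd' or 'FPKM'), and 'all' loads all of them.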
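
Since the .ctab files are plain tab-separated text, the layout can also be checked outside R. The Python sketch below lists which subdirectories a pattern such as 'SRR' would select and which of the five expected .ctab files each one lacks. It assumes samplePattern behaves like an unanchored regular-expression search (so '^SRR' would be needed to require a true prefix); the function names are hypothetical and this is not ballgown's own code.

```python
import os
import re
import sys

# Per-sample files StringTie writes with -B (see the directory listings above).
CTAB_FILES = ("e2t.ctab", "e_data.ctab", "i2t.ctab", "i_data.ctab", "t_data.ctab")


def select_samples(data_dir, sample_pattern):
    """Subdirectories of data_dir whose names match sample_pattern (unanchored regex search)."""
    return sorted(
        name
        for name in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, name)) and re.search(sample_pattern, name)
    )


def missing_ctab(data_dir, sample):
    """Names of the expected .ctab files that are absent from one sample directory."""
    return [f for f in CTAB_FILES if not os.path.isfile(os.path.join(data_dir, sample, f))]


if __name__ == "__main__":
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "ballgown"
    pattern = sys.argv[2] if len(sys.argv) > 2 else "SRR"
    if os.path.isdir(data_dir):
        for sample in select_samples(data_dir, pattern):
            missing = missing_ctab(data_dir, sample)
            print(sample, "OK" if not missing else "missing: " + ", ".join(missing))
    else:
        print(f"{data_dir}: directory not found")
```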

Once this finishes, bg is ready to use. Its structure slot holds the exon, intron, and transcript structures. To pull each of them out:
 > structure(bg)$exon
 GRanges object with 6801 ranges and 2 metadata columns:
             seqnames      ranges strand |        id transcripts
                <Rle>   <IRanges>  <Rle> | <integer> <character>
      [1] NC_001133.9   1807-2169      - |         1           1
      [2] NC_001133.9   2480-2707      + |         2           2
      [3] NC_001133.9   7235-9016      - |         3           3
      [4] NC_001133.9 11565-11951      - |         4           4
      [5] NC_001133.9 12046-12426      + |         5           5
      ...         ...         ...    ... .       ...         ...
   [6797] NC_001224.1 78089-78162      - |      6797        6441
   [6798] NC_001224.1 78533-78608      + |      6798        6442
   [6799] NC_001224.1 79213-80022      + |      6799        6443
   [6800] NC_001224.1 85035-85112      + |      6800        6444
   [6801] NC_001224.1 85295-85777      + |      6801        6445
   -------
   seqinfo: 17 sequences from an unspecified genome; no seqlengths
 

 > structure(bg)$trans
 GRangesList object of length 6445:
 $1 
 GRanges object with 1 range and 2 metadata columns:
          seqnames    ranges strand |        id transcripts
             <Rle> <IRanges>  <Rle> | <integer> <character>
   [1] NC_001133.9 1807-2169      - |         1           1
 
 $2 
 GRanges object with 1 range and 2 metadata columns:
          seqnames    ranges strand | id transcripts
   [1] NC_001133.9 2480-2707      + |  2           2
 
 $3 
 GRanges object with 1 range and 2 metadata columns:
          seqnames    ranges strand | id transcripts
   [1] NC_001133.9 7235-9016      - |  3           3
 
 ...
 <6442 more elements>
 -------
 seqinfo: 17 sequences from an unspecified genome; no seqlengths
and so on.
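
The ranges printed by GRanges are 1-based and include both ends, so an exon's width is end - start + 1. As a small Python illustration (hypothetical helper names), summing those widths per transcript gives the transcript length; I assume this is the length used for FPKM normalization, but have not checked StringTie's internals.

```python
from collections import defaultdict


def exon_width(start, end):
    """Width of a 1-based closed interval, as GRanges prints it (both ends included)."""
    return end - start + 1


def transcript_lengths(exons):
    """Sum exon widths per transcript; exons are (transcript_id, start, end) tuples."""
    lengths = defaultdict(int)
    for t_id, start, end in exons:
        lengths[t_id] += exon_width(start, end)
    return dict(lengths)


if __name__ == "__main__":
    # First three rows of structure(bg)$exon above; each exon belongs to one transcript.
    exons = [(1, 1807, 2169), (2, 2480, 2707), (3, 7235, 9016)]
    print(transcript_lengths(exons))  # {1: 363, 2: 228, 3: 1782}
```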

Next, pull out the expr slot. texpr, eexpr, iexpr, and gexpr correspond to transcripts, exons, introns, and genes (t/e/i/g), and the measurement to extract is given as the second argument, as in texpr(bg, 'FPKM').

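Concretely, the following can be extracted:
 transcript_fpkm = texpr(bg, 'FPKM')         # transcript-level FPKM
 transcript_cov = texpr(bg, 'cov')           # transcript-level average per-base coverage
 whole_tx_table = texpr(bg, 'all')           # every transcript-level column
 exon_mcov = eexpr(bg, 'mcov')               # exon multi-map-corrected average coverage
 junction_rcount = iexpr(bg)                 # intron (junction) read counts, iexpr's default
 whole_intron_table = iexpr(bg, 'all')       # every intron-level column
 gene_expression = gexpr(bg)                 # gene-level FPKM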
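
For comparison, a transcript-by-sample FPKM table like texpr(bg, 'FPKM') can be assembled directly from the t_data.ctab files. This Python sketch assumes each t_data.ctab has a header row with t_id and FPKM columns, as in the StringTie -B output; the function name is hypothetical and it reproduces only this one table, not ballgown's other accessors.

```python
import csv
import os


def read_fpkm(data_dir, samples):
    """Return {t_id: {"FPKM.<sample>": value}} built from each sample's t_data.ctab."""
    table = {}
    for sample in samples:
        path = os.path.join(data_dir, sample, "t_data.ctab")
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh, delimiter="\t"):
                # Keys mirror the texpr(bg, 'FPKM') column names: FPKM.<sample name>.
                table.setdefault(int(row["t_id"]), {})["FPKM." + sample] = float(row["FPKM"])
    return table


if __name__ == "__main__":
    samples = ["SRR453566", "SRR453567"]  # sample directories from the stringtie runs above
    paths = [os.path.join("ballgown", s, "t_data.ctab") for s in samples]
    if all(os.path.isfile(p) for p in paths):
        fpkm = read_fpkm("ballgown", samples)
        for t_id in sorted(fpkm)[:5]:
            print(t_id, fpkm[t_id])
```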

For example, transcript_fpkm gives
 > transcript_fpkm
      FPKM.SRR453566 FPKM.SRR453567
 1          0.251292       1.106585
 2          0.000000       0.000000
 3          0.000000       0.000000
 4          0.071457       0.042556
 5          1.218477       2.062575
 6          0.000000       0.000000
 7          0.000000       0.000000
 8          0.000000       0.000000
 9          1.410877       1.360651
 10         9.473732       8.717489
 11        18.461433      12.996510
 12       197.994659     219.502182
 13        95.170197     101.183212
 14        35.876900      39.537365
 15        23.731741      23.713844
 ...
  [ reached getOption("max.print") -- omitted 5945 rows ] 

and gene_expression gives
 > gene_expression
           FPKM.SRR453566 FPKM.SRR453567
 gene_0001       0.251292       1.106585
 gene_0002       0.000000       0.000000
 gene_0003       0.000000       0.000000
 gene_0004       0.071457       0.042556
 gene_0005       1.218477       2.062575
 gene_0006       0.000000       0.000000
 gene_0007       0.000000       0.000000
 gene_0008       0.000000       0.000000
 gene_0009       1.410877       1.360651
 gene_0010       9.473732       8.717489
 gene_0011      18.461433      12.996510
 gene_0012     197.994659     219.502182
 gene_0013      95.170197     101.183212
 gene_0014      35.876900      39.537365
 gene_0015      23.731741      23.713844
 ...
  [ reached getOption("max.print") -- omitted 5945 rows ] 

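The two tables show the same values because in this dataset every gene has exactly one transcript (both tables have 6,445 rows).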
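
To see how transcript-level values could roll up to genes, the Python sketch below sums them through a transcript-to-gene mapping such as indexes(bg)$t2g. Summation is an assumption chosen for illustration, not a description of how gexpr() computes its values; with one transcript per gene, as here, summing leaves the numbers unchanged. The function name is hypothetical.

```python
from collections import defaultdict


def gene_level(transcript_values, t2g):
    """Sum per-transcript values into per-gene totals via a t_id -> g_id mapping."""
    genes = defaultdict(float)
    for t_id, value in transcript_values.items():
        genes[t2g[t_id]] += value
    return dict(genes)


if __name__ == "__main__":
    # FPKM.SRR453566 for transcripts 1 and 4 and their genes, taken from the tables on this page.
    fpkm = {1: 0.251292, 4: 0.071457}
    t2g = {1: "gene_0001", 4: "gene_0004"}
    print(gene_level(fpkm, t2g))  # {'gene_0001': 0.251292, 'gene_0004': 0.071457}
```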

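The indexes slot needs a bit more study. indexes(bg) gives access to indexes(bg)$e2t, indexes(bg)$i2t, indexes(bg)$t2g and so on. These look like plain lookup tables: e2t links exon IDs (e_id) to transcript IDs (t_id), i2t links intron IDs (i_id) to transcript IDs, and t2g links transcript IDs to gene IDs (g_id).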

 > indexes(bg)$e2t
      e_id t_id
 1       1    1
 2       2    2
 3       3    3
 4       4    4
 5       5    5
 6       6    6
 7       7    7
 8       8    8
 9       9    9
 10     10   10
 11     11   11
 12     12   12
 13     13   13
 14     14   14
 15     15   15

 > indexes(bg)$i2t
     i_id t_id
 1      1   41
 2      2   66
 3      3   68
 4      4   71
 5      5   86
 6      6  104
 7      7  117
 8      8  124
 9      9  129
 10    10  154
 11    11  155
 12    12  163
 13    13  173
 14    14  188
 15    15  189

 > indexes(bg)$t2g
      t_id      g_id
 1       1 gene_0001
 2       2 gene_0002
 3       3 gene_0003
 4       4 gene_0004
 5       5 gene_0005
 6       6 gene_0006
 7       7 gene_0007
 8       8 gene_0008
 9       9 gene_0009
 10     10 gene_0010
 11     11 gene_0011
 12     12 gene_0012
 13     13 gene_0013
 14     14 gene_0014
 15     15 gene_0015
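
Because these are just ID tables, they can be chained. The Python sketch below (hypothetical function names; rows copied from the output above) follows e2t and then t2g to find the gene of each exon, and counts introns per transcript from i2t. An exon shared by transcripts of different genes could map to more than one gene, hence the sets.

```python
from collections import Counter


def exon_to_genes(e2t, t2g):
    """Map each e_id to the set of g_ids reached via its transcripts.

    e2t: iterable of (e_id, t_id) rows; t2g: dict t_id -> g_id.
    """
    result = {}
    for e_id, t_id in e2t:
        result.setdefault(e_id, set()).add(t2g[t_id])
    return result


def features_per_transcript(x2t):
    """Count how many exons (or introns) each transcript has, from an e2t or i2t table."""
    return Counter(t_id for _, t_id in x2t)


if __name__ == "__main__":
    e2t = [(1, 1), (2, 2), (3, 3)]                          # first rows of indexes(bg)$e2t
    t2g = {1: "gene_0001", 2: "gene_0002", 3: "gene_0003"}  # first rows of indexes(bg)$t2g
    i2t = [(1, 41), (2, 66), (3, 68)]                       # first rows of indexes(bg)$i2t
    print(exon_to_genes(e2t, t2g))
    print(features_per_transcript(i2t))
```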

The manual also describes pData, a component holding phenotype information.
pData has to be created by hand, and there seem to be various constraints on it (such as the order of the rows).
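
Since pData has to be prepared by hand, a small script can at least keep the sample order consistent. The Python sketch below writes a CSV whose first column, id, holds the sample directory names, sorted; the column name group and its labels are hypothetical, and sorting by name is only an assumption about the order ballgown expects, to be checked against sampleNames(bg). The resulting file could be read with read.csv() in R and passed to ballgown() through its pData argument.

```python
import csv
import io


def phenotype_csv(groups):
    """Return CSV text with one row per sample: id first, then its group label.

    groups: dict mapping sample directory name -> group label. Rows are sorted
    by sample name, assumed here to match the order the samples are loaded.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "group"])
    for sample in sorted(groups):
        writer.writerow([sample, groups[sample]])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical group labels for the two runs used on this page.
    print(phenotype_csv({"SRR453567": "treated", "SRR453566": "control"}), end="")
```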

For plotting, the manual ([[Plotting transcript structures:https://github.com/alyssafrazee/ballgown#plotting-transcript-structures]]) gives the following.

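plotTranscripts draws the transcript structures of one gene. The gene 'XLOC_000454' and sample 'sample12' come from the manual's example data; for this dataset they would presumably be replaced with IDs such as gene_0012 and SRR453566.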
 plotTranscripts(gene='XLOC_000454', gown=bg, samples='sample12', 
    meas='FPKM', colorby='transcript', 
    main='transcripts from gene XLOC_000454: sample 12, FPKM')
