[[Python¥Ð¥¤¥ª]]¡¡[[Python¥Ð¥¤¥ª/¥Ä¡¼¥ë]]~ &counter();¡¡¡¡¡¡&lastmod();~ *ȯ¸½Î̤βòÀÏ [#q2eb5d88] *ȯ¸½²òÀÏ [#q2eb5d88] [[ȯ¸½Î̲òÀÏ | RNA-Seq ¤òÍøÍѤ·¤¿È¯¸½ÊÑÆ°°äÅÁ»Ò¤Î¸¡½Ð:https://bi.biopapyrus.jp/rnaseq/analysis/]] BioPapyrus¥µ¥¤¥È¤Ç¤Î¤Þ¤È¤á **Àµµ¬²½¤ÎÀâÌÀ [#qae69fd7] -[[RNA-Seq | °äÅÁ»Òȯ¸½Î̲òÀÏ:https://bi.biopapyrus.jp/rnaseq/]] ¢Í [[FPKM / RPKM | RNA-Seq ¥ê¡¼¥É¥«¥¦¥ó¥È¥Ç¡¼¥¿¤Ëž¼Ìʪ¤ÎŤµ¤Ê¤É¤òÊäÀµ¤·¤¿È¯¸½ÎÌ:https://bi.biopapyrus.jp/rnaseq/analysis/normalizaiton/fpkm.html]]¡¡¤ÎÃæ¤ËR¤òÍøÍѤ·¤Æ FPKM ¤ò·×»»¤¹¤ëÊýË¡ -[[¼¡À¤Â奷¡¼¥±¥ó¥µ¡¼¤Ç¤Î°äÅÁ»Òȯ¸½Î̲òÀÏ | PictBio:https://www.pictbio.com/tips/2554.html]] -[[µ¡Ç½¥²¥Î¥à³Ø¡ÊÂè6²ó¡Ë:http://www.iu.a.u-tokyo.ac.jp/~kadota/20110929_kadota.pdf]]¡¡ÌçÅÄÀèÀ¸¤Ë¤è¤ë³Æ¼êË¡¤ÎÈæ³Ó¼Â¸³(2011/09/29)~ RPM(Reads per million mapped reads) ¢Í RPKM(Reads per kilobase of exon per million mapped reads)¡¡/¡¡TMMÀµµ¬²½Ë¡¡¡/¡¡ÌçÅÄË¡ -[[µ¡Ç½¥²¥Î¥à³Ø¡ÊÂè6²ó¡Ë:https://jp.illumina.com/content/dam/illumina-marketing/apac/japan/documents/pdf/2011_illumina_rna-seq_session3.pdf]](2011/11/17) -[[Question: What is the reason why we usually use normalized values from RNA-Seq (FPKM, RPKM, etc.) ?:https://www.biostars.org/p/270537/]] (BioStars)~ R(F)PKM/TPM values are used to normalize read counts by library size (total number of reads you have in a given RNAseq experiment) and the length of the feature (gene/transcript). But remember that commonly used software for differential expression analysis (DESEQ2/EdgeR) are using raw counts instead of normalized values (they do their internal normalization steps). -[[¥Þ¥¤¥¯¥í¥¢¥ì¥¤¤è¤ê¾¯¤·Ê£»¨¤Ê¡¢RNA-Seq¤Î¥Ç¡¼¥¿²òÀϼê½ç | Subio:https://www.subioplatform.com/ja/info_technical/293/an-rna-seq-data-analysis-procedure-a-bit-more-complicated-than-microarrays]] -[[An integrative method to normalize RNA-Seq data:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4067528/]](2014/6/14)~ Since RNA-Seq emergence, a number of normalization methods have been developed to address one or two of the different biases [1-12,14]. Our aim was to develop an integrated method able to correct all these sources of bias. ----------------------------------- ~ ~ **Ballgown [#u8416342] [[R¾å¤Ç¤Î¥¤¥ó¥¹¥È¡¼¥ë:https://github.com/alyssafrazee/ballgown#installation]] source("http://bioconductor.org/biocLite.R") biocLite("ballgown") »È¤ª¤¦¡¡¡[[¥Ç¡¼¥¿¤Î¥í¡¼¥É:https://github.com/alyssafrazee/ballgown#loading-data-into-r]] library(ballgown) Ê̤λñÎÁ¤Ë¤è¤ë¤È°Ê²¼¤âƳÆþ¡Ê¾ÜºṲ̀³Îǧ¡Ë library(RSkittleBrewer) library(genefilter) library(dplyr) ballgown¤ò»È¤¦¡£ ¥Ç¡¼¥¿¹½Â¤¤¬¼¡¤Î¤è¤¦¤Ê·Á¤Ë¤Ê¤Ã¤Æ¤¤¤ë¤È¤¤¤¦Á°Äó extdata/ sample01/ e2t.ctab e_data.ctab i2t.ctab i_data.ctab t_data.ctab sample02/ e2t.ctab e_data.ctab i2t.ctab i_data.ctab t_data.ctab ... sample20/ e2t.ctab e_data.ctab i2t.ctab i_data.ctab t_data.ctab ¤³¤³¤Ç¤Ïstringtie¤ò½èÍý¤·¤¿»þ¤Ë¡¢ $ stringtie -e -B -p 16 -G s288c_e.gff -o ballgown/SRR453566/SRR453566.gtf SRR453566.sorted.bam $ stringtie -e -B -p 16 -G s288c_e.gff -o ballgown/SRR453567/SRR453567.gtf SRR453567.sorted.bam ¤È¤·¤¿¤Î¤Ç¡¢²¼¤Î¤è¤¦¤Ë¤Ê¤Ã¤Æ¤¤¤ë¡£[[¥Þ¥Ë¥å¥¢¥ëAccessing assembly data:https://github.com/alyssafrazee/ballgown#accessing-assembly-data]]»²¾È¡£ ballgown/ SRR453566/ SRR453566.gtf e2t.ctab e_data.ctab i2t.ctab i_data.ctab t_data.ctab SRR453567/ SRR453567.gtf e2t.ctab e_data.ctab i2t.ctab i_data.ctab t_data.ctab ¤³¤ì¤òballgown¤Ë¿©¤ï¤»¤ë¡£ bg = ballgown(dataDir="~/src/RNAseq-Saccha/Saccha/ballgown", samplePattern='SRR', meas='all') dataDir¤Ï¥Ç¡¼¥¿¤ÎÃÖ¤¤¤Æ¤¢¤ë¥Ç¥£¥ì¥¯¥È¥ê¡Êballgown¡Ë¡¢samplePattern¤ÏÃæ¤Î¥Ç¥£¥ì¥¯¥È¥êÃæ¤Î¥µ¥ó¥×¥ë¤Î¶¦ÄÌÀÜÆ¬¼¡£¤³¤³¤Ç¤ÏSRRxxxxx¤Ê¤Î¤ÇSRR¤Ë¤·¤¿¡£meas¤ÏÉÔÌÀ¡£ ¤³¤Î½èÍý¤¬½ª¤ï¤ë¤È¡¢bg¤¬»È¤¨¤ë¤è¤¦¤Ë¤Ê¤ë¡£bgÃæ¤Îstructure¤Ë¤Ï¡¢Exon, intron, and transcript structures¤¬¤¢¤ë¡£¤½¤ì¤¾¤ì¤ò¼è½Ð¤·¤Æ¤ß¤ë¤Ë¤Ï¡¢ structure(bg)$exon GRanges object with 6801 ranges and 2 metadata columns: seqnames ranges strand | id transcripts <Rle> <IRanges> <Rle> | <integer> <character> [1] NC_001133.9 1807-2169 - | 1 1 [2] NC_001133.9 2480-2707 + | 2 2 [3] NC_001133.9 7235-9016 - | 3 3 [4] NC_001133.9 11565-11951 - | 4 4 [5] NC_001133.9 12046-12426 + | 5 5 ... ... ... ... . ... ... [6797] NC_001224.1 78089-78162 - | 6797 6441 [6798] NC_001224.1 78533-78608 + | 6798 6442 [6799] NC_001224.1 79213-80022 + | 6799 6443 [6800] NC_001224.1 85035-85112 + | 6800 6444 [6801] NC_001224.1 85295-85777 + | 6801 6445 ------- seqinfo: 17 sequences from an unspecified genome; no seqlengths structure(bg)$trans GRangesList object of length 6445: $1 GRanges object with 1 range and 2 metadata columns: seqnames ranges strand | id transcripts <Rle> <IRanges> <Rle> | <integer> <character> [1] NC_001133.9 1807-2169 - | 1 1 $2 GRanges object with 1 range and 2 metadata columns: seqnames ranges strand | id transcripts [1] NC_001133.9 2480-2707 + | 2 2 $3 GRanges object with 1 range and 2 metadata columns: seqnames ranges strand | id transcripts [1] NC_001133.9 7235-9016 - | 3 3 ... <6442 more elements> ------- seqinfo: 17 sequences from an unspecified genome; no seqlengths ¤Ê¤É¡£ ¼¡¤Ë¡¢expr¥¹¥í¥Ã¥È¤ò¼è½Ð¤¹¡£t/e/i/g¤ËÂФ·¤Æ¡¢texpr, eexpr, iexpr, gexpr¤¬Âбþ¤·¡¢ ¤½¤ì¤¾¤ì¤Ë¼è¤ê½Ð¤·¤¿¤¤¤â¤Î¤òtexpr(bg, 'FPKM')¤Î¤è¤¦¤Ë»ØÄꤹ¤ë¡£ ¶ñÂÎŪ¤Ë¤Ï¡¢¼¡¤Î¤è¤¦¤Ê¤â¤Î¤¬¼è¤ê½Ð¤»¤ë¡£ transcript_fpkm = texpr(bg, 'FPKM') transcript_cov = texpr(bg, 'cov') whole_tx_table = texpr(bg, 'all') exon_mcov = eexpr(bg, 'mcov') junction_rcount = iexpr(bg) whole_intron_table = iexpr(bg, 'all') gene_expression = gexpr(bg) ¤¿¤È¤¨¤Ð¡¢transcript_fpkm¤Ï >transcript_fpkm FPKM.SRR453566 FPKM.SRR453567 1 0.251292 1.106585 2 0.000000 0.000000 3 0.000000 0.000000 4 0.071457 0.042556 5 1.218477 2.062575 6 0.000000 0.000000 7 0.000000 0.000000 8 0.000000 0.000000 9 1.410877 1.360651 10 9.473732 8.717489 11 18.461433 12.996510 12 197.994659 219.502182 13 95.170197 101.183212 14 35.876900 39.537365 15 23.731741 23.713844 ... [ reached getOption("max.print") -- 5945 ¹Ô¤ò̵»ë¤·¤Þ¤·¤¿ ] ¤È¤Ê¤ê¡¢gene_expression¤Ï > gene_expression FPKM.SRR453566 FPKM.SRR453567 gene_0001 0.251292 1.106585 gene_0002 0.000000 0.000000 gene_0003 0.000000 0.000000 gene_0004 0.071457 0.042556 gene_0005 1.218477 2.062575 gene_0006 0.000000 0.000000 gene_0007 0.000000 0.000000 gene_0008 0.000000 0.000000 gene_0009 1.410877 1.360651 gene_0010 9.473732 8.717489 gene_0011 18.461433 12.996510 gene_0012 197.994659 219.502182 gene_0013 95.170197 101.183212 gene_0014 35.876900 39.537365 gene_0015 23.731741 23.713844 ... [ reached getOption("max.print") -- 5945 ¹Ô¤ò̵»ë¤·¤Þ¤·¤¿ ] ¤Î¤è¤¦¤Ê·ë²Ì¤¬ÆÀ¤é¤ì¤ë¡£ index¥¹¥í¥Ã¥È¤Ï¡¢¤â¤¦¾¯¤·ÊÙ¶¯É¬Íס£indexes(bg)¤ËÂФ·¤Æ¡¢indexes(bg)$e2t, indexes(bg)$i2t, indexes(bg)$t2g¤Ê¤É¤¬²Äǽ¡£¡Ê¥Æ¡¼¥Ö¥ë¤ò¸«¤Æ¤¤¤ë¤À¤±¡©¡Ë > indexes(bg)$e2t e_id t_id 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 > indexes(bg)$i2t i_id t_id 1 1 41 2 2 66 3 3 68 4 4 71 5 5 86 6 6 104 7 7 117 8 8 124 9 9 129 10 10 154 11 11 155 12 12 163 13 13 173 14 14 188 15 15 189 > indexes(bg)$t2g t_id g_id 1 1 gene_0001 2 2 gene_0002 3 3 gene_0003 4 4 gene_0004 5 5 gene_0005 6 6 gene_0006 7 7 gene_0007 8 8 gene_0008 9 9 gene_0009 10 10 gene_0010 11 11 gene_0011 12 12 gene_0012 13 13 gene_0013 14 14 gene_0014 15 15 gene_0015 ¤¢¤È¡¢¥Þ¥Ë¥å¥¢¥ë¤Ë¤è¤ë¤È¡¢phenotype¾ðÊó¤ò¸«¤ë¥³¥ó¥Ý¡¼¥Í¥ó¥ÈpData¤¬¤¢¤ë¡£ pData¤Î¾ðÊó¤Ï¼ê¤Çºî¤ëɬÍפ¬¤¢¤ê¡¢¤¤¤í¤¤¤í¤È¡Êµ½Ò½çÈ֤Ȥ«¤Î¡ËÀßÄêÀ©¸Â¤¬¤¢¤ë¤é¤·¤¤¡£ ÉÁ²è¤Ë¤Ä¤¤¤Æ¤Ï¡¢¥Þ¥Ë¥å¥¢¥ë [[Plotting transcript structures:https://github.com/alyssafrazee/ballgown#plotting-transcript-structures]]¤Ë¤è¤ë¤È¡¢¼¡¤ÎÄ̤ꡣ plotTranscripts¤ò»È¤¦¤È¡¢ÉÁ²è¤µ¤ì¤ë¡£ plotTranscripts(gene='XLOC_000454', gown=bg, samples='sample12', meas='FPKM', colorby='transcript', main='transcripts from gene XLOC_000454: sample 12, FPKM')