Python¥Ð¥¤¥ª?¡¡Python¥Ð¥¤¥ª/¥Ä¡¼¥ë?
3197¡¡¡¡¡¡2019-06-29 (ÅÚ) 09:30:16
cufflinks¤Ç¥¨¥é¡¼¤¬½Ð¤ë¡£HISAT2-StringTie-Cuffdiff error: BAM record error: found spliced alignment without XS attribute¡ÊÀµ¤·¤¯¤Ï¡©¡Ë
Âкö¤Ï¡¢HISAT2¤Ç¥Ñ¥é¥á¡¼¥¿--dta-cufflinks¤ò»ØÄꤹ¤ë¤³¤È
cuffdiff¤ò»È¤Ã¤Æ¤ß¤ë 2019-06-29 ºÆÅÙ¤ä¤êľ¤¹É¬Íפ¢¤ê¤½¤¦¡£¤Þ¤ºcufflinks¤Ç½¸·×¤¬É¬Íפ«¤â¡£Í×¥Á¥§¥Ã¥¯¡£
¤¢¤é¤«¤¸¤áÍѰդ·¤¿sortºÑ¤ß¤Îbam¥Õ¥¡¥¤¥ëSRR4535xx.tagged.sorted.bam ¤È¥Þ¥Ã¥×s288c_3.gff¤ËÂФ·¤Æ
$cuffdiff -p 32 -o ./results1 ../s288c_e.gff -L batch,chemo \ SRR453566.tagged.sorted.bam,SRR453567.tagged.sorted.bam,SRR453568.tagged.sorted.bam \ SRR453569.tagged.sorted.bam,SRR453570.tagged.sorted.bam,SRR453571.tagged.sorted.bam
¥é¥Ù¥ëÉÕ¤±-L¤ÎÅÔ¹ç¤Ç¡¢¥é¥Ù¥ë£²¼ïÎàbatch,chemo¤ËÂФ·¤Æ¡¢ÆþÎÏ¥Õ¥¡¥¤¥ëSRR453566,67,68¤òbatch¤Ë¡¢SRR453569,70,71¤òchemo¤Ë³ä¤êÅö¤Æ¤¿¡Ê¤Ï¤º¡Ë¡£¥³¥ó¥Þ¤ÎÁ°¸å¤Ï¶õÇò¤Ê¤·¡Ê¤¢¤ë¤È¥³¥Þ¥ó¥É²ò¼á¤Ç°ú¿ô¤ÎÀÚ¤ìÌܤȻפäƤ·¤Þ¤¦¤Î¤Ç¡Ë¡¡¤³¤Î-L¤ÈÆþÎÏ¥Õ¥¡¥¤¥ë¤Î¶èÀÚ¤ê¤Ë¤è¤Ã¤Æ¡¢¸å¤Î²òÀϥѥ¿¡¼¥ó¡Ê¥½¡¼¥¹¤ÎʬÎà¡Ë¤¬·è¤Þ¤ë¤Î¤ÇÍ×Ãí°Õ¡£
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu). [10:38:24] Loading reference annotation. [10:38:25] Inspecting maps and determining fragment length distributions. [10:40:50] Modeling fragment count overdispersion. [10:42:08] Modeling fragment count overdispersion. > Map Properties: > Normalized Map Mass: 3618635.35 > Raw Map Mass: 3746966.57 > Fragment Length Distribution: Empirical (learned) > Estimated Mean: 209.34 > Estimated Std Dev: 99.53 > Map Properties: > Normalized Map Mass: 3618635.35 > Raw Map Mass: 5136465.50 > Fragment Length Distribution: Empirical (learned) > Estimated Mean: 213.53 > Estimated Std Dev: 96.07 > Map Properties: > Normalized Map Mass: 3618635.35 > Raw Map Mass: 3780062.80 > Fragment Length Distribution: Empirical (learned) > Estimated Mean: 206.94 > Estimated Std Dev: 94.02 > Map Properties: > Normalized Map Mass: 3618635.35 > Raw Map Mass: 2502112.93 > Fragment Length Distribution: Empirical (learned) > Estimated Mean: 202.52 > Estimated Std Dev: 96.05 > Map Properties: > Normalized Map Mass: 3618635.35 > Raw Map Mass: 3127907.67 > Fragment Length Distribution: Empirical (learned) > Estimated Mean: 211.12 > Estimated Std Dev: 94.18 > Map Properties: > Normalized Map Mass: 3618635.35 > Raw Map Mass: 3898613.50 > Fragment Length Distribution: Empirical (learned) > Estimated Mean: 213.42 > Estimated Std Dev: 98.72 [10:43:25] Calculating preliminary abundance estimates [10:43:25] Testing for differential expression and regulation in locus. > Processed 6278 loci. [*************************] 100% Performed 5732 isoform-level transcription difference tests Performed 0 tss-level transcription difference tests Performed 5732 gene-level transcription difference tests Performed 0 CDS-level transcription difference tests Performed 0 splicing tests Performed 0 promoter preference tests Performing 0 relative CDS output tests Writing isoform-level FPKM tracking Writing TSS group-level FPKM tracking Writing gene-level FPKM tracking Writing CDS-level FPKM tracking Writing isoform-level count tracking Writing TSS group-level count tracking Writing gene-level count tracking Writing CDS-level count tracking Writing isoform-level read group tracking Writing TSS group-level read group tracking Writing gene-level read group tracking Writing CDS-level read group tracking Writing read group info Writing run info
$ cd results1 $ ls bias_params.info cds_exp.diff genes.read_group_tracking promoters.diff tss_groups.count_tracking cds.count_tracking cuffData.db isoform_exp.diff read_groups.info tss_groups.fpkm_tracking cds.diff gene_exp.diff isoforms.count_tracking run.info tss_groups.read_group_tracking cds.fpkm_tracking genes.count_tracking isoforms.fpkm_tracking splicing.diff var_model.info cds.read_group_tracking genes.fpkm_tracking isoforms.read_group_tracking tss_group_exp.diff
¤³¤ÎÎã¤Ç¤Ï¡ÊCDS¤¬½ñ¤¤¤Æ¤¤¤Ê¤¤¤Î¤Ç¡©¡Ëgene_exp.diff¤Èisoforms_exp.diff¤¬·ë²Ì¤Ë¤Ê¤Ã¤Æ¤¤¤ë¡£
gene_exp.diff¤ò¸«¤ë¤È
isoforms_exp.diff¤ò¸«¤ë¤È
¸µÏÀʸ
Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks (Nat Protoc. ; 7(3): 562–578)
¤Ë¤è¤ë¤È¡¢¤³¤Î·ë²Ì¤ÏcummeRbund¤Ë¤è¤Ã¤Æ²Ä»ë²½¤Ç¤¤ë¡£
¼ê½ç¤Ï¤¿¤È¤¨¤Ð
cummeRbund¤ÎÆþÎϤÏcuffDiff¤Î·ë²Ì¤Ç¡¢cummeRband¤Ç¤Î²òÀϤΤ¿¤á¤ËcuffDiff¤Ç¥é¥Ù¥ëÉÕ¤±¤ò¤·¤Æ¤¢¤ë¤³¤È¡£¤½¤Î¥é¥Ù¥ë¤ò¸µ¤Ë¡¢Èæ³Ó¤ò¼è¤Ã¤¿¤ê¤¹¤ë¡£
¥Ç¡¼¥¿¥ë¡¼¥È SRR453566.tagged.sorted.bam SRR453567.tagged.sorted.bam ... SRR453571.tagged.sorted.bam results1 bias_params.info cds_exp.diff genes.read_group_tracking promoters.diff tss_groups.count_tracking cds.count_tracking isoform_exp.diff read_groups.info tss_groups.fpkm_tracking cds.diff gene_exp.diff isoforms.count_tracking run.info tss_groups.read_group_tracking cds.fpkm_tracking genes.count_tracking isoforms.fpkm_tracking splicing.diff var_model.info cds.read_group_tracking genes.fpkm_tracking isoforms.read_group_tracking tss_group_exp.diff ¢¬ ¤³¤Î¥ì¥Ù¥ë¤òR¤Î¥ï¡¼¥¥ó¥°¥Ç¥£¥ì¥¯¥È¥ê¤Ë¤·¤Æ¤ª¤¯¡£
R¤òµ¯Æ°¡£
> library(cummeRbund) ¥×¥í¥°¥é¥àcummRbund¤ò¥í¡¼¥É > cuff<-readCufflinks("results1") ¥Ç¥£¥ì¥¯¥È¥ê results1 ¤«¤é¥Ç¡¼¥¿ÆÉ¹þ¤ó¤Ç¥Ç¡¼¥¿¥Ù¡¼¥¹¤Ë½ñ¤¹þ¤à (Ãí) > cuff <- readCufflinks('results1', gtfFile='../s288c_e.gff')¤Ïgenome=¤¬Ìµ¤¤¤È¥À¥á¡£genome=¤Ë½ñ¤±¤ëÃͤ¬ÉÔÌÀ A character string indicating to which genome build the .gtf annotations belong (e.g. ¡Çhg19¡Ç or ¡Çmm9¡Ç) > cuff CuffSet instance with: 2 samples 6445 genes 6445 isoforms 0 TSS 0 CDS 0 promoters 0 splicing 0 relCDS > disp<-dispersionPlot(genes(cuff)) > disp > dens<-csDensity(genes(cuff)) > dens > b<-csBoxplot(genes(cuff)) > b > sm<-csScatterMatrix(genes(cuff)) > sm > s<-csScatter(genes(cuff),"chemo","batch",smooth=T) chemo¤Èbatch¤Ç¤Î»¶ÉÛ¿Þ > s > dend<-csDendro(genes(cuff)) > dend > m<-MAplot(genes(cuff),"chemo","batch") > m > v<-csVolcanoMatrix(genes(cuff)) > v > mySigGeneIds<-getSig(cuff,alpha=0.05,level='genes') > lengthh(mySigGeneIds) > sigGenes<-getGenes(cuff,sigGeneIds) sigGenes CuffGeneSet instance for 3900 genes Slots: annotation fpkm repFpkm diff count isoforms CuffFeatureSet instance of size 3900 TSS CuffFeatureSet instance of size 0 CDS CuffFeatureSet instance of size 0 promoters CuffFeatureSet instance of size 3900 splicing CuffFeatureSet instance of size 0 relCDS CuffFeatureSet instance of size 3900 > head(annotation(genes(cuff))) gene_id class_code nearest_ref_id gene_short_name 1 gene_0001 <NA> <NA> PAU8 2 gene_0002 <NA> <NA> YAL067W-A 3 gene_0003 <NA> <NA> SEO1 4 gene_0004 <NA> <NA> YAL065C 5 gene_0005 <NA> <NA> YAL064W-B 6 gene_0006 <NA> <NA> TDA8 locus length coverage feature_id isoform_id seqnames 1 NC_001133.9:1806-2169 NA NA NA <NA> <NA> 2 NC_001133.9:2479-2707 NA NA NA <NA> <NA> 3 NC_001133.9:7234-9016 NA NA NA <NA> <NA> 4 NC_001133.9:11564-11951 NA NA NA <NA> <NA> 5 NC_001133.9:12045-12426 NA NA NA <NA> <NA> 6 NC_001133.9:13362-13743 NA NA NA <NA> <NA> source type start end score strand frame 1 <NA> NA NA NA NA <NA> <NA> 2 <NA> NA NA NA NA <NA> <NA> 3 <NA> NA NA NA NA <NA> <NA> 4 <NA> NA NA NA NA <NA> <NA> 5 <NA> NA NA NA NA <NA> <NA> 6 <NA> NA NA NA NA <NA> <NA> > head(fpkm(genes(cuff))) > head(count(genes(cuff))) > head(fpkm(isoforms(cuff))) > head(diffData(genes(cuff))) > head(fpkmMatrix(genes(cuff)))
¶½Ì£¤¢¤ëgene(½¸¹ç)¤ò»ØÄꤷ¤ÆÊ¬ÀϤ¹¤ë
> myGeneIds <- c("gene_0010", "gene_0010", "gene_0011", "gene_0012", "gene_0013", "gene_0014", "gene_0015", "gene_0016", "gene_0017", "gene_0018", "gene_0019") ¶½Ì£¤Î¤¢¤ëgeneid¤ò¥ê¥¹¥È¤·¤¿¤â¤Î¡£¤¢¤é¤«¤¸¤áºî¤Ã¤Æ¤¢¤ë¤È¤¹¤ë¡£¤³¤³¤Ï²¾¤ÎÎã¡£ > myGenes <- getGenes(cuff,myGeneIds) > myGenes CuffGeneSet instance for 10 genes Slots: annotation fpkm repFpkm diff count isoforms CuffFeatureSet instance of size 10 TSS CuffFeatureSet instance of size 0 CDS CuffFeatureSet instance of size 0 promoters CuffFeatureSet instance of size 10 splicing CuffFeatureSet instance of size 0 relCDS CuffFeatureSet instance of size 10 > myGeneset.pluri<-getGenes(cuff,myGeneIds,sampleIdList=c("batch")) > h<-csHeatmap(myGenes,cluster='both') > h > b<-expressionBarplot(myGenes) > b > s<-csScatter(myGenes,"chemo","batch",smooth=T) > s > v<-csVolcano(myGenes,"chemo","batch") > v > ih<-csHeatmap(isoforms(myGenes),cluster='both',labRow=F) > ih > th<-csHeatmap(TSS(myGenes),cluster='both',labRow=F) > th > den<-csDendro(myGenes) > den
£±¤Ä¤À¤±¤Ë¤¹¤ë
> myGeneId <- c("gene_0010") 1¤Ä¤À¤±¤Ë¤¹¤ë > myGene<-getGene(cuff,myGeneId) > head(fpkm(myGene)) Ʊ¤¸¤è¤¦¤Ë»È¤¨¤ë > gl<-expressionPlot(myGene) > gl > gl.iso.rep<-expressionPlot(isoforms(myGene),replicates=T) > gb<-expressionBarplot(myGene) > gb > igb<-expressionBarplot(isoforms(myGene),replicates=T) > gp<-csPie(myGene,level="isoforms") > gp
> mySigMat<-sigMatrix(cuff,level='genes',alpha=0.05) > mySigMat > mySigGeneIds<-getSig(cuff,alpha=0.05,level='genes') > head(mySigGeneIds) > chemo_vs_batch.sigIsoformIds<-getSig(cuff,x='chemo',y='batch',alpha=0.05,level='isoforms')
> myDistHeat<-csDistHeat(genes(cuff)) > myDistHeat > myRepDistHeat<-csDistHeat(genes(cuff),replicates=T) > myRepDistHeat > genes.PCA<-PCAplot(genes(cuff),"PC1","PC2") > genes.PCA > genes.MDS<-MDSplot(genes(cuff)) cmdscale(d, eig = TRUE, k = 2) ¤Ç¥¨¥é¡¼: 'k' must be in {1, 2, .. n - 1} MDS(multi-dimensional scaling)¤Ï¼¡¸µ¿ô3°Ê¾å
¥¯¥é¥¹¥¿²½
> ic<-csCluster(myGenes,k=4) > ic > ic$cluster gene_0010 gene_0011 gene_0013 gene_0014 gene_0015 gene_0016 gene_0017 1 1 2 2 3 3 4 gene_0018 gene_0019 3 3 > icp<-csClusterPlot(ic) > icp
Îà»÷gene¤Î¸¡½Ð
> mySimilar<-findSimilar(cuff,myGeneId,n=20) > mySimilar.expression<-expressionPlot(mySimilar,logMode=T,showErrorbars=F) > mySimilar.expression > myProfile<-c(500,0,400) > mySimilar2<-findSimilar(cuff,myProfile,n=10) > mySimilar2.expression<-expressionPlot(mySimilar2,logMode=T,showErrorbars=F) > mySimilar2.expression
> head(annotation(genes(cuff))) gene_id class_code nearest_ref_id gene_short_name 1 gene_0001 <NA> <NA> PAU8 2 gene_0002 <NA> <NA> YAL067W-A 3 gene_0003 <NA> <NA> SEO1 4 gene_0004 <NA> <NA> YAL065C 5 gene_0005 <NA> <NA> YAL064W-B 6 gene_0006 <NA> <NA> TDA8 locus length coverage feature_id isoform_id seqnames 1 NC_001133.9:1806-2169 NA NA NA <NA> <NA> 2 NC_001133.9:2479-2707 NA NA NA <NA> <NA> 3 NC_001133.9:7234-9016 NA NA NA <NA> <NA> 4 NC_001133.9:11564-11951 NA NA NA <NA> <NA> 5 NC_001133.9:12045-12426 NA NA NA <NA> <NA> 6 NC_001133.9:13362-13743 NA NA NA <NA> <NA> source type start end score strand frame 1 <NA> NA NA NA NA <NA> <NA> 2 <NA> NA NA NA NA <NA> <NA> 3 <NA> NA NA NA NA <NA> <NA> 4 <NA> NA NA NA NA <NA> <NA> 5 <NA> NA NA NA NA <NA> <NA> 6 <NA> NA NA NA NA <NA> <NA>
¡Á¡Á¡Á¡Á¡Á¡Á¡Á¡Á¡Á
¤½¤ÎÁ°¤Ë
»²¾È