Last updated: 2020-03-02 (Mon) 11:56:09

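RNAseq: follow-up analysis of the Kishimoto data

This page considers how to analyze the experimental data once TPM normalization has been completed.

Data

The TPM-normalized data (i.e. the normalized expression values) look like this: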

	Chr	Start	End	Strand	Length	10B.sam	10D_minus.sam	1p2-1.sam	1p2-2.sam	2-10B_minus.sam
gene_0001_thrL	AP012030.1	190	255	+	66	1119.230093	2070.357664	5212.719939	4193.551146	1955.183064
gene_0003_thrA	AP012030.1	338	2800	+	2463	1872.865147	4391.896215	5773.99795	6352.945352	3971.157253
gene_0005_thrB	AP012030.1	2802	3734	+	933	1925.903487	4443.130625	3779.116205	4639.666182	3926.988291
gene_0007_thrC	AP012030.1	3735	5021	+	1287	2288.887037	3590.528484	3897.465673	4230.197519	3276.631145
gene_0008_yaaX	AP012030.1	5235	5531	+	297	186.9824878	553.2073517	293.9131871	321.6221759	585.3916812
gene_0009_yaaA	AP012030.1	5684	6460	-	777	42.27207978	45.02772533	73.69652755	49.63567721	49.63092301
gene_0011_yaaJ	AP012030.1	6530	7960	-	1431	29.68188822	18.17449013	19.74707008	17.02170195	21.92416
gene_0014_talB	AP012030.1	8239	9192	+	954	3197.624287	397.0260641	1580.352153	1586.773069	443.3769857
gene_0016_mog	AP012030.1	9307	9894	+	588	218.27842	87.23291626	137.3536608	138.6934099	94.32622154
gene_0017_yaaH	AP012030.1	9929	10495	-	567	146.5658456	50.05540317	89.97128851	78.75914474	61.42555542

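Each row is one CDS, and the columns from 10B.sam onward hold the per-sample TPM values.

Note: the same gene name is sometimes assigned to several different CDSs. In other words, rows are unique by CDS position, not by gene name. These appear to be genes that move around within the genome, such as transposons (insA to insN and the like).

Extracting the CDSs whose gene name is duplicated gives: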

Columns (run together in each row below): gene ID, gene, Start, Length, Anc
gene_3114arpB17864754747.176203468
gene_3115arpB178695914161.441322222
gene_3782gatR21534954476.087732338
gene_3786gatR2155197339163.2194667
gene_2015icd117251512511782.60336
gene_2058icd119006616512.36916525
gene_6818ilvG39296659841955.874255
gene_6819ilvG3930728582942.1393394
gene_0495insA2902962760
gene_1774insA10401612766.572986365
gene_6204insA35613892760.821623296
gene_7778insA449829727637.7946716
gene_0035insB198125043.149555967
gene_0475insB2784035040
gene_0494insB2898745042.6996194
gene_1775insB104035550464.34092903
gene_7637insB44136465043.599492533
gene_7779insB449849129430.85279314
gene_7781insB449878521078.82888648
gene_0647insC3817304110
gene_1727insC10164934110
gene_2221insC12803544110
gene_5485insC31640494111.103494061
gene_7744insC44779974110
gene_0648insD3820809240
gene_1728insD10168619060
gene_2222insD12807229060
gene_2529insD14501039060
gene_3625insD20527559060
gene_5486insD31644179060
gene_7745insD44783479242.6996194
gene_0535insE3157063090
gene_0536insE3157153000
gene_2021insE11773203090
gene_3785insE21540383000
gene_0467insH2733269810
gene_0505insH2944579810
gene_0973insH5663999810
gene_1163insH6782579810
gene_1843insH108165110170
gene_2218insH12782209810
gene_2394insH137705910170.222977414
gene_2401insH138060910170.222977414
gene_2466insH140992898116.18120497
gene_3011insH17257509810
gene_3316insH18933789810
gene_3623insH2050108101728.31813152
gene_3690insH208569810170
gene_3792insH215819510175.351457925
gene_3983insH227405910170
gene_5382insH31081219810
gene_5806insH33436089810
gene_6318insH363008810170
gene_6713insH386872810170
gene_0463insI269828115216.53516882
gene_2530insI145154011521.181083487
gene_7761insI4487236115211.22029313
gene_0028insL1544011190
gene_1028insL59831911190
gene_4345insL249920611190
gene_0462insN2694674059.518658032
gene_7760insN448696726740.76728622
gene_0464insO271055426117.11025
gene_7764insO448872859720.89152701
gene_6038kefG34585125550
gene_6039kefG34585125520
gene_2164ldrB12478211350
gene_2167ldrB124835613568.87029047
gene_2169ldrB12488911350
gene_2465lomR140964015615.99005337
gene_2469lomR14109961711.326128828
gene_3836molR21814688253.573314406
gene_3838molR218240419388.658841171
gene_3840molR21843049604.251900555
gene_7426phnE43014373694.916380045
gene_7427phnE43016556215.842654547
gene_1997potA116185111370
gene_1998potA116185111190
gene_6273rhsB359709842363.051411163
gene_6498rhsB374686929413.11243709
gene_1713sulA10094025160
gene_1714sulA10094025100
gene_3689wbbL208519928299.71360166
gene_3691wbbL2086719474189.4517716
gene_1830ycdN10716841114.085910443
gene_1832ycdN107179472011.65335708
gene_2077ycgH1198254152132.05465244
gene_2078ycgH11998591017105.0223618
gene_2212ychG127307959144.50946097
gene_2213ychG127362123154.97406778
gene_2284yciX131614312954.49464277
gene_2285yciX1316253189278.3607559
gene_2528ydbA1447574255913.20376569
gene_2532ydbA14528723324120.0697148
gene_3511yedN19951511921.181083487
gene_3512yedN19953523210.70644246
gene_3570yedS20178544865.132609723
gene_3572yedS201834921016.1977164
gene_3574yedS201864240519.03731606
gene_3597yeeL20360793273.467401064
gene_3599yeeL20364277050.964970339
gene_4039yfaS231467948615.39782917
gene_4041yfaS2315180410411.65888261
gene_4169yfcC240196215210.298182813
gene_4171yfcC240200414790
gene_7747yjgX447981343258.26678538
gene_7748yjgX448020236030.23573728
gene_0530ykgM3127981413697.444681
gene_0531ykgM3129382672342.420321
gene_0940ylbE54878112600.179974627
gene_0941ylbE54903910020
gene_2630yncI151276874721.55358782
gene_2632yncI151355820115.79478813
gene_2579yncK148532516831.0456231
gene_2580yncK148554428837.7946716
gene_2084ypjA12023482135.323193183
gene_4790ypjA2756055450023.98701824

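These were left as they were at the start of the expression analysis (they were processed as separate entries distinguished by CDS position), but to treat the values as per-gene expression levels, some decision has to be made.

Looking at the contents (according to Prof. Kishimoto), there are cases where the same sequence has been copied, and cases where a single sequence appears to have been split by extraneous material inserted partway through. There are also cases where the sequence became shorter or longer in the course of moving. These cases are hard to tell apart, but the expression level for each is handled as follows.

With these rules, gene names were made unique (independent of CDS position).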

What we want to do:

For each gene, obtain its pattern of variation across samples, then compare genes with one another to find similar patterns (including complementary ones). If a distance between patterns can be defined, genes that are close can be grouped (clustered), and at the same time a graph built from the distances can be used to think about relationships between genes (for example, links through reactions).
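
One possible way to define such a distance so that complementary (mirror-image) patterns also count as close is to use 1 - |r|, where r is the Pearson correlation across samples. This is only a sketch under that assumption; the clustering code further down this page uses the plain Euclidean distance instead. The function name pattern_distance and the three example profiles are hypothetical.

```python
import numpy as np

def pattern_distance(a, b):
    """Return 1 - |Pearson r| for two expression profiles.

    0 means perfectly correlated or perfectly anti-correlated
    (complementary); values near 1 mean unrelated profiles.
    Assumption: neither profile is constant (r would be undefined).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    r = np.corrcoef(a, b)[0, 1]
    return 1.0 - abs(r)

# Hypothetical log-ratio profiles over five samples
up = [0.0, 0.2, 0.4, 0.6, 0.8]
down = [0.0, -0.1, -0.2, -0.3, -0.4]   # mirror image of `up`, scaled
other = [0.0, 0.5, -0.5, 0.5, -0.5]    # zig-zag, weakly related to `up`

d_up_down = pattern_distance(up, down)
d_up_other = pattern_distance(up, other)
print(d_up_down, d_up_other)
```

With this definition a gene that rises while another falls in step ends up at distance close to 0, which the Euclidean distance would not give.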

As the series whose variation is examined, two lineages are extracted from the data: Anc - ... - 1_2-1 - 2_5-1 and Anc - ... - 2_2-1 - 2_6-2.

Exceptional patterns

0 - ... - 0 - 0 - 0, that is, no expression at all, can occur. Such genes can probably be excluded from the analysis.

0 - a non-zero value somewhere in between - ...: these cases need not be treated as exceptions and can be regarded as ordinary patterns. When Anc is 0, it means that "something that was absent began to be expressed", which may deserve some extra thought, but for now it is treated as just another pattern.
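
The two rules above could be applied roughly as follows. This is a minimal sketch on made-up values; the frame tpm and the gene IDs gene_a to gene_c are hypothetical, and only the column names are taken from this page.

```python
import pandas as pd

# Hypothetical TPM values for three genes along one lineage
tpm = pd.DataFrame(
    {"Anc": [0.0, 0.0, 12.5],
     "1_2-1": [0.0, 3.1, 0.0],
     "2_5-1": [0.0, 4.0, 9.8]},
    index=["gene_a", "gene_b", "gene_c"],
)

# Rule 1: rows that are 0 in every sample are excluded
all_zero = (tpm == 0).all(axis=1)
kept = tpm[~all_zero]

# Rule 2: rows with only some zeros stay in; rows where Anc is 0
# are merely flagged so they can be looked at separately later
anc_zero = kept.index[kept["Anc"] == 0].to_list()

print(kept)
print("Anc is 0 for:", anc_zero)
```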

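Note that, as described below, if variation is measured as the ratio to the preceding value in the lineage, a 0 leads to division by zero, so some workaround such as substituting a tiny value is needed; and once a tiny value is substituted, taking log10 yields a large negative value, which is somewhat awkward to handle.

The matter of the same gene name appearing in multiple CDSs: still open (??).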

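Numerical measures of variation

① Normalization:
For the time being, the absolute expression level is assumed to carry no meaning. What we want is the pattern of increase and decrease across samples (the direction and size of each change, and how these changes chain together along the series of samples), so the values need to be normalized so that genes are comparable. One way is simply to take ratios within the same gene.
Two kinds of ratio are conceivable: the first is the ratio to a reference value (for example Anc, or the overall mean, or something like the mode); the second is the ratio between samples.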
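
The two kinds of ratio can be sketched as below, assuming that "ratio between samples" means the ratio to the preceding sample in the lineage. The single-row frame tpm and the gene ID gene_x are hypothetical; the sample names are a subset of those used on this page.

```python
import pandas as pd

samples = ["Anc", "43B", "45L", "1_2-1", "2_5-1"]
# Hypothetical TPM values for one gene, all non-zero
tpm = pd.DataFrame([[100.0, 200.0, 400.0, 200.0, 50.0]],
                   index=["gene_x"], columns=samples)

# (a) ratio to a reference value, here Anc
ratio_to_anc = tpm.div(tpm["Anc"], axis=0)

# (b) ratio to the preceding sample in the lineage
ratio_to_prev = pd.DataFrame(
    tpm.iloc[:, 1:].to_numpy() / tpm.iloc[:, :-1].to_numpy(),
    index=tpm.index, columns=samples[1:],
)

print(ratio_to_anc)
print(ratio_to_prev)
```

Ratio (a) keeps every sample on a common scale relative to the ancestor, while ratio (b) shows each step of the lineage on its own.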

② Compressing the values with log (log10)
This could be introduced to deal with the fact that absolute expression levels differ by orders of magnitude from gene to gene, but it matters little if ratios are used as in ①. Note also that a ratio becomes a difference after the log transform.
The caveat with the log transform is that log cannot be taken when the expression value is 0; in that case the value can be replaced by a tiny value (although the resulting log is then a large negative number). Genes that are 0 along the entire lineage can be dropped from the analysis, but genes where only some values are 0 may be meaningful and are not dropped.
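
Both points can be checked numerically: the log of a ratio equals the difference of the logs, and a 0 replaced by a tiny value turns into a large negative number. In this sketch the floor EPS = 1e-6 matches the constant used in dfnormalize on this page; the values of anc and later are made up.

```python
import numpy as np

EPS = 1e-6               # floor substituted for 0 before taking the log
anc, later = 50.0, 0.0   # hypothetical TPM values: expressed in Anc, absent later

floored = max(later, EPS)
log_ratio = np.log10(floored / anc)            # log of the ratio
log_diff = np.log10(floored) - np.log10(anc)   # difference of the logs

print(log_ratio, log_diff)
```

Here the result is below -7, far outside the range of about -0.4 to 0.8 seen for genes with ordinary changes in the table on this page, so a handful of such genes can dominate a Euclidean distance.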

A detour along the way

For now, here is the table of log10(tpm/Anc).

	gene	Start	Length	Anc	43B	45a_minus	45A_minus	45L	1_2-1	2_5-1
gene_0001	thrL	190	66	0	0.518801371	0.120822012	0.130553092	0.662278199	0.736981706	0.782980429
gene_0003	thrA	338	2463	0	0.363545812	0.217293869	0.207090028	0.462360214	0.457629592	0.265097062
gene_0005	thrB	2802	933	0	0.375643354	0.338447083	0.313037592	0.533665319	0.35669746	0.17628472
gene_0007	thrC	3735	1287	0	0.351946181	0.178209151	0.160556929	0.449396654	0.309469379	0.092155761
gene_0008	yaaX	5235	297	0	0.361804821	0.497406189	0.565856219	0.483361196	0.254979922	0.141397537
gene_0009	yaaA	5684	777	0	0.127790066	-0.16041128	-0.046404433	0.015105221	0.04435137	-0.083863423
gene_0011	yaaJ	6530	1431	0	-0.061913911	0.120802403	0.215477533	0.132366925	0.108788741	-0.028453563
gene_0014	talB	8239	954	0	0.025617922	-0.296533334	-0.408331467	0.077236367	0.241795445	0.203333867
gene_0016	mog	9307	588	0	-0.18900972	-0.26035833	-0.255735167	-0.114590533	-0.140329364	-0.159416034
gene_0017	yaaH	9929	567	0	-0.151212522	-0.346290801	-0.274411814	-0.177810069	-0.161112427	-0.103483574

Result of clustering the data above (only 100 genes, because the distance computation takes a very long time when run on all of them):

CompareTPM.pdf

Clustering the whole data set

%matplotlib inline
#
import pandas as pd
import numpy as np
from scipy.spatial import distance
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt
import math
import os
import pickle

def dfnormalize(row):  # all zeros if Anc is 0, otherwise log10(u/Anc)
    Anc = row['Anc']
    rest = row.to_list()[3:]
    #print('rest\n', rest)
    if Anc==0:
        result = [0] * len(rest)
    else:
        result = [math.log10(u/Anc) if u>0.000001 else math.log10(0.000001/Anc) for u in rest] 
    #print('result\n', result)
    output = pd.Series(result, index=(row.index.to_list()[3:]))
    output['gene'] = row['gene']
    output['Anc'] = 0  # the reference column is set to 0 in both branches (log10(Anc/Anc) = 0)
    output['Start'] = row['Start']
    output['Length'] = row['Length']
    #print('output\n', output)
    return(output)

#def myeuc(u):   # function that computes the Euclidean distance; prepared for use with map
#    #print('u\n', u, '\ndfl.loc[target]\n', dfl.loc[target])
#    result = distance.euclidean(u, dfl[['Anc', '43B', '45a_minus', '45A_minus', '45L', '1_2-1', '2_5-1']].loc[target])
#    #print('result:', result)
#    return(result)

picklefname = 'DistanceTest.pickle'
slist = ['Anc', '43B', '45a_minus', '45A_minus', '45L', '1_2-1', '2_5-1']
    
if not os.path.exists(picklefname):
    fname = 'count_tpm.tsv'
    df = pd.read_csv(fname, sep='\t', index_col=0)
    #print(df.columns.to_list())
    df = df.rename(columns=
    {'10B.sam': '45a_2-10Bplus', 
     '10D_minus.sam': '45a_10D_minus', 
     '1p2-1.sam': '1_2-1', 
     '1p2-2.sam': '1_2-2', 
     '2-10B_minus.sam': '45a_2-10B_minus', 
     '2p5-1.sam': '2_5-1', 
     '2p6-1.sam': '2_6-1', 
     '43B.sam': '43B', 
     '45A_minus.sam': '45A_minus', 
     '45A_plus.sam': '45A_plus', 
     '45L.sam': '45L', 
     '45a10D_plus.sam': '45a_10D_plus', 
     '45aIII6c_plus.sam': '45a_III6c_plus', 
     '45a_minus.sam': '45a_minus', 
     '45a_plus.sam': '45a_plus', 
     '45alll6c_minus.sam': '45a_III6c_minus', 
     '45b_minus.sam': '45b_minus', 
     '45b_plus.sam': '45b_plus', 
     '45c_minus.sam': '45c_minus', 
     '45c_plus.sam': '45c_plus', 
     '45d7B_minus.sam': '45d_7B_minus', 
     '45d7B_plus.sam': '45d_7B_plus', 
     'Anc.sam': 'Anc', 
     'PwOw_minus.sam': 'PwOw_minus', 
     'PwOw_plus.sam': 'PwOw_plus'           
    })
    df['gene'] = [u[10:] for u in df.index.to_list()]
    df.index = [u[:9] for u in df.index.to_list()]
     
    dfdup = df[df.duplicated(subset='gene', keep=False)]\
    [['gene', 'Start', 'Length', 'Anc', '43B', '45A_minus', '45L', \
          '1_2-1', '2_5-1']].sort_values(['gene', 'Start'])
    dfdup.to_excel('DuplicatedCDS.xlsx')
    print(dfdup)
    
    df1 = df.copy()[['gene', 'Start', 'Length', \
          'Anc', '43B', '45a_minus', '45b_minus', '45c_minus', '45A_minus', '45L', \
          '1_2-1', '2_5-1']]

    df1 = df1[df1['Anc']!=0]   # drop rows where Anc is 0 (because we divide by Anc)
    df1x = df1[:].apply(dfnormalize, axis=1)
    df1x = df1x[['gene', 'Start', 'Length', \
             'Anc', '43B', '45a_minus', '45A_minus', '45L', '1_2-1', '2_5-1']]
    df1x.to_excel('CompareTPM.xlsx')

############
# Line Graphs
############
    #df1g = df1x[['Anc', '43B', '45a_minus', '45A_minus', '45L', '1_2-1', '2_5-1']]
    #min = 0.4  # plot only data points whose absolute value is 0.4 or more
    #df1g = df1g[(abs(df1g['45a_minus'])>min) & (abs(df1g['45A_minus'])>min) &\
    #            (abs(df1g['45L'])>min) & (abs(df1g['1_2-1'])>min) & (abs(df1g['2_5-1'])>min) ]
    #
    #df1g.T.plot()
    #plt.show()

    df2 = df.copy()[slist]
    # Rows that are 0 in every sample must be removed. Do it here, before the
    # offset is added: afterwards log10(0 + 1e-8) = -8, so a test for 0 never matches.
    df2 = df2[(df2 != 0).any(axis=1)]
    # df2 = df2 + 1  # to avoid 0 values => should not be done; a tiny positive number is better.
    df2 = df2 + 0.00000001
    dfl = np.log10(df2)   # take log10 first
    print('dfl\n', dfl.head()); print()
    dfl.to_pickle(picklefname)
else:
    dfl = pd.read_pickle(picklefname)

pickle2fname = 'CompareTPM2.pickle'
if not os.path.exists(pickle2fname):
    #dfl = dfl.head(100)

    # compute the pairwise distance between target and every other gene
    genenamelist = dfl.T.columns.to_list()
    print(genenamelist)
    #for target in genenamelist[:50]:
    for target in genenamelist:
        #print('dfl[slist].loc[target]\n', dfl[slist].loc[target])
        dfl['D_'+target] = dfl[slist].apply(lambda x: \
            distance.euclidean(x, dfl[slist].loc[target]), axis=1)

        print('target:', target)
        #dfx = dfl.sort_values('d', ascending=True)[:10]
        #print('dfx\n', dfx)
        #dfg = dfx.drop('d', axis=1).T
        #dfg.plot()
        #plt.ylabel('$log_{10}(TPM)$')
        #plt.title(target)
        #plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
        #plt.show()
       
    print(dfl.drop(columns=slist).head(), '\n')
    dfl.to_pickle(pickle2fname)
else:
    dfl = pd.read_pickle(pickle2fname)

pickleLinkagefname = 'CompareTPMLinkage.pickle'
if not os.path.exists(pickleLinkagefname):
    dArray = distance.squareform(dfl.drop(columns=slist))
    result = linkage(dArray, method = 'average')
    node_labels = [u[2:] for u in dfl.drop(columns=slist).columns.to_list()]
    
    with open(pickleLinkagefname, 'wb') as pfw:
        pickle.dump([result, node_labels], pfw)
else:
    with open(pickleLinkagefname, 'rb') as pf:
        result, node_labels = pickle.load(pf)

plt.figure(figsize=(100,100), dpi=200, facecolor='w', edgecolor='k')
dendrogram(result, labels=node_labels)
plt.savefig('CompareTPM.pdf')
plt.show()
print('complete')
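
The loop above computes every pairwise distance through a Python-level apply, which is probably why the full run takes so long. As a possible alternative, the sketch below uses scipy's pdist, which returns the condensed distance vector that linkage accepts directly. The random frame dfl_demo is only a stand-in for dfl and is not real data.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage

slist = ['Anc', '43B', '45a_minus', '45A_minus', '45L', '1_2-1', '2_5-1']

# Stand-in for dfl: 20 hypothetical genes with random log10(TPM) values
rng = np.random.default_rng(0)
dfl_demo = pd.DataFrame(
    rng.normal(size=(20, len(slist))),
    columns=slist,
    index=[f"gene_{i:04d}" for i in range(20)],
)

# All pairwise Euclidean distances in one call (condensed form)
condensed = pdist(dfl_demo[slist].to_numpy(), metric='euclidean')

# Same linkage method as in the code above
Z = linkage(condensed, method='average')
node_labels = dfl_demo.index.to_list()

# The square matrix is only needed if individual distances are inspected
D = squareform(condensed)
print(Z.shape, D.shape)
```

Because the condensed vector is passed straight to linkage, the D_ columns and the squareform call on the DataFrame are not needed in this variant.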

Next, read the resulting CompareTPMLinkage.pickle and output the clusters with fcluster.

%matplotlib inline
# only the last part (everything after the linkage computation)
import pandas as pd
import numpy as np
from scipy.spatial import distance
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
import matplotlib.pyplot as plt
import os
import pickle

# dictionary mapping gene_number to gene_name
fname = 'count_tpm.tsv'
df = pd.read_csv(fname, sep='\t', index_col=0)
gene_name_dict = {u[:9]: u[10:] for u in df.index.to_list()}
#print(gene_name_dict)

pickleLinkagefname = 'CompareTPMLinkage.pickle'
with open(pickleLinkagefname, 'rb') as pf:
    result, node_labels = pickle.load(pf)
# print(result[:10])
NUM_CLUSTERS = 10
for num in range(10, NUM_CLUSTERS+1):
    labels = fcluster(result, t=num, criterion='maxclust')
    # fcluster returns the cluster each input belongs to (cluster numbers, labels)
    #print(num, labels)
    # for each cluster, collect the inputs that belong to it as a list
    clusters = []
    for cl_id in range(1, num+1):
        l = [gene_name_dict[ node_labels[n] ] for n in range(0,len(labels)) if labels[n]==cl_id]
        #print(' ', cl_id, l)
        clusters.append([cl_id, l])
    with open('clusters_'+str(num)+'.pickle', 'wb') as pwf:
        pickle.dump(clusters, pwf)
print('complete')
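
To see what fcluster with criterion='maxclust' returns, here is a small self-contained sketch on made-up 2-D points. The points and the names g1 to g5 are hypothetical; the grouping step mirrors the loop above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Five hypothetical points: two tight pairs and one isolated point
points = np.array([[0.0, 0.0], [0.1, 0.0],
                   [5.0, 5.0], [5.1, 5.0],
                   [10.0, 0.0]])
names = ["g1", "g2", "g3", "g4", "g5"]

Z = linkage(points, method='average')

# Ask for at most 3 flat clusters; labels[i] is the cluster number of input i
labels = fcluster(Z, t=3, criterion='maxclust')

# Collect the members of each cluster
groups = {}
for name, lab in zip(names, labels):
    groups.setdefault(int(lab), []).append(name)
print(groups)
```

Note that maxclust gives an upper bound: fcluster may return fewer than t clusters when the tree cannot be cut into exactly t groups.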

For each cluster, plot the expression changes of the genes that belong to it.

%matplotlib inline
# draw graphs of the genes that belong to each cluster
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pickle

fname = 'count_tpm.tsv'
df = pd.read_csv(fname, sep='\t', index_col=0)
gene_name_dict = {u[:9]: u[10:] for u in df.index.to_list()}
#print(gene_name_dict)

NUM = 10
with open('clusters_'+str(NUM)+'.pickle', 'rb') as pf:
    clusters = pickle.load(pf)

picklefname = 'DistanceTest.pickle'
dfl = pd.read_pickle(picklefname)
#print(dfl.index.to_list())
for ucl_id, l in clusters:
    print(l)
    ############
    # Line Graphs
    dfl['gene'] = [gene_name_dict[u] for u in dfl.index]
    #print(l, dfl[dfl['gene'].isin(l)])
    dfg = dfl[dfl['gene'].isin(l)]
    dfg = dfg[['Anc', '43B', '45a_minus', '45A_minus', '45L', '1_2-1', '2_5-1']]
    dfg = dfg.iloc[0:10, :]
    dfg.T.plot()
    plt.show()

The results:

cluster_1.png cluster_2.png cluster_3.png cluster_4.png cluster_5.png
cluster_6.png cluster_7.png cluster_8.png cluster_9.png cluster_10.png

Extra

What do the expression patterns of the essential genes look like?

