Blast¤Î¥À¥¦¥ó¥í¡¼¥É¡¡Download BLAST Software and Databases Documentation
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.9.0+-x64-linux.tar.gz tar -xvf ncbi-blast-2.9.0+-x64-linux.tar.gz
¥¤¥ó¥¹¥È¡¼¥ë¡¡Installation - BLAST; Command Line Applications User Manual - NCBI Bookshelf
¤³¤³¤Ç¤Ï¡¡/usr/local/bin/ncbi-blast-2.9.0+/¡¡¤ËŸ³«¤µ¤ì¤Æ¤¤¤ë¤Î¤Ç¡¢
ln -s /usr/local/ncbi-blast-2.9.0+ blast
¤Ç /usr/local/blast ¤òºî¤Ã¤Æ¤ª¤¡¢PATH¤Ë /usr/local/blast/bin ¤òÄɲá£
»È¤¤Êý¤Î¥Þ¥Ë¥å¥¢¥ë¤Ï¡¡BLAST® Command Line Applications User Manual - NCBI Bookshelf ¤Ë¤¢¤ë¡£
BLAST configuration file¤òºî¤ë¤Î¤¬Îɤ¤¤é¤·¤¤¡£
FASTA¤«¤éºî¤ì¤ë¤é¤·¤¤¡£Building a BLAST database with local sequences - BLAST® Command Line Applications User Manual - NCBI Bookshelf ¥µ¥ó¥×¥ë¤Ç»î¤·¤Æ¤ß¤ë¡£
>seq1 MSFSTKPLDMATWPDFAALVERHNGVWGGCWCMAFHAKGSGAVGNREAKEARVREGSTHAALVFDGSACVGWCQFGPTGE LPRIKHLRAYEDGQAVLPDWRITCFFSDKAFRGKGVAAAALAGALAEIGRLGGGTVESYPEDAQGRTVAGAFLHNGTLAM >seq2 MKAIDLKAEEKKRLIEGIQDFFYEERNEEIGIIAAEKALDFFLSGVGKLIYNKALDESKIWFSRRLEDISLDYELLYK >seq3 MTLAAAAQSATWTFIDGDWYEGNVAILGPRSHAMWLGTSVFDGARWFEGVAPDLELHAARVNASAIALGLAPNMTPEQIV GLTWDGLKKFDGKTAVYIRPMYWAEHGGYMGVPADPASTRFCLCLYESPMISPTGFSVTVSPFRRPTIETMPTNAKAGCL YPNNGRAILEAKARGFDNALVLDMLGNVAETGSSNIFLVKDGHVLTPAPNGTFLSGITRSRTMTLLGDYGFRTTEKTLSV RDFLEADEIFSTGNHSKVVPITRIEGRDLQPGPVAKKARELYWDWAHSASVG >seq4 MRSFFHHVAAADPASFGVAQRVLTIPIKRAHIEVTHHLTKAEVDALIAAPNPRTSRGRRDRTFLLFLARTGARVSEATGV NANDLQLERSHPQVLLRGKGRRDRVIPIPQDLARALTALLAEHGIANHEPRPIFIGARQERLTRFGATHIVRRAAAQAVT IKPALAHKPISPHIFRHSLAMKLLQSGVDLLTIQAWLGHAQVATTHRYAAADVEMMRKGLEKAGVSGDLGLRFRPNDAVL QLLTSI >seq5 MTISRVCGSRTEAMLTNGQEIAMTSILKSTGAVALLLLYTLTANATSLMISPSSIERVAPDRAAVFHLRNQMDRPISIKV RVFRWSQKGGVEKLEPTGDVVASPISAQLSPNGNRAVRVVRVSKEPLRSEEGYRVVIDEADPTRNTPEAESLSARHVLPV LFRPPDVLGPEIELSLTRSDGWLMLVVENKGASRLRRSDVTLAQGSAGIARREGFVGYVLPGLTRHWRVGREDSYSGGIV TVSANSSGGAIGEQLVVSGR >seq6 TTLLLQVPIGWGVLHQGGALVVLGFAIAHWRGFVGTYTRDTAIEMRD
¤³¤ì¤ËÂФ·¤Æ
makeblastdb -in test.fsa -parse_seqids -blastdb_version 5 -title "Cookbook demo" -out test -dbtype prot Building a new DB, current time: 09/03/2019 14:35:54 New DB name: /home/yamanouc/src/Kishimoto2017/test-Blast/test.fsa New DB title: Cookbook demo Sequence type: Protein Keep MBits: T Maximum file size: 1000000000B Adding sequences from FASTA; added 6 sequences in 0.0214489 seconds.
¹¹¤Ë¡¢Èæ³ÓÆþÎÏÍѤÎÇÛÎó¤È¤·¤Æ test.fasta ¤òͽ¤á¡Ê¥Ç¡¼¥¿¥Ù¡¼¥¹¥Ç¡¼¥¿¤Ë»È¤Ã¤¿seq1¤À¤±¤ò´Þ¤à¥Õ¥¡¥¤¥ë¤È¤·¤Æ¡Ëºî¤Ã¤Æ¤ª¤¯¡£
¤³¤ì¤ò»È¤Ã¤Æ¡¢
blastp -db test < test.fasta
¤È¤¹¤ë¤È¡¢Æ°ºî¤¹¤ë¡£
BLASTP 2.9.0+ Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Reference for composition-based statistics: Alejandro A. Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005. Database: Cookbook demo 6 sequences; 1,083 total letters Query= seq1 Length=160 Score E Sequences producing significant alignments: (Bits) Value seq1 unnamed protein product 328 2e-121 seq4 unnamed protein product 14.2 6.5 seq3 unnamed protein product 14.2 6.9 >seq1 unnamed protein product Length=160 Score = 328 bits (840), Expect = 2e-121, Method: Compositional matrix adjust. Identities = 160/160 (100%), Positives = 160/160 (100%), Gaps = 0/160 (0%) Query 1 MSFSTKPLDMATWPDFAALVERHNGVWGGCWCMAFHAKGSGAVGNREAKEARVREGSTHA 60 MSFSTKPLDMATWPDFAALVERHNGVWGGCWCMAFHAKGSGAVGNREAKEARVREGSTHA Sbjct 1 MSFSTKPLDMATWPDFAALVERHNGVWGGCWCMAFHAKGSGAVGNREAKEARVREGSTHA 60 Query 61 ALVFDGSACVGWCQFGPTGELPRIKHLRAYEDGQAVLPDWRITCFFSDKAFRGKGVAAAA 120 ALVFDGSACVGWCQFGPTGELPRIKHLRAYEDGQAVLPDWRITCFFSDKAFRGKGVAAAA Sbjct 61 ALVFDGSACVGWCQFGPTGELPRIKHLRAYEDGQAVLPDWRITCFFSDKAFRGKGVAAAA 120 Query 121 LAGALAEIGRLGGGTVESYPEDAQGRTVAGAFLHNGTLAM 160 LAGALAEIGRLGGGTVESYPEDAQGRTVAGAFLHNGTLAM Sbjct 121 LAGALAEIGRLGGGTVESYPEDAQGRTVAGAFLHNGTLAM 160 >seq4 unnamed protein product Length=246 Score = 14.2 bits (25), Expect = 6.5, Method: Compositional matrix adjust. Identities = 8/27 (30%), Positives = 11/27 (41%), Gaps = 0/27 (0%) Query 101 RITCFFSDKAFRGKGVAAAALAGALAE 127 R+T F + R A + ALA Sbjct 141 RLTRFGATHIVRRAAAQAVTIKPALAH 167 >seq3 unnamed protein product Length=292 Score = 14.2 bits (25), Expect = 6.9, Method: Compositional matrix adjust. Identities = 6/11 (55%), Positives = 8/11 (73%), Gaps = 0/11 (0%) Query 56 GSTHAALVFDG 66 GS++ LV DG Sbjct 192 GSSNIFLVKDG 202 Lambda K H a alpha 0.321 0.136 0.447 0.792 4.96 Gapped Lambda K H a alpha sigma 0.267 0.0410 0.140 1.90 42.6 43.6 Effective search space used: 117390 Database: Cookbook demo Posted date: Sep 3, 2019 2:57 PM Number of letters in database: 1,083 Number of sequences in database: 6 Matrix: BLOSUM62 Gap Penalties: Existence: 11, Extension: 1 Neighboring words threshold: 11 Window for multiple hits: 40
½ÐÎϤòºÙ¤«¤¯À©¸æ¤·¤è¤¦¤È¤¹¤ë¤È¡¢¤¿¤È¤¨¤Ð
blastp -db test -outfmt "10 qseqid sseqid pident length mismatch evalue bitscore" -out test.output < test.fasta
¤È¤¹¤ë¤È¡¢
seq1,seq1,100.000,160,0,2.25e-121,328 seq1,seq4,29.630,27,19,6.5,14.2 seq1,seq3,54.545,11,5,6.9,14.2
¤¬ÆÀ¤é¤ì¤ë¡£¾ÜºÙ¤Ïblastp -help¤ª¤è¤ÓDisplay BLAST search results with custom output format - BLAST® Command Line Applications User Manual - NCBI Bookshelf¤ò»²¾È¡£
Evalue¤ÎÃͤÎÀ©¸Â¤ò´Ë¤á¤Æ¡ÊÂ礤¯¤·¤Æ¡Ë¡¢¤â¤Ã¤ÈÉý¹¤¯°ìÃפò¼è¤ë¤È¡¢¡Ê¤³¤³¤Ç¤ÏÆþÎϤò¸µ¤Îtest.fsa¤ÎÁ´ÇÛÎó¤Ë¤·¤Æ¤¤¤ë¡Ë
blastp -db test -outfmt "10 qseqid sseqid pident length mismatch evalue bitscore" -evalue 100 < test.fsa
¤À¤È¡¢¤Û¤ÜÁ´Éô¤ÎÁȹ礻¤Ë¤Ä¤¤¤Æ½ÐÎϤµ¤ì¤ë¡£
seq1,seq1,100.000,160,0,2.25e-121,328 seq1,seq4,29.630,27,19,6.5,14.2 seq1,seq3,54.545,11,5,6.9,14.2 seq1,seq3,25.000,40,30,22,12.7 seq1,seq3,50.000,8,4,28,12.3 seq1,seq3,33.333,21,14,40,11.9 seq1,seq3,33.333,9,6,57,11.5 seq1,seq5,25.000,20,10,44,11.9 seq1,seq5,35.000,20,12,90,10.8 seq2,seq2,100.000,78,0,5.31e-54,151 seq2,seq3,33.333,12,8,56,10.4 seq3,seq3,100.000,292,0,0.0,602 seq3,seq5,19.403,67,39,0.28,19.6 seq3,seq5,28.000,25,18,2.5,16.5 seq3,seq5,29.730,37,25,4.9,15.8 seq3,seq5,50.000,8,4,59,12.3 seq3,seq4,27.711,83,39,8.1,15.0 seq3,seq4,66.667,6,2,17,13.9 seq3,seq4,27.368,95,49,45,12.7 seq3,seq1,54.545,11,5,14,14.2 seq3,seq1,25.000,40,30,33,13.1 seq3,seq1,50.000,8,4,50,12.3 seq3,seq1,33.333,9,6,99,11.5 seq4,seq4,100.000,246,0,0.0,489 seq4,seq3,28.421,95,48,2.2,16.5 seq4,seq3,27.473,91,45,6.0,15.0 seq4,seq3,66.667,6,2,13,14.2 seq4,seq5,38.462,13,8,8.9,14.6 seq4,seq5,42.857,21,9,35,12.7 seq5,seq5,100.000,260,0,0.0,519 seq5,seq3,19.403,67,39,0.26,19.6 seq5,seq3,28.000,25,18,2.6,16.5 seq5,seq3,29.730,37,25,4.7,15.4 seq5,seq3,50.000,8,4,51,12.3 seq5,seq4,38.462,13,8,10.0,14.6 seq5,seq4,42.857,21,9,32,13.1 seq5,seq1,25.000,20,10,77,11.5 seq6,seq6,100.000,47,0,6.34e-33,95.9
¼ÂºÝ¤Î¥Ç¡¼¥¿¤Ç¤ä¤Ã¤Æ¤ß¤ë¡£AE000512¦¤ÎCDS¤Îfasta¥Õ¥¡¥¤¥ë¤òAE000512CDS.fasta¤È¤·¤Æºî¤Ã¤Æ¤¢¤ë¤Î¤Ç¡¢¤³¤ì¤ò¥Ç¡¼¥¿¥Ù¡¼¥¹¤ËÊÑ´¹¤¹¤ë¡£
makeblastdb -in AE000512CDS.fasta -parse_seqids -blastdb_version 5 -title "Thermo" -out AE000512 -dbtype nucl Building a new DB, current time: 09/03/2019 15:23:58 New DB name: /home/yamanouc/src/Kishimoto2017/test-Blast/AE000512 New DB title: Thermo Sequence type: Nucleotide Keep MBits: T Maximum file size: 1000000000B Adding sequences from FASTA; added 1846 sequences in 0.117031 seconds.
¤³¤Î¥Ç¡¼¥¿¥Ù¡¼¥¹¤ò»È¤Ã¤Æ¡¢blastn ¤òµ¯Æ°¤·¡¢AP012030¦¤ÎCDS¤Îfasta¥Õ¥¡¥¤¥ëAP012030CDS.fasta¤òÆþÎϤȤ·¤ÆÍ¿¤¨¤ë¡£
blastn -db AE000512 -outfmt "10 qseqid sseqid pident length mismatch evalue bitscore" -out AP012030blast.out -evalue 100 < AP012030CDS.fasta
¤È¤·¤¿¤È¤³¤í¡¢ÆÀ¤é¤ì¤¿¤Î¤Ï
EcoliECDH1ME8569_1662,ThermoTM_0740,100.000,30,0,3.16e-08,56.5
¤À¤±¤Ç¤¢¤Ã¤¿¡£Evalue¤ò¤³¤³¤Þ¤Ç¹¤²¤Æ¤â£±¤Ä¤À¤±¤À¤Ã¤¿¡£
µÕ¤ò»î¤·¤Æ¤ß¤è¤¦¡£¤Ä¤Þ¤ê¡¢¥Ç¡¼¥¿¥Ù¡¼¥¹¤ËAP012030¤òÃÖ¤¡¢blastÆþÎϤËAE000512¤ò Æþ¤ì¤Æ¤ß¤ë¡£
¥Ç¡¼¥¿¥Ù¡¼¥¹¤òºî¤ë¡£
makeblastdb -in AP012030CDS.fasta -parse_seqids -blastdb_version 5 -title "Ecoli" -out AP012030 -dbtype nucl
AE000512CDS.fasta¤ÈÈæ³Ó¤¹¤ë¡£
blastn -db AP012030 -outfmt "10 qseqid sseqid pident length mismatch evalue bitscore" -out AE000512blast.out -evalue 100 < AE000512CDS.fasta
·ë²Ì¤ÏƱ¤¸¤Ç¡¢
ThermoTM_0740,EcoliECDH1ME8569_1662,100.000,30,0,7.23e-08,56.5
¤Ç¤¢¤Ã¤¿¡£
¤Ê¤ª¡¢¤³¤Î£²¤Ä¤Î°äÅÁ»Ò¤Ï¡¢AP012030¦¤¬
gene complement(1784023..1785951) /gene="thrS" /locus_tag="ECDH1ME8569_1662" /gene_synonym="b1719" /gene_synonym="ECK1717" /gene_synonym="JW1709" CDS complement(1784023..1785951) /gene="thrS" /locus_tag="ECDH1ME8569_1662" /gene_synonym="b1719" /gene_synonym="ECK1717" /gene_synonym="JW1709" /function="Threonyl-tRNA synthetase" /codon_start=1 /transl_table=11 /product="threonyl-tRNA synthetase" /protein_id="BAJ43518.1"
ÇÛÎó¤¬
ATGCCTGTTATAACTCTTCCTGATGGCAGCCAACGCCATTACGATCACGCTGTAAGCCCCATGGATGTTGCGCTGGACATTGGTCCAGGTCTGGCGAAAGCCTGTATCGCAGGGCGCGTTAATGGCGAACTGGTTGATGCTTGCGATCTGATTGAAAACGACGCACAACTGTCGATCATTACCGCCAAAGACGAAGAAGGTCTGGAGATCATTCGTCACTCCTGTGCGCACCTGTTAGGGCACGCGATTAAACAACTTTGGCCGCATACCAAAATGGCAATCGGCCCGGTTATTGACAACGGTTTTTATTACGACGTTGATCTTGACCGCACGTTAACCCAGGAAGATGTCGAAGCACTCGAGAAGCGGATGCATGAGCTTGCTGAGAAAAACTACGACGTCATTAAGAAGAAAGTCAGCTGGCACGAAGCGCGTGAAACTTTCGCCAACCGTGGGGAGAGCTACAAAGTCTCCATTCTTGACGAAAACATCGCCCATGATGACAAGCCAGGTCTGTACTTCCATGAAGAATATGTCGATATGTGCCGCGGTCCGCACGTACCGAACATGCGTTTCTGCCATCATTTCAAACTAATGAAAACGGCAGGGGCTTACTGGCGTGGCGACAGCAACAACAAAATGTTGCAACGTATTTACGGTACGGCGTGGGCAGACAAAAAAGCACTTAACGCTTACCTGCAGCGCCTGGAAGAAGCCGCGAAACGCGACCACCGTAAAATCGGTAAACAGCTCGACCTGTACCATATGCAGGAAGAAGCGCCGGGTATGGTATTCTGGCACAACGACGGCTGGACCATCTTCCGTGAACTGGAAGTGTTTGTTCGTTCTAAACTGAAAGAGTACCAGTATCAGGAAGTTAAAGGTCCGTTCATGATGGACCGTGTCCTGTGGGAAAAAACCGGTCACTGGGACAACTACAAAGATGCAATGTTCACCACATCTTCTGAGAACCGTGAATACTGCATTAAGCCGATGAACTGCCCGGGTCACGTACAAATTTTCAACCAGGGGCTGAAGTCTTATCGCGATCTGCCGCTGCGTATGGCCGAGTTTGGTAGCTGCCACCGTAACGAGCCGTCAGGTTCGCTGCATGGCCTGATGCGCGTGCGTGGATTTACCCAGGATGACGCGCATATCTTCTGTACTGAAGAACAAATTCGCGATGAAGTTAACGGATGTATCCGTTTAGTCTATGATATGTACAGCACTTTTGGCTTCGAGAAGATCGTCGTCAAACTCTCCACTCGTCCTGAAAAACGTATTGGCAGCGACGAAATGTGGGATCGTGCTGAGGCGGACCTGGCGGTTGCGCTGGAAGAAAACAACATCCCGTTTGAATATCAACTGGGTGAAGGCGCTTTCTACGGTCCGAAAATTGAATTTACCCTGTATGACTGCCTCGATCGTGCATGGCAGTGCGGTACAGTACAGCTGGACTTCTCTTTGCCGTCTCGTCTGAGCGCTTCTTATGTAGGCGAAGACAATGAACGTAAAGTACCGGTAATGATTCACCGCGCAATTCTGGGGTCGATGGAACGTTTCATCGGTATCCTGACCGAAGAGTTCGCTGGTTTCTTCCCGACCTGGCTTGCGCCGGTTCAGGTTGTTATCATGAATATTACCGATTCACAGTCTGAATACGTTAACGAATTGACGCAAAAACTATCAAATGCGGGCATTCGTGTTAAAGCAGACTTGAGAAATGAGAAGATTGGCTTTAAAATCCGCGAGCACACTTTGCGTCGCGTCCCATATATGCTGGTCTGTGGTGATAAAGAGGTGGAATCAGGCAAAGTTGCCGTTCGCACCCGCCGTGGTAAAGACCTGGGAAGCATGGACGTAAATGAAGTGATCGAGAAGCTGCAACAAGAGATTCGCAGCCGCAGTCTTAAACAATTGGAGGAATAA
¤Þ¤¿AE000512¦¤¬
gene complement(763605..765527) /locus_tag="TM_0740" /old_locus_tag="TM0740" CDS complement(763605..765527) /locus_tag="TM_0740" /old_locus_tag="TM0740" /note="similar to GB:M36593 SP:P18256 PID:143764 PID:1565235 GB:AL009126 percent identity: 73.50; identified by sequence similarity; putative" /codon_start=1 /transl_table=11 /product="threonyl-tRNA synthetase" /protein_id="AAD35821.1" /translation="MKIKVKLPDGKEKEYDRGITPAEIAKELGIKKAIGAVVNGELWD LKRPIENDCELRLVTLEDPEAPEFYRHTMAHILAQAVMRLYGKENVKLGIGPTIENGF YYDFDIKNGRLTEEDLPKIEQEMKKIIKENLPIERKEISKEEARELFRDQPYKLELIE EIEGNRVTIYRQGEFVDLCRGPHLPSTGIVKHFKLLSVSGAYWRGSEKNPMLTRVYGT AFAKKEDLDNYLKFLEEAQRRDHRKLGPQLELFMLNTEYAPGMPFFLPKGVVVLNELM KFSRELHRERGYQEIFTPLIMNEQLWKISGHWDHYAENMYFIEKDEERYAVKPMNCPG HILVYKSRTVSYRDLPLRFFEFGRVHRYERSGVLHGLMRVRSFTQDDAHIFCTPDQIE EEILGVLDLINTIYSQFGFTYRVELSTMPEDHMGDEAIWEKATTALKNALERAGLSYK VNEGEGAFYGPKIDFHIRDSIGREWQCATIQLDFMMPEKFNVTYIGPDNKEHRAVMIH RAIYGSLERFFGILIEHFAGAFPTWLAPIQVAVIPISEKHNDGAEKVARRISQEGFRV FFDNRRETLGYRIRQAQTQKIPYMIIIGDKELESGKISVRTRTGKEIKDVDPEHFVET LRNEVLSRKLELSMEG"
ÇÛÎó¤¬
ATGAAGATAAAGGTGAAGCTTCCAGACGGAAAAGAAAAAGAGTACGACAGAGGCATCACACCGGCGGAGATCGCAAAAGAACTGGGCATAAAAAAGGCGATCGGAGCGGTTGTTAACGGTGAACTCTGGGACCTGAAAAGACCCATCGAAAATGACTGCGAACTGCGGCTTGTAACTCTGGAGGATCCCGAGGCACCCGAGTTCTACAGACACACCATGGCACACATCCTCGCACAGGCTGTGATGAGACTCTATGGAAAAGAAAACGTAAAACTGGGAATCGGCCCCACCATTGAGAACGGTTTTTACTACGATTTCGATATAAAGAACGGAAGGCTCACAGAGGAAGATCTCCCAAAGATAGAACAGGAGATGAAGAAGATAATAAAGGAAAATCTTCCTATAGAAAGAAAGGAGATCAGCAAAGAAGAGGCAAGAGAACTTTTCAGAGATCAGCCTTACAAACTCGAGCTAATAGAGGAGATCGAAGGCAACAGAGTAACGATCTACCGTCAGGGAGAGTTCGTGGATCTCTGCAGAGGTCCTCATCTTCCTTCGACAGGAATAGTGAAGCACTTCAAACTCCTCTCCGTTTCAGGCGCGTACTGGCGAGGCAGTGAGAAGAATCCCATGCTCACAAGAGTTTACGGAACGGCCTTCGCCAAAAAGGAAGACCTCGACAATTACTTGAAATTCCTTGAAGAAGCCCAGCGAAGGGACCACAGAAAATTGGGACCTCAACTCGAACTTTTCATGCTGAACACCGAGTACGCTCCCGGTATGCCGTTCTTCCTTCCAAAGGGAGTCGTAGTGCTCAACGAATTGATGAAATTTTCCAGAGAACTCCATCGTGAGAGAGGATACCAGGAGATCTTCACACCGCTCATAATGAACGAGCAGCTCTGGAAGATCTCCGGTCACTGGGATCACTACGCTGAGAACATGTACTTCATAGAGAAAGACGAGGAAAGGTACGCGGTGAAACCCATGAACTGTCCCGGTCACATTCTCGTGTACAAGAGCAGGACGGTTTCCTACAGGGACCTACCGCTGAGATTCTTCGAGTTTGGCCGGGTTCACAGATACGAGAGAAGCGGAGTTCTCCACGGTCTCATGAGGGTGAGATCCTTCACACAGGACGATGCACACATATTCTGCACTCCCGATCAGATCGAGGAAGAGATACTCGGCGTTCTCGATCTCATAAACACCATCTACAGCCAATTCGGCTTCACCTACAGAGTGGAACTCAGCACAATGCCCGAAGATCACATGGGTGACGAAGCGATTTGGGAGAAGGCAACAACCGCCTTGAAGAATGCCTTAGAAAGGGCCGGGCTCTCGTACAAAGTGAACGAGGGTGAAGGCGCTTTCTACGGTCCGAAAATAGACTTCCACATAAGAGATTCGATAGGAAGAGAGTGGCAGTGTGCCACGATTCAGCTGGATTTCATGATGCCCGAAAAGTTCAACGTCACTTACATAGGCCCAGACAACAAAGAGCACAGAGCCGTGATGATACACAGAGCGATATACGGTTCGCTCGAAAGGTTCTTTGGCATCCTGATAGAGCACTTCGCTGGAGCGTTTCCAACCTGGCTTGCACCGATCCAGGTGGCAGTTATTCCCATATCAGAAAAACACAACGACGGTGCGGAAAAGGTTGCAAGGAGGATCTCTCAGGAAGGGTTCAGGGTCTTCTTCGATAACCGCCGTGAAACTCTTGGATACCGTATCAGACAGGCTCAAACCCAGAAAATACCGTACATGATAATAATAGGTGACAAAGAGCTGGAGAGCGGGAAGATCTCCGTGAGAACAAGAACGGGAAAAGAGATAAAAGACGTAGATCCAGAACACTTCGTCGAAACTCTGAGAAACGAAGTTCTGAGCAGAAAGCTCGAACTTTCAATGGAGGGATAA
¤Ç¤¢¤Ã¤¿¡£
ǰ¤Î¤¿¤á¡¢Éôʬ°ìÃפ¬Àµ¤·¤¯¸¡½Ð¤Ç¤¤Æ¤¤¤ë¤«¤Î³Îǧ¤Î¤¿¤á¡¢¼¡¤Î¤è¤¦¤Ê¥Æ¥¹¥È¤ò¹Ô¤Ã¤¿¡£
AE000512¤ò¥Ç¡¼¥¿¥Ù¡¼¥¹¤Ë¤·¤Æ¡¢Æ±¤¸AE000512.fasta¤«¤é°ìÉô¤òÈ´¤½Ð¤·¤Æ¹¹¤Ë¥·¡¼¥±¥ó¥¹¤ÎÁ°¸å¤òŬÅö¤ËÀÚ¤ê¼Î¤Æ¤ÆÃ»¤¯¤·¤¿¡Ê²¼µTM_0005¤ÏŤµ100¤ò»Ä¤·¡¢0006¤ÏŤµ200¤ò»Ä¤·¤¿¡Ë¤â¤Î¤òÍѰդ·¤¿¡£¤³¤ì¤Ë¤è¤êÇÛÎó¤ÎÅÓÃæÉôʬ¤¬´°Á´°ìÃפ·¤Æ¤¤¤ëÎã¤òºî¤Ã¤¿¤³¤È¤Ë¤Ê¤ë¡£¤³¤ì¤òblast¤ÇÈæ³Ó¤·¤¿¡£
>ThermoTM_0005_Mod DNA helicase, putative GAAGATAGACACAGAAATCGGTGTGGGTGATCTTGTTCTCATAAGCAAAGGAAACCCCCTCAAGAGTGATTACACGGGAACGGTTGTGGAGAAAGGAGAG
>ThermoTM_0006_Mod muconate cycloisomerase GGAAGACATCGAAGCCGTGGAAGAGATAGCGAAAGTCACCCGTGGAGCGAAGTACATAGTTGACGCGAACATGGGATACACCCAGAAGGAGGCGGTGGAGTTCGCGAGAGCAGTATATCAAAAAGGAATAGACATCGCTGTGTACGAACAACCTGTGAGGAGGGAAGACATAGAAGGCTTGAAGTTCGTGAGGTTCCACT
¤³¤Î¥Õ¥¡¥¤¥ëAEshort.fasta¤ËÂФ·¤Æ
blastn -db AE000512 -outfmt "10 qseqid sseqid pident length mismatch evalue bitscore" -evalue 100 < AEshort.fasta
¤ò¼Â¹Ô¤·¤¿¤È¤³¤í¡¢·ë²Ì¤Ï
ThermoTM_0005_Mod,ThermoTM_0005,100.000,100,0,1.64e-48,185 ThermoTM_0006_Mod,ThermoTM_0006,100.000,200,0,9.45e-104,370
¤È¤Ê¤ê¡¢Àµ¤·¤¯°ìÃ׸¡½Ð¤·¤¿¤À¤±¤Ç¤Ê¤¯¡¢°ìÃ×Éôʬ¤Î°ìÃ×Ψ100%¤Ç°ìÃ׍µ¤â100,200¤ÈÀµ¤·¤¯ÆÀ¤é¤ì¤Æ¤¤¤ë¡£
¤È¤Ê¤ë¤È¡¢AP012030¤ÈAE000512¤Î´Ö¤Ç¤Î°ìÃ×Èæ³Ó¤¬£±¤Ä¤·¤«½Ð¤Ê¤«¤Ã¤¿¤³¤È¤Ï¡¢¤«¤Ê¤ê³Î¼Â¤Ë¡¢Éôʬ°ìÃפ¹¤ë¤â¤Î¤¬Ìµ¤«¤Ã¤¿¤È¸À¤¨¤ë¤Î¤À¤í¤¦¡£
GenBank¥¢¥Î¥Æ¡¼¥·¥ç¥ó¥Õ¥¡¥¤¥ë¤«¤é¡¢CDS¤òÃê½Ð¤·¡¢¤½¤ì¤¾¤ì¤ÎCDS¤Î¡ã¥¢¥ß¥Î»À¤Î¡ä FASTA¥Õ¥¡¥¤¥ë¤òºî¤ë¡£´ðËÜŪ¤Ë¤Ï¾å¤ÈƱ¤¸¤À¤¬¡¢ÇÛÎó¤ò±ö´ðÇÛÎ󤫤饢¥ß¥Î»ÀÇÛÎó¤Ë ÊÑ´¹¤·¤Æ¤«¤é½ÐÎϤ¹¤ë¡£ÊÑ´¹¤ÏBioPython¤ÎTranslateµ¡Ç½¡£
#Read CDS and write Fasta -- AminoAcid version import os, sys import numpy as np import pandas as pd from Bio import SeqIO from Bio.Seq import * from Bio.SeqRecord import SeqRecord from Bio.SeqFeature import SeqFeature, FeatureLocation from collections import OrderedDict import pickle from ReadCDSforComparison import ReadCDS
def OutputCDS(tab1, tab2): with open('test-Clustalo.fasta', 'w') as cf: for i,r in tab1.iterrows(): print('>Ecoli_%s %s\n%s\n' % (r['locus_tag'].strip(), r['product'], r['seq']), file=cf) for i,r in tab2.iterrows(): print('>Thermo_%s %s\n%s\n' % (r['locus_tag'].strip(), r['product'], r['seq']), file=cf) return def OutputCDSforBlastp(tab1, tab2): with open('AP012030CDSAA.fasta', 'w') as cf: for i,r in tab1.iterrows(): AAseq = r['seq'].translate(table=11, stop_symbol="") print('>%s %i %s\n%s\n' % (r['locus_tag'].strip(), len(AAseq), r['product'], AAseq), file=cf) with open('AE000512CDSAA.fasta', 'w') as cf: for i,r in tab2.iterrows(): AAseq = r['seq'].translate(table=11, stop_symbol="") print('>Th_%s %i %s\n%s\n' % (r['locus_tag'].strip(), len(AAseq), r['product'], AAseq), file=cf) return if __name__ == '__main__': ReviseEcoliPickle = False #ReviseEcoliPickle = True ReviseThermoPickle = False #ReviseThermoPickle = True Ecoligbfile = 'AP012030.gb' EcoliCDSfile = 'AP012030CDS.pickle' if os.path.exists(EcoliCDSfile) and not ReviseEcoliPickle: with open(EcoliCDSfile, 'rb') as pf: EcoliCDS = pickle.load(pf) else: EcoliCDS = ReadCDS(Ecoligbfile) with open(EcoliCDSfile, 'wb') as pwf: pickle.dump(EcoliCDS, pwf) Thermogbfile = 'AE000512.gb' ThermoCDSfile = 'AE000512CDS.pickle' if os.path.exists(ThermoCDSfile) and not ReviseThermoPickle: with open(ThermoCDSfile, 'rb') as pf: ThermoCDS = pickle.load(pf) else: ThermoCDS = ReadCDS(Thermogbfile) with open(ThermoCDSfile, 'wb') as pwf: pickle.dump(ThermoCDS, pwf) OutputCDSforBlastp(EcoliCDS, ThermoCDS) print('complete')
¤³¤ì¤ÇÆÀ¤é¤ì¤ëAP012030CDSAA.fasta¤ÈAE000512CDSAA.fasta¤ò»È¤Ã¤Æblastp¤òưºî¤µ¤»¤ë¡£ ¤³¤Î¼ê½ç¤Ï¾åµ¤ÎDNAÇÛÎó»þ¤ÈƱ¤¸¡£
¤Þ¤º¡¢ÇÛÎó¥Ç¡¼¥¿¥Ù¡¼¥¹¤òºî¤ë¡£
makeblastdb -in AE000512CDSAA.fasta -parse_seqids -blastdb_version 5 -title "ThermoAA" -out AE000512AA -dbtype prot
·ë²Ì¤Ï
Building a new DB, current time: 09/05/2019 06:51:46 New DB name: /home/yamanouc/src/Kishimoto2017/test-Blast/AE000512AA New DB title: ThermoAA Sequence type: Protein Keep MBits: T Maximum file size: 1000000000B Adding sequences from FASTA; added 1846 sequences in 0.10688 seconds.
¤³¤Î¥Ç¡¼¥¿¥Ù¡¼¥¹¤ò»È¤Ã¤Æ¡¢blastp ¤òµ¯Æ°¤·¡¢AP012030¦¤ÎCDSAA¤Îfasta¥Õ¥¡¥¤¥ëAP012030CDSAA.fasta¤òÆþÎϤȤ·¤ÆÍ¿¤¨¤ë¡£º£²ó¤Ï¤«¤Ê¤ê¿¿ô¤Î¥Ò¥Ã¥È¤¬¸«¤é¤ì¤ë¤Î¤Ç¡¢E-value¤ÎïçÃͤϥǥե©¥ë¥ÈÄ̤êE<10¤È¤¹¤ë¡£
blastp -db AE000512AA -outfmt "10 qseqid sseqid pident length mismatch evalue bitscore" -out AP012030AAblast.out -num_threads 32 -evalue 10 < AP012030CDSAA.fasta
µÕÊý¸þ¤Ç»î¤¹¡£¥Ç¡¼¥¿¥Ù¡¼¥¹¤òºî¤ë¡£
makeblastdb -in AP012030CDSAA.fasta -parse_seqids -blastdb_version 5 -title "EcoliAA" -out AP012030AA -dbtype prot
AE000512CDSAA.fasta¤ÈÈæ³Ó¤¹¤ë¡£
blastp -db AP012030AA -outfmt "10 qseqid sseqid pident length mismatch evalue bitscore" -out AE000512AAblast.out -num_threads 32 -evalue 10 < AE000512CDSAA.fasta
·ë²Ì¤Ï¡¢Â¿¿ô¤ÎÎà»÷¥·¡¼¥±¥ó¥¹¤¬Ãê½Ð¤µ¤ì¤¿¡£ |¥Ç¡¼¥¿¥Ù¡¼¥¹vsÂÐ¾Ý | E<10 ¤Î¥Þ¥Ã¥Á¥ó¥°·ï¿ô|
AE000512 vs AP012030 | 47949 |
AP012030 vs AE000512 | 147190 |
¤«¤Ê¤ê¿¤¤¤Î¤Ç¡¢¤µ¤é¤Ë¹Ê¤ê¹þ¤àɬÍפ¬¤¢¤ë¡£
blastp¤Îµ¯Æ°¤Ï¡¢biopython¤ò²ð¤·¤Æ¹Ô¤¦¤³¤È¤¬¤Ç¤¤ë¡£¤³¤ì¤Î¥á¥ê¥Ã¥È¤Ï¡¢Æþ½ÐÎϤÎÀܳ¤¬¡Ê¥Õ¥¡¥¤¥ë·Ðͳ¤ËÈæ¤Ù¤Æ¡Ë¤ä¤äÊØÍø¤«¤Ê¡¢¤È¤¤¤¦¤È¤³¤í¡£¤µ¤Û¤ÉÊØÍø¤Ç¤â¤Ê¤¤¤«¡©
¤È¤ê¤¢¤¨¤º¡¢¼¡¤Î¤è¤¦¤Ë¤¹¤ë¤È¤Ç¤¤ë¡£
from Bio.Blast.Applications import NcbiblastpCommandline #help(NcbiblastpCommandline) blastp_cline = NcbiblastpCommandline(db='AP012030AA', query='AE000512CDSAA.fasta', outfmt='"10 qseqid sseqid pident length mismatch evalue bitscore"', out='AE000512AAblast2.out2', num_threads=32, evalue=0.001) blastp_cline # ¤³¤ì¤Ç¼Â¹Ô¡© print(blastp_cline) # ¤³¤ì¤Ë¤è¤Ã¤Æ°Ê²¼¤Î¤è¤¦¤Ë½ñ¤¤À¤¹¡£ blastp -out AE000512AAblast2.out2 -outfmt "10 qseqid sseqid pident length mismatch evalue bitscore" -query AE000512CDSAA.fasta -db AP012030AA -evalue 0.001 -num_threads 32 #stdout, stderr = blastp_cline() # ¤³¤ì¤ÏÍפé¤Ê¤¤¡© #print(stdout) print('complete')
¤³¤ì¤Ë¤è¤Ã¤Æ¡¢-out¤Ç»ØÄꤷ¤¿¥Õ¥¡¥¤¥ë AE000512AAblast2.out2 ¤¬À¸À®¤µ¤ì¤ë¡£
¤Ê¤ª¡¢¥³¥Þ¥ó¥É¥é¥¤¥ó¤Ç¤Ï bitscore ¤Ë¤è¤ëïçÃ͹ʤê¹þ¤ß¤Ï¤Ç¤¤Ê¤¤¡£evalue¤Î¤ß¡£¤·¤¿¤¬¤Ã¤Æ¡¢¤¿¤È¤¨¤Ð -evalue=0.001 ¤È¤«¤·¤Æ¤ª¤¤¤Æ¡¢½ÐÎÏ¥Õ¥¡¥¤¥ë¤Î¾å¤Ç¹Ê¤ê¹þ¤àɬÍפ¬¤¢¤ë¡£
blastp¤ÇÆÀ¤é¤ì¤¿¤½¤ì¤¾¤ì¤Î°ìÃ׸õÊä¤ËÂФ·¤Æ¡¢GenBank¥¢¥Î¥Æ¡¼¥·¥ç¥ó¤«¤é LocusID¤òÍê¤ê¤Ë¤·¤Æ¡¢product¾ðÊó¤ÈÇÛÎó¤ÎŤµ¡Ê¥¢¥ß¥Î»ÀÎó¤ÎŤµ¡Ë¾ðÊó¤òÄɵ¤¹¤ë¡£
#Read blast-out and augment with length(AA) and product import os, sys import numpy as np import pandas as pd from collections import OrderedDict import pickle def read_blast_out(blast_out_file): if not os.path.exists(blast_out_file): print('blast-out file', blast_out_file, 'does not exist') return df = pd.read_csv(blast_out_file, sep=',', header=None) df.columns = ['locus_tag_xx', 'locus_tag_yy', 'pident', 'length', 'mismatch', 'evalue', 'bitscore'] return(df) if __name__ == '__main__': EcoliCDSfile = 'AP012030CDS.pickle' if os.path.exists(EcoliCDSfile): with open(EcoliCDSfile, 'rb') as pf: EcoliCDS = pickle.load(pf) else: print('EcoliCDS file not found', EcoliCDSfile) return ThermoCDSfile = 'AE000512CDS.pickle' if os.path.exists(ThermoCDSfile): with open(ThermoCDSfile, 'rb') as pf: ThermoCDS = pickle.load(pf) else: print('ThermoCDS file not found', ThermoCDSfile) return # Read blast-out file APDF = read_blast_out('AP012030AAblast.out') print('----') print(EcoliCDS.head()) print('----') print(APDF.head()) print('----') ThermoCDS['locus_tag'] = ThermoCDS['locus_tag'].map(lambda u: 'Th_'+u) ThermoCDS = ThermoCDS.drop(columns='AAseq') print(ThermoCDS.head()) print('----') AugAPDF = pd.merge(APDF, EcoliCDS, left_on='locus_tag_xx', right_on='locus_tag') AugAPDF = pd.merge(AugAPDF, ThermoCDS, left_on='locus_tag_yy', right_on='locus_tag') print(AugAPDF.head()) AugAPDF = AugAPDF.sort_values(['locus_tag_yy', 'evalue', 'locus_tag_xx']) AugAPDF.to_excel('AP012030AAblast.xlsx') ### AEDF = read_blast_out('AP012030AAblast.out') AugAEDF = pd.merge(AEDF, ThermoCDS, left_on='locus_tag_yy', right_on='locus_tag') AugAEDF = pd.merge(AugAEDF, EcoliCDS, left_on='locus_tag_xx', right_on='locus_tag') print(AugAEDF.head()) AugAEDF = AugAEDF.sort_values(['locus_tag_xx', 'evalue', 'locus_tag_yy']) AugAEDF.to_excel('AE000512AAblast.xlsx') print('complete')
·ë²Ì¤ÎÀèÆ¬Éôʬ¤Ï¤³¤ó¤Ê¡ÊźÉÕ¥Õ¥¡¥¤¥ë¡Ë´¶¤¸¤Ë¤Ê¤Ã¤¿
£±¡ËAP012030AA¤Î¥Ç¡¼¥¿¥Ù¡¼¥¹¤Ïͽ¤áºî¤Ã¤Æ¤ª¤¯
£²¡ËÂоݹ¥Ç®¶Ýgb¤«¤é¡¢CDSÉôʬ¤Î¥¢¥ß¥Î»ÀÎó¤ò¼è½Ð¤·¤Æ¡¢FASTA¥Õ¥¡¥¤¥ë¤Ëºî¤ë¡£
afile.fasta <- getAA(afile.gb)
£³¡Ëblastp¤Ç¡¢db=AP012030AA¤È¤·¤Æ query=afile.fasta ¤È¤·¤Æ¡¢½ÐÎϤò out=afile.blt
£´¡Ëblastp¤Î½ÐÎϤ«¤é¡¢bitscore>=200 ¤Î¥¨¥ó¥È¥ê¡¼¤À¤±¤ò¥Õ¥£¥ë¥¿¤·¤Æafile200.blt¤ò½ÐÎϤ¹¤ë¡£
£µ¡Ë¤½¤ì¤Ë¡¢Î¾Êý¤Î¥Ç¡¼¥¿¤Îproduct¤òÉղ乤ë
£¶¡Ë¤µ¤Þ¤¶¤Þ¤ÊÂоݹ¥Ç®¶Ý¤Î½ÐÎϤò½¸¤á¤ë¡£