»³Æâ¤Î¥µ¥¤¥È

Blast¤Î¥À¥¦¥ó¥í¡¼¥É

Blast¤Î¥À¥¦¥ó¥í¡¼¥É¡¡Download BLAST Software and Databases Documentation

wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.9.0+-x64-linux.tar.gz
tar -xvf ncbi-blast-2.9.0+-x64-linux.tar.gz

¥¤¥ó¥¹¥È¡¼¥ë¡¡Installation - BLAST; Command Line Applications User Manual - NCBI Bookshelf

¤³¤³¤Ç¤Ï¡¡/usr/local/bin/ncbi-blast-2.9.0+/¡¡¤ËŸ³«¤µ¤ì¤Æ¤¤¤ë¤Î¤Ç¡¢

ln -s /usr/local/ncbi-blast-2.9.0+ blast

¤Ç /usr/local/blast ¤òºî¤Ã¤Æ¤ª¤­¡¢PATH¤Ë /usr/local/blast/bin ¤òÄɲá£

»È¤¤Êý¤Î¥Þ¥Ë¥å¥¢¥ë¤Ï¡¡BLAST® Command Line Applications User Manual - NCBI Bookshelf ¤Ë¤¢¤ë¡£

BLAST configuration file¤òºî¤ë¤Î¤¬Îɤ¤¤é¤·¤¤¡£

¼«Ê¬ÍѤΥǡ¼¥¿¥Ù¡¼¥¹¤òºî¤ë

FASTA¤«¤éºî¤ì¤ë¤é¤·¤¤¡£Building a BLAST database with local sequences - BLAST® Command Line Applications User Manual - NCBI Bookshelf ¥µ¥ó¥×¥ë¤Ç»î¤·¤Æ¤ß¤ë¡£

>seq1
MSFSTKPLDMATWPDFAALVERHNGVWGGCWCMAFHAKGSGAVGNREAKEARVREGSTHAALVFDGSACVGWCQFGPTGE
LPRIKHLRAYEDGQAVLPDWRITCFFSDKAFRGKGVAAAALAGALAEIGRLGGGTVESYPEDAQGRTVAGAFLHNGTLAM
>seq2
MKAIDLKAEEKKRLIEGIQDFFYEERNEEIGIIAAEKALDFFLSGVGKLIYNKALDESKIWFSRRLEDISLDYELLYK
>seq3 
MTLAAAAQSATWTFIDGDWYEGNVAILGPRSHAMWLGTSVFDGARWFEGVAPDLELHAARVNASAIALGLAPNMTPEQIV
GLTWDGLKKFDGKTAVYIRPMYWAEHGGYMGVPADPASTRFCLCLYESPMISPTGFSVTVSPFRRPTIETMPTNAKAGCL
YPNNGRAILEAKARGFDNALVLDMLGNVAETGSSNIFLVKDGHVLTPAPNGTFLSGITRSRTMTLLGDYGFRTTEKTLSV
RDFLEADEIFSTGNHSKVVPITRIEGRDLQPGPVAKKARELYWDWAHSASVG
>seq4
MRSFFHHVAAADPASFGVAQRVLTIPIKRAHIEVTHHLTKAEVDALIAAPNPRTSRGRRDRTFLLFLARTGARVSEATGV
NANDLQLERSHPQVLLRGKGRRDRVIPIPQDLARALTALLAEHGIANHEPRPIFIGARQERLTRFGATHIVRRAAAQAVT
IKPALAHKPISPHIFRHSLAMKLLQSGVDLLTIQAWLGHAQVATTHRYAAADVEMMRKGLEKAGVSGDLGLRFRPNDAVL
QLLTSI
>seq5
MTISRVCGSRTEAMLTNGQEIAMTSILKSTGAVALLLLYTLTANATSLMISPSSIERVAPDRAAVFHLRNQMDRPISIKV
RVFRWSQKGGVEKLEPTGDVVASPISAQLSPNGNRAVRVVRVSKEPLRSEEGYRVVIDEADPTRNTPEAESLSARHVLPV
LFRPPDVLGPEIELSLTRSDGWLMLVVENKGASRLRRSDVTLAQGSAGIARREGFVGYVLPGLTRHWRVGREDSYSGGIV
TVSANSSGGAIGEQLVVSGR
>seq6
TTLLLQVPIGWGVLHQGGALVVLGFAIAHWRGFVGTYTRDTAIEMRD

¤³¤ì¤ËÂФ·¤Æ

makeblastdb -in test.fsa -parse_seqids -blastdb_version 5 -title "Cookbook demo" -out test -dbtype prot

Building a new DB, current time: 09/03/2019 14:35:54
New DB name:   /home/yamanouc/src/Kishimoto2017/test-Blast/test.fsa
New DB title:  Cookbook demo
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 6 sequences in 0.0214489 seconds.

¹¹¤Ë¡¢Èæ³ÓÆþÎÏÍѤÎÇÛÎó¤È¤·¤Æ test.fasta ¤òͽ¤á¡Ê¥Ç¡¼¥¿¥Ù¡¼¥¹¥Ç¡¼¥¿¤Ë»È¤Ã¤¿seq1¤À¤±¤ò´Þ¤à¥Õ¥¡¥¤¥ë¤È¤·¤Æ¡Ëºî¤Ã¤Æ¤ª¤¯¡£

¤³¤ì¤ò»È¤Ã¤Æ¡¢

blastp -db test < test.fasta

¤È¤¹¤ë¤È¡¢Æ°ºî¤¹¤ë¡£

BLASTP 2.9.0+


Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.


Reference for composition-based statistics: Alejandro A. Schaffer,
L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri
I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
"Improving the accuracy of PSI-BLAST protein database searches with
composition-based statistics and other refinements", Nucleic Acids
Res. 29:2994-3005.



Database: Cookbook demo
           6 sequences; 1,083 total letters



Query= seq1

Length=160
                                                                     Score        E
Sequences producing significant alignments:                          (Bits)     Value

seq1 unnamed protein product                                          328        2e-121
seq4 unnamed protein product                                          14.2       6.5   
seq3 unnamed protein product                                          14.2       6.9   


>seq1 unnamed protein product
Length=160

 Score = 328 bits (840),  Expect = 2e-121, Method: Compositional matrix adjust.
 Identities = 160/160 (100%), Positives = 160/160 (100%), Gaps = 0/160 (0%)

Query  1    MSFSTKPLDMATWPDFAALVERHNGVWGGCWCMAFHAKGSGAVGNREAKEARVREGSTHA  60
            MSFSTKPLDMATWPDFAALVERHNGVWGGCWCMAFHAKGSGAVGNREAKEARVREGSTHA
Sbjct  1    MSFSTKPLDMATWPDFAALVERHNGVWGGCWCMAFHAKGSGAVGNREAKEARVREGSTHA  60

Query  61   ALVFDGSACVGWCQFGPTGELPRIKHLRAYEDGQAVLPDWRITCFFSDKAFRGKGVAAAA  120
            ALVFDGSACVGWCQFGPTGELPRIKHLRAYEDGQAVLPDWRITCFFSDKAFRGKGVAAAA
Sbjct  61   ALVFDGSACVGWCQFGPTGELPRIKHLRAYEDGQAVLPDWRITCFFSDKAFRGKGVAAAA  120

Query  121  LAGALAEIGRLGGGTVESYPEDAQGRTVAGAFLHNGTLAM  160
            LAGALAEIGRLGGGTVESYPEDAQGRTVAGAFLHNGTLAM
Sbjct  121  LAGALAEIGRLGGGTVESYPEDAQGRTVAGAFLHNGTLAM  160


>seq4 unnamed protein product
Length=246

 Score = 14.2 bits (25),  Expect = 6.5, Method: Compositional matrix adjust.
 Identities = 8/27 (30%), Positives = 11/27 (41%), Gaps = 0/27 (0%)

Query  101  RITCFFSDKAFRGKGVAAAALAGALAE  127
            R+T F +    R     A  +  ALA 
Sbjct  141  RLTRFGATHIVRRAAAQAVTIKPALAH  167


>seq3 unnamed protein product
Length=292

 Score = 14.2 bits (25),  Expect = 6.9, Method: Compositional matrix adjust.
 Identities = 6/11 (55%), Positives = 8/11 (73%), Gaps = 0/11 (0%)

Query  56   GSTHAALVFDG  66
            GS++  LV DG
Sbjct  192  GSSNIFLVKDG  202



Lambda      K        H        a         alpha
   0.321    0.136    0.447    0.792     4.96 

Gapped
Lambda      K        H        a         alpha    sigma
   0.267   0.0410    0.140     1.90     42.6     43.6 

Effective search space used: 117390


  Database: Cookbook demo
    Posted date:  Sep 3, 2019  2:57 PM
  Number of letters in database: 1,083
  Number of sequences in database:  6



Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Neighboring words threshold: 11
Window for multiple hits: 40

½ÐÎϤòºÙ¤«¤¯À©¸æ¤·¤è¤¦¤È¤¹¤ë¤È¡¢¤¿¤È¤¨¤Ð

blastp -db test -outfmt "10 qseqid sseqid pident length mismatch evalue bitscore" -out test.output < test.fasta

¤È¤¹¤ë¤È¡¢

seq1,seq1,100.000,160,0,2.25e-121,328
seq1,seq4,29.630,27,19,6.5,14.2
seq1,seq3,54.545,11,5,6.9,14.2

¤¬ÆÀ¤é¤ì¤ë¡£¾ÜºÙ¤Ïblastp -help¤ª¤è¤ÓDisplay BLAST search results with custom output format - BLAST® Command Line Applications User Manual - NCBI Bookshelf¤ò»²¾È¡£

Evalue¤ÎÃͤÎÀ©¸Â¤ò´Ë¤á¤Æ¡ÊÂ礭¤¯¤·¤Æ¡Ë¡¢¤â¤Ã¤ÈÉý¹­¤¯°ìÃפò¼è¤ë¤È¡¢¡Ê¤³¤³¤Ç¤ÏÆþÎϤò¸µ¤Îtest.fsa¤ÎÁ´ÇÛÎó¤Ë¤·¤Æ¤¤¤ë¡Ë

blastp -db test -outfmt "10 qseqid sseqid pident length mismatch evalue bitscore" -evalue 100 < test.fsa 

¤À¤È¡¢¤Û¤ÜÁ´Éô¤ÎÁȹ礻¤Ë¤Ä¤¤¤Æ½ÐÎϤµ¤ì¤ë¡£

seq1,seq1,100.000,160,0,2.25e-121,328
seq1,seq4,29.630,27,19,6.5,14.2
seq1,seq3,54.545,11,5,6.9,14.2
seq1,seq3,25.000,40,30,22,12.7
seq1,seq3,50.000,8,4,28,12.3
seq1,seq3,33.333,21,14,40,11.9
seq1,seq3,33.333,9,6,57,11.5
seq1,seq5,25.000,20,10,44,11.9
seq1,seq5,35.000,20,12,90,10.8
seq2,seq2,100.000,78,0,5.31e-54,151
seq2,seq3,33.333,12,8,56,10.4
seq3,seq3,100.000,292,0,0.0,602
seq3,seq5,19.403,67,39,0.28,19.6
seq3,seq5,28.000,25,18,2.5,16.5
seq3,seq5,29.730,37,25,4.9,15.8
seq3,seq5,50.000,8,4,59,12.3
seq3,seq4,27.711,83,39,8.1,15.0
seq3,seq4,66.667,6,2,17,13.9
seq3,seq4,27.368,95,49,45,12.7
seq3,seq1,54.545,11,5,14,14.2
seq3,seq1,25.000,40,30,33,13.1
seq3,seq1,50.000,8,4,50,12.3
seq3,seq1,33.333,9,6,99,11.5
seq4,seq4,100.000,246,0,0.0,489
seq4,seq3,28.421,95,48,2.2,16.5
seq4,seq3,27.473,91,45,6.0,15.0
seq4,seq3,66.667,6,2,13,14.2
seq4,seq5,38.462,13,8,8.9,14.6
seq4,seq5,42.857,21,9,35,12.7
seq5,seq5,100.000,260,0,0.0,519
seq5,seq3,19.403,67,39,0.26,19.6
seq5,seq3,28.000,25,18,2.6,16.5
seq5,seq3,29.730,37,25,4.7,15.4
seq5,seq3,50.000,8,4,51,12.3
seq5,seq4,38.462,13,8,10.0,14.6
seq5,seq4,42.857,21,9,32,13.1
seq5,seq1,25.000,20,10,77,11.5
seq6,seq6,100.000,47,0,6.34e-33,95.9

ËÜÈ֥ǡ¼¥¿¤Ç¤ä¤Ã¤Æ¤ß¤ë

¼ÂºÝ¤Î¥Ç¡¼¥¿¤Ç¤ä¤Ã¤Æ¤ß¤ë¡£AE000512¦¤ÎCDS¤Îfasta¥Õ¥¡¥¤¥ë¤òAE000512CDS.fasta¤È¤·¤Æºî¤Ã¤Æ¤¢¤ë¤Î¤Ç¡¢¤³¤ì¤ò¥Ç¡¼¥¿¥Ù¡¼¥¹¤ËÊÑ´¹¤¹¤ë¡£

makeblastdb -in AE000512CDS.fasta -parse_seqids -blastdb_version 5 -title "Thermo" -out AE000512 -dbtype nucl


Building a new DB, current time: 09/03/2019 15:23:58
New DB name:   /home/yamanouc/src/Kishimoto2017/test-Blast/AE000512
New DB title:  Thermo
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 1846 sequences in 0.117031 seconds.

¤³¤Î¥Ç¡¼¥¿¥Ù¡¼¥¹¤ò»È¤Ã¤Æ¡¢blastn ¤òµ¯Æ°¤·¡¢AP012030¦¤ÎCDS¤Îfasta¥Õ¥¡¥¤¥ëAP012030CDS.fasta¤òÆþÎϤȤ·¤ÆÍ¿¤¨¤ë¡£

blastn -db AE000512 -outfmt "10 qseqid sseqid pident length mismatch evalue bitscore" -out AP012030blast.out -evalue 100 < AP012030CDS.fasta

¤È¤·¤¿¤È¤³¤í¡¢ÆÀ¤é¤ì¤¿¤Î¤Ï

EcoliECDH1ME8569_1662,ThermoTM_0740,100.000,30,0,3.16e-08,56.5

¤À¤±¤Ç¤¢¤Ã¤¿¡£Evalue¤ò¤³¤³¤Þ¤Ç¹­¤²¤Æ¤â£±¤Ä¤À¤±¤À¤Ã¤¿¡£

µÕ¤ò»î¤·¤Æ¤ß¤è¤¦¡£¤Ä¤Þ¤ê¡¢¥Ç¡¼¥¿¥Ù¡¼¥¹¤ËAP012030¤òÃÖ¤­¡¢blastÆþÎϤËAE000512¤ò Æþ¤ì¤Æ¤ß¤ë¡£

¥Ç¡¼¥¿¥Ù¡¼¥¹¤òºî¤ë¡£

makeblastdb -in AP012030CDS.fasta -parse_seqids -blastdb_version 5 -title "Ecoli" -out AP012030 -dbtype nucl

AE000512CDS.fasta¤ÈÈæ³Ó¤¹¤ë¡£

blastn -db AP012030 -outfmt "10 qseqid sseqid pident length mismatch evalue bitscore" -out AE000512blast.out -evalue 100 < AE000512CDS.fasta

·ë²Ì¤ÏƱ¤¸¤Ç¡¢

ThermoTM_0740,EcoliECDH1ME8569_1662,100.000,30,0,7.23e-08,56.5

¤Ç¤¢¤Ã¤¿¡£

¤Ê¤ª¡¢¤³¤Î£²¤Ä¤Î°äÅÁ»Ò¤Ï¡¢AP012030¦¤¬

    gene            complement(1784023..1785951)
                    /gene="thrS"
                    /locus_tag="ECDH1ME8569_1662"
                    /gene_synonym="b1719"
                    /gene_synonym="ECK1717"
                    /gene_synonym="JW1709"
    CDS             complement(1784023..1785951)
                    /gene="thrS"
                    /locus_tag="ECDH1ME8569_1662"
                    /gene_synonym="b1719"
                    /gene_synonym="ECK1717"
                    /gene_synonym="JW1709"
                    /function="Threonyl-tRNA synthetase"
                    /codon_start=1
                    /transl_table=11
                    /product="threonyl-tRNA synthetase"
                    /protein_id="BAJ43518.1"

ÇÛÎó¤¬

ATGCCTGTTATAACTCTTCCTGATGGCAGCCAACGCCATTACGATCACGCTGTAAGCCCCATGGATGTTGCGCTGGACATTGGTCCAGGTCTGGCGAAAGCCTGTATCGCAGGGCGCGTTAATGGCGAACTGGTTGATGCTTGCGATCTGATTGAAAACGACGCACAACTGTCGATCATTACCGCCAAAGACGAAGAAGGTCTGGAGATCATTCGTCACTCCTGTGCGCACCTGTTAGGGCACGCGATTAAACAACTTTGGCCGCATACCAAAATGGCAATCGGCCCGGTTATTGACAACGGTTTTTATTACGACGTTGATCTTGACCGCACGTTAACCCAGGAAGATGTCGAAGCACTCGAGAAGCGGATGCATGAGCTTGCTGAGAAAAACTACGACGTCATTAAGAAGAAAGTCAGCTGGCACGAAGCGCGTGAAACTTTCGCCAACCGTGGGGAGAGCTACAAAGTCTCCATTCTTGACGAAAACATCGCCCATGATGACAAGCCAGGTCTGTACTTCCATGAAGAATATGTCGATATGTGCCGCGGTCCGCACGTACCGAACATGCGTTTCTGCCATCATTTCAAACTAATGAAAACGGCAGGGGCTTACTGGCGTGGCGACAGCAACAACAAAATGTTGCAACGTATTTACGGTACGGCGTGGGCAGACAAAAAAGCACTTAACGCTTACCTGCAGCGCCTGGAAGAAGCCGCGAAACGCGACCACCGTAAAATCGGTAAACAGCTCGACCTGTACCATATGCAGGAAGAAGCGCCGGGTATGGTATTCTGGCACAACGACGGCTGGACCATCTTCCGTGAACTGGAAGTGTTTGTTCGTTCTAAACTGAAAGAGTACCAGTATCAGGAAGTTAAAGGTCCGTTCATGATGGACCGTGTCCTGTGGGAAAAAACCGGTCACTGGGACAACTACAAAGATGCAATGTTCACCACATCTTCTGAGAACCGTGAATACTGCATTAAGCCGATGAACTGCCCGGGTCACGTACAAATTTTCAACCAGGGGCTGAAGTCTTATCGCGATCTGCCGCTGCGTATGGCCGAGTTTGGTAGCTGCCACCGTAACGAGCCGTCAGGTTCGCTGCATGGCCTGATGCGCGTGCGTGGATTTACCCAGGATGACGCGCATATCTTCTGTACTGAAGAACAAATTCGCGATGAAGTTAACGGATGTATCCGTTTAGTCTATGATATGTACAGCACTTTTGGCTTCGAGAAGATCGTCGTCAAACTCTCCACTCGTCCTGAAAAACGTATTGGCAGCGACGAAATGTGGGATCGTGCTGAGGCGGACCTGGCGGTTGCGCTGGAAGAAAACAACATCCCGTTTGAATATCAACTGGGTGAAGGCGCTTTCTACGGTCCGAAAATTGAATTTACCCTGTATGACTGCCTCGATCGTGCATGGCAGTGCGGTACAGTACAGCTGGACTTCTCTTTGCCGTCTCGTCTGAGCGCTTCTTATGTAGGCGAAGACAATGAACGTAAAGTACCGGTAATGATTCACCGCGCAATTCTGGGGTCGATGGAACGTTTCATCGGTATCCTGACCGAAGAGTTCGCTGGTTTCTTCCCGACCTGGCTTGCGCCGGTTCAGGTTGTTATCATGAATATTACCGATTCACAGTCTGAATACGTTAACGAATTGACGCAAAAACTATCAAATGCGGGCATTCGTGTTAAAGCAGACTTGAGAAATGAGAAGATTGGCTTTAAAATCCGCGAGCACACTTTGCGTCGCGTCCCATATATGCTGGTCTGTGGTGATAAAGAGGTGGAATCAGGCAAAGTTGCCGTTCGCACCCGCCGTGGTAAAGACCTGGGAAGCATGGACGTAAATGAAGTGATCGAGAAGCTGCAACAAGAGATTCGCAGCCGCAGTCTTAAACAATTGGAGGAATAA

¤Þ¤¿AE000512¦¤¬

    gene            complement(763605..765527)
                    /locus_tag="TM_0740"
                    /old_locus_tag="TM0740"
    CDS             complement(763605..765527)
                    /locus_tag="TM_0740"
                    /old_locus_tag="TM0740"
                    /note="similar to GB:M36593 SP:P18256 PID:143764
                    PID:1565235 GB:AL009126 percent identity: 73.50;
                    identified by sequence similarity; putative"
                    /codon_start=1
                    /transl_table=11
                    /product="threonyl-tRNA synthetase"
                    /protein_id="AAD35821.1"
                    /translation="MKIKVKLPDGKEKEYDRGITPAEIAKELGIKKAIGAVVNGELWD
                    LKRPIENDCELRLVTLEDPEAPEFYRHTMAHILAQAVMRLYGKENVKLGIGPTIENGF
                    YYDFDIKNGRLTEEDLPKIEQEMKKIIKENLPIERKEISKEEARELFRDQPYKLELIE
                    EIEGNRVTIYRQGEFVDLCRGPHLPSTGIVKHFKLLSVSGAYWRGSEKNPMLTRVYGT
                    AFAKKEDLDNYLKFLEEAQRRDHRKLGPQLELFMLNTEYAPGMPFFLPKGVVVLNELM
                    KFSRELHRERGYQEIFTPLIMNEQLWKISGHWDHYAENMYFIEKDEERYAVKPMNCPG
                    HILVYKSRTVSYRDLPLRFFEFGRVHRYERSGVLHGLMRVRSFTQDDAHIFCTPDQIE
                    EEILGVLDLINTIYSQFGFTYRVELSTMPEDHMGDEAIWEKATTALKNALERAGLSYK
                    VNEGEGAFYGPKIDFHIRDSIGREWQCATIQLDFMMPEKFNVTYIGPDNKEHRAVMIH
                    RAIYGSLERFFGILIEHFAGAFPTWLAPIQVAVIPISEKHNDGAEKVARRISQEGFRV
                    FFDNRRETLGYRIRQAQTQKIPYMIIIGDKELESGKISVRTRTGKEIKDVDPEHFVET
                    LRNEVLSRKLELSMEG"

ÇÛÎó¤¬

ATGAAGATAAAGGTGAAGCTTCCAGACGGAAAAGAAAAAGAGTACGACAGAGGCATCACACCGGCGGAGATCGCAAAAGAACTGGGCATAAAAAAGGCGATCGGAGCGGTTGTTAACGGTGAACTCTGGGACCTGAAAAGACCCATCGAAAATGACTGCGAACTGCGGCTTGTAACTCTGGAGGATCCCGAGGCACCCGAGTTCTACAGACACACCATGGCACACATCCTCGCACAGGCTGTGATGAGACTCTATGGAAAAGAAAACGTAAAACTGGGAATCGGCCCCACCATTGAGAACGGTTTTTACTACGATTTCGATATAAAGAACGGAAGGCTCACAGAGGAAGATCTCCCAAAGATAGAACAGGAGATGAAGAAGATAATAAAGGAAAATCTTCCTATAGAAAGAAAGGAGATCAGCAAAGAAGAGGCAAGAGAACTTTTCAGAGATCAGCCTTACAAACTCGAGCTAATAGAGGAGATCGAAGGCAACAGAGTAACGATCTACCGTCAGGGAGAGTTCGTGGATCTCTGCAGAGGTCCTCATCTTCCTTCGACAGGAATAGTGAAGCACTTCAAACTCCTCTCCGTTTCAGGCGCGTACTGGCGAGGCAGTGAGAAGAATCCCATGCTCACAAGAGTTTACGGAACGGCCTTCGCCAAAAAGGAAGACCTCGACAATTACTTGAAATTCCTTGAAGAAGCCCAGCGAAGGGACCACAGAAAATTGGGACCTCAACTCGAACTTTTCATGCTGAACACCGAGTACGCTCCCGGTATGCCGTTCTTCCTTCCAAAGGGAGTCGTAGTGCTCAACGAATTGATGAAATTTTCCAGAGAACTCCATCGTGAGAGAGGATACCAGGAGATCTTCACACCGCTCATAATGAACGAGCAGCTCTGGAAGATCTCCGGTCACTGGGATCACTACGCTGAGAACATGTACTTCATAGAGAAAGACGAGGAAAGGTACGCGGTGAAACCCATGAACTGTCCCGGTCACATTCTCGTGTACAAGAGCAGGACGGTTTCCTACAGGGACCTACCGCTGAGATTCTTCGAGTTTGGCCGGGTTCACAGATACGAGAGAAGCGGAGTTCTCCACGGTCTCATGAGGGTGAGATCCTTCACACAGGACGATGCACACATATTCTGCACTCCCGATCAGATCGAGGAAGAGATACTCGGCGTTCTCGATCTCATAAACACCATCTACAGCCAATTCGGCTTCACCTACAGAGTGGAACTCAGCACAATGCCCGAAGATCACATGGGTGACGAAGCGATTTGGGAGAAGGCAACAACCGCCTTGAAGAATGCCTTAGAAAGGGCCGGGCTCTCGTACAAAGTGAACGAGGGTGAAGGCGCTTTCTACGGTCCGAAAATAGACTTCCACATAAGAGATTCGATAGGAAGAGAGTGGCAGTGTGCCACGATTCAGCTGGATTTCATGATGCCCGAAAAGTTCAACGTCACTTACATAGGCCCAGACAACAAAGAGCACAGAGCCGTGATGATACACAGAGCGATATACGGTTCGCTCGAAAGGTTCTTTGGCATCCTGATAGAGCACTTCGCTGGAGCGTTTCCAACCTGGCTTGCACCGATCCAGGTGGCAGTTATTCCCATATCAGAAAAACACAACGACGGTGCGGAAAAGGTTGCAAGGAGGATCTCTCAGGAAGGGTTCAGGGTCTTCTTCGATAACCGCCGTGAAACTCTTGGATACCGTATCAGACAGGCTCAAACCCAGAAAATACCGTACATGATAATAATAGGTGACAAAGAGCTGGAGAGCGGGAAGATCTCCGTGAGAACAAGAACGGGAAAAGAGATAAAAGACGTAGATCCAGAACACTTCGTCGAAACTCTGAGAAACGAAGTTCTGAGCAGAAAGCTCGAACTTTCAATGGAGGGATAA

¤Ç¤¢¤Ã¤¿¡£

ǰ¤Î¤¿¤á¡¢Éôʬ°ìÃפ¬Àµ¤·¤¯¸¡½Ð¤Ç¤­¤Æ¤¤¤ë¤«¤Î³Îǧ¤Î¤¿¤á¡¢¼¡¤Î¤è¤¦¤Ê¥Æ¥¹¥È¤ò¹Ô¤Ã¤¿¡£

AE000512¤ò¥Ç¡¼¥¿¥Ù¡¼¥¹¤Ë¤·¤Æ¡¢Æ±¤¸AE000512.fasta¤«¤é°ìÉô¤òÈ´¤­½Ð¤·¤Æ¹¹¤Ë¥·¡¼¥±¥ó¥¹¤ÎÁ°¸å¤òŬÅö¤ËÀÚ¤ê¼Î¤Æ¤ÆÃ»¤¯¤·¤¿¡Ê²¼µ­TM_0005¤ÏŤµ100¤ò»Ä¤·¡¢0006¤ÏŤµ200¤ò»Ä¤·¤¿¡Ë¤â¤Î¤òÍѰդ·¤¿¡£¤³¤ì¤Ë¤è¤êÇÛÎó¤ÎÅÓÃæÉôʬ¤¬´°Á´°ìÃפ·¤Æ¤¤¤ëÎã¤òºî¤Ã¤¿¤³¤È¤Ë¤Ê¤ë¡£¤³¤ì¤òblast¤ÇÈæ³Ó¤·¤¿¡£

>ThermoTM_0005_Mod DNA helicase, putative
GAAGATAGACACAGAAATCGGTGTGGGTGATCTTGTTCTCATAAGCAAAGGAAACCCCCTCAAGAGTGATTACACGGGAACGGTTGTGGAGAAAGGAGAG
>ThermoTM_0006_Mod muconate cycloisomerase
GGAAGACATCGAAGCCGTGGAAGAGATAGCGAAAGTCACCCGTGGAGCGAAGTACATAGTTGACGCGAACATGGGATACACCCAGAAGGAGGCGGTGGAGTTCGCGAGAGCAGTATATCAAAAAGGAATAGACATCGCTGTGTACGAACAACCTGTGAGGAGGGAAGACATAGAAGGCTTGAAGTTCGTGAGGTTCCACT

¤³¤Î¥Õ¥¡¥¤¥ëAEshort.fasta¤ËÂФ·¤Æ

blastn -db AE000512 -outfmt "10 qseqid sseqid pident length mismatch evalue bitscore" -evalue 100 < AEshort.fasta

¤ò¼Â¹Ô¤·¤¿¤È¤³¤í¡¢·ë²Ì¤Ï

ThermoTM_0005_Mod,ThermoTM_0005,100.000,100,0,1.64e-48,185
ThermoTM_0006_Mod,ThermoTM_0006,100.000,200,0,9.45e-104,370

¤È¤Ê¤ê¡¢Àµ¤·¤¯°ìÃ׸¡½Ð¤·¤¿¤À¤±¤Ç¤Ê¤¯¡¢°ìÃ×Éôʬ¤Î°ìÃ×Ψ100%¤Ç°ìÃ׍µ¤â100,200¤ÈÀµ¤·¤¯ÆÀ¤é¤ì¤Æ¤¤¤ë¡£

¤È¤Ê¤ë¤È¡¢AP012030¤ÈAE000512¤Î´Ö¤Ç¤Î°ìÃ×Èæ³Ó¤¬£±¤Ä¤·¤«½Ð¤Ê¤«¤Ã¤¿¤³¤È¤Ï¡¢¤«¤Ê¤ê³Î¼Â¤Ë¡¢Éôʬ°ìÃפ¹¤ë¤â¤Î¤¬Ìµ¤«¤Ã¤¿¤È¸À¤¨¤ë¤Î¤À¤í¤¦¡£


¥¢¥ß¥Î»ÀÎó¤ËÊÑ´¹¤·¤Æ¤«¤éÈæ³Ó¤¹¤ë

GenBank¥¢¥Î¥Æ¡¼¥·¥ç¥ó¥Õ¥¡¥¤¥ë¤«¤é¡¢CDS¤òÃê½Ð¤·¡¢¤½¤ì¤¾¤ì¤ÎCDS¤Î¡ã¥¢¥ß¥Î»À¤Î¡ä FASTA¥Õ¥¡¥¤¥ë¤òºî¤ë¡£´ðËÜŪ¤Ë¤Ï¾å¤ÈƱ¤¸¤À¤¬¡¢ÇÛÎó¤ò±ö´ðÇÛÎ󤫤饢¥ß¥Î»ÀÇÛÎó¤Ë ÊÑ´¹¤·¤Æ¤«¤é½ÐÎϤ¹¤ë¡£ÊÑ´¹¤ÏBioPython¤ÎTranslateµ¡Ç½¡£

#Read CDS and write Fasta -- AminoAcid version
import os, sys
import numpy as np
import pandas as pd
from Bio import SeqIO
from Bio.Seq import *
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, FeatureLocation
from collections import OrderedDict
import pickle
from ReadCDSforComparison import ReadCDS
def OutputCDS(tab1, tab2):
    with open('test-Clustalo.fasta', 'w') as cf:
        for i,r in tab1.iterrows():
            print('>Ecoli_%s %s\n%s\n' % (r['locus_tag'].strip(), r['product'], r['seq']), file=cf)
        for i,r in tab2.iterrows():
            print('>Thermo_%s %s\n%s\n' % (r['locus_tag'].strip(), r['product'], r['seq']), file=cf)
    return

def OutputCDSforBlastp(tab1, tab2):
    with open('AP012030CDSAA.fasta', 'w') as cf:
        for i,r in tab1.iterrows():
            AAseq = r['seq'].translate(table=11, stop_symbol="")
            print('>%s %i %s\n%s\n' % (r['locus_tag'].strip(), len(AAseq), r['product'], AAseq), file=cf)

    with open('AE000512CDSAA.fasta', 'w') as cf:
        for i,r in tab2.iterrows():
            AAseq = r['seq'].translate(table=11, stop_symbol="")
            print('>Th_%s %i %s\n%s\n' % (r['locus_tag'].strip(), len(AAseq), r['product'], AAseq), file=cf)

    return
    
if __name__ == '__main__':
    ReviseEcoliPickle = False
    #ReviseEcoliPickle = True
    ReviseThermoPickle = False
    #ReviseThermoPickle = True
    
    Ecoligbfile = 'AP012030.gb'
    EcoliCDSfile = 'AP012030CDS.pickle'
    if os.path.exists(EcoliCDSfile) and not ReviseEcoliPickle:
        with open(EcoliCDSfile, 'rb') as pf:
            EcoliCDS = pickle.load(pf)
    else:
        EcoliCDS = ReadCDS(Ecoligbfile)
        with open(EcoliCDSfile, 'wb') as pwf:
            pickle.dump(EcoliCDS, pwf)

    Thermogbfile = 'AE000512.gb'
    ThermoCDSfile = 'AE000512CDS.pickle'
    if os.path.exists(ThermoCDSfile) and not ReviseThermoPickle:
        with open(ThermoCDSfile, 'rb') as pf:
            ThermoCDS = pickle.load(pf)
    else:
        ThermoCDS = ReadCDS(Thermogbfile)
        with open(ThermoCDSfile, 'wb') as pwf:
            pickle.dump(ThermoCDS, pwf)

    OutputCDSforBlastp(EcoliCDS, ThermoCDS)
   
    print('complete')

¤³¤ì¤ÇÆÀ¤é¤ì¤ëAP012030CDSAA.fasta¤ÈAE000512CDSAA.fasta¤ò»È¤Ã¤Æblastp¤òưºî¤µ¤»¤ë¡£ ¤³¤Î¼ê½ç¤Ï¾åµ­¤ÎDNAÇÛÎó»þ¤ÈƱ¤¸¡£

¤Þ¤º¡¢ÇÛÎó¥Ç¡¼¥¿¥Ù¡¼¥¹¤òºî¤ë¡£

makeblastdb -in AE000512CDSAA.fasta -parse_seqids -blastdb_version 5 -title "ThermoAA" -out AE000512AA -dbtype prot

·ë²Ì¤Ï

Building a new DB, current time: 09/05/2019 06:51:46
New DB name:   /home/yamanouc/src/Kishimoto2017/test-Blast/AE000512AA
New DB title:  ThermoAA
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 1846 sequences in 0.10688 seconds.

¤³¤Î¥Ç¡¼¥¿¥Ù¡¼¥¹¤ò»È¤Ã¤Æ¡¢blastp ¤òµ¯Æ°¤·¡¢AP012030¦¤ÎCDSAA¤Îfasta¥Õ¥¡¥¤¥ëAP012030CDSAA.fasta¤òÆþÎϤȤ·¤ÆÍ¿¤¨¤ë¡£º£²ó¤Ï¤«¤Ê¤ê¿¿ô¤Î¥Ò¥Ã¥È¤¬¸«¤é¤ì¤ë¤Î¤Ç¡¢E-value¤ÎïçÃͤϥǥե©¥ë¥ÈÄ̤êE<10¤È¤¹¤ë¡£

blastp -db AE000512AA -outfmt "10 qseqid sseqid pident length mismatch evalue bitscore" -out AP012030AAblast.out -num_threads 32 -evalue 10 < AP012030CDSAA.fasta

µÕÊý¸þ¤Ç»î¤¹¡£¥Ç¡¼¥¿¥Ù¡¼¥¹¤òºî¤ë¡£

makeblastdb -in AP012030CDSAA.fasta -parse_seqids -blastdb_version 5 -title "EcoliAA" -out AP012030AA -dbtype prot

AE000512CDSAA.fasta¤ÈÈæ³Ó¤¹¤ë¡£

blastp -db AP012030AA -outfmt "10 qseqid sseqid pident length mismatch evalue bitscore" -out AE000512AAblast.out -num_threads 32 -evalue 10 < AE000512CDSAA.fasta

·ë²Ì¤Ï¡¢Â¿¿ô¤ÎÎà»÷¥·¡¼¥±¥ó¥¹¤¬Ãê½Ð¤µ¤ì¤¿¡£ |¥Ç¡¼¥¿¥Ù¡¼¥¹vsÂÐ¾Ý | E<10 ¤Î¥Þ¥Ã¥Á¥ó¥°·ï¿ô|

AE000512 vs AP01203047949
AP012030 vs AE000512147190

¤«¤Ê¤ê¿¤¤¤Î¤Ç¡¢¤µ¤é¤Ë¹Ê¤ê¹þ¤àɬÍפ¬¤¢¤ë¡£


blastp¤Îµ¯Æ°¤Ï¡¢biopython¤ò²ð¤·¤Æ¹Ô¤¦¤³¤È¤¬¤Ç¤­¤ë¡£¤³¤ì¤Î¥á¥ê¥Ã¥È¤Ï¡¢Æþ½ÐÎϤÎÀܳ¤¬¡Ê¥Õ¥¡¥¤¥ë·Ðͳ¤ËÈæ¤Ù¤Æ¡Ë¤ä¤äÊØÍø¤«¤Ê¡¢¤È¤¤¤¦¤È¤³¤í¡£¤µ¤Û¤ÉÊØÍø¤Ç¤â¤Ê¤¤¤«¡©

¤È¤ê¤¢¤¨¤º¡¢¼¡¤Î¤è¤¦¤Ë¤¹¤ë¤È¤Ç¤­¤ë¡£

from Bio.Blast.Applications import NcbiblastpCommandline
#help(NcbiblastpCommandline)

blastp_cline = NcbiblastpCommandline(db='AP012030AA', query='AE000512CDSAA.fasta', 
                                     outfmt='"10 qseqid sseqid pident length mismatch evalue bitscore"',
                                     out='AE000512AAblast2.out2', num_threads=32, evalue=0.001)
blastp_cline   # ¤³¤ì¤Ç¼Â¹Ô¡©

print(blastp_cline)
# ¤³¤ì¤Ë¤è¤Ã¤Æ°Ê²¼¤Î¤è¤¦¤Ë½ñ¤­¤À¤¹¡£
blastp -out AE000512AAblast2.out2 -outfmt "10 qseqid sseqid pident length mismatch evalue bitscore" -query AE000512CDSAA.fasta -db AP012030AA -evalue 0.001 -num_threads 32
#stdout, stderr = blastp_cline()   # ¤³¤ì¤ÏÍפé¤Ê¤¤¡©
#print(stdout)
print('complete')

¤³¤ì¤Ë¤è¤Ã¤Æ¡¢-out¤Ç»ØÄꤷ¤¿¥Õ¥¡¥¤¥ë AE000512AAblast2.out2 ¤¬À¸À®¤µ¤ì¤ë¡£

¤Ê¤ª¡¢¥³¥Þ¥ó¥É¥é¥¤¥ó¤Ç¤Ï bitscore ¤Ë¤è¤ëïçÃ͹ʤê¹þ¤ß¤Ï¤Ç¤­¤Ê¤¤¡£evalue¤Î¤ß¡£¤·¤¿¤¬¤Ã¤Æ¡¢¤¿¤È¤¨¤Ð -evalue=0.001 ¤È¤«¤·¤Æ¤ª¤¤¤Æ¡¢½ÐÎÏ¥Õ¥¡¥¤¥ë¤Î¾å¤Ç¹Ê¤ê¹þ¤àɬÍפ¬¤¢¤ë¡£


·ë²Ì¤Î¹Ê¤ê¹þ¤ß

blastp¤ÇÆÀ¤é¤ì¤¿¤½¤ì¤¾¤ì¤Î°ìÃ׸õÊä¤ËÂФ·¤Æ¡¢GenBank¥¢¥Î¥Æ¡¼¥·¥ç¥ó¤«¤é LocusID¤òÍê¤ê¤Ë¤·¤Æ¡¢product¾ðÊó¤ÈÇÛÎó¤ÎŤµ¡Ê¥¢¥ß¥Î»ÀÎó¤ÎŤµ¡Ë¾ðÊó¤òÄɵ­¤¹¤ë¡£

#Read blast-out and augment with length(AA) and product
import os, sys
import numpy as np
import pandas as pd
from collections import OrderedDict
import pickle

def read_blast_out(blast_out_file):
    if not os.path.exists(blast_out_file):
        print('blast-out file', blast_out_file, 'does not exist')
        return
    df = pd.read_csv(blast_out_file, sep=',', header=None)
    df.columns = ['locus_tag_xx', 'locus_tag_yy', 'pident', 'length', 'mismatch', 'evalue', 'bitscore']
    return(df)

if __name__ == '__main__':
    EcoliCDSfile = 'AP012030CDS.pickle'
    if os.path.exists(EcoliCDSfile):
        with open(EcoliCDSfile, 'rb') as pf:
            EcoliCDS = pickle.load(pf)
    else:
        print('EcoliCDS file not found', EcoliCDSfile)
        return
    
    ThermoCDSfile = 'AE000512CDS.pickle'
    if os.path.exists(ThermoCDSfile):
        with open(ThermoCDSfile, 'rb') as pf:
            ThermoCDS = pickle.load(pf)
    else:
        print('ThermoCDS file not found', ThermoCDSfile)
        return

    # Read blast-out file
    APDF = read_blast_out('AP012030AAblast.out')
    print('----')
    print(EcoliCDS.head())
    print('----')
    print(APDF.head())
    print('----')
    ThermoCDS['locus_tag'] = ThermoCDS['locus_tag'].map(lambda u: 'Th_'+u)
    ThermoCDS = ThermoCDS.drop(columns='AAseq')
    print(ThermoCDS.head())
    print('----')
    AugAPDF = pd.merge(APDF, EcoliCDS, left_on='locus_tag_xx', right_on='locus_tag')
    AugAPDF = pd.merge(AugAPDF, ThermoCDS, left_on='locus_tag_yy', right_on='locus_tag')
    print(AugAPDF.head())
    AugAPDF = AugAPDF.sort_values(['locus_tag_yy', 'evalue', 'locus_tag_xx'])
    AugAPDF.to_excel('AP012030AAblast.xlsx')
    ###
    AEDF = read_blast_out('AP012030AAblast.out')
    AugAEDF = pd.merge(AEDF, ThermoCDS, left_on='locus_tag_yy', right_on='locus_tag')
    AugAEDF = pd.merge(AugAEDF, EcoliCDS, left_on='locus_tag_xx', right_on='locus_tag')
    print(AugAEDF.head())
    AugAEDF = AugAEDF.sort_values(['locus_tag_xx', 'evalue', 'locus_tag_yy'])
    AugAEDF.to_excel('AE000512AAblast.xlsx') 

    print('complete')

·ë²Ì¤ÎÀèÆ¬Éôʬ¤Ï¤³¤ó¤Ê¡ÊźÉÕ¥Õ¥¡¥¤¥ë¡Ë´¶¤¸¤Ë¤Ê¤Ã¤¿



Á´ÂΤò¤Þ¤È¤á¤¿½èÍý¤ò¹Í¤¨¤ë

£±¡ËAP012030AA¤Î¥Ç¡¼¥¿¥Ù¡¼¥¹¤Ïͽ¤áºî¤Ã¤Æ¤ª¤¯

£²¡ËÂоݹ¥Ç®¶Ýgb¤«¤é¡¢CDSÉôʬ¤Î¥¢¥ß¥Î»ÀÎó¤ò¼è½Ð¤·¤Æ¡¢FASTA¥Õ¥¡¥¤¥ë¤Ëºî¤ë¡£
afile.fasta <- getAA(afile.gb)

£³¡Ëblastp¤Ç¡¢db=AP012030AA¤È¤·¤Æ query=afile.fasta ¤È¤·¤Æ¡¢½ÐÎϤò out=afile.blt

£´¡Ëblastp¤Î½ÐÎϤ«¤é¡¢bitscore>=200 ¤Î¥¨¥ó¥È¥ê¡¼¤À¤±¤ò¥Õ¥£¥ë¥¿¤·¤Æafile200.blt¤ò½ÐÎϤ¹¤ë¡£

£µ¡Ë¤½¤ì¤Ë¡¢Î¾Êý¤Î¥Ç¡¼¥¿¤Îproduct¤òÉղ乤ë

£¶¡Ë¤µ¤Þ¤¶¤Þ¤ÊÂоݹ¥Ç®¶Ý¤Î½ÐÎϤò½¸¤á¤ë¡£


¥È¥Ã¥×   ÊÔ½¸ Åà·ë º¹Ê¬ ¥Ð¥Ã¥¯¥¢¥Ã¥× źÉÕ Ê£À½ ̾Á°Êѹ¹ ¥ê¥í¡¼¥É   ¿·µ¬ °ìÍ÷ ñ¸ì¸¡º÷ ºÇ½ª¹¹¿·   ¥Ø¥ë¥×   ºÇ½ª¹¹¿·¤ÎRSS
Last-modified: 2019-10-22 (²Ð) 17:05:04 (1254d)