![]() |
ノート/breseq/チュートリアルを試す1http://pepper.is.sci.toho-u.ac.jp/pepper/index.php?%A5%CE%A1%BC%A5%C8%2Fbreseq%2F%A5%C1%A5%E5%A1%BC%A5%C8%A5%EA%A5%A2%A5%EB%A4%F2%BB%EE%A4%B9%A3%B1 |
![]() |
ノート/ノート
訪問者数 575 最終更新 2017-04-10 (月) 08:37:33
ノート/breseq/チュートリアルを試す1 > ノート/breseq/チュートリアルを試す2 > ノート/breseq/UT-Austinの進化データを試す
ノート/breseq/Multi-Isolate1 > ノート/breseq/Multi-Isolate2
Identification of mutations in laboratory evolved microbes from next-generation sequencing data using breseq
3.1 Predicting mutations in a clonal sample (page 3)
$ breseq -j 4 -o Clonal_Output -r REL606.gbk SRR030257_1.fastq SRR030257_2.fastq
出力はHTMLで(見やすく)作られるので、この場所 に置いた。
(2017-04-08追加) 参照シーケンスをREL606とは別の「O157H7」にしてみた。
チュートリアルでは親株のREL616との差異(変異)を見ようとして、bowtie2での参照シーケンスをREL606としてそれにリードを張り付けている。
ここの追加では、試しに、参照シーケンスを変えてみる。NCBIが貼っているEscherichia coliの参照シーケンスのうち、適当に選んでみる。今回はO157-H7のアノテーション付きGeneBankデータを使ってみた。
結果はここに置いた。
元の試行と極端に異なるのが、outputをまとめるための処理時間である。Summary Statisticsのタブを開くと、下の方にExecution Timeの表が載っている。O157-H7の方が全体に長くかかる傾向があるが、特にPolymorphic StatisticsとOutputのセクションが伸びている。(Polymorphic Statisticsが43倍、Outputが1430倍)
step REL606 O157-H7 Read and reference sequence file input 41 seconds 49 seconds Read alignment to reference genome 23 minutes 46 seconds 36 minutes 11 seconds Preprocessing alignments for candidate junction identification 2 minutes 43 seconds 6 minutes 50 seconds Preliminary analysis of coverage distribution 1 minute 38 seconds 1 minute 41 seconds Identifying junction candidates 14 seconds 17 seconds Re-alignment to junction candidates 27 seconds 44 seconds Resolving alignments with junction candidates 2 minutes 58 seconds 3 minutes 34 seconds Creating BAM files 1 minute 26 seconds 1 minute 25 seconds Tabulating error counts 30 seconds 49 seconds Re-calibrating base error rates 1 second 2 seconds Examining read alignment evidence 5 minutes 51 seconds 15 minutes 45 seconds Polymorphism statistics 1 second 43 seconds Output 21 seconds 8 hours 20 minutes 28 seconds Total 40 minutes 37 seconds 9 hours 29 minutes 18 seconds
breseqが何をやっているかを詳細に見たい。実行ログが吐き出されるので、以下に貼ってみた。これをていねいに追うと、何をどのように(=どのようなパラメタを与えて)やっているかが見えてくる。
[yamanouc@ginger Clonal_Sample]$ breseq -j 4 -o Clonal_Output -r REL606.gbk SRR030257_1.fastq SRR030257_2.fastq ================================================================================ breseq 0.30.0 http://barricklab.org/breseq Active Developers: Barrick JE, Deatherage DE Contact: <jeffrey.e.barrick@gmail.com> breseq is free software; you can redistribute it and/or modify it under the terms the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. Copyright (c) 2008-2010 Michigan State University Copyright (c) 2011-2016 The University of Texas at Austin If you use breseq in your research, please cite: Deatherage, D.E., Barrick, J.E. (2014) Identification of mutations in laboratory-evolved microbes from next-generation sequencing data using breseq. Methods Mol. Biol. 1151: 165?188. If you use structural variation (junction) predictions, please cite: Barrick, J.E., Colburn, G., Deatherage D.E., Traverse, C.C., Strand, M.D., Borges, J.J., Knoester, D.B., Reba, A., Meyer, A.G. (2014) Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq. BMC Genomics 15:1039. ================================================================================ ---> bowtie2 :: version 2.3.0 [/usr/local/bin/bowtie2] ---> R :: version 3.3.2 [/usr/bin/R] +++ NOW PROCESSING Read and reference sequence file input READ FILE::SRR030257_1 Converting/filtering FASTQ file... Original base quality format: SANGER New format: SANGER Original reads: 3800180 bases: 136806480 Filtered reads: 19 (?50% N) 60951 (?90% one base) Analyzed reads: 3739210 bases: 134611560 READ FILE::SRR030257_2 Converting/filtering FASTQ file... Original base quality format: SANGER New format: SANGER Original reads: 3800180 bases: 136806480 Filtered reads: 369 (?50% N) 47630 (?90% one base) Analyzed reads: 3752181 bases: 135078516 ::TOTAL:: Original reads: 7600360 bases: 273612960 Analyzed reads: 7491391 bases: 269690076 [samtools] faidx Clonal_Output/data/reference.fasta REFERENCE: REL606 LENGTH: 4629812 +++ NOW PROCESSING Read alignment to reference genome [system] bowtie2-build -q Clonal_Output/data/reference.fasta Clonal_Output/02_reference_alignment/reference [system] bowtie2 -t -p 4 --local -L 18 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 --ignore-quals -k 2000 -i S,1,0.25 --score-min L,1,0.9 --reorder -x Clonal_Output/02_reference_alignment/reference -U Clonal_Output/01_sequence_conversion/SRR030257_1.converted.fastq -S Clonal_Output/02_reference_alignment/SRR030257_1.stage1.sam --un Clonal_Output/02_reference_alignment/SRR030257_1.stage1.unmatched.fastq Time loading reference: 00:00:00 Time loading forward index: 00:00:00 Time loading mirror index: 00:00:00 Multiseed full-index search: 00:00:33 3739210 reads; of these: 3739210 (100.00%) were unpaired; of these: 600703 (16.06%) aligned 0 times 3040853 (81.32%) aligned exactly 1 time 97654 (2.61%) aligned >1 times 83.94% overall alignment rate Time searching: 00:00:33 Overall time: 00:00:33 [system] bowtie2 -t -p 4 --local -L 18 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 --ignore-quals -k 2000 -i S,1,0.25 --score-min L,1,0.9 --reorder -x Clonal_Output/02_reference_alignment/reference -U Clonal_Output/01_sequence_conversion/SRR030257_2.converted.fastq -S Clonal_Output/02_reference_alignment/SRR030257_2.stage1.sam --un Clonal_Output/02_reference_alignment/SRR030257_2.stage1.unmatched.fastq Time loading reference: 00:00:00 Time loading forward index: 00:00:00 Time loading mirror index: 00:00:00 Multiseed full-index search: 00:00:34 3752181 reads; of these: 3752181 (100.00%) were unpaired; of these: 612595 (16.33%) aligned 0 times 3042265 (81.08%) aligned exactly 1 time 97321 (2.59%) aligned >1 times 83.67% overall alignment rate Time searching: 00:00:34 Overall time: 00:00:34 [system] bowtie2 -t -p 4 --local -L 9 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 --ignore-quals -k 2000 -i S,1,0.25 --score-min L,6,0.2 --reorder -x Clonal_Output/02_reference_alignment/reference -U Clonal_Output/02_reference_alignment/SRR030257_1.stage1.unmatched.fastq -S Clonal_Output/02_reference_alignment/SRR030257_1.stage2.matched.sam Time loading reference: 00:00:00 Time loading forward index: 00:00:00 Time loading mirror index: 00:00:00 Multiseed full-index search: 00:10:58 600703 reads; of these: 600703 (100.00%) were unpaired; of these: 3722 (0.62%) aligned 0 times 10227 (1.70%) aligned exactly 1 time 586754 (97.68%) aligned >1 times 99.38% overall alignment rate Time searching: 00:10:58 Overall time: 00:10:58 [system] bowtie2 -t -p 4 --local -L 9 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 --ignore-quals -k 2000 -i S,1,0.25 --score-min L,6,0.2 --reorder -x Clonal_Output/02_reference_alignment/reference -U Clonal_Output/02_reference_alignment/SRR030257_2.stage1.unmatched.fastq -S Clonal_Output/02_reference_alignment/SRR030257_2.stage2.matched.sam Time loading reference: 00:00:00 Time loading forward index: 00:00:00 Time loading mirror index: 00:00:00 Multiseed full-index search: 00:11:03 612595 reads; of these: 612595 (100.00%) were unpaired; of these: 5810 (0.95%) aligned 0 times 13420 (2.19%) aligned exactly 1 time 593365 (96.86%) aligned >1 times 99.05% overall alignment rate Time searching: 00:11:03 Overall time: 00:11:03 [system] rm -f Clonal_Output/02_reference_alignment/reference* [system] rm -f Clonal_Output/02_reference_alignment/SRR030257_1.stage1.unmatched.fastq [system] rm -f Clonal_Output/02_reference_alignment/SRR030257_1.stage1.sam [system] rm -f Clonal_Output/02_reference_alignment/SRR030257_2.stage1.unmatched.fastq [system] rm -f Clonal_Output/02_reference_alignment/SRR030257_2.stage1.sam [system] rm -f Clonal_Output/02_reference_alignment/SRR030257_1.stage2.matched.sam [system] rm -f Clonal_Output/02_reference_alignment/SRR030257_2.stage2.matched.sam +++ NOW PROCESSING Preprocessing alignments for candidate junction identification Preprocessing alignments. READ FILE::SRR030257_1 ALIGNED READ:100000 ALIGNED READ:200000 ALIGNED READ:300000 : ALIGNED READ:3500000 ALIGNED READ:3600000 ALIGNED READ:3700000 READ FILE::SRR030257_2 ALIGNED READ:3800000 ALIGNED READ:3900000 ALIGNED READ:4000000 : ALIGNED READ:7200000 ALIGNED READ:7300000 ALIGNED READ:7400000 Summary... Aligned reads: 7481861 Read alignments: 19713263 Alignments split on indels: 21 Reads with alignments split on indels: 21 Split alignments: 6090236 Reads with split alignments: 601157 +++ NOW PROCESSING Preliminary analysis of coverage distribution [samtools] import Clonal_Output/data/reference.fasta.fai Clonal_Output/03_candidate_junctions/best.sam Clonal_Output/03_candidate_junctions/best.unsorted.bam [samtools] sort --threads 4 -o Clonal_Output/03_candidate_junctions/best.bam -T Clonal_Output/03_candidate_junctions/best.bam Clonal_Output/03_candidate_junctions/best.unsorted.bam [samtools] index Clonal_Output/03_candidate_junctions/best.bam Clonal_Output/03_candidate_junctions/best.bam.bai REFERENCE: REL606 LENGTH: 4629812 POSITION:10000 POSITION:20000 POSITION:30000 : POSITION:4610000 POSITION:4620000 [system] R --vanilla < /usr/local/bin/../share/breseq//coverage_distribution.r > Clonal_Output/03_candidate_junctions/0.unique_only_coverage_distribution.tab.r.log distribution_file=Clonal_Output/03_candidate_junctions/0.unique_only_coverage_distribution.tab plot_file=Clonal_Output/03_candidate_junctions/0.coverage.pdf deletion_propagation_pr_cutoff=2.32374e-05 [system] rm -f Clonal_Output/03_candidate_junctions/best.unsorted.bam [system] rm -f Clonal_Output/03_candidate_junctions/best.sam [system] rm -f Clonal_Output/03_candidate_junctions/best.bam [system] rm -f Clonal_Output/03_candidate_junctions/best.bam.bai [system] rm -f Clonal_Output/03_candidate_junctions/0.coverage.pdf [system] rm -f Clonal_Output/03_candidate_junctions/0.unique_only_coverage_distribution.tab [system] rm -f Clonal_Output/03_candidate_junctions/0.unique_only_coverage_distribution.tab.r.log +++ NOW PROCESSING Identifying junction candidates READ FILE::SRR030257_1 ALIGNED READ:10000 CANDIDATE JUNCTIONS:2852 ALIGNED READ:20000 CANDIDATE JUNCTIONS:5715 ALIGNED READ:30000 CANDIDATE JUNCTIONS:8570 : ALIGNED READ:270000 CANDIDATE JUNCTIONS:83719 ALIGNED READ:280000 CANDIDATE JUNCTIONS:87232 Passed alignment pairs examined: 100001 WARNING: Reached limit of 100000 passed alignment pairs. Specify a greater value for --junction-alignment-pair-limit for more thorough junction prediction. Taking top candidate junctions... Minimum number to keep: 100 Maximum number to keep: 5000 Maximum length to keep: 462981 bases Initial: Number = 76, Cumulative Length = 5723 bases Testing Pos Hash Score = 11, Num = 1, Length = 75 Testing Pos Hash Score = 10, Num = 1, Length = 74 Testing Pos Hash Score = 9, Num = 2, Length = 147 Testing Pos Hash Score = 8, Num = 4, Length = 297 Testing Pos Hash Score = 7, Num = 2, Length = 149 Testing Pos Hash Score = 6, Num = 6, Length = 436 Testing Pos Hash Score = 5, Num = 3, Length = 220 Testing Pos Hash Score = 4, Num = 10, Length = 734 Testing Pos Hash Score = 3, Num = 10, Length = 742 Testing Pos Hash Score = 2, Num = 37, Length = 2849 Accepted: Number = 76, Pos Hash Score >= 2, Cumulative Length = 5723 bases [samtools] faidx Clonal_Output/03_candidate_junctions/candidate_junction.fasta [system] rm -f Clonal_Output/03_candidate_junctions/SRR030257_1.split.sam [system] rm -f Clonal_Output/03_candidate_junctions/SRR030257_2.split.sam +++ NOW PROCESSING Re-alignment to junction candidates [system] bowtie2-build -q Clonal_Output/03_candidate_junctions/candidate_junction.fasta Clonal_Output/04_candidate_junction_alignment/candidate_junction [system] bowtie2 -t -p 4 --local -L 10 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 --ignore-quals -k 2000 -i S,1,0.25 --score-min L,1,0.70 --reorder -x Clonal_Output/04_candidate_junction_alignment/candidate_junction -U Clonal_Output/01_sequence_conversion/SRR030257_1.converted.fastq -S Clonal_Output/04_candidate_junction_alignment/SRR030257_1.candidate_junction.sam Time loading reference: 00:00:00 Time loading forward index: 00:00:00 Time loading mirror index: 00:00:00 Multiseed full-index search: 00:00:13 3739210 reads; of these: 3739210 (100.00%) were unpaired; of these: 3735689 (99.91%) aligned 0 times 920 (0.02%) aligned exactly 1 time 2601 (0.07%) aligned >1 times 0.09% overall alignment rate Time searching: 00:00:13 Overall time: 00:00:13 [system] bowtie2 -t -p 4 --local -L 10 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 --ignore-quals -k 2000 -i S,1,0.25 --score-min L,1,0.70 --reorder -x Clonal_Output/04_candidate_junction_alignment/candidate_junction -U Clonal_Output/01_sequence_conversion/SRR030257_2.converted.fastq -S Clonal_Output/04_candidate_junction_alignment/SRR030257_2.candidate_junction.sam Time loading reference: 00:00:00 Time loading forward index: 00:00:00 Time loading mirror index: 00:00:00 Multiseed full-index search: 00:00:13 3752181 reads; of these: 3752181 (100.00%) were unpaired; of these: 3748811 (99.91%) aligned 0 times 840 (0.02%) aligned exactly 1 time 2530 (0.07%) aligned >1 times 0.09% overall alignment rate Time searching: 00:00:13 Overall time: 00:00:13 [system] rm -f Clonal_Output/04_candidate_junction_alignment/candidate_junction* +++ NOW PROCESSING Resolving alignments with junction candidates READ FILE:SRR030257_1 READS:10000 READS:20000 : READS:3720000 READS:3730000 READ FILE:SRR030257_2 READS:3740000 READS:3750000 : READS:7480000 READS:7490000 [system] rm -f Clonal_Output/01_sequence_conversion/SRR030257_1.converted.fastq [system] rm -f Clonal_Output/01_sequence_conversion/SRR030257_2.converted.fastq [system] rm -f Clonal_Output/02_reference_alignment/SRR030257_1.reference.sam [system] rm -f Clonal_Output/02_reference_alignment/SRR030257_2.reference.sam [system] rm -f Clonal_Output/04_candidate_junction_alignment/SRR030257_1.candidate_junction.sam* [system] rm -f Clonal_Output/04_candidate_junction_alignment/SRR030257_2.candidate_junction.sam* +++ NOW PROCESSING Creating BAM files [samtools] import Clonal_Output/03_candidate_junctions/candidate_junction.fasta.fai Clonal_Output/05_alignment_correction/junction.sam Clonal_Output/06_bam/junction.unsorted.bam [samtools] sort --threads 4 -o Clonal_Output/06_bam/junction.bam -T Clonal_Output/06_bam/junction.bam Clonal_Output/06_bam/junction.unsorted.bam [system] rm -f Clonal_Output/06_bam/junction.unsorted.bam [samtools] index Clonal_Output/06_bam/junction.bam Clonal_Output/06_bam/junction.bam.bai [samtools] import Clonal_Output/data/reference.fasta.fai Clonal_Output/05_alignment_correction/reference.sam Clonal_Output/06_bam/reference.unsorted.bam [samtools] sort --threads 4 -o Clonal_Output/data/reference.bam -T Clonal_Output/data/reference.bam Clonal_Output/06_bam/reference.unsorted.bam [system] rm -f Clonal_Output/06_bam/reference.unsorted.bam [samtools] index Clonal_Output/data/reference.bam Clonal_Output/data/reference.bam.bai [system] rm -f Clonal_Output/05_alignment_correction/reference.sam [system] rm -f Clonal_Output/05_alignment_correction/junction.sam +++ NOW PROCESSING Tabulating error counts REFERENCE: REL606 LENGTH: 4629812 POSITION:10000 POSITION:20000 POSITION:30000 : POSITION:4610000 POSITION:4620000 +++ NOW PROCESSING Re-calibrating base error rates [system] R --vanilla < /usr/local/bin/../share/breseq//coverage_distribution.r > Clonal_Output/07_error_calibration/0.unique_only_coverage_distribution.tab.r.log distribution_file=Clonal_Output/07_error_calibration/0.unique_only_coverage_distribution.tab plot_file=Clonal_Output/output/calibration/0.unique_coverage.pdf deletion_propagation_pr_cutoff=2.32374e-05 [system] R --vanilla in_file=Clonal_Output/07_error_calibration/base_qual_error_prob.SRR030257_1.tab out_file=Clonal_Output/output/calibration/SRR030257_1.error_rates.pdf < /usr/local/bin/../share/breseq//plot_error_rate.r > Clonal_Output/07_error_calibration/SRR030257_1.plot_error_rate.log [system] R --vanilla in_file=Clonal_Output/07_error_calibration/base_qual_error_prob.SRR030257_2.tab out_file=Clonal_Output/output/calibration/SRR030257_2.error_rates.pdf < /usr/local/bin/../share/breseq//plot_error_rate.r > Clonal_Output/07_error_calibration/SRR030257_2.plot_error_rate.log +++ NOW PROCESSING Examining read alignment evidence REFERENCE: REL606 LENGTH: 4629812 POSITION:10000 POSITION:20000 : POSITION:4610000 POSITION:4620000 +++ NOW PROCESSING Polymorphism statistics [system] R --vanilla total_length=4629812 in_file=Clonal_Output/08_mutation_identification/polymorphism_statistics_input.tab out_file=Clonal_Output/08_mutation_identification/polymorphism_statistics_output.tab qual_file=Clonal_Output/08_mutation_identification/error_counts.tab < /usr/local/bin/../share/breseq//polymorphism_statistics.r > Clonal_Output/08_mutation_identification/polymorphism_statistics_output.log +++ NOW PROCESSING Output Creating merged genome diff evidence file... Predicting mutations from evidence... Annotating mutations... Drawing coverage plots... Creating coverage plot for region: REL606:1-4629812 Creating coverage plot for region: REL606:546958-555936 Creating coverage plot for region: REL606:2031674-2054972 Creating coverage plot for region: REL606:3289962-3289977 Creating coverage plot for region: REL606:3893578-3901930 Creating read alignment for region: REL606:546957-546957 Creating read alignment for region: REL606:555937-555937 Creating read alignment for region: REL606:2031673-2031673 Creating read alignment for region: REL606:2054973-2054973 Creating read alignment for region: REL606:3289961-3289961 Creating read alignment for region: REL606:3289978-3289978 Creating read alignment for region: REL606:3893577-3893577 Creating read alignment for region: REL606:3901931-3901931 Creating read alignment for region: REL606:161041.0-161041.0 Creating read alignment for region: REL606:380188.0-380188.0 Creating read alignment for region: REL606:430835.0-430835.0 Creating read alignment for region: REL606:475288.1-475288.1 Creating read alignment for region: REL606:649391.0-649391.0 Creating read alignment for region: REL606:668787.0-668787.0 Creating read alignment for region: REL606:683496.0-683496.0 Creating read alignment for region: REL606:910316.0-910316.0 Creating read alignment for region: REL606:1004251.0-1004251.0 Creating read alignment for region: REL606:1248380.0-1248380.0 Creating read alignment for region: REL606:1286699.0-1286699.0 Creating read alignment for region: REL606:1329516.0-1329516.0 Creating read alignment for region: REL606:1976879.0-1976879.0 Creating read alignment for region: REL606:2082685.0-2082685.0 Creating read alignment for region: REL606:2732014.0-2732014.0 Creating read alignment for region: REL606:3045069.0-3045069.0 Creating read alignment for region: REL606:3155168.0-3155168.0 Creating read alignment for region: REL606:3200643.0-3200643.0 Creating read alignment for region: REL606:3248957.0-3248957.0 Creating read alignment for region: REL606:3288092.0-3288092.0 Creating read alignment for region: REL606:3339158.0-3339158.0 Creating read alignment for region: REL606:3369432.0-3369432.0 Creating read alignment for region: REL606:3370027.0-3370027.0 Creating read alignment for region: REL606:3483047.0-3483047.0 Creating read alignment for region: REL606:3490112.0-3490112.0 Creating read alignment for region: REL606:3762741.0-3762741.0 Creating read alignment for region: REL606:3875625.1-3875625.1 Creating read alignment for region: REL606:3893550.1-3893550.1 Creating read alignment for region: REL606:4100655.0-4100655.0 Creating read alignment for region: REL606:4126699.0-4126699.0 Creating read alignment for region: REL606:4201911.0-4201911.0 Creating read alignment for region: REL606:4616396.0-4616396.0 Creating read alignment for region: REL606:87812.0-87812.0 Creating read alignment for region: REL606:93946.0-93946.0 Creating read alignment for region: REL606:139812.0-139812.0 Creating read alignment for region: REL606:402056.0-402056.0 Creating read alignment for region: REL606:986054.0-986054.0 Creating read alignment for region: REL606:1318108.0-1318108.0 Creating read alignment for region: REL606:1509936.0-1509936.0 Creating read alignment for region: REL606:1812720.0-1812720.0 Creating read alignment for region: REL606:1854388.0-1854388.0 Creating read alignment for region: REL606:2167569.0-2167569.0 Creating read alignment for region: REL606:2698093.0-2698093.0 Creating read alignment for region: REL606:3357000.0-3357000.0 Creating read alignment for region: REL606:3357002.0-3357002.0 Creating read alignment for region: REL606:3357010.0-3357010.0 Creating read alignment for region: REL606:3357012.0-3357012.0 Creating read alignment for region: REL606:3357016.0-3357016.0 Creating read alignment for region: REL606:4080986.0-4080986.0 Creating read alignment for region: REL606:4462991.0-4462991.0 Creating read alignment for region: REL606:4462995.0-4462995.0 Creating read alignment for region: REL606:4462997.0-4462997.0 Creating read alignment for region: REL606__1__1__REL606__4629812__-1__0____36__36__0__0:36-37 Creating read alignment for region: REL606:1-1 Creating read alignment for region: REL606:4629812-4629812 Creating read alignment for region: REL606__16972__1__REL606__588495__1__0____36__36__0__1:36-37 Creating read alignment for region: REL606:16972-16972 Creating read alignment for region: REL606:588495-588495 Creating read alignment for region: REL606__16976__-1__REL606__590471__-1__2____36__36__0__1:36-39 Creating read alignment for region: REL606:16974-16974 Creating read alignment for region: REL606:590471-590471 Creating read alignment for region: REL606__72407__1__REL606__3650758__1__7____36__36__0__0:36-44 Creating read alignment for region: REL606:72407-72407 Creating read alignment for region: REL606:3650765-3650765 Creating read alignment for region: REL606__391964__-1__REL606__3875885__-1__-6__TTACTG__36__36__0__0:36-43 Creating read alignment for region: REL606:391964-391964 Creating read alignment for region: REL606:3875885-3875885 Creating read alignment for region: REL606__547699__-1__REL606__555924__1__0____36__36__1__1:36-37 Creating read alignment for region: REL606:547699-547699 Creating read alignment for region: REL606:555924-555924 Creating read alignment for region: REL606__589884__1__REL606__590471__-1__2____36__36__1__1:36-39 Creating read alignment for region: REL606:589886-589886 Creating read alignment for region: REL606:590471-590471 Creating read alignment for region: REL606__664688__1__REL606__1270658__1__2____36__36__1__0:36-39 Creating read alignment for region: REL606:664688-664688 Creating read alignment for region: REL606:1270660-1270660 Creating read alignment for region: REL606__809428__1__REL606__4141021__-1__5____36__36__0__0:36-42 Creating read alignment for region: REL606:809428-809428 Creating read alignment for region: REL606:4141016-4141016 Creating read alignment for region: REL606__1154778__1__REL606__4215353__-1__11____36__36__0__0:36-48 Creating read alignment for region: REL606:1154778-1154778 Creating read alignment for region: REL606:4215342-4215342 Creating read alignment for region: REL606__1733645__1__REL606__2775877__-1__2____36__36__0__1:36-39 Creating read alignment for region: REL606:1733647-1733647 Creating read alignment for region: REL606:2775877-2775877 Creating read alignment for region: REL606__1733649__-1__REL606__2774435__1__0____36__36__0__1:36-37 Creating read alignment for region: REL606:1733649-1733649 Creating read alignment for region: REL606:2774435-2774435 Creating read alignment for region: REL606__1802331__1__REL606__3944627__1__7____36__36__0__0:36-44 Creating read alignment for region: REL606:1802331-1802331 Creating read alignment for region: REL606:3944634-3944634 Creating read alignment for region: REL606__1821523__1__REL606__2775877__-1__2____36__36__0__1:36-39 Creating read alignment for region: REL606:1821525-1821525 Creating read alignment for region: REL606:2775877-2775877 Creating read alignment for region: REL606__1821528__-1__REL606__2774435__1__1____36__36__0__1:36-38 Creating read alignment for region: REL606:1821527-1821527 Creating read alignment for region: REL606:2774435-2774435 Creating read alignment for region: REL606__2103888__1__REL606__2103918__-1__11____36__36__0__0:36-48 Creating read alignment for region: REL606:2103888-2103888 Creating read alignment for region: REL606:2103907-2103907 Creating read alignment for region: REL606__2429311__1__REL606__2792997__-1__8____36__36__0__0:36-45 Creating read alignment for region: REL606:2429311-2429311 Creating read alignment for region: REL606:2792989-2792989 Creating read alignment for region: REL606__2772854__1__REL606__4524530__-1__3____36__36__1__0:36-40 Creating read alignment for region: REL606:2772854-2772854 Creating read alignment for region: REL606:4524527-4524527 Creating read alignment for region: REL606__2774196__-1__REL606__4524519__1__3____36__36__1__0:36-40 Creating read alignment for region: REL606:2774196-2774196 Creating read alignment for region: REL606:4524522-4524522 Creating read alignment for region: REL606__2775877__-1__REL606__3015769__1__2____36__36__1__0:36-39 Creating read alignment for region: REL606:2775877-2775877 Creating read alignment for region: REL606:3015771-3015771 Creating read alignment for region: REL606__3015774__-1__REL606__3894996__-1__0____36__36__0__1:36-37 Creating read alignment for region: REL606:3015774-3015774 Creating read alignment for region: REL606:3894996-3894996 Creating read alignment for region: REL606__3289961__-1__REL606__3289978__1__0____36__36__0__0:36-37 Creating read alignment for region: REL606:3289961-3289961 Creating read alignment for region: REL606:3289978-3289978 Creating read alignment for region: REL606__3630716__-1__REL606__3652533__-1__0____36__36__1__1:36-37 Creating read alignment for region: REL606:3630716-3630716 Creating read alignment for region: REL606:3652533-3652533 Creating read alignment for region: REL606__3651091__1__REL606__3741531__-1__0____36__36__1__1:36-37 Creating read alignment for region: REL606:3651091-3651091 Creating read alignment for region: REL606:3741531-3741531 Creating read alignment for region: REL606__3652533__-1__REL606__3901929__1__2____36__36__1__0:36-39 Creating read alignment for region: REL606:3652533-3652533 Creating read alignment for region: REL606:3901931-3901931 Creating read alignment for region: REL606__3749572__-1__REL606__4157637__-1__6____36__36__0__0:36-43 Creating read alignment for region: REL606:3749572-3749572 Creating read alignment for region: REL606:4157631-4157631 Creating read alignment for region: REL606__3813062__-1__REL606__4157637__-1__6____36__36__0__0:36-43 Creating read alignment for region: REL606:3813062-3813062 Creating read alignment for region: REL606:4157631-4157631 Creating read alignment for region: REL606__3944627__1__REL606__4157637__-1__7____36__36__0__0:36-44 Creating read alignment for region: REL606:3944627-3944627 Creating read alignment for region: REL606:4157630-4157630 Creating index HTML table... [system] rm -f Clonal_Output/01_sequence_conversion/*.trims [system] rm -f Clonal_Output/03_candidate_junctions/candidate_junction.fasta [system] rm -f Clonal_Output/03_candidate_junctions/candidate_junction.fasta.fai [system] rm -f Clonal_Output/05_alignment_correction/jc_evidence.gd [system] rm -f Clonal_Output/06_bam/junction.bam [system] rm -f Clonal_Output/06_bam/junction.bam.bai +++ SUCCESSFULLY COMPLETED
ディレクトリ Colonal Outputを見る。出力はWebで見やすいようになっている。
ここでは、サーバーからディレクトリ Colonal Output全体をローカルWinにコピーして その中で見て回る。
index.htmlからはじまる。
BRESEQ __ Mutation Predictions.pdf
BRESEQ __ Marginal Predictions.pdf
BRESEQ __ Summary Statistics.pdf
2017-03-14_breseq_cmdlineの解説.txt
// 01 sequence conversion ファイル SRR030257_1 ファイル SRR030257_2 | +------------------+ | シーケンス変換 | 入力をfastaやGFF3に変換する、quality scoreをSangerにする +------------------+ | +-------------------+ | SAMのfaidxを作る| samtools_faidx(reference.fasta) +-------------------+ | +--------------------+ | trimファイルを作る| calculate_trims(..) +--------------------+ | // 02 reference alignment | BUILDING MAPPING INDEX +------------------------------------------------------------------------------+ | bowtie2-build -q reference.fasta 02_reference_alignment/reference | +------------------------------------------------------------------------------+ | STAGE-1 ALIGNMENT +------------------------------------------------------------------------------+ | bowtie2 -t -p 4 --local -L 18 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 | | --ignore-quals -k 2000 -i S,1,0.25 --score-min L,1,0.9 --reorder | | -x 02_reference_alignment/reference | | -U 01_sequence_conversion/SRR030257_1.converted.fastq | | -S 02_reference_alignment/SRR030257_1.stage1.sam | | --un 02_reference_alignment/SRR030257_1.stage1.unmatched.fastq | +------------------------------------------------------------------------------+ | 83.94% overall alignment rate +------------------------------------------------------------------------------+ | bowtie2 -t -p 4 --local -L 18 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 | | --ignore-quals -k 2000 -i S,1,0.25 --score-min L,1,0.9 --reorder | | -x 02_reference_alignment/reference | | -U 01_sequence_conversion/SRR030257_2.converted.fastq | | -S 02_reference_alignment/SRR030257_2.stage1.sam | | --un 02_reference_alignment/SRR030257_2.stage1.unmatched.fastq | +-------------------------------------------------------------------------------+ | 83.67% overall alignment rate | STAGE-2 ALGINMENT +-------------------------------------------------------------------------------+ | bowtie2 -t -p 4 --local -L 9 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 | | --ignore-quals -k 2000 -i S,1,0.25 --score-min L,6,0.2 --reorder | | -x 02_reference_alignment/reference | | -U 02_reference_alignment/SRR030257_1.stage1.unmatched.fastq | | -S 02_reference_alignment/SRR030257_1.stage2.matched.sam + +--------------------------------------------------------------------------------+ | 99.38% overall alignment rate +--------------------------------------------------------------------------------+ | bowtie2 -t -p 4 --local -L 9 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 | | --ignore-quals -k 2000 -i S,1,0.25 --score-min L,6,0.2 --reorder | | -x 02_reference_alignment/reference | | -U 02_reference_alignment/SRR030257_2.stage1.unmatched.fastq | | -S 02_reference_alignment/SRR030257_2.stage2.matched.sam | +--------------------------------------------------------------------------------+ | 99.05% overall alignment rate | MERGE SAM FILES +---------------------------------------------------------------------------------+ | PreprocessAlignments::merge_sort_sam_files( | | stage1_reference_sam_file_name, stage2_reference_sam_file_name, | | reference_sam_file_name ); | +---------------------------------------------------------------------------------+ | // 03_candidate_junctions | Preprocessing alignments for candidate junction identification +----------------------------------------------------------------------------------+ | PreprocessAlignments::preprocess_alignments( | | settings, summary, ref_seq_info); | +----------------------------------------------------------------------------------+ | Aligned reads: 7481861 | Read alignments: 19713263 | Alignments split on indels: 21 | Reads with alignments split on indels: 21 | Split alignments: 6090236 | Reads with split alignments: 601157 | Preliminary analysis of coverage distribution +-----------------------------------------------------------------------------------+ | samtools_import(data/reference.fasta.fai, | | 03_candidate_junctions/best.sam, | | 03_candidate_junctions/best.unsorted.bam | +-----------------------------------------------------------------------------------+ | +-----------------------------------------------------------------------------------+ | samtools_sort( --threads 4 -o 03_candidate_junctions/best.bam, | | -T 03_candidate_junctions/best.bam, | | 03_candidate_junctions/best.unsorted.bam | +-----------------------------------------------------------------------------------+ | +----------------------------------------------------------------------------------------+ | samtools_index( | | 03_candidate_junctions/best.bam 03_candidate_junctions/best.bam.bai ) | +----------------------------------------------------------------------------------------+ | +-----------------------------------------------------------------------------------+ | CoverageDistribution::analyze_unique_coverage_distributions(settings, | | summary, ref_seq_info, settings.coverage_junction_plot_file_name, | | settings.coverage_junction_distribution_file_name, | | settings.coverage_junction_done_file_name ); | +--+ この中でRを呼んで処理 | | R --vanilla | | < /usr/local/bin/../share/breseq//coverage_distribution.r | | > 03_candidate_junctions/0.unique_only_coverage_distribution.tab.r.log | | distribution_file= | | 03_candidate_junctions/0.unique_only_coverage_distribution.tab | | plot_file=03_candidate_junctions/0.coverage.pdf | | deletion_propagation_pr_cutoff=2.32374e-05 | +-------------------------------------------------------------------------------+ | Identifying junction candidates +-----------------------------------------------------------------------------------+ | CandidateJunctions::identify_candidate_junctions( | | settings, summary, ref_seq_info); | +-----------------------------------------------------------------------------------+ | Passed alignment pairs examined: 100001 | WARNING: Reached limit of 100000 passed alignment pairs. | Specify a greater value for --junction-alignment-pair-limit | for more thorough junction prediction. | Taking top candidate junctions... | Minimum number to keep: 100 | Maximum number to keep: 5000 | Maximum length to keep: 462981 bases | Initial: Number = 76, Cumulative Length = 5723 bases | Accepted: Number = 76, Pos Hash Score >= 2, | Cumulative Length = 5723 bases | +-----------------------------------------------------------------------------------+ | samtools_faidx 03_candidate_junctions/candidate_junction.fasta | +-----------------------------------------------------------------------------------+ | Re-alignment to junction candidates +-----------------------------------------------------------------------------------+ | bowtie2-build -q 03_candidate_junctions/candidate_junction.fasta | | 04_candidate_junction_alignment/candidate_junction | +-----------------------------------------------------------------------------------+ | +-----------------------------------------------------------------------------------+ | bowtie2 -t -p 4 --local -L 10 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 | | --ignore-quals -k 2000 -i S,1,0.25 --score-min L,1,0.70 --reorder | | -x 04_candidate_junction_alignment/candidate_junction | | -U 01_sequence_conversion/SRR030257_1.converted.fastq | | -S 04_candidate_junction_alignment/SRR030257_1.candidate_junction.sam | +-----------------------------------------------------------------------------------+ | 0.09% overall alignment rate +-----------------------------------------------------------------------------------+ | bowtie2 -t -p 4 --local -L 10 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 | | --ignore-quals -k 2000 -i S,1,0.25 --score-min L,1,0.70 --reorder | | -x 04_candidate_junction_alignment/candidate_junction | | -U 01_sequence_conversion/SRR030257_2.converted.fastq | | -S 04_candidate_junction_alignment/SRR030257_2.candidate_junction.sam +------------------------------------------------------------------------------------+ | 0.09% overall alignment rate // 05 alignment_correction <-- line 1609 // * Resolve matches to new junction candidates | Resolving alignments with junction candidates +------------------------------------------------------------------------------------+ | resolve_alignments( settings, summary, ref_seq_info, junction_prediction, | | settings.read_files ); | +------------------------------------------------------------------------------------+ | // 05 bam // * Create BAM read alignment database files | Creating BAM files +-------------------------------------------------------------------------------------+ | samtools_import( 03_candidate_junctions/candidate_junction.fasta.fai | | 05_alignment_correction/junction.sam 06_bam/junction.unsorted.bam ) | +--------------------------------------------------------------------------------------+ | +--------------------------------------------------------------------------------------+ | samtools_sort( --threads 4 -o 06_bam/junction.bam | | -T 06_bam/junction.bam 06_bam/junction.unsorted.bam ) | +--------------------------------------------------------------------------------------+ | +--------------------------------------------------------------------------------------+ | samtools_index( 06_bam/junction.bam 06_bam/junction.bam.bai ) | +--------------------------------------------------------------------------------------+ | +--------------------------------------------------------------------------------------+ | samtools_import ( data/reference.fasta.fai | | 05_alignment_correction/reference.sam 06_bam/reference.unsorted.bam )| +--------------------------------------------------------------------------------------+ | +--------------------------------------------------------------------------------------+ | samtools_sort( --threads 4 -o data/reference.bam | | -T data/reference.bam /06_bam/reference.unsorted.bam ) | +--------------------------------------------------------------------------------------+ | +--------------------------------------------------------------------------------------+ | samtools_index( data/reference.bam data/reference.bam.bai ) | +--------------------------------------------------------------------------------------+ | Tabulating error counts +-------------------------------------------------------------------------------------+ | error_count( settings, summary, reference_bam_file_name, | | reference_fasta_file_name, settings.error_calibration_path, | | settings.read_files.base_names(), coverage=true, errors=true, | | preprocess=false, settings.base_quality_cutoff, // minimum quality score| | read_set=num_read_files,obs_base,ref_base,quality=num_qual ); | +-------------------------------------------------------------------------------------+ | // Calculate error rates | Re-calibrating base error rates +------------------------------------------------------------------------------------+ | CoverageDistribution::analyze_unique_coverage_distributions ( | | settings, summary, ref_seq_info, | | settings.unique_only_coverage_plot_file_name, | | settings.unique_only_coverage_distribution_file_name, | | "NULL" // means to never delete intermediates ); | +--+ この中でfitを呼ぶ ------------------------------------------------------------+ | vector<string> lines = dist.fit(settings, | | unique_only_coverage_distribution_file_name, | | unique_only_coverage_plot_file_name, | | deletion_propagation_pr_cutoff ); | +--+ この中でRを呼ぶ ---------------------------------------------------------+ | R --vanilla | | < /usr/local/bin/../share/breseq//coverage_distribution.r | | > 07_error_calibration/0.unique_only_coverage_distribution.tab.r.log | | distribution_file=07_error_calibration/0.unique_only_coverage_distribution.tab | | plot_file=output/calibration/0.unique_coverage.pdf | | deletion_propagation_pr_cutoff=2.32374e-05 | +--------------------------------------------------------------------------- + | +----------------------------------------------------------------------------+ | R --vanilla | | in_file=07_error_calibration/base_qual_error_prob.SRR030257_1.tab| | out_file=output/calibration/SRR030257_1.error_rates.pdf | | < /usr/local/bin/../share/breseq//plot_error_rate.r | | > 07_error_calibration/SRR030257_1.plot_error_rate.log | +----------------------------------------------------------------------------+ | +----------------------------------------------------------------------------+ | R --vanilla | | in_file=07_error_calibration/base_qual_error_prob.SRR030257_2.tab| | out_file=output/calibration/SRR030257_2.error_rates.pdf | | < /usr/local/bin/../share/breseq//plot_error_rate.r | | > 07_error_calibration/SRR030257_2.plot_error_rate.log | +---------------------------------------------------------------------------+ | Examining read alignment evidence +-------------------------------------------------------------------------------------+ | identify_mutations( settings, summary, reference_bam_file_name, | | reference_fasta_file_name, ra_mc_genome_diff_file_name, | | deletion_propagation_cutoffs, deletion_seed_cutoffs, | | settings.mutation_log10_e_value_cutoff, // mutation_cutoff | | settings.polymorphism_log10_e_value_cutoff, // polymorphism_cutoff | | settings.polymorphism_frequency_cutoff, //polymorphism_frequency_cutoff| | settings.polymorphism_precision_decimal, | | settings.polymorphism_precision_places, | | settings.print_mutation_identification_per_position_file //per_position_file );| +--------------------------------------------------------------------------------------+ | Polymorphism statistics +--------------------------------------------------------------------------------------+ | ref_seq_info.polymorphism_statistics(settings, summary) | +--+ この中でpolymorphism_statisticsを呼ぶ ------------------------------------+ | cReferenceSequences::polymorphism_statistics | +--+ この中でRを呼ぶ --------------------------------------------------------+ | R --vanilla total_length=4629812 | | in_file=08_mutation_identification/polymorphism_statistics_input.tab| | out_file=08_mutation_identification/polymorphism_statistics_output.tab| | qual_file=08_mutation_identification/error_counts.tab | | < /usr/local/bin/../share/breseq//polymorphism_statistics.r | | > /08_mutation_identification/polymorphism_statistics_output.log | +------------------------------------------------------------------------------+ | Output +---------------------------------------------------------------------------+ | Creating merged genome diff evidence file... | | Predicting mutations from evidence... | | Annotating mutations... | | Drawing coverage plots... | | Creating index HTML table... | +----------------------------------------------------------------------------+ | SUCCESSFULLY COMPLETED