[[ノート/ノート]]~
訪問者数 &counter(); 最終更新 &lastmod();
[[ノート/breseq/チュートリアルを試す1]] > [[ノート/breseq/チュートリアルを試す2]] > [[ノート/breseq/UT-Austinの進化データを試す]]~
[[ノート/breseq/Multi-Isolate1]] > [[ノート/breseq/Multi-Isolate2]]
*breseq 2017-02-25 [#dafede4e]
Identification of mutations in laboratory evolved microbes from
next-generation sequencing data using breseq
3.1 Predicting mutations in a clonal sample (page 3)
$ breseq -j 4 -o Clonal_Output -r REL606.gbk SRR030257_1.fastq SRR030257_2.fastq
出力はHTMLで(見やすく)作られるので、[[この場所:http://ginger.yy.is.sci.toho-u.ac.jp/~yamanouc/NGS2017/SRR030257_output]] に置いた。
>(2017-04-08追加) 参照シーケンスをREL606とは別の「O157H7」にしてみた。~
チュートリアルでは親株のREL616との差異(変異)を見ようとして、bowtie2での参照シーケンスをREL606としてそれにリードを張り付けている。~
ここの追加では、試しに、参照シーケンスを変えてみる。NCBIが貼っているEscherichia coliの参照シーケンスのうち、適当に選んでみる。今回はO157-H7のアノテーション付きGeneBankデータを使ってみた。~
[[結果はここに置いた。:http://ginger.yy.is.sci.toho-u.ac.jp/~yamanouc/NGS2017/SRR030257-O157H7_output]]~
元の試行と極端に異なるのが、outputをまとめるための処理時間である。Summary Statisticsのタブを開くと、下の方にExecution Timeの表が載っている。O157-H7の方が全体に長くかかる傾向があるが、特にOutputのセクションが長い。
元の試行と極端に異なるのが、outputをまとめるための処理時間である。Summary Statisticsのタブを開くと、下の方にExecution Timeの表が載っている。O157-H7の方が全体に長くかかる傾向があるが、特にPolymorphic StatisticsとOutputのセクションが伸びている。(Polymorphic Statisticsが43倍、Outputが1430倍)
| step |REL606 | O157-H7 |
|Read and reference sequence file input|41 seconds|49 seconds|
|Read alignment to reference genome |23 minutes 46 seconds|36 minutes 11 seconds|
|Preprocessing alignments for candidate junction identification|2 minutes 43 seconds|6 minutes 50 seconds|
|Preliminary analysis of coverage distribution|1 minute 38 seconds|1 minute 41 seconds|
|Identifying junction candidates|14 seconds|17 seconds|
|Re-alignment to junction candidates|27 seconds|44 seconds|
|Resolving alignments with junction candidates|2 minutes 58 seconds|3 minutes 34 seconds|
|Creating BAM files|1 minute 26 seconds|1 minute 25 seconds|
|Tabulating error counts|30 seconds|49 seconds|
|Re-calibrating base error rates|1 second|2 seconds|
|Examining read alignment evidence| 5 minutes 51 seconds|15 minutes 45 seconds|
|Polymorphism statistics| 1 second| 43 seconds|
|Output | 21 seconds | 8 hours 20 minutes 28 seconds|
|Total | 40 minutes 37 seconds |9 hours 29 minutes 18 seconds|
breseqが何をやっているかを詳細に見たい。実行ログが吐き出されるので、以下に貼ってみた。これをていねいに追うと、何をどのように(=どのようなパラメタを与えて)やっているかが見えてくる。
[yamanouc@ginger Clonal_Sample]$ breseq -j 4 -o Clonal_Output -r REL606.gbk SRR030257_1.fastq SRR030257_2.fastq
================================================================================
breseq 0.30.0 http://barricklab.org/breseq
Active Developers: Barrick JE, Deatherage DE
Contact: <jeffrey.e.barrick@gmail.com>
breseq is free software; you can redistribute it and/or modify it under the
terms the GNU General Public License as published by the Free Software
Foundation; either version 2, or (at your option) any later version.
Copyright (c) 2008-2010 Michigan State University
Copyright (c) 2011-2016 The University of Texas at Austin
If you use breseq in your research, please cite:
Deatherage, D.E., Barrick, J.E. (2014) Identification of mutations
in laboratory-evolved microbes from next-generation sequencing
data using breseq. Methods Mol. Biol. 1151: 165?188.
If you use structural variation (junction) predictions, please cite:
Barrick, J.E., Colburn, G., Deatherage D.E., Traverse, C.C.,
Strand, M.D., Borges, J.J., Knoester, D.B., Reba, A., Meyer, A.G.
(2014) Identifying structural variation in haploid microbial genomes
from short-read resequencing data using breseq. BMC Genomics 15:1039.
================================================================================
---> bowtie2 :: version 2.3.0 [/usr/local/bin/bowtie2]
---> R :: version 3.3.2 [/usr/bin/R]
+++ NOW PROCESSING Read and reference sequence file input
READ FILE::SRR030257_1
Converting/filtering FASTQ file...
Original base quality format: SANGER New format: SANGER
Original reads: 3800180 bases: 136806480
Filtered reads: 19 (?50% N) 60951 (?90% one base)
Analyzed reads: 3739210 bases: 134611560
READ FILE::SRR030257_2
Converting/filtering FASTQ file...
Original base quality format: SANGER New format: SANGER
Original reads: 3800180 bases: 136806480
Filtered reads: 369 (?50% N) 47630 (?90% one base)
Analyzed reads: 3752181 bases: 135078516
::TOTAL::
Original reads: 7600360 bases: 273612960
Analyzed reads: 7491391 bases: 269690076
[samtools] faidx Clonal_Output/data/reference.fasta
REFERENCE: REL606
LENGTH: 4629812
+++ NOW PROCESSING Read alignment to reference genome
[system] bowtie2-build -q Clonal_Output/data/reference.fasta Clonal_Output/02_reference_alignment/reference
[system] bowtie2 -t -p 4 --local -L 18 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 --ignore-quals -k 2000 -i S,1,0.25 --score-min L,1,0.9 --reorder
-x Clonal_Output/02_reference_alignment/reference -U Clonal_Output/01_sequence_conversion/SRR030257_1.converted.fastq
-S Clonal_Output/02_reference_alignment/SRR030257_1.stage1.sam
--un Clonal_Output/02_reference_alignment/SRR030257_1.stage1.unmatched.fastq
Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Multiseed full-index search: 00:00:33
3739210 reads; of these:
3739210 (100.00%) were unpaired; of these:
600703 (16.06%) aligned 0 times
3040853 (81.32%) aligned exactly 1 time
97654 (2.61%) aligned >1 times
83.94% overall alignment rate
Time searching: 00:00:33
Overall time: 00:00:33
[system] bowtie2 -t -p 4 --local -L 18 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 --ignore-quals -k 2000 -i S,1,0.25 --score-min L,1,0.9 --reorder
-x Clonal_Output/02_reference_alignment/reference -U Clonal_Output/01_sequence_conversion/SRR030257_2.converted.fastq
-S Clonal_Output/02_reference_alignment/SRR030257_2.stage1.sam
--un Clonal_Output/02_reference_alignment/SRR030257_2.stage1.unmatched.fastq
Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Multiseed full-index search: 00:00:34
3752181 reads; of these:
3752181 (100.00%) were unpaired; of these:
612595 (16.33%) aligned 0 times
3042265 (81.08%) aligned exactly 1 time
97321 (2.59%) aligned >1 times
83.67% overall alignment rate
Time searching: 00:00:34
Overall time: 00:00:34
[system] bowtie2 -t -p 4 --local -L 9 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 --ignore-quals -k 2000 -i S,1,0.25 --score-min L,6,0.2 --reorder
-x Clonal_Output/02_reference_alignment/reference -U Clonal_Output/02_reference_alignment/SRR030257_1.stage1.unmatched.fastq
-S Clonal_Output/02_reference_alignment/SRR030257_1.stage2.matched.sam
Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Multiseed full-index search: 00:10:58
600703 reads; of these:
600703 (100.00%) were unpaired; of these:
3722 (0.62%) aligned 0 times
10227 (1.70%) aligned exactly 1 time
586754 (97.68%) aligned >1 times
99.38% overall alignment rate
Time searching: 00:10:58
Overall time: 00:10:58
[system] bowtie2 -t -p 4 --local -L 9 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 --ignore-quals -k 2000 -i S,1,0.25 --score-min L,6,0.2 --reorder
-x Clonal_Output/02_reference_alignment/reference -U Clonal_Output/02_reference_alignment/SRR030257_2.stage1.unmatched.fastq
-S Clonal_Output/02_reference_alignment/SRR030257_2.stage2.matched.sam
Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Multiseed full-index search: 00:11:03
612595 reads; of these:
612595 (100.00%) were unpaired; of these:
5810 (0.95%) aligned 0 times
13420 (2.19%) aligned exactly 1 time
593365 (96.86%) aligned >1 times
99.05% overall alignment rate
Time searching: 00:11:03
Overall time: 00:11:03
[system] rm -f Clonal_Output/02_reference_alignment/reference*
[system] rm -f Clonal_Output/02_reference_alignment/SRR030257_1.stage1.unmatched.fastq
[system] rm -f Clonal_Output/02_reference_alignment/SRR030257_1.stage1.sam
[system] rm -f Clonal_Output/02_reference_alignment/SRR030257_2.stage1.unmatched.fastq
[system] rm -f Clonal_Output/02_reference_alignment/SRR030257_2.stage1.sam
[system] rm -f Clonal_Output/02_reference_alignment/SRR030257_1.stage2.matched.sam
[system] rm -f Clonal_Output/02_reference_alignment/SRR030257_2.stage2.matched.sam
+++ NOW PROCESSING Preprocessing alignments for candidate junction identification
Preprocessing alignments.
READ FILE::SRR030257_1
ALIGNED READ:100000
ALIGNED READ:200000
ALIGNED READ:300000
:
ALIGNED READ:3500000
ALIGNED READ:3600000
ALIGNED READ:3700000
READ FILE::SRR030257_2
ALIGNED READ:3800000
ALIGNED READ:3900000
ALIGNED READ:4000000
:
ALIGNED READ:7200000
ALIGNED READ:7300000
ALIGNED READ:7400000
Summary...
Aligned reads: 7481861
Read alignments: 19713263
Alignments split on indels: 21
Reads with alignments split on indels: 21
Split alignments: 6090236
Reads with split alignments: 601157
+++ NOW PROCESSING Preliminary analysis of coverage distribution
[samtools] import Clonal_Output/data/reference.fasta.fai Clonal_Output/03_candidate_junctions/best.sam Clonal_Output/03_candidate_junctions/best.unsorted.bam
[samtools] sort --threads 4 -o Clonal_Output/03_candidate_junctions/best.bam
-T Clonal_Output/03_candidate_junctions/best.bam Clonal_Output/03_candidate_junctions/best.unsorted.bam
[samtools] index Clonal_Output/03_candidate_junctions/best.bam Clonal_Output/03_candidate_junctions/best.bam.bai
REFERENCE: REL606
LENGTH: 4629812
POSITION:10000
POSITION:20000
POSITION:30000
:
POSITION:4610000
POSITION:4620000
[system] R --vanilla < /usr/local/bin/../share/breseq//coverage_distribution.r >
Clonal_Output/03_candidate_junctions/0.unique_only_coverage_distribution.tab.r.log
distribution_file=Clonal_Output/03_candidate_junctions/0.unique_only_coverage_distribution.tab
plot_file=Clonal_Output/03_candidate_junctions/0.coverage.pdf deletion_propagation_pr_cutoff=2.32374e-05
[system] rm -f Clonal_Output/03_candidate_junctions/best.unsorted.bam
[system] rm -f Clonal_Output/03_candidate_junctions/best.sam
[system] rm -f Clonal_Output/03_candidate_junctions/best.bam
[system] rm -f Clonal_Output/03_candidate_junctions/best.bam.bai
[system] rm -f Clonal_Output/03_candidate_junctions/0.coverage.pdf
[system] rm -f Clonal_Output/03_candidate_junctions/0.unique_only_coverage_distribution.tab
[system] rm -f Clonal_Output/03_candidate_junctions/0.unique_only_coverage_distribution.tab.r.log
+++ NOW PROCESSING Identifying junction candidates
READ FILE::SRR030257_1
ALIGNED READ:10000 CANDIDATE JUNCTIONS:2852
ALIGNED READ:20000 CANDIDATE JUNCTIONS:5715
ALIGNED READ:30000 CANDIDATE JUNCTIONS:8570
:
ALIGNED READ:270000 CANDIDATE JUNCTIONS:83719
ALIGNED READ:280000 CANDIDATE JUNCTIONS:87232
Passed alignment pairs examined: 100001
WARNING: Reached limit of 100000 passed alignment pairs.
Specify a greater value for --junction-alignment-pair-limit for more thorough junction prediction.
Taking top candidate junctions...
Minimum number to keep: 100
Maximum number to keep: 5000
Maximum length to keep: 462981 bases
Initial: Number = 76, Cumulative Length = 5723 bases
Testing Pos Hash Score = 11, Num = 1, Length = 75
Testing Pos Hash Score = 10, Num = 1, Length = 74
Testing Pos Hash Score = 9, Num = 2, Length = 147
Testing Pos Hash Score = 8, Num = 4, Length = 297
Testing Pos Hash Score = 7, Num = 2, Length = 149
Testing Pos Hash Score = 6, Num = 6, Length = 436
Testing Pos Hash Score = 5, Num = 3, Length = 220
Testing Pos Hash Score = 4, Num = 10, Length = 734
Testing Pos Hash Score = 3, Num = 10, Length = 742
Testing Pos Hash Score = 2, Num = 37, Length = 2849
Accepted: Number = 76, Pos Hash Score >= 2, Cumulative Length = 5723 bases
[samtools] faidx Clonal_Output/03_candidate_junctions/candidate_junction.fasta
[system] rm -f Clonal_Output/03_candidate_junctions/SRR030257_1.split.sam
[system] rm -f Clonal_Output/03_candidate_junctions/SRR030257_2.split.sam
+++ NOW PROCESSING Re-alignment to junction candidates
[system] bowtie2-build -q Clonal_Output/03_candidate_junctions/candidate_junction.fasta Clonal_Output/04_candidate_junction_alignment/candidate_junction
[system] bowtie2 -t -p 4 --local -L 10 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 --ignore-quals -k 2000 -i S,1,0.25 --score-min L,1,0.70 --reorder
-x Clonal_Output/04_candidate_junction_alignment/candidate_junction -U Clonal_Output/01_sequence_conversion/SRR030257_1.converted.fastq
-S Clonal_Output/04_candidate_junction_alignment/SRR030257_1.candidate_junction.sam
Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Multiseed full-index search: 00:00:13
3739210 reads; of these:
3739210 (100.00%) were unpaired; of these:
3735689 (99.91%) aligned 0 times
920 (0.02%) aligned exactly 1 time
2601 (0.07%) aligned >1 times
0.09% overall alignment rate
Time searching: 00:00:13
Overall time: 00:00:13
[system] bowtie2 -t -p 4 --local -L 10 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 --ignore-quals -k 2000 -i S,1,0.25 --score-min L,1,0.70 --reorder
-x Clonal_Output/04_candidate_junction_alignment/candidate_junction -U Clonal_Output/01_sequence_conversion/SRR030257_2.converted.fastq
-S Clonal_Output/04_candidate_junction_alignment/SRR030257_2.candidate_junction.sam
Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Multiseed full-index search: 00:00:13
3752181 reads; of these:
3752181 (100.00%) were unpaired; of these:
3748811 (99.91%) aligned 0 times
840 (0.02%) aligned exactly 1 time
2530 (0.07%) aligned >1 times
0.09% overall alignment rate
Time searching: 00:00:13
Overall time: 00:00:13
[system] rm -f Clonal_Output/04_candidate_junction_alignment/candidate_junction*
+++ NOW PROCESSING Resolving alignments with junction candidates
READ FILE:SRR030257_1
READS:10000
READS:20000
:
READS:3720000
READS:3730000
READ FILE:SRR030257_2
READS:3740000
READS:3750000
:
READS:7480000
READS:7490000
[system] rm -f Clonal_Output/01_sequence_conversion/SRR030257_1.converted.fastq
[system] rm -f Clonal_Output/01_sequence_conversion/SRR030257_2.converted.fastq
[system] rm -f Clonal_Output/02_reference_alignment/SRR030257_1.reference.sam
[system] rm -f Clonal_Output/02_reference_alignment/SRR030257_2.reference.sam
[system] rm -f Clonal_Output/04_candidate_junction_alignment/SRR030257_1.candidate_junction.sam*
[system] rm -f Clonal_Output/04_candidate_junction_alignment/SRR030257_2.candidate_junction.sam*
+++ NOW PROCESSING Creating BAM files
[samtools] import Clonal_Output/03_candidate_junctions/candidate_junction.fasta.fai Clonal_Output/05_alignment_correction/junction.sam
Clonal_Output/06_bam/junction.unsorted.bam
[samtools] sort --threads 4 -o Clonal_Output/06_bam/junction.bam -T Clonal_Output/06_bam/junction.bam Clonal_Output/06_bam/junction.unsorted.bam
[system] rm -f Clonal_Output/06_bam/junction.unsorted.bam
[samtools] index Clonal_Output/06_bam/junction.bam Clonal_Output/06_bam/junction.bam.bai
[samtools] import Clonal_Output/data/reference.fasta.fai Clonal_Output/05_alignment_correction/reference.sam Clonal_Output/06_bam/reference.unsorted.bam
[samtools] sort --threads 4 -o Clonal_Output/data/reference.bam -T Clonal_Output/data/reference.bam Clonal_Output/06_bam/reference.unsorted.bam
[system] rm -f Clonal_Output/06_bam/reference.unsorted.bam
[samtools] index Clonal_Output/data/reference.bam Clonal_Output/data/reference.bam.bai
[system] rm -f Clonal_Output/05_alignment_correction/reference.sam
[system] rm -f Clonal_Output/05_alignment_correction/junction.sam
+++ NOW PROCESSING Tabulating error counts
REFERENCE: REL606
LENGTH: 4629812
POSITION:10000
POSITION:20000
POSITION:30000
:
POSITION:4610000
POSITION:4620000
+++ NOW PROCESSING Re-calibrating base error rates
[system] R --vanilla < /usr/local/bin/../share/breseq//coverage_distribution.r >
Clonal_Output/07_error_calibration/0.unique_only_coverage_distribution.tab.r.log
distribution_file=Clonal_Output/07_error_calibration/0.unique_only_coverage_distribution.tab
plot_file=Clonal_Output/output/calibration/0.unique_coverage.pdf
deletion_propagation_pr_cutoff=2.32374e-05
[system] R --vanilla in_file=Clonal_Output/07_error_calibration/base_qual_error_prob.SRR030257_1.tab
out_file=Clonal_Output/output/calibration/SRR030257_1.error_rates.pdf <
/usr/local/bin/../share/breseq//plot_error_rate.r >
Clonal_Output/07_error_calibration/SRR030257_1.plot_error_rate.log
[system] R --vanilla in_file=Clonal_Output/07_error_calibration/base_qual_error_prob.SRR030257_2.tab
out_file=Clonal_Output/output/calibration/SRR030257_2.error_rates.pdf <
/usr/local/bin/../share/breseq//plot_error_rate.r >
Clonal_Output/07_error_calibration/SRR030257_2.plot_error_rate.log
+++ NOW PROCESSING Examining read alignment evidence
REFERENCE: REL606
LENGTH: 4629812
POSITION:10000
POSITION:20000
:
POSITION:4610000
POSITION:4620000
+++ NOW PROCESSING Polymorphism statistics
[system] R --vanilla total_length=4629812 in_file=Clonal_Output/08_mutation_identification/polymorphism_statistics_input.tab
out_file=Clonal_Output/08_mutation_identification/polymorphism_statistics_output.tab qual_file=Clonal_Output/08_mutation_identification/error_counts.tab <
/usr/local/bin/../share/breseq//polymorphism_statistics.r >
Clonal_Output/08_mutation_identification/polymorphism_statistics_output.log
+++ NOW PROCESSING Output
Creating merged genome diff evidence file...
Predicting mutations from evidence...
Annotating mutations...
Drawing coverage plots...
Creating coverage plot for region: REL606:1-4629812
Creating coverage plot for region: REL606:546958-555936
Creating coverage plot for region: REL606:2031674-2054972
Creating coverage plot for region: REL606:3289962-3289977
Creating coverage plot for region: REL606:3893578-3901930
Creating read alignment for region: REL606:546957-546957
Creating read alignment for region: REL606:555937-555937
Creating read alignment for region: REL606:2031673-2031673
Creating read alignment for region: REL606:2054973-2054973
Creating read alignment for region: REL606:3289961-3289961
Creating read alignment for region: REL606:3289978-3289978
Creating read alignment for region: REL606:3893577-3893577
Creating read alignment for region: REL606:3901931-3901931
Creating read alignment for region: REL606:161041.0-161041.0
Creating read alignment for region: REL606:380188.0-380188.0
Creating read alignment for region: REL606:430835.0-430835.0
Creating read alignment for region: REL606:475288.1-475288.1
Creating read alignment for region: REL606:649391.0-649391.0
Creating read alignment for region: REL606:668787.0-668787.0
Creating read alignment for region: REL606:683496.0-683496.0
Creating read alignment for region: REL606:910316.0-910316.0
Creating read alignment for region: REL606:1004251.0-1004251.0
Creating read alignment for region: REL606:1248380.0-1248380.0
Creating read alignment for region: REL606:1286699.0-1286699.0
Creating read alignment for region: REL606:1329516.0-1329516.0
Creating read alignment for region: REL606:1976879.0-1976879.0
Creating read alignment for region: REL606:2082685.0-2082685.0
Creating read alignment for region: REL606:2732014.0-2732014.0
Creating read alignment for region: REL606:3045069.0-3045069.0
Creating read alignment for region: REL606:3155168.0-3155168.0
Creating read alignment for region: REL606:3200643.0-3200643.0
Creating read alignment for region: REL606:3248957.0-3248957.0
Creating read alignment for region: REL606:3288092.0-3288092.0
Creating read alignment for region: REL606:3339158.0-3339158.0
Creating read alignment for region: REL606:3369432.0-3369432.0
Creating read alignment for region: REL606:3370027.0-3370027.0
Creating read alignment for region: REL606:3483047.0-3483047.0
Creating read alignment for region: REL606:3490112.0-3490112.0
Creating read alignment for region: REL606:3762741.0-3762741.0
Creating read alignment for region: REL606:3875625.1-3875625.1
Creating read alignment for region: REL606:3893550.1-3893550.1
Creating read alignment for region: REL606:4100655.0-4100655.0
Creating read alignment for region: REL606:4126699.0-4126699.0
Creating read alignment for region: REL606:4201911.0-4201911.0
Creating read alignment for region: REL606:4616396.0-4616396.0
Creating read alignment for region: REL606:87812.0-87812.0
Creating read alignment for region: REL606:93946.0-93946.0
Creating read alignment for region: REL606:139812.0-139812.0
Creating read alignment for region: REL606:402056.0-402056.0
Creating read alignment for region: REL606:986054.0-986054.0
Creating read alignment for region: REL606:1318108.0-1318108.0
Creating read alignment for region: REL606:1509936.0-1509936.0
Creating read alignment for region: REL606:1812720.0-1812720.0
Creating read alignment for region: REL606:1854388.0-1854388.0
Creating read alignment for region: REL606:2167569.0-2167569.0
Creating read alignment for region: REL606:2698093.0-2698093.0
Creating read alignment for region: REL606:3357000.0-3357000.0
Creating read alignment for region: REL606:3357002.0-3357002.0
Creating read alignment for region: REL606:3357010.0-3357010.0
Creating read alignment for region: REL606:3357012.0-3357012.0
Creating read alignment for region: REL606:3357016.0-3357016.0
Creating read alignment for region: REL606:4080986.0-4080986.0
Creating read alignment for region: REL606:4462991.0-4462991.0
Creating read alignment for region: REL606:4462995.0-4462995.0
Creating read alignment for region: REL606:4462997.0-4462997.0
Creating read alignment for region: REL606__1__1__REL606__4629812__-1__0____36__36__0__0:36-37
Creating read alignment for region: REL606:1-1
Creating read alignment for region: REL606:4629812-4629812
Creating read alignment for region: REL606__16972__1__REL606__588495__1__0____36__36__0__1:36-37
Creating read alignment for region: REL606:16972-16972
Creating read alignment for region: REL606:588495-588495
Creating read alignment for region: REL606__16976__-1__REL606__590471__-1__2____36__36__0__1:36-39
Creating read alignment for region: REL606:16974-16974
Creating read alignment for region: REL606:590471-590471
Creating read alignment for region: REL606__72407__1__REL606__3650758__1__7____36__36__0__0:36-44
Creating read alignment for region: REL606:72407-72407
Creating read alignment for region: REL606:3650765-3650765
Creating read alignment for region: REL606__391964__-1__REL606__3875885__-1__-6__TTACTG__36__36__0__0:36-43
Creating read alignment for region: REL606:391964-391964
Creating read alignment for region: REL606:3875885-3875885
Creating read alignment for region: REL606__547699__-1__REL606__555924__1__0____36__36__1__1:36-37
Creating read alignment for region: REL606:547699-547699
Creating read alignment for region: REL606:555924-555924
Creating read alignment for region: REL606__589884__1__REL606__590471__-1__2____36__36__1__1:36-39
Creating read alignment for region: REL606:589886-589886
Creating read alignment for region: REL606:590471-590471
Creating read alignment for region: REL606__664688__1__REL606__1270658__1__2____36__36__1__0:36-39
Creating read alignment for region: REL606:664688-664688
Creating read alignment for region: REL606:1270660-1270660
Creating read alignment for region: REL606__809428__1__REL606__4141021__-1__5____36__36__0__0:36-42
Creating read alignment for region: REL606:809428-809428
Creating read alignment for region: REL606:4141016-4141016
Creating read alignment for region: REL606__1154778__1__REL606__4215353__-1__11____36__36__0__0:36-48
Creating read alignment for region: REL606:1154778-1154778
Creating read alignment for region: REL606:4215342-4215342
Creating read alignment for region: REL606__1733645__1__REL606__2775877__-1__2____36__36__0__1:36-39
Creating read alignment for region: REL606:1733647-1733647
Creating read alignment for region: REL606:2775877-2775877
Creating read alignment for region: REL606__1733649__-1__REL606__2774435__1__0____36__36__0__1:36-37
Creating read alignment for region: REL606:1733649-1733649
Creating read alignment for region: REL606:2774435-2774435
Creating read alignment for region: REL606__1802331__1__REL606__3944627__1__7____36__36__0__0:36-44
Creating read alignment for region: REL606:1802331-1802331
Creating read alignment for region: REL606:3944634-3944634
Creating read alignment for region: REL606__1821523__1__REL606__2775877__-1__2____36__36__0__1:36-39
Creating read alignment for region: REL606:1821525-1821525
Creating read alignment for region: REL606:2775877-2775877
Creating read alignment for region: REL606__1821528__-1__REL606__2774435__1__1____36__36__0__1:36-38
Creating read alignment for region: REL606:1821527-1821527
Creating read alignment for region: REL606:2774435-2774435
Creating read alignment for region: REL606__2103888__1__REL606__2103918__-1__11____36__36__0__0:36-48
Creating read alignment for region: REL606:2103888-2103888
Creating read alignment for region: REL606:2103907-2103907
Creating read alignment for region: REL606__2429311__1__REL606__2792997__-1__8____36__36__0__0:36-45
Creating read alignment for region: REL606:2429311-2429311
Creating read alignment for region: REL606:2792989-2792989
Creating read alignment for region: REL606__2772854__1__REL606__4524530__-1__3____36__36__1__0:36-40
Creating read alignment for region: REL606:2772854-2772854
Creating read alignment for region: REL606:4524527-4524527
Creating read alignment for region: REL606__2774196__-1__REL606__4524519__1__3____36__36__1__0:36-40
Creating read alignment for region: REL606:2774196-2774196
Creating read alignment for region: REL606:4524522-4524522
Creating read alignment for region: REL606__2775877__-1__REL606__3015769__1__2____36__36__1__0:36-39
Creating read alignment for region: REL606:2775877-2775877
Creating read alignment for region: REL606:3015771-3015771
Creating read alignment for region: REL606__3015774__-1__REL606__3894996__-1__0____36__36__0__1:36-37
Creating read alignment for region: REL606:3015774-3015774
Creating read alignment for region: REL606:3894996-3894996
Creating read alignment for region: REL606__3289961__-1__REL606__3289978__1__0____36__36__0__0:36-37
Creating read alignment for region: REL606:3289961-3289961
Creating read alignment for region: REL606:3289978-3289978
Creating read alignment for region: REL606__3630716__-1__REL606__3652533__-1__0____36__36__1__1:36-37
Creating read alignment for region: REL606:3630716-3630716
Creating read alignment for region: REL606:3652533-3652533
Creating read alignment for region: REL606__3651091__1__REL606__3741531__-1__0____36__36__1__1:36-37
Creating read alignment for region: REL606:3651091-3651091
Creating read alignment for region: REL606:3741531-3741531
Creating read alignment for region: REL606__3652533__-1__REL606__3901929__1__2____36__36__1__0:36-39
Creating read alignment for region: REL606:3652533-3652533
Creating read alignment for region: REL606:3901931-3901931
Creating read alignment for region: REL606__3749572__-1__REL606__4157637__-1__6____36__36__0__0:36-43
Creating read alignment for region: REL606:3749572-3749572
Creating read alignment for region: REL606:4157631-4157631
Creating read alignment for region: REL606__3813062__-1__REL606__4157637__-1__6____36__36__0__0:36-43
Creating read alignment for region: REL606:3813062-3813062
Creating read alignment for region: REL606:4157631-4157631
Creating read alignment for region: REL606__3944627__1__REL606__4157637__-1__7____36__36__0__0:36-44
Creating read alignment for region: REL606:3944627-3944627
Creating read alignment for region: REL606:4157630-4157630
Creating index HTML table...
[system] rm -f Clonal_Output/01_sequence_conversion/*.trims
[system] rm -f Clonal_Output/03_candidate_junctions/candidate_junction.fasta
[system] rm -f Clonal_Output/03_candidate_junctions/candidate_junction.fasta.fai
[system] rm -f Clonal_Output/05_alignment_correction/jc_evidence.gd
[system] rm -f Clonal_Output/06_bam/junction.bam
[system] rm -f Clonal_Output/06_bam/junction.bam.bai
+++ SUCCESSFULLY COMPLETED
**出力結果 [#n5e322ee]
ディレクトリ Colonal Outputを見る。出力はWebで見やすいようになっている。
ここでは、サーバーからディレクトリ Colonal Output全体をローカルWinにコピーして
その中で見て回る。
index.htmlからはじまる。
&ref(BRESEQ __ Mutation Predictions.pdf);
&ref(BRESEQ __ Marginal Predictions.pdf);
&ref(BRESEQ __ Summary Statistics.pdf);
** ログファイルから、処理を追ってみる [#xb1517e5]
&ref(2017-03-14_breseq_cmdlineの解説.txt);
// 01 sequence conversion
ファイル SRR030257_1
ファイル SRR030257_2
|
+------------------+
| シーケンス変換 | 入力をfastaやGFF3に変換する、quality scoreをSangerにする
+------------------+
|
+-------------------+
| SAMのfaidxを作る| samtools_faidx(reference.fasta)
+-------------------+
|
+--------------------+
| trimファイルを作る| calculate_trims(..)
+--------------------+
|
// 02 reference alignment
|
BUILDING MAPPING INDEX
+------------------------------------------------------------------------------+
| bowtie2-build -q reference.fasta 02_reference_alignment/reference |
+------------------------------------------------------------------------------+
|
STAGE-1 ALIGNMENT
+------------------------------------------------------------------------------+
| bowtie2 -t -p 4 --local -L 18 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 |
| --ignore-quals -k 2000 -i S,1,0.25 --score-min L,1,0.9 --reorder |
| -x 02_reference_alignment/reference |
| -U 01_sequence_conversion/SRR030257_1.converted.fastq |
| -S 02_reference_alignment/SRR030257_1.stage1.sam |
| --un 02_reference_alignment/SRR030257_1.stage1.unmatched.fastq |
+------------------------------------------------------------------------------+
| 83.94% overall alignment rate
+------------------------------------------------------------------------------+
| bowtie2 -t -p 4 --local -L 18 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 |
| --ignore-quals -k 2000 -i S,1,0.25 --score-min L,1,0.9 --reorder |
| -x 02_reference_alignment/reference |
| -U 01_sequence_conversion/SRR030257_2.converted.fastq |
| -S 02_reference_alignment/SRR030257_2.stage1.sam |
| --un 02_reference_alignment/SRR030257_2.stage1.unmatched.fastq |
+-------------------------------------------------------------------------------+
| 83.67% overall alignment rate
|
STAGE-2 ALGINMENT
+-------------------------------------------------------------------------------+
| bowtie2 -t -p 4 --local -L 9 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 |
| --ignore-quals -k 2000 -i S,1,0.25 --score-min L,6,0.2 --reorder |
| -x 02_reference_alignment/reference |
| -U 02_reference_alignment/SRR030257_1.stage1.unmatched.fastq |
| -S 02_reference_alignment/SRR030257_1.stage2.matched.sam +
+--------------------------------------------------------------------------------+
| 99.38% overall alignment rate
+--------------------------------------------------------------------------------+
| bowtie2 -t -p 4 --local -L 9 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 |
| --ignore-quals -k 2000 -i S,1,0.25 --score-min L,6,0.2 --reorder |
| -x 02_reference_alignment/reference |
| -U 02_reference_alignment/SRR030257_2.stage1.unmatched.fastq |
| -S 02_reference_alignment/SRR030257_2.stage2.matched.sam |
+--------------------------------------------------------------------------------+
| 99.05% overall alignment rate
|
MERGE SAM FILES
+---------------------------------------------------------------------------------+
| PreprocessAlignments::merge_sort_sam_files( |
| stage1_reference_sam_file_name, stage2_reference_sam_file_name, |
| reference_sam_file_name ); |
+---------------------------------------------------------------------------------+
|
// 03_candidate_junctions
|
Preprocessing alignments for candidate junction identification
+----------------------------------------------------------------------------------+
| PreprocessAlignments::preprocess_alignments( |
| settings, summary, ref_seq_info); |
+----------------------------------------------------------------------------------+
| Aligned reads: 7481861
| Read alignments: 19713263
| Alignments split on indels: 21
| Reads with alignments split on indels: 21
| Split alignments: 6090236
| Reads with split alignments: 601157
|
Preliminary analysis of coverage distribution
+-----------------------------------------------------------------------------------+
| samtools_import(data/reference.fasta.fai, |
| 03_candidate_junctions/best.sam, |
| 03_candidate_junctions/best.unsorted.bam |
+-----------------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------------+
| samtools_sort( --threads 4 -o 03_candidate_junctions/best.bam, |
| -T 03_candidate_junctions/best.bam, |
| 03_candidate_junctions/best.unsorted.bam |
+-----------------------------------------------------------------------------------+
|
+----------------------------------------------------------------------------------------+
| samtools_index( |
| 03_candidate_junctions/best.bam 03_candidate_junctions/best.bam.bai ) |
+----------------------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------------+
| CoverageDistribution::analyze_unique_coverage_distributions(settings, |
| summary, ref_seq_info, settings.coverage_junction_plot_file_name, |
| settings.coverage_junction_distribution_file_name, |
| settings.coverage_junction_done_file_name ); |
+--+ この中でRを呼んで処理 |
| R --vanilla |
| < /usr/local/bin/../share/breseq//coverage_distribution.r |
| > 03_candidate_junctions/0.unique_only_coverage_distribution.tab.r.log |
| distribution_file= |
| 03_candidate_junctions/0.unique_only_coverage_distribution.tab |
| plot_file=03_candidate_junctions/0.coverage.pdf |
| deletion_propagation_pr_cutoff=2.32374e-05 |
+-------------------------------------------------------------------------------+
|
Identifying junction candidates
+-----------------------------------------------------------------------------------+
| CandidateJunctions::identify_candidate_junctions( |
| settings, summary, ref_seq_info); |
+-----------------------------------------------------------------------------------+
| Passed alignment pairs examined: 100001
| WARNING: Reached limit of 100000 passed alignment pairs.
| Specify a greater value for --junction-alignment-pair-limit
| for more thorough junction prediction.
| Taking top candidate junctions...
| Minimum number to keep: 100
| Maximum number to keep: 5000
| Maximum length to keep: 462981 bases
| Initial: Number = 76, Cumulative Length = 5723 bases
| Accepted: Number = 76, Pos Hash Score >= 2,
| Cumulative Length = 5723 bases
|
+-----------------------------------------------------------------------------------+
| samtools_faidx 03_candidate_junctions/candidate_junction.fasta |
+-----------------------------------------------------------------------------------+
|
Re-alignment to junction candidates
+-----------------------------------------------------------------------------------+
| bowtie2-build -q 03_candidate_junctions/candidate_junction.fasta |
| 04_candidate_junction_alignment/candidate_junction |
+-----------------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------------+
| bowtie2 -t -p 4 --local -L 10 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 |
| --ignore-quals -k 2000 -i S,1,0.25 --score-min L,1,0.70 --reorder |
| -x 04_candidate_junction_alignment/candidate_junction |
| -U 01_sequence_conversion/SRR030257_1.converted.fastq |
| -S 04_candidate_junction_alignment/SRR030257_1.candidate_junction.sam |
+-----------------------------------------------------------------------------------+
| 0.09% overall alignment rate
+-----------------------------------------------------------------------------------+
| bowtie2 -t -p 4 --local -L 10 --ma 1 --mp 3 --np 0 --rdg 2,3 --rfg 2,3 |
| --ignore-quals -k 2000 -i S,1,0.25 --score-min L,1,0.70 --reorder |
| -x 04_candidate_junction_alignment/candidate_junction |
| -U 01_sequence_conversion/SRR030257_2.converted.fastq |
| -S 04_candidate_junction_alignment/SRR030257_2.candidate_junction.sam
+------------------------------------------------------------------------------------+
| 0.09% overall alignment rate
// 05 alignment_correction <-- line 1609
// * Resolve matches to new junction candidates
|
Resolving alignments with junction candidates
+------------------------------------------------------------------------------------+
| resolve_alignments( settings, summary, ref_seq_info, junction_prediction, |
| settings.read_files ); |
+------------------------------------------------------------------------------------+
|
// 05 bam
// * Create BAM read alignment database files
|
Creating BAM files
+-------------------------------------------------------------------------------------+
| samtools_import( 03_candidate_junctions/candidate_junction.fasta.fai |
| 05_alignment_correction/junction.sam 06_bam/junction.unsorted.bam ) |
+--------------------------------------------------------------------------------------+
|
+--------------------------------------------------------------------------------------+
| samtools_sort( --threads 4 -o 06_bam/junction.bam |
| -T 06_bam/junction.bam 06_bam/junction.unsorted.bam ) |
+--------------------------------------------------------------------------------------+
|
+--------------------------------------------------------------------------------------+
| samtools_index( 06_bam/junction.bam 06_bam/junction.bam.bai ) |
+--------------------------------------------------------------------------------------+
|
+--------------------------------------------------------------------------------------+
| samtools_import ( data/reference.fasta.fai |
| 05_alignment_correction/reference.sam 06_bam/reference.unsorted.bam )|
+--------------------------------------------------------------------------------------+
|
+--------------------------------------------------------------------------------------+
| samtools_sort( --threads 4 -o data/reference.bam |
| -T data/reference.bam /06_bam/reference.unsorted.bam ) |
+--------------------------------------------------------------------------------------+
|
+--------------------------------------------------------------------------------------+
| samtools_index( data/reference.bam data/reference.bam.bai ) |
+--------------------------------------------------------------------------------------+
|
Tabulating error counts
+-------------------------------------------------------------------------------------+
| error_count( settings, summary, reference_bam_file_name, |
| reference_fasta_file_name, settings.error_calibration_path, |
| settings.read_files.base_names(), coverage=true, errors=true, |
| preprocess=false, settings.base_quality_cutoff, // minimum quality score|
| read_set=num_read_files,obs_base,ref_base,quality=num_qual ); |
+-------------------------------------------------------------------------------------+
|
// Calculate error rates
|
Re-calibrating base error rates
+------------------------------------------------------------------------------------+
| CoverageDistribution::analyze_unique_coverage_distributions ( |
| settings, summary, ref_seq_info, |
| settings.unique_only_coverage_plot_file_name, |
| settings.unique_only_coverage_distribution_file_name, |
| "NULL" // means to never delete intermediates ); |
+--+ この中でfitを呼ぶ ------------------------------------------------------------+
| vector<string> lines = dist.fit(settings, |
| unique_only_coverage_distribution_file_name, |
| unique_only_coverage_plot_file_name, |
| deletion_propagation_pr_cutoff ); |
+--+ この中でRを呼ぶ ---------------------------------------------------------+
| R --vanilla |
| < /usr/local/bin/../share/breseq//coverage_distribution.r |
| > 07_error_calibration/0.unique_only_coverage_distribution.tab.r.log |
| distribution_file=07_error_calibration/0.unique_only_coverage_distribution.tab |
| plot_file=output/calibration/0.unique_coverage.pdf |
| deletion_propagation_pr_cutoff=2.32374e-05 |
+--------------------------------------------------------------------------- +
|
+----------------------------------------------------------------------------+
| R --vanilla |
| in_file=07_error_calibration/base_qual_error_prob.SRR030257_1.tab|
| out_file=output/calibration/SRR030257_1.error_rates.pdf |
| < /usr/local/bin/../share/breseq//plot_error_rate.r |
| > 07_error_calibration/SRR030257_1.plot_error_rate.log |
+----------------------------------------------------------------------------+
|
+----------------------------------------------------------------------------+
| R --vanilla |
| in_file=07_error_calibration/base_qual_error_prob.SRR030257_2.tab|
| out_file=output/calibration/SRR030257_2.error_rates.pdf |
| < /usr/local/bin/../share/breseq//plot_error_rate.r |
| > 07_error_calibration/SRR030257_2.plot_error_rate.log |
+---------------------------------------------------------------------------+
|
Examining read alignment evidence
+-------------------------------------------------------------------------------------+
| identify_mutations( settings, summary, reference_bam_file_name, |
| reference_fasta_file_name, ra_mc_genome_diff_file_name, |
| deletion_propagation_cutoffs, deletion_seed_cutoffs, |
| settings.mutation_log10_e_value_cutoff, // mutation_cutoff |
| settings.polymorphism_log10_e_value_cutoff, // polymorphism_cutoff |
| settings.polymorphism_frequency_cutoff, //polymorphism_frequency_cutoff|
| settings.polymorphism_precision_decimal, |
| settings.polymorphism_precision_places, |
| settings.print_mutation_identification_per_position_file //per_position_file );|
+--------------------------------------------------------------------------------------+
|
Polymorphism statistics
+--------------------------------------------------------------------------------------+
| ref_seq_info.polymorphism_statistics(settings, summary) |
+--+ この中でpolymorphism_statisticsを呼ぶ ------------------------------------+
| cReferenceSequences::polymorphism_statistics |
+--+ この中でRを呼ぶ --------------------------------------------------------+
| R --vanilla total_length=4629812 |
| in_file=08_mutation_identification/polymorphism_statistics_input.tab|
| out_file=08_mutation_identification/polymorphism_statistics_output.tab|
| qual_file=08_mutation_identification/error_counts.tab |
| < /usr/local/bin/../share/breseq//polymorphism_statistics.r |
| > /08_mutation_identification/polymorphism_statistics_output.log |
+------------------------------------------------------------------------------+
|
Output
+---------------------------------------------------------------------------+
| Creating merged genome diff evidence file... |
| Predicting mutations from evidence... |
| Annotating mutations... |
| Drawing coverage plots... |
| Creating index HTML table... |
+----------------------------------------------------------------------------+
|
SUCCESSFULLY COMPLETED