[[山内のサイト]]

*手順をメモっておく 2017-12-01  [#b35259f4]

**受け取ったデータをコピー [#y57e1290]
**fastqcによる品質チェック [#v5b401a0]
fastqcはgz圧縮されたままでは出力が文字化けしている。unzipしないとダメらしい。

 cd rawdata/1_2-1
unzip 1_2-1_r1.fastq.gz
 mkdir fastqcreportR1
 fastqc --nogroup -o ./fastqcreportR1 1_2-1_R1.fastq

結果はfastqcreportR1の下にできる

**breseqの処理 [#v8b6c773]
breseq -j16 -o Output -r ../CP001637.fastq 1_2-1_R1.fastq 1_2-1_R2.fastq >& breseq.out

***エラーについて [#ea356e8f]
見つけた関係ありそうな投稿

[[R plot error:https://github.com/barricklab/breseq/issues/120]]

やってみるべきこと
-breseq起動時に、オプション --targeted-sequencing を付ける
-breseqでのreadの長さを限定する。1/4から3/4程度に?
-breseqでオプション --limit-fold-coverage 150 with all reads


**結果のbam/samファイル [#ba8d3594]
***bamからsamへ変換するには [#v6ba4905]
 samtools view -h reference.bam > reference.sam

***samファイルの読み方 [#ebd6223b]
[[sam形式ファイル:http://crusade1096.web.fc2.com/sam.html]]

[[Sequence Alignment/Map Format Specification:http://samtools.github.io/hts-specs/SAMv1.pdf]] 大元の仕様書


------------------

**HTSeq [#h55c0b38]
[[HTSeq:https://github.com/simon-anders/htseq/blob/master/README.rst]]

 pip install 'pysam>=0.10.0'
 collecting pysam>=0.10.0
   Downloading pysam-0.13-cp35-cp35m-manylinux1_x86_64.whl (9.1MB)
     100% |????????????????????????????????| 9.1MB 35kB/s
 Installing collected packages: pysam
 Successfully installed pysam-0.13

 pip install HTSeq
 Collecting HTSeq
   Downloading HTSeq-0.9.1-cp35-cp35m-manylinux1_x86_64.whl (1.1MB)
     100% |????????????????????????????????| 1.1MB 217kB/s
 Requirement already satisfied: pysam>=0.9.0 in /home/yamanouc/.pyenv/versions/3.5.1/envs/notebook/lib/python3.5/site-packages (from HTSeq)
 Requirement already satisfied: numpy in /home/yamanouc/.pyenv/versions/3.5.1/envs/notebook/lib/python3.5/site-packages (from HTSeq)
 Installing collected packages: HTSeq
 Successfully installed HTSeq-0.9.1


ドキュメント [[http://ginger.yy.is.sci.toho-u.ac.jp/~yamanouc/HTseq/]]

 >>> import HTSeq
 >>> import itertools
 
 >>> bam_reader = HTSeq.BAM_Reader ("/home/yamanouc/1710JNHX-0008/rawdata/2_6-2-10E/AllseqOutput/data/reference.bam")
 >>> for a in itertools.islice(bam_reader, 5):
 ...     print(a)
 ...
 <SAM_Alignment object: Read '1:3609922' aligned to CP001637.1:[0,151)/+>
 <SAM_Alignment object: Read '1:4039112' aligned to CP001637.1:[0,151)/+>
 <SAM_Alignment object: Read '1:5708010' aligned to CP001637.1:[0,151)/+>
 <SAM_Alignment object: Read '1:13154294' aligned to CP001637.1:[0,151)/+>
 <SAM_Alignment object: Read '1:24786922' aligned to CP001637.1:[0,151)/+>
 >>>

*SamTools、BcfTools  2018-01-02[#p282a5bc]

***SamTools mpileup/tview [#j1ed9780]
-[[Samtools manual:http://www.htslib.org/doc/samtools.html]]-[[]]
-[[SAMtoolsの機能:http://cell-innovation.nig.ac.jp/surfers/SAMtools.html]]
-[[SAMtoolsワンライナー覚書:http://rnakato.hatenablog.jp/entry/2017/11/08/113245]]
-[[samtools 使い方 mpileup ( calling SNPs ) & annotation:http://bioinfo-dojo.net/2016/07/13/samtools_calling_snps/]]

mpileup




***BCFTools [#s30761a3]
-[[BCFtools +mainly recent version up: https://samtools.github.io/bcftools/bcftools.html]]
-[[BCFtools manual page:https://samtools.github.io/bcftools/bcftools-man.html]]
-[[BCFtools manual htslib-version:http://www.htslib.org/doc/bcftools.html]]

Allelic Depth (AD) について
-[[What Is Ad (Allelic Depth) In 1000Genomes Vcf?:https://www.biostars.org/p/5268/]]~

ADとDPの違い

AD and DP : Allele depth and depth of coverage.

These are complementary fields that represent two important ways of thinking about the depth of the data for this sample at this site.

AD is the unfiltered allele depth, i.e. the number of reads that support each of the reported alleles. All reads at the position (including reads that did not pass the variant caller’s filters) are included in this number, except reads that were considered uninformative. Reads are considered uninformative when they do not provide enough statistical evidence to support one allele over another.

DP is the filtered depth, at the sample level. This gives you the number of filtered reads that support each of the reported alleles. You can check the variant caller’s documentation to see which filters are applied by default. Only reads that passed the variant caller’s filters are included in this number. However, unlike the AD calculation, uninformative reads are included in DP.

BCFtools mpileup

 bcftools mpileup -Ou -f ../../../CP001637-all.fasta reference.bam | bcftools call -vmO z -o study.vcf.gz
 ##fileformat=VCFv4.2
 ##FILTER=<ID=PASS,Description="All filters passed">
 ##bcftoolsVersion=1.6+htslib-1.6
 ##bcftoolsCommand=mpileup -Ou -f ../../../CP001637-all.fasta reference.bam
 ##reference=file://../../../CP001637-all.fasta
 ##contig=<ID=CP001637.1,length=4630707>
 ##ALT=<ID=*,Description="Represents allele(s) other than observed.">
 ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
 ##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of reads supporting an indel">
 ##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of reads supporting an indel">
 ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
 ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
 ##INFO=<ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of Read Position Bias (bigger is better)">
 ##INFO=<ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality Bias (bigger is better)">
 ##INFO=<ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of Base Quality Bias (bigger is better)">
 ##INFO=<ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)">
 ##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric.">
 ##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
 ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
 ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
 ##INFO=<ID=ICB,Number=1,Type=Float,Description="Inbreeding Coefficient Binomial test (bigger is better)">
 ##INFO=<ID=HOB,Number=1,Type=Float,Description="Bias in the number of HOMs number (smaller is better)">
 ##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
 ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
 ##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
 ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
 ##bcftools_callVersion=1.6+htslib-1.6
 ##bcftools_callCommand=call -vmO z -o study.vcf.gz; Date=Tue Jan  2 07:55:28 2018
 #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  reference.bam
 CP001637.1      1223    .       G       T       139     .       DP=237;VDB=0.926447;SGB=-0.693147;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,229,2;MQ=20   GT:PL   1/1:169,255,0
 CP001637.1      2903    .       G       A       144     .       DP=234;VDB=0.779268;SGB=-0.693147;RPB=0.892377;MQB=1;MQSB=1;BQB=0.7713;MQ0F=0;AC=2;AN=2;DP4=2,0,219,4;MQ=20     GT:PL   1/1:171,255,0
 CP001637.1      6246    .       C       T       131     .       DP=236;VDB=0.803442;SGB=-0.693147;MQ0F=0;AC=2;AN=2;DP4=0,0,234,0;MQ=20  GT:PL   1/1:161,255,0
 CP001637.1      6258    .       CTTTTT  CTTTT   101     .       INDEL;IDV=233;IMF=0.94332;DP=247;VDB=0.826184;SGB=-0.693147;MQ0F=0;AC=2;AN=2;DP4=17,0,230,0;MQ=20       GT:PL   1/1:128,255,0
 CP001637.1      12837   .       A       G       115     .       DP=243;VDB=0.00479441;SGB=-0.693147;RPB=0.283559;MQB=1;MQSB=1;BQB=0.925163;MQ0F=0;AC=2;AN=2;DP4=11,3,220,5;MQ=20        GT:PL   1/1:142,255,0
 CP001637.1      13078   .       C       T       139     .       DP=232;VDB=0.944894;SGB=-0.693147;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,230,2;MQ=20   GT:PL   1/1:169,255,0
 CP001637.1      13218   .       C       T       149     .       DP=237;VDB=0.993842;SGB=-0.693147;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,228,4;MQ=20   GT:PL   1/1:179,255,0
 CP001637.1      16186   .       CAAAAAAAA       CAAAAAAAAA      38.9013 .       INDEL;IDV=215;IMF=0.866935;DP=248;VDB=0.0814562;SGB=-0.693147;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=29,1,216,2;MQ=20      GT:PL   1/1:66,255,0
 CP001637.1      16362   .       C       T       146     .       DP=243;VDB=0.226296;SGB=-0.693147;RPB=0.354375;MQB=1;MQSB=1;BQB=0.459468;MQ0F=0;AC=2;AN=2;DP4=4,0,231,4;MQ=20   GT:PL   1/1:173,255,0
 CP001637.1      18487   .       C       T       136     .       DP=238;VDB=0.153648;SGB=-0.693147;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AC=2;AN=2;DP4=1,0,230,1;MQ=20 GT:PL   1/1:163,255,0
 CP001637.1      18769   .       G       A       155     .       DP=245;VDB=0.0565532;SGB=-0.693147;RPB=0.827005;MQB=1;MQSB=1;BQB=0.67469;MQ0F=0;AC=2;AN=2;DP4=7,1,223,10;MQ=20  GT:PL   1/1:182,255,0
 CP001637.1      20751   .       A       G       142     .       DP=243;VDB=0.0859646;SGB=-0.693147;RPB=0.816456;MQB=1;MQSB=1;BQB=0.109705;MQ0F=0;AC=2;AN=2;DP4=2,0,235,2;MQ=20  GT:PL   1/1:169,255,0
 CP001637.1      26552   .       G       A       130     .       DP=237;VDB=0.366551;SGB=-0.693147;RPB=0.454566;MQB=1;MQSB=1;BQB=0.996704;MQ0F=0;AC=2;AN=2;DP4=4,0,231,1;MQ=20   GT:PL   1/1:157,255,0
 CP001637.1      32144   .       C       T       136     .       DP=241;VDB=0.133234;SGB=-0.693147;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,234,1;MQ=20   GT:PL   1/1:166,255,0
 CP001637.1      33052   .       T       C       96      .       DP=234;VDB=0.00575213;SGB=-0.693147;RPB=0.751256;MQB=1;MQSB=1;BQB=0.173368;MQ0F=0;AC=2;AN=2;DP4=8,2,214,0;MQ=20 GT:PL   1/1:123,255,0
 CP001637.1      33376   .       C       A       156     .       DP=243;VDB=0.383668;SGB=-0.693147;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,234,5;MQ=20   GT:PL   1/1:186,255,0
 CP001637.1      40843   .       A       G       148     .       DP=240;VDB=0.521241;SGB=-0.693147;RPB=0.966636;MQB=1;MQSB=1;BQB=0.824184;MQ0F=0;AC=2;AN=2;DP4=3,0,232,4;MQ=20   GT:PL   1/1:175,255,0
 CP001637.1      40892   .       C       T       120     .       DP=240;VDB=0.922792;SGB=-0.693147;RPB=0.986237;MQB=1;BQB=0.57425;MQ0F=0;AC=2;AN=2;DP4=5,0,225,0;MQ=20   GT:PL   1/1:147,255,0
 CP001637.1      41111   .       A       G       137     .       DP=242;VDB=0.162769;SGB=-0.693147;RPB=0.377119;MQB=1;MQSB=1;BQB=0.211864;MQ0F=0;AC=2;AN=2;DP4=2,0,235,1;MQ=20   GT:PL   1/1:164,255,0
 CP001637.1      45031   .       C       T       145     .       DP=244;VDB=0.702842;SGB=-0.693147;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,232,3;MQ=20   GT:PL   1/1:175,255,0
 CP001637.1      49842   .       T       C       223     .       DP=243;VDB=7.62832e-14;SGB=-0.693147;RPB=0.894515;MQB=1;MQSB=1;BQB=0.890295;MQ0F=0;AC=2;AN=2;DP4=1,1,176,61;MQ=20       GT:PL   1/1:250,255,0
 CP001637.1      50215   .       A       G       121     .       DP=183;VDB=0.995931;SGB=-0.693147;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AC=2;AN=2;DP4=0,1,176,3;MQ=20 GT:PL   1/1:148,255,0
 CP001637.1      50351   .       A       G       220     .       DP=247;VDB=0.993979;SGB=-0.693147;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AC=2;AN=2;DP4=0,1,197,47;MQ=20        GT:PL   1/1:247,255,0
 CP001637.1      53102   .       C       T       122     .       DP=190;VDB=0.999769;SGB=-0.693147;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,181,3;MQ=20   GT:PL   1/1:152,255,0
 CP001637.1      60168   .       C       A       197     .       DP=244;VDB=2.00034e-13;SGB=-0.693147;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AC=2;AN=2;DP4=1,0,222,20;MQ=20     GT:PL   1/1:224,255,0

*VCFファイル 2018-01-02 [#ab154f35]
-[[VCFドキュメント:https://samtools.github.io/hts-specs/VCFv4.2.pdf]]
-[[VFC | VFC フォーマットファイルの特徴および Python でパースする方法:https://bi.biopapyrus.jp/gwas/vcf.html]]

VFCのINFOフィールドに、もしACやAFがあれば容易にわかるはず。
 &#8226; AC : allele count in genotypes, for each ALT allele, in the same order as listed
 &#8226; AF : allele frequency for each ALT allele in the same order as listed: use this 
 when estimated from primary data, not called genotypes
 &#8226; AN : total number of alleles in called genotypes
 &#8226; DP : combined depth across samples, e.g. DP=154
で、これらの情報をだれが作るか?(入っていない?)多分GATKなどのツール

*GATKの情報 [#cd223f86]
[[GATK:https://software.broadinstitute.org/gatk/documentation/]]

ツール群であり、どれをどう使うか?
 Diagnostics and Quality Control Tools
 Sequence Data Processing Tools
 Variant Discovery Tools
 Variant Evaluation Tools
 Variant Manipulation Tools
という感じなので、Variant Discovery Toolsが見つける?
 HaplotypeCaller	Call germline SNPs and indels via local re-assembly of haplotypes
 MuTect2	Call somatic SNPs and indels via local re-assembly of haplotypes
のあたり?

-[[バリアントコール結果のVCFフォーマット:http://kazumaxneo.hatenablog.com/entry/2017/06/02/140208]]

 以下はGATK haplotypecallerで変異検出して出力されたVCFファイルのコメント1行と変異コールの1行を
 表示したものである。
 (中略)
 AC=2;AF=1.00;AN=2;DP=25;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=43.50;QD=34.24;SOR=1.136
 (中略)
 GT:AD:DP:GQ:PL 1/1:0,25:25:75:1115,75,0

[[BAM fileにRead Groupを付ける(GATKへの対応):http://hashiyuki.hatenablog.com/entry/2016/05/07/164740]]~
[[次 世代シークエンサ実習 I I:http://www.iu.a.u-tokyo.ac.jp/~kadota/bioinfo_ngs_sokushu_2014/20140912_4-4_NGS_reseq.pdf]]~
[[平成28年度NGSハンズオン講習会 Reseq解析:https://biosciencedbc.jp/gadget/human/20160726_amelieff_20160803.pdf]]

>GATKでは、BAM fileにRead Group (@RG)が付いてないとエラーが出て解析ができないようです。BAM fileのheaderへのRead groupの入れ方としては、BWAの-R optionを使用するようです。またPicardを使って入れることもできます。

>(BWA memの処理で) -R: RG(read groups)の情報を付与する。複数のサンプル情報が混在している場合に有用。GATKでBAMファイルを扱うにはplatform (PL) およびsample (SM)が必須。PLの例:454, LS454, Illumina, Solid, ABI_Solid

>[[Picard ダウンロード:https://github.com/broadinstitute/picard/releases/tag/2.17.0]]~
[[PicardでReadGroupを追加する:http://broadinstitute.github.io/picard/command-line-overview.html#AddOrReplaceReadGroups]]
 java -jar <picard.jar> AddOrReplaceReadGroups \
       I=input.bam \
       O=output.bam \
       RGID=4 \
       RGLB=lib1 \
       RGPL=illumina \
       RGPU=unit1 \
       RGSM=20
や
 sample=C8W82ANXX_PG1144_02A16_H1_L003
 java -jar picard.jar AddOrReplaceReadGroups \
     INPUT=${sample}.sort.bam \
     OUTPUT=${sample}.sort.addRG.bam \
     RGID=FLOWCELLID \
     RGLB= ${sample}_library1 \
     RGPU=H0164ALXX140820.2 \
     RGPL=illumina \
     RGSM=${sample}
CIGARフィールドのチェックでエラー No real operator (M|I|D|N) in CIGAR が出たので、エラーチェックを緩めることにした。
 java -jar <picard.jar> AddOrReplaceReadGroups \
     色々なパラメタ
     VALIDATION_STRINGENCY=LENIENT
[[デフォルトはSTRICT {STRICT, LENIENT, SILENT}:http://broadinstitute.github.io/picard/command-line-overview.html#StandardOptions]]

>[[bamにread groupを追記する:http://staffblog.amelieff.jp/entry/2015/02/02/153337]]~
→ samtools mergeで追記する方法もあります → [[SamToolsマニュアル:http://www.htslib.org/doc/samtools.html]]
 mergeの項
 -r  Attach an RG tag to each alignment. The tag value is inferred from file names.

>GATKにはreference genomeについてのdictが必要。[[How can I prepare a FASTA file to use as reference?:https://software.broadinstitute.org/gatk/documentation/article?id=1601]]~
このdictを作るには、PicardのツールCreateSequenceDictionaryを使う。[[Where to find CreateSequenceDictionary.jar?:https://gatkforums.broadinstitute.org/gatk/discussion/9796/where-to-find-createsequencedictionary-jar]]
 java -jar picard.jar CreateSequenceDictionary REFERENCE=reference.fa OUTPUT=reference.dict


*解析パイプライン [#j8312833]
FastQC → Trimmomatic → BWA(マッピング)→ GATK(アラインメント)→ BAMファイル
  → GATK(SNV/InDel検出及びフィルタリング) → SnpEff(アノテーション)

[[解析パイプライン:https://biosciencedbc.jp/gadget/human/20160726_amelieff_20160803.pdf]]

[[BWA(Burrows-Wheeler Aligner):http://bio-bwa.sourceforge.net/]]

>BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome.

*AP012030側でsamtoolsがうまくいかない問題 [#s473e89d]
AP012030.fastaではAP012030.1であるのに、reference.bamではAP012030と書いてあるため。

[[ここ:http://seqanswers.com/forums/archive/index.php/t-49676.html]]に曰く、
 If you're fairly certain that the chromosome names differ between the files then you'll
 need to change one of them. Changing them in the BAM file just require "samtools
 view -H foo.bam", changing the names (but not the order!) to be correct, and using
 "samtools reheader". Alternatively, just download the matching fasta file (presumably
 from Ensembl or Gencode, given the error message).
だが、BAMファイルは数が多いので、AP012030.fasta側を書き直してしまった。

AP012030.fastaを修正 ⇒ AP012030-new.fastaとする。かつ、samtools faidxでインデックスファイル生成。

それでもう一度、
 bcftools mpileup -Ou -f ../../../AP012030-new.fasta reference.bam | bcftools call -vmO z -o study.vcf.gz &

**VCFファイルにADなどが出ていない件 [#g7677884]
 samtools mpileup -vu -t AD DP4 -f ../../../AP012030-new.fasta reference.bam -o new.vcf

**VCFファイルに出ている情報 [#g87a36d8]
[[samtoolsの資料:http://samtools.sourceforge.net/mpileup.shtml]]に説明がある。

***I16フィールド [#x3876be1]
samtools mpileupでのVCFのInfoフィールドに I16の記述がある。[[samtoolsの資料:http://samtools.sourceforge.net/mpileup.shtml]]に曰く、
 Tag	Description     I16	16 integers:
 1	#reference Q13 bases on the forward strand	2	#reference Q13 bases on the reverse strand
 3	#non-ref Q13 bases on the forward strand	4	#non-ref Q13 bases on the reverse strand
 5	sum of reference base qualities	6	sum of squares of reference base qualities
 7	sum of non-ref base qualities	8	sum of squares of non-ref base qualities
 9	sum of ref mapping qualities	10	sum of squares of ref mapping qualities
 11	sum of non-ref mapping qualities	12	sum of squares of non-ref mapping qualities
 13	sum of tail distance for ref bases	14	sum of squares of tail distance for ref bases
 15	sum of tail distance for non-ref bases	16	sum of squares of tail distance for non-ref

***PLフィールド [#t1ec2f17]
samtools mpileupでのVCFのInfoフィールドの下の注に PLの記述がある。[[samtoolsの資料:http://samtools.sourceforge.net/mpileup.shtml]]に曰く、

SAMtools/BCFtools writes genotype likelihoods in the PL format which is a comma delimited list of phred-scaled data likelihoods of each possible genotype. For example, suppose REF=C and ALT=A,G, PL=7,0,37,13,40,49 means for the sample we are looking at, P(D|CC)=10^{-0.7}, P(D|CA)=1, P(D|AA)=10^{-3.7}, P(D|CG)=10^{-1.3}, P(D|AG)=1e-4 and P(D|GG)=10^{-4.9}. This ordering has been changed since r921.

***DP4フィールド [#mec797e1]
Number of 1) forward ref alleles; 2) reverse ref; 3) forward non-ref; 4) reverse non-ref alleles, used in variant calling. Sum can be smaller than DP because low-quality bases are not counted.

 ##fileformat=VCFv4.2
 ##FILTER=<ID=PASS,Description="All filters passed">
 ##samtoolsVersion=1.6+htslib-1.6
 ##samtoolsCommand=samtools mpileup -vu -t DP4 -f ../../../AP012030-new.fasta -o
new.vcf reference.bam
 ##reference=file://../../../AP012030-new.fasta
 ##contig=<ID=AP012030,length=4621430>
 ##ALT=<ID=*,Description="Represents allele(s) other than observed.">
 ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
 ##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of reads supporting an indel">
 ##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of reads supporting an indel">
 ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
 ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering 
 splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
 ##INFO=<ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of Read 
 Position Bias (bigger is better)">
 ##INFO=<ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping 
 Quality Bias (bigger is better)">
 ##INFO=<ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of Base 
 Quality Bias (bigger is better)">
 ##INFO=<ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping
 Quality vs Strand Bias (bigger is better)">
 ##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric.">
 ##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
 ##INFO=<ID=I16,Number=16,Type=Float,Description="Auxiliary tag used for calling,
 see description of bcf_callret1_t in bam2bcf.h">
 ##INFO=<ID=QS,Number=R,Type=Float,Description="Auxiliary tag used for calling">
 ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
 ##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-fwd,
 ref-reverse, alt-fwd and alt-reverse bases">
 ##bcftools_viewVersion=1.6+htslib-1.6
 ##bcftools_viewCommand=view new.vcf; Date=Thu Jan  4 14:21:43 2018
 #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  referenc
e.bam
 AP012030        1       .       A       <*>     0       .       DP=3084;
   I16=3,10,0,0,430,14354,0,0,260,5200,0,0,0,0,0,0;QS=1,0;MQSB=1;MQ0F=0   
   PL:DP4  0,39,149:3,10,0,0
 AP012030        2       .       G       <*>     0       .       DP=3083;
  I16=94,98,0,0,4001,96357,0,0,3840,76800,0,0,827,4827,0,0;QS=1,0;MQSB=1;MQ0F=0   
  PL:DP4  0,255,204:94,98,0,0
 (中略)
 AP012030        12      .       C       A,<*>   0       .       DP=3161;
   I16=234,
193,1,0,16076,616854,27,729,8540,170800,20,400,3923,49067,4,16;QS=0.997661,0.00233863,0;
   SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0   
   PL:DP4  0,255,255,255,255,255:234,193,1,0


**コマンド例 [#j7ee3cac]
コマンド例〜こちらで試す?
 samtools mpileup -vu -t DP4 -f ../../../AP012030-new.fasta -o new.vcf reference.bam
 bcftools view new.vcf | more

もう1つは
 bcftools mpileup -Ou -f ../../../AP012030-new.fasta reference.bam | bcftools call -vmO z -o study.vcf.gz


*vcftools [#f0d46314]
[[vcftools:https://vcftools.github.io/examples.html]]

インストールの問題:~
./configureが無い ⇒ autoconfで作る ⇒ エラーが出る。
 configure.ac:12: error: possibly undefined macro: AM_INIT_AUTOMAKE
       If this token and others are legitimate, please use m4_pattern_allow.
       See the Autoconf documentation.
付き合いたくないので、ググる。→ [[Shairport Syncのビルドエラー:https://qiita.com/pekeq/items/ce0792d5c28b5c58dd9a]]に曰く
 autoconfのかわりに、 autoreconf --install を実行するといいようだ。
ということで、
 $autoreconf --install
 configure.ac:12: installing './install-sh'
 configure.ac:12: installing './missing'
 src/cpp/Makefile.am: installing './depcomp'

 $./configure
あとはmake → make install。

試してみる。[[Getting allele frequency:https://vcftools.github.io/documentation.html#freq]]

 


トップ   編集 差分 バックアップ 添付 複製 名前変更 リロード   新規 一覧 単語検索 最終更新   ヘルプ   最終更新のRSS