2717  2019-06-01 (Sat) 10:43:10
VCF format and BCF format
References
For the original lab's explanation of calling in Samtools ⇒ see "Calling and analysis" on the Samtools site
Specifically, the relevant section says:
- The original mpileup calling algorithm plus mathematical notes (mpileup/bcftools call -c):
- Li H, A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data, Bioinformatics (2011) 27(21) 2987-93. [21903627]
- Li H, Mathematical Notes on SAMtools Algorithms (2010) [link]
- Mathematical notes for the updated multiallelic calling model (mpileup/bcftools call -m):
- Danecek P, Schiffels S, and Durbin R, Multiallelic calling model in bcftools (-m) (2014) [link]
- Hidden Markov model for detecting runs of homozygosity (bcftools roh):
- Narasimhan V, Danecek P, Scally A, Xue Y, Tyler-Smith C, and Durbin R, BCFtools/RoH: a hidden Markov model approach for detecting autozygosity from next-generation sequencing data, Bioinformatics (2016) 32(11) 1749-51 [26826718]
- Copy number variation/aneuploidy calling from microarray data (bcftools cnv/bcftools polysomy):
- Danecek P, McCarthy SA, HipSci Consortium, and Durbin R, A Method for Checking Genomic Integrity in Cultured Cell Lines from SNP Genotyping Data, PLoS One (2016) 11(5) e0155014 [27176002]
- Haplotype-aware calling of variant consequences (bcftools csq):
- Danecek P, McCarthy SA, BCFtools/csq: Haplotype-aware variant consequences, Bioinformatics (2017) 33(13) 2037-39 [28205675]
- (Official site) How to use SAMtools mpileup, SNP/INDEL calling, mpileup parameter tuning, and the meaning of the VCF/BCF format ⇒ Multisample SNP calling
SAM format
Header section: set aside for now
Alignment section: one read per line
Col | Field | Type | Regexp/Range | Brief description |
1 | QNAME | String | [!-?A-~]{1,254} | Query template NAME |
2 | FLAG | Int | [0, 2^16 - 1] | bitwise FLAG |
3 | RNAME | String | \*|[:rname:∧*=][:rname:]* | Reference sequence NAME |
4 | POS | Int | [0, 2^31 - 1] | 1-based leftmost mapping POSition |
5 | MAPQ | Int | [0, 2^8 - 1] | MAPping Quality |
6 | CIGAR | String | \*|([0-9]+[MIDNSHPX=])+ | CIGAR string |
7 | RNEXT | String | \*|=|[:rname:∧*=][:rname:]* | Reference name of the mate/next read |
8 | PNEXT | Int | [0, 2^31 - 1] | Position of the mate/next read |
9 | TLEN | Int | [-2^31 + 1, 2^31 - 1] | observed Template LENgth |
10 | SEQ | String | \*|[A-Za-z=.]+ | segment SEQuence |
11 | QUAL | String | [!-~]+ | ASCII of Phred-scaled base QUALity+33 |
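As a rough illustration of the table above, the following sketch splits one tab-separated alignment line into the 11 mandatory fields, decodes the bitwise FLAG, and converts QUAL back to Phred scores. The names `SamRecord`, `parse_sam_line`, `decode_flag`, `phred_scores`, and the labels in `FLAG_BITS` are hypothetical names chosen for this example, and the sample read is made up. It uses only the standard library and does not validate the regexps or ranges; for real files a library such as pysam is the more sensible choice.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SamRecord:
    """The 11 mandatory SAM fields, in column order, plus optional tags."""
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position (0 = unavailable)
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: List[str] = field(default_factory=list)


# FLAG bits; the short labels are this example's own naming.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def parse_sam_line(line: str) -> SamRecord:
    """Split one alignment line (not a '@' header line) into fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("expected at least 11 tab-separated fields")
    return SamRecord(
        cols[0], int(cols[1]), cols[2], int(cols[3]), int(cols[4]),
        cols[5], cols[6], int(cols[7]), int(cols[8]), cols[9], cols[10],
        cols[11:],
    )


def decode_flag(flag: int) -> List[str]:
    """Return the labels of all bits set in FLAG, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def phred_scores(qual: str) -> List[int]:
    """QUAL stores Phred score + 33 as ASCII; '*' means not stored."""
    if qual == "*":
        return []
    return [ord(ch) - 33 for ch in qual]


# Made-up example read, for illustration only.
EXAMPLE_LINE = "read1\t99\tchr1\t100\t60\t5M\t=\t200\t105\tACGTA\tIIIII\tNM:i:0"

if __name__ == "__main__":
    rec = parse_sam_line(EXAMPLE_LINE)
    print(rec.qname, rec.rname, rec.pos, rec.cigar)
    print(decode_flag(rec.flag))
    print(phred_scores(rec.qual))
```

Here FLAG 99 is 1 + 2 + 32 + 64, so the read is paired, properly paired, has its mate on the reverse strand, and is the first read of the pair.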
A BAM file has the same content as a SAM file; the only difference is that the format is binary (non-text). The file size is smaller.
pysam: a library for working with SAM/BAM files in Python
Sorting and indexing (bai)
VCF files