Notes
Last updated: 2020-02-16 (Sun) 12:15:04
There are many papers on this topic.
Goldman and Yang 1994: A Codon-based Model of Nucleotide Substitution for Protein-coding DNA Sequences
Calculating Ka and Ks normally involves three steps. Let n be the length of the two compared DNA sequences and m the number of substitutions between them. To calculate Ka and Ks, we count the numbers of synonymous (S) and nonsynonymous (N) sites (S + N = n) and the numbers of synonymous (Sd) and nonsynonymous (Nd) substitutions (Sd + Nd = m). Because the observed number of substitutions underestimates the real number as sequences diverge over time, Nd/N and Sd/S represent Ka and Ks, respectively, only after correction for multiple substitutions. These methods therefore estimate Ka and Ks in three steps: counting S and N, counting Sd and Nd, and correcting for multiple substitutions.
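The third step can be illustrated with a minimal sketch. It assumes that steps 1 and 2 have already produced S, N, Sd and Nd, and it uses the Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3) as the multiple-substitution correction. That choice is an assumption made for illustration (the text above does not say which correction is used), and the function names and counts are hypothetical.

```python
import math


def jukes_cantor_distance(p):
    """Correct an observed proportion of differences p for multiple
    substitutions using the Jukes-Cantor formula (an assumed choice)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("the correction is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks_from_counts(S, N, Sd, Nd):
    """Step 3 only: turn site counts (S, N) and substitution counts
    (Sd, Nd) into corrected Ka and Ks estimates."""
    p_syn = Sd / S      # observed synonymous differences per synonymous site
    p_nonsyn = Nd / N   # observed nonsynonymous differences per nonsynonymous site
    ks = jukes_cantor_distance(p_syn)
    ka = jukes_cantor_distance(p_nonsyn)
    return ka, ks


# Hypothetical counts: n = S + N = 300 sites, m = Sd + Nd = 25 substitutions.
ka, ks = ka_ks_from_counts(S=70.0, N=230.0, Sd=14.0, Nd=11.0)
print(f"Ka = {ka:.4f}, Ks = {ks:.4f}, Ka/Ks = {ka / ks:.4f}")
```

The corrected values are always at least as large as the raw proportions Nd/N and Sd/S, which reflects the underestimation described above.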
Methods for calculating Ka and Ks adopt different substitution models with subtle yet significant differences. They can be classified as approximate methods and maximum-likelihood methods. Unlike approximate methods, maximum-likelihood methods use probability theory to carry out all three steps above in one go.
There are several approximate methods incorporated into KaKs_Calculator, and we list their abbreviations in the program and their corresponding reference(s) as follows.
The GY method takes into account features of sequence evolution, such as the transition/transversion rate ratio and nucleotide frequencies (as reflected in the HKY model), and incorporates them into a codon-based model. We extend this method to a set of candidate models in a maximum-likelihood framework and use AICc for model selection and model averaging.
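Model selection and model averaging with AICc can be sketched as follows. The sketch uses the standard definitions AICc = -2 lnL + 2k + 2k(k+1)/(n-k-1) and Akaike weights proportional to exp(-Δ/2). The log-likelihoods, parameter counts, Ka/Ks estimates and sample size below are invented for illustration and are not results from KaKs_Calculator.

```python
import math


def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC for a model with k free parameters
    fitted to a sample of size n."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return -2.0 * log_likelihood + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert a dict of AICc scores into normalized Akaike weights."""
    best = min(scores.values())
    relative = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}


# Hypothetical fits: model -> (lnL, number of free parameters, Ka/Ks estimate).
fits = {
    "JC": (-1250.4, 2, 0.210),
    "HKY": (-1238.9, 6, 0.195),
    "GTR": (-1236.7, 10, 0.192),
}
n = 300  # hypothetical sample size

scores = {m: aicc(lnl, k, n) for m, (lnl, k, _) in fits.items()}
weights = akaike_weights(scores)

best_model = min(scores, key=scores.get)                  # model selection
averaged = sum(weights[m] * fits[m][2] for m in fits)     # model averaging

print("selected model:", best_model)
print("model-averaged Ka/Ks:", round(averaged, 4))
```

Model selection keeps only the lowest-scoring model, while model averaging weights every candidate's estimate by its Akaike weight, so no single model has to be trusted entirely.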
Model | Substitution Rates | Nucleotide Frequency |
JC / F81 | rTC=rAG=rTA=rCG=rTG=rCA | Equal/Unequal |
K2P / HKY | rTC=rAG≠rTA=rCG=rTG=rCA | Equal/Unequal |
TrNEF / TrN | rTC≠rAG≠rTA=rCG=rTG=rCA | Equal/Unequal |
K3P / K3PUF | rTC=rAG≠rTA=rCG≠rTG=rCA | Equal/Unequal |
TIMEF / TIM | rTC≠rAG≠rTA=rCG≠rTG=rCA | Equal/Unequal |
TVMEF / TVM | rTC=rAG≠rTA≠rCG≠rTG≠rCA | Equal/Unequal |
SYM / GTR | rTC≠rAG≠rTA≠rCG≠rTG≠rCA | Equal/Unequal |
rij: substitution rate between i and j, where i ≠ j and i, j∈{A, C, G, T}
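The rate constraints in the table can be encoded as equality classes over the six rates. The sketch below is one possible encoding, not taken from KaKs_Calculator: each model gets a six-character pattern in the order TC, AG, TA, CG, TG, CA, where equal characters mean equal rates. It reads each "≠" in the table as starting a new class and the SYM / GTR row as all six rates distinct. The names `PAIRS`, `RATE_CLASSES`, `count_rate_classes` and `expand_rates` are hypothetical.

```python
# Order of the six substitution rates, matching the table columns.
PAIRS = ("TC", "AG", "TA", "CG", "TG", "CA")

# Equal characters in a pattern mean the corresponding rates are equal.
RATE_CLASSES = {
    "JC/F81": "000000",
    "K2P/HKY": "001111",
    "TrNEF/TrN": "012222",
    "K3P/K3PUF": "001122",
    "TIMEF/TIM": "012233",
    "TVMEF/TVM": "001234",
    "SYM/GTR": "012345",
}


def count_rate_classes(model):
    """Number of distinct substitution-rate classes in a model."""
    return len(set(RATE_CLASSES[model]))


def expand_rates(model, class_rates):
    """Map one rate value per class onto the six nucleotide pairs."""
    pattern = RATE_CLASSES[model]
    if len(class_rates) != len(set(pattern)):
        raise ValueError("need exactly one rate value per rate class")
    return {pair: class_rates[int(c)] for pair, c in zip(PAIRS, pattern)}


for model in RATE_CLASSES:
    print(model, count_rate_classes(model))

# Hypothetical values: transitions twice as fast as transversions.
print(expand_rates("K2P/HKY", [2.0, 1.0]))
```

Within each row, the two model names share the same rate pattern and differ only in whether nucleotide frequencies are equal or unequal, so the frequency column is not part of this encoding.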