Notes

## dN/dS, also known as Ka/Ks

Goldman and Yang 1994: A Codon-based Model of Nucleotide Substitution for Protein-coding DNA Sequences

### According to the National Genomics Data Center (Beijing) page

#### Methods for Calculating Ka and Ks

Calculating Ka and Ks normally involves three steps. Suppose the two DNA sequences being compared have length n (in sites) and differ by m substitutions. To calculate Ka and Ks, we count the numbers of synonymous (S) and nonsynonymous (N) sites (S + N = n) and the numbers of synonymous (Sd) and nonsynonymous (Nd) substitutions (Sd + Nd = m). The raw proportions Nd/N and Sd/S underestimate the true numbers of substitutions per site, because multiple substitutions at the same site become more likely as sequences diverge over time; only after correcting for multiple substitutions do they represent Ka and Ks, respectively. The three steps are therefore: counting S and N, counting Sd and Nd, and correcting for multiple substitutions.

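The three steps can be sketched in Python with the NG method (Nei and Gojobori 1986), the simplest of the approximate methods. This is a minimal illustration, not the KaKs_Calculator implementation. It assumes the standard genetic code, in-frame gap-free sequences without stop codons, and the Jukes-Cantor formula for the multiple-substitution correction. Changes to stop codons are counted as nonsynonymous when counting sites, and mutation pathways through stop codons are skipped; other implementations may treat these cases differently. All function names are hypothetical.

```python
from itertools import permutations, product
from math import log

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Step 1: expected number of synonymous sites in one codon."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3  # one of the three possible changes at this position
    return s


def count_diffs(c1, c2):
    """Step 2: (Sd, Nd) between two codons, averaged over mutation orders."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    if not diff_pos:
        return 0.0, 0.0
    paths = []
    for order in permutations(diff_pos):
        cur, sd, nd = c1, 0, 0
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == "*":
                break  # assumption: skip pathways that pass through a stop codon
            if CODE[nxt] == CODE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
        else:
            paths.append((sd, nd))
    if not paths:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def jc_correct(p):
    """Step 3: Jukes-Cantor correction for multiple substitutions."""
    if p >= 0.75:
        return float("nan")  # the correction is undefined at saturation
    return -0.75 * log(1 - 4 * p / 3)


def ka_ks_ng(seq1, seq2):
    """Return (Ka, Ks) for two aligned coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        a, b = seq1[i:i + 3], seq2[i:i + 3]
        S += (syn_sites(a) + syn_sites(b)) / 2  # average over both sequences
        sd, nd = count_diffs(a, b)
        Sd += sd
        Nd += nd
    N = len(seq1) - S  # S + N = n
    return jc_correct(Nd / N), jc_correct(Sd / S)


ka, ks = ka_ks_ng("ATGGCTTTAGGCAAACCC", "ATGGCCTTAGGCAGACCC")
print(f"Ka={ka:.4f} Ks={ks:.4f} Ka/Ks={ka / ks:.4f}")
```

The toy sequences are made up for illustration. The other approximate methods differ mainly in how they weight transitions, transversions, and codon frequencies in steps 1 and 2.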

Methods for calculating Ka and Ks adopt different substitution models with subtle yet significant differences. They fall into two classes: approximate methods and maximum-likelihood methods. Unlike approximate methods, maximum-likelihood methods use probability theory to carry out all three steps above in one go.

#### Approximate Methods

There are several approximate methods incorporated into KaKs_Calculator, and we list their abbreviations in the program and their corresponding reference(s) as follows.

• NG: Nei, M. and Gojobori, T. (1986)
• LWL: Li, W.H., et al. (1985)
• LPB: Li, W.H. (1993) and Pamilo, P. and Bianchi, N.O. (1993)
• MLWL (Modified LWL), MLPB (Modified LPB): Tzeng, Y.H., et al. (2004)
• YN: Yang, Z. and Nielsen, R. (2000)
• MYN (Modified YN): Zhang, Z., et al. (2006)

#### Maximum-Likelihood Methods

The method of GY takes account of sequence evolutionary features, such as transition/transversion rate ratio and nucleotide frequencies (reflected in the HKY Model) and incorporates these features into a codon-based model. We extend this method to a set of candidate models in a maximum likelihood framework and use the AICc for model selection and model averaging.
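As a rough illustration of AICc-based model selection and model averaging, the sketch below uses the standard AICc formula and Akaike weights. The log-likelihoods, parameter counts, and Ka/Ks estimates are invented numbers, the sample size `n` is left as an input because the page does not say how it is defined, and the function names are hypothetical. It is not the KaKs_Calculator code.

```python
from math import exp


def aicc(log_lik, k, n):
    """AICc = -2 lnL + 2k + 2k(k+1)/(n-k-1) for k parameters and sample size n."""
    if n - k - 1 <= 0:
        raise ValueError("sample size too small for AICc")
    return -2 * log_lik + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def akaike_weights(scores):
    """Turn AICc scores into weights that sum to 1 (lower score, higher weight)."""
    best = min(scores.values())
    rel = {m: exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


def model_average(estimates, weights):
    """Weighted mean of per-model estimates."""
    return sum(weights[m] * estimates[m] for m in estimates)


# Hypothetical fits: model -> (log-likelihood, number of parameters, Ka/Ks estimate)
fits = {
    "JC":  (-1250.4, 2, 0.31),
    "HKY": (-1238.9, 6, 0.27),
    "GTR": (-1236.2, 10, 0.26),
}
n_sites = 300  # assumed sample size

scores = {m: aicc(lnl, k, n_sites) for m, (lnl, k, _) in fits.items()}
weights = akaike_weights(scores)
selected = min(scores, key=scores.get)                                     # MS
averaged = model_average({m: f[2] for m, f in fits.items()}, weights)      # MA
print(selected, round(averaged, 4))
```

Model selection keeps only the estimate from the lowest-AICc model, while model averaging combines the estimates of all candidates in proportion to their weights.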

• GY: Goldman, N. and Yang, Z. (1994)
• MS (Model Selection), MA (Model Averaging): based on a set of candidate models defined by Posada, D. (2003) as follows.

| Model | Substitution Rates | Nucleotide Frequency |
|---|---|---|
| JC / F81 | rTC=rAG=rTA=rCG=rTG=rCA | Equal / Unequal |
| K2P / HKY | rTC=rAG≠rTA=rCG=rTG=rCA | Equal / Unequal |
| TrNEF / TrN | rTC≠rAG≠rTA=rCG=rTG=rCA | Equal / Unequal |
| K3P / K3PUF | rTC=rAG≠rTA=rCG≠rTG=rCA | Equal / Unequal |
| TIMEF / TIM | rTC≠rAG≠rTA=rCG≠rTG=rCA | Equal / Unequal |
| TVMEF / TVM | rTC=rAG≠rTA≠rCG≠rTG≠rCA | Equal / Unequal |
| SYM / GTR | rTC≠rAG≠rTA≠rCG≠rTG≠rCA | Equal / Unequal |

rij: substitution rate between i and j, where i ≠ j and i, j∈{A, C, G, T}
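One way to read the rate constraints above is as a grouping of the six rates into classes that share a value. The sketch below encodes each model pair that way, counts the free rate parameters, and checks which models are nested. It assumes the chained = and ≠ signs denote such groupings and that one rate class serves as the reference (so free parameters = classes minus one); the dictionary keys and function names are hypothetical.

```python
# Order of the six substitution rates, as written in the table
RATES = ("TC", "AG", "TA", "CG", "TG", "CA")

# Rates with the same label are constrained to be equal
MODELS = {
    "JC/F81":    (0, 0, 0, 0, 0, 0),
    "K2P/HKY":   (0, 0, 1, 1, 1, 1),
    "TrNEF/TrN": (0, 1, 2, 2, 2, 2),
    "K3P/K3PUF": (0, 0, 1, 1, 2, 2),
    "TIMEF/TIM": (0, 1, 2, 2, 3, 3),
    "TVMEF/TVM": (0, 0, 1, 2, 3, 4),
    "SYM/GTR":   (0, 1, 2, 3, 4, 5),
}


def free_rate_params(classes):
    """Free rate parameters, assuming one class is fixed as the reference."""
    return len(set(classes)) - 1


def is_nested(simple, general):
    """True if every equality imposed by `general` also holds in `simple`."""
    pairs = [(i, j) for i in range(6) for j in range(i + 1, 6)]
    return all(simple[i] == simple[j]
               for i, j in pairs if general[i] == general[j])


def rate_groups(classes):
    """List the rates that share each class, e.g. [['TC', 'AG'], ['TA', ...]]."""
    groups = {}
    for name, label in zip(RATES, classes):
        groups.setdefault(label, []).append(name)
    return list(groups.values())


for name, classes in MODELS.items():
    print(name, free_rate_params(classes), rate_groups(classes))
```

Under this reading, each pair shares its rate constraints and the two members differ only in whether nucleotide frequencies are equal or unequal.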

### Calculation programs

Last-modified: 2020-02-16 (Sun) 12:15:04