Phenetics versus Cladistics and the pros and cons of the various phylogeny inference methods


Phenetics versus Cladistics

While a phenogram may serve as an indicator of cladistic relationships, it is not necessarily identical to the cladogram. If there is a linear relationship between the time of divergence and the degree of genetic (or morphological) divergence, the two types of trees may become identical to each other.

The maximum parsimony method is a typical representative of the cladistic approach, whereas the UPGMA method is a typical phenetic method. The other methods, however, cannot be classified easily according to the above criteria. For example, the transformed distance method and the neighbors relation method have often been said to be phenetic methods, but this is not an accurate description. Although these methods use similarity (or dissimilarity, i.e., distance) measures, they do not assume a direct connection between similarity and evolutionary relationship, nor are they intended to infer phenetic relationships.
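The difference between a phenogram and a cladogram can be illustrated with a minimal UPGMA sketch. The function name `upgma`, the nested-tuple tree representation, and the distance values in the usage note are illustrative assumptions, not taken from any particular program or dataset.

```python
from itertools import combinations


def upgma(names, distances):
    """Cluster taxa by average pairwise distance (UPGMA).

    names: list of taxon labels.
    distances: dict mapping frozenset({a, b}) to the distance between a and b.
    Returns the tree topology as nested tuples (branch lengths are omitted).
    """
    d = dict(distances)                      # working copy, extended as clusters merge
    clusters = {name: 1 for name in names}   # cluster label -> number of taxa in it
    while len(clusters) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance from the new cluster to every other cluster is the
        # size-weighted average of the distances of its two parts.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


def pair(a, b):
    """Small helper to build a distance-table key."""
    return frozenset((a, b))
```

As a made-up example, suppose the true tree is ((A,B),C) with branch lengths A = 1, B = 6, C = 2 and an internal branch of 1, so that lineage B has evolved much faster than the others. The resulting distances are d(A,B) = 7, d(A,C) = 4 and d(B,C) = 9, and `upgma(["A", "B", "C"], {pair("A", "B"): 7, pair("A", "C"): 4, pair("B", "C"): 9})` groups A with C, because they are the most similar pair. The phenogram therefore differs from the cladogram. With clock-like distances (for example 2, 6 and 6) the same function groups A with B.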

The maximum likelihood method is statistically well founded. It often has lower variance than other methods (i.e., it is frequently the estimation method least affected by sampling error) and tends to be robust to many violations of the assumptions of the evolutionary model. Even with very short sequences, maximum likelihood tends to outperform alternative methods such as parsimony or distance methods. The method evaluates many different tree topologies and selects the one with the highest likelihood. An important disadvantage is that it is very CPU intensive, and thus time consuming, which makes it unsuitable for large datasets.

Distance versus character-state approaches

In molecular phylogeny, a better classification of methods would be to distinguish between distance and character-state approaches. Methods belonging to the former approach are based on distance measures, such as the number of nucleotide or amino-acid substitutions, while methods belonging to the latter approach rely on the state of the character, such as the nucleotide or amino acid at a particular site, or the presence or absence of a deletion or an insertion at a certain DNA location. According to this classification, the UPGMA method, the transformed distance method, and the neighbors relation method are distance methods, while the maximum parsimony method is a character-state method. The maximum likelihood method uses all the information available in the sequence.
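The two approaches start from different views of the same alignment, which a short sketch can make concrete. The function names and the toy sequences below are hypothetical, and the distance used is the simple uncorrected proportion of differing sites (the p-distance), chosen here only for brevity.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(x != y for x, y in zip(seq_a, seq_b))
    return differences / len(seq_a)


def distance_matrix(alignment):
    """Distance view: one number per pair of taxa."""
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }


def site_patterns(alignment):
    """Character-state view: the states of all taxa at each site."""
    return ["".join(column) for column in zip(*alignment.values())]


# Hypothetical toy alignment, for illustration only.
alignment = {
    "taxon1": "ACGTACGT",
    "taxon2": "ACGTACGA",
    "taxon3": "ACGAACTA",
}

print(distance_matrix(alignment))
print(site_patterns(alignment))
```

A distance method would work only with the three pairwise values, whereas a character-state method would work with the eight site patterns. Once the alignment has been reduced to distances, it is no longer possible to tell which sites produced the differences.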


Which method is the better one?

It has often been argued that character-state methods are more powerful than distance methods, because the raw data is a string of character states (e.g., the nucleotide sequence) and some information is lost when character-state data are transformed into a distance matrix. However, while the maximum parsimony method does use the raw data, it usually uses only a small fraction of the available data. For instance, in the example that we have used, only three sites are informative and were used, while six sites were excluded from the analysis. For this reason, the method is often less efficient than some distance-matrix methods (e.g., see Saitou and Nei 1986). Of course, if the number of informative sites is large, the maximum parsimony method is generally very effective. Distance methods yield only a single tree, whereas parsimony evaluates many trees and may return several equally parsimonious trees, none of which is necessarily the right one. The maximum likelihood method does not suffer from these limitations. It uses the entire sequence information, evaluates many trees, and proposes the tree with the highest likelihood. Moreover, the method seems to be robust and relatively insensitive to violations of the evolutionary model used, to unequal rates of evolution, and to nucleotide bias. The method, however, is not suitable for large datasets because it is so CPU intensive.
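How few sites parsimony may actually use can be checked with a small sketch. It assumes the usual criterion that a site is informative for parsimony only if at least two different states each occur in at least two taxa. The function name and the four toy sequences are hypothetical and are not the example referred to in the text.

```python
from collections import Counter


def informative_sites(sequences):
    """Return the 0-based indices of parsimony-informative sites."""
    sites = []
    for index, column in enumerate(zip(*sequences)):
        counts = Counter(column)
        # States that are shared by at least two taxa at this site.
        shared_states = [state for state, n in counts.items() if n >= 2]
        if len(shared_states) >= 2:
            sites.append(index)
    return sites


# Hypothetical aligned sequences of nine sites, for illustration only.
sequences = [
    "AAGAGTGCA",
    "AGCCGTGCG",
    "AGATATCCA",
    "AGAGATCCG",
]

print(informative_sites(sequences))  # [4, 6, 8]
```

In this toy alignment only three of the nine sites pass the test. Invariant sites, sites with a single deviating taxon, and sites where every taxon differs all fit every topology equally well under parsimony, so they contribute nothing to the choice among trees.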

NB: When you have a powerful computer and a dataset that is not too large, the maximum likelihood method is the preferred method for both DNA and protein data.

NB: An inferred tree often contains topological errors, regardless of the method used. To obtain a correct tree, a large amount of data is usually required.

Last updated: 8 August 1997.
Created by: Fred Opperdoes