version 3.41

NEIGHBOR -- Neighbor-Joining and UPGMA methods

(c) Copyright 1991 by the University of Washington and by Joseph Felsenstein.Written by Joseph Felsenstein. Permission is granted to copy this documentprovided that no fee is charged for it and that this copyright notice is notremoved.

This program implements the Neighbor-Joining method of Nei and Saitou(1987) and the UPGMA method of clustering. The program was written by MaryKuhner, using some code from program FITCH. An important part of the code wastranslated from FORTRAN code from the neighbor-joining program written byNaruya Saitou, and is used with the kind permission of Dr. Saitou.

NEIGHBOR constructs a tree by successive clustering of lineages, settingbranch lengths as the lineages join. The tree is not rearranged thereafter.The tree does not assume an evolutionary clock, so that it is in effect anunrooted tree. It should be somewhat similar to the tree obtained by FITCH.The program cannot evaluate a User tree, nor can it prevent branch lengths frombecoming negative. However the algorithm is far faster than FITCH or KITSCH.This will make it particularly effective for large studies or for boostrap orjackknife resampling studies which require runs on multiple data sets.

The UPGMA option constructs a tree by successive (agglomerative)clustering using an average-linkage method of clustering. It has somerelationship to KITSCH, in that when the tree topology turns out the same, thebranch lengths with UPGMA will be the same as with the P = 0 option of KITSCH.

The options for NEIGHBOR are selected through the menu, which looks likethis:Neighbor-Joining/UPGMA method version 3.41

Settings for this run: N Neighbor-joining or UPGMA tree? Neighbor-joining O Outgroup root? No, use as outgroup species 1 L Lower-triangular data matrix? No R Upper-triangular data matrix? No S Subreplicates? No J Randomize input order of sequences? No. Use input order M Analyze multiple data sets? No 0 Terminal type (IBM PC, ANSI, VT52)? IBM PC 1 Print out the data at start of run No 2 Print indications of progress of run Yes 3 Print out tree Yes 4 Write out trees onto tree file? Yes

Are these settings correct? (type Y or the letter for one to change)

Most of the input options (L, R, S, J, and M) are as given in the DistanceMatrix Programs documentation file, that file, and their input format is thesame as given there. The O (Outgroup) option is described in the maindocumentation file of this package. It is not available when the UPGMA optionis selected.

Option N chooses between the Neighbor-Joining and UPGMA methods. Option Sis the usual Subreplication option. Here, however, it is present only to allowNEIGHBOR to read the input data: the number of replicates is actually ignored,even though it is read in. Note that this means that one cannot use it to have


missing data in the input file, if NEIGHBOR is to be used.

The output consists of an tree (rooted if UPGMA, unrooted if Neighbor-Joining) and the lengths of the interior segments. The Average PercentStandard Deviation is not computed or printed out. If the tree found byNeighbor is fed into FITCH as a User Tree, it will compute such a quantity, butwill also re-estimate all the branch lengths as it does so.

As NEIGHBOR runs it prints out an account of the successive clusteringlevels, if you allow it to. This is mostly for reassurance and can besuppressed using menu option 2.

The CONSTants available for modification at the beginning of the programare: maxsp, the maximum number of species, maxsp2, which is derived from maxsp,namelength, which gives the length of a species name, and the usual BOOLEANconstants that initiliaze the terminal type. There is no feature savingmultiply trees tied for best, partly because we do not expect exact ties exceptin cases where the branch lengths make the nature of the tie obvious, as when abranch is of zero length.

The major advantage of NEIGHBOR is its speed: it requires a time onlyproportional to the square of the number of species. By contrast FITCH andKITSCH require a time that rises as the fourth power of the number of species.Thus NEIGHBOR is well-suited to bootstrapping studies and to analysis of verylarge trees.

-----------------------------TEST DATA SET------------------------------
Alpha      0.000 1.000 2.000 3.000 3.000
Beta       1.000 0.000 2.000 3.000 3.000
Gamma      2.000 2.000 0.000 3.000 3.000
Delta      3.000 3.000 3.000 0.000 1.000
Epsilon    3.000 3.000 3.000 1.000 0.000

------ OUTPUT FROM TEST DATA SET (with all numerical options on) -----------
   5 Populations
Neighbor-Joining/UPGMA method version 3.41
 Neighbor-joining method
 Negative branch lengths allowed
Name          Distances----                         ---------
Alpha         0.00000   1.00000   2.00000   3.00000   3.00000
Beta          1.00000   0.00000   2.00000   3.00000   3.00000
Gamma         2.00000   2.00000   0.00000   3.00000   3.00000
Delta         3.00000   3.00000   3.00000   0.00000   1.00000
Epsilon       3.00000   3.00000   3.00000   1.00000   0.00000


  !                                            +--------------Epsilon  !  !              +--------------Alpha  +--------------2!                 +--------------Beta
remember: this is an unrooted tree!
Between        And            Length-------        ---            ------   3          Gamma             1.00000   3             1              1.50000   1          Delta             0.50000   1          Epsilon           0.50000   3             2              0.50000   2          Alpha             0.50000   2          Beta              0.50000