What type of event is usually associated with each branch on a phylogenetic tree?

Evolutionary Trees

L. Nakhleh, in Brenner's Encyclopedia of Genetics (Second Edition), 2013

Abstract

Evolutionary, or phylogenetic, trees depict the evolution of a set of taxa from their most recent common ancestor (MRCA). A species tree is a phylogenetic tree that models the evolutionary history of a set of species (or populations). A gene tree is a phylogenetic tree that models a genealogy of a gene. Gene trees of different genes sampled from a set of species may disagree with each other, as well as with the species tree, due to a variety of factors. A wide array of algorithms and computer programs are available for inferring phylogenetic trees from various types of data. While true evolutionary trees are rooted and most often binary (bifurcating), inferred trees may be unrooted or multifurcating.

URL: https://www.sciencedirect.com/science/article/pii/B9780123749840005040

Phylogenetic Tree

A.D. Scott, D.A. Baum, in Encyclopedia of Evolutionary Biology, 2016

Tree Terminology

Phylogenetic trees, by analogy to botanical trees, are made of leaves, nodes, and branches (Figure 1). Let us consider a tree from the canopy down to the trunk, or from the modern day to the past.

Figure 1. Components of a phylogenetic tree.

The leaves of a tree, also called tips, can be species, populations, individuals, or even genes. If the tips represent a formally named group, they are called taxa (singular: taxon). A ‘taxon’ is a group of organisms at any hierarchical rank, such as a family, genus, or species. The tips of a phylogenetic tree are most commonly living, but may also represent the ends of extinct lineages or fossils.

As in the trees you are already familiar with, tips or leaves are subtended by branches. A branch, which represents the persistence of a lineage through time, may subtend one or many leaves. Branches connect to other branches at nodes, which represent the last common ancestors of the organisms at the tips of the descendant lineages. A branch connecting a tip to a node is called an external branch, whereas one connecting two nodes is called an internal branch (Figure 1).
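The distinction between external and internal branches can be made mechanically. The minimal Python sketch below assumes a small hypothetical rooted tree stored as child-to-parent links (the internal node names n1, n2 and root are invented for illustration) and labels each branch according to whether its child end is a tip.

```python
# Hypothetical rooted tree (((A,B),C),D) stored as child -> parent links.
# Each link is one branch; n1, n2 and root are invented internal-node names.
PARENT = {
    "A": "n1", "B": "n1",
    "n1": "n2", "C": "n2",
    "n2": "root", "D": "root",
}

def classify_branches(parent):
    """Label every branch (child, parent) as 'external' or 'internal'."""
    internal_nodes = set(parent.values())  # any node that is the parent of a branch
    kinds = {}
    for child, par in parent.items():
        # A branch whose child end is a tip (a node with no children) is external;
        # a branch joining two internal nodes is internal.
        kinds[(child, par)] = "internal" if child in internal_nodes else "external"
    return kinds

for (child, par), kind in sorted(classify_branches(PARENT).items()):
    print(f"{child}-{par}: {kind}")
```

Here the four branches leading to A, B, C and D come out as external, and the two branches joining internal nodes come out as internal.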

Reading a tree from the past toward the present, a node indicates a point where an ancestral lineage (the branch below the node) split to give rise to two or more descendant lineages (the branches above the node). Branching on an evolutionary tree is also called ‘cladogenesis’ or ‘lineage splitting.’ After a lineage splits into two, evolution happens independently in these newly formed descendant lineages. The sequence of lineage splits in a tree creates its structure or ‘topology.’ Tree topology shows us the branching of lineages through time that gave rise to the tips.

‘Clades’ are groupings on a tree that include a node and all of the lineages descended from that node. The set of all the tips in a clade is defined as being ‘monophyletic,’ referring to the fact that it includes all the descendants of an ancestral lineage. In Figure 2, we could say that the tree supports monophyly of taxa C, D, and E or, put another way, C, D, and E together form a clade. Clades can be hierarchically nested within one another, as shown in Figure 2. A tree’s topology can now be defined more precisely as the set of clades that the tree contains.
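Because a tree's topology can be treated as its set of clades, both ideas translate directly into code. This is a minimal Python sketch, assuming a hypothetical tree ((A,B),(C,(D,E))) written as nested tuples; it is not necessarily the tree drawn in Figure 2, though it likewise contains the clade C, D, E. Each internal node contributes one clade, and a set of tips is monophyletic exactly when it matches one of them.

```python
# Hypothetical rooted tree ((A,B),(C,(D,E))) written as nested tuples.
TREE = (("A", "B"), ("C", ("D", "E")))

def clades(tree):
    """Return the clades of a nested-tuple tree as frozensets of tip names
    (one clade per internal node; single tips are not listed)."""
    found = set()

    def tips_below(node):
        if isinstance(node, str):  # a tip
            return frozenset([node])
        tips = frozenset().union(*(tips_below(child) for child in node))
        found.add(tips)            # this internal node and all its descendants
        return tips

    tips_below(tree)
    return found

def is_monophyletic(tree, taxa):
    """A set of tips is monophyletic if it is exactly one of the tree's clades."""
    return frozenset(taxa) in clades(tree)

print(sorted(sorted(c) for c in clades(TREE)))
print(is_monophyletic(TREE, {"C", "D", "E"}), is_monophyletic(TREE, {"B", "C"}))
```

The nesting described in the text shows up as subset relations: {D, E} lies inside {C, D, E}, which lies inside the clade of all five tips.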

Figure 2. Clades are highlighted in a phylogenetic tree. Note clades can be hierarchically nested.

URL: https://www.sciencedirect.com/science/article/pii/B9780128000496002031

Phylogenetic Tree Distances

G. Weyenberg, R. Yoshida, in Encyclopedia of Evolutionary Biology, 2016

Abstract

Phylogenetic trees are mathematical objects which summarize the most recent common ancestor relationships between a given set of organisms. There is often a need to quantify the degree of similarity or discordance between two proposed trees. For instance, a person may be interested in knowing whether the phylogenetic trees reconstructed from two distinct sequence alignments are truly different, or if the differences are so minor as to be attributable only to statistical variation. In this article we summarize several of the most widely known methods for defining distances between phylogenetic trees, and provide examples of the calculations when feasible.
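As one example of such a calculation, the sketch below computes a Robinson-Foulds style distance for rooted trees: the number of clades found in one tree but not in the other. This is a minimal Python illustration on hypothetical four-taxon trees, not a transcription of the article's own examples; the unrooted version compares bipartitions instead of clades, and some authors divide the count by two.

```python
def clades(tree):
    """Clades of a nested-tuple tree, one frozenset of tip names per internal node."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    walk(tree)
    return found

def robinson_foulds(t1, t2):
    """Rooted Robinson-Foulds distance: size of the symmetric difference of clade sets."""
    return len(clades(t1) ^ clades(t2))

# Hypothetical trees that differ only in which taxon joins A first.
T1 = ((("A", "B"), "C"), "D")
T2 = ((("A", "C"), "B"), "D")
print(robinson_foulds(T1, T2))
```

The two trees share the clades {A, B, C} and {A, B, C, D}; only {A, B} and {A, C} are unmatched, so the distance is 2.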

URL: https://www.sciencedirect.com/science/article/pii/B9780128000496002183

Rooting Trees, Methods for

T. Kinene, ... L.M. Boykin, in Encyclopedia of Evolutionary Biology, 2016

Rooted versus Unrooted

Phylogenetic trees are either rooted or unrooted, depending on the research questions being addressed. The root of the phylogenetic tree is inferred to be the oldest point in the tree and corresponds to the theoretical last common ancestor of all taxonomic units included in the tree. The root gives directionality to evolution within the tree (Baldauf, 2003). Accurate rooting of a phylogenetic tree is important for establishing the direction of evolution and increases the power of interpreting genetic changes between sequences (Pearson et al., 2013).

Many techniques, such as molecular clock, Bayesian molecular clock, outgroup rooting, and midpoint rooting methods, estimate the root of a tree from data and assumptions (Boykin et al., 2010). However, Steel (2012) discusses root location in random trees and points out that the prior distribution of the topology alone can convey information about the location of the root. These results show that tree models that treat all taxa equally and are sampling-consistent convey information about the location of the ancestral root in unrooted trees (Steel, 2012).
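Of the methods listed, midpoint rooting is the easiest to state as an algorithm: place the root halfway along the longest tip-to-tip path, which amounts to assuming roughly equal rates of evolution across lineages. The Python sketch below is a minimal illustration under that assumption; the tree, its branch lengths, and the internal node names x and y are hypothetical, and all branch lengths are assumed to be positive.

```python
# Hypothetical unrooted tree with branch lengths: {node: {neighbor: length}}.
TREE = {
    "A": {"x": 1.0},
    "B": {"x": 2.0},
    "x": {"A": 1.0, "B": 2.0, "y": 1.0},
    "y": {"x": 1.0, "C": 3.0, "D": 5.0},
    "C": {"y": 3.0},
    "D": {"y": 5.0},
}

def distances_from(tree, start):
    """Path length from `start` to every node, plus each node's predecessor
    on that path, found by a depth-first traversal."""
    dist, prev = {start: 0.0}, {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in tree[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                prev[nbr] = node
                stack.append(nbr)
    return dist, prev

def midpoint_root(tree):
    """Return (edge, offset): the edge holding the midpoint of the longest
    tip-to-tip path, and the root's distance from the edge's first node."""
    leaves = [n for n, nbrs in tree.items() if len(nbrs) == 1]
    # Two "farthest tip" searches find the endpoints u and v of a longest path.
    dist, _ = distances_from(tree, leaves[0])
    u = max(leaves, key=dist.get)
    dist, prev = distances_from(tree, u)
    v = max(leaves, key=dist.get)
    half = dist[v] / 2.0
    # Walk back from v toward u until the next node is at or before the halfway point.
    node = v
    while dist[prev[node]] > half:
        node = prev[node]
    parent = prev[node]
    return (parent, node), half - dist[parent]

print(midpoint_root(TREE))
```

On this tree the longest tip-to-tip paths (D to B and D to C) both have length 8, so the root is placed 4 units from D on the branch joining D and y.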

URL: https://www.sciencedirect.com/science/article/pii/B9780128000496002158

New Approaches to Prokaryotic Systematics

Vartul Sangal, ... Paul A. Hoskisson, in Methods in Microbiology, 2014

3.5 Phylogenetic analyses

Phylogenetic trees represent the evolutionary relatedness between different strains; several programs are available to infer phylogenetic relationships from whole-genome data. ClonalFrame is a program that infers evolutionary relatedness after accounting for recombination (Didelot & Falush, 2007) and accepts genome alignments that are produced by the program Mauve (Darling et al., 2004). EDGAR is also able to generate phylogenetic trees from the conserved core genome following the masking of nonmatching parts of the alignments (Blom et al., 2009). More recently, the program PhyloPhlAn was developed by the Huttenhower laboratory to identify taxonomic and evolutionary relationships between different strains using 400 protein sequences from microbial genomes (Segata, Bornigen, Morgan, & Huttenhower, 2013). Users need to decide which of these trees best fits the purpose at hand. EDGAR trees show evolutionary relatedness based on the conserved core genome, whereas ClonalFrame uses the entire genomic alignment and highlights the impact of recombination on phylogenetic relationships. PhyloPhlAn derives phylogenies from conserved protein sequences and is designed to resolve taxonomic groupings for prokaryotes.

Pan-Seq provides SNPs in the core genome in PHYLIP format that can be analysed by phylogenetic packages such as PHYLIP (Retief, 2000). Users can also format the genome alignments or SNP data for a variety of other phylogenetic packages. For example, MEGA (Tamura et al., 2011) can analyse nucleotide and protein sequence alignments by a variety of approaches.

URL: https://www.sciencedirect.com/science/article/pii/S0580951714000075

A Phylogenetic Perspective on Molecular Epidemiology

Xin Wang, ... Leonard W. Mayer, in Molecular Medical Microbiology (Second Edition), 2015

Phylogenetic Tree Reconstruction Methods

Phylogenetic tree reconstruction is a powerful and visually intuitive approach for inferring evolutionary relationships between microbial sequences [77,78]. Continued advances in sequencing technology, along with the growing reliance on sequence-based methods for molecular typing, ensure that the phylogenetic approach will become an increasingly important part of molecular epidemiology studies. There are a number of conceptually distinct methodologies used to reconstruct phylogenetic trees from sequence data, along with numerous phylogenetic analysis software packages. The phylogenetic literature is full of debates regarding which of these methods is the best, and there are vigorously entrenched camps, each favouring one method over the others. However, when applied carefully to a reliable data set (i.e. a correct multiple sequence alignment), any of the widely used methods for phylogenetic reconstruction should prove to be largely accurate for inferring evolutionary relationships. More to the point, a robust phylogenetic signal should be present irrespective of the method of reconstruction that is employed. As such, agreement between multiple methods can be taken as a measure of support for an inferred evolutionary relationship of interest, and therefore the use of multiple methods of reconstruction, where appropriate to the data being used, is recommended. Accordingly, an understanding of the different classes of phylogenetic reconstruction methods is essential for accurate phylogenetic-based molecular typing.

A critical aside relates to the importance of multiple sequence alignment as a prelude to phylogenetic analysis. The adage of garbage in, garbage out rings especially true when it comes to phylogenetic tree reconstruction. The most rigorous methods for phylogenetic reconstruction will not be able to reconstruct accurate evolutionary relationships if they are applied to unreliable multiple sequence alignments. This problem may be less acute with respect to molecular typing since the sequences being analysed are typically highly related and thus easily aligned. Nevertheless, great care should be taken to ensure that the alignments used for phylogenetic reconstruction are accurate. This includes use of the most reliable and up-to-date alignment software packages (Table 29.2) [79–81] along with mandatory visual inspection, and refinement if needed, of any alignment that is to be used in phylogenetic reconstruction.

Table 29.2. Software Packages for Multiple Sequence Alignment and Phylogenetic Analysis

Program | Website | Comment
Multiple Sequence Alignment
Clustal [79] | http://www.clustal.org/ | One of the first and most widely used alignment tools, accurate and standard
MUSCLE [80] | http://www.drive5.com/muscle/ | Relatively new tool, widely adopted and with exceptional performance
MUMmer [81] | http://mummer.sourceforge.net/ | Whole-genome alignment tool
Phylogenetic Analysis Software
MEGA [82] | http://www.megasoftware.net/ | Most highly recommended package, extremely useful, multiple methods implemented, thorough documentation, alignment tool and editor, excellent graphical interface
PHYLIP [77] | http://evolution.gs.washington.edu/phylip.html | Oldest distributed package, wide utility with multiple methods, powerful but not very user friendly
PAUP [83] | http://paup.csit.fsu.edu/ | Widely used, emphasis on parsimony but also includes other methods, not free
MrBayes [84] | http://mrbayes.sourceforge.net/ | Bayesian inference, highly accurate and widely used, very sensitive to user options
SplitsTree [85] | http://www.splitstree.org/ | Method for reconstructing reticulate trees or phylogenetic networks

URL: https://www.sciencedirect.com/science/article/pii/B9780123971692000299

Phylogenetic Networks

L. Nakhleh, D.A. Morrison, in Encyclopedia of Evolutionary Biology, 2016

Phylogenetic Trees First

A phylogenetic tree on a set of taxa is a tree whose leaves are labeled bijectively (i.e., every taxon labels exactly one leaf in the tree, and no leaf is unlabeled) by the taxa. There are two types of phylogenetic trees: unrooted trees (Figure 1(a)) and rooted trees (Figure 1(b)).

Figure 1. Phylogenetic trees. (a) An unrooted phylogenetic tree on four taxa: A, B, C, and D. This tree does not model ancestor–descendant relationships; rather, it provides a graphical representation of possible grouping of the taxa based on the data. In this case, A and B are grouped together, and C and D are grouped together. (b) A rooted phylogenetic tree on four taxa: A, B, C, and D. This tree models ancestor–descendant relationships: A and B descended from a common ancestor that, along with C, descended from a common ancestor that, along with D, descended from the most recent common ancestor of all four taxa.

An unrooted phylogenetic tree provides a graphical representation of bipartitions, also known as splits, of the taxa that are pairwise compatible such that they could be uniquely combined into a single tree. That is, if the partitions are nested then the graph will be treelike, but if they overlap then the graph will show complex reticulation patterns.

For example, the unrooted phylogenetic tree in Figure 1(a) represents the bipartition in which A and B form one part and C and D form the other. Given an unrooted tree, the set of all bipartitions that it represents can be obtained by ‘cutting’ the tree edges one at a time and, for each, recording the bipartition formed by the two sets of taxa separated by the removal of the edge. Indeed, inspecting the bipartitions displayed by an unrooted phylogenetic tree constitutes its main exploratory use by practitioners (i.e., it shows which bipartitions are best supported by the data).
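Cutting edges one at a time is easy to automate. The minimal Python sketch below encodes the unrooted tree of Figure 1(a) as an adjacency list, with the two unlabeled internal nodes given hypothetical names u and v, and records the split produced by each edge. It also includes the standard pairwise compatibility test: two splits are compatible when at least one side of the first is disjoint from at least one side of the second.

```python
from itertools import combinations

# Unrooted tree of Figure 1(a); u and v are hypothetical internal-node names.
TREE = {
    "A": ["u"], "B": ["u"], "C": ["v"], "D": ["v"],
    "u": ["A", "B", "v"], "v": ["C", "D", "u"],
}

def side(tree, start, blocked):
    """Leaves reachable from `start` once the edge start-blocked is cut."""
    seen, stack, leaves = {blocked, start}, [start], set()
    while stack:
        node = stack.pop()
        if len(tree[node]) == 1:  # degree-one nodes are the labeled taxa
            leaves.add(node)
        for nbr in tree[node]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return frozenset(leaves)

def bipartitions(tree):
    """Cut each edge once and record the split of taxa it creates."""
    taxa = frozenset(n for n, nbrs in tree.items() if len(nbrs) == 1)
    splits = set()
    for a in tree:
        for b in tree[a]:
            if a < b:  # visit each undirected edge only once
                part = side(tree, a, b)
                splits.add(frozenset([part, taxa - part]))
    return splits

def compatible(s1, s2):
    """Splits are compatible if some side of s1 is disjoint from some side of s2."""
    return any(not (x & y) for x in s1 for y in s2)

splits = bipartitions(TREE)
for s in splits:
    print(" | ".join("".join(sorted(p)) for p in sorted(s, key=len)))
print(all(compatible(s, t) for s, t in combinations(splits, 2)))
```

Besides the four trivial splits that separate a single taxon, the only split is AB | CD, and every pair of splits displayed by the tree passes the compatibility test, as expected for a treelike set.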

A rooted phylogenetic tree provides a single hypothesis of the evolutionary history of a set of taxa from their most recent common ancestor. The interpretation of such a tree is based on the ancestor–descendant relationships that it captures. For example, the tree in Figure 1(b) captures three such relationships: A and B descended from a most recent common ancestor, say X; X and C descended from a most recent common ancestor, say Y; and, Y and D descended from the most recent common ancestor of all four taxa, which is the root node of the entire tree. Interpreting the tree the other way, the common ancestor splits into two descendants, one of which further splits into two descendants, and one of these splits again.
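The ancestor–descendant relationships captured by Figure 1(b) can be queried directly. A minimal Python sketch, assuming the tree is stored as child-to-parent links that use the text's labels X and Y plus a hypothetical name root for the root node: the most recent common ancestor of a set of taxa is the first node shared by all of their paths to the root.

```python
# Figure 1(b) as child -> parent links; "root" is a hypothetical name for the root node.
PARENT = {"A": "X", "B": "X", "X": "Y", "C": "Y", "Y": "root", "D": "root"}

def ancestors(node, parent):
    """Path from a node up to the root, starting with the node itself."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def mrca(taxa, parent):
    """Most recent common ancestor: the lowest node on every taxon's root path."""
    taxa = list(taxa)
    shared = set(ancestors(taxa[0], parent))
    for t in taxa[1:]:
        shared &= set(ancestors(t, parent))
    # Paths run from the tips upward, so the first shared node is the most recent one.
    return next(n for n in ancestors(taxa[0], parent) if n in shared)

print(mrca(["A", "B"], PARENT), mrca(["A", "C"], PARENT), mrca(["B", "D"], PARENT))
```

The three queries recover the relationships listed in the text: X for A and B, Y for X's descendants and C, and the root for all four taxa.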

There is a straightforward relationship between unrooted and rooted phylogenetic trees. An unrooted tree can be rooted at any edge by introducing a new node into the edge and rooting the tree at the new node. A rooted tree can be turned into an unrooted tree by ‘ignoring’ the root node. These two operations are illustrated in Figure 2.

Figure 2. The relationship between unrooted and rooted phylogenetic trees. An unrooted tree can be rooted in many ways (one possible root for each edge). Here, the edge incident with the leaf labeled by taxon D is broken into two edges by the addition of a new node and then the tree is rooted at the new node by directing all edges away from it. A rooted tree is turned into an unrooted one in one way: the root node is removed, the two children of the root are connected, and all edge directions are ignored.
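Both operations can be sketched in a few lines of Python. The sketch assumes the four-taxon unrooted tree is the one in Figure 1(a), stored as an adjacency list with hypothetical internal node names u and v, and that every internal node has degree three so the rooted result is binary. Rooting builds a nested-tuple tree directed away from a new node placed on the chosen edge; unrooting joins the root's two children and invents names n0, n1, and so on for the internal nodes.

```python
from itertools import count

# Unrooted four-taxon tree; u and v are hypothetical internal-node names.
TREE = {
    "A": ["u"], "B": ["u"], "C": ["v"], "D": ["v"],
    "u": ["A", "B", "v"], "v": ["C", "D", "u"],
}

def root_on_edge(tree, a, b):
    """Break edge a-b with a new root node and return the rooted tree as
    nested tuples, with every edge directed away from the new root."""
    def build(node, came_from):
        children = [n for n in tree[node] if n != came_from]
        if not children:
            return node  # a leaf
        return tuple(build(child, node) for child in children)
    # The new root's two children are the two ends of the broken edge.
    return (build(a, b), build(b, a))

def unroot(rooted):
    """Drop the root of a binary nested-tuple tree, connect its two children,
    and return an undirected adjacency list. Internal nodes get hypothetical
    names n0, n1, ... in the order they are created."""
    adj, names = {}, count()

    def add_edge(x, y):
        adj.setdefault(x, []).append(y)
        adj.setdefault(y, []).append(x)

    def label(node):
        if isinstance(node, str):
            adj.setdefault(node, [])
            return node
        name = f"n{next(names)}"
        for child in node:
            add_edge(name, label(child))
        return name

    left, right = rooted
    add_edge(label(left), label(right))  # the root node itself is never created
    return adj

rooted = root_on_edge(TREE, "D", "v")
print(rooted)
print(unroot(rooted))
```

Rooting on the edge incident with D yields ("D", ("C", ("A", "B"))), and unrooting it returns a tree with the same shape as the input, differing only in the names of the internal nodes.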

URL: https://www.sciencedirect.com/science/article/pii/B9780128000496002213

Consensus Methods, Phylogenetic

J.H. Degnan, in Encyclopedia of Evolutionary Biology, 2016

Introduction

When evolutionary trees are estimated from different sources, biologists need to be able to describe the similarities between the trees. When two or more evolutionary trees are constructed for the same set of species, or taxa, a consensus tree can be used to visualize similarities between these different trees. In addition to answering questions about what a set of trees has in common, this approach summarizes a potentially large amount of information (in the form of many trees) with a single tree.

There are many applications for consensus trees. Consensus trees have often been used descriptively and as a visualization tool for understanding data in the form of trees. However, consensus trees can also be used inferentially as estimates of evolutionary trees.

An early application of consensus trees arose when an evolutionary tree was inferred from two sources. For example, one tree is estimated from morphological data, while the other tree on the same species is from DNA data. Similarly, you could have one tree estimated from mitochondrial DNA and the other from nuclear DNA. Often the two trees would have many similarities, but also some disagreement. In this case, a consensus tree could indicate which evolutionary relationships were in agreement in the two trees. Disagreement between two or more trees can be represented in the consensus tree as an unresolved relationship.

The idea for the consensus tree approach is illustrated in Figure 1. In the figure, the three trees at the top can be considered as input, and the consensus method returns a single tree that summarizes relationships that the input trees have in common.

Figure 1. An example of a consensus tree. Here there are three input trees on the species chimpanzee (C), human (H), gorilla (G), and orangutan (O). A consensus method combines the three trees into an overall tree. In this case, the consensus tree happens to be the same as the left-most input tree. Some consensus methods will lead to this result, while others will lead to different consensus trees. See the text for details.
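Two common consensus rules can be expressed as clade counting. The Python sketch below is a minimal illustration of the strict rule (keep clades found in every input tree) and the majority rule (keep clades found in more than half of them). The three input trees are hypothetical and are not the trees in Figure 1. They were chosen so that the majority-rule consensus matches the first input tree, while the strict consensus leaves the relationship among chimpanzee, human and gorilla unresolved.

```python
from collections import Counter

def clades(tree):
    """Clades of a nested-tuple tree, one frozenset of tip names per internal node."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    walk(tree)
    return found

def strict_consensus(trees):
    """Clades present in every input tree."""
    counts = Counter(c for t in trees for c in clades(t))
    return {c for c, n in counts.items() if n == len(trees)}

def majority_rule_consensus(trees):
    """Clades present in more than half of the input trees."""
    counts = Counter(c for t in trees for c in clades(t))
    return {c for c, n in counts.items() if n > len(trees) / 2}

# Hypothetical inputs on chimpanzee (C), human (H), gorilla (G), orangutan (O).
T1 = ((("C", "H"), "G"), "O")
T2 = ((("C", "G"), "H"), "O")
T3 = ((("C", "H"), "G"), "O")
trees = [T1, T2, T3]
print(sorted(sorted(c) for c in strict_consensus(trees)))
print(sorted(sorted(c) for c in majority_rule_consensus(trees)))
```

The clade {C, H} appears in two of the three trees, so the majority rule keeps it and reproduces T1, whereas the strict rule drops it and leaves C, H and G as an unresolved group.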

URL: https://www.sciencedirect.com/science/article/pii/B9780128000496002195

The Origins and Diversification of HIV

Michael Worobey, in Global HIV/AIDS Medicine, 2008

The Deep Roots of HIV

Phylogenetic trees, reconstructed using the remarkably rich historical information stamped onto the genomes of these viruses, have emerged as the pre-eminent tools for reconstructing the history of HIV and simian immunodeficiency virus (SIV). The genealogical patterns that emerge from phylogenies provide not only an invaluable historical record, but also a window through which we can glimpse the medically relevant evolutionary and epidemiological processes that generated the patterns. Gene trees also offer a framework for systematizing the extensive genetic diversity of HIV and related viruses into a coherent classification scheme. With any organism's classification, however, there is a danger of becoming fixated on pattern rather than process, and HIV is no exception. Some of the potential pitfalls of doing so are discussed below.

AIDS was first recognized in the USA in the early 1980s [1], and the discovery of HIV followed soon after [2]. It is now clear, however, that HIV emerged decades earlier, from naturally infected primates on another continent [3]. Figure 2.1 illustrates the relationships between the different variants of HIV (red branches) and related viruses that have been discovered in a large number of African monkeys and apes. To date, at least 36 species of non-human primates have shown evidence of infection by SIV [4], every one of which is restricted in range to sub-Saharan Africa. Since the primate lentiviruses form a single, distinct clade on the mammalian lentivirus phylogeny, and given that no primates outside of Africa appear to be infected, SIV evidently had its origin in an African monkey at some point sufficiently deep in time to account for its spread throughout most of the continent, and into most of the (catarrhine) primate species there.

The precise nature of the timescale is still open to question. Remarkably, there is as yet no definitive evidence that would narrow the possible range even to within three orders of magnitude. Comparisons of primate and virus phylogenies have led to suggestions that some SIVs have co-diverged with their hosts, as the animals split into new species from their common ancestors. This would place a time scale measured in the millions of years to get back to the most recent common ancestor (MRCA) of the SIVs. (To visualize this, picture the evolutionary tree depicted in Figure 2.1 as a snapshot of an exploding firework, with the ancestral virus at its center and the contemporary sequences at the tips of the branches emanating from it; the center is the point in evolutionary time we are trying to date.) On the other hand, detailed studies of SIV evolutionary rates make it hard to conceive of dates older than a few thousand years for the SIV MRCA [5]. Extrapolations from the rapid short-term evolutionary rates observed in lentiviruses [6] suggest that, for SIV lineages that diverged more than even a few thousand years ago, the molecular evidence of shared ancestry ought to have become overwritten by a succession of nucleotide substitutions.
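The "overwritten" argument can be made concrete with the simplest substitution model. The Python sketch below uses the Jukes-Cantor (1969) model, which is not the model used in the cited studies, and a hypothetical rate of 1e-3 substitutions per site per year chosen only for illustration. Under this model the expected fraction of differing sites between two sequences levels off at 0.75, the value expected for unrelated random sequences with equal base frequencies, so beyond a certain depth the observed differences say little about how long ago two lineages split.

```python
import math

def jc69_p_distance(d):
    """Expected proportion of differing sites under the Jukes-Cantor (1969) model,
    where d is the expected number of substitutions per site separating two sequences."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

# Hypothetical substitution rate per lineage, for illustration only.
RATE = 1e-3  # substitutions / site / year

for years in (100, 1_000, 10_000, 100_000):
    d = 2 * RATE * years  # both descendant lineages accumulate changes after the split
    print(f"{years:>7} years: d = {d:g}, expected differences = {jc69_p_distance(d):.3f}")
```

With these assumptions, divergences of ten thousand and one hundred thousand years give essentially the same expected fraction of differing sites, which is the sense in which the molecular evidence of shared ancestry becomes overwritten.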

There are a couple of possible ways this conundrum could be sorted out. Either long-term rates of SIV are slower than our best current nucleotide substitution models allow, or the viruses did not actually co-speciate with their hosts after all. If the latter turns out to be the case, the pattern of closely-related hosts having closely-related viruses might be explained not by co-divergence but, instead, by cross-species transmission events occurring preferentially between closely-related hosts [7]. Recent studies of genes involved in innate immunity, such as APOBEC3G [8], suggest the sort of mechanism that could have generated such a pattern of correspondence.

URL: https://www.sciencedirect.com/science/article/pii/B978141602882650006X

The origins and diversification of HIV

Michael Worobey, Guan-Zhu Han, in Sande's HIV/AIDS Medicine, 2012

The Deep Roots of HIV

Phylogenetic trees, reconstructed using the remarkably rich historical information stamped into the genomes of these viruses, have emerged as the pre-eminent tools for reconstructing the history of HIV and simian immunodeficiency virus (SIV). The genealogical patterns that emerge from phylogenies provide not only an invaluable historical record but also a window through which we can glimpse the medically relevant evolutionary and epidemiological processes that generated the patterns. Gene trees also offer a framework for systematizing the extensive genetic diversity of HIV and related viruses into a coherent classification scheme. With any organism's classification, however, there is a danger of becoming fixated on pattern rather than process, and HIV is no exception. Some of the potential pitfalls of doing so are discussed below.

AIDS was first recognized in the USA in the early 1980s [1], and the discovery of HIV followed soon after [2]. It is now clear, however, that HIV emerged decades earlier, from naturally infected primates on another continent [3]. Figure 2.1 illustrates the relationships between the different variants of HIV and related viruses that have been discovered in a large number of African monkeys and apes. To date, over 40 species of non-human primates have shown evidence of infection by SIV [4], every one of which is restricted in range to sub-Saharan Africa. Since the primate lentiviruses form a single, distinct clade on the mammalian lentivirus phylogeny, and given that no primates outside of Africa appear to be infected, SIV evidently had its origin in an African monkey at some point sufficiently deep in time to account for its spread throughout most of the continent, and into most of the (catarrhine) primate species there.

Unlike HIV, natural SIV infections generally cause little illness in their hosts and most are thought to be non-pathogenic. The low pathogenicity has been well documented in African green monkeys and sooty mangabeys. However, a recent breakthrough demonstrated that wild chimpanzees naturally infected with SIV do develop hallmarks of AIDS-like illness; SIV infection, as with HIV-1 and HIV-2 infection, is associated with progressive CD4 cell loss, lymphatic tissue destruction, and premature death [5]. Pan troglodytes schweinfurthii infected with SIV in Gombe National Park in Tanzania have a markedly higher death rate than non-infected animals [5].

The precise nature of the time scale of SIV evolution is still open to question. Initially, comparisons of primate and virus phylogenies led to suggestions that some SIVs have co-diverged with their primate hosts, as the animals split into new species from their common ancestors. However, the pattern of closely related hosts having closely related viruses might be explained not by co-divergence but by cross-species transmission events occurring preferentially between closely related hosts [6]. For example, detailed phylogenetic analysis shows that recent cross-species transmission events, instead of ancient co-divergence, likely underlie the fact that three closely related hominoid primates (humans, chimpanzees, and gorillas) harbor closely related lentiviruses [7]. Studies of genes involved in innate immunity, such as APOBEC3G [8], suggest just the sort of mechanism that could generate such a pattern of correspondence even if the viruses and their hosts did not co-diverge.

Hence, even at this advanced stage of the investigation of one of the most medically important pathogens, until recently there has been little agreement on whether its progenitors have been circulating in African primates for millions or just thousands of years. A recent study, however, revealed evidence for several SIV lineages endemic to Bioko Island, Equatorial Guinea. This island was isolated from Africa as sea level rose 10,000 to 12,000 years ago. Notably, each of Bioko's four SIV lineages is most closely related to a virus circulating in hosts of the same genus on the African mainland rather than to the SIVs of other Bioko species (Fig. 2.1). This phylogeographic approach established that SIV is ancient—at least 32,000 years old [9].

The discovery of an endogenous lentivirus in the genome of the gray mouse lemur (Microcebus murinus) also suggests a time scale of millions of years for primate lentiviruses [10]. On the other hand, molecular clock methods calibrated by using modern sequences make it hard to conceive of dates older than a few thousand years for the SIV MRCA [11, 12]. Extrapolations from the rapid short-term evolutionary rates observed in lentiviruses [13] suggest that, for SIV lineages that diverged more than even a few thousand years ago, the molecular evidence of shared ancestry ought to have become overwritten by a succession of nucleotide substitutions. Clearly, more work is still needed to resolve this conundrum, including developing new models of sequence evolution that incorporate the idiosyncrasies of RNA virus evolution. Nevertheless, it is now very clear that SIV is no newcomer; these viruses have almost certainly been circulating for tens of thousands of years at least, raising the obvious question: what changed within the past hundred years that allowed multiple SIV lineages to successfully establish themselves in the human population?

URL: https://www.sciencedirect.com/science/article/pii/B978145570695200002X

What does each branch on a phylogenetic tree represent?

In all three diagrams, the branches of the trees represent ancestor-descendant relationships, and the nodes represent species. Species included in the relevant group or clade are represented by black nodes.

What event occurs at the node in a phylogenetic tree?

Each node represents the last common ancestor of the two lineages descended from that node.

Does each branch on a phylogenetic tree represent a speciation event?

The vertical lines, called branches, each represent a lineage, and nodes are where lineages diverge, representing a speciation event from a common ancestor. The trunk at the base of the tree is actually called the root. The root node represents the most recent common ancestor of all of the taxa represented on the tree.