
How to use a published phylogenetic tree?


I would like to know how I should use a phylogenetic tree from an already published article in my own article, so that I can analyze the evolution of a trait. Is there any software I can use, or do I just have to "edit" the tree with traditional image-editing software?

I've seen that using trees from other authors is common practice. Often the trees are summarized, but no paper seems to explain how this was done.

Thank you!!


I don't know how widespread this is, but some journals require authors to deposit their trees in a database such as TreeBASE. In that case, the paper should give the accession numbers of its trees.

You then need appropriate software to load, manipulate, and print the tree, for instance ete.
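The following is a rough, standard-library-only Python sketch of what "loading and manipulating" a deposited tree involves. It assumes the tree has been downloaded as a simple Newick string, with unquoted labels that contain no spaces; any branch lengths are read and discarded. It prunes the tree to the taxa you actually study and collapses nodes left with a single descendant, which is one common way a published tree gets summarized. The helper names (`parse_newick`, `prune`, `to_newick`) and the example tree are hypothetical. For real data, a tested toolkit such as ete is the safer choice, and its API differs from this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node; tips have no children. Internal labels may be empty."""
    name: str = ""
    children: list[Node] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a simple Newick string (hypothetical helper, not a full parser).

    Assumes unquoted labels without spaces or commas; branch lengths such as
    ":0.5" are read but discarded; Newick comments "[...]" are not supported.
    """
    text = "".join(text.split()).rstrip(";")  # drop all whitespace and final ';'
    pos = 0

    def parse_subtree() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # consume '('
            node.children.append(parse_subtree())
            while text[pos] == ",":
                pos += 1  # consume ','
                node.children.append(parse_subtree())
            pos += 1  # consume ')'
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1  # read the label (plus any branch length) after a tip or ')'
        node.name = text[start:pos].split(":")[0]
        return node

    return parse_subtree()


def prune(node: Node, keep: set[str]) -> Node | None:
    """Copy of the tree restricted to tips named in `keep`.

    Internal nodes left with a single child are collapsed into that child.
    Returns None if no kept tip lies below `node`.
    """
    if not node.children:
        return Node(node.name) if node.name in keep else None
    kids = [k for k in (prune(c, keep) for c in node.children) if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return Node(node.name, kids)


def to_newick(node: Node) -> str:
    """Serialize a tree back to Newick (without branch lengths or final ';')."""
    if not node.children:
        return node.name
    return "(" + ",".join(to_newick(c) for c in node.children) + ")" + node.name


# Hypothetical published tree with branch lengths, summarized to three taxa.
published = "((A:1.0,B:2.0):0.5,(C:1.0,(D:0.3,E:0.4):0.2):0.7);"
summary = prune(parse_newick(published), {"A", "C", "D"})
print(to_newick(summary) + ";")  # (A,(C,D));
```

The pruned Newick string can then be drawn with any tree viewer, and the trait of interest can be annotated on it, rather than editing an image of the published figure by hand.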


How to Read a Phylogenetic Tree

It has been over 50 years since Willi Hennig proposed a new method for determining genealogical relationships among species, which he called phylogenetic systematics. Many people, however, still approach the method warily, worried that they will have to grapple with an overwhelming number of new terms and concepts. In fact, reading and understanding phylogenetic trees is not difficult at all. You only need to learn three new words: autapomorphy, synapomorphy, and plesiomorphy. All of the other concepts (e.g., ancestors, monophyletic groups, paraphyletic groups) are familiar ones that were already part of Darwinian evolution before Hennig arrived on the scene.

Dan Brooks and I teach a biodiversity course (EEB 265) to second-year students at the University of Toronto. The entire course is structured around a phylogenetic framework. We begin with the big, albeit simplified, tree of the Metazoa, then work our way from sponges to snakes, focusing on the characters that bind groups together and the characters that make each group unique. If we are doing our job correctly, our students should be able to answer the following questions: What is this animal (and how do you know)? What does it do? What makes it special? What aspects of its biology make it vulnerable to anthropogenic intervention? Since all of the students had already taken a lab in first-year biology covering the fundamentals of phylogenetics, we assumed that we wouldn’t need to review phylogenetic methodology in our biodiversity course. It didn’t take long for us to realize that this assumption was naïve: by the time many of the students arrived in EEB 265, they had already hit the delete button next to “phylogenetics” in their brains. It is always humbling to (re)discover that not everyone shares your views about the things in life that are interesting and important!

Back to the drawing board. One of the major problems with teaching a course about metazoan diversity is that you simply don’t have enough time to cover all of the groups. The last thing we wanted to do was to sacrifice biology-based lectures for a discussion about theory. So, the challenge was simple: design a lecture that would, in 50 minutes, teach students how to understand what a phylogenetic tree was telling them. It wasn’t our intention to teach students how to make trees, just how to read them. This paper is based on that lecture.

The word “phylogeny” is a combination of two Greek words, phyle (tribe—in particular, the largest political subdivision in the ancient Athenian state [www.yourdictionary.com www.etymonline.com]: another word we get from this is “phylum”) and geneia (origin [www.etymonline.com]: another word we get from this is “gene”). It was coined by the developmental biologist Ernst Haeckel in 1866 and then championed by Darwin in his famous work, On the Origin of Species (beginning with the 5th edition in 1869). Both biologists tied the idea of “phylogeny”—the origin of groups—to evolution. Phylogenetic trees are thus simply diagrams that depict the origin and evolution of groups of organisms.

Although you might not know it, we are all familiar with the idea of phylogenetic trees. People have been making such trees for decades, substituting the word “family” for “phylogenetic” (Fig. 1). Individual people in a family are connected over generations by bonds of “blood” (the process of reproduction that produces offspring). In the same way, individual species are connected by evolutionary ties: the biological processes, like natural selection, and the geological processes, such as continental drift or a river changing course, that produce species. In this sense, speciation (the production of new species) = reproduction (the production of new individuals). In other words, we are all, from members of the same family to members of the same species, connected by genes.

Family tree for an interesting group of people. In phylogenetic terms, family trees (genealogies of people) = phylogenetic trees (genealogies of species)

Family trees tend to be drawn as if they were hanging upside down, like a cluster of grapes. Phylogenetic trees are depicted somewhat differently. Imagine that you are holding the family tree for the big cats shown in Fig. 2a. Now, flip it sideways (rotate 90° counterclockwise) and you have the image shown in 2b. Rotate this image yet another 90° counterclockwise, smooth it out, and you have the image shown in Fig. 2c (this tree shape was the one used by Darwin in On the Origin of Species). The important thing to remember is that all three depictions are saying exactly the same thing about the relationships among species of big cats. How you choose to draw your phylogenetic trees depends, in part, on personal preference—some people find it easier to read 2b, others prefer 2c.

(a–c) So many ways to draw a family/phylogenetic tree for the genus Panthera

Phylogenetic trees are reconstructed by a method called “phylogenetic systematics” (Fig. 3). This method clusters groups of organisms together based on shared, unique characters called synapomorphies. For example, you share the presence of a backbone with cats, but not with butterflies. The presence of a backbone thus allows us to hypothesize that human beings are more closely related to cats than they are to butterflies (Fig. 4a): cats and people both have a backbone, while butterflies are spineless. Not all characters are synapomorphies. Some traits, called plesiomorphies, are shared by all the members of a group. Returning to our tree, we see that cats, people, and butterflies all have DNA (Fig. 4b). The presence of DNA allows us to hypothesize that these three species are all part of the same group, but it tells us nothing about how those species are related to one another. Think of it this way: my last name tells me that I am part of the McLennan clan. If I meet someone called Jessie McLennan, I know we are related somehow, but I have no idea whether she is a long-lost cousin or someone from a more distant branch of the family tree. The final term you need to know is autapomorphy: a trait found in only one member of the group. For example, butterflies can be distinguished from cats and people because they have an exoskeleton made of chitin (a tough, waterproof derivative of glucose). Autapomorphies help us identify a particular species in a group but, like plesiomorphies, they tell us nothing about relationships within the group. Overall, these three types of characters can be likened to the story of Goldilocks: plesiomorphies are too hot (too widespread), autapomorphies are too cold (too restricted), and synapomorphies are just right (for determining phylogenetic relationships).
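The Goldilocks rule can be written as a toy function. This is a minimal sketch that assumes we know only which members of the group (cats, people, butterflies) have each trait. The function name `classify` is hypothetical. The sketch deliberately simplifies the method. In a real analysis, a shared trait counts as a synapomorphy only if it is the derived state, which is judged by comparison with organisms outside the group, not just by how many members have it.

```python
from __future__ import annotations


def classify(has_trait: set[str], group: set[str]) -> str:
    """Label a character within `group` using the Goldilocks rule (toy version).

    Only the number of group members that show the trait is considered;
    character polarity (ancestral vs. derived) is ignored in this sketch.
    """
    present = has_trait & group
    if not present:
        return "absent"
    if present == group:
        return "plesiomorphy"   # too widespread: every member has it
    if len(present) == 1:
        return "autapomorphy"   # too restricted: only one member has it
    return "synapomorphy"       # just right: some, but not all, members share it


GROUP = {"cat", "human", "butterfly"}
CHARACTERS = {
    "backbone": {"cat", "human"},
    "DNA": {"cat", "human", "butterfly"},
    "chitin exoskeleton": {"butterfly"},
}

for trait, taxa in CHARACTERS.items():
    print(f"{trait}: {classify(taxa, GROUP)}")
```

With the three characters from the text, the backbone comes out as a synapomorphy, DNA as a plesiomorphy, and the chitin exoskeleton as an autapomorphy.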

The basis of phylogenetic systematics

Identifying types of characters on a phylogenetic tree: (a) a synapomorphy, (b) a plesiomorphy, (c) an autapomorphy

Enough about characters for the moment; back to the trees themselves. Why do the branches on a tree have names (e.g., lion, tiger), while the lines joining different branches together do not (Fig. 5)? This is because these lines represent ancestors. An ancestor is a species that has undergone a speciation event to produce descendant species. The ancestor usually “disappears” in the process of speciation. Does this mean that the ancestor goes extinct?

Finding ancestors on a phylogenetic tree

To answer this, we must do some time traveling, carrying a digital device that records everything we see (Fig. 6). Imagine you travel back 10,000,000 years, then stop, intrigued by an interesting species of lizard with red spots all over its back (species A). After a while, you decide to move forward in time five million years or so, then stop again. You search around and discover two new lizard species, one with blue spots on its back (species B) and the other with red stripes (species C), but species A is nowhere to be seen. Did it go extinct? You look back over your digital recording of those five million years and discover that species A split into two groups, which gradually became different from one another. In evolutionary terms, species A is an ancestor (ancestor 1) and species B and C are its descendants. Fast forward to today (with more digital material to watch) and you find three species of lizard: your old friend the blue-spotted lizard (species B) and two new lizards descended from species C (the red-striped lizard), one with blue stripes (species D) and the other with a solid black back (species E). Today, then, there are only three species of lizard alive. You no longer see either of the ancestors (the red-spotted and red-striped lizards), but we still show them on the phylogenetic tree.

Traveling back in time to discover ancestors

The answer to our original question “did the ancestor go extinct?” is thus No! In many cases, the ancestor is subdivided and the biological (genetic) information encompassed within the ancestor is passed on to the descendant species. Over time, the descendants change and become different in some ways from each other and from the ancestor, while retaining some things in common (for example, all of our lizard species have a backbone). This is evolution.

So what really counts as extinction? Extinction is the loss of biological information—the physical loss of a species. For example, consider a simplified phylogenetic tree of the dinosaurs (Fig. 7). All of the groups on dotted branches are extinct—none of the species in those groups exist on this planet anymore (Jurassic Park notwithstanding), which means that all of the information that was unique to each of those groups has been lost. The only group that managed to avoid extinction was Aves (or birds)—avian species are the last remaining dinosaurs.

Actual extinctions. Groups depicted with dotted lines are extinct, so all of the genetic, morphological, physiological, ecological, and behavioral traits that are unique to each group have been lost to the biosphere

OK, let’s take what we have learned about ancestors and about clustering groups based on shared, unique characters (synapomorphies), and use it to decipher the information contained within a phylogenetic tree. Here is a tree depicting the relationships among living members of the Amniota, a large group of vertebrates that includes most of the animals with which you are familiar (Fig. 8). You already know that the names of species, or groups of species, are written across the tips of the branches on the tree. The next thing you need to know is that characters are depicted at their point of origin on a phylogenetic tree. So, on this tree you can see that: (1) the amniotic egg originated in ancestor 1 and was passed on to all of its descendants (mammals, ancestor 2, turtles, ancestor 3, ancestor 4, crocodiles, birds, ancestor 5, tuataras, and lizards plus snakes), so in evolutionary terms the amniotic egg is a unique trait shared only by ancestor 1 and all of its descendants; (2) a special type of skin protein (β keratin) originated in ancestor 2 and was passed on to all of its descendants (turtles, ancestor 3, ancestor 4, crocodiles, birds, ancestor 5, tuataras, and lizards plus snakes), so β keratin is a unique trait shared by the group called “Reptilia”; and (3) a breakable tail originated in ancestor 5 and was passed on to all of its descendants (tuataras, and lizards plus snakes), so a breakable tail is a unique trait shared by members of the group tuataras + lizards + snakes.
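Placing characters at their point of origin can be sketched in a few lines of Python. The tree below is transcribed from the descendant lists in the paragraph above, so its exact topology is an assumption about the figure. The sketch places a trait on the most recent ancestor whose descendants include every group that has the trait. It also assumes each trait arose once and was never lost. Helper names such as `point_of_origin` are hypothetical.

```python
from __future__ import annotations

# Amniote tree transcribed from the text (assumed topology):
# each ancestor maps to its two immediate descendants.
CHILDREN = {
    "ancestor 1": ["mammals", "ancestor 2"],
    "ancestor 2": ["turtles", "ancestor 3"],
    "ancestor 3": ["ancestor 4", "ancestor 5"],
    "ancestor 4": ["crocodiles", "birds"],
    "ancestor 5": ["tuataras", "lizards+snakes"],
}


def descendants(node: str) -> list[str]:
    """All descendants of `node` (ancestors and tips), in pre-order."""
    result = []
    for child in CHILDREN.get(node, []):
        result.append(child)
        result.extend(descendants(child))
    return result


def tips_below(node: str) -> set[str]:
    """Living groups (tips) descended from `node`; a tip returns itself."""
    return {n for n in [node, *descendants(node)] if n not in CHILDREN}


def point_of_origin(has_trait: set[str]) -> str:
    """Most recent ancestor of all taxa with the trait (single origin, no losses)."""
    if len(has_trait) == 1:
        return next(iter(has_trait))  # autapomorphy: origin on a terminal branch
    candidates = [a for a in CHILDREN if has_trait <= tips_below(a)]
    return min(candidates, key=lambda a: len(tips_below(a)))


TRAITS = {
    "amniotic egg": {"mammals", "turtles", "crocodiles", "birds", "tuataras", "lizards+snakes"},
    "beta keratin": {"turtles", "crocodiles", "birds", "tuataras", "lizards+snakes"},
    "breakable tail": {"tuataras", "lizards+snakes"},
}

for trait, taxa in TRAITS.items():
    origin = point_of_origin(taxa)
    print(f"{trait}: originated in {origin}; passed on to {', '.join(descendants(origin))}")
```

Run on the three traits, the sketch places the amniotic egg in ancestor 1, β keratin in ancestor 2, and the breakable tail in ancestor 5. The descendant lists it prints are the same as those given in the text.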

How to read characters on a phylogenetic tree

In fact, every organism is a complex mosaic of thousands of traits. If you don’t believe this, sit down and list all of the traits that make you, you. In addition to the obvious things like eye color and hair color, don’t forget that you have RNA, DNA, individual cells, an anterior and a posterior end, a skull, jaws, bone, arms and legs; that you came from an amniotic egg; that you have three bones in your middle ear; that you were suckled on milk produced in mammary glands; and that you have an opposable thumb and no tail. In other words, when you look at a phylogenetic tree, you will see that all of the branches have at least one, and more likely many, characters on them (the slash marks on Fig. 9a). Labeling every one of these traits on a tree is therefore often impractical, because the result is visually cluttered. A shorthand method has been developed to deal with this problem: draw the tree showing the relationships among the groups (Fig. 9b) and list the synapomorphies for each branch elsewhere, in a table. On the other hand, if you are interested in one or more particular traits, you can highlight them on the phylogenetic tree without showing all the other characters. For example, if you wanted to discuss the evolution of mammals, you could show the amniote tree and highlight just the synapomorphies for the mammals (e.g., three middle ear bones: Fig. 9c). Remember, this is just shorthand!

(a–c) Representing characters on a phylogenetic tree

There is one last thing about characters that is important to understand: characters are not static things. They evolve through time. In other words, a “synapomorphy” may not “look the same” in all species that have it. Consider, for example, the stapes, one of the three bones in your middle ear that transfer sound waves from the eardrum to the membrane of the inner ear. This small bone has a long, complicated, and fascinating evolutionary history. To understand that history, we must travel back many hundreds of millions of years to the origin of the Deuterostomes, a large group that includes the Echinodermata (starfish and their relatives), Hemichordata (worm-like marine creatures), and Chordata (amphioxus + tunicates + Craniata [organisms with skulls]). The ancestor of this large group had numerous slits in its pharynx (called visceral arches) that were involved in filter feeding. Time passed, and cartilaginous rods supporting the arches appeared, were subdivided, and were modified. The upper section of the second visceral arch rod is the focus of our tale (Fig. 10). As we move still further forward in time, this character undergoes various structural and positional modifications. In essence, it becomes larger, more robust, and involved in supporting the jaws (at which point it is called the hyomandibula), changes from cartilage to bone, then begins a gradual reduction in size, disengages from the jaw/cheek area, and moves into the middle ear (at which point it is called the stapes). Overall, then, the series upper portion of the 2nd visceral arch → hyomandibula → stapes is one and the same structure, whose shape and function have been modified over hundreds of millions of years. So although the presence of a “cartilaginous rod in the 2nd visceral arch found in the throat region” may be a synapomorphy for the Craniata, you won’t find that exact structure in any four-footed animal. Instead, what you will find is the modification of that cartilaginous rod: the stapes. The continued evolution of a particular character past its point of origin is called an evolutionary transformation series.

Synapomorphies are not static; they may continue to evolve. Changes in the character “upper portion of the second visceral arch” [hyomandibula, stapes] are traced on the phylogenetic tree for the Chordata (animals with notochords). Both the story and the phylogenetic tree have been substantially simplified to emphasize the idea of character origin and modification rather than the finer details of character evolution. Names in italics refer to extinct species known from fossils. Line drawings and photographs of various structures and species can easily be found on the web

The next thing that students of phylogenetics have to know is how to recognize different kinds of groups of organisms. There are two general types of groups, one “good” and the other “bad”.

Let’s begin with “the good,” a monophyletic group (Fig. 11). The word “monophyletic” is a combination of two Greek words, monos (single) and phyle (tribe). It was coined by our old friend Ernst Haeckel, who, as you remember, also coined the word phylogeny. A monophyletic group includes an ancestor and all of its descendants. It is identified by the presence of shared, unique characters (synapomorphies). Each phylogenetic tree contains as many monophyletic groups as there are ancestors. For example, looking at the tree in Fig. 11, we can identify five monophyletic groups, only two of which are shown in Fig. 12 (I’ll leave it up to you to discover the other three).
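The rule "as many monophyletic groups as there are ancestors" can be checked mechanically. The hypothetical tree of Fig. 11 is not reproduced here, so this minimal sketch reuses the amniote tree described earlier. That tree has five ancestors, and its topology is transcribed from the text and should be treated as an assumption. The sketch builds one group per ancestor: the ancestor plus everything descended from it, represented by the living groups at the tips. Function names are hypothetical.

```python
from __future__ import annotations

# Amniote tree transcribed from the text (assumed topology):
# each ancestor maps to its two immediate descendants.
CHILDREN = {
    "ancestor 1": ["mammals", "ancestor 2"],
    "ancestor 2": ["turtles", "ancestor 3"],
    "ancestor 3": ["ancestor 4", "ancestor 5"],
    "ancestor 4": ["crocodiles", "birds"],
    "ancestor 5": ["tuataras", "lizards+snakes"],
}


def tips_below(node: str) -> set[str]:
    """Living groups descended from `node`; a tip returns itself."""
    kids = CHILDREN.get(node)
    if not kids:
        return {node}
    return set().union(*(tips_below(k) for k in kids))


def monophyletic_groups() -> dict[str, set[str]]:
    """One monophyletic group per ancestor: the ancestor and all its descendants."""
    return {ancestor: tips_below(ancestor) for ancestor in CHILDREN}


for ancestor, members in monophyletic_groups().items():
    print(ancestor, sorted(members))
```

Five ancestors yield exactly five groups. The smallest is crocodiles + birds (ancestor 4), and the largest is the whole Amniota (ancestor 1).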

Identifying monophyletic groups

Two of the five monophyletic groups on the hypothetical tree

Now on to “the bad.” The word “paraphyletic” is, once again, a combination of two Greek words, para (near) and phyle (tribe), so the implication is that the whole tribe is not present (Fig. 13). Paraphyletic groups include an ancestor but not all of its descendants. On this hypothetical tree, species C has been excluded from the group, even though it is a descendant of ancestor 1 just like the rest of the species. Paraphyletic groups are problematic because they mislead us about how characters evolve and how species are related to one another. For example, let’s consider the big tree for the Amniota and highlight the “old” Reptilia, one of the most famous paraphyletic groups (Fig. 14). Even today, people still speak about three distinct classes: the reptiles, the birds, and the mammals. When you look at this figure, what is wrong with the class Reptilia as it is drawn?

Identifying paraphyletic groups

The most famous paraphyletic group, the reptiles

Right! Ancestor 2 is the ancestor of all the reptiles but, as highlighted in Fig. 15, the Reptilia does not include all of ancestor 2’s descendants: ancestor 4 and the birds have been removed from the group. The only way to make the Reptilia a monophyletic group is to redefine the term to include crocodiles, turtles, tuataras, lizards, snakes, and birds. In the past, birds were not considered to be reptiles because they are warm-blooded (in fact, they were often grouped with mammals because of that trait). But phylogenetic studies have demonstrated that birds are indeed reptiles. They share many morphological, behavioral, and molecular characters with other reptilian species in general (synapomorphies originating in ancestor 2, e.g., β keratin). They also share many characters with crocodiles in particular (synapomorphies originating in ancestor 4, e.g., holes in the skull just in front of the eyes).
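A monophyly test follows directly from the definition. First, find the smallest group that consists of an ancestor and all of its descendants and that contains every member of the proposed group. Then check whether anything was left out. This is a minimal sketch on the amniote tree as transcribed from the text, so the topology is an assumption about the figure. `check_group` is a hypothetical name. The function flags any non-monophyletic group, but it does not distinguish paraphyly from other cases.

```python
from __future__ import annotations

# Amniote tree transcribed from the text (assumed topology).
CHILDREN = {
    "ancestor 1": ["mammals", "ancestor 2"],
    "ancestor 2": ["turtles", "ancestor 3"],
    "ancestor 3": ["ancestor 4", "ancestor 5"],
    "ancestor 4": ["crocodiles", "birds"],
    "ancestor 5": ["tuataras", "lizards+snakes"],
}
ROOT = "ancestor 1"


def tips_below(node: str) -> set[str]:
    """Living groups descended from `node`; a tip returns itself."""
    kids = CHILDREN.get(node)
    if not kids:
        return {node}
    return set().union(*(tips_below(k) for k in kids))


def check_group(group: set[str]) -> tuple[bool, set[str]]:
    """Return (is_monophyletic, descendants left out) for a set of tips.

    Finds the smallest clade (an ancestor plus all of its descendants) that
    contains every member of `group`; whatever in that clade is missing from
    `group` is what makes the group non-monophyletic.
    """
    if not group or not group <= tips_below(ROOT):
        raise ValueError(f"unknown or empty group: {sorted(group)}")
    if len(group) == 1:
        return True, set()  # a single tip is trivially complete
    clades = [tips_below(a) for a in CHILDREN if group <= tips_below(a)]
    smallest = min(clades, key=len)
    left_out = smallest - group
    return not left_out, left_out


old_reptilia = {"turtles", "crocodiles", "tuataras", "lizards+snakes"}
print(check_group(old_reptilia))              # (False, {'birds'})
print(check_group(old_reptilia | {"birds"}))  # (True, set())
```

For the "old" Reptilia, the sketch reports the birds as the missing descendants. Adding them makes the group monophyletic, which matches the redefinition described above.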

How to make the Reptilia monophyletic

Why is it important to have monophyletic groups? Say you wanted to figure out how red hair appeared in your family. What would be your chances of tracking down your original red-haired ancestor if no records had been kept about the union between your great-great-great-great-grandfather Sven and his Irish bride Maggie? Missing information creates problems for any research, be it genealogical or evolutionary, and paraphyletic groups are missing information. In evolutionary terms, monophyletic groups are “real” biological units; that is, they are the product of descent with modification (an ancestor and all of its descendants) and as such can be used to study the evolutionary processes that produced them. Paraphyletic groups, on the other hand, are the product of “human error” arising from incomplete or flawed information (e.g., poor descriptions of characters). Using such groups to study evolutionary processes will lead us down misleading and confusing paths.

Why do we use phylogenetic trees? There are many ways to answer this question (and many papers/books written about it), but the most general answer is that trees summarize valuable information about the evolution of organisms that allows us to understand them better. For example, here’s the family tree for the Hominoidea, the group that includes us and all of our closest relatives (Fig. 16). When you look at the distribution of characters on this tree you can see that a number of traits we associate only with human beings, such as hunting, infanticide, tool making, self-awareness, and language, originated long before Homo sapiens. In other words, human beings are not as unique as you might think. If we want to understand how and why those traits evolved, we must study their expression and function in ourselves and in our relatives. So much information from just one phylogenetic tree!


Welcome to TreeBASE

TreeBASE is a repository of phylogenetic information, specifically user-submitted phylogenetic trees and the data used to generate them. TreeBASE accepts all kinds of phylogenetic data (e.g., trees of species, trees of populations, trees of genes) representing all biotic taxa. Data in TreeBASE are made public if they are used in a publication that is in press or published in a peer-reviewed scientific journal, book, conference proceedings, or thesis. Data used in publications that are in preparation or in review can be submitted to TreeBASE but will not be available to the public until they have passed peer review. Aside from the submitter, such data are available only to the publication's editors or reviewers, via a special access URL. TreeBASE is produced and governed by The Phyloinformatics Research Foundation, Inc.

As of April 2014, TreeBASE contains data for 4,076 publications written by 8,777 different authors. These studies analyzed 8,233 matrices and resulted in 12,817 trees with 761,460 taxon labels that mapped to 104,593 distinct taxa.

The current release includes a host of new features and improvements over the previous TreeBASE prototype. New features include:



It is usually acknowledged that Jean-Baptiste Pierre Antoine de Monet, Chevalier de Lamarck (1744-1829), published an early evolutionary tree (Lamarck 1809). However, his published trees differ from our modern phylogenetic diagrams in having contemporary higher-level taxonomic groups at both the internal and external nodes, so that each tree represents a transformation series among the taxonomic groups. Thus, while his trees were based on the idea of transmutation, they do not match our current type of tree.

The later published trees of, for example, Charles-Hélion de Barbançois, Hugh Edwin Strickland, and Alfred Russel Wallace, followed the style of Lamarck. Other trees published in the first half of the 19th century, such as those of Jean Louis Rodolphe Agassiz, Augustin Augier, Heinrich Georg Bronn, and Edward Hitchcock, were not intended to be evolutionary diagrams, because their authors did not believe in evolution (Ragan 2009; Tassy 2011).

Charles Darwin (1859) is usually credited as being the originator of modern phylogenetic trees, with contemporary taxa at the leaves and ancestors at the internal nodes. Therefore, any answer to the question of who published the first modern phylogenetic tree must involve a post-Darwinian person. There appear to be four candidates for who published the first empirical Darwinian tree, in the period 1865-1866. Two of them were palaeontologists and two comparative morphologists; two had strong religious beliefs and two apparently did not; and they included one Englishman, one Frenchman and two Germans. I list them here in the probable order of publication.

Note: For another early tree, see the blog post Fritz Müller and the first phylogenetic tree.

St George Jackson Mivart (1827-1900)

St George Mivart was a comparative morphologist who was an early British convert to Darwinism, although he later fell out with Thomas Henry Huxley and therefore with Darwin. His work was principally on the comparative anatomy of primates, for which he provided very detailed comparisons of the skeletons of a large number of species, notably in Mivart (1865) and Mivart (1867).

The paper by Mivart (1865) therefore seems to be the first publication to contain an explicitly Darwinian tree. This is ironic, given the fact that Mivart later became one of Darwin's strongest critics. That it took 6 years (from 1859) for a biologist to produce such a tree may reflect the fact that Darwin himself published only a single theoretical sketch, thus leaving others to work out how to apply his ideas to empirical data.

The 1865 paper was read before the Zoological Society of London on 27 June 1865, and then appeared as a regular part of the Society's journal later that year (see Dickinson 2005). It was based on a detailed osteological analysis of the spinal columns of 29 primate genera. As noted by Bigoni & Barsanti (2011): "Not only does he use taxonomic names still largely in use today, but, surprisingly, Homo is not the apex or culmination of evolution ... in fact it is placed on a lateral diverging branch. This position of humans provides his tree with a particularly modern appearance and is perfectly consistent with the trees or bushes that Darwin drew." Mivart's paper is available from the publisher John Wiley & Sons and also from the Biodiversity Heritage Library.

Unfortunately, Mivart's tree does deviate from Darwin's ideas in that the leaves and many of the branches refer to higher taxonomic groups, rather than to species. In this sense his trees look similar to Ernst Haeckel's (see below), although it is doubtful that they were constructed in the same way. Note that Mivart's labels occur along the terminal twigs, rather than at their end, as his contemporaries chose to present them (and as we do today).

Also, being based on different data sets, the 1867 tree (based on the appendicular skeleton, or limbs) does differ in topology from the 1865 one (based on the axial skeleton, or spinal column), thus foreshadowing a problem with phylogeny reconstruction from different data sources that continues to this day (see this later blog post). Mivart explicitly noted in a letter to Darwin (1870): "The diagram in the Pro. Z. Soc. expresses what I believe to be the degree of resemblance as regards the spinal column only. The diagram in the Phil. Trans. expresses what I believe to be the degree of resemblance as regards the appendicular skeleton only" (Darwin Correspondence Project letter 7170). Indeed, in the 1865 paper Mivart also noted that the data for the spinal column "lead to an arrangement of groups and an interpretation of affinities somewhat differing from, yet in part agreeing with, the classification founded on cranial and dental characters".

Mivart's work is discussed in detail by Bigoni & Barsanti (2011). Mivart's views on evolution and theology are presented in Mivart (1871).

From Mivart (1865) p. 592.
From Mivart (1867) p. 425.

Franz Martin Hilgendorf (1839-1904)

Franz Hilgendorf was a palaeontologist, among other zoological pursuits, although he is relatively unknown today. He was one of the first Germans to accept Darwin's ideas (Reif 1986), and he is also credited with being the first to introduce evolutionary theory into Japan (c. 1873) (Yajima 2007). He could also have been the first to publish a Darwinian tree, but he did not actually do it.

Hilgendorf's PhD work was on the fossil gastropods of the middle Miocene basin at Steinheim, in southern Germany, which he visited in 1862. He studied the morphological variation, across the different stratigraphic layers, of the various fossil forms of what he referred to as Planorbis multiformis. The resulting thesis (Hilgendorf 1863) was accepted in April 1863 but was otherwise unpublished, and it apparently contained no images. Nevertheless, Hilgendorf discussed in detail the relationship between a complete stratigraphic series of fossils and Darwin's evolutionary ideas, concluding that the Planorbis fossils could be arranged in a phyletic tree. Reif (1983) found that Hilgendorf's notes did, indeed, contain a preliminary phylogenetic diagram. Reif (1983) presents a version of this phylogeny based on Hilgendorf's notes, which is also reproduced by Janz (1999).

Hilgendorf may thus have been the first to produce a Darwinian tree, even though he did not publish it. His ideas were Darwinian in that they included ancestral and descendant forms, splitting of lineages, and gradual transition between forms, with the ancestral taxa being varieties, not higher taxa. Interestingly, in the thesis Hilgendorf also raised the possibility that two of the lineages may have fused. He noted: "This does not fit the nice picture of a tree with many branches that Darwin presented to illustrate the descent of species — the branches of a tree never fuse again" [translation taken from Janz 1999].

Hilgendorf then made another excursion to the Steinheim basin in 1865, and wrote up the results for publication, this time with an explicit tree showing the relationship between the 19 different fossil forms that he recognized. This was read as a paper before the Royal Prussian Academy of Sciences on 19 July 1866, and was apparently published simultaneously as an offprint (Hilgendorf 1866). The paper then appeared as a regular part of the Academy's journal (Hilgendorf 1867) — these two versions are evidently identical save only the absence of the subtitle in the latter [which translates as "an example of morphological change through time"]. This is thus Hilgendorf's first published Darwinian tree.

There are actually two versions of the tree in the paper, as shown here, taken from the Biodiversity Heritage Library. The first tree emphasizes which stratigraphic layers each morphological form occupies (there are ten layers), whereas the second tree emphasizes the forms themselves. There is no suggestion of lineage fusion in either tree.

Hilgendorf's work is discussed in detail by Reif (1983), and more generally by Janz (1999).

From Hilgendorf (1867) p. 479.
From Hilgendorf (1867) after p. 502.

Jean Albert Gaudry (1827-1908)

Albert Gaudry was a palaeontologist who was one of the very few French scientists to promote Darwinian evolution. Indeed, Darwin noted in a letter to Jean Louis Armand de Quatrefages de Bréau (1870): "It is curious how nationality influences opinion: a week hardly passes without my hearing of some naturalist in Germany who supports my views, & often puts an exaggerated value on my works whilst in France I have not heard of a single zoologist except M[onsieur] Gaudry (and he only partially) who supports my views" (APS 379: An Annotated Calendar of the Letters of Charles Darwin in the Library of the American Philosophical Society 1799-1882, p. 212).

The paper by Gaudry (1866) was a separately paginated offprint of the second chapter of part 1 (pp. 325-370) of a larger work about the fossil mammals from the late Miocene locality of Pikermi in Attica, Greece, which was completed in 1867 (Animaux Fossiles et Géologie de l’Attique d’après les recherches faites en 1855–56 et en 1860 sous les auspices de l’Académie des sciences). In this offprint Gaudry expressed his views on palaeontology and evolution. He noted that the Pikermi fossils showed characteristics of two or more groups of animals, so that in these intermediate forms he could see the passage from order to order, family to family, genus to genus, and species to species.

He included five trees showing the relationships among different groups of extant and extinct fossil mammals, within a stratigraphic framework. The pictures shown here are taken from Google Books. I do not know exactly when this offprint was published, but Darwin acknowledged on 17 September 1866 that he received it "some time ago", so that it might pre-date Hilgendorf's own offprint.

As emphasized by Tassy (2006, 2011), Gaudry's trees were Darwinian in that they included ancestral and descendant species, splitting, gradualism and extinction, with the ancestral taxa being species or subspecies, not higher taxa. However, Gaudry did not fully embrace Darwinism, for religious reasons. As Darwin noted in a letter to Gaudry thanking him for a copy of the 1866 offprint: "I will venture to make one little criticism, namely that you do not fully understand what I mean by 'the struggle for existence', or concurrence vitale; but this is of little importance as you do not at all accept my views on the means by which species have been modified." (Darwin Correspondence Project letter 5213). Gaudry attributed evolutionary change to God, rather than to natural selection, as indicated in the closing sentence of his 1866 work: "Mais, nous n'en douterons pas, l'artiste qui pétrissait était le Créateur lui-même, car chaque transformation a porté un reflet de sa beauté infinie." [But, let us not doubt it, the artist doing the shaping was the Creator himself, for each transformation has borne a reflection of His infinite beauty.]

Gaudry's work is discussed in detail by Tassy (2006), if you read French, and more briefly by Tassy (2011), if you do not.

From Gaudry (1866) p. 36.
From Gaudry (1866) p. 38.
From Gaudry (1866) p. 41.
From Gaudry (1866) p. 44.
From Gaudry (1866) p. 46.

Ernst Heinrich Philipp August Haeckel (1834-1919)

Ernst Haeckel is best known today as a comparative morphologist, but he was also an important popularizer of science, as well as a brilliant artist. He was an early German convert to Darwinism, and it has been noted that "more people by the turn of the century had learned of evolutionary theory through Haeckel's depictions than even from Darwin's own writings" (Richards 2011).

Haeckel actually coined the word "phylogeny" (along with many others, including "ontogeny" and "ecology"), and his first phylogenetic trees were published in the second volume of his two-volume opus about animal morphology (Haeckel 1866). Haeckel had the ambitious plan to reform the study of morphology by synthesizing Darwin's ideas on genealogical descent with the transformational evolutionism of Lamarck, along with the German tradition of Naturphilosophie (represented by Johann Wolfgang von Goethe). As noted by Hopwood (2006), for Haeckel: "evolution was the organizing principle of a cosmic synthesis that would unify science, religion, and art on a biological foundation."

There were eight trees in the book, showing the relationships between animals, plants and (for the first time) protists, and within plants and different animal groups. Haeckel used morphology to reconstruct the phylogenetic history of animals, and in the absence of fossils used embryology as evidence of ancestors. The pictures here are taken from the Biodiversity Heritage Library. These are frequently credited as being the first phylogenetic trees published, although Mivart, at least, published earlier. Haeckel claimed to have started the book "several years" before 1864, which is when he apparently started work on the phylogenetic trees (as he mentions in a letter to Darwin), but the Foreword is dated 14 September 1866.

Unfortunately, Haeckel's tree-construction method seems to have owed more to Lamarck than to Darwin (see Dayrat 2003), with the branches indicating morphological transformation among the named groups rather than strictly representing genealogy. Moreover, Haeckel's trees place higher taxonomic groups on the internal branches, whereas Darwin treated internal branches as representing extinct species. Thus, it is not clear just how Darwinian Haeckel really was. Indeed, Di Gregorio (2005) has noted: "Haeckel's view of evolution (or rather evolutionism)... from the very beginning, reminds one more of Jean-Baptiste Lamarck than Darwin."

One intriguing detail about Haeckel's early trees is that many, if not most, of the labels occur in the spaces between the terminal twigs, like seeds enclosed within a fruit. All of the trees shown above distinctly label the leaves, even though it is likely that their presentation formats were derived independently of each other. Haeckel, on the other hand, is much vaguer about exactly what is being labelled. Perhaps this is a by-product of his images being distinctly tree-like in form rather than stick figures; or perhaps it comes from the rather speculative nature of many of the relationships he proposed (the trees of Mivart, Hilgendorf and Gaudry were based on detailed empirical data, whereas Haeckel's were much more ambitiously hypothetical).

It is perhaps also worth noting that Haeckel first publicly endorsed Darwin's theory in his work on the Radiolaria (Haeckel 1862). On page 234 of that work (see the Biodiversity Heritage Library) he produced what he called a "Verwandtschaftstabelle der Familien, Subfamilien und Gattungen der Radiolarien" [table of relationships of the families, subfamilies and genera of the Radiolaria], which is thus his first attempt at a genealogical diagram. It was not drawn as a tree, and is thus somewhat hard to interpret, but in the same chapter he discussed ancestral and transitional forms, and from page 231 onwards he made clear that he was attempting to implement Darwin's ideas.

Haeckel's life and work are discussed in detail by Di Gregorio (2005) and Richards (2008).

From Haeckel (1866) Taf. I.
From Haeckel (1866) Taf. II.
From Haeckel (1866) Taf. III.
From Haeckel (1866) Taf. IV.
From Haeckel (1866) Taf. V.
From Haeckel (1866) Taf. VI.
From Haeckel (1866) Taf. VII.
From Haeckel (1866) Taf. VIII.

Bigoni, F., Barsanti, G. (2011) Evolutionary trees and the rise of modern primatology: the forgotten contribution of St. George Mivart. Journal of Anthropological Sciences 89: 93-107.

Darwin, C. (1859) On the Origin of Species by Means of Natural Selection, or the preservation of favoured races in the struggle for life. John Murray, London.

Dayrat, B. (2003) The roots of phylogeny: how did Haeckel build his trees? Systematic Biology 52: 515-527.

Dickinson, E.C. (2005) The Proceedings of the Zoological Society of London, 1859-…: an exploration of breaks between calendar years of publication. Journal of Zoology, London 266: 427-430.

Di Gregorio, M.A. (2005) From Here to Eternity: Ernst Haeckel and scientific faith. Vandenhoeck & Ruprecht, Göttingen.

Gaudry, A. (1866) Considérations Générales sur les Animaux Fossiles de Pikermi. F. Savy, Paris. 68 pp.

Haeckel, E. (1862) Die Radiolarien (Rhizopoda Radiaria): Eine monographie. Verlag von Georg Reimer, Berlin.

Haeckel, E. (1866) Generelle Morphologie der Organismen: Allgemeine grundzüge der organischen formen-wissenschaft, mechanisch begründet durch die von Charles Darwin reformirte descendenztheorie. — Band 1: Allgemeine anatomie der organismen. — Band 2: Allgemeine entwickelungsgeschichte der organismen. Verlag von Georg Reimer, Berlin.

Hilgendorf, F. (1863) Beiträge zur Kenntniß des Süßwasserkalkes von Steinheim. Unpublished PhD Dissertation. Philosophische Fakultät, Universität Tübingen, 42 pp.

Hilgendorf, F. (1866) Planorbis multiformis im Steinheimer Süßwasserkalk: ein beispiel von gestaltveränderung im laufe der zeit. Buchhandlung von W. Weber, Berlin, 36 pp.

Hilgendorf, F. (1867) Über Planorbis multiformis im Steinheimer Süsswasserkalk. Monatsberichte der Königlich Preussischen Akademie der Wissenschaften zu Berlin 1866: 474-504.

Hopwood, N. (2006) Pictures of evolution and charges of fraud: Ernst Haeckel’s embryological illustrations. Isis 97: 260-….

Janz, H. (1999) Hilgendorf’s planorbid tree: the first introduction of Darwin’s theory of transmutation into palaeontology. Paleontological Research 3/4: 287-….

Lamarck, J.-B. (1809) Philosophie Zoologique. Dentu et l'Auteur, Paris.

Mivart, StG. (1865) Contributions towards a more complete knowledge of the axial skeleton in the primates. Proceedings of the Zoological Society of London 33: 545-592.

Mivart, StG. (1867) On the appendicular skeleton of the primates. Philosophical Transactions of the Royal Society of London 157: 299-429.

Mivart, StG. (1871) On the Genesis of Species. Macmillan, London.

Ragan, M.A. (2009) Trees and networks before and after Darwin. Biology Direct 4: 43.

Reif, W.-E. (1983) Hilgendorf's (1863) dissertation on the Steinheim planorbids (Gastropoda; Miocene): the development of a phylogenetic research program for paleontology. Paläontologische Zeitschrift 57: 7-20.

Reif, W.-E. (1986) The search for a macroevolutionary theory in German paleontology. Journal of the History of Biology 19: 79-130.

Richards, R.J. (2008) The Tragic Sense of Life: Ernst Haeckel and the struggle over evolutionary thought. University of Chicago Press, Chicago.

Richards, R.J. (2011) Images of evolution. American Scientist 99: 165-167.

Tassy, P. (2006) Albert Gaudry et l'émergence de la paléontologie darwinienne au xixe siècle [Albert Gaudry and emerging Darwinian palaeontology during the 19th century]. Annales de Paléontologie 92: 41-70.

Tassy, P. (2011) Trees before and after Darwin. Journal of Zoological Systematics and Evolutionary Research 49: 89-101.


16.3. The Applications of Molecular Phylogenetics

Molecular phylogenetics has grown in stature since the start of the 1990s, largely because of the development of more rigorous methods for tree building, combined with the explosion of DNA sequence information obtained initially by PCR analysis and more recently by genome projects. The importance of molecular phylogenetics has also been enhanced by the successful application of tree reconstruction and other phylogenetic techniques to some of the more perplexing issues in biology. In this final section we will survey some of these successes.

16.3.1. Examples of the use of phylogenetic trees

First, we will consider two projects that illustrate the various ways in which conventional tree reconstruction is being used in modern molecular biology.

DNA phylogenetics has clarified the evolutionary relationships between humans and other primates

Darwin (1871) was the first biologist to speculate on the evolutionary relationships between humans and other primates. His view - that humans are closely related to the chimpanzee, gorilla and orangutan - was controversial when it was first proposed and fell out of favor, even among evolutionists, in the following decades. Indeed, biologists were among the most ardent advocates of an anthropocentric view of our place in the animal world (Goodman, 1962).

From studies of fossils, paleontologists had concluded prior to 1960 that chimpanzees and gorillas are our closest relatives but that the relationship was distant, the split leading to humans on the one hand and to chimpanzees and gorillas on the other having occurred some 15 million years ago. The first detailed molecular data, obtained by immunological studies in the 1960s (Goodman, 1962; Sarich and Wilson, 1967), confirmed that humans, chimpanzees and gorillas do indeed form a single clade (see Box 16.2) but suggested that the relationship is much closer, a molecular clock indicating that this split occurred only 5 million years ago. This was one of the first attempts to apply a molecular clock to phylogenetic data and the result was, quite naturally, treated with some suspicion. In fact, an acrimonious debate opened up between paleontologists, who believed in the ancient split indicated by the fossil evidence, and molecular biologists, who had more confidence in the recent date suggested by the molecular data. This debate was eventually ‘won’ by the molecular biologists, whose view that the split occurred about 5 million years ago became generally accepted.
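The clock argument can be sketched numerically. A common simplification is that after a split each lineage accumulates change at the same rate r, so a pairwise distance d corresponds to a divergence time T = d / (2r), with r estimated from a calibration split of assumed age. The sketch below assumes that simple linear relation; the helper names are illustrative, and the distances and calibration age are hypothetical numbers chosen for illustration, not the values measured by Goodman or by Sarich and Wilson.

```python
def clock_rate(calibration_distance: float, calibration_age: float) -> float:
    """Rate of change per lineage per unit time, from a calibration split.

    After a split both lineages accumulate change, so the distance observed
    for the calibration pair is assumed to equal 2 * rate * age.
    """
    if calibration_age <= 0:
        raise ValueError("calibration age must be positive")
    return calibration_distance / (2 * calibration_age)


def divergence_time(distance: float, rate: float) -> float:
    """Time since two lineages split under a strict clock: T = d / (2r)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2 * rate)


# Hypothetical values for illustration only (not published measurements):
# a calibration pair assumed to have split 30 Myr ago shows distance 0.06,
# and the human-ape pair shows distance 0.01.
rate = clock_rate(calibration_distance=0.06, calibration_age=30.0)
estimate = divergence_time(0.01, rate)
print(f"estimated human-ape split: {estimate:.1f} Myr ago")
```

For a fixed calibration, T scales linearly with d, so the estimate is only as reliable as the calibration age and the assumption that the rate has stayed constant in both lineages.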

Box 16.2

Terminology for molecular phylogenetics. The text includes definitions of most of the important terms used in molecular phylogenetics. Here are a few additional definitions that you may find useful when reading research articles on this subject: Operational…

As more and more molecular data were obtained, the difficulties in establishing the exact pattern of the evolutionary events that led to humans, chimpanzees and gorillas became apparent. Comparisons of the mitochondrial genomes of the three species by restriction mapping (Section 5.3.1) and DNA sequencing suggested that the chimpanzee and gorilla are more closely related to each other than either is to humans (Figure 16.14A), whereas DNA-DNA hybridization data supported a closer relationship between humans and chimpanzees (Figure 16.14B). The reason for these conflicting results is the close similarity between DNA sequences in the three species, the differences being less than 3% for even the most divergent regions of the genomes (Section 15.4). This makes it difficult to establish relationships unambiguously.
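The simplest distance measure, the p-distance (the proportion of aligned sites that differ), shows why such small differences make the relationships hard to resolve. The three 40-site sequences below are invented for illustration and are not real primate DNA. Their differences are of the order of a few percent, and human comes out exactly as distant from chimpanzee as from gorilla, so these distances alone cannot decide which pair are the closest relatives.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ.

    Assumes the sequences are already aligned: equal length, no gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


# Invented 40-site alignment for illustration only (not real primate DNA).
seqs = {
    "human": "ATGCTAGCTAGGCTAACGTTAGCATCGATCGGATCCTAGA",
    "chimp": "ATGCTAGCTAGGCTAACGTTAGCATCGATCGGATCCTAGG",
    "gorilla": "ATGCTAGCTAGGCTTACGTTAGCATCGATCGGATCCTAGA",
}
for x, y in combinations(seqs, 2):
    print(f"{x}-{y}: {p_distance(seqs[x], seqs[y]):.3f}")
```

Here each of the two shortest distances rests on a single differing site, which is the kind of weak, easily overturned signal that adding more and more variable loci is meant to overcome.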

Figure 16.14

Different interpretations of the evolutionary relationships between humans, chimpanzees and gorillas. See the text for details. Abbreviation: Myr, million years.

The solution to the problem has been to make comparisons between as many different genes as possible and to target those loci that are expected to show the greatest amount of dissimilarity. By 1997, 14 different molecular datasets had been obtained, including sequences of variable loci such as pseudogenes and non-coding sequences (Ruvolo, 1997). Analysis of these datasets confirmed that the chimpanzee is the closest relative to humans, with our lineages diverging 4.6–5.0 million years ago. The gorilla is a slightly more distant cousin, its lineage having diverged from the human-chimp one between 0.3 and 2.8 million years earlier (Figure 16.14C).

The origins of AIDS

The global epidemic of acquired immune deficiency syndrome (AIDS) has touched everyone's lives. AIDS is caused by human immunodeficiency virus 1 (HIV-1), a retrovirus (Section 2.4.2) that infects cells involved in the immune response. The demonstration in the early 1980s that HIV-1 is responsible for AIDS was quickly followed by speculation about the origin of the disease. Speculation centered around the discovery that similar immunodeficiency viruses are present in primates such as the chimpanzee, sooty mangabey, mandrill and various monkeys. These simian immunodeficiency viruses (SIVs) are not pathogenic in their normal hosts but it was thought that if one had become transferred to humans then within this new species the virus might have acquired new properties, such as the ability to cause disease and to spread rapidly through the population.

Retrovirus genomes accumulate mutations relatively quickly because reverse transcriptase, the enzyme that copies the RNA genome contained in the virus particle into the DNA version that integrates into the host genome (see Section 2.4.2), lacks an efficient proofreading activity (Section 13.2.2) and so tends to make errors when it carries out RNA-dependent DNA synthesis. This means that the molecular clock runs rapidly in retroviruses, and genomes that diverged quite recently display sufficient nucleotide dissimilarity for a phylogenetic analysis to be carried out. Even though the evolutionary period we are interested in is less than 100 years, HIV and SIV genomes contain sufficient data for their relationships to be inferred by phylogenetic analysis.

The starting point for this phylogenetic analysis is RNA extracted from virus particles. RT-PCR (see Technical Note 4.4) is therefore used to convert the RNA into a DNA copy and then to amplify the DNA so that sufficient amounts for nucleotide sequencing are obtained. Comparison between virus DNA sequences has resulted in the reconstructed tree shown in Figure 16.15 (Leitner et al., 1996; Wain-Hobson, 1998). This tree has a number of interesting features. First, it shows that different samples of HIV-1 have slightly different sequences, the samples as a whole forming a tight cluster, almost a star-like pattern, that radiates from one end of the unrooted tree. This star-like topology implies that the global AIDS epidemic began with a very small number of viruses, perhaps just one, which have spread and diversified since entering the human population. The closest relative to HIV-1 among primates is the SIV of chimpanzees, the implication being that this virus jumped across the species barrier between chimps and humans and initiated the AIDS epidemic. However, this epidemic did not begin immediately: a relatively long uninterrupted branch links the center of the HIV-1 radiation with the internal node leading to the relevant SIV sequence, suggesting that after transmission to humans, HIV-1 underwent a latent period when it remained restricted to a small part of the global human population, presumably in Africa, before beginning its rapid spread to other parts of the world. Other primate SIVs are less closely related to HIV-1, but one, the SIV from sooty mangabey, clusters in the tree with the second human immunodeficiency virus, HIV-2. It appears that HIV-2 was transferred to the human population independently of HIV-1, and from a different simian host. HIV-2 is also able to cause AIDS, but has not, as yet, become globally epidemic.

Figure 16.15

The phylogenetic tree reconstructed from HIV and SIV genome sequences. The AIDS epidemic is due to the HIV-1M type of immunodeficiency virus. ZR59 is positioned near the root of the star-like pattern formed by genomes of this type. Based on Wain-Hobson…

An intriguing addition to the HIV/SIV tree was made in 1998, when an HIV-1 isolate from a blood sample taken in 1959 from an African man was sequenced (Zhu et al., 1998). The RNA was highly fragmented and only a short DNA sequence could be obtained, but this was sufficient for the sequence to be placed on the phylogenetic tree (see Figure 16.15). This sequence, called ZR59, attaches to the tree by a short branch that emerges from near the center of the HIV-1 radiation. The positioning indicates that the ZR59 sequence represents one of the earliest versions of HIV-1 and shows that the global spread of HIV-1 was already underway by 1959. A later and more comprehensive analysis of HIV-1 sequences has suggested that the spread began in the period between 1915 and 1941, with a best estimate of 1931 (Korber et al., 2000). Pinning down the date in this way has enabled epidemiologists to begin an investigation of the historic and social conditions that might have been responsible for the start of the AIDS epidemic.

16.3.2. Molecular phylogenetics as a tool in the study of human prehistory

Now we will turn our attention to the use of molecular phylogenetics in intraspecific studies: the study of the evolutionary history of members of the same species. We could choose any one of several different organisms to illustrate the approaches and applications of intraspecific studies, but many people look on Homo sapiens as the most interesting organism so we will investigate how molecular phylogenetics is being used to deduce the origins of modern humans and the geographic patterns of their recent migrations in the Old and New Worlds.

Intraspecific studies require highly variable genetic loci

In any application of molecular phylogenetics, the genes chosen for analysis must display variability in the organisms being studied. If there is no variability then there is no phylogenetic information. This presents a problem in intraspecific studies because the organisms being compared are all members of the same species and so share a great deal of genetic similarity, even if the species has split into populations that interbreed only intermittently. This means that the DNA sequences that are used in the phylogenetic analysis must be the most variable ones that are available. In humans there are three main possibilities.

It is important to note that it is not the potential for change that is critical to the application of these loci in phylogenetic analysis, it is the fact that different alleles or haplotypes of the locus coexist in the population as a whole. The loci are therefore polymorphic (see Box 16.3) and information pertaining to the relationships between different individuals can be obtained by comparing the combinations of alleles and/or haplotypes that those individuals possess.
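The point that what matters is the coexistence of alleles or haplotypes in the population can be quantified with a standard diversity measure. The sketch below computes Nei's unbiased haplotype (gene) diversity, H = n/(n - 1) * (1 - sum of p_i squared), where p_i is the frequency of haplotype i among n sampled individuals. The function name, haplotype labels and counts are hypothetical, chosen only for illustration. A locus that is monomorphic in the sample gives H = 0 and so carries no information about relationships among the individuals.

```python
from collections import Counter
from typing import Sequence


def haplotype_diversity(haplotypes: Sequence[str]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i squared).

    Returns 0.0 when every sampled individual carries the same haplotype,
    i.e. when the locus is monomorphic in the sample.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    counts = Counter(haplotypes)
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical samples of ten individuals each (labels H1-H3 are invented).
monomorphic = ["H1"] * 10
polymorphic = ["H1"] * 5 + ["H2"] * 3 + ["H3"] * 2
print(f"monomorphic locus: H = {haplotype_diversity(monomorphic):.3f}")
print(f"polymorphic locus: H = {haplotype_diversity(polymorphic):.3f}")
```

H is the probability that two individuals drawn at random from the sample carry different haplotypes, which is why it rises as more haplotypes coexist at appreciable frequencies.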

The origins of modern humans - out of Africa or not?

It seems reasonably certain that the origin of humans lies in Africa because it is here that all of the oldest pre-human fossils have been found. The paleontological evidence reveals that hominids first moved outside of Africa over 1 million years ago, but these were not modern humans, they were an earlier species called Homo erectus. These were the first hominids to become geographically dispersed, eventually spreading to all parts of the Old World.

The events that followed the dispersal of Homo erectus are controversial. From comparisons using fossil skulls and bones, paleontologists have concluded that the Homo erectus populations that became located in different parts of the Old World gave rise to the modern human populations of those areas by a process called multiregional evolution (Figure 16.16A). There may have been a certain amount of interbreeding between humans from different geographic regions, but, to a large extent, these various populations remained separate throughout their evolutionary history.

Figure 16.16

Two competing hypotheses for the origins of modern humans. (A) The multiregional hypothesis states that Homo erectus left Africa over 1 million years ago and then evolved into modern humans in different parts of the Old World. (B) The Out of Africa hypothesis…

Doubts about the multiregional hypothesis were first raised by re-interpretations of the fossil evidence and were subsequently brought to a head by publication in 1987 of a phylogenetic tree reconstructed from mitochondrial RFLP data obtained from 147 humans representing populations from all parts of the world (Cann et al., 1987). The tree (Figure 16.17) confirmed that the ancestors of modern humans lived in Africa but suggested that they were still there about 200 000 years ago. This inference was made by applying the mitochondrial molecular clock to the tree, which showed that the ancestral mitochondrial DNA, the one from which all modern mitochondrial DNAs are descended, existed between 140 000 and 290 000 years ago. The tree showed that this mitochondrial genome was located in Africa, so the person who possessed it, the so-called mitochondrial Eve (she had to be female because mitochondrial DNA is only inherited through the female line), must have been African.

Figure 16.17

Phylogenetic tree reconstructed from mitochondrial RFLP data obtained from 147 modern humans. The ancestral mitochondrial DNA is inferred to have existed in Africa because of the split in the tree between the seven modern African mitochondrial genomes…

The discovery of mitochondrial Eve prompted a new scenario for the origins of modern humans. Rather than evolving in parallel throughout the world, as suggested by the multiregional hypothesis, Out of Africa states that Homo sapiens originated in Africa, members of this species then moving into the rest of the Old World between 100 000 and 50 000 years ago, displacing the descendants of Homo erectus that they encountered (see Figure 16.16B).

Such a radical change in thinking inevitably did not go unchallenged. When the RFLP data obtained by Cann et al. (1987) were examined by other molecular phylogeneticists it became clear that the original computer analysis had been flawed, and that several quite different trees could be reconstructed from the data, some of which did not have a root in Africa. These criticisms were countered by more detailed mitochondrial DNA sequence datasets, most of which are compatible with a relatively recent African origin and so support the Out of Africa hypothesis rather than multiregional evolution (e.g. Ingman et al., 2000). An interesting complement to ‘mitochondrial Eve’ has been provided by studies of the Y chromosome, which suggest that ‘Y chromosome Adam’ also lived in Africa some 200 000 years ago (Pääbo, 1999). Of course, this Eve and Adam were not equivalent to the biblical characters and were by no means the only people alive at that time: they were simply the individuals who carried the ancestral mitochondrial DNA and Y chromosomes that gave rise to all the mitochondrial DNAs and Y chromosomes in existence today. The important point is that these ancestral DNAs were still in Africa well after the spread of Homo erectus into Eurasia.

The mitochondrial DNA and Y chromosome studies appear to provide strong evidence in support of the Out of Africa theory. But complications have arisen from studies of nuclear genes other than those on the Y chromosome. For example, β-globin sequences give a much earlier date, 800 000 years ago, for the common ancestor (Harding et al., 1997), and studies of an X chromosome gene, PDHA1, place the ancestral sequence at 1 900 000 years ago (Harris and Hey, 1999). Molecular anthropologists are currently debating the significance of these results (Pääbo, 1999). More datasets, and hopefully some sort of Grand Synthesis, are eagerly awaited.

The patterns of more recent migrations into Europe are also controversial

By whatever evolutionary pathway, modern humans were present throughout most of Europe by 40 000 years ago. This is clear from the fossil and archaeological records. The next controversial issue in human prehistory concerns whether these populations were displaced about 30 000 years later by other humans migrating into Europe from the Middle East.

The question centers on the process by which agriculture spread into Europe. The transition from hunting and gathering to farming occurred in the Middle East some 9000–10 000 years ago, when early Neolithic villagers began to cultivate crops such as wheat and barley. After becoming established in the Middle East, farming spread into Asia, Europe and North Africa. By searching for evidence of agriculture at archaeological sites, for example by looking for the remains of cultivated plants or for implements used in farming, it has been possible to trace the expansion of farming along two routes through Europe, one around the coast to Italy and Spain and the second through the Danube and Rhine valleys to northern Europe (Figure 16.18).

Figure 16.18

The spread of agriculture from the Middle East to Europe. The dark-green area is the ‘Fertile Crescent’, the area of the Middle East where many of today's crops - wheat, barley, etc. - grow wild and where these plants are thought to have…

How did farming spread? The simplest explanation is that farmers migrated from one part of Europe to another, taking with them their implements, animals and crops, and displacing the indigenous pre-agricultural communities that were present in Europe at that time. This wave of advance model was initially favored by geneticists because of the results of a large-scale phylogenetic analysis of the allele frequencies for 95 nuclear genes in populations from across Europe (Cavalli-Sforza, 1998). Such a large and complex dataset cannot be analyzed in any meaningful way by conventional tree building but instead has to be examined by more advanced statistical methods, ones based more in population biology than phylogenetics. One such procedure is principal component analysis, which attempts to identify patterns in a dataset corresponding to the uneven geographic distribution of alleles, these uneven distributions possibly being indicative of past population migrations. The most striking pattern within the European dataset, accounting for about 28% of the total genetic variation, is a gradation of allele frequencies across Europe (Figure 16.19). This pattern implies that a migration of people occurred either from the Middle East to northeast Europe, or in the opposite direction. Because the former coincides with the expansion of farming, as revealed by the archaeological record, this first principal component was looked upon as providing strong support for the wave of advance model.
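The idea behind principal component analysis can be sketched on a tiny populations-by-loci matrix of allele frequencies. The sketch below finds the first principal component by power iteration on the covariance matrix between loci, in pure Python. It reports each population's score on that component and the share of the total variance it explains, the analogue of the 28% figure above. The function name, the five populations and their frequencies are invented; two loci were given a deliberate cline so that the first component picks it up. This is a minimal illustration of the technique, not a reconstruction of Cavalli-Sforza's analysis, which used far more loci and populations.

```python
import random


def first_principal_component(rows, iterations=500, seed=0):
    """First principal component of a populations x loci frequency matrix.

    Returns (scores, share): each population's projection onto PC1 and the
    fraction of total variance that PC1 explains. The eigenvector is found
    by power iteration on the covariance matrix between loci.
    """
    n, p = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two populations")
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance between loci (a p x p symmetric matrix).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]

    # Power iteration: repeatedly multiply by cov and renormalize; the vector
    # converges to the eigenvector with the largest eigenvalue.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            raise ValueError("allele frequencies show no variation")
        v = [x / norm for x in w]

    # Rayleigh quotient gives the eigenvalue; the trace is the total variance.
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return scores, eigenvalue / total_variance


# Hypothetical allele frequencies (rows: populations sampled along a transect;
# columns: one allele at each of four loci). Invented for illustration.
populations = ["P1", "P2", "P3", "P4", "P5"]
frequencies = [
    [0.80, 0.20, 0.55, 0.40],
    [0.70, 0.30, 0.50, 0.45],
    [0.60, 0.40, 0.60, 0.40],
    [0.50, 0.50, 0.45, 0.50],
    [0.40, 0.60, 0.55, 0.45],
]
scores, share = first_principal_component(frequencies)
for name, score in zip(populations, scores):
    print(f"{name}: {score:+.3f}")
print(f"PC1 explains {share:.0%} of the variance")
```

The sign of a principal component is arbitrary, so only the ordering of the scores carries information. A steady run of scores along the transect is what a cline looks like on PC1, and, as the criticisms below point out, nothing in such a pattern says when it was established.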

Figure 16.19

A genetic gradation across modern Europe. See the text for details.

The analysis looked convincing but two criticisms were raised. The first was that the data provided no indication of when the inferred migration took place, so the link between the first principal component and the spread of agriculture was based solely on the pattern of the allele gradation, not on any complementary evidence relating to the period when this gradation was set up. The second criticism arose because of the results of a second study of European human populations, one that did include a time dimension (Richards et al., 1996). This study looked at mitochondrial DNA haplotypes in 821 individuals from various populations across Europe. It failed to confirm the gradation of allele frequencies detected in the nuclear DNA dataset, and instead suggested that European populations have remained relatively static over the last 20 000 years. A refinement of this work led to the discovery that eleven mitochondrial DNA haplotypes predominate in the modern European population, each with a different time of origin, thought to indicate the date at which the haplotype entered Europe (Figure 16.20; Richards et al., 2000). The most ancient haplotype, called U, first appeared in Europe approximately 50 000 years ago, coinciding with the period when, according to the archaeological record, the first modern humans moved into the continent as the ice sheets withdrew to the north at the end of the last major glaciation. The youngest haplotypes, J and T1, which at 9000 years in age could correspond to the origins of agriculture, are possessed by just 8.3% of the modern European population, suggesting that the spread of farming into Europe was not the huge wave of advance indicated by the principal component study. Instead, it is now thought that farming was brought into Europe by a smaller group of ‘pioneers’ who interbred with the existing pre-farming communities rather than displacing them.

Figure 16.20

The eleven major European mitochondrial haplotypes. The calculated time of origin for each haplotype is shown, the closed and open parts of each bar indicating different degrees of confidence. The percentages refer to the proportions of the modern European…

Box 16.1

Neandertal DNA. Sequence analysis of ‘ancient DNA’ extracted from a fossil bone between 30 000 and 100 000 years old provides support for the Out of Africa hypothesis. Neandertals are extinct hominids who lived in Europe between 300 000…

Prehistoric human migrations into the New World

Finally we will examine the completely different set of controversies surrounding the hypotheses regarding the patterns of human migration that led to the first entry of people into the New World. There is no evidence for the spread of Homo erectus into the Americas, so it is presumed that humans did not enter the New World until after modern Homo sapiens had evolved in, or migrated into, Asia. The Bering Strait between Asia and North America is quite shallow and if the sea level dropped by 50 meters it would be possible to walk across from one continent to the other. It is believed that this was the route taken by the first humans to venture into the New World (Figure 16.21).

Figure 16.21

The route by which humans first entered the New World.

The sea was 50 meters or more below its current level for most of the last Ice Age, between about 60 000 and 11 000 years ago, but for most of this time the route would have been impassable because of the build-up of ice. Also, the northern parts of America would have been arctic during much of this period, providing few game animals for the migrants to hunt and very little wood with which they could make fires. These considerations, together with the absence of archaeological evidence of humans in North America before 11 500 years ago, led to the adoption of ‘about 12 000 years ago’ as the date for the first entry of humans into the New World. Recent discoveries of evidence of human occupation at sites dating to 20 000 years ago, both in North and South America, have prompted some rethinking, but it is still generally assumed that a substantial population migration into North America, possibly the one from which all modern Native Americans are descended, occurred about 12 000 years ago.

What information does molecular phylogenetics provide? The first relevant studies were carried out in the late 1980s using RFLP data. These indicated that Native Americans are descended from Asian ancestors and identified four distinct mitochondrial haplotypes among the population as a whole (Wallace et al., 1985; Schurr et al., 1990). Linguistic studies had already shown that American languages can be divided into three different groupings, suggesting that modern Native Americans are descended from three sets of people, each speaking a different language. The inference from the molecular data that there may in fact have been four ancestral populations was not too disquieting. The first significant dataset of mitochondrial DNA sequences was obtained in 1991, enabling the rigorous application of a molecular clock. This indicated that the migrations into North America occurred between 15 000 and 8000 years ago (Ward et al., 1991), which is consistent with the archaeological evidence that humans were absent from the continent before 11 500 years ago.
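A short sketch can show the arithmetic behind a molecular clock. It assumes a strict clock, in which both lineages accumulate substitutions independently at the same constant rate r per site per year after their split. Under that assumption, a pairwise distance d (substitutions per site) corresponds to a divergence time t = d / (2r). The distance and rate below are hypothetical placeholders, not values from Ward et al. (1991).

```python
def divergence_time(distance: float, rate: float) -> float:
    """Years since two lineages split, assuming a strict molecular clock.

    distance: substitutions per site separating the two sequences.
    rate: substitutions per site per year along EACH lineage (assumed constant).
    The factor of 2 reflects that both lineages accumulate changes after the split.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return distance / (2.0 * rate)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the calculation.
    d = 0.003    # 0.3% sequence divergence
    r = 1.0e-7   # substitutions per site per year per lineage
    print(f"{divergence_time(d, r):,.0f} years")  # prints 15,000 years
```

In practice, d is first corrected for multiple substitutions at the same site, and r is calibrated against independently dated events. Any error in r propagates directly into t.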

These early phylogenetic analyses confirmed, or at least were not too discordant with, the complementary evidence provided by archaeological and linguistic studies. However, the additional molecular data that have been acquired since 1992 have tended to confuse rather than clarify the issue. For example, different datasets have provided a variety of estimates for the number of migrations into North America. The most comprehensive analysis, based on mitochondrial DNA (Forster et al., 1996), puts this figure at just one migration, and suggests that it occurred between 25 000 and 20 000 years ago, much earlier than the traditional date. Studies of Y chromosomes have assigned a date of approximately 22 500 years ago to the ‘Native American Adam’, the carrier of the Y chromosome that is ancestral to most, if not all, of the Y chromosomes in modern Native Americans (De Mendoza and Braginski, 1999). The implication from these studies is that humans became established in North America about 20 000 years ago, much earlier than indicated by the archaeological and early genetic evidence. This hypothesis is still being evaluated by other molecular biologists and archaeologists.


Estimating Divergence Times with MEGA X

MEGA was recently updated to a new version, MEGA X. As a major improvement, MEGA X now runs natively on Linux as well as Windows, making full use of computer resources on that operating system. A macOS version of MEGA X is currently being developed and will be released in the near future (Kumar et al. 2018). The program can be downloaded at https://www.megasoftware.net. Once the download is complete, double-click the executable and follow the installation steps. The default installation creates a Desktop icon that launches MEGA X when you double-click it. It also creates a folder called MEGA X in your Documents directory, which contains several example files. Within this folder you will find the files mtCDNA.meg and mtCDNA.nwk, which contain the mitochondrial alignment and the phylogenetic tree for the great apes example, for which molecular times were previously retrieved from the TimeTree resource (the example files are also provided as Supplementary Material). With these two example files, all the following steps can be performed in MEGA X.
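Because mtCDNA.nwk is a plain-text Newick file, it can also be inspected outside MEGA. The sketch below is a deliberately minimal tip-label extractor, not a full Newick parser. It assumes unquoted labels and no bracketed comments. The example tree string is hypothetical and is not the actual content of mtCDNA.nwk.

```python
import re

# Tokens: a structural character, a branch length (":" up to the next
# structural character), or a run of label text.
TOKEN = re.compile(r"[(),;]|:[^(),;]*|[^(),;:]+")
STRUCTURAL = {"(", ")", ",", ";"}


def newick_leaf_names(newick):
    """Return tip labels, in order, from a simple Newick string.

    Assumes unquoted labels and no [...] comments. A label that directly
    follows '(' or ',' (or starts the string) is a tip; a label after ')'
    names an internal node and is skipped. Branch lengths are ignored.
    """
    leaves = []
    prev = None  # last structural character seen
    for tok in TOKEN.findall(newick):
        if tok in STRUCTURAL:
            prev = tok
        elif tok.startswith(":"):
            continue  # branch length such as ":0.05"
        else:
            label = tok.strip()  # drop surrounding whitespace/newlines
            if label and prev in {None, "(", ","}:
                leaves.append(label)
    return leaves


if __name__ == "__main__":
    # Hypothetical great-ape tree; NOT the actual contents of mtCDNA.nwk.
    tree = "((Human:0.1,Chimpanzee:0.1)Hominini:0.05,Gorilla:0.2);"
    print(newick_leaf_names(tree))  # ['Human', 'Chimpanzee', 'Gorilla']
```

To list the tips of the MEGA example, read mtCDNA.nwk from the MEGA X folder and pass its text to the function. For real published trees with quoted labels or comments, a dedicated tree library is the safer choice.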

When you start MEGA, the main MEGA window opens. To perform a molecular dating analysis, choose Compute TimeTree from the Clocks menu. Three options are then available: RelTime-ML, RelTime-OLS, and RelTime-Branch Lengths. The RelTime-ML option is the original method proposed by Tamura et al. (2012), and in most cases it is the option you want. Time estimates are computed from branch lengths optimized by maximum likelihood (ML), which is a robust statistical method.

RelTime-OLS relies on branch lengths estimated by the ordinary least-squares (OLS) approach, which is a distance method (Rzhetsky and Nei 1993). This option can be useful for large data sets with good coverage (i.e., small amounts of missing data), because it is faster than the ML-based option. Lastly, RelTime-Branch Lengths lets the user supply a phylogenetic tree with previously estimated branch lengths. In this case, MEGA will not provide confidence intervals for node ages. This option is useful when branch lengths have already been optimized by approaches other than ML or OLS, or under models of evolution that are not yet implemented in MEGA (e.g., the Lewis 2001 model for morphological data). In this case, do not provide your alignment data; go directly to step 2.
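The least-squares idea behind OLS branch lengths can be shown with a toy sketch for one fixed four-taxon topology, ((A,B),(C,D)). Each observed pairwise distance is modeled as the sum of the branch lengths on the path between the two taxa. The branch lengths are then chosen to minimize the sum of squared differences, by solving the normal equations. This illustrates only the general OLS principle and is not MEGA's implementation. The taxa, topology, and distances are hypothetical.

```python
"""Toy ordinary least-squares (OLS) branch lengths for one fixed quartet tree."""

# Hypothetical unrooted topology ((A,B),(C,D)): external branches a-d lead to
# taxa A-D, and e is the single internal branch. Each taxon pair maps to the
# set of branches on the path between the two taxa.
BRANCHES = ["a", "b", "c", "d", "e"]
PATHS = {
    ("A", "B"): {"a", "b"},
    ("A", "C"): {"a", "e", "c"},
    ("A", "D"): {"a", "e", "d"},
    ("B", "C"): {"b", "e", "c"},
    ("B", "D"): {"b", "e", "d"},
    ("C", "D"): {"c", "d"},
}


def solve_linear(m, v):
    """Solve the square system m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ols_branch_lengths(distances):
    """Branch lengths minimizing sum((d_ij - path_length_ij)^2) over all pairs."""
    pairs = list(PATHS)
    # Design matrix: X[i][j] = 1 if branch j lies on the path for pair i.
    X = [[1.0 if b in PATHS[p] else 0.0 for b in BRANCHES] for p in pairs]
    d = [distances[p] for p in pairs]
    k = len(BRANCHES)
    # Normal equations (X^T X) beta = X^T d give the OLS estimate.
    xtx = [[sum(row[r] * row[c] for row in X) for c in range(k)] for r in range(k)]
    xtd = [sum(row[r] * di for row, di in zip(X, d)) for r in range(k)]
    return dict(zip(BRANCHES, solve_linear(xtx, xtd)))


if __name__ == "__main__":
    # Hypothetical, perfectly additive distances generated from
    # a=0.1, b=0.2, c=0.3, d=0.4, e=0.05.
    observed = {
        ("A", "B"): 0.30, ("A", "C"): 0.45, ("A", "D"): 0.55,
        ("B", "C"): 0.55, ("B", "D"): 0.65, ("C", "D"): 0.70,
    }
    for branch, length in ols_branch_lengths(observed).items():
        print(branch, round(length, 4))
```

With perfectly additive distances, as here, OLS recovers the generating branch lengths exactly. With real, noisy distances, the fitted lengths are the best compromise in the squared-error sense, and unconstrained OLS can even return slightly negative lengths.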

If you have just estimated a phylogenetic tree with MEGA, see the section entitled “Performing a Phylogenetic Reconstruction Followed by a TimeTree Estimation in MEGA X” near the end of this paper.


Creating Phylogenetic Trees from DNA Sequences

This interactive module shows how DNA sequences can be used to infer evolutionary relationships among organisms and represent them as phylogenetic trees.

Phylogenetic trees are diagrams of evolutionary relationships among organisms. Scientists can estimate these relationships by studying the organisms’ DNA sequences. As the organisms evolve and diverge, their DNA sequences accumulate mutations. Scientists compare these mutations using sequence alignments to reconstruct evolutionary history.
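The comparison step can be sketched for a pair of already-aligned sequences. First count the proportion of differing sites (the p-distance), then correct it for multiple substitutions at the same site. The Jukes–Cantor (1969) correction is an assumption chosen here for simplicity, since the module does not name a model. The sequences are made up, and skipping gapped sites is one common convention, not the only one.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap ('-') in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += int(x != y)
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    a = "ACGTACGTAC"  # hypothetical aligned sequences
    b = "ACGTTCGTAA"
    p = p_distance(a, b)
    print(f"p = {p:.3f}, JC69 d = {jukes_cantor(p):.3f}")
```

A matrix of such distances for every pair of organisms is the input to distance-based tree-building methods such as neighbor joining.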



“Holes in the Tree of Life”: Why and how phylogenetic data must be published

Ross Mounce @rmounce and Joseph W Brown have been tweeting about the lack of data to support published phylogenetic studies. (Readers of this blog will know that Ross and I start work in October to extract trees from published PDFs – an awful statement of how bad the situation is.)

Very simply, phylogenetic data are key to our understanding of the history, ecology and biodiversity of the planet. If we don’t understand species then we shall lose them, and if we don’t understand how species interact we shall lose ecosystems. Look into the details of pollination and you will often find that the loss of one species directly affects others. (Though Darwin was wrong about cats->->-> clover http://triscience.com/Species/Field/the-cats-to-clover-chain/doculite_view)

Most peer-reviewed phylogenetics is published in closed journals (40 USD for a 1-day read). It’s appallingly arrogant to assume that everyone who needs it is an academic who can get the info. But worse, almost none of the data are published. Phylogenetic trees are mainly computed from molecular information (DNA of key genes) and are costly to produce. Yet the data are relatively simple. They are well understood (30+ years of sequence/gene repositories) and they are compact (accession numbers often suffice). An uncompressed tree takes perhaps a few KB, and with indexing/compression a complete study could be published in ca. 1 MB. That’s less than the size of many single images!

Here’s what sparked the discussion: http://www.botanyconference.org/engine/search/index.php?func=detail&aid=167 I’ll give it in full and argue that any reasonably literate person could understand it. I have highlighted some parts.

Missing data lead to holes in the tree of life.

The fundamental importance of archiving scientific datasets has received increasing attention over the past several years, and failure to properly archive data can adversely affect study reproducibility. However, in plant systematics (or evolutionary biology) there has been no comprehensive review that examines the deposition practices of the underlying phylogenetic datasets and trees that are the foundation of the discipline. Furthermore, there is little understanding of how the deposition rate of DNA sequence alignments and phylogenetic trees has changed over time. In the process of gathering data to build the first tree of life for all 1.9 million named species (the Open Tree of Life Project), we sifted through over 7200 peer-reviewed phylogenetic studies published between the years 2000 and 2012. Our survey covered over 100 journals and included publications focusing on green plants, animals, fungi, microbial eukaryotes, bacteria, and archaea. This broad survey included 1243 seed plant publications. Overall, we found that only 17% of examined studies made nucleotide alignment data and/or trees available in an accessible repository such as TreeBASE or Dryad. Within seed plants, only 24% of studies from the past 12 years have been archived. Furthermore, most corresponding authors (54% for seed plants) that we contacted for un-deposited datasets and trees did not respond to our repeated (2) requests for data. Thus, most of the trees and alignments produced during the past several decades is essentially lost forever. The plant systematics community needs to significantly improve data deposition practices to ensure that crucial data (trees, alignments) are archived and thus freely available to other interested scientists. Our results illustrate that voluntary data submission policies have not worked, and dictate the urgent need to adopt new policies requiring public archiving of DNA sequence alignments and trees in a routine manner as is done routinely with raw sequence data. These stark findings should encourage the systematic community as well as journal editorials to adopt data sharing policies that require deposition of alignments and resulting phylogenetic trees in established databases prior to publication.

Very simply (this applies to many subjects):

Many/most authors don’t care about making their science available to the world. The final result of their work is a “scholarly article”, not useful, reusable, verifiable science that can be built on and re-used by policy makers and citizens. The authors do not feel that being publicly funded gives them any obligations to the public. The ivory tower rewards their work only in the torrid market of scholarship, not for its wider value to the world.

Data deposition has worked in some subjects: sequences/genes, crystal structures, galaxies. There the disciplines have developed cultures where scientists are expected, and then mandated, to deposit data. The commonest routes are (a) publishers’ websites (e.g. crystallography) and (b) domain repositories (e.g. sequences).

Making phylogenetic data available for each study is technically straightforward. The bytecount is insignificant in today’s world. The standards and protocols (e.g. nexml) exist. The problem is 99% a people problem.

The problem is community. In some cases the learned societies are more concerned with generating income than with serving science. (Where are the publishers that actually make subscription material available to the world within, say, 6 months of publication?) Many are actually making access more difficult. The last 12 months have confirmed that most legacy publishers are part of the problem, not the solution.

So if publishers are antagonistic or indifferent to requiring the publication of data, how do we manage it? The universities are totally vapid today; they have shown no leadership. So the only clear path is funder mandates.

And that will work. I’ve seen the pressure in the US that the NSF mandate on data management has applied and I think it’s starting to work. That’s got to happen everywhere. So my message to funders is:

Mandate the deposition of data at time of publication. And if not, chop 10% off the grant.

That works. It’s a lot of work, but it’s a trivial amount compared with the current loss of data (which I estimate as >> 100 Billion USD per year).


7 Conclusions and Outlook

Evolutionary relationships are complex, and a single bifurcating tree cannot fully capture the intricacy of the evolutionary process. However, from a pragmatic perspective, a tree does capture the major relationships among organisms. Biologists have been successful at resolving many of the important branches, and are improving in their abilities to make accurate inferences about recalcitrant relationships. Biological progress cannot proceed without at least a hypothesis of the underlying bifurcating relationships, against which to test and understand alternative processes. As Dobzhansky explained in 1973, “Nothing in biology makes sense except in the light of evolution.” [46] From an applied perspective, that translates to the idea that nothing in biology makes sense except in the context of phylogenetics. Without that overall evolutionary perspective, and the phylogenetic inferences which describe the ancestry of organisms, biological research cannot proceed. Creating and openly sharing a unified tree of all life provides researchers access to evolutionary context across the diversity of life.


Summary

Existing packages for plotting trees with data provide only limited visualization methods and apply only to predefined data types. The two methods introduced here have many unique features: integrating node/edge data into the tree so that they can be mapped to visual characteristics of the tree or other data sets (supplementary fig. S1, Supplementary Material online); no restriction on data types or on how the data should be plotted in facet_plot (supplementary table S1, Supplementary Material online); and a modular design that separates tree visualization, data integration, and graph alignment. Modular design is a unique feature that makes ggtree stand out from other packages. The tree can be fully annotated with multiple data sets attached by the %<+% operator, and facet_plot can progressively align multiple panels to the tree (supplementary fig. S6, Supplementary Material online) or add multiple geometric layers to visualize one or more data sets on a single panel (supplementary figs. S4 and S9, Supplementary Material online). Only with this design is it possible to plot a fully annotated tree with complex data panels. In addition, ggtree works with tree objects defined in other R packages (supplementary figs. S3, S8, and S9, Supplementary Material online), and the methods introduced here broaden the applications of existing R packages by allowing external data integration. For example, ggtree extends the phyloseq package to plot species abundance distributions with the tree (fig. 1). A comparison with other R packages and a full list of the unique features of ggtree can be found in the Supplementary Material online.

