The Tree of Life


25 posts · 22,506 views

Blog of Jonathan A. Eisen, evolutionary biologist, microbial genomics researcher, and Open Access advocate, Professor at UC Davis and Academic Editor in Chief of PLoS Biology.

Jonathan Eisen
25 posts


  • August 9, 2010
  • 08:47 PM
  • 6,313 views

Lack of neutrality in bacteria and where pseudogenes go when they die

by Jonathan Eisen in The Tree of Life




Pseudogenes, which are in essence regions of the genome that used to be genes but are no longer able to produce a functional unit, have long been considered to be models of the genetic equivalent of Switzerland's neutrality. With this assumption of neutrality in hand, researchers have used studies of pseudogenes to better understand what happens to DNA when it is not visible to any form of natural selection. That is, pseudogenes have been thought to be neither harmful (as in, they are not under negative selection) nor helpful (i.e., they are not under positive selection).

And from this assumption we have supposedly learned about mutation rates and patterns (because if they are neutral then the changes in pseudogenes should be reflective of mutational processes, not selection) as well as all sorts of other features of genome evolution.
Over the years, some have challenged the assumption of neutrality of pseudogenes (e.g., see here), just as many have questioned whether Switzerland is really neutral. But overall, the feeling that pseudogenes were mostly neutral seems to have stuck. However, that may change a bit with a new paper from Chih-Horng Kuo and Howard Ochman in PLoS Genetics (PLoS Genetics: The Extinction Dynamics of Bacterial Pseudogenes).
In their paper they report (this is their author summary):
Pseudogenes have traditionally been viewed as evolving in a strictly neutral manner. In bacteria, however, pseudogenes are deleted rapidly from genomes, suggesting that their presence is somehow deleterious. The distribution of pseudogenes among sequenced strains of Salmonella indicates that removal of many of these apparently functionless regions is attributable to their deleterious effects in cell fitness, suggesting that a sizeable fraction of pseudogenes are under selection.
Basically, what they did was the following:
1. Compare Salmonella genomes.  Identify putative pseudogenes and trace their evolution onto a phylogeny of the species.

Figure 1. Distribution of pseudogenes among Salmonella genomes. The phylogenetic tree was inferred from 2,898 single-copy genes shared by all five S. enterica subsp. enterica strains and the outgroup S. enterica subsp. arizonae. doi:10.1371/journal.pgen.1001050.g001

2. Carry out a variety of analyses of the pseudogenes, such as looking at ratios of Ka/Ks (in essence the ratio of amino acid changing, aka nonsynonymous, substitutions to "silent" synonymous changes, which occur when the DNA sequence changes but the same amino acid is encoded) and examining the types and frequencies of gene-inactivating mutations.
3. Then they looked at the "ages" of pseudogenes, with age being estimated by the position in the tree at which the pseudogenes appear to have arisen.
4. Finally they examined the age class distribution of pseudogenes as well as whether there were other differences between pseudogenes of different ages. And what they found was inconsistent with a neutral model. Instead, what they conclude is that something is making it advantageous to delete pseudogenes more rapidly than one might expect.
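To make the Ka/Ks idea from step 2 a bit more concrete, here is a minimal Python sketch that classifies codon differences between two aligned coding sequences as synonymous or nonsynonymous. It is a simplification and not the method used in the paper: a real Ka/Ks estimate also normalizes by the number of synonymous and nonsynonymous sites and corrects for multiple substitutions. The function name and the toy sequences are hypothetical, chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(seq_a, seq_b):
    """Return (nonsynonymous, synonymous) counts of differing codons.

    Assumes two aligned, gap-free, in-frame DNA sequences of equal
    length. Each differing codon pair is counted once, whatever the
    number of nucleotide differences inside it (a simplification).
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    nonsyn = syn = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # DNA changed, same amino acid encoded
        else:
            nonsyn += 1   # DNA change alters the amino acid
    return nonsyn, syn


# Hypothetical toy example: GCT -> GCC is silent (Ala -> Ala),
# AAA -> AGA changes the amino acid (Lys -> Arg).
print(count_codon_changes("ATGGCTAAA", "ATGGCCAGA"))
```

Under strictly neutral evolution, amino acid changing and silent changes should accumulate at similar rates once site numbers are accounted for, which is why the ratio is informative about selection.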
What explains this? After testing multiple possibilities the authors conclude that there is some negative selection against pseudogenes (or I guess positive selection for deletion of pseudogenes).
They conclude by suggesting this is likely to be pervasive across all bacteria and even in archaea. And they furthermore make a connection to possible selection on intron size in eukaryotes. Anyway - the paper seems quite interesting and worth a read. Still pondering what it all means, so I would welcome comments.
Kuo, C., & Ochman, H. (2010). The Extinction Dynamics of Bacterial Pseudogenes. PLoS Genetics, 6(8). DOI: 10.1371/journal.pgen.1001050
--------
This is from the "Tree of Life Blog"
of Jonathan Eisen, an evolutionary biologist and Open Access advocate
at the University of California, Davis. For short updates, follow me on Twitter.

--------



  • October 13, 2006
  • 12:00 AM
  • 1,123 views

World's Smallest Genome of a Cellular Organism?

by Jonathan Eisen in The Tree of Life

Discussion of Science paper on a very small genome...

Nakabachi, A., Yamashita, A., Toh, H., Ishikawa, H., Dunbar, H., Moran, N., & Hattori, M. (2006) The 160-Kilobase Genome of the Bacterial Endosymbiont Carsonella. Science, 314(5797), 267-267. DOI: 10.1126/science.1134196  

  • February 3, 2010
  • 10:53 AM
  • 967 views

Story behind the science: #PLoS Genetics "Evolutionary mirages" paper

by Jonathan Eisen in The Tree of Life

So there is this cool new paper out in PLoS Genetics: Evolutionary Mirages: Selection on Binding Site Composition Creates the Illusion of Conserved Grammars in Drosophila Enhancers. I have wanted to write about it for a week or so. You see, the paper is about something I have been interested in for most of my career: how the particular processes by which mutations occur can sometimes be biased (i.e., some types of mutations are more common than others), how these biases can create highly ordered patterns in genomes, and how observation of these ordered patterns can sometimes be misinterpreted as being the result of adaptation. Mistaken claims of adaptation in genomics are a favorite topic of mine, and led me to create (with tongue in cheek) a new omics word: Adaptationomics.
Anyway - so I really really like this paper. But there is a wee bit of a problem in writing about it. You see, it is by my brother, Michael Eisen, a Prof. at UC Berkeley (and a student in his lab, Richard Lusk). And, well, I don't want to say anything wrong or stupid about the paper since, well, my brother will be pissed off. And so I have not written about it yet. But then I realized the best way to write about this one is to simply ask my brother for the "Story behind the science" for the paper, as I have been doing for some other recent papers.
If you want a summary of the paper, here it is in their own words:
Authors summary: Because mutation is a random process, most biologists assume that apparently non-random features of genome sequences must be the result of natural selection acting to create and preserve them. Where this is true, genome sequences provide a powerful means to infer aspects of molecular, cellular, and organismal biology from the signatures of selection they have left behind. However, recent analyses have shown that many aspects of genome structure and organization that have traditionally been attributed to selection can often arise from random processes.
Several groups, including ours, studying the sequences that specify when and where genes should be produced have identified common, seemingly conserved, architectural features, based on which we have proposed new models for the activity of the complex molecular machines that regulate gene expression. However, in the work described here we simulate the evolution of these regulatory sequences and show that many of the features that we and others have identified can arise as a byproduct of random mutational processes and selection for other properties. This calls into question many conclusions of comparative genome analysis, and more generally highlights what Michael Lynch has called the “frailty of adaptive hypotheses” for the origins of complex genomic structures.
Conclusions: Lynch has eloquently argued that biologists are often too quick to assume that organismal and genomic complexity must arise from selection for complex structures and too slow to adopt non-adaptive hypotheses. Our results lend additional support to this view, and extend it to show that indirect and non-adaptive forces can not only produce structure, but also create an illusion that this structure is being conserved. We do not doubt that many aspects of transcriptional regulation constrain the location of transcription factor binding sites within enhancers. Indeed a large body of experimental evidence supports this notion, and we remain committed to identifying and characterizing these constraints. But if this process is to be fueled by comparative sequence analysis, as we believe it must be, it is essential that we give careful consideration to the neutral and indirect forces that we now know can produce evolutionary mirages of structure and function.
I must say I love the title lead in "Evolutionary mirages" which is another but much better way of saying "Adaptationism is a bad thing".
Anyway, before I get in any more trouble, here are some words about the paper from the Senior Author, Michael Eisen, my brother. Questions by me (I know, not very creative ones - but they will have to do):
1. Why did you do this work?
This paper started out as a control. My lab is interested in understanding how the enhancers that control gene expression work - focusing on those that control early development in Drosophila. In 2008, we published a paper showing that when we put enhancers from a distantly related family of flies into Drosophila melanogaster embryos, they drive patterns of expression that are identical to the endogenous D. melanogaster enhancers, even though they have almost no conservation of primary DNA sequence. But since they have the same function, they must have something in common - and so we compared the configurations of transcription factor binding sites in orthologous enhancers across different evolutionary timescales looking for something they shared.
What we found is that binding sites in all of these enhancers occur in clusters. They are closer to each other than one would expect if they were scattered randomly in the ~1,000 bp of an enhancer. And, what's more, sites that were close to each other were far more likely to be conserved. Surely, we thought, this could be no accident. So we proposed that enhancers are organized into compact clusters of sites for one or more factors - and that these "mini modules" are the primary unit of enhancer function.
But as we worked to extend these analyses to whole genomes, we sought a more rigorous, quantitative assessment of just how improbable different levels of binding site clustering were. Like pretty much everyone in the field, we had used a null model in which binding sites were scattered randomly in an enhancer.
But, I've been working with genomes long enough to know that nothing is ever truly random - and that all kinds of adaptive and non-adaptive processes create patterns in genome sequences that confound simple analyses. I wanted to come up with a null model for the distribution of sites within an enhancer that was more realistic.
To do this I turned to my graduate student Rich Lusk, a card-carrying population geneticist trained at the University of Chicago. Rich was proud of his status as one of the few members of the lab who didn't work on flies - but I convinced him to put aside the abstract models of binding site evolution in yeast and work on developing a real null model for our studies of enhancer evolution.
The idea was to simulate enhancers evolving without any constraint on the organization of transcription factor binding sites they contain, and to see what happens. But this did not mean letting enhancers evolve neutrally - their extreme functional conservation demonstrates that they are under fairly strong constraint. Since it is pretty clear that these enhancers are responding to the same transcription factors in all of these species, Rich's simulations required that enhancers maintain their binding site composition - but placed no constraints on how the sites were organized relative to each other.
And what we found was striking. Even with no explicit selection on binding site organization - these evolved enhancers had lots of structure! Binding sites were clustered together, and, the closer together sites were, the more conserved they were -- just like they were in real enhancers. It made us realize pretty quickly that the patterns we had latched onto - and which many other people were describing in different systems - might not be an evolutionary signature of constraint on the organization of sites within enhancers, but simply a byproduct of selection on binding site composition. If you want details, read the paper!
But this has radically altered the way that we look at enhancer evolution.
2. How did you come up with the title?
Rich and I were writing the paper, and we had some really long, hideous, boring title. In writing the paper, the idea that things are not always what they appear to be was at the forefront of my mind. I was thinking about how desperate we and other people in the field were to figure out how enhancers work - it's a vexing problem that has defied decades of work - and how we all hoped that evolutionary analysis was going to rescue us - and how quickly and eagerly we latched on to the first signs of a signal - and how that was just like a mirage you see in the desert....
3. Any interesting background?
(see 1)
4. When did the work start?
About a year ago. We had been thinking about this for a while, but only when Rich focused on it did things get rolling.
5. Why PLoS Genetics? Did PLoS Biology reject it?
PLoS Genetics was our first choice. PG has become the premier journal for evolutionary genetics - it routinely publishes the most interesting and important work in the field, and everyone reads it. While every paper I've sent there has been heavily scrutinized, the editorial process has been fair (though sometimes agonizingly slow....), and each review has been thoughtful and many (including in this case) helped to vastly improve the paper...
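The simple null model mentioned in the answer to question 1, with binding sites scattered randomly in an enhancer, can be sketched in a few lines of Python. This is only an illustration of that naive null, not the evolutionary simulation described in the paper. The 1,000 bp length comes from the text above; the clustering statistic (mean nearest-neighbor distance), the number of sites, and all function names are assumptions made for this sketch.

```python
import random


def mean_nearest_neighbor_distance(positions):
    """Mean distance from each site to its closest neighboring site.

    Assumes at least two distinct positions.
    """
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    nearest = []
    for i in range(len(ordered)):
        left = gaps[i - 1] if i > 0 else None
        right = gaps[i] if i < len(gaps) else None
        nearest.append(min(g for g in (left, right) if g is not None))
    return sum(nearest) / len(nearest)


def null_distribution(n_sites, enhancer_length=1000, n_trials=10000, seed=0):
    """Clustering statistic for sites placed uniformly at random."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_trials):
        positions = rng.sample(range(enhancer_length), n_sites)
        values.append(mean_nearest_neighbor_distance(positions))
    return values


def empirical_p_value(observed, null_values):
    """Fraction of null draws at least as clustered as the observation."""
    return sum(v <= observed for v in null_values) / len(null_values)


# Hypothetical observed site positions in a 1,000 bp enhancer.
observed_sites = [100, 112, 130, 600, 615, 640]
observed = mean_nearest_neighbor_distance(observed_sites)
null_values = null_distribution(len(observed_sites))
print(observed, empirical_p_value(observed, null_values))
```

A small p-value here only says the sites are more clustered than uniform scattering would predict. The point of the interview is that this null is too simple: selection on binding site composition alone can produce such clustering.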

  • December 29, 2009
  • 02:03 PM
  • 885 views

Story Behind the Nature Paper on 'A phylogeny driven genomic encyclopedia of bacteria & archaea' #genomics #evolution

by Jonathan Eisen in The Tree of Life



Today is a fun day for me.  A paper on which I am the senior author is being published in Nature (yes, the Academic Editor in Chief of PLoS Biology is publishing a paper in Nature, more on that below ..).  This paper, entitled, "A phylogeny driven genomic encyclopedia of bacteria and archaea" represents a culmination of years of work by many people from multiple institutions.  Today in this blog I am going to do my best to tell the story behind the paper - about the people and the process and a little bit about the science.

First, a brief bit about the science in the paper. In this paper, we (mostly people at the Joint Genome Institute, where I have an Adjunct Appointment -- but also people in my lab at UC Davis and at the DSMZ culture collection) did a relatively simple thing - we started with the rRNA tree of life as a guide. Then we identified branches in the bacterial and archaeal portions of this tree where there were no genome sequences available or in progress (this was done mostly by Phil Hugenholtz, Dongying Wu and Nikos Kyrpides). Next we searched for representatives of these "unsequenced" branches in the DSMZ culture collection (a collection of bacteria and archaea that can be grown in the lab). And we identified in total some 200 of these. And then the DSMZ (under the direction of Hans-Peter Klenk) grew these organisms and sent the DNA to the Joint Genome Institute. And then JGI turned on their genome sequencing muscle and sequenced the genomes of the organisms in the DNA samples. And finally, we spent a good deal of time analyzing the data, asking a pretty simple question - are there any general benefits that come from this "phylogeny driven" approach to sequencing genomes compared to what one might find with sequencing just any random genome (after all, any genome sequence could have some value)? The paper describes what we found, which is that there are in fact many benefits that come from sequencing genomes from branches in the tree for which genomes are not available.

More on the details of the science below.  But first, I want to note that this paper was truly an amazing team effort, with all sorts of people from the JGI in particular, going above and beyond the call of duty to make sure it happened and worked well.  And the Department of Energy has been truly phenomenal in my opinion in supporting this project which in the end is not explicitly about "energy" per se but is really about providing a reference set of genomes that should improve the value of all microbial genome data.

Anyway, now for the story behind the story.  And be prepared, because this is a bit long. But I think it is important to place this work in a bigger context both in terms of my background as well as some of the background of other people in the project.  If you can't wait for more on the GEBA project then perhaps you should go to some of these links:

Videos of talks I have given on the project:
"Genomic Encyclopedia of Bacteria and Archaea (GEBA)"- Jonathan ...
Recent talk I gave at the Sackler NAS "Microbes and Health" meeting
Podcast of interview of me for ASM's Meet the scientist
Stories about GEBA
Nature News from 11.17.2009
Stories about our paper
Nature News
GenomeWeb "GEBA Researchers Publish Results from Dozens of Bacterial, Archaeal Genomes"
Ars Technica article "Presenting a genomic encyclopedia of bacteria (and archaea)" by John Timmer
Iddo Friedberg blogged about it
The OpenHelix Blog on it
Leonardo Martins blogs about it here and helps translate a Spanish story about the project
R&D magazine has a post based on the press releases here
NY Times story by Carl Zimmer here.

FriendFeed Discussions here (includes a thread about Nature using a Creative Commons license)
And I will post more links as they come up.  Below what I try to provide is some of the story behind the story:

My personal interest in applied uses of phylogenetics stage 1: undergraduate preparation at Harvard
As this paper is primarily about an applied use of phylogenetics (in selecting genomes for sequencing), I thought it would be worth going into some of how I personally became a bit obsessed with applied uses of phylogenetics. For me, my obsession began as an undergraduate at Harvard where I got exposed to the value of phylogeny as a tool from many many angles including but not limited to:
Freshman year taking a course taught by Stephen Jay Gould where Wayne and David Maddison were Teaching Assistants and where they were demoing their new phylogenetics software called MacClade
Sophomore year taking a conservation biology class with Eric Fajer and Scott Melvin where I was exposed to the concept of "phylogenetic diversity" as a tool in assessing conservation plans
Junior year working in the lab of Fakhri Bazzaz with people like David Ackerly and Peter Wayne who made use of phylogeny as a key tool in their research projects
Senior year and the year after graduating where I worked in the lab of Colleen Cavanaugh using rRNA based phylogenetic analysis to characterize uncultured chemosynthetic symbionts. I note it was in Colleen's lab that I also became obsessed you could say with microbes and why they rock.
My personal interest in applied uses of phylogenetics stage 2: graduate school at Stanford
All of this and more gave me a strong passion for phylogeny as a tool. So when I went to graduate school at Stanford (originally to work with Ward Watt on butterflies, though I then switched to working in Phil Hanawalt's lab on the "Evolution of DNA repair genes, proteins and processes"), I became pretty much obsessed with three things, all related to phylogeny.

First, I was interested in whether the rRNA tree of life, which I had used in my studies in Colleen Cavanaugh's lab (and in my first paper in J. Bacteriology, which, thanks to ASM, is now in Pubmed Central and free at ASM's site too), was robust or, as some critics argued, was not that useful.  This was a critical question since the best way to study the phylogeny of microbes at the time, and also the best way to study uncultured microbes, was to leverage the ability to clone rRNA genes by PCR and then to build evolutionary trees of those rRNA genes.  As part of my graduate work, I did a study where I compared the phylogenetic trees of rRNA to trees of another gene from the same speci...

Wu, D., Hugenholtz, P., Mavromatis, K., Pukall, R., Dalin, E., Ivanova, N., Kunin, V., Goodwin, L., Wu, M., Tindall, B.... (2009) A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature, 462(7276), 1056-1060. DOI: 10.1038/nature08656  

  • August 21, 2010
  • 04:53 AM
  • 879 views

More (you know you wanted it) on fecal transplants and the microbiome

by Jonathan Eisen in The Tree of Life

Image from I Heart Guts blog
There is an interesting mini review in the Journal of Clinical Gastroenterology's September issue that may be of interest to some out there. It is entitled "Fecal Bacteriotherapy, Fecal Transplant, and the Microbiome" by Martin Floch and well, the title is indicative of the article.
Yes, the fecal transplant meme is here to stay. Sure, the cognoscenti already knew about fecal transplants. Perhaps they had read Tara Smith's discussion of it in her Aetiology blog in 2007. Perhaps they had pondered it when they read the article from my lab on intestinal transplants. Perhaps they had seen this discussion on MSNBC, or various other stories out there such as this or this post from Angry by Choice. Or maybe they just learned about it from Bora's Carnival of Poop. But the meme on fecal transplants really spread and many may have first heard about fecal transplants from Carl Zimmer's New York Times article a month or so ago, "How microbes defend and define us".
In the article Zimmer discussed how Dr. Alexander Khoruts used a fecal transplant to treat a woman with a persistent and severe Clostridium infection. And Zimmer discusses how, though such transplants had been done before, this was the first time that the microbial community was carefully surveyed before and after. My favorite part of the article is this part, where my friend Janet Jansson describes her reaction:
Two weeks after the transplant, the scientists analyzed the microbes again. Her husband’s microbes had taken over. “That community was able to function and cure her disease in a matter of days,” said Janet Jansson, a microbial ecologist at Lawrence Berkeley National Laboratory and a co-author of the paper. “I didn’t expect it to work. The project blew me away.”
Anyway, Zimmer's article, as with many of his, garnered a lot of response and got many people discussing the poop on fecal transplants.
Well, this issue of the Journal of Clinical Gastroenterology may now be the biggest pile of information about fecal transplants around. That is because, in addition to this little review mentioned above, there are in fact three articles in this issue relating to fecal transplant. Alas, most of you out there will probably only be able to read the review since the other articles are behind a pay wall. But the review is good. And I think this is not the last you will hear about this. (Though I note that, even though I think fecal transplants have some major potential, they seem to be oversold a bit by many as some cure-all -- fodder for a future "Overselling the Microbiome Award" I am sure). I will end with this line from the review which raises some other issues about fecal transplants:
Probably one of the major problems is to define how this therapy can become socially accepted. (Can you imagine the Food & Drug Administration discussion?)
Floch, M. (2010). Fecal Bacteriotherapy, Fecal Transplant, and the Microbiome. Journal of Clinical Gastroenterology, 44 (8), 529-530 DOI: 10.1097/MCG.0b013e3181e1d6e2
Grehan, M., Borody, T., Leis, S., Campbell, J., Mitchell, H., & Wettstein, A. (2010). Durable Alteration of the Colonic Microbiota by the Administration of Donor Fecal Flora. Journal of Clinical Gastroenterology, 44 (8), 551-561 DOI: 10.1097/MCG.0b013e3181e5d06b

  • December 29, 2009
  • 01:43 AM
  • 855 views

More coverage of the GEBA "Phylogeny Driven Genomic Encyclopedia"

by Jonathan Eisen in The Tree of Life

Additional discussion of recent paper...

Wu, D., Hugenholtz, P., Mavromatis, K., Pukall, R., Dalin, E., Ivanova, N., Kunin, V., Goodwin, L., Wu, M., Tindall, B.... (2009) A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature, 462(7276), 1056-1060. DOI: 10.1038/nature08656  

  • January 6, 2010
  • 12:32 PM
  • 846 views

#PLoSOne paper keywords revealing: (#Penis #Microbiome #Circumcision #HIV); press release misleading

by Jonathan Eisen in The Tree of Life

UPDATE - READ COMMENTS - LEAD AUTHOR HAS GOTTEN PRESS RELEASE CHANGED

A new paper just showed up on PLoS One and it has some serious potential to be important. The paper (PLoS ONE: The Effects of Circumcision on the Penis Microbiome) reports on analyses that show differences in the microbiota (which they call the microbiome - basically what bacterial species were present) in men before and after circumcision. And they found some significant differences. It is a nice study of a relatively poorly examined subject - the bacteria found on the penis w/ and w/o circumcision. This is a particularly important topic in light of other studies that have shown that circumcision may provide some protection against HIV infection.

In summary here is what they did - take samples from men before and after circumcision. Isolate DNA.  Run PCR amplification reactions to amplify variable regions of rRNA genes from these samples. Then conduct 454 sequencing of these amplified products.  And then analyze the sequences to look at the types and #s of different kinds of bacteria.

What they found is basically summarized in their last paragraph

"This study is the first molecular assessment of the bacterial diversity in the male genital mucosa. The observed decrease in anaerobic bacteria after circumcision may be related to the elimination of anoxic microenvironments under the foreskin. Detection of these anaerobic genera in other human infectious and inflammatory pathologies suggests that they may mediate genital mucosal inflammation or co-infections in the uncircumcised state. Hence, the decrease in these anaerobic bacteria after circumcision may complement the loss of the foreskin inner mucosa to reduce the number of activated Langerhans cells near the genital mucosal surface and possibly the risk of HIV acquisition in circumcised men."

And this all sounds interesting and the work seems solid. I note that some friends / colleagues of mine were involved in this including Jacques Ravel who used to be at TIGR and now is at U MD and Paul Keim who is associated with TGen in Arizona. For anyone interested in HIV, the human microbiome, circumcision, etc, it is probably worth looking at.

However, the press release I just saw from TGen really ticked me off. The title alone did me in "Study suggests why circumcised men are less likely to become infected with HIV".  Sure the study did suggest a possible explanation for why circumcised men are less likely to get HIV infections - the paper was justifiably VERY cautious about this inference.  They basically state that there are some correlations worth following up.

The press release goes on to say "The study ... could lead to new non-surgical HIV preventative strategies for the estimated 70 percent of men worldwide (more than 2 billion) who, because of religious or cultural beliefs, or logistic or financial barriers, are not likely to become circumcised." Well sure, I guess you could say that. I think they are implying you could change the microbiome somehow and therefore protect from HIV, but that implies (1) that there really is a causal relationship between the microbial differences and HIV protection and (2) that one could change the microbiome easily, which is a big big stretch given how little we know right now.

Anyway - the science seems fine and not over-reaching.  But the press release is annoying and misleading. Shocking I know.  But this one got to me.

UPDATE - SEE COMMENTS HERE AND IN FRIENDFEED. LEAD AUTHOR GOT PRESS RELEASE CHANGED.

Price, L., Liu, C., Johnson, K., Aziz, M., Lau, M., Bowers, J., Ravel, J., Keim, P., Serwadda, D., Wawer, M., & Gray, R. (2010). The Effects of Circumcision on the Penis Microbiome PLoS ONE, 5 (1) DOI: 10.1371/journal.pone.0008422



  • November 15, 2010
  • 07:52 AM
  • 794 views

One of my new favorite things: paleovirology

by Jonathan Eisen in The Tree of Life

Just a quick post here about a paper that came out about a month or so ago: PLoS Biology: Genomic Fossils Calibrate the Long-Term Evolution of Hepadnaviruses

This paper, by Clément Gilbert and Cédric Feschotte, is quite cool. In it they describe their work on "Paleovirology" where they look for viruses that have "endogenized" by inserting into the genome of some host species. This endogenization is important in particular when the endogenous form becomes inactive and thus, in essence, trapped in the genome. This in turn is important because many viruses evolve so rapidly when they are "free" that it is very hard to reconstruct their ancient history through comparative analysis. But the endogenized viruses serve in essence as a molecular "fossil record" that aids in the comparison and phylogenetic analysis of various families of viruses. As we get more and more genomes, this searching for and analysis of endogenous viruses will get much better.

Anyway, in the paper they report on endogenous viruses in the Zebra Finch genome that are in the Hepadnaviridae family.  Here is their summary:

Paleovirology is the study of ancient viruses and the way they have shaped the innate immune system of their hosts over millions of years. One way to reconstruct the deep evolution of viruses is to search for viral sequences “fossilized” at different evolutionary time points in the genome of their hosts. Besides retroviruses, few virus families are known to have deposited molecular relics in their host's genomes. Here we report on the discovery of multiple fragments of viruses belonging to the Hepadnaviridae family (which includes the human hepatitis B viruses) fossilized in the genome of the zebra finch. We show that some of these fragments infiltrated the germline genome of passerine birds more than 19 million years ago, which implies that hepadnaviruses are much older than previously thought. Based on this age, we can infer a long-term avian hepadnavirus substitution rate, which is a 1,000-fold slower than all short-term substitution rates calculated based on extant hepadnavirus sequences. These results call for a reevaluation of the long-term evolution of Hepadnaviridae, and indicate that some exogenous hepadnaviruses may still be circulating today in various passerine birds.


Figure 4. Summary of the evolutionary scenario inferred in this study.
It is an interesting paper and worth a look for those who have any interest in viral evolution. And I am becoming more and more fascinated by "Paleovirology" these days so I thought I would just post about this article here. And I guess I am not alone in this opinion that the article is interesting (though I am late). Here is some coverage of their paper:

Ancient Virus Found Hiding Out in Finch Genome
Ancient viruses lurk in songbird's DNA
Ancient "Fossil" Virus Shows Infection to Be Millions of Years Old
Fossil virus leaves evolutionary footprints in songbird DNA
It's all in the genes: Songbirds have fossil viruses in their DNA ...
Ancient Bird Virus « Life « Science Today: Beyond the Headlines

Gilbert, C., & Feschotte, C. (2010). Genomic Fossils Calibrate the Long-Term Evolution of Hepadnaviruses PLoS Biology, 8 (9) DOI: 10.1371/journal.pbio.1000495
--------
This is from the "Tree of Life Blog"
of Jonathan Eisen, an evolutionary biologist and Open Access advocate
at the University of California, Davis. For short updates, follow me on Twitter.

--------




... Read more »

  • October 12, 2010
  • 10:00 PM
  • 758 views

Figuring out figures in scientific papers: new search / ranking method outlined in PLoS One paper

by Jonathan Eisen in The Tree of Life

Just a quick post here.  A colleague just sent me a link to her fascinating new paper in PLoS ONE: Automatic Figure Ranking and User Interfacing for Intelligent Figure Search

In this paper Hong Yu from the University of Wisconsin in Milwaukee describes a system for better automated characterization of figures from scientific papers.  The system is available through their webserver "Ask Hermes".

If you want to learn more about the system I suggest you read the paper.  Or watch their video.



Basically the general idea is summarized in their background section of the abstract:
Figures are important experimental results that are typically reported in full-text bioscience articles. Bioscience researchers need to access figures to validate research facts and to formulate or to test novel research hypotheses. On the other hand, the sheer volume of bioscience literature has made it difficult to access figures. Therefore, we are developing an intelligent figure search engine (http://figuresearch.askhermes.org). Existing research in figure search treats each figure equally, but we introduce a novel concept of “figure ranking”: figures appearing in a full-text biomedical article can be ranked by their contribution to the knowledge discovery.

I particularly like that they also allow searching just for open access figures, which may be of significant value to people who want to do things like make a slide presentation with no copyrighted/protected material in it.  For example see the results of a search for open access figures using the keyword phylogenomics.

Anyway - definitely worth checking this out.

Yu, H., Liu, F., & Ramesh, B. (2010). Automatic Figure Ranking and User Interfacing for Intelligent Figure Search PLoS ONE, 5 (10) DOI: 10.1371/journal.pone.0012983




... Read more »

  • January 24, 2011
  • 03:36 AM
  • 716 views

Phylogeny rules:

by Jonathan Eisen in The Tree of Life


I am a coauthor on a new paper in PLoS Computational Biology I thought I would promote here.  The full citation for the paper is:

PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data (doi:10.1371/journal.pcbi.1001061). 
The paper discusses a new software program "phylOTU" which is for phylogenetic-based identification of "operational taxonomic units", which are also known as OTUs.   What are OTUs?  They are basically clusters of closely related sequences that are used to represent something akin to a species.  OTUs are used a lot in environmental microbiology b/c one key way to study microbes in the environment is through extraction and sequencing of DNA.  Traditionally this has been done through PCR amplification and sequencing of one particular gene (ss-rRNA).  Now it is also being done through random sequencing of all DNA from environmental samples (so called metagenomics).
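To make the idea of an OTU a bit more concrete, here is a minimal sketch of clustering sequences into OTUs. It is not the PhylOTU method (which works from phylogenetic distances); it is the simpler, identity-based greedy clustering, and it assumes the sequences are already aligned to the same length. The function names and the 97% default cutoff (a commonly used convention) are choices made for this illustration only.

```python
def identity(a, b):
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Greedy clustering: put each sequence in the first OTU whose seed it matches.

    seqs is a dict of name -> aligned sequence (hypothetical input format).
    Returns a list of OTUs, each a list of sequence names.
    """
    seeds = []  # one representative sequence per OTU
    otus = []   # sequence names grouped by OTU
    for name, seq in seqs.items():
        for i, seed in enumerate(seeds):
            if identity(seq, seed) >= threshold:
                otus[i].append(name)
                break
        else:
            # No existing seed was close enough, so start a new OTU.
            seeds.append(seq)
            otus.append([name])
    return otus


if __name__ == "__main__":
    toy = {"a": "ACGTACGTAC", "b": "ACGTACGTAT", "c": "TTTTTTTTTT"}
    print(greedy_otus(toy, threshold=0.8))
```

Note that greedy clustering like this depends on the order of the input sequences, which is one reason tree-based approaches are attractive.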

Anyway - the paper is (of course) fully open access and you can read it for more detail.  Just thought I would post a little here about it.  The paper / project was led by Tom Sharpton, a post doc in Katie Pollard's lab at UCSF working on a collaborative project between Katie's lab, my lab, and Jessica Green's lab at U. Oregon (and recently Martin Wu's new lab at U. Virginia - he was in my lab previously).  This collaborative project even has a name "iSEEM" which stands for integrating statistical, evolutionary and ecological approaches to metagenomics.  This project has been generously supported by the Gordon and Betty Moore Foundation (via a grant for which I am PI).


Some little tidbits of possible interest about the project

I really wanted the program to be called POTUS, but I guess I lost out ...
You can get the code here: https://github.com/sharpton/PhylOTU 
You can also get code/data here: http://www.biotorrents.net


Sharpton, T., Riesenfeld, S., Kembel, S., Ladau, J., O'Dwyer, J., Green, J., Eisen, J., & Pollard, K. (2011). PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data PLoS Computational Biology, 7 (1) DOI: 10.1371/journal.pcbi.1001061




... Read more »

  • January 26, 2010
  • 06:29 PM
  • 714 views

Wanted: Feedback on Importance of Finishing (Microbial) Genomes

by Jonathan Eisen in The Tree of Life

To all,

I am writing because I am working on a project to evaluate the importance of finishing microbial genomes. I know there has been lots of talk about this out there on the web and in papers, etc., but I think a fresh discussion is useful. To get people up to speed, below is a summary of the issue as I see it.

Shotgun sequencing: Genome sequencing relies generally on the shotgun method at the beginning of a project, where DNA fragments from an organism of interest are sequenced in a highly random manner.
Assembly: After shotgun sequencing, the genome is assembled as best as possible into larger pieces (called contigs) and ordered sets of contigs (called scaffolds). All of this put together can be called an "assembly".
Gaps: After the assembly phase, there are almost always gaps in the assembly. These generally come in two forms: sequencing gaps (where we know two contigs go together in some orientation but where we do not know the sequence of the DNA in between the contigs) and physical gaps (where we have sets of scaffolds but do not know how they connect to each other).
Quality: After the assembly phase, different components of the assembly can have different "qualities" where, for example, some sections are somewhat ambiguous and others are highly reliable.
Finishing: Using any combination of laboratory, computational and other analyses, one can both fill in gaps in the assembly and improve the quality of the assembly. This can generally be called "finishing".
Quality of final product: Depending on the end quality of the assembly, we could assign it to one of a few categories of "completeness" as outlined in a paper by Patrick Chain et al. In essence, you can consider this post to be a follow up to their paper and their work.

We plan to try to measure what one gains by the finishing steps. We need to know this because we would like to make intelligent decisions about how to allocate resources. If one gains a lot from finishing then it would make sense to allocate significant resources to it. I note, I and some colleagues wrote a paper about this issue, "The value of complete microbial genome sequencing (You get what you pay for)", that was published in 2002. This is without a doubt not the only discussion of the topic but I just wanted to point out I have been involved in this debate before. Despite that, I think we simply do not know right now what the benefits might be in the new sequencing landscape.

------------------------------------------
So the question I am asking here is: What do people think are the potential benefits that could come from finishing?
------------------------------------------

Here are some possible answers to get the discussion going:
Gene discovery (e.g., there may be interesting/important genes in missing/low quality data)
Esthetics of completeness (as in, it just feels better to have a finished genome)
Improved analysis of genome organization (in particular from having contigs oriented correctly)

Also - I note there has been some discussion of this for animals, plants etc. (e.g., see recent paper by Eric Green and others on vertebrates). Many of the issues are similar but they are different enough that I think a microbe focused discussion is useful.

Other links of interest:
Discussion on Friendfeed to question from Michael Barton
LANL finishing in the future meeting
Scivee talks from 2009 LANL meeting

Blakesley, R., Hansen, N., Gupta, J., McDowell, J., Maskeri, B., Barnabas, B., Brooks, S., Coleman, H., Haghighi, P., Ho, S., Schandler, K., Stantripop, S., Vogt, J., Thomas, P., Comparative Sequencing Program, N., Bouffard, G., & Green, E. (2010). Effort required to finish shotgun-generated genome sequences differs significantly among vertebrates BMC Genomics, 11 (1) DOI: 10.1186/1471-2164-11-21
Fraser, C., Eisen, J., Nelson, K., Paulsen, I., & Salzberg, S. (2002). The Value of Complete Microbial Genome Sequencing (You Get What You Pay For) Journal of Bacteriology, 184 (23), 6403-6405 DOI: 10.1128/JB.184.23.6403-6405.2002
Chain, P., & et al. (2009). Genome Project Standards in a New Era of Sequencing Science, 326 (5950), 236-237 DOI: 10.1126/science.1180614
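As a small illustration of the contig / scaffold / gap vocabulary used in this post, here is a rough sketch of how one might count the sequencing gaps in a single scaffold. It assumes (and this is only an assumption, though a common convention) that gaps inside a scaffold are written as runs of the letter N; the function names are made up for this example. Physical gaps between scaffolds would not show up this way.

```python
import re


def count_gaps(scaffold):
    """Return (number of gaps, total gap length), treating each run of N as one gap."""
    runs = re.findall(r"N+", scaffold.upper())
    return len(runs), sum(len(run) for run in runs)


def split_contigs(scaffold):
    """Split a scaffold into its contigs at every run of N."""
    return [piece for piece in re.split(r"N+", scaffold.upper()) if piece]


if __name__ == "__main__":
    scaffold = "ACGTNNNNACGTNNAC"
    print(count_gaps(scaffold))     # gaps and total gap length
    print(split_contigs(scaffold))  # the contigs between the gaps
```

Comparing numbers like these before and after finishing is one simple, if incomplete, way to put a figure on what finishing buys you.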



... Read more »

Blakesley, R., Hansen, N., Gupta, J., McDowell, J., Maskeri, B., Barnabas, B., Brooks, S., Coleman, H., Haghighi, P., Ho, S.... (2010) Effort required to finish shotgun-generated genome sequences differs significantly among vertebrates. BMC Genomics, 11(1), 21. DOI: 10.1186/1471-2164-11-21  

Fraser, C., Eisen, J., Nelson, K., Paulsen, I., & Salzberg, S. (2002) The Value of Complete Microbial Genome Sequencing (You Get What You Pay For). Journal of Bacteriology, 184(23), 6403-6405. DOI: 10.1128/JB.184.23.6403-6405.2002  

  • November 30, 1999
  • 12:00 AM
  • 690 views

Most important paper ever in microbiology?

by Jonathan Eisen in The Tree of Life

Discussion of papers reporting discovery of the archaea... Read more »

Woese CR, & Fox GE. (1977) Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proceedings of the National Academy of Sciences of the United States of America, 74(11), 5088-90. PMID: 270744  

Fox GE, Magrum LJ, Balch WE, Wolfe RS, & Woese CR. (1977) Classification of methanogenic bacteria by 16S ribosomal RNA characterization. Proceedings of the National Academy of Sciences of the United States of America, 74(10), 4537-4541. PMID: 16592452  

Balch WE, Magrum LJ, Fox GE, Wolfe RS, & Woese CR. (1977) An ancient divergence among the bacteria. Journal of molecular evolution, 9(4), 305-11. PMID: 408502  

  • December 24, 2009
  • 08:04 AM
  • 682 views

Story Behind the Nature Paper on 'A phylogeny driven genomic encyclopedia of bacteria & archaea' #genomics #evolution

by Jonathan Eisen in The Tree of Life

Discussion of the background to a recent Nature paper ... Read more »

Wu, D., Hugenholtz, P., Mavromatis, K., Pukall, R., Dalin, E., Ivanova, N., Kunin, V., Goodwin, L., Wu, M., Tindall, B.... (2009) A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature, 462(7276), 1056-1060. DOI: 10.1038/nature08656  

  • January 26, 2010
  • 12:32 PM
  • 682 views

Cool paper, & winner of "worst new omics word award": Predatosome

by Jonathan Eisen in The Tree of Life

And the bad new omics words keep streaming in. Today's winner of the "Worst New Omics Word Award" is going to Carey Lambert, Chien-Yi Chang, Michael J. Capeness and R. Elizabeth Sockett from Nottingham for their use/invention of "Predatosome". They use this term in the title of their new PLoS One paper: The First Bite— Profiling the Predatosome in the Bacterial Pathogen Bdellovibrio. Here is the very long sentence where they define it:

The gene products required for the initial invasive predatory processes have not been extensively studied but the genome sequencing of B. bacteriovorus HD100 [1] revealed a genome of 3.85Mb, including a core genome similar to that of non-predatory bacteria and some 40% of the genome comprising a potential predicted “predatosome” of genes, encoding both hydrolytic products that may be employed in prey degradation, and genes that may be required specifically for host predation and thus are not conserved across the Proteobacteria.

The paper is actually quite interesting. They use genomic approaches to characterize a fascinating organism - the bacterial species Bdellovibrio bacteriovorus. This bug is a predatory organism - eating other bacteria. Since it eats them from the inside out, some, including these authors, refer to this organism as a pathogen of other bacteria, and there is some discussion here and elsewhere of its potential to serve as a "living antibiotic" in much the same way people are trying to use bacterial viruses (a.k.a. phage).

The paper overall is quite nice on first read. They used microarray studies to characterize gene expression patterns in different phases of the life cycle (see Figure above for the life cycle outline). They backed up this work by quantitative PCR studies and regular RT PCR. And based upon their analysis they found some genes that are "Up-Regulated in Predatory, but Not HI" phase (HI stands for host-independent). And here is where they really tell us what they mean by predatosome:

This category of 240 genes are very interesting as they potentially exclude those genes simply involved with release from attack-phase into growth, namely they should be part of the “predatosome” of predatorily specific genes.

It seems to me this terminology is completely unnecessary. All they need to do is say they are studying the genes related to the predatory phase. To assign these genes to the "predatosome" is a bit much. They continue in the paper to report some really interesting stuff. For example, they also examine another predatory bacterial species, and look at whether there are genes conserved in the process between species. They made some really nice figures, by the way, about the different phases of the life cycle in this organism and which genes are expressed.

Anyway - the science in the paper is nice. However, the invention of yet another omics word is a bit much. And thus Lambert et al. are winners of the highly coveted "Worst New Omics Word Award" for their invention of "predatosome". Details on the paper are below - and that is where the figures come from too. (Hat tip to Bora for letting me know about the paper, and the word).

Lambert, C., Chang, C., Capeness, M., & Sockett, R. (2010). The First Bite— Profiling the Predatosome in the Bacterial Pathogen Bdellovibrio PLoS ONE, 5 (1) DOI: 10.1371/journal.pone.0008599




... Read more »

  • February 4, 2011
  • 05:04 PM
  • 622 views

IQ Test for bacteria

by Jonathan Eisen in The Tree of Life





Social IQ of bacteria
Another quick one here.  Interesting paper out in BMC Genomics: Genome sequence of the pattern forming Paenibacillus vortex bacterium reveals potential for thriving in complex environments

The paper is from Eshel Ben-Jacob and colleagues from many institutions around the world.

Here is a summary of the article (from the paper)

Background: The pattern-forming bacterium Paenibacillus vortex is notable for its advanced social behavior, which is reflected in development of colonies with highly intricate architectures. Prior to this study, only two other Paenibacillus species (Paenibacillus sp. JDR-2 and Paenibacillus larvae) have been sequenced. However, no genomic data is available on the Paenibacillus species with pattern-forming and complex social motility. Here we report the de novo genome sequence of this Gram-positive, soil-dwelling, sporulating bacterium.

Results: The complete P. vortex genome was sequenced by a hybrid approach using 454 Life Sciences and Illumina, achieving a total of 289× coverage, with 99.8% sequence identity between the two methods. The sequencing results were validated using a custom designed Agilent microarray expression chip which represented the coding and the non-coding regions. Analysis of the P. vortex genome revealed 6,437 open reading frames (ORFs) and 73 non-coding RNA genes. Comparative genomic analysis with 500 complete bacterial genomes revealed exceptionally high number of two-component system (TCS) genes, transcription factors (TFs), transport and defense related genes. Additionally, we have identified genes involved in the production of antimicrobial compounds and extracellular degrading enzymes.

Conclusions: These findings suggest that P. vortex has advanced faculties to perceive and react to a wide range of signaling molecules and environmental conditions, which could be associated with its ability to reconfigure and replicate complex colony architectures. Additionally, P. vortex is likely to serve as a rich source of genes important for agricultural, medical and industrial applications and it has the potential to advance the study of social microbiology within Gram-positive bacteria.

The organism is certainly interesting.  See http://en.wikipedia.org/wiki/Paenibacillus_vortex for more detail (Eshel Ben-Jacob told me he updated the site).

But perhaps more interesting is the concept that Eshel Ben-Jacob has been pushing on bacterial social intelligence.  See for more detail:
The Genius of Bacteria
Realizing Social Intelligence in Bacteria
http://ctbp.ucsd.edu/pubs/pdf/462.pdf
Bacteria to be tested for 'social intelligence'? « Anguished Repose
The main idea behind this is to look at social communication strategies as a measure of intelligence.  And from a genomics point of view one can measure the genetic diversity of genes likely involved in these processes.  Such counting of genes is not the most useful thing in the world.  More importantly, these organisms really have some fascinating behaviors, and in the end we should measure behavioral diversity, not genomic diversity of putative social genes, to measure "bacterial IQ".
Sirota-Madi, A., Olender, T., Helman, Y., Ingham, C., Brainis, I., Roth, D., Hagi, E., Brodsky, L., Leshkowitz, D., Galatenko, V., Nikolaev, V., Mugasimangalam, R., Bransburg-Zabary, S., Gutnick, D., Lancet, D., & Ben-Jacob, E. (2010). Genome sequence of the pattern forming Paenibacillus vortex bacterium reveals potential for thriving in complex environments BMC Genomics, 11 (1) DOI: 10.1186/1471-2164-11-710




... Read more »

  • January 20, 2010
  • 06:57 AM
  • 594 views

Confronting Intelligent Design arguments directly in the scientific literature

by Jonathan Eisen in The Tree of Life

... Read more »

  • December 22, 2009
  • 12:00 AM
  • 570 views

Story behind the story for new #PLoSOne paper on Bayesian phylogenetics

by Jonathan Eisen in The Tree of Life

There is an interesting new paper in PLoS One, "Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics", by Brian Kolaczkowski and Joseph Thornton. The work focuses on methods for inferring phylogenetic history and in particular two types of statistical approaches: Likelihood and Bayesian. These methods are related to each other in that both use statistical models of evolution to evaluate different possible phylogenetic trees relating a set of taxa by how well data sets about those taxa fit each possible tree. What they did in this new paper was test these methods, with some simulations and with some mathematical analyses. And somewhat surprisingly, they find that Bayesian methods, which have become more popular recently, appear to be more prone to errors than likelihood methods when the data sets have multiple, not closely related taxa with long branches. (Note if you want to learn more about phylogenetic methods, you can look at the online chapter (html format or PDF) from my Evolution Textbook, though I confess this needs a bit of revision, which I am working on now).... Read more »
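For readers who want the relationship between the two approaches spelled out, here is a rough textbook-style sketch (not taken from the paper itself). The symbols are my own shorthand for this illustration: D is the sequence data, τ is a tree topology, and θ stands for branch lengths and other model parameters.

```latex
% Maximum likelihood: pick the tree whose best-fitting parameters
% give the data the highest probability.
\hat{\tau}_{\mathrm{ML}} = \arg\max_{\tau} \; \max_{\theta} \; P(D \mid \tau, \theta)

% Bayesian: weigh each tree by its prior and by the likelihood
% averaged (integrated) over the parameters.
P(\tau \mid D) \;\propto\; P(\tau) \int P(D \mid \tau, \theta) \, P(\theta \mid \tau) \, d\theta
```

So both methods rest on the same likelihood function; the difference is that one optimizes over the parameters while the other averages over them under a prior.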

  • October 29, 2009
  • 01:24 PM
  • 562 views

More on the PLoS Special Collection on the Genomics of Emerging Infectious Diseases

by Jonathan Eisen in The Tree of Life

Discussion of new PLoS Series on Genomics of Emerging Infectious Diseases... Read more »

  • May 3, 2010
  • 03:33 PM
  • 561 views

Holy lateral transfer batman; amazing story on fungal to aphid transfer from Nancy Moran

by Jonathan Eisen in The Tree of Life

As many know, I generally do not write a lot about papers in non open access journals because I like readers to be able to access all the papers which I write about. But this is one of the exceptions to my normal rule. An amazing paper was published a few days ago in Science by Nancy Moran and Tyler Jarvik: Lateral Transfer of Genes from Fungi Underlies Carotenoid Production in Aphids -- Moran and Jarvik 328 (5978): 624 -- Science

I first found out about this from Ed Yong's blog post here (just a note - his Not Exactly Rocket Science is such a frigging incredible blog). He really does the whole story on this so I am just posting a bit here.

Anyway, the Moran and Jarvik paper focuses on genes in the aphid genome that encode enzymes for carotenoid synthesis. These enzymes are involved in red and/or green coloring seen in the pea aphids. Recently the pea aphid genome was sequenced (a paper about this was published in PLoS Biology) and it was analysis of the genome data that helped lead Moran and Jarvik to the study reported in the recent issue of Science.

In their study they report a detailed evolutionary and phylogenetic analysis of the carotenoid synthesis genes found in the aphid genome and show quite convincingly that these genes do not appear to be of "normal" descent. That is, they seem to have an ancestry separate from many of the "normal" animal genes in the genome. Instead, these genes are related to genes from fungi. In fact, these genes are embedded, in an evolutionary sense, in a group of genes which are all from fungi, and thus Moran and Jarvik conclude the most likely explanation is that some time in relatively recent pea aphid evolutionary history, these genes were acquired from some fungus.

About to have some eye drops put in my eyes so gotta go for now, but just wanted to get something out there about this fascinating work. For more on this story - there is lots out there, such as the following:
Aphid's Color Comes From a Fungus Gene
Aphids Pilfered Red Genes from Fungus
Pea Aphids Create Their Own Coloring, Study Reveals
Insect stole fungus gene to make plant pigment
Animals Can Get Genes From Other Species: Research
Aphids evolved special, surprising talents
1st pigment-making animal found
Aphids make their own bright colors

Moran, N., & Jarvik, T. (2010). Lateral Transfer of Genes from Fungi Underlies Carotenoid Production in Aphids Science, 328 (5978), 624-627 DOI: 10.1126/science.1187113
(2010). Genome Sequence of the Pea Aphid Acyrthosiphon pisum PLoS Biology, 8 (2) DOI: 10.1371/journal.pbio.1000313


... Read more »

  • March 18, 2011
  • 05:48 PM
  • 514 views

The story behind the story of my new #PLoSOne paper on "Stalking the fourth domain of life" #metagenomics #fb

by Jonathan Eisen in The Tree of Life

Well, here goes.

This is a post about a paper that has been a long long time coming.  Today, a paper of mine is being published in PLoS One.  The paper is titled "Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees" and is available at http://dx.plos.org/10.1371/journal.pone.0018011.  (or if that link does not work you can get a copy here).   This paper represents something I started a long time ago and I am going to try to describe the story behind the paper here. 
I note - we are not doing a press release for the paper, for a few reasons.  But one of them is that, well, I am starting to hate press releases.  So I guess this is kind of my press release.  But this will be a bit longer than most press releases.  I note - my key fear here is that somehow in my communications with the press or in our text in the paper or in this post I will overstate our findings.  Here is the punchline - we found some very phylogenetically novel forms of phylogenetic marker genes in metagenomic data.  We do not have a conclusive explanation for the origin of these sequences.  They may be from novel viruses.  They may be ancient paralogs of the marker genes.  Or they may be from a new branch of cellular organisms in the tree of life, distinct from bacteria, archaea or eukaryotes.  I think most likely they are from novel viruses.  But we just don't know. 
First - a summary of what we did.  
In the paper,  we searched through metagenomic data (sequences from environmental samples) for phylogenetically novel sequences for three standard phylogenetic marker genes (ss-rRNA, recA, rpoB).  We focused on sequences from the Venter Global Ocean Sampling data set because, well, we started this analysis many years ago when that was the best data set available (more on this below). What we were looking for were evolutionary lineages of these genes that were separate from the branches that corresponded to the three known "Domains" of life (bacteria, archaea and eukaryotes).  

To search for such novel lineages in the metagenomic data, we built evolutionary trees using these genes where we included sequences from known organisms (and viruses) as well as sequences from metagenomic data. We then looked through the trees for groups that were both phylogenetically novel and included only environmental data (i.e., they were new compared to known organisms or viruses).  This method did not work very well for rRNA sequences (largely because making high quality alignments of short phylogenetically novel rRNA sequences was difficult - more on this below).  But with RecA and RpoB homologs we were able to generate what we believe to be robust phylogenetic trees.  And in these trees we found evidence for phylogenetically very novel sequences in environmental data.
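The "look through the trees" step described above can be sketched in a few lines of code. This is not the pipeline used in the paper; it is just a minimal illustration under some assumptions: the tree is already rooted and is written as nested tuples with leaf names as strings, and environmental sequences can be recognized by a name test you supply (here, a made-up "env_" prefix). It also only finds clades made up entirely of environmental sequences and says nothing about how deep or novel the branch is.

```python
def leaves(tree):
    """Return all leaf names under a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaves(child))
    return names


def environmental_clades(tree, is_env):
    """Return the largest clades (as lists of leaf names) containing only
    environmental sequences. Single leaves are not reported as clades."""
    if isinstance(tree, str):
        return []
    names = leaves(tree)
    if all(is_env(name) for name in names):
        return [names]
    found = []
    for child in tree:
        found.extend(environmental_clades(child, is_env))
    return found


if __name__ == "__main__":
    # Hypothetical toy tree mixing known organisms and environmental reads.
    toy = ((("E_coli", "B_subtilis"), ("env_1", "env_2")),
           ("M_jannaschii", "env_3"))
    print(environmental_clades(toy, lambda name: name.startswith("env_")))
```

In practice one would then also ask where each such clade sits relative to the bacterial, archaeal, eukaryotic and viral sequences, which is the part that takes real care.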



We then propose and discuss four potential mechanisms that could lead to the existence of such evolutionarily novel sequences.  The two we consider most likely are the following:
(1) The sequences could be from novel viruses
(2) The sequences could be from a fourth major branch on the tree of life
Unfortunately, we do not actually know the source of these sequences.  So we cannot determine which of the theories is correct.  Obviously if there is a novel lineage of cellular organisms out there, well, that would be cool.  But we have no evidence right now that this is what is going on.  Personally, I think it is most likely that these novel sequences are from weird viruses.  But as far as we can tell, they truly could be from a fourth major branch of cellular organisms and thus even though we did not have the story completely pinned down, we decided to finally write up the paper to get other people to think about this issue.


Below I give all sorts of other details about the project in the following areas
The history of the project
More detail on what is in the ... Read more »

Dongying Wu, Martin Wu, Aaron Halpern, Douglas B. Rusch, Shibu Yooseph, Marvin Frazier, J. Craig Venter, & Jonathan A. Eisen. (2011) Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees. PLoS One, 6(3). DOI: 10.1371/journal.pone.0018011
