18 posts · 21,642 views
A few months ago, Jennifer did a nice tip on NCBI’s Genome Resources and the changes there. There she briefly mentioned the Genome Project resource moving to a new home, BioProject, just about a year ago. Today, I’d like to give you a quick overview of BioProject. It was described in this year’s issue of [...]... Read more »
Barrett, T., Clark, K., Gevorgyan, R., Gorelenkov, V., Gribov, E., Karsch-Mizrachi, I., Kimelman, M., Pruitt, K., Resenchuk, S., Tatusova, T.... (2011) BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata. Nucleic Acids Research, 40(D1). DOI: 10.1093/nar/gkr1163
The database and search tool I will focus on in this tip of the week is Mapper. Mapper uses TFBS from TRANSFAC and JASPAR and maps them to genomic locations for several species. Using “the search power of profile hidden Markov models (HMMs),” Mapper includes a database of pre-computed TFBS locations and an on-the-fly search engine for TFBS. Additionally, there is rSNPs, a handy tool designed to identify SNPs that have a significant effect on the score of a TFBS.... Read more »
Marinescu, V., Kohane, I., & Riva, A. (2005) MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes . BMC Bioinformatics, 6(1), 79. DOI: 10.1186/1471-2105-6-79
eGIFT, as the tag line says, is a tool to extract gene information from text. It’s a tool that allows you to search for and explore terms and documents related to a gene or set of genes. There are many ways to search and explore eGIFT, find genes given a specific term, find terms related to a set of genes and more. How does the tool do this?... Read more »
Tudor, C., Schmidt, C., & Vijay-Shanker, K. (2010) eGIFT: Mining Gene Information from the Literature. BMC Bioinformatics, 11(1), 418. DOI: 10.1186/1471-2105-11-418
Well, more a lecture than a tip. We haven’t done a tip today because we are in the middle of a grant application (time is limited), and this is an excellent video we’d like more people to see. Mary posted the first lecture, The Genomic Landscape circa 2012, in a series given at NIH. As the course description mentions, “The lectures [...]... Read more »
Green, E., Guyer, M., Manolio, T., & Peterson, J. (2011) Charting a course for genomic medicine from base pairs to bedside. Nature, 470(7333), 204-213. DOI: 10.1038/nature09764
Who can resist a nice cup of eggnog for the holidays (especially with added brandy)? I know I can’t. I make my grandpa’s recipe every December and, considering it uses tons of sugar, eggs, heavy cream and alcohol and that 1/2 & 1/2 is the lightest ingredient, only in December.
Oh, that’s not what this tip is about; it’s about a database of orthologous groups of genes, eggNOG. We’ve mentioned eggNOG several times before, but only in passing or in relation (orthologous? :D ) to another database or tool. Today, in perfect timing for the season, I thought I’d do a quick tip to introduce eggNOG.... Read more »
Powell, S., Szklarczyk, D., Trachana, K., Roth, A., Kuhn, M., Muller, J., Arnold, R., Rattei, T., Letunic, I., Doerks, T.... (2011) eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges. Nucleic Acids Research. DOI: 10.1093/nar/gkr1060
Over 2 years ago I did a tip of the week on Phosida. Phosida is a database of phosphorylation, acetylation, and N-glycosylation data. Since the last tip, Phosida has undergone significant growth and some changes, including the addition of much more data (80,000 phosphorylation, acetylation and N-glycosylated sites from 9 different species) and tools (prediction and [...]... Read more »
Gnad, F., Gunawardena, J., & Mann, M. (2010) PHOSIDA 2011: the posttranslational modification database. Nucleic Acids Research, 39(Database). DOI: 10.1093/nar/gkq1159
Today’s video tip of the week is on MapMi. This tool is found at EBI and was developed by the Enright lab. It is a computational system for mapping miRNAs within and across species. As the abstract of their recent paper says: Currently miRBase is their primary repository, providing annotations [...]... Read more »
Guerra-Assuncao, J., & Enright, A. (2010) MapMi: automated mapping of microRNA loci. BMC Bioinformatics, 11(1), 133. DOI: 10.1186/1471-2105-11-133
Plaza, a resource for plant comparative genomics, has a lot more than meets the eye at first. Currently the database has comparative tools and data for nearly 2 dozen plants including monocots, dicots, mosses and algae. There are some obvious tools and data from the homepage, but I suggest you take a look at the [...]... Read more »
Proost, S., Van Bel, M., Sterck, L., Billiau, K., Van Parys, T., Van de Peer, Y., & Vandepoele, K. (2009) PLAZA: A Comparative Genomics Resource to Study Gene and Genome Evolution in Plants. THE PLANT CELL ONLINE, 21(12), 3718-3731. DOI: 10.1105/tpc.109.071506
I did a tip on CoGe’s tool, GEvo, about two years ago, and we had a guest post about CoGe from Eric Lyons, the lead developer of CoGe, just over a year ago. In our ongoing and occasional quest to keep our tips fresh (and move them to SciVee), I’ve decided to revisit CoGe and [...]... Read more »
Tang, H., Lyons, E., Pedersen, B., Schnable, J., Paterson, A., & Freeling, M. (2011) Screening synteny blocks in pairwise genome comparisons through integer programming. BMC Bioinformatics, 12(1), 102. DOI: 10.1186/1471-2105-12-102
The researchers and developers at PhylomeDB haven’t rested on their laurels. I did a tip of the week on PhylomeDB 3 months ago, and not too long ago I was checking over there and found the team had created another useful database and analysis tool, MetaPhOrs. What is MetaPhOrs? To quote from the homepage:
MetaPhOrs is a public repository of phylogeny-based orthology and paralogy predictions that were computed using resources available in seven popular homology prediction services (PhylomeDB, EnsemblCompara, EggNOG, OrthoMCL, COG, Fungal Orthogroups, and TreeFam).
The research article on their methodology published in NAR (online 12/10) will give you a better understanding of how these orthology and paralogy predictions are made. Basically, MetaPhOrs uses phylogenetic orthology and paralogy predictions from several sources. These phylogenies overlap:
Since many of these repositories overlap, partially, in terms of genomes covered, it is often the case that phylogenetic information regarding a pair of proteins can be found in several databases.
Moreover, these phylogenies are built with different protein sets, parameters and methodologies.
Such level of information redundancy can be exploited to assess the robustness of a given orthology or paralogy prediction to changes in the phylogenetic settings…. Intuitively, a prediction that is not affected by such settings will be considered more reliable.
MetaPhOrs uses this information to predict orthologs and paralogs for protein pairs with a consistency score (CS, “the fraction of trees predicting an orthology relationship over the total of trees considered”) and an evidence level (EL, “how many independent sources have been used for the prediction”). CS for orthologs ranges from 0 (all trees predict paralogy) to 1 (all trees predict orthology). Take a look at the paper for more information on this methodology and results.
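To make the two quoted definitions concrete, here is a minimal sketch that computes a consistency score and an evidence level for a single protein pair. It is only an illustration of the definitions as quoted, not the MetaPhOrs implementation: the `TreePrediction` record, the function names and the example votes are all made up for this sketch.

```python
from collections import namedtuple

# One tree's verdict for a protein pair. `source` names the repository the
# tree came from; `is_orthology` is True for orthology, False for paralogy.
# (Hypothetical record type, invented for this illustration.)
TreePrediction = namedtuple("TreePrediction", ["source", "is_orthology"])


def consistency_score(predictions):
    """Fraction of trees predicting orthology over all trees considered."""
    if not predictions:
        raise ValueError("need at least one tree prediction")
    orthology_votes = sum(1 for p in predictions if p.is_orthology)
    return orthology_votes / len(predictions)


def evidence_level(predictions):
    """Number of distinct sources that contributed at least one tree."""
    return len({p.source for p in predictions})


# Made-up example: four trees from three repositories for one protein pair.
example = [
    TreePrediction("PhylomeDB", True),
    TreePrediction("PhylomeDB", True),
    TreePrediction("TreeFam", True),
    TreePrediction("EnsemblCompara", False),
]

print(consistency_score(example))  # 0.75 (three of four trees say orthology)
print(evidence_level(example))     # 3 (three distinct sources)
```

A pair supported by many sources with a score near 1 (or near 0 for paralogy) is the kind of prediction the authors describe as more reliable.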
To date, the database uses over 700,000 phylogenies from several sources to predict over 300 million homologous protein pairs from over 800 fully sequenced genomes. They plan to regularly update and add more phylogenetic and protein data.
Today’s tip spends 5 minutes going over the database and showing you how to access these predictions.
... Read more »
Pryszcz, L., Huerta-Cepas, J., & Gabaldon, T. (2010) MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score. Nucleic Acids Research, 39(5). DOI: 10.1093/nar/gkq953
microRNAs have become a rich source of research as they probably have a huge effect on gene expression and disease. The human genome may encode over 1,000 miRNAs that target over half of our genes. They might be implicated in a lot of common diseases (which have not yet been picked up in GWAS studies?). They are a fascinating area of biology that has only come into its own in the last decade. As such, the number of databases cataloging miRNAs is large. Today’s tip is on a new one, RepTar, which is reported in the upcoming NAR database issue. The niche RepTar is attempting to fill is to make miRNA target predictions more comprehensive by including new research in the algorithm. This new research suggests there are more possible target sites than previously thought. As mentioned in the article,
Recently, the miRNA binding options were expanded further with the identification of ‘centered sites’, functional miRNA target sites that lack both perfect seed pairing and 3′-compensatory pairing and instead exhibit pairing with the target along 11–12 contiguous pairs at the center of the miRNA (4). While some algorithms relaxed the evolutionary conservation criterion (5–11) and/or offer also predictions of 3′-compensatory sites [e.g. (6,12,13)], few databases offer predictions of the whole repertoire of miRNA targeting patterns. Furthermore to date, no database lists genome-wide prediction of cellular targets of viral miRNAs. These miRNAs lack significant evolutionary conservation and their targets are not necessarily expected to be evolutionarily conserved. In addition, the few identified viral miRNA targets have shown both conventional seed binding and 3′-compensatory binding [e.g. (3,14)].
Here we present a database of genome-wide miRNA target predictions for mouse and human genes, based on the predictions of our novel target prediction algorithm, RepTar
I’ll leave the predictive value up to miRNA researchers, but I thought I’d introduce the site.
While I’m at it, allow me to list a few other miRNA sites from labs and institutes as far flung as China, Italy, Israel, Canada and the U.S. Perhaps someday I’ll do a comparison.
CircuitsDB, which Jennifer did a great tip of the week tutorial on.
miRBase, which we have a full-length tutorial on.
microRNA.org
HMDD
miRDB
TarBase
miRecords
PicTar, they have an annotation track for UCSC Genome Browser
miRNA2Disease
PuTmiR (in relation to transcription factors)
microRNAdb
two lists to catch some others: http://mirnablog.com/microrna-target-prediction-tools/ and http://www.ncrna.org/KnowledgeBase/link-database/mirna_target_database
... Read more »
Elefant, N., Berger, A., Shein, H., Hofree, M., Margalit, H., & Altuvia, Y. (2010) RepTar: a database of predicted cellular targets of host and viral miRNAs. Nucleic Acids Research. DOI: 10.1093/nar/gkq1233
As my family and I await our 23andme kit to scan our genomes, family history has come back to the forefront of my thoughts. I used to be very fascinated by my own genealogy, and with adopted children, the concepts of descent, biology and culture have taken on adjusted meanings for me. It’s why we have a ‘family map’ instead of a ‘family tree’. The difference between our cultural genealogy and our genetic genealogy has become quite clear to me. Obtaining our family ancestry through these tests will bring a lot of these issues back into focus.
But there is a specific issue that is directly related to genomics, genomics tools and my family: same-gender headed household representation in pedigree and genealogy software. It’s non-existent or takes a difficult workaround to make it happen.
With the rising use of personal genomics data, there is a corresponding rise in the use of pedigree software for medical purposes and genealogy software for family history purposes. Neither of these handles non-traditional family structures well. I use ‘non-traditional’ lightly here, though, because even though same-gender headed households might be relatively new as a recognized family structure, the concept of family can be quite fluid across time and cultures. What is traditional and considered the ‘norm’ for ‘family’ in US culture today (nuclear families of two genders with children born to them) obviously was not the norm in the past, nor is it in contemporary cultures in other parts of the world.
A paper published last year entitled When Family Means More (or Less) Than Genetics, by Burns McGrath and Edwards, focuses on this inability of current tools to model family histories that aren’t within this norm. As they state:
One challenge in using family history as a health technology is that the geneticist or clinician defines family based on biology, whereas individuals often include those linked socially.
Genetic heritage and history is indeed important in determining disease susceptibilities, but ignoring or misunderstanding socially-defined kinship can lead to misdiagnosis, the lack of understanding of environmental influences and worse. Tools for modeling pedigrees must be able to flexibly model these family structures in order to be useful.
The researchers look at two groups and conclude that current tools are inadequate to model their family structures. Samoans were one group (Japanese-Americans the other):
When Samoan American participants were asked, “tell me about your family,” persons fulfilling social roles were described by that relationship. For example, an individual raised as a brother was identified as a brother whether or not there was a biological basis to the relationship. Similarly, individuals adopted in to or out of a family were described as the children of the family in which they were raised, not as offspring of the biological family. When further questioned, the participants could identify the biological link. But even when the biological relationship was known, the Samoan Americans reported family relationships based on social rather than biological ties.
They go into good detail on why this is a problem. They also, early in the paper, suggest modern American society is changing. Americans already are one of the most ‘adopting’ nations in the world. And, as the authors note, our family structures are becoming more fluid (perhaps converging with Samoan concepts in some ways?):
For example, the Western postmodern family has looser kinship ties than in the past, with relationships that are diverse and fluid (Stacey, 1998). Blended, adoptive, and gay families, as well as those resulting from a variety of assisted reproductive technologies, place an emphasis on choice rather than genetics. For many, family is about social relationships and not solely concerned with the transfer of genes from one generation to the next (Finkler, 2001;Lévi-Strauss, 1969; Peletz, 1995). Nonbiological social factors, such as role behavior, determine family membership, so that a mother’s sister’s son who has been raised with you is your brother (Finkler, 2001). Both formal and informal adoptions are traditional practices and very common in certain societies: Polynesia often being presented as the exemplar (Brady, 1976; Carroll, 1970; Levy, 1973).
So, let me side step adoption or other non-genetic descent issues for a moment, and home in on gay families and their representation in the pedigree tools currently available. Though the Recommendations for Standardized Human Pedigree Nomenclature (pdf) mentions it in passing (“For example, information that is commonly recorded on a pedigree (e.g., same-sex relationships…)”), there is no standard suggested. In my and my colleague’s research so far, we have yet to find a software package or online medical pedigree tool that easily accepts same-gender parental groups, or represents them well.
I took a look at one excellent online tool, Madeline 2.0. If one enters a parent, entering a second parent automatically forces an opposite gender. Though there is the ability to model adoptive relationships, there is as yet no way to model same-gender couples. I wrote the developers of the tool and received a thoughtful reply. No, there was no ability to do this, but considering adopt-in and adopt-out relationships are modeled, it would make sense to include same-gender couples. They suggested they will indeed consider implementing this. Of course, as with all software and online tools, I know funding, timing and priorities will be an issue. I’ll definitely keep an eye on developments. So as to not single Madeline out, no other tools that we know of (see here, here and here) allow for same-gender couples or headed families.
When going to family history modeling software for genealogy, the omission is as stark. Every individual has two family trees: a cultural/historical one and a genetic one. For most individuals, those histories overlap. The culture you received from your parents and they from theirs is pretty close to the genetic descent. Even then, it’s not a perfect overlap. What is important to who you are from a cultural or historical perspective might not at all be related to who you are from a genetic one, and who you are is as much cultural as it is genetic. I am as interested in where I got my cultural ancestry as where I got my genetic one; this has become quite clear to me as we’ve adopted children.
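The idea of two overlapping trees can be sketched as a data structure in which every parent link carries a kind, and nothing in the record constrains a parent’s gender. This is a hypothetical illustration only; the `Person` class, its method names and the example family are invented here and do not correspond to any existing pedigree tool or exchange format.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    # Each link is (parent_name, kind), where kind is "genetic" or "social".
    # No field records or constrains a parent's gender.
    parent_links: list = field(default_factory=list)

    def add_parent(self, parent_name, kind):
        if kind not in ("genetic", "social"):
            raise ValueError("kind must be 'genetic' or 'social'")
        self.parent_links.append((parent_name, kind))

    def parents(self, kind):
        """Return the names of parents linked by the given kind."""
        return [name for name, k in self.parent_links if k == kind]


# Invented example: an adopted child raised by two fathers.
child = Person("Child")
child.add_parent("Birth mother", "genetic")
child.add_parent("Birth father", "genetic")
child.add_parent("Dad A", "social")
child.add_parent("Dad B", "social")

print(child.parents("genetic"))  # ['Birth mother', 'Birth father']
print(child.parents("social"))   # ['Dad A', 'Dad B']
```

Walking the "genetic" links gives the tree a clinician would want, walking the "social" links gives the family the person would describe, and for most people the two walks return the same names.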
And in the future, descendants will look at their family genealogies and it will be very important to them that one of their ancestors was raised by two men, or two women whether adopted or biological from one parent. As these genealogies are built, those relationships which are very important to their family culture and histories should be represented. I know I personally will hope that this will be the case for our family history in the years to follow.
Yet, with the software available, allowing for same-gender parents in the representation is impossible, awkward or requires a complicated workaround (not to mention paper family trees!). GEDCOM is the de facto standard for exchanging genealogical information. There is no simple standard in GEDCOM for including same-sex parents. That it was developed by the Mormon Church probably has something to do with that ‘oversight’, but frankly, given the oversight across the board in pedigree and genealogy standards and software, I doubt that was a deliberate one.
So far I have found software that requires complicated workarounds, like Legacy, or where it’s not easy to figure out (though once you do, it’s simple). Of the many I’ve tried, none even allow it.
In a world where the number of same-sex couples is increasing annually (not to mention adoption, blended families and many other types of structures) and interest in family history through genomics, culture and history is growing, I look forward to seeing the software catch up and be able to model my family for future researchers and historians.
... Read more »
Burns McGrath, B., & Edwards, K. (2009) When Family Means More (or Less) Than Genetics: The Intersection of Culture, Family, and Genomics. Journal of Transcultural Nursing, 20(3), 270-277. DOI: 10.1177/1043659609334931
Galaxy started out as a very useful tool to do genomics research that was reproducible and sharable. One of my pet peeves in reading research papers that use genomic analysis or online genomics resources is the materials and methods sections. Often the methods and parameters used are mentioned only in a very cursory manner, if [...]... Read more »
Goecks, J., Nekrutenko, A., Taylor, J., & Galaxy Team, T. (2010) Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology, 11(8). DOI: 10.1186/gb-2010-11-8-r86
Please indulge a long post from a personal perspective: what genomics is about to do for _me_. This is information that many, if not all, of our readers already know. I’ve been researching and working in either experimental biology or genomics for over 20 years. Ever since the beginning of the Human Genome Project, which coincidentally started the same year I started my Ph.D. program, through my postdoctoral research at EMBL and now my work at OpenHelix, I’ve known that someday personal genomics was going to impact me, and millions of others, in a big way. Yet, it has always felt like one of those decisions that I, and we as a society, wouldn’t have to make until we turned that corner that seemed always “just ahead.”
But now I think we’ve turned a corner. It feels, to mix metaphors, that we’ve hit a tipping point. The Human Genome Project, the mapping and sequencing of the/a human genome from 1990 to 2003, cost approximately $2,700,000,000 (that’s 2.7 billion; I wanted to get all the zeros in). Celera did the genome for $300,000,000. The cost of sequencing an entire human genome has been plummeting ever since. In 2007, the cost of sequencing the genome of James Watson (co-discoverer of the structure of DNA) was about $2,000,000. Today the cost is about $10,000. Complete Genomics and other companies are on the march to quickly reduce the cost of sequencing a genome to under $1,000.
Let me graph the last 8 years for you. Mind you, this starts from the $300,000,000 number, not the $2.7 billion, because that graph would be a straight line down.
So, within a year, the cost of sequencing your, my, genome will reach $1,000. If not less. We’ve seen this coming for years now, and it’s upon us. But what does it mean? A lot of data. But data means nothing without context and analysis. Sequencing my genome would be a waste of $1,000 if I gleaned nothing from it.
Yet, even that seems to have turned the corner from a few tidbits of genetic information to a steady stream and the beginning of a flood.
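For a rough sense of scale, the figures quoted above can be turned into fold changes with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the numbers from this post; the three-year gap between the 2007 figure and “today” (taken here as 2010) is an assumption, as are the variable names.

```python
import math

# Figures quoted in the post, in US dollars.
costs = {
    "Human Genome Project": 2_700_000_000,
    "Celera": 300_000_000,
    "Watson genome (2007)": 2_000_000,
    "today (2010)": 10_000,
}

today = costs["today (2010)"]
fold_drop = {label: cost / today for label, cost in costs.items()}
for label, fold in fold_drop.items():
    print(f"{label}: {fold:,.0f}x today's cost")

# Assumption: three years separate the 2007 figure from "today".
years = 3
halvings = math.log2(costs["Watson genome (2007)"] / today)
halving_time_months = 12 * years / halvings
print(f"{halvings:.1f} halvings, one about every {halving_time_months:.1f} months")
```

Under that assumption, the drop from $2,000,000 to $10,000 is a 200-fold decrease, which works out to the cost halving roughly every five months.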
You know you’ve turned a corner when a genomics testing company begins to offer genetic tests to the mass market through Walgreens. There’s enough context in that data to make money from it, or so they hope. You can be sure the corner is safely behind you when the FDA tells Pathway Genomics and Walgreens that they will need to hold off while they make sense of the regulatory implications. Genomic ancestry tests are also gaining in usability… and scrutiny.
It was the recent Lancet paper on the clinical analysis that seemed to be a tipping point, not for me or those in the field (genomics has been on my radar since 1988), but for society. I blogged about the paper and its use of genomics resources such as GVS, dbSNP and others. In the paper, the researchers did a thorough clinical assessment of an individual’s genome. We’ve brought down the cost of sequencing; now we are learning how much it’s going to take to assess that data from a medical point of view and, importantly, what we can learn from it.
What can we learn from it? I read this paper again from a personal perspective now. Could I learn something from sequencing and analyzing my genome, and if so, what? My answer came to this: yes, I could learn something, and in fact enough that I’m now convinced that as soon as that sequencing gets down to $1,000 or lower (and is a high quality sequence), I’m going to do it.
There are three things I see from this paper that one could learn from assessing their genome: prevention, early detection and therapy. I believe the first will be, for most people, something they already know, and their genome sequence will tell them nothing new. The other two could be a wealth of information they will want, even need, to know. You’ll notice I left off ‘cure.’ I saw nothing in this paper, and nothing on the near horizon, that suggests to me that our genome sequence data will help with curing anything. Perhaps, just not much. Yet, the possibilities of early detection of disease and personalized drug treatment are tantalizing.
Prevention: The authors have a graph. In the middle of the circle are all the diseases that this individual has a high propensity or probability of contracting; the size of the text indicates the increased probability of the individual getting the disease. For this individual, the biggies are type 2 diabetes, obesity, osteoarthritis and coronary heart disease. Of course, these and others also inter-relate: obesity adversely affects hypertension, which adversely affects coronary heart disease, etc. I think my chart might be somewhat different. I know hypertension would be a large one. I’ve had hypertension since I was a skinny 18 year old and the doctor measured my blood pressure at 200/160. That’s going to be a big one. From my family history, I’m sure coronary heart disease will be a large one, since several people in my family have had heart attacks at early ages. I am not sure what else will be big for me in that middle circle.
Around the circumference of the circle are environmental and lifestyle factors that could have an effect (positive or negative) on the probability of the person getting the disease: a guideline of sorts for the things this person needs to do to lower his chance of getting these diseases. Perhaps it’s a function of the specific diseases he has a propensity towards, and others might have different ones that would involve different lifestyle and environmental factors, but I think it’s a safe bet that it won’t be much different for the majority of people:
Diet
Exercise
Smoking
Stress
Those are his four biggies of lifestyle and environmental changes he could make to lower his chance of getting these diseases. We’ve known this for decades, if not most of human history. Diet is simple (if not easy), lots of whole grains, fruits and vegetables, nuts and legumes garnished with a bit of dairy, eggs and lean meats. Limit the processed foods piled with fat, salt and sugar. Exercise is simple (if not easy), a moderate daily activity. Stop smoking, calm down. Nothing new here, move along.
As I’ve struggled with hypertension and weight over the years, I’ve learned this. It’s pretty simple and straightforward, and our doctors have been telling us this for years. If you smoke, stop it. Change your diet, start exercising, calm down. It’s simple. It’s not easy. It’s been a Herculean task for me over the last decade, but I’ve lost 50 lbs, my diet has changed almost exclusively to what I mention above, I’ve incorporated walking several miles a day into my routine and I took up knitting (to lower stress, with the added benefit of making things).
My take home from this is that, when it comes to prevention, sequencing your genome will probably tell you nothing you don’t already know. So what will it tell you that you don’t know and that could help you?
Early Detection: As with the individual studied in the paper, my family history tells me a lot. Sequencing my genome in many ways will tell me little I don’t already know. I have a family history of hypertension, heart disease, prostate cancer and a couple other things. But what don’t I know? Interestingly, the researchers discovered from this man’s genome that he had a much higher propensity to contract hemochromatosis. Quoting:
Analysis of the patient’s genome revealed three novel and potentially damaging variants in two related genes that were previously associated with development of haemochromatosis. Subsequent to these findings, detailed review of personal and family history did not identify a history of haemochromatosis in the patient or family members. Echocardiogram results and liver function tests did not show evidence of ... Read more »
Ashley, E., Butte, A., Wheeler, M., Chen, R., Klein, T., Dewey, F., Dudley, J., Ormond, K., Pavlovic, A., & Morgan, A. (2010) Clinical assessment incorporating a personal genome. The Lancet, 375(9725), 1525-1535. DOI: 10.1016/S0140-6736(10)60452-7
The Lancet paper, Clinical assessment incorporating a personal genome, has held my fascination this weekend (yes, I read it at the beach). Mary posted Friday and again Saturday on the paper and related NPR segment. It feels to me to be a seminal paper, though I do agree with Daniel at Genetic Future that there is a lot there we still don’t know. A large portion of the variation is in non-coding regions, and thus predictions and propensities are hard to come by with the available analysis. In fact, as he pointed out, many of the coding region variations have little information as to their effect on disease. I would add also that even if we get to that holy grail of $1,000 to sequence a personal genome, this kind of extensive analysis would still be time- and cost-prohibitive for the vast majority of sequenced genomes.
Yet, as with all early steps in science and medicine, there are missing pieces, large gaps and huge efforts (think “space travel,” “computers,” “microwave ovens,” “internet”) that over time become inexpensive and commonplace (ok, so the first isn’t necessarily “inexpensive”). Sequencing genomes will become inexpensive before the analysis does, but both will come. And I think this paper is pointing to that future.
The other hurdle to large scale personal genomics I see (of course) is the understanding and use of the genomics and data resources. The authors use a large (and excellent, in my opinion) suite of genomics resources to obtain data and do their analysis. I’ll list them here with links in alphabetical order:
dbSNP (T)
GVS (T)
HapMap (T)
HGMD
OMIM (T)
PharmGKB
PolyPhen
PubMed (T)
SIFT
UniProt (T)
All of these resources have a wealth of data, but even then, that is a lot of analysis and familiarization that is needed with each tool. Each tool does have documentation and tutorials, and of course OpenHelix has tutorials on many of the ones mentioned (those with linked “T”s after the name). Still, this one analysis took a large number of tools and familiarization.
The paper does have a pretty good figure (figure 1) outlining the analysis process. For example, they SIFTed the genome to find gene-associated, non-synonymous, rare and novel, and disease-associated variations, and then used dbSNP, HGMD, OMIM and PubMed to analyze something like HFE2, which might have an association with haemochromatosis. One of my quibbles with the paper, as is often the case with these papers, is that there isn’t a good methods ‘walk-through’ of the paper using something like Galaxy or Taverna in a history or workflow that would help reproduce the analysis.
We also have a tutorial I’d like to point you to, one that walks through a similar process and teaches users the basics of walking through that process. You can find this tutorial here; it’s free and publicly available. The tutorial walks the user through the analysis of a gene variation, in this case in CYP2C9, that affects an individual’s response to Warfarin. There is a similar variation (different gene, affects same drug response) in the paper. The tutorial uses the NIEHS SNPs site to get an overview of the variation including SIFT and PolyPhen predictions, then goes to the UCSC Genome Browser to find an overview of the region, walks through the dbSNP information and does a quick tag SNP analysis using GVS. That tutorial is only one very small step in what will have to be an immense education in genomics and genomics resources.
That is all to point out that the paper is a fascinating first step, and as a first step suggests the gaping holes we will have in bringing personal genomics to medicine.
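As a toy illustration of the kind of filtering step described above, here is a short sketch that narrows a list of variants down to gene-associated, non-synonymous, rare or novel candidates. It is not the paper’s pipeline: the records, the field names and the frequency cutoff are all invented for this example, and a real analysis would pull these annotations from the resources listed earlier.

```python
# Hypothetical variant records; the field names and values are invented
# for illustration and do not come from the paper or any real database.
variants = [
    {"id": "v1", "in_gene": True, "nonsynonymous": True, "allele_freq": 0.001},
    {"id": "v2", "in_gene": True, "nonsynonymous": False, "allele_freq": 0.001},
    {"id": "v3", "in_gene": False, "nonsynonymous": False, "allele_freq": None},
    {"id": "v4", "in_gene": True, "nonsynonymous": True, "allele_freq": 0.30},
    {"id": "v5", "in_gene": True, "nonsynonymous": True, "allele_freq": None},
]

RARE_THRESHOLD = 0.01  # assumed cutoff for "rare"; not taken from the paper


def is_rare_or_novel(variant):
    """Novel means no recorded frequency; rare means below the cutoff."""
    freq = variant["allele_freq"]
    return freq is None or freq < RARE_THRESHOLD


def candidates(records):
    """Keep gene-associated, non-synonymous variants that are rare or novel."""
    return [
        v["id"]
        for v in records
        if v["in_gene"] and v["nonsynonymous"] and is_rare_or_novel(v)
    ]


print(candidates(variants))  # ['v1', 'v5']
```

The short list that survives a filter like this is what would then be looked up, one variant at a time, in the disease and literature databases.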
... Read more »
Ashley, E., Butte, A., Wheeler, M., Chen, R., Klein, T., Dewey, F., Dudley, J., Ormond, K., Pavlovic, A., & Morgan, A. (2010) Clinical assessment incorporating a personal genome. The Lancet, 375(9725), 1525-1535. DOI: 10.1016/S0140-6736(10)60452-7
Today’s tip is on Genomicus. Genomicus is a great tool to visualize gene duplication, synteny and genome evolution. The search and display interfaces are quite straightforward, and there are lots of great features (viewing ancestral gene information, links out to resources, different views of phylogenies, etc.) in the tool. This video is only a short introduction. You can delve deeper into the tool with the help and documentation, including an 11-minute video.
There is also a recent (advance access) paper in the journal “Bioinformatics” that will give you a lot more detail on how the database and tool work and what is there.
Muffato, M., Louis, A., Poisnel, C., & Roest Crollius, H. (2010). Genomicus: a database and a browser to study gene synteny in modern and ancestral genomes. Bioinformatics. DOI: 10.1093/bioinformatics/btq079
You will also notice that today’s video is a SciVee embed. We are trying out a new way to post and share our tips. SciVee allows us not only to post on our blog, but also lets you share the tip with others and lets scientists in the SciVee community view the tips. This is only a test. We will be working with this for the next couple of weeks to find the best way to post and share. Soon, we hope to share these on Facebook and YouTube also. If the video is not high enough quality for you (SciVee and other video sharing sites by necessity reduce size), you can try out the entire mpeg4 version at this link.
Muffato, M., Louis, A., Poisnel, C., & Roest Crollius, H. (2010) Genomicus: a database and a browser to study gene synteny in modern and ancestral genomes. Bioinformatics. DOI: 10.1093/bioinformatics/btq079
So, I wrote about defunding resources and briefly mentioned a paper in Database about funding (or ‘re’funding) databases and resources. I’d like to discuss this a bit further. The paper, by Chandras et al., discusses how databases and, to use their term, Biological Resource Centers (BRCs) are to maintain financial viability.
Let me state first that I completely agree with their premise: databases and resources have become imperative. The earlier model of “publication of experimental results and sharing of the related research materials” needs to be extended. As they state:
It is however no longer adequate to share data through traditional modes of publication, and, particularly with high throughput (’-omics) technologies, sharing of datasets requires submission to public databases as has long been the case with nucleic acid and protein sequence data.
The authors state, factually, that the financial model for most biological databases (we are talking about the thousands that exist) has often been 3-5 years of development funding; once that runs out, the infrastructure needs to be supported by another source. In fact, this has led to the defunding of databases such as TAIR and VBRC (and many others), excellent resources with irreplaceable data and tools, that then must struggle to find funding to cover the considerable costs of infrastructure and continued development.
The demands of scientific research, open, shared data, require a funding model that maintains the publicly available nature of these databases. And thus the problem as they state:
If, for financial reasons, BRCs are unable to perform their tasks under conditions that meet the requirements of scientific research and the demands of industry, scientists will either see valuable information lost or being transferred into strictly commercial environment with at least two consequences: (i) blockade of access to this information and/or high costs and (ii) loss of data and potential for technology transfer for the foreseeable future. In either case the effect on both the scientific and broader community will be detrimental.
Again, I agree.
They discuss several possible solutions to maintaining the viability of publicly available databases, including a private-public dual-tier system in which for-profits pay an annual fee and academic researchers have free access. They mention UniProt, which underwent a funding crisis over a decade ago, as an example. UniProt (then Swiss-Prot) went back to complete public funding in 2002. There are still several other databases that are attempting to fund themselves by such a model. BioBase, into which several databases have been folded, is one, and TransFac is one of those databases. There is a free, reduced-functionality version available to academics through gene-regulation.com, and the fuller version is available for a subscription at BioBase. The free version allows some data to be shared, as one can see at VISTA or UCSC. I am not privy to the financials of BioBase and other similar models, and I assume that will work for some, but I agree with the authors that many useful databases and resources would be hard-pressed to be maintained this way.
Other possibilities include folding databases fully into a single public institution’s funding mechanism. The many databases of NCBI and EBI fit this model. In fact, there is even a recent case of a resource being folded into this model at NCBI. Again, this works for some, but not all, useful resources.
Most will have to find various methods for funding their databases. Considering the importance of doing so, it is imperative that viable models are found. The authors reject advertising out of hand. As they mention, most advertisers will not be drawn to website advertising without a visibility of at least 10,000 visitors per month. There might be some truth to this (and I need to read the reference they cite to back that up).
But the next model they suggest seems to me to have the same drawback. In this model, the database or resource would have a ‘partnership of core competencies.’ An example they cite is MMdb (not to be confused with MMDB). This virtual mutant mouse repository provides direct trial links to Invitrogen from its gene information to the product page. They mention that though 6 companies were approached, only one responded. It would seem that this model has the same issues as directly selling advertising.
They also mention that, at least for their research community of mouse functional genomics, “Institutional Funding” seems the best solution for long-term viability and open access. Unfortunately, until institutions like NIH and EMBL are willing or able to fund these databases, I’m not sure that’s a solution.
As they mention in the paper, the growth in the amounts and types of data being generated is exponential. I am not sure that government or institutional funding can financially keep up with housing the infrastructure needed to maintain and further develop these databases so that all the data generated can remain publicly and freely accessible.
Information should be free, but unfortunately it is not without cost. It will be interesting to see how funding of databases and resources evolves in this fast-growing genomics world (and it is imperative that we figure out solutions).
PS: On a personal note, the authors use their resource, EMMA (European Mouse Mutant Archive), as an example in the paper. I like the name since it’s the name of my daughter, but it just goes to prove that names come in waves. We named our daughter thinking few would give their daughters the same name. When even databases share the name, you know that’s not the case.
Chandras, C., Weaver, T., Zouberakis, M., Smedley, D., Schughart, K., Rosenthal, N., Hancock, J., Kollias, G., Schofield, P., & Aidinis, V. (2009). Models for financial sustainability of biological databases and resources. Database, 2009. DOI: 10.1093/database/bap017
Chandras, C., Weaver, T., Zouberakis, M., Smedley, D., Schughart, K., Rosenthal, N., Hancock, J., Kollias, G., Schofield, P., & Aidinis, V. (2009) Models for financial sustainability of biological databases and resources. Database. DOI: 10.1093/database/bap017
A recent paper in PLoS One finds hundreds of new putative transcription start sites (TSS): PLoS ONE: Genome-Wide Identification of Transcription Start Sites, Promoters and Transcription Factor Binding Sites in E. coli. I found the paper interesting, and a good example of how high-throughput studies and genomics can advance our understanding of biology and work in concert with experimental biology, while at the same time dumping a whole lot of new data in our laps.
I’d like to point out some of the databases and resources that are mentioned and used in this paper. In fact, this is the first installment of a semi-weekly ‘what did they use?’ post. I’d like to start citing papers that I find interesting and pull out the software, databases and genomics resources used in them. It might help our readers get an understanding of what is being used out there.
First and foremost, this paper has added a large set of new data to RegulonDB, or to paraphrase their about page:
RegulonDB is a computational model of mechanisms of transcriptional regulation including the complex regulation of transcription initiation or regulatory network of the cell and of the organization of the genes in transcription units, operons and simple and complex regulons.
So, if you have used RegulonDB in the past, or might find use of it, you’ll see there is a large set of new data.
Additionally, the paper does its analysis using several programs (some of which have web interfaces), including WConsensus (from the same lab that brings you Consensus, tutorial, for those subscribed) and Patser (ftp link to download, also from the Stormo lab) to predict promoters. The authors also use Matrix-Scan to predict transcription factor binding sites.
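For readers who have not used matrix-based tools like these, here is a minimal sketch of the general idea behind them: slide a position weight matrix along a sequence and keep the windows whose log-odds score passes a cutoff. This is not the authors’ pipeline, nor how Patser or Matrix-Scan are actually implemented. The toy count matrix, the pseudocount, the uniform background, the example sequence and the cutoff are all made-up assumptions for illustration.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds weights.

    The pseudocount and uniform background are illustrative assumptions.
    """
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix

def scan(sequence, matrix, cutoff):
    """Return (start, window, score) for every window scoring >= cutoff."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in BASES for base in window):
            continue
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= cutoff:
            hits.append((start, window, score))
    return hits

def toy_column(consensus_base):
    """Hypothetical column: 17 counts for the consensus base, 1 for each other."""
    return {b: (17 if b == consensus_base else 1) for b in BASES}

# Hypothetical counts shaped like a TATAAT-style -10 box (not real data).
TOY_COUNTS = [toy_column(base) for base in "TATAAT"]
SEQUENCE = "GGCTATAATGCGTATACTGG"  # made-up sequence

hits = scan(SEQUENCE, log_odds_matrix(TOY_COUNTS), cutoff=8.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

With this cutoff only the exact match is reported; lowering it lets in near matches such as the one-mismatch site further along the toy sequence. Real tools estimate background frequencies and score thresholds far more carefully.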
As with many papers, I had to go to the citation of the paper about the resource, find the paper and then determine where the database or software resided. As I’ve said before, there needs to be a better way to reference work done using databases.
Mendoza-Vargas, A., Olvera, L., Olvera, M., Grande, R., Vega-Alvarado, L., Taboada, B., Jimenez-Jacinto, V., Salgado, H., Juárez, K., Contreras-Moreira, B., Huerta, A., Collado-Vides, J., & Morett, E. (2009). Genome-Wide Identification of Transcription Start Sites, Promoters and Transcription Factor Binding Sites in E. coli. PLoS ONE, 4(10). DOI: 10.1371/journal.pone.0007526
Mendoza-Vargas, A., Olvera, L., Olvera, M., Grande, R., Vega-Alvarado, L., Taboada, B., Jimenez-Jacinto, V., Salgado, H., Juárez, K., Contreras-Moreira, B.... (2009) Genome-Wide Identification of Transcription Start Sites, Promoters and Transcription Factor Binding Sites in E. coli. PLoS ONE, 4(10). DOI: 10.1371/journal.pone.0007526