Elsevier is not the only publisher with a large innovation inertia. In fact, I think many large organizations have it, particularly if there are too many interdependencies, causing lines that are too long. Greg Landrum made me aware that one American Chemical Society journal is now going to encourage (not require) machine-readable forms of chemical structures to be included in their flagship. The reasoning by Gilson et al. is balanced. It is also 15 years too late. This question was relevant at the end of th........ Read more »
Gilson MK, Georg G, & Wang S. (2014) Digital Chemistry in the Journal of Medicinal Chemistry. Journal of Medicinal Chemistry. PMID: 24521446
I don't think I mentioned this JISC project by David Shotton et al. yet, and should perhaps have done so earlier. But it is not too late, as Shotton is calling out for help in a Nature Comment this week (doi:10.1038/502295a). Now, I have been tracking what is citing the CDK literature using CiteULike since 2010, and just asked the project developers how I can contribute this data.
The visualization from OpenCitations.net is interesting, as it also shows papers citing papers t........ Read more »
Shotton, D. (2013) Publishing: Open citations. Nature, 502(7471), 295-297. DOI: 10.1038/502295a
Before I moved to my current position in Maastricht, I had the great pleasure to work with Prof. Roland Grafström (check his pathway bioinformatics done with his then PhD student Rebecca) and Prof. Bengt Fadeel at the Karolinska Institutet. During this year I worked part-time on ToxBank and part-time on nano-QSAR, and worked on semantics, predictive toxicology, and Open Data. This blog post is about the ToxBank work.
I promised fireworks, and the first rockets are heading upw........ Read more »
Kohonen, P., Benfenati, E., Bower, D., Ceder, R., Crump, M., Cross, K., Grafström, R., Healy, L., Helma, C., Jeliazkova, N.... (2013) The ToxBank Data Warehouse: Supporting the Replacement of In Vivo Repeated Dose Systemic Toxicity Testing. Molecular Informatics. DOI: 10.1002/minf.201200114
One day on, and still struggling with the chemistry behind gene regulation. Let no biologist ever tell me again not to use acronyms (yes, I am looking at you!). But it is interesting. I learned a lot about ChIP, histone modifications, etc, etc. This is an amazing world, where specific histone complex protein residues get methylated, acetylated, citrullinated, and phosphorylated. Of course, all this is in the context of the ENCODE meeting we have tomorrow at BiGCaT, where I will try to ........ Read more »
Thurman, R., Rynes, E., Humbert, R., Vierstra, J., Maurano, M., Haugen, E., Sheffield, N., Stergachis, A., Wang, H., Vernot, B.... (2012) The accessible chromatin landscape of the human genome. Nature, 489(7414), 75-82. DOI: 10.1038/nature11232
I have started learning about epigenetics, and particularly the regulatory effects of DNA methylation and acetylation. It's cool, it's hot, it's everything we hope will explain genetics, because genes certainly did not.
The chemistry behind this involves interesting pathways and the storage of information that passes from one generation to another... epigenetic effects down to the grandchild generation have repeatedly been shown now. Likely candidates are mRNAs that persist beyond the cell d........ Read more »
Donohoe, D., Collins, L., Wali, A., Bigler, R., Sun, W., & Bultman, S. (2012) The Warburg Effect Dictates the Mechanism of Butyrate-Mediated Histone Acetylation and Cell Proliferation. Molecular Cell. DOI: 10.1016/j.molcel.2012.08.033
Earlier this week an important cheminformatics paper appeared in the Journal of Cheminformatics. It is about the Open Molecule Generator (see below for the paper). This was one important piece of functionality still missing from Open Source cheminformatics. This work uses the Chemistry Development Kit, and was written by Julio Peironcely.
The Analytical Biosciences' group of Prof. Hankemeier (and many others, including also Theo Reijmers) and funded by the Netherlands Metab........ Read more »
I use CiTO to keep track of how the CDK is cited and used, and just looked at a typical QSAR paper. Here are my comments on "Study of indole derivative inhibitors of Cytosolic phospholipase A2α based on Quantitative Structure Activity Relationship", by Lu et al. (doi:10.1016/j.chemolab.2011.11.011). Normally, I am fairly brief in these reviews, which I publish via the CDK Google+ page, briefly describing what CDK functionality is being used. But this time the post became a more substantial r........ Read more »
Lu, X., Ji, D., Chen, J., Zhou, X., & Shi, H. (2012) Study of indole derivative inhibitors of Cytosolic phospholipase A2α based on Quantitative Structure Activity Relationship. Chemometrics and Intelligent Laboratory Systems. DOI: 10.1016/j.chemolab.2011.11.011
I guess readers of my blog have already heard about it via other channels (e.g. via Noel's blog post), but our second Blue Obelisk paper is out. In the past five-ish years since Peter instantiated this initiative, it has created a solid set of shoulders on which to develop Open Source-based cheminformatics solutions. I created the following diagram for the paper, showing how various Blue Obelisk projects interoperate (image is CC-BY, from the paper):
It shows a number of Open Standards (diamonds)........ Read more »
Guha, R., Howard, M., Hutchison, G., Murray-Rust, P., Rzepa, H., Steinbeck, C., Wegner, J., & Willighagen, E. (2006) The Blue Obelisk - Interoperability in Chemical Informatics. Journal of Chemical Information and Modeling, 46(3), 991-998. DOI: 10.1021/ci050400b
O'Boyle NM, Guha R, Willighagen EL, Adams SE, Alvarsson J, Bradley JC, Filippov IV, Hanson RM, Hanwell MD, Hutchison GR.... (2011) Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on. Journal of Cheminformatics, 3(1), 37. PMID: 21999342
QSAR and QSPR are the fields that statistically correlate chemical substance features with (biological) activities (QSAR) or properties (QSPR). The chemical substances can be molecular structures, drugs (which are not uncommonly mixtures), and true mixtures like nanomaterials (NanoQSAR). Readers of this blog know I have been working towards making these kinds of studies more reproducible for many years now.
Parts of this full story include the Blue Obelisk Data Repository (BODR), QSAR-ML, the CDK f........ Read more »
Hastings, J., Chepelev, L., Willighagen, E., Adams, N., Steinbeck, C., & Dumontier, M. (2011) The Chemical Information Ontology: Provenance and Disambiguation for Chemical Data on the Biological Semantic Web. PLoS ONE, 6(10). DOI: 10.1371/journal.pone.0025513
Forking is an important part of Open Source development, and forking is good. Of course, forks should interact too, and genes from one fork should merge back into another fork. Forks are probably also a good indication for the success of a project: if a project is forked, it means it is significant. On the other hand, it can also mean that the main project is too hard to work with. Maybe the CDK is that. Indeed, it's easier to not have your code peer-reviewed, and just fork. That is freedom. (........ Read more »
Jeliazkova, N., & Jeliazkov, V. (2011) AMBIT RESTful web services: an implementation of the OpenTox application programming interface. Journal of Cheminformatics, 3(1), 18. DOI: 10.1186/1758-2946-3-18
Wetzel, S., Klein, K., Renner, S., Rauh, D., Oprea, T., Mutzel, P., & Waldmann, H. (2009) Interactive exploration of chemical space with Scaffold Hunter. Nature Chemical Biology, 5(8), 581-583. DOI: 10.1038/nchembio.187
Yap, C. (2011) PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints. Journal of Computational Chemistry, 32(7), 1466-1474. DOI: 10.1002/jcc.21707
I was recently asked about a volume descriptor in Bioclipse, which is not yet available. Jmol can calculate surfaces, so that was my first thought. However, I then ran into a paper from 2003 by Zhao et al., called "Fast Calculation of van der Waals Volume as a Sum of Atomic and Bond Contributions and Its Application to Drug Compounds" (doi:10.1021/jo034808o).
The paper presents a very simple mathematical model, which approximates the molecular volume by a sum of atomic contributions, and a three terms t........ Read more »
Zhao, Y., Abraham, M., & Zissimos, A. (2003) Fast Calculation of van der Waals Volume as a Sum of Atomic and Bond Contributions and Its Application to Drug Compounds. The Journal of Organic Chemistry, 68(19), 7368-7373. DOI: 10.1021/jo034808o
Last week I was in sunny Cascais, and in three days experienced -23 °C and +18 °C. The reason I was there was the kick-off meeting of the EU FP7 cluster SEURAT, which includes 'our' ToxBank project.
The data we will host includes many different types, including my favorite, metabolomics. Don't ask me what this will practically mean, but some keywords we already know include RDF, OpenTox, and ToxML. With metabolomics, I hope to squeeze in metabolomics.
And that data warehousing for metabolo........ Read more »
Bais, H., Prithiviraj, B., Jha, A., Ausubel, F., & Vivanco, J. (2005) Mediation of pathogen resistance by exudation of antimicrobials from roots. Nature, 434(7030), 217-221. DOI: 10.1038/nature03356
The readers of Antony's blog know enough about the problem. And many in the QSAR community know it too (and many others do not). Chemical structure data is noisy. I haven't recently created a new local data set for analysis, so I have not taken time to blog about it much, but the ambiguity in chemical databases is enormous. Just yesterday, Antony and I had a good discussion about tautomers and in particular how things are linked together.
If we are in the field of property prediction, knowing wh........ Read more »
Oscar uses a Maximum Entropy Markov Model (MEMM) based on n-grams. Peter Corbett has written this up (doi:10.1186/1471-2105-9-S11-S4). So, it basically is statistics once more. If you really want a proper bioinformatics education, do your PhD at a (proteo)chemometrics department.
N-grams are word parts of n characters. For example, the trigrams of acetic acid include ace, cid, tic, eti, and aci. N-grams of length four include acid, etic, and acet. The MEMM assigns weights to these n-grams, a........ Read more »
Corbett, P., & Copestake, A. (2008) Cascaded classifiers for confidence-based chemical named entity recognition. BMC Bioinformatics, 9(Suppl 11). DOI: 10.1186/1471-2105-9-S11-S4
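The n-gram idea from the excerpt above can be sketched in a few lines of Python. This only illustrates how character n-grams are extracted from the words of a chemical name; it is not Oscar's tokenizer and it does not include the MEMM weighting. The function name `char_ngrams` and the choice to split on whitespace and lowercase the input are assumptions made for this example.

```python
def char_ngrams(text, n):
    """Return the character n-grams of each whitespace-separated word."""
    grams = []
    for word in text.lower().split():
        # Slide a window of n characters over the word.
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


trigrams = char_ngrams("acetic acid", 3)
fourgrams = char_ngrams("acetic acid", 4)

if __name__ == "__main__":
    print(trigrams)
    print(fourgrams)
```

For "acetic acid" the trigram list contains ace, eti, tic, aci, and cid, and the four-character list contains acet, etic, and acid, matching the examples given in the post.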
OK, the second paper I ran into today is a perfect match for the paper by Khanna and Ranganathan I just discussed in the Commercial or Proprietary? post. So perfect, in fact, that I really should have combined them. But since the other post is already infecting the WWW, I'll have to post this update.
Yap wrote up a paper on PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints (doi:10.1002/jcc.21707), and Table 2 is quite like Table 1 in the paper by Kh........ Read more »
Yap, C. (2010) PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints. Journal of Computational Chemistry. DOI: 10.1002/jcc.21707
Some smart software developer once said to not optimize your code too early. However, not caring about it at all does not help either. Some basic knowledge of memory management can keep you going. That is, I just ran into the limits of Oscar and ChemicalTagger. As I blogged earlier today, I am analyzing the BJOC literature, but Lezan and I are running into a reproducible out-of-memory exception. At first I thought it was a memory leak, as it was the 95th paper it fell over on, but after we optim........ Read more »
Späth, A., & König, B. (2010) Molecular recognition of organic ammonium-ions in solution using synthetic receptors. Beilstein Journal of Organic Chemistry. DOI: 10.3762/bjoc.6.32
Buijnsters, P., García-Rodríguez, C., Willighagen, E., Sommerdijk, N., Kremer, A., Camilleri, P., Feiters, M., Nolte, R., & Zwanenburg, B. (2002) Cationic Gemini Surfactants Based on Tartaric Acid: Synthesis, Aggregation, Monolayer Behaviour, and Interaction with DNA. European Journal of Organic Chemistry, 2002(8), 1397-1406. DOI: 10.1002/1099-0690(200204)2002:8<1397::AID-EJOC1397>3.0.CO;2-6
As you know, my post-doc in Uppsala ended. It was a good time, and it was great collaborating on Bioclipse with Ola, Jonathan, Arvid, and Carl. I would have loved tighter integration with the work of Maris and Martin, but that was limited to one joint paper (in press). I thank Professors Jarl Wikberg and Eva Brittebo for allowing me to continue my research at their department, and hope this is not the end of the collaboration yet.
As with the new year, the end of a contract is a good time to ref........ Read more »
Wagener, J., Spjuth, O., Willighagen, E., & Wikberg, J. (2009) XMPP for cloud computing in bioinformatics supporting discovery and invocation of asynchronous web services. BMC Bioinformatics, 10(1), 279. DOI: 10.1186/1471-2105-10-279
Spjuth, O., Alvarsson, J., Berg, A., Eklund, M., Kuhn, S., Mäsak, C., Torrance, G., Wagener, J., Willighagen, E., Steinbeck, C.... (2009) Bioclipse 2: A scriptable integration platform for the life sciences. BMC Bioinformatics, 10(1), 397. DOI: 10.1186/1471-2105-10-397
Spjuth, O., Willighagen, E., Guha, R., Eklund, M., & Wikberg, J. (2010) Towards interoperable and reproducible QSAR analyses: Exchange of datasets. Journal of Cheminformatics, 2(1), 5. DOI: 10.1186/1758-2946-2-5
Derek's blog pointed me to an editorial by Royce Murray, "Science Blogs and Caveat Emptor" (doi:10.1021/ac102628p). He is warning us, science scholars, about blogs. He is accusing bloggers of not being scholarly, not checking facts, etc.
He did himself and the journal a big disservice: in his blog he does precisely what he is accusing the blogger of: failing to check facts. Even worse, particularly for the 'Analytical Chemistry' journal, he showed himself inadequate in analyzing the problem, putting his scholarl........ Read more »
I do not think I have ever blogged the paper that played an important role in my thesis (doi:10.1021/ci990038z); the research for one of the papers in my thesis started with the hypothesis proposed therein. The paper had a really good idea; but, unfortunately, it did not contain the data to support the hypothesis. That gets me to one important lesson I learned: a QSAR data set of fewer than 100 molecules is not enough to make untargeted statistical models.
The paper reads quite nicely, and the resul........ Read more »
Bursi, R., Dao, T., van Wijk, T., de Gooyer, M., Kellenbach, E., & Verwer, P. (1999) Comparative Spectra Analysis (CoSA): Spectra as Three-Dimensional Molecular Descriptors for the Prediction of Biological Activities. Journal of Chemical Information and Modeling, 39(5), 861-867. DOI: 10.1021/ci990038z
Willighagen, E., Denissen, H., Wehrens, R., & Buydens, L. (2006) On the Use of ¹H and ¹³C 1D NMR Spectra as QSPR Descriptors. Journal of Chemical Information and Modeling, 46(2), 487-494. DOI: 10.1021/ci050282s
The idea has been lingering in the air for a long time now: sharing large science data sets using BitTorrent. Over the past couple of years I have seen a lot of science-related software being distributed over torrents, and the use in open source in general is abundant. Given a good network of so-called seeders, download times go down dramatically, and the overall energy consumption goes down too, as data has to follow a much shorter path.
It could very well be that the uptake of this technology........ Read more »
Langille, M., & Eisen, J. (2010) BioTorrents: A File Sharing Service for Scientific Data. PLoS ONE, 5(4). DOI: 10.1371/journal.pone.0010071