This blog deals with chemblaics in the broader sense. Chemblaics (pronounced chem-bla-ics) is the science that uses computers to address and possibly solve problems in the area of chemistry, biochemistry and related fields. The big difference between chemblaics and areas such as chem(o)?informatics, chemometrics, computational chemistry, etc., is that chemblaics only uses open source software, making experimental results reproducible and validatable. And this is a big difference!

  • February 21, 2014
  • 04:43 AM

Slow publishing innovation

by egonw in Chem-bla-ics

Elsevier is not the only publisher with a large innovation inertia. In fact, I think many large organizations have it, particularly when there are too many interdependencies, causing overly long lines. Greg Landrum made me aware that one American Chemical Society journal is now going to encourage (not require) machine readable forms of chemical structures to be included in their flagship. The reasoning by Gilson et al. is balanced. It is also 15 years too late. This question was relevant at the end of th........ Read more »

Gilson MK, Georg G, & Wang S. (2014) Digital Chemistry in the Journal of Medicinal Chemistry. Journal of medicinal chemistry. PMID: 24521446  

  • October 20, 2013
  • 10:07 AM

calls for help

by egonw in Chem-bla-ics

I don't think I have mentioned this JISC project by David Shotton et al. yet, and should perhaps have done so earlier. But it is not too late, as Shotton is calling out for help in a Nature Comment this week (doi:10.1038/502295a). Now, I have been tracking what is citing the CDK literature using CiteULike since 2010, and just asked the project developers how I can contribute this data.

Interestingly, the visualization from is interesting as it also shows papers citing papers t........ Read more »

Shotton, D. (2013) Publishing: Open citations. Nature, 502(7471), 295-297. DOI: 10.1038/502295a

  • January 22, 2013
  • 04:09 AM

ToxBank: the next generation toxicology

by egonw in Chem-bla-ics

Before I moved to my current position in Maastricht, I had the great pleasure to work with Prof. Roland Grafström (check his pathway bioinformatics done with his then PhD student Rebecca) and Prof. Bengt Fadeel at the Karolinska Institutet. During that year I worked part-time on ToxBank and part-time on nano-QSAR, covering semantics, predictive toxicology, and Open Data. This blog post is about the ToxBank work.

I promised fireworks, and the first rockets are heading upw........ Read more »

Kohonen, P., Benfenati, E., Bower, D., Ceder, R., Crump, M., Cross, K., Grafström, R., Healy, L., Helma, C., Jeliazkova, N.... (2013) The ToxBank Data Warehouse: Supporting the Replacement of In Vivo Repeated Dose Systemic Toxicity Testing. Molecular Informatics. DOI: 10.1002/minf.201200114  

  • November 18, 2012
  • 03:39 PM

DHSs and histone modifications: methylation, acetylation, citrullination, and phosphorylation

by egonw in Chem-bla-ics

One day on, and still struggling with the chemistry behind gene regulation. Let no biologist ever tell me again not to use acronyms (yes, I am looking at you!). But it is interesting. I learned a lot about ChIP, histone modifications, etc, etc. This is an amazing world, where specific histone complex protein residues get methylated, acetylated, citrullinated, and phosphorylated. Of course, all this is in the context of the ENCODE meeting we have tomorrow at BiGCaT, where I will try to ........ Read more »

Thurman, R., Rynes, E., Humbert, R., Vierstra, J., Maurano, M., Haugen, E., Sheffield, N., Stergachis, A., Wang, H., Vernot, B.... (2012) The accessible chromatin landscape of the human genome. Nature, 489(7414), 75-82. DOI: 10.1038/nature11232  

Felsenfeld G, Boyes J, Chung J, Clark D, & Studitsky V. (1996) Chromatin structure and gene expression. Proceedings of the National Academy of Sciences of the United States of America, 93(18), 9384-8. PMID: 8790338  

  • November 17, 2012
  • 12:02 PM

The chemistry of DNA modifications for gene regulation

by egonw in Chem-bla-ics

I have started learning about epigenetics, and particularly the regulatory effects of DNA methylation and acetylation. It's cool, it's hot, it's everything we hope will explain genetics, because genes certainly did not.

The chemistry behind this involves interesting pathways and the storage of information that passes from one generation to another... epigenetic effects down to the grandchild generation have repeatedly been shown now. Likely candidates are mRNAs that persist beyond the cell d........ Read more »

  • September 22, 2012
  • 11:02 AM

OMG! An Open Molecule Generator!

by egonw in Chem-bla-ics

Earlier this week an important cheminformatics paper appeared in the Journal of Cheminformatics. It is about the Open Molecule Generator (see below for the paper). This was one important piece of functionality still missing from Open Source cheminformatics. This work uses the Chemistry Development Kit, and was written by Julio Peironcely.

The Analytical Biosciences' group of Prof. Hankemeier (and many others, including also Theo Reijmers) and funded by the Netherlands Metab........ Read more »

Julio E Peironcely, Miguel Rojas-Chertó, Davide Fichera, Theo Reijmers, Leon Coulier, Jean-Loup Faulon, & Thomas Hankemeier. (2012) OMG: open molecule generator. Journal of Cheminformatics, 4, 21. DOI: 10.1186/1758-2946-4-21  

  • April 7, 2012
  • 04:47 AM

A typical QSAR study (cite:citesAsAuthority)

by egonw in Chem-bla-ics

I use CiTO to keep track of how the CDK is cited and used, and just looked at a typical QSAR paper. Here are my comments on "Study of indole derivative inhibitors of Cytosolic phospholipase A2α based on Quantitative Structure Activity Relationship", by Lu et al (doi:10.1016/j.chemolab.2011.11.011). Normally, I am fairly short in these reviews which I publish via the CDK Google+ page, briefly describing what CDK functionality is being used. But this time the post became a more substantial r........ Read more »

  • October 18, 2011
  • 04:53 AM

The Blue Obelisk Shoulders for Translational Cheminformatics

by egonw in Chem-bla-ics

I guess readers of my blog already heard about it via other channels (e.g. via Noel's blog post), but our second Blue Obelisk paper is out. In the past five-ish years since Peter instantiated this initiative, it has created a solid set of shoulders on which to develop Open Source-based cheminformatics solutions. I created the following diagram for the paper, showing how various Blue Obelisk projects interoperate (image is CC-BY, from the paper):

It shows a number of Open Standards (diamonds)........ Read more »

Guha, R., Howard, M., Hutchison, G., Murray-Rust, P., Rzepa, H., Steinbeck, C., Wegner, J., & Willighagen, E. (2006) The Blue Obelisk - Interoperability in Chemical Informatics. Journal of Chemical Information and Modeling, 46(3), 991-998. DOI: 10.1021/ci050400b  

O'Boyle NM, Guha R, Willighagen EL, Adams SE, Alvarsson J, Bradley JC, Filippov IV, Hanson RM, Hanwell MD, Hutchison GR.... (2011) Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on. Journal of cheminformatics, 3(1), 37. PMID: 21999342  

  • October 9, 2011
  • 02:46 AM

An ontology for QSAR and cheminformatics

by egonw in Chem-bla-ics

QSAR and QSPR are the fields that statistically correlate chemical substance features with (biological) activities (QSAR) or properties (QSPR). The chemical substances can be molecular structures, drugs (which are not uncommonly mixtures), and true mixtures like nanomaterials (NanoQSAR). Readers of this blog know I have been working towards making these kinds of studies more reproducible for many years now.

Parts of this full story include the Blue Obelisk Data Repository (BODR), QSAR-ML, the CDK f........ Read more »

  • July 13, 2011
  • 06:58 AM

CDK Forks

by egonw in Chem-bla-ics

Forking is an important part of Open Source development, and forking is good. Of course, forks should interact too, and genes from one fork should merge back into another fork. Forks are probably also a good indication for the success of a project: if a project is forked, it means it is significant. On the other hand, it can also mean that the main project is too hard to work with. Maybe the CDK is that. Indeed, it's easier to not have your code peer-reviewed, and just fork. That is freedom. (........ Read more »

  • June 17, 2011
  • 04:03 PM

Fast Calculation of van der Waals Volume as a Sum of Atomic and Bond Contributions

by egonw in Chem-bla-ics

I was recently asked about a volume descriptor in Bioclipse, which is not yet available. Jmol can calculate surfaces, so that was my first thought. However, I then ran into a paper from 2003 by Zhao, called Fast Calculation of van der Waals Volume as a Sum of Atomic and Bond Contributions and Its Application to Drug Compounds (doi:10.1021/jo034808o).

The paper presents a very simple mathematical model, which approximates the molecular volume by a sum of atomic contributions, and a three terms t........ Read more »
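A minimal sketch of such a model, to make the "sum of contributions minus correction terms" idea concrete. The atomic volumes and the three correction coefficients below are quoted from memory of the Zhao et al. paper and are assumptions that should be checked against it before any real use; the function name and the input format (a dictionary of element counts) are made up for this illustration. The bond count is derived from the atom and ring counts, which assumes a single connected molecule.

```python
# Atomic contributions in cubic angstrom. ASSUMED values, recalled from
# Zhao et al. (doi:10.1021/jo034808o); verify against the paper.
ATOM_VOLUME = {"H": 7.24, "C": 20.58, "N": 15.60, "O": 14.71}

# Correction terms, also ASSUMED from the same paper: one per bond,
# one per aromatic ring, one per non-aromatic ring.
PER_BOND = 5.92
PER_AROMATIC_RING = 14.7
PER_NONAROMATIC_RING = 3.8


def vdw_volume(atom_counts, aromatic_rings=0, nonaromatic_rings=0):
    """Approximate van der Waals volume (cubic angstrom) of one molecule.

    atom_counts maps element symbols to counts, hydrogens included.
    """
    n_atoms = sum(atom_counts.values())
    # For a single connected molecule: bonds = atoms - 1 + rings.
    n_bonds = n_atoms - 1 + aromatic_rings + nonaromatic_rings
    atomic_sum = sum(ATOM_VOLUME[el] * n for el, n in atom_counts.items())
    return (atomic_sum
            - PER_BOND * n_bonds
            - PER_AROMATIC_RING * aromatic_rings
            - PER_NONAROMATIC_RING * nonaromatic_rings)


methane = vdw_volume({"C": 1, "H": 4})
benzene = vdw_volume({"C": 6, "H": 6}, aromatic_rings=1)
print(f"methane: {methane:.2f}, benzene: {benzene:.2f}")
```

No 3D coordinates or surface calculation are needed, only atom and ring counts, which is what makes a descriptor like this cheap to compute.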

  • March 8, 2011
  • 07:40 AM

ToxBank: a data warehouse for (computational) toxicology

by egonw in Chem-bla-ics

Last week I was in sunny Cascais, and in three days experienced -23 °C and +18 °C. The reason I was there was the kick-off meeting of the EU FP7 cluster SEURAT, which includes 'our' ToxBank project.

We will host many different data types, including my favorite, metabolomics. Don't ask me what this will practically mean, but some keywords we already know include RDF, OpenTox, and ToxML. With metabolomics, I hope to squeeze in metabolomics.

And that data warehousing for metabolo........ Read more »

Bais, H., Prithiviraj, B., Jha, A., Ausubel, F., & Vivanco, J. (2005) Mediation of pathogen resistance by exudation of antimicrobials from roots. Nature, 434(7030), 217-221. DOI: 10.1038/nature03356  

Walker, T., Bais, H., Halligan, K., Stermitz, F., & Vivanco, J. (2003) Metabolic Profiling of Root Exudates of Arabidopsis thaliana. Journal of Agricultural and Food Chemistry, 51(9), 2548-2554. DOI: 10.1021/jf021166h  

  • February 9, 2011
  • 02:01 AM

Chemical data curation: yes, it is that bad.

by egonw in Chem-bla-ics

The readers of Antony's blog know enough about the problem. And many in the QSAR community know it too (and many others do not). Chemical structure data is noisy. I haven't recently created a new local data set for analysis, so I have not taken time to blog about it much, but the ambiguity in chemical databases is enormous. Just yesterday, Antony and I had a good discussion about tautomers and in particular how things are linked together.

If we are in the field of property prediction, knowing wh........ Read more »

Porter, W. (2010) Warfarin: history, tautomerism and activity. Journal of Computer-Aided Molecular Design, 24(6-7), 553-573. DOI: 10.1007/s10822-010-9335-7  

  • December 26, 2010
  • 10:37 AM

Oscar: training data, models, etc

by egonw in Chem-bla-ics

Oscar uses a Maximum Entropy Markov Model (MEMM) based on n-grams. Peter Corbett has written this up (doi:10.1186/1471-2105-9-S11-S4). So, it basically is statistics once more. If you really want a proper bioinformatics education, do your PhD at a (proteo)chemometrics department.

N-grams are word parts of n characters. For example, the trigrams of acetic acid include ace, cid, tic, eti, and aci. N-grams of length four include acid, etic, and acet. The MEMM assigns weights to these n-grams, a........ Read more »
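The n-gram extraction itself is easy to sketch. The snippet below is not Oscar's code; it is a small illustration with a made-up function name, and it assumes that n-grams are taken within each word, never across the space between words.

```python
def char_ngrams(text, n):
    """Return all character n-grams of each whitespace-separated word."""
    grams = []
    for word in text.split():
        # Slide a window of n characters over the word; words shorter
        # than n characters contribute nothing.
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


print(char_ngrams("acetic acid", 3))  # ['ace', 'cet', 'eti', 'tic', 'aci', 'cid']
print(char_ngrams("acetic acid", 4))  # ['acet', 'ceti', 'etic', 'acid']
```

A statistical model can then assign a weight to each of these fragments, so that chemical-looking word parts count as evidence for a chemical name.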

  • December 21, 2010
  • 04:22 AM

re: Commercial or Proprietary?

by egonw in Chem-bla-ics

OK, the second paper I ran into today is a perfect match for the paper by Khanna and Ranganathan I just discussed in the Commercial or Proprietary? post. So perfect, in fact, that I should have really combined them. But since the other post is already infecting the WWW, I'll have to post this update.

Yap wrote up a paper on PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints (doi:10.1002/jcc.21707), and Table 2 is quite like Table 1 in the paper by Kh........ Read more »

  • December 11, 2010
  • 03:21 PM

Supramolecular chemistry

by egonw in Chem-bla-ics

Some smart software developer once said to not optimize your code too early. However, not caring about it at all does not help either. Some basic knowledge of memory management can keep you going. That is, I just ran into the limits of Oscar and ChemicalTagger. As I blogged earlier today, I am analyzing the BJOC literature, but Lezan and I are running into a reproducible out-of-memory exception. At first I thought it was a memory leak, as it was the 95th paper it fell over on, but after we optim........ Read more »

Buijnsters, P., García-Rodríguez, C., Willighagen, E., Sommerdijk, N., Kremer, A., Camilleri, P., Feiters, M., Nolte, R., & Zwanenburg, B. (2002) Cationic Gemini Surfactants Based on Tartaric Acid: Synthesis, Aggregation, Monolayer Behaviour, and Interaction with DNA. European Journal of Organic Chemistry, 2002(8), 1397-1406. DOI: 10.1002/1099-0690(200204)2002:8<1397::AID-EJOC1397>3.0.CO;2-6

  • November 27, 2010
  • 06:27 AM

Uppsala Status Report

by egonw in Chem-bla-ics

As you know, my post-doc in Uppsala ended. It was a good time, and it was great collaborating on Bioclipse with Ola, Jonathan, Arvid, and Carl. I would have loved tighter integration with the work of Maris and Martin, but that was limited to one joint paper (in press). I thank Professors Jarl Wikberg and Eva Brittebo for allowing me to continue my research at their department, and hope this is not the end of the collaboration yet.

As with New Year, the end of a contract is a good time to ref........ Read more »

Spjuth, O., Alvarsson, J., Berg, A., Eklund, M., Kuhn, S., Mäsak, C., Torrance, G., Wagener, J., Willighagen, E., Steinbeck, C.... (2009) Bioclipse 2: A scriptable integration platform for the life sciences. BMC Bioinformatics, 10(1), 397. DOI: 10.1186/1471-2105-10-397  

Spjuth, O., Willighagen, E., Guha, R., Eklund, M., & Wikberg, J. (2010) Towards interoperable and reproducible QSAR analyses: Exchange of datasets. Journal of Cheminformatics, 2(1), 5. DOI: 10.1186/1758-2946-2-5  

  • October 16, 2010
  • 04:52 AM

Royce Murray and Caveat Emptor

by egonw in Chem-bla-ics

Derek's blog pointed me to an editorial by Royce Murray, Science Blogs and Caveat Emptor (doi:10.1021/ac102628p). He is warning us, science scholars, about blogs. He accuses bloggers of not being scholarly, not checking facts, etc.

He did himself and the journal a big disservice: in his blog he does precisely what he is accusing the blogger of: failing to check facts. Even worse, particularly for the 'Analytical Chemistry' journal, he proved inadequate at analyzing the problem, putting his scholarl........ Read more »

Murray R. (2010) Science Blogs and Caveat Emptor. Analytical chemistry. PMID: 20939598  

  • June 20, 2010
  • 11:47 AM

Looking at your statistical models...

by egonw in Chem-bla-ics

I do not think I have ever blogged about the paper that played an important role in my thesis (doi:10.1021/ci990038z); the research for one of the papers in my thesis started with the hypothesis proposed therein. The paper had a really good idea; but, unfortunately, it did not contain the data to support the hypothesis. That gets me to one important lesson I learned: a QSAR data set of fewer than 100 molecules is not enough to make untargeted statistical models.

The paper reads quite nicely, and the resul........ Read more »

Willighagen, E., Denissen, H., Wehrens, R., & Buydens, L. (2006) On the Use of 1H and 13C 1D NMR Spectra as QSPR Descriptors. Journal of Chemical Information and Modeling, 46(2), 487-494. DOI: 10.1021/ci050282s  

  • April 18, 2010
  • 08:00 AM

BitTorrents for Science

by egonw in Chem-bla-ics

The idea has been lingering in the air for a long time now: sharing large science data sets using BitTorrent. Over the past couple of years I have seen a lot of science-related software being distributed over torrents, and its use in open source in general is abundant. Given a good network of so-called seeders, download times go down dramatically, and the overall energy consumption goes down too, as data has to follow a much shorter path.

It could very well be that the uptake of this technology........ Read more »

