Machine Learning from an Expert’s Evaluation of Compounds  

November 14, 2014

As part of CDD Visualization, CDD has recently added the ability to build machine learning models.

These models can be used to predict properties of molecules that may not yet have been tested. Once a scientist chooses a “good” set of compounds (perhaps those with a desired bioactivity) and a “bad” set (the inactives), they can build a machine learning model with a single click. Within a few seconds the model is built and cross-validated. It can then be used to score other compounds, such as those from a vendor library, the FDA-approved drugs in CDD Public, or molecules stored securely in the user's own private CDD Vault.
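The post does not say which algorithm sits behind the one-click model, so the following is only a rough illustration under stated assumptions. It assumes a Laplacian-corrected naive Bayesian classifier, a common choice for good/bad compound sets, and reduces each compound to a set of integer structural feature IDs standing in for fingerprint bits. The function names (`train_bayes`, `score`, `roc_auc`, `leave_one_out_auc`) and the toy compounds are hypothetical. The sketch builds a model from a good and a bad set, estimates how predictive it is with leave-one-out cross-validation reported as ROC AUC, and then scores other compounds.

```python
import math
from collections import Counter

# Assumption: a Laplacian-corrected naive Bayes model; not necessarily CDD's algorithm.
# Each compound is a set of feature IDs (hypothetical stand-ins for fingerprint bits);
# labels are 1 for "good" and 0 for "bad".


def train_bayes(features, labels):
    """Return a per-feature Laplacian-corrected log weight."""
    base_rate = sum(labels) / len(labels)  # fraction of "good" compounds
    feature_total = Counter()              # compounds containing each feature
    feature_good = Counter()               # "good" compounds containing each feature
    for feats, label in zip(features, labels):
        for f in feats:
            feature_total[f] += 1
            if label:
                feature_good[f] += 1
    # Positive weight: feature is enriched in the "good" set; negative: depleted.
    return {
        f: math.log((feature_good[f] + 1) / (total * base_rate + 1))
        for f, total in feature_total.items()
    }


def score(weights, feats):
    """Sum the weights of known features; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in feats)


def roc_auc(scores, labels):
    """Probability that a random "good" compound outscores a random "bad" one."""
    good = [s for s, y in zip(scores, labels) if y]
    bad = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if g > b else 0.5 if g == b else 0.0 for g in good for b in bad)
    return wins / (len(good) * len(bad))


def leave_one_out_auc(features, labels):
    """Score each compound with a model trained on all the other compounds."""
    held_out_scores = []
    for i in range(len(labels)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        held_out_scores.append(score(train_bayes(train_x, train_y), features[i]))
    return roc_auc(held_out_scores, labels)


# Hypothetical toy data: three "good" and three "bad" compounds.
compounds = [
    {1, 2, 3}, {1, 2, 4}, {1, 3, 5},  # "good"
    {6, 7, 8}, {6, 7, 9}, {7, 8, 9},  # "bad"
]
labels = [1, 1, 1, 0, 0, 0]

weights = train_bayes(compounds, labels)
print(leave_one_out_auc(compounds, labels))                # 1.0
print(score(weights, {1, 2, 9}) > score(weights, {6, 8, 9}))  # True
```

Higher scores mean a compound looks more like the “good” set. Leave-one-out is used here only because the toy set is tiny; k-fold splits are a cheaper option for larger sets.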

In discussions with our advisory board member Dr. Christopher Lipinski, we began to wonder what other properties the machine learning models could predict. An expert medicinal chemist’s appraisal of the MLPCN probe compounds was high on the list. Dr. Lipinski had previously been involved in a study in which 11 experts gave their opinions on the 64 NIH Probes. More than 5 years have passed since that study, and there are now more than 300 NIH Probes. Dr. Lipinski evaluated each of them as he would for a drug development program, through an exhaustive, manual, iterative process that considered the literature and chemical reactivity, and ultimately decided whether each probe was desirable. We have now included Dr. Lipinski’s evaluations alongside the NIH Probes available on CDD Public. Using his decisions to define the “good” and “bad” sets, we built machine learning models that could predict his evaluations! You can read the details in a recent paper published in J Chem Inf Model.

This heat map shows a comparison of our “Expert Model” with other druglikeness metrics for the probes labeled as undesirable by Dr. Lipinski. Red corresponds to a less druglike value for each metric.


This work still leaves some open questions. Would a machine learning model be more predictive if it were built from the evaluations of more than one expert? Can scientists build models that work from their own evaluations? If you’d like to see how your compounds would score in our “Expert Model,” please contact us at [email protected].

This blog is authored by members of the CDD Vault community. CDD Vault is a hosted drug discovery informatics platform that securely manages both private and external biological and chemical data. It provides core functionality including chemical registration, structure-activity relationship (SAR) analysis, chemical inventory, and electronic lab notebook capabilities!

CDD Vault: Drug Discovery Informatics your whole project team will embrace!