CDD Awarded $1.5 Million Grant to Automate Open Source Machine Learning Models

Burlingame, Calif. - March 27, 2017 - Collaborative Drug Discovery (CDD), provider of the CDD Vault® web-based drug discovery informatics platform, announced that it has been awarded a $1.5M Phase 2B SBIR grant, titled “Biocomputation Across Distributed Private Data Sets to Enhance Drug Discovery,” to automate machine learning models.

This work builds upon two earlier projects that provided open source descriptors and 1-click model building within CDD Vault and in the CDK Open Source Toolkit for use in any software. Importantly, the descriptors and algorithms were validated on a large set of diverse data in a co-publication between CDD and Pfizer titled “Using open source computational tools for predicting human metabolic stability and additional ADME/Tox properties” (see: Gupta, R. et al., Drug Metab Dispos, 38: 2083-2090, 2010).

[Figure: ECFP diagram]

This SBIR grant will pave the way for more collaborative, differentiated, and secure sharing within the popular CDD Vault platform. “The exciting, ambitious part of this project is to democratize models by having the automated optimization of model selection become more valuable than the cost of individual, expert-driven model building. This builds on our track record making equally complex bioassay data capture more engaging for a range of heterogeneous scientists,” said Barry A. Bunin, PhD, CDD CEO and the Principal Investigator (PI) on this latest grant.

Specific Aims of Award Number 2R44TR000942-05:

Our goal is to democratize the role in drug discovery of computational models – which have historically been restricted to computational experts – and allow models to become routine aids to the discovery workflow in academia, foundations, government laboratories, and small companies that do not have the resources to employ them today. In Phase 2 we implemented modified Bayesian model building directly within CDD’s web-based CDD Vault platform, which securely hosts structure-activity relationship (SAR) data; any user can now easily train a Bayesian model with experimental data stored in her private Vault, then apply the model to predict activity for untested compounds. In Phase 2B we propose to generalize this capability with the following new Specific Aims, which are needed to achieve a widespread scientific and commercial impact:
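As a rough illustration of the train-then-predict workflow described above, the following Python sketch builds a small naive Bayes model over sets of fingerprint features and uses it to rank untested compounds. It assumes a Laplacian-corrected estimator of the kind often paired with circular fingerprints; whether this matches the modified Bayesian method in CDD Vault is an assumption, and the function names, integer feature IDs, and toy data are all hypothetical.

```python
import math
from collections import Counter

def train_bayes(samples):
    """Train on (feature_set, is_active) pairs; return per-feature log weights."""
    n_total = len(samples)
    n_active = sum(1 for _, active in samples if active)
    base_rate = n_active / n_total  # overall fraction of active compounds
    feat_total, feat_active = Counter(), Counter()
    for features, active in samples:
        feat_total.update(features)       # compounds containing each feature
        if active:
            feat_active.update(features)  # active compounds containing it
    # Laplacian-corrected estimate (an assumed variant, not CDD's confirmed method):
    # features seen only a few times are pulled toward a weight of zero.
    return {
        f: math.log((feat_active[f] + 1) / (total * base_rate + 1))
        for f, total in feat_total.items()
    }

def score(weights, features):
    """Sum the weights of known features; unseen features contribute nothing."""
    return sum(weights.get(f, 0.0) for f in features)

# Hypothetical toy data: integers stand in for hashed fingerprint features.
training = [({1, 3}, True), ({1, 4}, True), ({2, 3}, False), ({2, 4}, False)]
weights = train_bayes(training)

# Rank hypothetical untested compounds by predicted activity, best first.
untested = {"candidate_a": {1, 9}, "candidate_b": {2, 9}}
ranked = sorted(untested, key=lambda name: score(weights, untested[name]), reverse=True)
print(ranked)
```

In this toy set, feature 1 appears only in actives and feature 2 only in inactives, so a compound carrying feature 1 scores higher than one carrying feature 2.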

Aim 1: Integrate a suite of diverse computational techniques (such as QSAR, Neural Networks, Support Vector Machines, Random Forest, k-Nearest Neighbors, and possibly others) into a single framework, to allow direct side-by-side comparison.

Aim 2: Develop and validate a universal metric that ranks the predictive strength of each method as applied to a particular dataset.

Aim 3: Apply the metric to automatically generate thousands of models from high-quality, public-access structure-activity and ADME/Tox datasets and present key results to the user.

Aim 4: Develop a novel capability to build models collaboratively, by aggregating multiple datasets, and share the models without revealing the compounds and data in the training sets.
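The side-by-side comparison and ranking described in Aims 1 through 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: leave-one-out ROC AUC stands in for the "universal metric" (the grant text does not specify one), the two toy methods and the uninformative baseline are hypothetical, and integers stand in for fingerprint features.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def knn_scorer(train, k=3):
    """k-nearest neighbors: fraction of actives among the k most similar compounds."""
    def score(features):
        nearest = sorted(train, key=lambda s: tanimoto(features, s[0]), reverse=True)[:k]
        return sum(1 for _, active in nearest if active) / len(nearest)
    return score

def mean_similarity_scorer(train):
    """Mean similarity to actives minus mean similarity to inactives."""
    actives = [f for f, y in train if y]
    inactives = [f for f, y in train if not y]
    def mean_sim(features, group):
        return sum(tanimoto(features, g) for g in group) / len(group) if group else 0.0
    return lambda features: mean_sim(features, actives) - mean_sim(features, inactives)

def size_baseline(train):
    """Deliberately uninformative baseline: score by number of features."""
    return lambda features: float(len(features))

def roc_auc(scores, labels):
    """Probability that a random active outscores a random inactive (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one active and one inactive")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def leave_one_out_auc(make_scorer, data):
    """Score each compound with a model trained on all the others, then pool."""
    scores = []
    for i, (features, _) in enumerate(data):
        held_out_train = data[:i] + data[i + 1:]
        scores.append(make_scorer(held_out_train)(features))
    return roc_auc(scores, [y for _, y in data])

def rank_methods(methods, data):
    """Evaluate every method with the same metric and sort best first."""
    results = {name: leave_one_out_auc(fn, data) for name, fn in methods.items()}
    return sorted(results.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy dataset of (feature_set, is_active) pairs.
data = [
    ({1, 2, 3}, True), ({1, 2, 4}, True), ({1, 3, 5}, True),
    ({6, 7, 8}, False), ({6, 7, 9}, False), ({6, 8, 10}, False),
]
methods = {"knn": knn_scorer, "mean_similarity": mean_similarity_scorer,
           "size_baseline": size_baseline}
ranking = rank_methods(methods, data)
for name, auc in ranking:
    print(f"{name}: {auc:.2f}")
```

Because every method is wrapped as a function that takes training data and returns a scorer, adding another technique only means adding one entry to `methods`; the metric and the ranking code stay unchanged.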

Collaborative Drug Discovery (CDD) proposes to develop technology that will vastly simplify and integrate all the processes required to exploit predictive models for drug discovery. The software will make it easy for scientists without specialized training in informatics to create, train, apply, evaluate, share, and archive models with minimal effort, and also leverage a large library of pre-computed models with zero effort. The software will also enable scientists working in different organizations to collectively build models from their aggregated data and share these models with each other, without sharing the underlying training data.
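One way to picture building a model from aggregated data without exchanging the training sets is with a count-based model, where each organization contributes only per-feature tallies. The Python sketch below is a hypothetical illustration, not CDD's design: the `CountSummary` type, the function names, and the Laplacian-corrected weight formula are assumptions, and shared counts of rare features can still leak information about the underlying compounds, so this should not be read as a privacy guarantee.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class CountSummary:
    """Aggregate tallies only; no structures or individual measurements."""
    n_total: int = 0
    n_active: int = 0
    feat_total: Counter = field(default_factory=Counter)
    feat_active: Counter = field(default_factory=Counter)

def summarize(samples):
    """Reduce private (feature_set, is_active) pairs to shareable counts."""
    summary = CountSummary()
    for features, active in samples:
        summary.n_total += 1
        summary.n_active += int(active)
        summary.feat_total.update(features)
        if active:
            summary.feat_active.update(features)
    return summary

def merge(a, b):
    """Combine two summaries by adding their counts."""
    return CountSummary(
        a.n_total + b.n_total,
        a.n_active + b.n_active,
        a.feat_total + b.feat_total,
        a.feat_active + b.feat_active,
    )

def weights_from(summary):
    """Per-feature log weights from counts (assumed Laplacian-corrected form)."""
    base_rate = summary.n_active / summary.n_total
    return {
        f: math.log((summary.feat_active[f] + 1) / (total * base_rate + 1))
        for f, total in summary.feat_total.items()
    }

# Hypothetical private datasets held by two organizations.
site_a = [({1, 3}, True), ({2, 3}, False)]
site_b = [({1, 4}, True), ({2, 4}, False)]

# Each site shares only its summary; the merged counts define the joint model.
shared_weights = weights_from(merge(summarize(site_a), summarize(site_b)))

# For comparison only: the model that pooling the raw data would have produced.
pooled_weights = weights_from(summarize(site_a + site_b))
print(shared_weights == pooled_weights)
```

Since the model depends on the data only through sums of counts, merging the summaries gives the same weights as training on the pooled records, which is what makes this kind of model convenient to build collaboratively.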

About this grant

The Phase 2B Small Business Innovation Research (SBIR) grant is part of a program to enable sharing of biological data. This project is supported by Award Number 2R44TR000942-05 from the National Center for Advancing Translational Sciences, as described on NIH RePORTER. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Center for Advancing Translational Sciences or the National Institutes of Health.

About Collaborative Drug Discovery, Inc.

CDD’s flagship product, “CDD Vault®”, is used to manage chemical registration and structure-activity relationships (SAR), and to securely scale collaborations. CDD Vault® is a hosted database solution for secure management and sharing of biological and chemical data. It lets you intuitively organize chemical structures and biological study data, and collaborate with internal or external partners through an easy-to-use web interface. Available modules within CDD Vault include Activity & Registration, Visualization, Inventory, and ELN.

A complete list of more than 60 publications and patents from CDD can be found online on our resources page.

Media Contact: Barry Bunin, PhD, Collaborative Drug Discovery, (650) 204-3084