From the desk of CDD CSO Sean Ekins, M.Sc., Ph.D., D.Sc.
If I have seen further it is by standing on ye sholders of Giants – Isaac Newton
If you work in drug discovery, you cannot escape the vast and growing number of papers that use cheminformatics, computational modeling and machine learning. While each group describes a model they built and, hopefully, some validation, very few discuss how they plan to make their models accessible to anyone else. This likely severely impedes others from building on or benefiting from these earlier publications.
We have been thinking about this challenge for several years and have obtained an SBIR grant to help make models shareable. In a short time we have made some widely used fingerprints, ECFP6 and FCFP6, open access on GitHub, and used these along with a Bayesian algorithm to implement models in the TB Mobile app that predict potential targets for a molecule. This mobile app is free and, as well as using the fingerprints for models, it also uses them for similarity searching and clustering. TB Mobile therefore provided us with a mechanism for rapid prototyping. Our more recent implementation of the FCFP6 fingerprints and Bayesian algorithm in CDD Vault has led to CDD Models within the recently launched CDD Vision. This has been described in a new paper in JCIM (download accepted version here).
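To give a feel for how fingerprints and a Bayesian algorithm fit together, here is a minimal sketch in Python. It is not the CDD or TB Mobile code. It assumes a Laplacian-corrected naive Bayes estimator, a variant often paired with ECFP/FCFP-style fingerprints, although this post does not say which Bayesian variant is used. Fingerprints are stood in for by plain sets of integer feature IDs, the function names train_bayes and score are hypothetical, and the toy data is made up purely for illustration.

```python
import math
from collections import Counter


def train_bayes(fingerprints, labels):
    """Build a model mapping each feature ID to a log contribution.

    Assumption: Laplacian-corrected naive Bayes. For a feature seen in
    n molecules, a of them active, with overall active rate p, the
    contribution is log((a + 1) / (n * p + 1)).
    """
    total = len(fingerprints)
    base_rate = sum(labels) / total  # fraction of training molecules that are active
    seen = Counter()         # how many molecules contain each feature
    seen_active = Counter()  # how many of those molecules are active
    for fp, is_active in zip(fingerprints, labels):
        for feature in fp:
            seen[feature] += 1
            if is_active:
                seen_active[feature] += 1
    return {
        f: math.log((seen_active[f] + 1) / (seen[f] * base_rate + 1))
        for f in seen
    }


def score(model, fp):
    """Sum the contributions of a molecule's features; unseen features add 0."""
    return sum(model.get(f, 0.0) for f in fp)


# Toy training set: each "fingerprint" is a set of made-up feature IDs.
train_fps = [{1, 2, 3}, {1, 2, 4}, {5, 6}, {5, 7}]
train_labels = [True, True, False, False]

model = train_bayes(train_fps, train_labels)
active_like = score(model, {1, 2, 9})  # shares features with the actives
inactive_like = score(model, {5, 6})   # shares features with the inactives
print(active_like > inactive_like)
```

A higher score means the molecule shares more features with the active training molecules than the base rate alone would predict. In a real setting the feature sets would come from an actual ECFP6 or FCFP6 implementation rather than hand-written integers.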
This gives CDD users with access to CDD Vision the ability to build models in their private CDD Vault using their own data, the CDD Public datasets, or combinations thereof. We have also implemented the ability to export these models (if desired) in a format that can be loaded into several currently available mobile apps, such as MMDS from Dr. Alex Clark at Molecular Materials Informatics.
More importantly, by making the technologies open source, we can allow others to build on our work. To illustrate this, Dr. Alex Clark used the ChEMBL database to build over 2000 computational models and make them available to the community.
While we still have work ongoing to make the modeling tools in CDD Vault even more powerful, we think it is time to let people know what we are doing in this domain. Our hope is that groups will use CDD Vault to store their data, build their models and, when ready, publish their models so that they can be shared with the community (if desired).
This blog is authored by members of the CDD Vault community. CDD Vault is a hosted drug discovery informatics platform that securely manages both private and external biological and chemical data. It provides core functionality including chemical registration, structure-activity relationship, chemical inventory, and electronic lab notebook capabilities.