Dark Chemical Matter in Drug Discovery: When Compounds Simply Won’t Work

Over the years, many thousands of compounds have been tested against an ever-increasing number of targets and assay conditions. While most compounds in any screening campaign are expected to be inactive, nobody had previously investigated whether some compounds are inactive against every target and assay used to date.

Well, Novartis addressed this question in a recent publication in Nature Chemical Biology. In this study, the authors examined the Novartis screening deck and the NIH’s Molecular Libraries collection, identified the compounds that had been tested in at least 100 assays without ever showing activity (Novartis: 234 assays; NIH: 429 assays), and coined a new label for them: Dark Chemical Matter (DCM).
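The selection criterion can be sketched in a few lines of Python: keep the compounds that were tested in at least 100 assays and were active in none of them. This is only an illustration, not the authors' actual pipeline; the `(compound_id, assay_id, is_active)` record layout and the function name `find_dark_compounds` are assumptions made for this example.

```python
from collections import defaultdict

MIN_ASSAYS = 100  # threshold quoted in the post


def find_dark_compounds(results, min_assays=MIN_ASSAYS):
    """Return IDs of compounds tested in >= min_assays assays and active in none.

    `results` is an iterable of (compound_id, assay_id, is_active) tuples.
    This record layout is hypothetical and chosen only for illustration.
    """
    tested = defaultdict(set)  # compound -> set of assays it was tested in
    active = set()             # compounds with at least one active readout
    for compound_id, assay_id, is_active in results:
        tested[compound_id].add(assay_id)
        if is_active:
            active.add(compound_id)
    return {
        cid
        for cid, assays in tested.items()
        if len(assays) >= min_assays and cid not in active
    }


# Toy data with a lowered threshold of 3 assays:
# "A" is tested 3 times and never active, "B" is active once,
# "C" is never active but tested only once.
toy_results = (
    [("A", i, False) for i in range(3)]
    + [("B", i, i == 1) for i in range(3)]
    + [("C", 0, False)]
)
dark = find_dark_compounds(toy_results, min_assays=3)
```

With the toy data, only compound "A" qualifies as dark: "B" has one active readout, and "C" has not been tested often enough to say anything.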

Of the 803,990 compounds tested in at least 100 Novartis assays, a remarkable 112,872 (14.0%) were inactive across all of them. Similarly, of the 363,598 compounds tested in at least 100 NIH assays, 131,726 (36.2%) were inactive.

The authors provided a DCM compound list of 139,352 molecules containing 10,355 Novartis compounds and 131,726 PubChem compounds (noting that there is an overlap of 2,729 compounds).
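As a quick sanity check, the quoted percentages and the size of the combined list can be reproduced from the numbers given in this post. The variable names below are chosen for this example only; the combined total follows from inclusion-exclusion (sum of both lists minus the overlap).

```python
# Figures quoted in the post.
novartis_tested, novartis_dark = 803_990, 112_872
nih_tested, nih_dark = 363_598, 131_726


def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


novartis_pct = percent(novartis_dark, novartis_tested)  # 14.0
nih_pct = percent(nih_dark, nih_tested)                 # 36.2

# Published DCM list: two sources that share some compounds.
list_novartis, list_pubchem, overlap = 10_355, 131_726, 2_729
unique_total = list_novartis + list_pubchem - overlap   # 139352

print(novartis_pct, nih_pct, unique_total)
```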

The compounds and assay data (reporter-gene assay, gene expression, QC) were imported into CDD Vault. A “Dark Matter Compounds” data set has been shared publicly with the scientific community, and commercial subscribers can securely compare their own SAR with these mysterious dark matter compounds.


Exact matches are automatically identified; substructure and similarity searches can be performed from the CDD Vault search page. QSAR models, calculations, and visualizations for advanced comparisons can be done directly within the new CDD Vision module. A medicinal chemistry analysis of the compounds was conducted using calculated physicochemical and druggability parameters (calculated automatically when compounds are uploaded into CDD Vault) from rule-of-5 (drug-like), rule-of-3 (lead-like), and PAINS perspectives. For convenience, detailed data are provided in an appendix at the bottom of this post, below the references.

It is clear that these compounds were selected with a drug-like bias, following Lipinski’s rule-of-5 (93.7%), rather than a lead-like approach (7.4%), while keeping pan-assay interference compounds (PAINS) mostly out of the collection (1.5%). However, it is surprising to find that almost one-fifth of the compounds contain moieties that are generally avoided in screening because such compounds tend to be reactive. For example, this DCM collection includes compounds such as carbonic acid, tetrabromoethane, and Michael acceptors, as well as large compounds with molecular weights from 800 Dalton to over 1,000 Dalton.
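To make the drug-like and lead-like labels concrete, here is a minimal sketch of both filters applied to precomputed descriptors. The thresholds are the commonly cited ones (rule-of-5: MW ≤ 500, logP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10; rule-of-3: MW ≤ 300, logP ≤ 3, donors ≤ 3, acceptors ≤ 3). The exact criteria and descriptor calculations used in CDD Vault are not stated in this post, so treat the cutoffs, the `Descriptors` class, and the example values as assumptions.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties for one compound (hypothetical container)."""
    mol_weight: float
    log_p: float
    h_bond_donors: int
    h_bond_acceptors: int


def passes_rule_of_5(d):
    # Strict reading: no violations allowed (some variants tolerate one).
    return (d.mol_weight <= 500 and d.log_p <= 5
            and d.h_bond_donors <= 5 and d.h_bond_acceptors <= 10)


def passes_rule_of_3(d):
    # Core lead-like/fragment-like cutoffs; extra criteria such as
    # rotatable bonds or polar surface area are left out of this sketch.
    return (d.mol_weight <= 300 and d.log_p <= 3
            and d.h_bond_donors <= 3 and d.h_bond_acceptors <= 3)


def fraction_passing(compounds, rule):
    """Fraction of compounds for which `rule` returns True."""
    compounds = list(compounds)
    if not compounds:
        return 0.0
    return sum(rule(c) for c in compounds) / len(compounds)


# Invented descriptor values, for illustration only.
examples = {
    "fragment-like": Descriptors(180.2, 1.2, 1, 3),
    "drug-like": Descriptors(420.5, 3.8, 2, 6),
    "oversized": Descriptors(950.0, 6.1, 7, 14),
}
ro5_fraction = fraction_passing(examples.values(), passes_rule_of_5)
ro3_fraction = fraction_passing(examples.values(), passes_rule_of_3)
```

Because the rule-of-3 cutoffs are tighter than the rule-of-5 cutoffs on every property used here, any compound passing rule-of-3 also passes rule-of-5, which is consistent with the lead-like fraction being much smaller than the drug-like one.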

This study brings to light that there are compounds that, for unknown reasons, are simply inactive in many biological assays. Clearly, robust database software with assay analysis and data visualization capabilities, like those found in CDD Vault and CDD Vision, helps to rapidly identify activity cliffs. Yet, with any tool, perhaps the darkness associated with these compounds can be better understood over time as a function of the type of assay, the target pursued, and the biological assay conditions.

Who is to say that one of these dark compounds wouldn’t be active in an assay run by some other research group or company? As robust as this study is, it is still limited to Novartis’ and NIH’s chemical and biological space, and while both are undeniably big, they are still finite. In science, each question answered suggests multiple follow-on questions.

In this context, collaborative software that lets different members of a project team interrogate the data from complementary perspectives may shine light on the darkness.



Gilbert M. Rishton. Reactive compounds and in vitro false positives in HTS. Drug Discovery Today 1997, Vol. 2, pp. 382–384

Simon Saubern, Rajarshi Guha, and Jonathan B. Baell. KNIME Workflow to Assess PAINS Filters in SMARTS Format. Comparison of RDKit and Indigo Cheminformatics Libraries. Molecular Informatics 2011, Vol. 30, pp. 847–850

Anne Mai Wassermann, Eugen Lounkine, Dominic Hoepfner, Gaelle Le Goff, Frederick J King, Christian Studer, John M Peltier, Melissa L Grippo, Vivian Prindle, Jianshi Tao, Ansgar Schuffenhauer, Iain M Wallace, Shanni Chen, Philipp Krastel, Amanda Cobos-Correa, Christian N Parker, John W Davies, & Meir Glick. Dark chemical matter as a promising starting point for drug lead discovery. Nature Chemical Biology 2015, Vol. 11, pp. 958–966


Appendix: Analyses of the Dark Chemical Matter Compound Collection by physicochemical properties, including Rule of 5, Rule of 3, PAINS, and reactivity filters:


The following is an overview of the DCM compound collection:


The following is an analysis from a Lipinski’s rule-of-5 perspective:


The following is an analysis from a lead-like rule-of-3 perspective:


An analysis determining compounds that failed PAINS filters: 2,107 (1.5%)


Analysis from a general medicinal chemistry perspective (red flags):

