PubChem is best known as an archive, but it's also a knowledge base.
PubChem is helpful for researchers who are interested not just in chemical information but also in the known biological activities of a particular compound.
The data in PubChem is integrated with the related information researchers are looking for: genes, genomes, and literature, as well as physical properties and toxicity data.
Literature, basic chemical data, and biological data for a huge base of chemicals are all searchable via PubChem.
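As an illustration of what "searchable" can look like in practice, here is a minimal sketch of a compound lookup through PubChem's PUG REST interface. The function names (`property_url`, `parse_properties`, `fetch_properties`) and the choice of properties are assumptions made for this example, not something prescribed by PubChem.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name, properties=("MolecularFormula", "MolecularWeight", "IUPACName")):
    """Build a PUG REST URL that looks up properties for a compound by name."""
    return f"{PUG_REST}/compound/name/{quote(name)}/property/{','.join(properties)}/JSON"


def parse_properties(payload):
    """Pull the list of property records out of a decoded PUG REST response."""
    return payload.get("PropertyTable", {}).get("Properties", [])


def fetch_properties(name):
    """Query PubChem over the network (hypothetical helper, needs internet access)."""
    with urlopen(property_url(name), timeout=30) as response:
        return parse_properties(json.load(response))
```

Calling `fetch_properties("aspirin")` needs network access and should return a list of dictionaries, one per matching compound record.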
ChEMBL is a manually curated database of bioactive molecules.
Many of these are related to drug discovery, which is where much of the data originates.
The database covers several types of bioactive molecules, including small molecules, peptides, and therapeutic antibodies.
ChEMBL applies a certain amount of bespoke curation to enhance the quality of the information within the database.
Additionally, ChEMBL shares data with resources such as PubChem.
The sharing and accessibility of data are key.
Otherwise, researchers end up in the situation where they keep reinventing the wheel and wasting time, instead of making more discoveries in an expedited fashion.
ChEMBL contains data about molecules and their activities against biological targets.
So, within the ChEMBL database, it’s possible to ask a question such as, “show me all of the bioactive molecules you have against this particular protein, or indeed against this family of proteins.”
Then, using the data about those proteins, which is often reported in publications and documents, ChEMBL will display the requested information.
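The "show me all of the bioactive molecules against this protein" question can also be asked programmatically. Below is a minimal sketch against ChEMBL's public REST web services; the helper names are hypothetical, `CHEMBL203` is used purely as an example target identifier, and the sketch reads only the first page of results rather than following pagination.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CHEMBL_API = "https://www.ebi.ac.uk/chembl/api/data"


def activity_url(target_chembl_id, standard_type="IC50", limit=100):
    """Build an activity query for one target, filtered by measurement type."""
    query = urlencode({
        "target_chembl_id": target_chembl_id,
        "standard_type": standard_type,
        "limit": limit,
    })
    return f"{CHEMBL_API}/activity.json?{query}"


def molecules_from_activities(payload):
    """Return the sorted, de-duplicated molecule IDs found in an activity response."""
    activities = payload.get("activities", [])
    return sorted({a["molecule_chembl_id"] for a in activities if a.get("molecule_chembl_id")})


def fetch_molecules(target_chembl_id):
    """Query ChEMBL over the network (first page only; needs internet access)."""
    with urlopen(activity_url(target_chembl_id), timeout=30) as response:
        return molecules_from_activities(json.load(response))
```

For example, `fetch_molecules("CHEMBL203")` would return the molecule identifiers with reported IC50 measurements on the first page of results for that target.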
That opens the door to follow-up questions about those molecules.
A researcher can ask questions such as, “What are the selectivities of the targets? What is the status of these molecules? Are they in clinical trials? Are they marketed drugs?”
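Once the data is in hand, follow-up questions like these can be turned into small helper functions. The following is a minimal sketch, assuming molecule records carry a `max_phase` field (as ChEMBL molecule records do) and that potency is expressed as a pChEMBL value, a negative log10 of molar potency. The phase-to-label mapping and the function names are illustrative assumptions, not ChEMBL's own vocabulary.

```python
def clinical_status(max_phase):
    """Map a max_phase value to a coarse status label (assumed mapping)."""
    if max_phase is None:
        return "preclinical or unknown"
    phase = float(max_phase)  # values may arrive as numbers or numeric strings
    if phase >= 4:
        return "marketed drug"
    if phase > 0:
        return "in clinical trials"
    return "preclinical or unknown"


def fold_selectivity(pchembl_on_target, pchembl_off_target):
    """Fold selectivity between two targets from log-scale potency values.

    A difference of one log unit corresponds to a tenfold potency difference.
    """
    return 10 ** (pchembl_on_target - pchembl_off_target)
```

With these definitions, a molecule with a pChEMBL of 8.0 on the intended target and 6.0 on an off-target would be about 100-fold selective.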
In PubChem, researchers can use the data view to gather information about a biological target.
If a researcher is interested in a particular target, there's no sense in having to look at thousands of different sets of experiments that have been run.
Instead, with data view, everything is aggregated into a single page, and that content can be downloaded.
It becomes more actionable.
Additionally, instead of just viewing one possible target, with aggregator-based pages it’s possible to look at a set of targets.
For example, a researcher may not be interested in just a single GPCR, but rather in all GPCRs, or not in a single potassium ion channel receptor but in all potassium ion channel receptors.
The aggregator pages allow for these broad questions to be asked.
They allow researchers to get access to the content they need, download it, and then do something more with it in their own research.
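The "do something more with it" step often starts with regrouping downloaded records by target family. Here is a minimal sketch of that idea using plain Python. The record layout, the `aggregate_by_family` helper, and the compound identifiers are hypothetical and do not reflect PubChem's actual download format; the gene symbols are real, but the family assignment is supplied by hand.

```python
from collections import defaultdict

# Hand-written lookup from gene symbol to target family (illustrative only).
TARGET_FAMILY = {"ADRB2": "GPCR", "DRD2": "GPCR", "KCNH2": "potassium channel"}

# Hypothetical downloaded records: one row per tested compound and target.
EXAMPLE_RECORDS = [
    {"target": "ADRB2", "compound": "cmpd-1"},
    {"target": "DRD2", "compound": "cmpd-2"},
    {"target": "DRD2", "compound": "cmpd-1"},
    {"target": "KCNH2", "compound": "cmpd-3"},
]


def aggregate_by_family(records, target_family):
    """Group records by target family, collecting targets and compounds per family."""
    summary = defaultdict(lambda: {"targets": set(), "compounds": set()})
    for record in records:
        family = target_family.get(record["target"])
        if family is None:
            continue  # skip targets with no assigned family
        summary[family]["targets"].add(record["target"])
        summary[family]["compounds"].add(record["compound"])
    return {
        family: {"targets": sorted(v["targets"]), "compounds": sorted(v["compounds"])}
        for family, v in summary.items()
    }
```

Calling `aggregate_by_family(EXAMPLE_RECORDS, TARGET_FAMILY)` gives one entry per family, which makes a question about "all GPCRs" a single lookup rather than a page-by-page search.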
How do we get to the point where we can use these high-powered tools and large databases to answer questions through open data? It’s a lot of work, and there are tough questions and current barriers that we have to work through to ensure future success in this area. Being able to innovate and speed up this kind of drug discovery is one of our goals, and it’s important that we continue to drive forward in making drug discovery data more widely available.
This blog is authored by members of the CDD Vault community. CDD Vault is a hosted drug discovery informatics platform that securely manages both private and external biological and chemical data. It provides core functionality including chemical registration, structure activity relationship, chemical inventory, and electronic lab notebook capabilities.