CDD’s informatics research wing has been busy this summer. In collaboration with the Bioassay Ontology Project at the University of Miami, and with funding from an NIH SBIR Phase I grant, CDD has prototyped a new tool for annotating bioassay data. Our method is described in full in a paper just published in PeerJ: “Fast and accurate semantic annotation of bioassays exploiting a hybrid of machine learning and user confirmation.”
Why is annotation important? Fully and accurately annotated data enables “intelligent” searches. Instead of being limited to simple text searches, researchers can query annotated data for complex concepts. For example, one could search for compounds that activate diabetes-associated kinases and have also been tested in cellular growth assays, or for inhibitors identified through luciferase-based screens. Searches like these are a key area of research for the BARD (BioAssay Research Database) project.
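To make the idea concrete, here is a toy Python sketch of how annotations turn a question like that into simple set operations. It is not CDD’s or BARD’s implementation: the term labels, assay IDs, and compound data are all hypothetical, “implicated in diabetes” is modeled as a plain assay annotation, and “tested in” is simplified to “reported active in.”

```python
# Toy model of annotation-driven search. Term labels are HYPOTHETICAL stand-ins
# for ontology concepts, not real BioAssay Ontology identifiers.
ASSAYS: dict[str, set[str]] = {
    "A1": {"target: kinase", "disease: diabetes", "mode: activation"},
    "A2": {"assay format: cell growth"},
    "A3": {"mode: inhibition", "detection: luciferase"},
    "A4": {"mode: inhibition", "detection: fluorescence"},
}

# Assays in which each compound was reported active (hypothetical data).
ACTIVE_IN: dict[str, set[str]] = {
    "C1": {"A1", "A2"},
    "C2": {"A1"},
    "C3": {"A3"},
    "C4": {"A4"},
}


def assays_with(terms: set[str]) -> set[str]:
    """IDs of assays annotated with every one of the given terms."""
    return {aid for aid, ann in ASSAYS.items() if terms <= ann}


def compounds_active_in(assay_ids: set[str]) -> set[str]:
    """Compounds reported active in at least one of the given assays."""
    return {cid for cid, aids in ACTIVE_IN.items() if aids & assay_ids}


# (activates a diabetes-linked kinase AND seen in a cell growth assay)
# OR (inhibitor found in a luciferase-based screen)
kinase_activators = compounds_active_in(
    assays_with({"target: kinase", "disease: diabetes", "mode: activation"})
)
growth_assay_hits = compounds_active_in(assays_with({"assay format: cell growth"}))
luciferase_inhibitors = compounds_active_in(
    assays_with({"mode: inhibition", "detection: luciferase"})
)

result = (kinase_activators & growth_assay_hits) | luciferase_inhibitors
print(sorted(result))  # ['C1', 'C3']
```

A plain text search for “kinase” or “luciferase” could not express the AND/OR structure of this query; once assays carry consistent annotations, it reduces to subset tests, intersections, and unions.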
Why do we need a tool for annotating data? Annotating bioassay data well is laborious. Each bioassay has dozens to hundreds of key pieces of information that would ideally be captured through annotation, such as the target, instrumentation, species, and controls. Capturing all of this in an organized way can take hours, a substantial barrier for busy scientists.
How will CDD’s annotator tool help? Scientists enter a text description of a bioassay into the tool, which uses natural language processing and machine learning to suggest annotations. Scientists approve each suggested annotation with a click, and can search for or manually select annotations that the algorithm has trouble finding. While complete automation might be ideal, this hybrid approach, with a human confirming every annotation, helps ensure that the annotations are accurate. In tests of the tool, researchers were able to annotate their complex data in minutes.
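As a rough illustration of the suggest-then-confirm loop, here is a minimal Python sketch. It is not the method from the PeerJ paper: it swaps in simple keyword matching against a hypothetical lexicon where a real tool would use trained natural language processing and machine learning models. The lexicon, the term labels, and the `suggest`/`annotate` functions are invented for this example, and the `confirm` callback stands in for the scientist’s click.

```python
import re
from typing import Callable, Sequence

# HYPOTHETICAL lexicon linking each candidate annotation to trigger words.
# A real tool would learn such associations from previously annotated assays.
LEXICON: dict[str, set[str]] = {
    "detection: luciferase": {"luciferase", "luminescence", "luminescent"},
    "target: kinase": {"kinase", "phosphorylation"},
    "organism: human": {"human", "hek293", "hela"},
    "assay format: cell-based": {"cell", "cells", "cellular"},
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def suggest(
    description: str, lexicon: dict[str, set[str]] = LEXICON
) -> list[tuple[str, int]]:
    """Score each term by how many of its trigger words occur; best first."""
    tokens = tokenize(description)
    scored = [(term, len(tokens & words)) for term, words in lexicon.items()]
    hits = [(term, score) for term, score in scored if score > 0]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(hits, key=lambda ts: (-ts[1], ts[0]))


def annotate(
    description: str,
    confirm: Callable[[str], bool],
    manual: Sequence[str] = (),
) -> list[str]:
    """Keep the suggestions the user confirms, then add manually chosen terms."""
    accepted = [term for term, _ in suggest(description) if confirm(term)]
    for term in manual:  # terms the algorithm missed, picked by the user
        if term not in accepted:
            accepted.append(term)
    return accepted


if __name__ == "__main__":
    desc = "Cell-free luciferase assay measuring kinase inhibition."
    print(suggest(desc))
    # Simulated scientist: rejects one misleading suggestion, adds a missed term.
    print(annotate(
        desc,
        confirm=lambda term: term != "assay format: cell-based",
        manual=["assay format: biochemical"],
    ))
```

The “Cell-free” description shows why the human step matters: the token “cell” makes the scorer suggest a cell-based format, the simulated scientist rejects it, and then manually adds the biochemical format that the scorer had no way to find.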
I’m in! When will this tool be available? We look forward to building fast and accurate annotation into the CDD Vault to help researchers make the most of their own private data and mine publicly available data. In the meantime, we’d love to collaborate with you to get your data annotated as soon as possible. Interested? Contact us. More data annotation is good for everyone!
This blog is authored by members of the CDD Vault community. CDD Vault is a hosted drug discovery informatics platform that securely manages both private and external biological and chemical data. Its core functionality includes chemical registration, structure-activity relationship (SAR) analysis, chemical inventory, and electronic lab notebook capabilities!
CDD Vault: Drug Discovery Informatics your whole project team will embrace!