CDD Blog

Human Thoughts on ChatGPT - Part 1

Written by Brian Moore | Feb 23, 2023 1:00:29 PM


Barry Bunin, PhD
Founder & CEO
Collaborative Drug Discovery

Can ChatGPT help our mission of drug discovery? 

My youngest kid introduced me to ChatGPT for the first time by asking the algorithm to write a Flappy Bird-inspired game in Python and a rap about logarithms. My oldest kid then complained that it wasn't a rap, it was just a poem.

As a scientist, I try to look at the scope, limitations, and possibilities of every new technology as objectively as possible. Here are my thoughts on how to combine the best of ChatGPT-type automation and human intelligence. I intentionally did not write this article with ChatGPT, but I provide the algorithm's take as a P.S. (for Kellan):

To the extent that ChatGPT and similar AI technologies help create better software and better methodologies for exploring the vast realms of data, all scientists can become more efficient. When scientists are more efficient, especially within the realm of drug discovery, the whole world benefits.

We have always looked for ways to employ the newest technology to make life easier for the researchers we support. Within our flagship platform, CDD Vault, our developers were early adopters of high-quality test-driven development (TDD) as espoused by Rob Mee. In addition to more than 10,000 automated tests and more than 4 lines of test code for every line of production code, pair programming is recommended for difficult problems to keep quality high, as in the Pivotal Process (Pivotal is now part of VMware).

Software Quality: GitHub has Copilot, which "uses the OpenAI Codex to suggest code and entire functions in real-time right from your editor." There is a recursive element here: programmers train the algorithm, and the enhanced algorithm in turn trains the programmers. I read an interesting story of an entrepreneur who created an algorithm to optimally teach people new languages. His software would prompt the language learner to relearn new terms just before they would be forgotten, lengthening the time between prompts with every cycle so that each term was remembered for longer and the rate of learning was maximized. After successfully building the language learning company, the entrepreneur turned the algorithm around and used its cadence to optimally train his own brain. While it limited his social utility, the process optimized the rate at which his brain could learn. It'll be interesting to see how the algorithms and developers rapidly co-evolve for software development. The rate at which software improves will have a multiplier effect for the scientists who use the software most intelligently, with these new algorithms and other tools.
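The language-learning cadence described above is usually called spaced repetition. Here is a minimal sketch in Python of how such a schedule could work. It is not the entrepreneur's actual algorithm: the `Card` class, the doubling `growth` factor, and the reset-to-one-day rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Card:
    term: str
    interval_days: float = 1.0  # wait before the next prompt (assumed starting value)


def review(card: Card, remembered: bool, growth: float = 2.0) -> Card:
    """Return a copy of the card with its next review interval updated."""
    if remembered:
        # Successful recall: lengthen the gap before the next prompt.
        next_interval = card.interval_days * growth
    else:
        # Forgotten: start over with a short gap.
        next_interval = 1.0
    return Card(card.term, next_interval)


def schedule(term: str, outcomes: list[bool]) -> list[float]:
    """Replay a sequence of review outcomes and list the interval after each one."""
    card = Card(term)
    intervals = [card.interval_days]
    for remembered in outcomes:
        card = review(card, remembered)
        intervals.append(card.interval_days)
    return intervals


print(schedule("logarithm", [True, True, True, False, True]))
# prints [1.0, 2.0, 4.0, 8.0, 1.0, 2.0]
```

Each successful recall doubles the wait, so a well-remembered term is prompted less and less often, while a forgotten term drops back to daily review.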

Reliable Data at Scale: With The Pharmaceutical KnowledgeBase, we have been using OpenAI's GPT-3 (embedding an enhanced version of their most capable model for classification) since well before the big ChatGPT splash. We are harnessing the power of ChatGPT as a semantic data product to power drug discovery at the scale of automation technology, identifying correct, unique identifiers. The goal is to enable more precise and more accurate models by having standardized, correct Company, Drug, and Disease (CDD) data under the PharmaKB hood.
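To make the idea of mapping messy names onto correct, unique identifiers concrete, here is a small Python sketch. It is not PharmaKB's implementation and does not call any OpenAI API. The character-trigram `embed` function is a toy stand-in for a learned embedding, and the `DRUG-00x` identifiers and the `CANONICAL` table are hypothetical. Only the general pattern is illustrated: embed the mention, then pick the closest canonical record.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a learned embedding: character trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical canonical records: identifier -> standardized name.
CANONICAL = {
    "DRUG-001": "acetylsalicylic acid",
    "DRUG-002": "acetaminophen",
    "DRUG-003": "ibuprofen",
}


def normalize(mention: str) -> str:
    """Return the identifier whose standardized name is most similar to the mention."""
    query = embed(mention)
    return max(CANONICAL, key=lambda ident: cosine(query, embed(CANONICAL[ident])))


print(normalize("Ibuprofen"))              # prints DRUG-003
print(normalize("acetaminophen tablets"))  # prints DRUG-002
```

A real system would swap the trigram vectors for embeddings from a language model and would likely add a similarity threshold so that unfamiliar names are flagged for review instead of being forced onto the nearest record.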

Disclaimer: CDD supports the Bill and Melinda Gates Foundation TB Drug Accelerator (collaborations are public information), with collaborations going back to 2008, and I personally have respected Bill Gates's thought leadership since he wrote the 1995 classic The Road Ahead. I was an early adopter of LinkedIn, respected Reid Hoffman's leadership in setting up the social network, and acquired LinkedIn stock early on (now MSFT stock). Scientists have been able to edit the Office documents that everyone uses (such as Microsoft Excel, Word, and PowerPoint) directly within CDD Vault since 2020, when we announced our partnership providing direct integration of Microsoft Office 365 and the CDD Vault Electronic Laboratory Notebook (ELN) as part of the Microsoft Cloud Storage Partner Program.

P.S.  For comparison, here is what ChatGPT wrote when queried with "Write me an article on how ChatGPT can help our Collaborative Drug Discovery software team better develop our flagship CDD Vault software platform?"

ChatGPT:  Collaborative drug discovery is a complex process that requires an immense amount of data processing and analysis. To make this process more efficient, the software team at CDD Vault can leverage the power of artificial intelligence and natural language processing, with the help of ChatGPT, a cutting-edge language model trained by OpenAI.

ChatGPT can provide valuable support to the CDD Vault software development team in several ways. Here are a few examples:

  1. Natural Language Processing: One of the primary benefits of using ChatGPT is its ability to process natural language. This means that the team can use ChatGPT to analyze the vast amounts of unstructured data that are generated during drug discovery. ChatGPT can help to extract important information from unstructured data such as scientific articles, lab notes, and other relevant sources. By using natural language processing, ChatGPT can help the team to better understand the data and make informed decisions.
  2. Machine Learning: Machine learning is another area where ChatGPT can be incredibly helpful. The CDD Vault software development team can use machine learning algorithms to analyze large datasets and make predictions based on patterns and trends that the system has learned from the data. ChatGPT can help to train these machine learning algorithms by providing context and understanding of the data. This can be particularly useful in drug discovery, where small differences in molecular structures can have significant impacts on the final outcome.
  3. Chatbot: ChatGPT can also be used as a chatbot to help streamline communication within the software development team. For example, the team can use ChatGPT to answer common questions related to the CDD Vault platform, such as how to use certain features, or how to troubleshoot common issues. This can save time and resources for the team and help them to focus on more complex problems.
  4. Semantic Search: Semantic search is another feature that ChatGPT can provide to the CDD Vault software team. By using natural language processing and machine learning, ChatGPT can help to improve search functionality within the platform. This can help users to find relevant information quickly and easily, leading to a more efficient and streamlined workflow.

In conclusion, the software development team at CDD Vault can benefit greatly from incorporating ChatGPT into their development process. By leveraging the power of natural language processing, machine learning, chatbot functionality, and semantic search, the team can improve their ability to analyze and interpret complex data, and streamline communication and workflows. With these tools at their disposal, the CDD Vault software team can make significant strides in advancing drug discovery and development.

P.P.S.  I asked ChatGPT to write a scientific abstract on one of our active areas of research - it happened to be on our deep learning and generative models for drug discovery. It wasn't perfect. However, the thing that surprised me was not how bad it was, but that it was better than I expected.

Barry A. Bunin, PhD, is the Founder & CEO of Collaborative Drug Discovery, which provides a modern approach to drug discovery research informatics trusted globally by thousands of leading researchers. The CDD Vault is a hosted biological and chemical database that securely manages your private and external data.

 

1.  Question to ChatGPT (on 18 January 2023):

Question: "Write me an abstract about generative and deep learning for drug discovery?"

Answer: Generative and deep learning methods have emerged as powerful tools in drug discovery, with the potential to significantly accelerate the development of new therapies. Generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), can be used to generate new molecular structures with desired properties, such as high binding affinity for a target protein. Deep learning techniques, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), can be applied to analyze large amounts of data from high-throughput screening experiments and predict the efficacy and toxicity of potential drug candidates. By leveraging these techniques, researchers can more efficiently identify promising compounds for further development, ultimately leading to the discovery of new and effective drugs.