    March 27, 2026

    Augmented Design: Practical Applications of LLMs and Agentic Workflows in Medicinal Chemistry

    A recap of the CDD Vault webinar featuring Jeff Blaney, PhD, Sr. Director of Discovery Chemistry at Genentech, and Garry Pairaudeau, PhD, CEO and Co-Founder of Dalton Tx

    CDD Vault hosted a panel discussion on the practical application of large language models (LLMs) and agentic workflows in medicinal chemistry. Jeff Blaney, PhD, Sr. Director of Discovery Chemistry at Genentech, and Garry Pairaudeau, PhD, CEO and Co-Founder of Dalton Tx, discussed where these tools are currently delivering value, where they fall short, and what the near-term trajectory looks like for drug discovery teams of all sizes.


    Where LLMs Are Currently Useful

    Blaney described a shift in his assessment of LLMs over approximately the past nine months. Initially, he viewed them as useful primarily for code generation and document summarization. His position changed when LLMs demonstrated the ability to retrieve relevant literature he was unaware of and to propose conceptually sound suggestions in structure-based design, a domain where he has deep expertise.

    These results came from general-purpose models with no access to proprietary internal data. Blaney characterized this capability as inference rather than design, and he was explicit that extrapolation to genuinely new chemical space remains limited. The models perform well within or near the training distribution, not outside it.

    Pairaudeau added that LLMs allow researchers to ask outcome-focused questions rather than manually configuring complex software pipelines. The practical value is in compressing the setup time for multi-step computational workflows, including generative chemistry, predictive modeling, scoring, and synthetic tractability assessment, from weeks to minutes. The computation itself still takes time; the setup does not.
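
    The orchestration pattern Pairaudeau describes can be sketched as a chained pipeline. Every step function below is a hypothetical placeholder for real tooling (a generative model, a property predictor, a retrosynthesis tool), not an actual API from the webinar; only the generate-predict-score-filter shape is taken from his description.

    ```python
    # Minimal sketch of an agent-assembled multi-step workflow.
    # All step bodies are illustrative stand-ins for real tools.

    def generate_candidates(seed_smiles):
        # Placeholder: a generative chemistry model would propose analogs here.
        return [seed_smiles + suffix for suffix in ("C", "O", "N")]

    def predict_properties(smiles):
        # Placeholder: a trained predictive model would score each structure.
        return {"smiles": smiles, "score": len(smiles) % 5}

    def assess_tractability(record):
        # Placeholder: a retrosynthesis tool would estimate makeability.
        record["tractable"] = record["score"] >= 2
        return record

    def run_workflow(seed_smiles):
        """Chain generation -> prediction -> tractability; return keepers."""
        candidates = generate_candidates(seed_smiles)
        scored = [predict_properties(s) for s in candidates]
        assessed = [assess_tractability(r) for r in scored]
        return [r for r in assessed if r["tractable"]]

    results = run_workflow("c1ccccc1")
    ```

    The point of the panel's framing is that an agent assembles and wires these steps from a natural-language request; the individual computations still take as long as they always did.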

    The Extrapolation Problem

    Both panelists identified extrapolation outside the training distribution as the central limitation of current ML and AI methods in medicinal chemistry. Blaney noted that the most widely used models show limited capacity to suggest truly novel molecules, which is the core requirement in drug discovery. Chemical space is effectively unbounded; any active project operates at least partially outside prior training data.

    Pairaudeau distinguished between ML models and LLMs on this point. ML models cannot extrapolate in any meaningful sense. LLMs, because they encode broader scientific context from the literature, can occasionally produce inferences that extend beyond the training distribution. Physics-based methods combined with ML offer another route out of the training domain. Neither approach fully solves the problem.

    For smaller organizations with limited internal data, Pairaudeau pushed back on the assumption that AI tools are only viable at scale. Local models built on as few as 10 to 20 project-specific compounds can outperform large general models for a given project, because the local data is in-domain. How well a model learns from in-domain data matters more than the sheer volume of data it was trained on.
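
    To make the "local model on 10 to 20 compounds" idea concrete, here is a minimal nearest-neighbor activity predictor. The descriptor vectors and pIC50 values are invented placeholders, and the webinar did not specify any particular model class; this only illustrates that a simple in-domain model can be built from a handful of project compounds.

    ```python
    import math

    # Hypothetical project-local data: each compound is a small descriptor
    # vector (e.g. MW/100, logP, H-bond donor count) with a measured pIC50.
    # All values are invented placeholders, not real assay data.
    local_compounds = [
        ([3.2, 2.1, 1.0], 6.8),
        ([3.4, 2.3, 1.0], 7.1),
        ([3.3, 1.9, 2.0], 6.5),
        ([4.1, 3.0, 1.0], 7.9),
        ([4.0, 2.8, 2.0], 7.6),
        ([2.9, 1.5, 3.0], 5.9),
        ([4.3, 3.2, 1.0], 8.0),
        ([3.8, 2.6, 2.0], 7.4),
        ([3.0, 1.8, 2.0], 6.2),
        ([4.2, 2.9, 1.0], 7.8),
    ]

    def predict_activity(query, k=3):
        """Predict pIC50 as the mean over the k nearest local compounds."""
        dists = sorted(
            (math.dist(query, desc), activity)
            for desc, activity in local_compounds
        )
        nearest = dists[:k]
        return sum(activity for _, activity in nearest) / len(nearest)

    pred = predict_activity([4.0, 2.9, 1.0])
    ```

    Because every training point comes from the same chemical series as the query, even this trivially simple model is operating in-domain, which is the property Pairaudeau argues matters most.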

    Agentic Workflows: Current State

    The panel discussed agentic AI not as a replacement for scientific judgment but as a system for orchestrating computational workflows and reducing the time scientists spend navigating software interfaces.

    Pairaudeau framed the value proposition around democratizing access to high-end computation, including molecular dynamics and quantum mechanics, by allowing domain experts to query these systems in natural language rather than learning specialized configuration procedures. The bottleneck shifts from "how do I run this tool" to "what does the output mean."

    Blaney noted that several internal groups at Genentech are building agents that connect internal data to LLMs for interactive analysis. He also described existing automated systems that route newly synthesized compounds into standard assay panels and flag outliers for repeat experiments without human intervention. He characterized this as a relatively simple form of decision automation that has been operational for years, distinct from the more complex agentic systems currently under development.
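
    The simple decision automation Blaney describes can be sketched as a rule that compares a new assay result against that assay's historical values and queues a repeat when it deviates too far. The threshold and the data below are illustrative assumptions, not Genentech's actual production logic.

    ```python
    from statistics import mean, stdev

    def flag_outlier(new_value, history, z_threshold=3.0):
        """Return True if new_value lies more than z_threshold standard
        deviations from the historical mean, i.e. queue a repeat run."""
        mu = mean(history)
        sigma = stdev(history)
        return sigma > 0 and abs(new_value - mu) > z_threshold * sigma
    ```

    For example, given prior readings of [5.0, 5.1, 4.9, 5.2, 4.8], a new result of 9.0 would be flagged for repeat while 5.1 would be accepted, with no human in the loop.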

    Both panelists agreed that as experiments become more expensive and higher-stakes, the required level of human oversight increases. Automated decision-making is appropriate for routine, low-cost in vitro assays. It is not appropriate for candidate selection or target identification, where errors in AI output carry substantial downstream consequences.

    Knowledge Capture and Cross-Project Learning

    Both panelists identified cross-project pattern retrieval as an underexplored use case for agentic LLM workflows. Blaney described a scenario common in drug discovery: a senior medicinal chemist with no direct exposure to a project recognizes a structural liability from prior work on an unrelated target and proposes a solution. Current informatics systems make it straightforward to retrieve data within a project but do not facilitate that kind of target-agnostic pattern retrieval across projects.

    He identified this as a near-term opportunity for agentic systems coupled with LLMs, given that LLMs already encode a substantial portion of the published literature and that agents with access to internal data could apply similar logic to proprietary datasets.

    Pairaudeau noted that work conducted within an agent-assisted workflow is inherently logged, creating an audit trail that future team members or systems can learn from. This positions agentic workflows as a foundation for organizational knowledge accumulation, not just individual productivity.


    Data Infrastructure as a Prerequisite

    Blaney raised a point both panelists endorsed: organizations cannot extract value from ML or AI tools without first establishing sound informatics infrastructure. Data must be machine-ready, consistently formatted, and well-curated before models or agents can operate on it effectively. He described this investment as less visible than AI development itself, but foundational to any downstream ML work. He referenced CDD Vault as one external partner Genentech uses to support that informatics layer.

    Pairaudeau added that AI methods are increasingly being applied to the data structuring problem itself, which reduces the manual effort required to get data into a usable state.

    Security and Accessibility

    On the question of whether agentic AI is viable for smaller organizations, Pairaudeau was direct: it does not require large-pharma resources. A private LLM deployment addresses most IP and security concerns. The primary caution he raised was against uploading sensitive target or biological data to public-facing general-purpose models.

    Blaney described Genentech's approach as preferring on-premises deployment where possible, with contractual and technical controls applied when using third-party cloud systems. Preclinical and clinical data carry different risk profiles, and the controls applied should reflect that distinction.

    What AI Should Not Be Trusted to Do

    When asked directly, Blaney identified two areas where expert review is mandatory before any action is taken:

    • Molecule selection for expensive experiments. Automating this decision without human review is not appropriate given current model reliability.

    • Target identification and validation. Colleagues working in target biology consistently report a high rate of hallucinations in this domain, which also carries the highest downstream cost per error.

    Pairaudeau offered a different framing: rather than asking what AI should not be trusted to do, the more useful question is whether any single source, human or AI, should be trusted without corroboration. The answer, consistent with how peer review functions in drug discovery generally, is no.
