“The other thing is, most of us scientists, we have some familiarity with computers, we use them all the time, we have smart phones, but we don’t want to know what’s going on behind the curtain computationally. We just want to use it. So it has to be user-friendly and intuitive. That’s what made Apple so successful – they didn’t invent any of this stuff, they just made it user-friendly. This is where CDD Vault stood out when I was doing an evaluation – it is user-friendly, it makes intuitive sense, it is easier for me to use. So I think this is going to be key going forward in this industry.”
Project Leader, OptiKira
President, RMK Drug Discovery Consulting
Rick Keenan is a senior-level drug discovery scientist with extensive global experience consulting for pharmaceutical and biotechnology companies, venture capital institutions, philanthropic disease foundations and academic research laboratories. Rick has held positions at GlaxoSmithKline and was a founding member of CEEDD at GSK (the Center of Excellence for External Drug Discovery). At the Microbiology, Musculoskeletal and Proliferative Disease Center for Excellence in Drug Discovery (MMPD CEDD) at GSK his work with a team of medicinal chemists resulted in the identification of the TPO receptor agonist eltrombopag (presently marketed as Promacta). In his current roles (managing the research collaborations for BioMotiv and acting as project leader for OptiKira’s discovery activities on novel ophthalmology kinase inhibitors), Rick recommends the use of CDD Vault for the management of multiple CRO projects occurring within virtual drug discovery companies. Rick recognizes the advantages that CDD Vault offers to the evolving, data-intensive business models of the pharmaceutical industry, which benefit from open science practices and are accelerating discovery.
CDD Advocate Dr. Shirley Louis-May spoke with Rick about his career-long promotion of open-source data policies in his many industry positions and about his use of CDD Vault for virtual pharma data management in particular.
SLM: Rick, in your long career and many roles in drug discovery, and currently as project leader for a virtual company OptiKira, you have been a champion of open innovation, novel data sharing initiatives and open source drug discovery. Can you tell me a bit about the strategic vision you have for current drug discovery efforts and how CDD Vault fits into that vision?
RMK: Sure. It became clear to me as I was carrying out various project leadership work for BioMotiv. You mentioned virtual companies; this is a prime example of a virtual company. I am the project leader sitting in Pennsylvania, the project manager is based in Cleveland, the principal scientists are out of University of California San Francisco and the University of Washington in Seattle. We must collaborate, work together, and we have to have a way of sharing our data. It became clear to me that this was a big gap, the data sharing, as we were trying to put together the research team and the research planning.
I was asked by BioMotiv to investigate various computational platforms. I had worked closely with Barry Bunin, Kellan Gregory, and the rest of the CDD team just a couple of years before on the malaria data project (at GSK) and so I immediately thought of CDD. I tested a few other data hosting platforms and none came close to the type of power and capability that CDD Vault had. So, I recommended to BioMotiv to use CDD Vault as a platform for all of the virtual companies they start up. I think at least one other company in the BioMotiv portfolio is using CDD Vault currently, and as more companies get started from BioMotiv, I am sure we will see even more users of CDD Vault.
SLM: What specific attributes were you looking for when you evaluated a number of computational data hosting platforms?
RMK: It had to be user-friendly, easy to use. It had to work on both Macintosh and Windows computers. The structure drawing had to be straightforward with the ability to do a substructure search and search not just on keywords and phrases but also on structural information – that was paramount. A coincidence that helped in the decision making was that Barry Bunin’s lab mate from graduate school, Bradley Backes, was one of the co-founders of OptiKira. He had also worked with CDD before and liked it, so having a couple of people on the scientific team with familiarity with CDD Vault, and who appreciated its attributes, made it an easy sell. I also really appreciated the support that Charlie Weatherall and his team gave us at the outset, like walking us through various ways to input data, get data back out, search – that sort of hand-holding at the outset was important for a number of the team members. Since I had some familiarity with CDD Vault, it was straightforward for me, but not for some others. Right now, most of the data entry is done by people in the program management group who might not be experienced chemists but can still easily use the system and generate the information that is necessary.
SLM: Are you using standard types of data or are there any more advanced types of data being input, like the need for curve calculations or quantifiable images?
RMK: We haven’t gotten that far yet in the research but we are going to work in that direction as we start doing animal studies. Specifically, we will need to be able to input the optical coherence tomography data in CDD Vault. I think that another way that CDD Vault has proven to be very useful is when we are undergoing an external diligence. Our effort in retinitis pigmentosa is attracting some attention from some potential licensors. There are some large companies working in this area that would like to look at our research capabilities and because our data is so well organized in our CDD Vault, it’s relatively straightforward to go through these diligence processes as a small company. We don’t have to spend a lot of time on organizing files and getting all the data together. It’s all right there in CDD Vault. As you go through several of these due diligence reviews, it could become a very time-consuming exercise if not for the ready availability of the data in a computational platform such as CDD Vault. So this is making our life easier, not just in managing the research, but also as we have to share our data both within the team and with potential licensors.
SLM: Since you have done due diligence in your consulting capacity for a number of companies, to what extent does CDD Vault fulfill the informational requirement that companies need to consider as they are storing their data, such as documentation of protocols and documenting the creation of novel ideas for compounds or assays?
RMK: Due diligence is a process by which scientists at another company have to gain confidence in the scientific results you are presenting to them. So you can show them PowerPoint slides and graphs, but there’s nothing like showing them the raw data. That really helps. That’s scientists talking to scientists. They want to see how you calculated the data, where the data came from, and they want to have confidence that you have done the experiments correctly. This is what CDD Vault allows you to do. You can show them as much detail as they need to gain that confidence because you can just keep clicking and clicking until you get down to the raw data. While some people are more satisfied earlier on in the process and some are more exhaustive, the amount of work from our end is no different. It’s all about giving them access to the data. We will be doing this two more times in the next couple of months, so we will see how it goes as we accumulate more data, but I honestly thought that this was going to be a lot more laborious than it’s actually turning out to be. Ironically, sharing the data is the easy part now. The hard part is scheduling the meetings with all the involved members at these companies, basically coordinating time on everyone’s calendar!
SLM: To that end, I know that within CDD Vault you are able to generate reports on the vault data. Some of the other CDD clients that I’ve talked to use the report generation tool to provide a lot of information prior to joint meetings. They also use a data notification system so that when new data is entered, that information automatically goes out to the team members so that in between meetings, there is information about the project flowing back and forth. Have the companies that you have worked with used any of these CDD Vault features?
RMK: Currently, our company is so small that we haven’t had a need for that. The companies I am familiar with in the Atlas Venture portfolio – I did some consulting work for Quartet Medicine, and Mark Tebbe and his group – Quartet Medicine is a small company. I think they only have four people so are living up to the name. Sharing data among four people was not that challenging. At OptiKira, it is me as the project leader and Glen Gaughan as the project manager. There’s just the two of us, and we have a bi-weekly meeting with the academic principal investigators. Two of them have a chemistry background and like to wade into CDD Vault and look at the raw data. The other two just want to hear updates and not dive into the Vault as much, so I just put together some updates on a bi-weekly basis and it’s good enough for them. I think we’re going to expand our little team in the next year to year and a half, and it might be in the near future when we start using more advanced features of CDD Vault.
SLM: Well, I can imagine that in the early phase of a company, you would want to see all the data as it goes in because you want to make sure that things are being recorded as intended and that any formatting issues that could lead to data errors are checked for before you start automating any process. So when it’s small and manageable manually, it’s probably fine. However, I know that CDD Support offers advice on how to automate and validate the processes once the throughput gets to be more than is comfortable handling manually. I think that’s definitely one of the advantages that the CDD team offers.
RMK: That’s a good point because different CROs all have their own kinds of computational platforms. A lot of the Chinese companies use software that would have made it more straightforward to get the chemistry data into the system that way. But we found that it just wasn’t as good as CDD Vault. It was kind of clunkier. I don’t know how to say that in a more technical way.
SLM: I think one of the real nice things about CDD Vault is that, even though it originated in a chemistry-centric format, it has really expanded the ways in which it describes the biological data. The chemistry data isn’t really going to change much but the biological data continually varies from simple numbers to plots to quantifiable images and all the associated data therein. So development of the biological data capabilities has been one of CDD’s real strengths.
RMK: Yes, we are eager to take advantage of all that as we move forward. We started our chemistry operations in April of last year, and didn’t test our first molecule until later in July or August, so we’ve only been inputting data for less than a year now. But it’s really worked out well for us.
SLM: Had you worked with CDD prior to these virtual company projects? Aside from the initial Vault that hosted the GSK malaria data, had you used CDD public data sets or consulted on CDD private Vault data sets for external collaborations in your prior roles at GSK?
RMK: No, GSK didn’t use CDD for those purposes. They used it a little bit for the shared tuberculosis data that Barry put on there as a result of CDD’s funding from the Gates Foundation, and GSK had some work underway in tuberculosis. Most of the projects in GSK were self-contained and didn’t have much data sharing. I think back in the day there was much more apprehensiveness around firewalls and security. I think that those concerns have all been managed well by CDD and other suppliers and I think that, the way the world has evolved, you have to share now. You could do more things completely in-house in the past and I don’t think that’s how life is these days in the pharmaceutical industry. You have to get out there and find ways to put data in a place where people from different locations can access it. Six years ago, when I was last there, there were internal databases in GSK where everyone could see the data and there was less of a need to go external but I think life has probably changed within GSK too now.
SLM: As a consultant working with some of the smaller companies and partnering those to larger pharmas, have you come across any other pharmas that you know of that are using hosted database solutions or any kind of data sharing technology?
RMK: I mentioned Quartet, the Atlas Venture company that I was consulting with, they recently completed a deal providing funding for their sepiapterin reductase program. I know it was very helpful to Mark and his small team for their data to have been hosted in CDD Vault. I was involved in a quarterly chemistry consulting arrangement as one of three chemistry consultants. Mark always had all the data at his fingertips. He would just host us for the afternoon, launch CDD Vault, and we could ask him any question that came to mind. He would just have the data results from a quick search right there for us. That helped sell it for me. I thought that the BioMotiv portfolio was very akin to the Atlas portfolio and that, as new portfolio companies cropped up that were to be run by very small teams on the management side (largely virtual), CDD Vault would be the ideal vehicle to enable this data sharing.
SLM: Have you looked at the visualization tools available within CDD Vault at all?
RMK: I have. I’ve looked at it in a prior project. I feel my need for use of it comes and goes. I understand it’s a data analysis tool where you can graph two dimensionally the relationship between two assays or properties and it was available for me to use for a couple of months. I created a couple of graphs and it highlighted some outliers that allowed us to determine that the data had been entered incorrectly, as micromolar that should have been nanomolar. So these CDD Vault visualization tools helped identify those instances. However, in my current projects, I don’t have access to the CDD Vault visualization tools.
SLM: Well again, since you are in the early stages of generating data, you might not be heavily into the data analysis yet, but currently, is your team using any independent software for statistical analysis and visualization?
RMK: No, we’re not using anything right now.
SLM: I was wondering because, in my experience, with any external data analysis package there is a need to connect your data to the analysis somehow, by importing files, resolving formats, defining variables and data types, etc. Only then are you set to go. As a former user of several other data analysis platforms, there were always continuity issues. The larger platforms had many more bells and whistles, not that often used in practice, but that drove up the price such that smaller companies that we collaborated with could not afford them. So our analyses were different and comparisons were harder to make. As well, with each new version upgrade of several popular applications, my data was recognized differently and I spent a lot of time exporting and importing older version data files into the newer versions just to be able to continue my longer term analyses.
I believe that one major advantage of the visualization tools within CDD Vault is that it knows your data already. You don’t have to define anything or translate your data to the new application. Another great advantage of these visualization tools is the seamless connection between the analysis application and the CDD Vaults. This connection allows you to analyze any portion of your data, as well as the data from any collaborators’ CDD Vaults you have access to, as well as the public data sets available in CDD Vault. You can easily & simultaneously search across all these available data sets with very advanced search tools and then subset the data that is returned as its own data set for future reference or analysis.
RMK: Yes. I like that it was all just a click of a button to go from seeing the data to visualizing it. I recall that it wasn’t very expensive so I’ll have to make the case for adding the CDD Vault visualization tools to our package.
SLM: I think Charlie would be an excellent source to show you how much more you can do with the CDD Vault visualization tools besides X-Y plots. It intuitively allows you to explore multi-parameter data in X and Y using color and size as additional dimensions of the data to help spot trends, identify subsets of interest or mark outliers.
In my opinion, the real power of the visualization tools is not so much in the generation of 2D plots but that you also have (on the same screen) a panel histogram analysis of all the various properties of the data you are showing. So while you are showing the data in 2-4 dimensions in the plot, the panel on the right shows you all of the possible dimensions of the data that you could be exploring. Additionally, the plots and the histograms are dynamic and interactive. If you select an area of the plot, the properties of the subset are highlighted in the histograms, and vice versa. Alternatively, the histograms allow you to filter what you see in your plot to property ranges of your choosing. So you have the control to more richly visualize and select subsets of the data that provide interest or allow you to formulate and explore hypotheses. For instance, if you want to look at compounds in a certain pH range while plotting two other properties, you can color the subset uniquely on a 2D plot of all the compounds or you can filter out all compounds outside of the selected pH range. It’s that interactive play between your property panel and your plot that really brings the power to the analysis. That’s really the genius of CDD Vault’s visualization analysis tools, as I see it.
RMK: Yeah – I am just playing a lot more with CDD Vault (right now, actually) as I am generating a report for a new team of chemists. The new chemistry team wants to know what we’ve done so far and where they can jump in, so I’m just going to create and send them the report. Then, I’ll show them CDD Vault and have a good open discussion. I think this will work out well.
SLM: This is precisely the kind of situation where a support member from CDD could assist you with that meeting and very expeditiously yet comprehensively navigate and visualize the data based on the questions the medicinal chemists may have. CDD’s Kellan Gregory, who has been so central to the development of the CDD Vault’s visualization analysis tools, is a data wiz. I love to watch him during product demos at CDD’s expo booth as people come up with all sorts of unique questions about the public data sets in CDD Vault. He can effortlessly pop a query result into a spreadsheet or create a plot which provides a really succinct way of answering their question. You are always free to ask for this kind of support from your CDD team.
SLM: In closing Rick, from your long history in the field, your positions in global pharma companies, from directing teams of medicinal chemists to much higher level strategic decision making, and then beyond in consulting for more discrete collaborations between academia and private institutions and for philanthropic organizations, can you comment on how you’ve seen the handling of data in the field evolve and witnessed the growth of data sharing as well as what you think is helping facilitate this movement towards a new model of science that can potentially accelerate the discovery of therapeutics?
RMK: I think that what has happened, overall, is that what is considered precompetitive is a moving bar. In the past, the IP concerns were such that people didn’t want to share anything and companies would patent just about everything. They patented gene sequences, which is no longer considered acceptable – nobody patents that anymore. They patented x-ray structures, which are not patented anymore. Now a lot of that stuff is shared. The Brookhaven database shares structures about a year after they are published, sometimes sooner. So now it’s felt that you can still compete in this business but that you compete somewhat further downstream and the early stuff is more shared.
The early efforts are very data intensive and you need to have vehicles that can handle these big data files that are not just text strings but are chemical structures, two-dimensional and three-dimensional chemical structures, gene annotations, and you need to be able to search on structure as well as search on text strings. The computational requirements have just gone up and up. More and more data is out there in the public domain (you have PubChem and ChEMBL, for instance); these are all publicly available databases. Now the challenge for the scientists is how to wade through all that data. How to pull out the nuggets from all the other stuff of less value, that is not as useful. So having better computational tools is just imperative these days.
The other thing is, most of us scientists, we have some familiarity with computers, we use them all the time, we have smart phones, but we don’t want to know what’s going on behind the curtain computationally. We just want to use it. (laughter) So it has to be user-friendly and intuitive. That’s what made Apple so successful – they didn’t invent any of this stuff, they just made it user-friendly. This is where CDD Vault stood out when I was doing an evaluation – it is user-friendly, it makes intuitive sense, it is easier for me to use. So I think this is going to be key going forward in this industry.
Virtual companies have been around a bit now, five or ten years, and I think the business model is here to stay. The CRO industry has evolved – when I started 30 years ago there were few chemistry CROs and now most companies rely exclusively or almost predominantly on having molecules made outside their four walls. Keeping track of all these data is extremely important. More and more biology CROs are coming on board and the chemistry CROs are moving into full service CROs, where they do not just perform chemistry synthesis but they’ll do the in vitro assay testing and the DMPK studies. All this is just commonplace now. It’s changed the field considerably.
So I think we are circling back. I think it comes down to being able to share data and being able to work across boundaries, whatever the company structures are. Companies are evolving. You can have completely virtual to completely in-house discovery efforts but you are still going to have to be able to share data. Because in the end, there are more and more handoffs in the drug discovery process. Where everything from soup to nuts used to be done in one company, now often an academic group discovers a target and hands it off to a small biotech company that discovers a small molecule that binds to the target and carries it to the point that it hands it off to a larger company to do the clinical development of that small molecule. Each time, you have to share data and show what you’ve done. You’ve got to hand off the output of your work to a new team of scientists and being able to do that in an easy way is par for the course. That’s just how business is now done.
This blog is authored by members of the CDD Vault community. CDD Vault is a hosted drug discovery informatics platform that securely manages both private and external biological and chemical data. It provides core functionality including chemical registration, structure-activity relationship, chemical inventory, and electronic lab notebook capabilities!
CDD Vault: Drug Discovery Informatics your whole project team will embrace!