3 Pitfalls Caused By Managing Your Drug Discovery Data In A Spreadsheet

Why Spreadsheets And Informatics Don’t Mix

“We had Excel sheets all over the place and data from different projects that were just separated in different folders.”
A scientist shared this with me recently. It's not the first time I have heard this sentiment.
The scientist went on to say:
“After a while, it got to the point that it was just hard to manage.”
This is a common mistake that many scientists make – storing and managing their data in insecure, hard-to-find spreadsheets.
While this approach might be fine for a lone scientist working in a vacuum, it is not a smart protocol for scientists who collaborate on drug discovery or other chemical and biological research that relies on storing, retrieving, and sharing large amounts of data.
Sharing documents with your lab-mates or co-workers via Excel or Google Docs is convenient but can be very insecure.
In science, especially in the drug discovery field, flawed data management can be catastrophic.
If you make one small typo in an address when emailing the Excel file or sharing it on Google Drive, your data can land with the wrong person, and you're in trouble.
If you share data in a way that violates government regulations, fail to back up critical data, or make serious data-entry mistakes, your career could be over.

3 Reasons Why Spreadsheets Fail Scientists

At best, scientists who rely on Excel files to manage scientific data and communicate results run the risk of operating inefficiently and wasting resources.
At worst, critical data becomes compromised, scientific innovation stalls, and the identification of new development candidates suffers.
Other negative outcomes of using spreadsheets to store and manage your data include…

  • Restricted access to your data
  • Reduced control over your data & reduced security
  • Less productive collaborations & longer design cycles

The New York Times recently reported that sharing data on general cloud-based platforms like Google Drive is exceptionally risky.
Wired Magazine also reported that storing data on a secure local server is not always ideal, because even the world's most secure local servers can be compromised without anyone noticing.
But if you can’t trust general cloud-based platforms or “secure” local servers to store your data, what can you trust?
Before answering this question, you need to understand just how dramatically using Excel or other spreadsheets limits you scientifically…

1. Restricted access to your data.

What is a spreadsheet and why do you or any scientist use it?
You might not have thought about it before, but spreadsheets are files that allow you to store and manage data. This sounds like a positive, right? What if you read it this way – spreadsheets are files that require you to store and manage data.
In other words, are spreadsheets requiring you to do more work than necessary?
For example, if you store your data in spreadsheets, you must keep a current copy of each file on hand at all times, keep it updated, and store it somewhere that both you and your colleagues can quickly and easily reach.
What does this mean?
It means that spreadsheets are not easily accessible, because YOU are the one who has to manage their accessibility manually.
Ask yourself: Are spreadsheets searchable?
No, not really. You cannot search them for value ranges, chemical structures, or structural similarity, and you certainly cannot search them on multiple complex criteria at once.
While spreadsheet files can hold simple tabular data for individual experiments, they cannot reveal relationships in data that spans multiple experiments, e.g. cross-reactivity across multiple assays, remaining batch inventory, or duplicate compounds.
Can your spreadsheet files provide dose response curves or Z statistics? The answer is most likely “no”.
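To make the cross-experiment point concrete, here is a minimal sketch, assuming the data lives in a single relational database instead of separate spreadsheet files. It uses Python's built-in sqlite3 module with a hypothetical two-table schema (compounds and results), made-up compound IDs and IC50 values, and hypothetical helper names (build_demo_db, cross_reactive, duplicate_structures). It is an illustration only, not how CDD Vault or any particular product works. One query asks a value-range question across assays (compounds below an IC50 threshold in at least two assays), and another flags duplicate registrations. Duplicates are detected here by exact SMILES string match only; true structure or similarity search would need a cheminformatics toolkit to canonicalize and compare structures.

```python
import sqlite3
from typing import Dict, List


def build_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with a hypothetical schema and made-up data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE compounds (id TEXT PRIMARY KEY, smiles TEXT NOT NULL);
        CREATE TABLE results (
            compound_id TEXT REFERENCES compounds(id),
            assay TEXT NOT NULL,
            ic50_um REAL NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO compounds VALUES (?, ?)",
        [
            ("CPD-1", "CCO"),
            ("CPD-2", "Oc1ccccc1"),
            ("CPD-3", "CCO"),  # same SMILES registered under a second ID
        ],
    )
    conn.executemany(
        "INSERT INTO results VALUES (?, ?, ?)",
        [
            ("CPD-1", "assay_A", 0.5),
            ("CPD-1", "assay_B", 0.8),
            ("CPD-2", "assay_A", 12.0),
            ("CPD-3", "assay_B", 0.3),
        ],
    )
    return conn


def cross_reactive(conn, max_ic50_um: float, min_assays: int) -> List[str]:
    """IDs of compounds with IC50 below max_ic50_um in at least min_assays distinct assays."""
    rows = conn.execute(
        """
        SELECT compound_id FROM results
        WHERE ic50_um < ?
        GROUP BY compound_id
        HAVING COUNT(DISTINCT assay) >= ?
        ORDER BY compound_id
        """,
        (max_ic50_um, min_assays),
    ).fetchall()
    return [row[0] for row in rows]


def duplicate_structures(conn) -> Dict[str, List[str]]:
    """Map each SMILES string registered more than once to its compound IDs.

    Exact string match only: two different SMILES for the same molecule are not caught.
    """
    rows = conn.execute(
        "SELECT smiles, id FROM compounds ORDER BY smiles, id"
    ).fetchall()
    groups: Dict[str, List[str]] = {}
    for smiles, cpd_id in rows:
        groups.setdefault(smiles, []).append(cpd_id)
    return {s: ids for s, ids in groups.items() if len(ids) > 1}


conn = build_demo_db()
print(cross_reactive(conn, max_ic50_um=1.0, min_assays=2))  # ['CPD-1']
print(duplicate_structures(conn))  # {'CCO': ['CPD-1', 'CPD-3']}
```

The specific SQL matters less than the design: because all results sit in one queryable store, a question that spans several assays becomes a single query rather than a manual comparison of separate files.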

2. Reduced control over your data & reduced security.

The more you can control your data, the more secure your data is.
When it comes to security, spreadsheets fail. This is because spreadsheet files can easily be forwarded to unauthorized people (intentionally or accidentally).
In addition, updates to your spreadsheet file are not propagated to all of your lab-mates or other users of the data. And, as mentioned above, it is not always easy to keep track manually of which spreadsheet is the most current version.
You might not realize this, but passing data files back and forth over email is insecure, as is cloud file sharing. This is true even if your university or institution uses a local server protected only by standard safeguards. In fact, according to Computerworld, a single hacking group claimed to have hacked more than 100 university servers, including those at Harvard, Stanford, and Penn.
Worst of all, spreadsheet files can be lost or accidentally deleted.
If you’re a scientist and this has happened to you in the past, then you know how devastating such a loss can be.

3. Less productive collaborations & longer design cycles.

Spreadsheets, even cloud-based ones, provide very little benefit when it comes to collaborating with other labs, especially those outside of your institution.
In science, time matters.
This is especially true in a scientific collaboration. If you share data with collaborators via spreadsheets, you will constantly be waiting for them to send updated data, and vice versa. All of this waiting delays the progress of the collaboration.
There are a variety of problems that scientists face when it comes to using spreadsheets for collaborations. For example, collaborators can accidentally use outdated data, wasting resources on old hypotheses.
Most importantly, scientists cannot collaborate with a spreadsheet file in real time.
Even cloud-based spreadsheets (assuming they were secure, which most are not) make sharing in real time cumbersome at best. A spreadsheet holds only the experimental data, so it does not support real-time collaboration on the analysis of that data, nor does it help scientists share and explore their conclusions together.
Using spreadsheet files creates communication bottlenecks that slow progress, because it is nearly impossible to keep everyone in sync with the latest data when you are sharing multiple spreadsheets with multiple scientists.
Sharing spreadsheets over email or a basic cloud-sharing platform is insecure, even if your university or institution uses a local server protected only by standard safeguards. Finally, spreadsheets are not searchable: you cannot search a Finder window on your computer for value ranges, chemical structures, similarity, or other criteria, let alone for multiple complex criteria at once. As a result, smart scientists must think beyond spreadsheets to secure their data, keep it accessible, and share it productively and safely.
Are you still using spreadsheets to manage your scientific data?
If so, you may be facing challenges similar to those outlined above.
This blog is authored by members of the CDD Vault community.
