CDD Blog

CDD Vault Update (November 2023)

Written by Admin | Nov 3, 2023 11:06:21 PM

Handling Non-Numeric Values in a Data Import File

Data files imported into CDD Vault often contain text values in columns that are mapped to numeric fields or readouts. Examples include “ND” for “Not Determined” and “N/A” for “Not Applicable”.

The CDD Vault Import Data Wizard will now report these rows as Suspicious Events in the QC Report. The user can then choose one of two options:

  • ACCEPT - the text values being mapped to a numeric destination will be “blanked out” and the rest of the data on these rows will be successfully imported
  • REJECT - none of the data on these rows will be imported

As a quick example, importing this data file and mapping the Inhibition column to a Protocol numeric readout definition …

… will result in a Suspicious Event and any row containing textual data will be REJECTED by default.

The default REJECT selection matches the old behavior, and no data from the affected rows will be imported.
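To make the two choices concrete, here is a minimal Python sketch of the behavior described above. It is only an illustration, not how CDD Vault implements the import: the function name, the "Molecule" column, and the use of float() to decide what counts as numeric are all assumptions.

```python
def resolve_numeric_rows(rows, numeric_column, decision="REJECT"):
    """Illustrative only: mimic the ACCEPT / REJECT choice for rows whose
    value in `numeric_column` is not numeric (for example "ND" or "N/A").

    Assumption: a value is "numeric" if Python's float() can parse it.
    CDD Vault's own parsing rules may differ.
    """
    imported = []
    for row in rows:
        try:
            float(row[numeric_column])
            imported.append(dict(row))        # numeric value: import as-is
        except (TypeError, ValueError):
            if decision == "ACCEPT":
                blanked = dict(row)
                blanked[numeric_column] = None  # blank out the text value
                imported.append(blanked)        # keep the rest of the row
            # REJECT (the default): nothing from this row is imported
    return imported


rows = [
    {"Molecule": "A", "Inhibition": "42.5"},  # "Molecule" is a made-up column
    {"Molecule": "B", "Inhibition": "ND"},
]
rejected_view = resolve_numeric_rows(rows, "Inhibition")            # row B dropped
accepted_view = resolve_numeric_rows(rows, "Inhibition", "ACCEPT")  # row B kept, blanked
```

With REJECT only the first row survives; with ACCEPT both rows are kept and the "ND" value is replaced by an empty value.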

API Endpoint for Structure Images

The GET Molecules API call has a new /image parameter that will retrieve the image of the registered Molecule.

GET …vaults/<vault_id>/molecules/<molecule_id>/image

  • runs as an async API call
  • returns an Export ID

GET …vaults/<vault_id>/exports/<export_id>

  • retrieves the image of the structure
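The two calls above can be chained in a small helper. The sketch below is hedged: get_json and get_bytes are hypothetical callables that you supply (they would add the base URL, which is elided in this post, plus your API credentials), and reading the Export ID from an "id" field is an assumption about the response shape. A real client may also need to wait until the export has finished before downloading it.

```python
def image_request_path(vault_id, molecule_id):
    # Path of the async call that requests the structure image.
    return f"vaults/{vault_id}/molecules/{molecule_id}/image"


def export_path(vault_id, export_id):
    # Path used to retrieve the finished export (the image).
    return f"vaults/{vault_id}/exports/{export_id}"


def fetch_structure_image(get_json, get_bytes, vault_id, molecule_id):
    """Request the image export, then download it.

    get_json(path)  -> parsed JSON of a GET request   (hypothetical helper)
    get_bytes(path) -> raw bytes of a GET request     (hypothetical helper)
    """
    export = get_json(image_request_path(vault_id, molecule_id))
    export_id = export["id"]  # assumption: the Export ID is returned as "id"
    return get_bytes(export_path(vault_id, export_id))
```

Passing the HTTP helpers in as arguments keeps the sketch independent of any particular HTTP library and makes it easy to try out with stand-in functions.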

New Parameter for GET Slurps API Endpoint

There is a new show_events parameter on the GET Slurps API endpoint that will show any row that generated a Suspicious Event or an Error. The details of the event are also included.

Once you've done the POST Slurps call, the next step is to use GET Slurps to check the status of the import. Including the new show_events parameter will add the details of any Suspicious Events and Errors to the JSON that is returned.

GET …vaults/<vault_id>/slurps/<slurp_id>

With JSON like this:

{"show_events":true}

The call now returns JSON that includes suspicious/ambiguous events and errors, something like this:

{
   "id": 1736694,
   "class": "slurp",
   "created_at": "2023-10-31T19:24:07.000Z",
   "modified_at": "2023-10-31T19:24:09.000Z",
   "state": "rejected",
   "api_url": "...vaults/<vault_id>/slurps/<slurp_id>",
   "total_records": 1.0,
   "records_processed": 1.0,
   "records_committed": 0.0,
   "ambiguous_events_count": 0,
   "suspicious_events_count": 0,
   "import_errors": [
       {
           "class": "batch identifier not found",
           "message": "Record rejected because no batch with External Identifier 'DoesNotExist' exists in your database."
       }
   ],
   "import_errors_count": 1
}
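If you are scripting imports, a response like the one above can be boiled down to a short summary. The following Python sketch only uses field names that appear in the example response; the summarize_slurp function and the shape of its return value are made up for illustration, and a real response may contain additional fields.

```python
import json


def summarize_slurp(response_text):
    """Pull the import state and error details out of a GET Slurps response."""
    slurp = json.loads(response_text)
    errors = slurp.get("import_errors", [])
    return {
        "state": slurp.get("state"),
        "records_committed": slurp.get("records_committed"),
        "error_messages": [f'{e["class"]}: {e["message"]}' for e in errors],
    }


# Trimmed copy of the example response shown above.
sample = """
{
  "id": 1736694,
  "class": "slurp",
  "state": "rejected",
  "records_committed": 0.0,
  "import_errors": [
    {
      "class": "batch identifier not found",
      "message": "Record rejected because no batch with External Identifier 'DoesNotExist' exists in your database."
    }
  ],
  "import_errors_count": 1
}
"""

summary = summarize_slurp(sample)
for line in summary["error_messages"]:
    print(line)
```

Using .get() with defaults means the same function also works on responses fetched without show_events, where the import_errors list may be absent.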

Easily Register Ambiguous Structures

Use the new duplicate_resolution parameter to register ambiguous OR structures (structures drawn with the OR enhanced stereo label). This provides a way to register a new Molecule (versus a new Batch of an existing Molecule) via the API.

Most CDD Vaults use the chemical registration system. In those Vaults, use the duplicate_resolution parameter with the POST Batch API call to register a new Molecule.

By default (when the parameter is omitted), a new Batch of the existing record is created. If more than one matching Molecule exists and the parameter is omitted, an error is returned and neither a new Batch nor a new Molecule is created.

Specify one of the following options when using this parameter:

  • first
    • "duplicate_resolution":"first"
    • results in a new Batch being registered for the first Molecule detected as a potential tautomer or duplicate
  • new
    • "duplicate_resolution":"new"
    • results in a new Molecule being registered
  • prompt
    • "duplicate_resolution":"prompt"
    • results in nothing being registered
    • matching molecule IDs are returned
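As a rough illustration of the options above, this Python sketch adds the parameter to a request body before it is sent. Only the "duplicate_resolution" key and its three values come from this post. The helper name, the placeholder payload, and placing the key at the top level of the JSON body are assumptions, so check them against the API documentation for your call.

```python
import json

ALLOWED_OPTIONS = ("first", "new", "prompt")


def with_duplicate_resolution(payload, option=None):
    """Return the JSON body for the request, optionally adding the parameter.

    payload: dict holding the rest of the request body (hypothetical here)
    option:  None to keep the default behavior, or "first" / "new" / "prompt"
    """
    body = dict(payload)  # copy so the caller's dict is not modified
    if option is not None:
        if option not in ALLOWED_OPTIONS:
            raise ValueError(f"duplicate_resolution must be one of {ALLOWED_OPTIONS}")
        body["duplicate_resolution"] = option
    return json.dumps(body)


# Placeholder payload; real registration fields are not shown in this post.
request_body = with_duplicate_resolution({"example_field": "example"}, "new")
```

Validating the option locally catches typos such as "News" before the request ever reaches the Vault.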

Helpful hint:

  • For Vaults that do not use the chemical registration system, use the duplicate_resolution parameter with the POST Molecule API call.
  • If this looks familiar, you are correct. These options were previously available for the tautomer_resolution parameter, which is no longer needed because the new parameter handles all forms of duplicates: tautomers, ambiguous stereocenters, and intentional duplicates.

This blog is authored by members of the CDD Vault community. CDD Vault is a hosted drug discovery informatics platform that securely manages both private and external biological and chemical data. It provides core functionality including chemical registration, data visualization, inventory, and electronic lab notebook capabilities.