In anything involving clinical trial data, two acronyms come up often. The first is CDISC, the Clinical Data Interchange Standards Consortium, a Texas-based nonprofit whose mission is to ‘amplify data’s impact.’ In a line of work that involves making sense of complex data, CDISC endeavors to help researchers and experts along.
The second is one of the organization’s many contributions to said endeavor: the SDTM or Study Data Tabulation Model. This standard is required for submitting clinical trial data to regulatory bodies such as the U.S. Food and Drug Administration (FDA) and Japan’s Pharmaceuticals and Medical Devices Agency (PMDA).
Here’s an in-depth look into the nuts and bolts of SDTM.
Before SDTM, the scientific community lacked a standard for preparing data for submission. Researchers would spend an unhealthy amount of time labeling domains and datasets instead of reviewing the data. Such delays led to clinical trials dragging on, delaying the introduction of lifesaving innovations to the public.
In 2004, the CDISC Submission Data Standards (SDS) team, comprising experts from contract research organizations (CROs) and the pharmaceutical industry, rolled out SDS version 3.1, later known as SDTM. It standardized the labeling of domains and their structures, making the once-tedious chore of identifying data points much more manageable. As of this writing, the current version is 2.0, released in November last year.
Domains And Datasets
While considered interchangeable in many instances, the terms ‘domain’ and ‘dataset’ have clear distinctions. As this CDISC resource puts it:
- A domain refers to observations with a common but specific topic collected in a typical clinical trial. Some examples include adverse events, lab test results, and medical history.
- A dataset is the structured data associated with a domain. Taking the examples above, a large domain such as laboratory test results can be split into several datasets based on size.
The close relationship between domains and datasets blurs the fine line dividing them, leading researchers to use one term for the other. Regulatory bodies are more concerned with the datasets, which calls for a clear way to identify them. That’s why they generally appear as abbreviations prescribed in the CDISC’s SDTM Domain Abbreviations code list. Below are several examples:
- Adverse events (AE)
- Subject visits (SV)
- Medical history (MH)
- Lab test results (LB)
- Vital signs (VS)
- Demographics (DM)
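The abbreviations above lend themselves to a simple lookup table. Here is a minimal Python sketch; the names `DOMAIN_CODES` and `describe_domain` are made up for illustration and aren't part of any CDISC tooling, and the table only covers the six codes listed in this article.

```python
# Illustrative lookup of the domain codes listed above.
# This is a hand-written table, not the full CDISC code list.
DOMAIN_CODES = {
    "AE": "Adverse events",
    "SV": "Subject visits",
    "MH": "Medical history",
    "LB": "Lab test results",
    "VS": "Vital signs",
    "DM": "Demographics",
}


def describe_domain(code: str) -> str:
    """Return the domain name for an abbreviation, ignoring case and spaces."""
    key = code.strip().upper()
    if key not in DOMAIN_CODES:
        raise KeyError(f"Unknown domain code: {code!r}")
    return DOMAIN_CODES[key]


print(describe_domain("vs"))  # Vital signs
```

A real project would more likely load the full code list from CDISC’s published terminology than hard-code it like this.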
Every dataset has rows of observations and columns of variables, each of which plays a role. The current SDTM version groups variables into five roles, namely:
- Identifier – contains details about the study, observable subjects, and sequence number
- Topic – specifies the nature or focus of the observation or study
- Timing – specifies the date, time, and duration of the observation
- Qualifier – adds further descriptions to the observation, which can be text or numeric
- Rule – indicates the executable (e.g., algorithm) to define trial design conditions
It isn’t unusual for some roles to appear more than once in a dataset. Nowhere is this more apparent than in Qualifier variables, primarily because of their five subclasses:
- Grouping – clusters observations within a specific domain
- Result – identifies the outcome relevant to the topic variable
- Synonym – specifies an alternate term for a specific variable
- Record – defines additional attributes of the observation record as a whole
- Variable – modifies or further describes a specific variable, such as the unit of a result
To put these variables into perspective, say you have the following observation: ‘Subject 109 had a body temperature (TEMP) of 36.2°C on 02NOV2022.’ SDTM guidelines call for splitting this observation into variables. In this case:
- Identifier: Subject 109
- Synonym: Body temperature
- Topic: TEMP
- Result: 36.2
- Variable: °C
- Timing: 02NOV2022
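The same breakdown can be sketched as a single record in Python. Treat this as an illustration under a few assumptions: the variable names (USUBJID, VSTESTCD, VSTEST, VSORRES, VSORRESU, VSDTC) follow the naming pattern commonly used for the Vital Signs domain and should be checked against the implementation guide, the subject identifier is simplified to "109", and the date is converted to ISO 8601 form, which SDTM timing variables are generally expected to use.

```python
from datetime import datetime

# Convert the date from the example sentence into ISO 8601 (YYYY-MM-DD).
raw_date = "02NOV2022"
iso_date = datetime.strptime(raw_date, "%d%b%Y").date().isoformat()

# Each entry maps an SDTM-style variable name to (role, value).
# Names are assumed from the Vital Signs (VS) naming pattern;
# values come straight from the example observation.
vs_record = {
    "USUBJID": ("Identifier", "109"),
    "VSTESTCD": ("Topic", "TEMP"),
    "VSTEST": ("Synonym Qualifier", "Body temperature"),
    "VSORRES": ("Result Qualifier", "36.2"),
    "VSORRESU": ("Variable Qualifier", "C"),  # degrees Celsius
    "VSDTC": ("Timing", iso_date),
}

# Print one line per variable: name, role, value.
for name, (role, value) in vs_record.items():
    print(f"{name:<9} {role:<19} {value}")
```

Keeping the role next to each value makes it easy to see that one observation yields exactly one Topic variable, while Qualifier variables can appear several times.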
Pros And Cons
As a standard, SDTM allows everyone in the healthcare sector to use the data, not just regulatory bodies. With future health crises uncertain, data sharing will be instrumental in understanding current and unknown diseases. It’ll be difficult, if not impossible, to do this if everyone has their own system of labeling datasets.
But if the number of bulleted points in this article is any indication, SDTM can feel like more work for the average researcher. Aside from the types of domains and variables, there are also hundreds of abbreviations for individual variables. As a result, research bodies have to invest in updating their data management systems to the SDTM format.
To put it simply, SDTM is part of how health research has become more profound in the 21st century, from how pineapple can help manage asthma to how much a new drug impacts the environment. No finding stays settled forever, and each one often prompts further research to build confidence in the results. With SDTM as the norm in clinical trial data submission, that further research is possible.