newblog: ADaM

Before the SDTM standard was developed, the typical scenario for creating clinical trial datasets was to take an extract from the database management system (DBMS), such as Oracle Clinical, prepare this extract as the submission tabulation files, and build the analysis datasets from the same extract.

Now that the SDTM standards are part of this development cycle, the salient question is: what will become the new typical scenario? Assuming that both tabulation datasets and analysis datasets, along with their associated documentation, will be submitted to a regulatory agency, there are at least four options for the development life cycle of these data, each with advantages and disadvantages. These are described below.

Parallel method
Retrospective method
Linear method
Hybrid method

Parallel method:

The advantages of this method are:

The development of the SDTM and ADaM datasets is independent, so each can be completed at any time without input from the other.
The creation of the SDTM can happen at the time of submission so there is no effort wasted if the clinical trial is either unsuccessful or not included in a submission.
The independence of the SDTM and ADaM datasets allows for parallel project teams to perform the extraction, transformation, and load (ETL) processes. This may be important for outsourced projects.
This method requires a minimum amount of re-engineering of existing processes within most pharmaceutical companies.

The disadvantages of this method are:

Documentation for the two sets of data shares no similarity, and parallel creation decreases efficiency.
Derivations of created variables in the analysis datasets do not reference variables in the SDTM. This creates a significant disconnect between the two sets of submission data.
The regulatory agency does not have the original DBMS extract with which to verify or explore derivations performed in the analysis datasets. Similarly, it would not have the DBMS-annotated CRFs needed to understand the original source of the derivations of the analysis variables.
Any analysis programs submitted to the agency as analysis-level documentation have limited value since the source data are not available.
Validation is necessary to ensure that similar records or variables in both SDTM and ADaM datasets are identical, such as indication of which record is considered baseline.
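As a rough illustration of the validation step in the last point, the sketch below compares baseline flags between an SDTM domain and an independently built analysis dataset. The records are invented, and it assumes that the two datasets can be matched on subject, test code and visit number; in practice the matching key would depend on how each dataset was built.

```python
# Hypothetical SDTM LB records; LBBLFL is the SDTM baseline flag.
sdtm_lb = [
    {"USUBJID": "S1-001", "LBTESTCD": "ALT", "VISITNUM": 1, "LBBLFL": "Y"},
    {"USUBJID": "S1-001", "LBTESTCD": "ALT", "VISITNUM": 2, "LBBLFL": ""},
    {"USUBJID": "S1-002", "LBTESTCD": "ALT", "VISITNUM": 1, "LBBLFL": "Y"},
]

# Hypothetical analysis records built separately; ABLFL is the ADaM baseline flag.
adlb = [
    {"USUBJID": "S1-001", "PARAMCD": "ALT", "AVISITN": 1, "ABLFL": "Y"},
    {"USUBJID": "S1-001", "PARAMCD": "ALT", "AVISITN": 2, "ABLFL": ""},
    {"USUBJID": "S1-002", "PARAMCD": "ALT", "AVISITN": 1, "ABLFL": ""},
]


def baseline_mismatches(sdtm_rows, adam_rows):
    """Return the keys whose baseline flag differs between the two datasets.

    Assumption: PARAMCD matches LBTESTCD and AVISITN matches VISITNUM.
    """
    sdtm_flags = {
        (r["USUBJID"], r["LBTESTCD"], r["VISITNUM"]): r["LBBLFL"]
        for r in sdtm_rows
    }
    mismatches = []
    for r in adam_rows:
        key = (r["USUBJID"], r["PARAMCD"], r["AVISITN"])
        if key in sdtm_flags and sdtm_flags[key] != r["ABLFL"]:
            mismatches.append(key)
    return sorted(mismatches)


mismatches = baseline_mismatches(sdtm_lb, adlb)
print(mismatches)
```

Here the second subject is flagged as baseline in one dataset but not in the other, which is exactly the kind of discord that the parallel method leaves open.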

Retrospective method:

The advantages to this method are:

The creation of the SDTM can happen at the time of submission so there is no effort wasted if the clinical trial is either unsuccessful or not included in a submission.
As enhancements to the SDTM standards are released, the analysis datasets are not affected and any enhancement can be represented during the creation of the SDTM.

The disadvantages of this method are:

The regulatory agency does not have the original DBMS extract with which to verify or explore derivations performed in the analysis datasets. Similarly, it would not have the DBMS-annotated CRFs needed to understand the original source of the derivations of the analysis variables.
Any analysis programs submitted to the agency as analysis-level documentation have limited value since the source data are not available.
Any date imputation or other types of hard coding performed during the creation of the analysis datasets would have to be undone since the SDTM represent the data as it was collected.
All CRF variables represented in the SDTM would need to be retained in the analysis datasets even if they are not used for analysis. This increases the complexity of documentation.
Validation is necessary to ensure that the SDTM adequately represent the original source data. This step could be difficult and result in a loss of efficiency.

Linear method:

The advantages of this method are:

Analysis programs submitted to the agency as analysis-level documentation utilize the SDTM domains as input and are thus useable and informative to the reviewer.
Using SDTM domains as input to analysis datasets allows for the standardization of analysis dataset structures and programming methods to produce study report summaries.
Traceability

The disadvantages of this method are:

The development of the analysis datasets relies on the completion of the SDTM domains.
The SDTM domains are created for all clinical trials regardless of whether they will be part of a submission.
It is potentially more difficult to manage if the data management and/or the biostatistics is outsourced.

Hybrid method:

With this method, the differences between SDTM Draft domains and SDTM Final domains are envisioned to be small.

The SDTM Final domains contain the subset of variables or records that are optimally created during the analysis or at the final stage of submission preparation.

An example is the creation of USUBJID. This variable is required in the SDTM and provides a unique key identifier for a given subject. In some situations, however, the creation of USUBJID cannot be defined until all studies are complete since a given subject may participate in multiple trials.

Other examples include the expected variable, present in all Findings domains, that indicates which data record is considered to be the baseline value (e.g. ‘EGBLFL’). Since these indicator flags would likely be derived in the analysis datasets (ADs), creating the Final SDTM domains retrospectively from the ADs prevents redundant derivation and eliminates the possibility of discord between the SDTM domain and the analysis dataset. Finally, population indicator variables, such as those for intent-to-treat or per-protocol status, can be optimally created in the AD and then placed in the supplemental qualifier domain.
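To make the baseline flag example more concrete, here is a minimal sketch in which the baseline decision is made once and then written to both the analysis dataset and the Final SDTM domain. The data are invented, dates are held as Python date objects rather than ISO 8601 text for simplicity, and the rule used (last non-missing result on or before the first dose date) is only an assumed convention; a real study would take its rule from the analysis plan.

```python
from datetime import date

# Draft SDTM EG records (hypothetical values); EGBLFL is not yet populated.
eg_draft = [
    {"USUBJID": "S1-001", "EGSEQ": 1, "EGTESTCD": "QTCF",
     "EGDTC": date(2017, 1, 3), "EGSTRESN": 401.0},
    {"USUBJID": "S1-001", "EGSEQ": 2, "EGTESTCD": "QTCF",
     "EGDTC": date(2017, 1, 9), "EGSTRESN": 405.0},
    {"USUBJID": "S1-001", "EGSEQ": 3, "EGTESTCD": "QTCF",
     "EGDTC": date(2017, 2, 1), "EGSTRESN": 412.0},
]

# Assumed to come from exposure data.
first_dose = {"S1-001": date(2017, 1, 10)}


def derive_baseline(rows, first_dose_by_subject):
    """Pick the last non-missing record on or before first dose (assumed rule).

    Returns the set of (USUBJID, EGSEQ) keys chosen as baseline.
    """
    latest = {}
    for r in rows:
        if r["EGSTRESN"] is None:
            continue
        if r["EGDTC"] > first_dose_by_subject[r["USUBJID"]]:
            continue
        key = (r["USUBJID"], r["EGTESTCD"])
        if key not in latest or r["EGDTC"] > latest[key]["EGDTC"]:
            latest[key] = r
    return {(r["USUBJID"], r["EGSEQ"]) for r in latest.values()}


baseline_keys = derive_baseline(eg_draft, first_dose)


def flag(row):
    return "Y" if (row["USUBJID"], row["EGSEQ"]) in baseline_keys else ""


# The analysis dataset carries the decision as ABLFL ...
adeg = [dict(r, ABLFL=flag(r)) for r in eg_draft]
# ... and the Final SDTM domain copies the same decision into EGBLFL.
eg_final = [dict(r, EGBLFL=flag(r)) for r in eg_draft]

print([r["EGBLFL"] for r in eg_final])
```

Because both flags come from the same set of keys, they cannot disagree, which is the point being made about avoiding redundant derivation.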

The advantages to this method are:

With a few possible exceptions, analysis programs submitted to the agency as analysis-level documentation utilize the SDTM domains as input and are thus useable and informative to the reviewer.
Using SDTM domains as input to analysis datasets allows for standardization of analysis dataset structures and programming methods to produce study report summaries.
The variables or records in the SDTM that need biostatistical input, such as indication of baseline records or creation of population flags, are created in harmony with the analysis datasets so there is no possibility of discrepancy.
If important, derived records can be added to the SDTM domains thus providing the reviewer with both CRF and analysis records.
Final completion of the SDTM domains can be done at the time of submission.

The disadvantages of this method are:

The development of the analysis datasets relies on the completion of the SDTM domains.
The SDTM domains are created for all clinical trials regardless of whether they will be part of a submission.
It is potentially more difficult to manage if the data management and/or the biostatistics is outsourced.

RECOMMENDATIONS:

Each organization will need to weigh the advantages and disadvantages of these methods when deciding on an implementation plan. For submissions that are prepared in the near future, several of the above methods may need to be used in tandem to accommodate both legacy data and ongoing studies.

But as CDISC standards are adopted within an organization, one would expect efficiencies to be gained if a single method were used for all new studies going forward. Weighing the advantages and disadvantages of each method above, the linear and hybrid methods are the most parsimonious and are the long-term solutions.

Clinical Data Interchange Standards Consortium (CDISC) defines and manages industry level data standards that are widely used during the analysis, reporting and submission of clinical data. For instance, the Study Data Tabulation Model (SDTM) is the submission data standard into which raw study data are mapped and collated. ADaM is a companion standard for use with analysis data and it is best practice to use SDTM data as the source for these datasets. Doing this allows for the easy documentation of any data processing with Define-XML, the CDISC standard for data definition files.

Being able to trace the flow from source values to derived ones is a clear intention of the ADaM standard, and that applies both to the structure of any datasets and to the required linkage to machine-readable metadata. It is also crucial that data are made analysis-ready, so that tables, listings and figures can be produced with currently available tools with little or no further data manipulation.
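One way to picture this traceability is a record-level link back to the source domain. The sketch below uses invented lab records and the ADaM variables SRCDOM, SRCVAR and SRCSEQ to point each analysis value at the SDTM record it came from; the helper names to_adlb and trace_source are made up for this illustration.

```python
# Hypothetical SDTM LB records.
lb = [
    {"USUBJID": "S1-001", "LBSEQ": 1, "LBTESTCD": "ALT", "LBSTRESN": 32.0},
    {"USUBJID": "S1-001", "LBSEQ": 2, "LBTESTCD": "ALT", "LBSTRESN": 41.0},
]


def to_adlb(lb_rows):
    """Build analysis records that keep a pointer to their SDTM source."""
    return [
        {
            "USUBJID": r["USUBJID"],
            "PARAMCD": r["LBTESTCD"],
            "AVAL": r["LBSTRESN"],
            "SRCDOM": "LB",         # source domain
            "SRCVAR": "LBSTRESN",   # source variable
            "SRCSEQ": r["LBSEQ"],   # source record sequence number
        }
        for r in lb_rows
    ]


def trace_source(adam_row, sdtm_rows):
    """Look up the source value that an analysis record points to."""
    for r in sdtm_rows:
        if (r["USUBJID"] == adam_row["USUBJID"]
                and r["LBSEQ"] == adam_row["SRCSEQ"]):
            return r[adam_row["SRCVAR"]]
    return None


adlb = to_adlb(lb)
print(trace_source(adlb[1], lb))
```

A reviewer holding both datasets can then follow any analysis value back to the collected record without needing the sponsor's programs.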

While SDTM domain classes are determined according to data type such as interventions, events or findings, their ADaM equivalents are classified by analysis approach. Of the main data structures, one is best suited to the needs of analysis of continuous data values while another supports categorical analyses. There also is a subject-level analysis dataset that needs to be created for every study where ADaM is used.

All ADaM datasets are named ADxxxx, where xxxx is sponsor-defined and often carries over the name of the source SDTM domain. For example, an ADaM domain called ADLB would use the LB SDTM domain as its data source. This one-to-one domain mapping is not mandatory though and the required number of ADaM domains depends on the needs of any study data analysis or data review. An ADaM domain may use more than one SDTM domain as its source and carry a unique name that reflects this.

For ADaM variables, the naming conventions should follow the standardized variable names defined in the ADaM Implementation Guide. Any variables copied directly from SDTM data into an ADaM domain shall be used unchanged, with no change made either to their attributes (name, label, type, length, etc.) or their contents. Sponsor-defined variable names can be given to any other analysis variable that is not defined within the ADaM or SDTM standards. Following these conventions will provide clarity for the reviewer.
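The rule about copied variables lends itself to a simple automated check. The following sketch compares variable attributes between invented SDTM and ADaM metadata dictionaries and reports any shared variable whose attributes differ; the metadata layout is an assumption for illustration, and a fuller check would also compare the data values.

```python
# Hypothetical variable metadata: name -> (label, type, length).
sdtm_meta = {
    "USUBJID": ("Unique Subject Identifier", "Char", 20),
    "LBTESTCD": ("Lab Test or Examination Short Name", "Char", 8),
}
adam_meta = {
    "USUBJID": ("Unique Subject Identifier", "Char", 20),
    "LBTESTCD": ("Lab Test Short Name", "Char", 8),  # label was altered
    "AVAL": ("Analysis Value", "Num", 8),            # ADaM-only variable
}


def changed_attributes(sdtm, adam):
    """List variables present in both datasets whose attributes differ."""
    shared = sdtm.keys() & adam.keys()
    return sorted(name for name in shared if sdtm[name] != adam[name])


changed = changed_attributes(sdtm_meta, adam_meta)
print(changed)
```

Variables that exist only in the analysis dataset, such as AVAL here, are ignored because the rule only concerns variables carried over from SDTM.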

The ADaM subject-level analysis dataset is called ADSL. It contains at most one record per subject, with variables holding key information on subject disposition, demographics and baseline characteristics. Other variables within ADSL contain planned or actual treatment group information as well as key dates and times of the subject's participation in the study. Not all variables within ADSL may be used directly for analysis; some could be used in conjunction with other datasets for display or grouping purposes, or included simply as variables of interest for review. Given that the intention of ADSL is to contain variables that describe subjects, the analysis populations and treatment groups to which they belong, or prognostic factors, subject-level efficacy information should not be added here but should be placed in another domain. Variables from ADSL may be added to other ADaM domains where doing so aids output creation or data review.
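A minimal sketch of how an ADSL-like dataset might be assembled is shown below. The demographics records and the list of intent-to-treat subjects are invented, and mapping ARM and ACTARM straight to TRT01P and TRT01A is a simplifying assumption; the check at the end enforces the one-record-per-subject rule.

```python
# Hypothetical SDTM DM records.
dm = [
    {"STUDYID": "S1", "USUBJID": "S1-001", "AGE": 54, "SEX": "F",
     "ARM": "Drug A", "ACTARM": "Drug A"},
    {"STUDYID": "S1", "USUBJID": "S1-002", "AGE": 61, "SEX": "M",
     "ARM": "Placebo", "ACTARM": "Placebo"},
]

# Assumed list of subjects in the intent-to-treat population.
itt_subjects = {"S1-001"}


def build_adsl(dm_rows, itt):
    """Create one subject-level record per DM record."""
    adsl = [
        {
            "STUDYID": r["STUDYID"],
            "USUBJID": r["USUBJID"],
            "AGE": r["AGE"],
            "SEX": r["SEX"],
            "TRT01P": r["ARM"],      # planned treatment (assumed mapping)
            "TRT01A": r["ACTARM"],   # actual treatment (assumed mapping)
            "ITTFL": "Y" if r["USUBJID"] in itt else "N",
        }
        for r in dm_rows
    ]
    ids = [r["USUBJID"] for r in adsl]
    if len(ids) != len(set(ids)):
        raise ValueError("ADSL must contain at most one record per subject")
    return adsl


adsl = build_adsl(dm, itt_subjects)
print([(r["USUBJID"], r["TRT01P"], r["ITTFL"]) for r in adsl])
```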

Another main class of ADaM datasets is the Basic Data Structure (BDS), which contains one or more records per subject, per analysis parameter, per analysis timepoint. It is possible to add derived analysis parameters if required for an analysis. An example would be where a derivation uses results from a number of different parameters or where a mean is calculated at subject level from all the values collected for a subject. Derived records may also be added to support Last Observation Carried Forward (LOCF) or Worst Observation Carried Forward (WOCF) analyses.
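The LOCF case can be sketched as follows for a single subject and parameter. The visit schedule and values are invented, and the function add_locf is a made-up helper; derived records are marked with DTYPE set to "LOCF" so that they can be told apart from observed ones.

```python
# Hypothetical observed records for one subject and one parameter.
observed = [
    {"USUBJID": "S1-001", "PARAMCD": "SYSBP", "AVISITN": 1, "AVAL": 120.0},
    {"USUBJID": "S1-001", "PARAMCD": "SYSBP", "AVISITN": 2, "AVAL": 118.0},
    {"USUBJID": "S1-001", "PARAMCD": "SYSBP", "AVISITN": 4, "AVAL": 115.0},
]

# Assumed planned visit schedule.
planned_visits = [1, 2, 3, 4]


def add_locf(rows, visits):
    """Fill missing planned visits by carrying the last observation forward."""
    by_visit = {r["AVISITN"]: r for r in rows}
    out, last = [], None
    for visit in visits:
        if visit in by_visit:
            last = by_visit[visit]
            out.append(dict(last, DTYPE=""))
        elif last is not None:
            # Derived record: copy the last observed value to this visit.
            out.append(dict(last, AVISITN=visit, DTYPE="LOCF"))
    return out


bds = add_locf(observed, planned_visits)
print([(r["AVISITN"], r["AVAL"], r["DTYPE"]) for r in bds])
```

Visit 3 has no observed value, so it receives the visit 2 result as a derived record, while the observed records are left unchanged.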

The BDS is especially useful for continuous value analyses such as presenting mean, median, standard deviation and so on. This may not be the only usage but for a domain to comply with the BDS standard, it at the very least must contain variables for study and subject identifiers, analysis parameter name and code as well as analysis values. If any of these are absent, then the dataset does not fit the BDS description.
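The minimum content described above can be expressed as a small check on a dataset's column names. This is only a sketch: it treats STUDYID, USUBJID, PARAMCD and PARAM as required, accepts either AVAL or AVALC as the analysis value, and does not attempt to cover the rest of the standard.

```python
# Identifier and parameter variables named in the paragraph above.
REQUIRED = ("STUDYID", "USUBJID", "PARAMCD", "PARAM")


def missing_bds_variables(columns):
    """Return the minimum BDS variables that a dataset lacks."""
    cols = set(columns)
    missing = [name for name in REQUIRED if name not in cols]
    # Either a numeric (AVAL) or character (AVALC) analysis value is needed.
    if not cols & {"AVAL", "AVALC"}:
        missing.append("AVAL or AVALC")
    return missing


complete = missing_bds_variables(
    ["STUDYID", "USUBJID", "PARAMCD", "PARAM", "AVAL", "AVISITN"]
)
incomplete = missing_bds_variables(["STUDYID", "USUBJID", "AVISITN"])
print(complete)
print(incomplete)
```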

A variant of the BDS is available for Time to Event (TTE) analyses, which are commonly used in therapeutic areas like oncology. This additionally contains variables for the origin date of risk, used as the start time in any TTE analysis, and for censoring of subjects in whom the event of interest is not observed.
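As a small worked example, the sketch below builds one TTE record per subject from invented dates. It assumes that time is counted in days as the analysis date minus the origin date plus one, and that CNSR is 0 for an observed event and 1 for a censored subject; the parameter code and the helper tte_record are made up for illustration.

```python
from datetime import date


def tte_record(usubjid, startdt, event_date, last_known_date):
    """Build a time-to-event record (assumed convention: days, inclusive)."""
    if event_date is not None:
        adt, cnsr = event_date, 0        # event observed
    else:
        adt, cnsr = last_known_date, 1   # censored at last known date
    return {
        "USUBJID": usubjid,
        "PARAMCD": "TTEVENT",            # hypothetical parameter code
        "STARTDT": startdt,
        "ADT": adt,
        "AVAL": (adt - startdt).days + 1,
        "CNSR": cnsr,
    }


with_event = tte_record(
    "S1-001", date(2017, 1, 10), date(2017, 3, 1), date(2017, 3, 15)
)
censored = tte_record(
    "S1-002", date(2017, 1, 10), None, date(2017, 1, 31)
)
print(with_event["AVAL"], with_event["CNSR"])
print(censored["AVAL"], censored["CNSR"])
```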

In February 2016, CDISC published the Occurrence Data Structure (OccDS) for use in categorical analyses where summaries of frequencies and percentages of occurrence are planned. This is an extension of the previously published ADAE structure that contains extra variables for use with concomitant medication or medical history data. Data from other SDTM domains in the events or interventions classes may be mapped into OccDS if it fulfils the analysis needs. Some, such as exposure data, may be mapped to either BDS or OccDS depending on the analysis, and may even be split into two ADaM domains in studies where both categorical and continuous analyses are required.
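The kind of summary that OccDS is meant to support can be sketched with a few invented adverse event records. Each subject is counted once per term regardless of how many times the event occurred, and the denominator is assumed to be the number of subjects in the analysis population.

```python
# Hypothetical occurrence records (one row per reported event).
adae = [
    {"USUBJID": "S1-001", "AEDECOD": "HEADACHE"},
    {"USUBJID": "S1-001", "AEDECOD": "HEADACHE"},
    {"USUBJID": "S1-002", "AEDECOD": "HEADACHE"},
    {"USUBJID": "S1-002", "AEDECOD": "NAUSEA"},
]

# Assumed number of subjects in the analysis population.
n_subjects = 4


def occurrence_summary(rows, denominator):
    """Count subjects with at least one occurrence of each term."""
    subjects_by_term = {}
    for r in rows:
        subjects_by_term.setdefault(r["AEDECOD"], set()).add(r["USUBJID"])
    return {
        term: (len(subjects), round(100 * len(subjects) / denominator, 1))
        for term, subjects in sorted(subjects_by_term.items())
    }


summary = occurrence_summary(adae, n_subjects)
print(summary)
```

The first subject reported headache twice but contributes only once to the count, which is the usual behaviour for frequency tables of this sort.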

Currently, ADaM supports the majority of analysis needs for clinical data. It may not be as prescriptive as SDTM, but it offers flexibility while ensuring that a set of analysis data standards can be put in place by a sponsor. ADaM datasets can also be submitted to a regulatory agency much like SDTM datasets. They have in-built traceability and are compatible with Define-XML, so that machine-readable data definitions can be supplied along with any detailed computational details.


Saturday, April 29, 2017

Strategies for Implementing SDTM and ADaM Standards

Monday, April 10, 2017

Exploring CDISC Analysis Data Model (ADaM)
