Saturday, April 29, 2017

Strategies for Implementing SDTM and ADaM Standards


Before the SDTM standard was developed, the typical scenario for the creation of clinical trial datasets was to create an extract from the database management system (DBMS), such as Oracle Clinical or others, prepare this extract as the submission tabulation files, and build analysis datasets from this extract. 
Now that the SDTM standards are part of this development cycle, the salient question is what will become the new typical scenario? Assuming that both tabulation datasets and analysis datasets, with associated documentation, will be submitted to a regulatory agency, there are at least four options for the development life cycle of these data, each with advantages and disadvantages. These are described below. 
  • Parallel method
  • Retrospective method
  • Linear method
  • Hybrid method


Parallel method:

The advantages of this method are:
  • The development of the SDTM and ADaM datasets is independent, and each can be completed at any time without input from the other.
  • The creation of the SDTM can happen at the time of submission so there is no effort wasted if the clinical trial is either unsuccessful or not included in a submission.
  • The independence of the SDTM and ADaM datasets allows for parallel project teams to perform the extraction, transformation, and load (ETL) processes. This may be important for outsourced projects.
  • This method requires a minimum amount of re-engineering of existing processes within most pharmaceutical companies.

The disadvantages of this method are:
  • The documentation for each set of data shares no similarity, and parallel creation decreases efficiency.
  • Derivation of created variables in analysis datasets does not reference variables in the SDTM. This creates a significant disconnect between the two sets of submission data.
  • The regulatory agency does not have the original DBMS extract with which to verify or explore derivations performed in the analysis datasets. Similarly, they would not have the DBMS-annotated CRFs to understand the original source of the derivations of the analysis variables.
  • Any analysis programs submitted to the agency as analysis-level documentation have limited value since the source data are not available.
  • Validation is necessary to ensure that similar records or variables in both SDTM and ADaM datasets are identical, such as indication of which record is considered baseline.


Retrospective method:

The advantages to this method are:
  • The creation of the SDTM can happen at the time of submission so there is no effort wasted if the clinical trial is either unsuccessful or not included in a submission.
  • As enhancements to the SDTM standards are released, the analysis datasets are not affected and any enhancement can be represented during the creation of the SDTM.


The disadvantages of this method are:
  • The regulatory agency does not have the original DBMS extract with which to verify or explore derivations performed in the analysis datasets. Similarly, they would not have the DBMS-annotated CRFs to understand the original source of the derivations of the analysis variables.
  • Any analysis programs submitted to the agency as analysis-level documentation have limited value since the source data are not available.
  • Any date imputation or other types of hard coding performed during the creation of the analysis datasets would have to be undone, since the SDTM represents the data as it was collected.
  • All CRF variables represented in the SDTM would need to be retained in the analysis datasets even if they are not used for analysis. This increases the complexity of documentation.
  • Validation is necessary to ensure that the SDTM adequately represents the original source data. This step could be potentially difficult and result in a loss of efficiency.


Linear method:

The advantages of this method are:
  • Analysis programs submitted to the agency as analysis-level documentation utilize the SDTM domains as input and are thus useable and informative to the reviewer.
  • Using SDTM domains as input to analysis datasets allows for the standardization of analysis dataset structures and programming methods to produce study report summaries.
  • Traceability from the analysis results back to the collected data is preserved.

The disadvantages of this method are:
  • The development of the analysis datasets relies on the completion of the SDTM domains.
  • The SDTM domains are created for all clinical trials regardless of whether they will be part of a submission.
  • It is potentially more difficult to manage if the data management and/or the biostatistics is outsourced.


Hybrid method:

With this method, the differences between the SDTM Draft domains and the SDTM Final domains are envisioned to be small.
The SDTM Final domains contain the subset of variables or records that are optimally created during the analysis or at the final stage of submission preparation.
An example is the creation of USUBJID. This variable is required in the SDTM and provides a unique key identifier for a given subject. In some situations, however, USUBJID cannot be defined until all studies are complete, since a given subject may participate in multiple trials.
Another example is the creation of the expected variable present in all Findings domains that indicates the data record considered to be the baseline value (e.g. EGBLFL). Since these indicator flags would likely be derived in the ADs, creating the Final SDTM domains retrospectively from the ADs prevents redundant derivation and eliminates the possibility of discord between the SDTM domain and the analysis dataset. Finally, population indicator variables, such as those for intent-to-treat or per-protocol status, can be optimally created in the AD and then placed in the supplemental qualifier domain.
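The USUBJID example can be sketched in code. The convention below (a Python sketch with a hypothetical delimiter and field order; the actual scheme is sponsor-defined) concatenates study, site, and subject codes. As noted above, when the same person enrolls in multiple trials the sponsor must instead assign one identifier across studies, which is why USUBJID may only be finalized at submission time.

```python
def make_usubjid(studyid: str, siteid: str, subjid: str) -> str:
    """One common (sponsor-defined) convention: concatenate the study,
    site, and subject codes with a hyphen. A subject enrolled in
    multiple trials would instead need a single cross-study identifier."""
    return f"{studyid}-{siteid}-{subjid}"

print(make_usubjid("ABC123", "001", "0042"))  # ABC123-001-0042
```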

The advantages to this method are:
  • With a few possible exceptions, analysis programs submitted to the agency as analysis-level documentation utilize the SDTM domains as input and are thus useable and informative to the reviewer.
  • Using SDTM domains as input to analysis datasets allows for standardization of analysis dataset structures and programming methods to produce study report summaries.
  • The variables or records in the SDTM that need biostatistical input, such as the indication of baseline records or the creation of population flags, are created in harmony with the analysis datasets, so there is no possibility of discrepancy.
  • If important, derived records can be added to the SDTM domains thus providing the reviewer with both CRF and analysis records.
  • Final completion of the SDTM domains can be done at the time of submission.


The disadvantages of this method are:
  • The development of the analysis datasets relies on the completion of the SDTM domains.
  • The SDTM domains are created for all clinical trials regardless of whether they will be part of a submission.
  • This may be potentially more difficult to manage if the data management and/or the biostatistics is outsourced.


RECOMMENDATIONS:

Each organization will need to weigh the advantages and disadvantages of these methods when deciding on an implementation plan. For submissions prepared in the near future, several of the above methods may need to be used in tandem to accommodate both legacy data and ongoing studies.
As CDISC standards become adopted within an organization, however, one would expect efficiencies to be gained if one method were used for all new studies going forward. Weighing the advantages and disadvantages of each method above, the linear and hybrid methods are the most parsimonious long-term solutions.



Friday, April 28, 2017

FDA Data Standards Catalog

FDA accepts electronic submissions that provide study data using the standards, formats, and terminologies described in the FDA Data Standards Catalog.


Friday, April 21, 2017

Define.xml

Define.xml (Case Report Tabulation Data Definition Specification) is a document that the FDA requires for drug submissions. It describes the structure and contents of the data collected during the clinical trial process. Because Define.xml can increase the level of automation and improve the efficiency of the regulatory review process, the FDA prefers to receive it with a drug submission. 

The define.xml standard is based on the CDISC Operational Data Model (ODM), which is available at http://www.cdisc.org/standards/index.html

To generate the code for Define.xml, there are three challenges [1] that the average SAS programmer needs to overcome: 
 1. Basic understanding of XML 
 2. Thorough understanding of the CDISC-specific XML structure of Define.xml 
 3. SAS expertise to generate the XML code 

The first two challenges are fundamental; there are no alternatives or shortcuts to them. However, there are alternatives to the third. Instead of SAS or XML tools, the SDTM specifications and Microsoft Excel can be used to program Define.xml in a practical and efficient way.
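As a rough illustration of the third challenge, the skeleton below (a Python sketch rather than SAS, with hypothetical OIDs and a two-row specification standing in for an Excel sheet) emits ODM-style ItemGroupDef/ItemDef elements. A real define.xml additionally requires the define extension namespace, many mandatory attributes, and validation against the published schema.

```python
import xml.etree.ElementTree as ET

# Rows as they might be read from an Excel SDTM specification (hypothetical).
spec = [
    {"dataset": "DM", "variable": "USUBJID", "type": "text"},
    {"dataset": "DM", "variable": "AGE", "type": "integer"},
]

odm = ET.Element("ODM")
study = ET.SubElement(odm, "Study", OID="ST.001")
mdv = ET.SubElement(study, "MetaDataVersion", OID="MDV.1", Name="SDTM metadata")

# One ItemGroupDef per dataset, referencing that dataset's variables...
for ds in sorted({row["dataset"] for row in spec}):
    ig = ET.SubElement(mdv, "ItemGroupDef", OID=f"IG.{ds}", Name=ds, Repeating="Yes")
    for row in spec:
        if row["dataset"] == ds:
            ET.SubElement(ig, "ItemRef", ItemOID=f"IT.{ds}.{row['variable']}")

# ...and one ItemDef per variable carrying its metadata.
for row in spec:
    ET.SubElement(mdv, "ItemDef", OID=f"IT.{row['dataset']}.{row['variable']}",
                  Name=row["variable"], DataType=row["type"])

xml_text = ET.tostring(odm, encoding="unicode")
print(xml_text[:40])
```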

Monday, April 10, 2017

Exploring CDISC Analysis Data Model (ADaM)

Clinical Data Interchange Standards Consortium (CDISC) defines and manages industry level data standards that are widely used during the analysis, reporting and submission of clinical data. For instance, the Study Data Tabulation Model (SDTM) is the submission data standard into which raw study data are mapped and collated. ADaM is a companion standard for use with analysis data and it is best practice to use SDTM data as the source for these datasets. Doing this allows for the easy documentation of any data processing with Define-XML, the CDISC standard for data definition files.

Being able to trace the flow from source values to derived ones is a clear intention of the ADaM standard, and that applies to the structure of any datasets and the required linkage to machine-readable metadata. It is also crucial that data are made analysis-ready, so that the production of tables, listings and figures can be achieved with currently available tools with little or no further data manipulation.

While SDTM domain classes are determined according to data type such as interventions, events or findings, their ADaM equivalents are classified by analysis approach. Of the main data structures, one is best suited to the needs of analysis of continuous data values while another supports categorical analyses. There also is a subject-level analysis dataset that needs to be created for every study where ADaM is used.

All ADaM datasets are named ADxxxx, where xxxx is sponsor-defined and often carries over the name of the source SDTM domain. For example, an ADaM domain called ADLB would use the LB SDTM domain as its data source. This one-to-one domain mapping is not mandatory though and the required number of ADaM domains depends on the needs of any study data analysis or data review. An ADaM domain may use more than one SDTM domain as its source and carry a unique name that reflects this.

For ADaM variables, the naming conventions should follow the standardized variable names defined in the ADaM Implementation Guide. Any variables copied directly from SDTM data into an ADaM domain shall be used unchanged, with no change made either to their attributes (name, label, type, length, etc.) or their contents. Sponsor-defined variable names can be given to any other analysis variable that is not defined within the ADaM or SDTM standards. Following these conventions will provide clarity for the reviewer.

The ADaM subject-level analysis dataset is called ADSL and contains a maximum of one record per subject, holding key information on subject disposition, demographics, and baseline characteristics. Other variables within ADSL contain planned or actual treatment group information as well as key dates and times of the subject's participation in the study. Not all variables within ADSL may be used directly for analysis, but they could be used in conjunction with other datasets for display or grouping purposes, or possibly included simply as variables of interest for review. Given that the intention of ADSL is to contain variables that describe subjects, the analysis populations and treatment groups to which they belong, or prognostic factors, subject-level efficacy information should not be added here but should be placed in another domain. Variables from ADSL may be added to other ADaM domains where doing so aids output creation or data review.
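The one-record-per-subject rule for ADSL is easy to check mechanically. A minimal sketch follows (Python, with hypothetical data; TRT01P and SAFFL are standard ADaM variable names for planned treatment and the safety population flag):

```python
# Hypothetical ADSL records, one dict per subject.
adsl = [
    {"USUBJID": "ABC-001-0001", "TRT01P": "Drug A", "SAFFL": "Y"},
    {"USUBJID": "ABC-001-0002", "TRT01P": "Placebo", "SAFFL": "Y"},
]

# ADSL must contain at most one record per subject.
subjects = [rec["USUBJID"] for rec in adsl]
duplicates = len(subjects) != len(set(subjects))
print("duplicate subjects found" if duplicates else "one record per subject")
# one record per subject
```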
Another main class of ADaM datasets is the Basic Data Structure (BDS), which contains one or more records per subject per analysis parameter per analysis timepoint. It is possible to add derived analysis parameters if required for an analysis, for example where a derivation uses results from a number of different parameters, or where a mean is calculated at subject level from all the values collected for a subject. Derived records may also be added to support Last Observation Carried Forward (LOCF) or Worst Observation Carried Forward (WOCF) analyses.
The BDS is especially useful for continuous-value analyses such as presenting the mean, median, standard deviation and so on. This may not be its only usage, but for a domain to comply with the BDS standard it must at the very least contain variables for the study and subject identifiers, the analysis parameter name and code, and the analysis values. If any of these are absent, then the dataset does not fit the BDS description.
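The BDS compliance rule just described can be expressed as a simple check. A sketch (Python; the variable names are the standard ADaM ones for those roles, while the helper function itself is hypothetical):

```python
# Per the rule above: a BDS dataset must carry study/subject identifiers,
# the analysis parameter name and code, and the analysis value.
REQUIRED_BDS_VARS = {"STUDYID", "USUBJID", "PARAM", "PARAMCD", "AVAL"}

def missing_bds_vars(columns):
    """Return the required BDS variables absent from a dataset's columns."""
    return REQUIRED_BDS_VARS - set(columns)

print(missing_bds_vars(["STUDYID", "USUBJID", "PARAMCD", "PARAM", "AVAL", "AVISIT"]))
# set()
print(sorted(missing_bds_vars(["STUDYID", "USUBJID", "AVAL"])))
# ['PARAM', 'PARAMCD']
```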

A variant of the BDS is available for Time to Event (TTE) analyses, which are commonly used in therapeutic areas like oncology. This additionally contains variables for the origin date of risk used for the start times in any TTE analysis, and for censoring of subjects where the events of interest are not observed.

In February 2016, CDISC published the Occurrence Data Structure (OccDS) for use in categorical analyses where summaries of frequencies and percentages of occurrence are planned. This is an extension of the previously published ADAE structure, containing extra variables for use with concomitant medication or medical history data. Data from other SDTM domains in the event or intervention classes may be mapped into OccDS if it fulfils their analysis needs. Some, such as exposure data, may be mapped to either BDS or OccDS depending on the analysis, and may even be split into two ADaM domains in studies where both categorical and continuous analyses are required.
Currently, ADaM supports the majority of analysis needs for clinical data. It may not be as prescriptive as SDTM, but it offers flexibility while at the same time ensuring that a set of analysis data standards can be put in place by a sponsor. ADaM datasets can also be submitted to a regulatory agency much like SDTM, and they have in-built traceability as well as compatibility with Define-XML, so that machine-readable data definitions can be supplied along with any detailed computational details.



Sunday, April 9, 2017

An introduction to SDTM

Study Data Tabulation Model (SDTM) is defined by the Clinical Data Interchange Standards Consortium (CDISC) as a standard structure for human clinical trial (study) data tabulations that are to be submitted to a regulatory authority such as the US Food and Drug Administration (FDA).

The SDTM data is the standard format recommended by the FDA. It has become a CDISC regulated content standard that describes how to organize subject information into variables and domains to be used as a standardized submission dataset format. The purpose of this model is to structure and format the tabulation data that are to be submitted to a regulatory authority. SDTM is based on the concept of observations (described by variables) made on the subjects who participate in a clinical study. The collected data is classified into a series of domains. The key idea of this model is that the domains are divided into Findings, Interventions, Events, and Special Purpose classes.
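The class structure described above can be illustrated with a few well-known domains (a small Python sketch; an illustrative subset, not an exhaustive mapping):

```python
# Well-known SDTM domains mapped to the general observation classes
# named above (illustrative subset).
DOMAIN_CLASS = {
    "AE": "Events",           # Adverse Events
    "CM": "Interventions",    # Concomitant Medications
    "EX": "Interventions",    # Exposure
    "LB": "Findings",         # Laboratory Test Results
    "VS": "Findings",         # Vital Signs
    "DM": "Special Purpose",  # Demographics
}

print(DOMAIN_CLASS["LB"])  # Findings
```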

The SDTM standard has been endorsed by the FDA and embraced by the pharmaceutical industry. It has improved the FDA data submission and review process. Additionally, the Center for Drug Evaluation and Research (CDER) also encourages its use for ensuring efficient and quality reviews. The standard has improved data management, data integrity checking, data analysis and cross-study analysis, as well as reporting. The widespread acceptance of SDTM will be beneficial for both the industry and the regulators in terms of an efficient data conversion process and reduced related costs.

While SDTM provides a standard with ample flexibility, it can also become tedious and lengthy. For instance, conversion of a clinical database may be difficult due to the large number of CDISC domains and variables. Similarly, there may be errors and delays in data conversion, as many ETL (Extract, Transform, Load) programmers lack CDISC domain expertise. The conversion can also produce many lines of code that are difficult to understand and re-use. In addition, every sponsor company implements SDTM with some variation, because the model is subject to interpretation and allows some flexibility. Therefore, additional documentation such as the ‘Define’ (metadata definitions) document is required to support the datasets. Moreover, the SDTM standard is an evolving one, and new guidance updates may affect submissions and involve restructuring of data. Consequently, the conversion to SDTM format requires extra effort, time, and cost.

SDTM is a standard that improves process efficiency and a model that provides flexibility. It has the following advantages:

Ø  Provides a uniform standard for clinical trial study data to ease data exchange
Ø  Facilitates communication between CROs, sponsors, and regulators
Ø  Improves viewing and analysis by streamlining the flow of data in a clinical trial process and facilitating data interchange between partners and providers
Ø  Facilitates data management by consolidating the data collected from multiple CRFs
Ø  Improves the effectiveness of reviewers and reduces preparation time, by providing standardized datasets and standard software tools
Ø  Ensures a more comprehensive, timely and efficient FDA review process, by providing the reviewer with standard tools and checks
Ø  Facilitates meta-analysis of safety across new drug entities from multiple companies by enabling the FDA to develop a repository of all submitted data and standard review tools, and to access, manipulate and view the tabulations using standardized datasets
Ø  Reduces the number of submission queries to pharmaceutical companies by leveraging the standard structure provided by SDTM
Ø  Allows companies to add additional domains and variables outside of the CDISC-controlled domains and variables
Ø  Increases programming consistency and study efficiency by providing the same data structure to studies with different designs
Ø  Assists in the creation of analysis datasets by enabling the development of standard macros
Ø  Facilitates the development of commercial reporting and analysis tools
Ø  Simplifies cross-study analysis



DRUG DEVELOPMENT PROCESS

Drug development starts in R&D and ends at the chemist. The process takes 12-15 years and costs around 10 thousand crores. At the start, roughly 30,000 compounds are screened, and about 10 compounds remain. Clinical trials begin on these resultant compounds.


Clinical trials:

PHASE 0: The first stage is preclinical testing (phase 0 studies, or animal testing, lasting 1.5-2 years), which aims to find the toxicity of a compound. Single ascending doses and multiple ascending doses are given to animals to determine toxicity. These data are submitted to the FDA to obtain an IND (Investigational New Drug) approval.
After IND approval, the drug is eligible for testing on humans; this is conducted in three phases.


PHASE 1: Pharmacokinetic properties are determined in around 20-80 healthy volunteers. These studies are done to establish the safety of the drug.
PHASE 2: These are exploratory studies done on around 100-300 subjects to establish efficacy while reconfirming safety. They are done on the target study population.
PHASE 3: These are large clinical trials on hundreds to thousands of patients, done to show that the new drug is both safe and efficacious in the target study population. If these are successful, NDA (New Drug Application) approval is obtained, meaning the drug is ready for marketing.
PHASE 4: These are post-marketing trials conducted to monitor the long-term safety of the new drug after it is already on the market. 

CLINICAL DATA MANAGEMENT:

The sponsor, investigator, and statistician prepare the protocol, or the sponsor hands the investigation over to a CRO, in which case the CRO appoints the investigator and statistician. The investigator conducts the study as per the protocol and collects the laboratory values of the study subjects, while the statistician designs the statistical analysis plan as per the protocol. The investigator fills in the CRFs and sends the completed CRFs back to the data management team. Meanwhile, the DM team designs the database, and data entry is done by two different people: both enter the same information, and a third person checks the consistency of the data. Non-CRF data, such as lab data, are loaded into the database through data loading. After data are entered into the database, batch validation is performed at regular intervals to find any mismatches or errors.
After the database is completely validated, it is locked and frozen to prevent any further modification. After the database is locked and frozen, the biostatistics team extracts the database into SAS datasets.
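The double data entry check described above, in which two people enter the same CRF data and a third person resolves discrepancies, can be sketched as a field-by-field comparison (Python, with hypothetical data):

```python
# Two independent entries of the same (hypothetical) CRF page.
entry1 = {"SUBJID": "0042", "SEX": "M", "AGE": "34"}
entry2 = {"SUBJID": "0042", "SEX": "M", "AGE": "43"}  # typo in second entry

# Fields where the two entries disagree, flagged for manual resolution.
mismatches = {field: (entry1[field], entry2[field])
              for field in entry1
              if entry1[field] != entry2[field]}
print(mismatches)  # {'AGE': ('34', '43')}
```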

BIOSTATISTICS:

Biostatistics comprises two teams: SAS programmers and statisticians. The statisticians are responsible for analyzing the reports created by the SAS programmers and drawing conclusions based on the data. The SAS programmers are responsible for creating reports such as TLFs (Tables, Listings, Figures), which are produced from derived datasets.
