University of North Carolina at Chapel Hill Issues Public Comment on FDA Notice
The comment was co-signed by
* * *
We appreciate the opportunity to provide feedback on the recently published FDA Draft Guidance for Industry entitled "Real-World Data: Assessing Electronic Health Records and Medical Claims Data to Support Regulatory Decision-Making for Drug and Biological Products." Faculty from the pharmacoepidemiology program at the
We commend the Agency's efforts to provide extensive guidance in alignment with the RWE framework. The document offers important insight into the elements to consider when determining whether EHR and claims data are fit for purpose for a given question. Although the guidance is not binding, we hope its recommendations will help ensure the proper application of research methods when RWD are used.
While we think the guidance provides extensive and relevant recommendations, we note some tension between what reads as ideal and aspirational and what is practical and achievable for researchers and sponsors. For example, proper validation is important and necessary, yet the extent of the recommendations in this regard might not be feasible because of data ownership and data use agreement issues. Sponsors and researchers are often limited to what data vendors provide and the quality standards those vendors implement. We also note that, although this guidance explicitly focuses on EHR and claims data, it does little to highlight the differences between them, and we encourage further guidance on the use of registries as sources of RWD.
The following provides additional commentary and suggestions in order of appearance in the draft document. We have provided some suggested references throughout, and we also recommend referencing already published guidance documents on RWE by the Agency and key stakeholders like professional organizations.
TITLE
To clarify the specific content discussed in the guidance, and in accordance with the intended purpose of this document described by the Agency, we suggest modifying the title as follows: "Real-World Data: Assessing the Reliability, Relevance, and Quality of Electronic Health Records and Medical Claims Data to Support Regulatory Decision-Making for Drug and Biological Products."
I. INTRODUCTION AND SCOPE
No comments for this section.
II. BACKGROUND
No comments for this section.
III. GENERAL CONSIDERATIONS
(Line 108) The document states that "the use of certain study design features or specific analyses to address misclassified or missing information" will be addressed in future guidance. Nevertheless, the current document contains extensive recommendations and discussion on validation, misclassification, missing data, and study design elements such as the definition of study periods (Section V(A)). We recommend that this document address these issues as they pertain to its intended function: assessing the fit (i.e., relevance and reliability) of RWD sources for a given study.
IV. DATA SOURCES
Overall comments for Section IV: In several instances, information in the document applies exclusively to either EHR or claims data. We recommend clarifying when specific guidance is relevant to only one data source type and pointing out when issues pertain to both. We identify some of these instances throughout our comments.
(Line 136) It is important to note that the limitations inherent to electronic health data do not stem only from the fact that these data were not developed to support regulatory submissions to FDA. The limitations of RWD sources often result from their being designed not for research purposes in general, but for clinical care and administration.
(Bullet point 2 - Line 145) The data recorded in an EHR system can also vary significantly within the health care system of interest, which can affect the validity of studies. For example, coding practices can differ between clinical services, clinic locations, and providers within the same health system. Transitions from one EHR system to another, or differences in adoption times between locations within the same system, can also introduce data quality issues./1
The current language seems to warn only about differences between systems and does not highlight important considerations for users of EHR data within a single institution.
(Line 158) It is not clear what the guidance specifically suggests in terms of providing "the historical experience with and use of the selected data source." We recommend expanding on this recommendation to clarify the extent of background information being requested in addition to the extensive validation already outlined in section IV(B) and sections V(
A. Relevance of Data Source
(Line 165) We note again that it is important to highlight potential differences in the "practice of medicine" within the health system of interest, as this issue can often affect data quality.
(Bullet point 2 - Line 180) The recommendation on providing background information about the health care system should apply exclusively to EHR data, including EHR data consolidated across several health care systems. It is often infeasible to provide background information on the many health systems represented in claims data. Furthermore, some claims data sources do not allow users to identify the specific plans from which information is collected. With respect to EHR data, it is not clear what sponsors should specifically provide regarding the "method of diagnosis and preferred treatments for the disease of interest." The Agency should clarify how the requested background information will be used to determine whether the data source is fit for purpose.
(Bullet point 3 - Line 184) Similar to our concern about bullet point 2, the recommendation to provide a "description of prescribing and use practices in the health care system" should apply exclusively to EHR data, including EHR data consolidated across several health care systems. Even when limited to EHR data sources, "prescribing and use practices" is not specific enough to determine whether a source is fit for purpose. The Agency should clarify the extent of information that is recommended.
B. Data Capture: General Discussion
(Line 196) Pertaining to claims data, the guidance should also point out that it is necessary to determine whether the available dataset contains final adjudicated claims or unadjudicated processing claims; the latter can easily result in misclassification of exposure (e.g., medications filled at the pharmacy but never picked up by the patient).
1. Enrollment and Comprehensive Capture of Care
We found several issues in this section that may confuse the intended audience of the guidance. Overall, the recommendations of this section should specify whether each issue applies to EHR data, claims data, or both.
(Line 203) Coverage and enrollment are two distinct constructs, each requiring its own consideration when determining the relevance and reliability of RWD sources. Coverage and enrollment considerations present unique challenges for EHR and claims data, respectively, and the current text does not distinguish when certain issues pertain to one data source or the other. We suggest that the Agency provide explicit guidance by data type (e.g., enrollment in and disenrollment from a particular health plan might not affect the presence of data for that individual in the health system EHR). We suggest steering away from discussing 'enrollment' or 'coverage' in the context of EHR data unless the recommendation pertains to specific scenarios, such as a managed care environment where all patient care is controlled by the system. A better approach would be to discuss enrollment in the context of claims data, and observability in the context of both claims and EHR data.
(Line 231) This section addresses specific concerns with nonprescription drugs and drugs not reimbursed under specific plans. When discussing the availability of drug exposure information from EHR or claims data sources, it is important to comment on the formulary of the health system or the health plan, respectively. Variation in formularies can lead some patient populations to be preferentially exposed to certain drugs over others, which is a concern when studying drugs in the same class or with similar indications.
2. Data Linkage and Synthesis
(Line 269) When describing the appropriateness and feasibility of electronic health data linkages, we recommend that the document refer readers to published recommendations./2
The current language of this section provides important considerations but does not yet provide enough direction for assessing the reliability and relevance of RWD. Similarly, it is important to offer readers of the guidance direction with regard to assessing the quality of linkages./3
3. Distributed Data Networks
(Line 325) This section provides useful guidance. Nevertheless, we note that the document should also mention specific concerns and the need to conduct detailed assessments of site-specific variation in data quality and completeness for sites participating in the network./4
4. Computable Phenotypes
(Line 363) The guidance document requests that the computable phenotype be available in "computer-processable format," a requirement that is not mentioned anywhere else in the recommendations for any other data type. We recommend clarification on what data elements the Agency is requesting to assess the reliability and relevance of RWD and whether this requirement extends to other algorithms used in the study. For example, the Agency should clarify whether this requirement applies to algorithms for the definition of continuous enrollment or to other algorithms to define comorbidities like history of cancer.
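To make the question concrete: a continuous-enrollment rule is itself a small algorithm that could, in principle, be shared in computer-processable form. The following Python sketch is purely illustrative. The function name, the (start, end) enrollment-span layout, and the 30-day allowable gap are hypothetical assumptions, not definitions taken from the guidance or from any data vendor.

```python
from datetime import date, timedelta

def continuously_enrolled(spans, index_date, lookback_days, max_gap_days=30):
    """Hypothetical rule: True if enrollment covers the lookback window.

    spans: list of (start, end) date tuples, both ends inclusive.
    Gaps of up to max_gap_days between consecutive spans are bridged
    (an assumed allowance); any longer gap breaks continuity.
    """
    window_start = index_date - timedelta(days=lookback_days)
    merged = []
    for start, end in sorted(spans):
        # Number of uncovered days between the previous span and this one.
        if merged and (start - merged[-1][1]).days - 1 <= max_gap_days:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Continuous enrollment: a single merged span covers the whole window.
    return any(s <= window_start and e >= index_date for s, e in merged)

# Hypothetical member with a 19-day gap in enrollment during 2020.
spans = [(date(2020, 1, 1), date(2020, 6, 30)),
         (date(2020, 7, 20), date(2021, 12, 31))]
print(continuously_enrolled(spans, date(2021, 3, 1), 365))                   # gap bridged
print(continuously_enrolled(spans, date(2021, 3, 1), 365, max_gap_days=14))  # gap breaks continuity
```

The same member is classified differently depending on a single parameter, which is one reason the algorithm, and not only its narrative description, may matter for assessing reliability.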
(Line 365) The guidance recommends that "clinical validation of the computable phenotype definition should be described," yet it is unclear to what extent this information is requested. We suggest that the Agency provide specific recommendations as they apply to computable phenotypes.
5. Unstructured Data
(Line 371) This section on unstructured data is not relevant to claims data sources, which are often limited to discrete data fields for billing purposes. We recommend making this distinction explicit in the document. In addition, the use of unstructured EHR data presents several methodological limitations, and the guidance should provide additional cautionary language. Algorithms that extract unstructured data can easily introduce misclassification,/5 and additional information is needed to ensure the transparency and reproducibility of research employing these data elements./6
(Line 379) The section on artificial intelligence (AI) has the potential to motivate users of RWD to implement these methods to extract unstructured data. Given several concerns about the validity and transparency of these methodologies, we recommend that the guidance steer away from mentioning AI in the context of assessing the relevance and reliability of RWD. This topic is of increasing importance, and we recommend that the Agency dedicate a future guidance document to addressing the issues with AI for RWE in more depth.
C. Information Content and Missing Data: General Considerations
(Line 420) In addition to the recommendation of addressing the implications of assumptions regarding missing data, we note that the guidance should also point out explicitly that these implications need to be considered and then addressed appropriately in the design and/or analysis stages.
D. Validation: General Considerations
Given that there are already extensive discussions on validation of exposure (Section V(C)), outcomes (Section V(D)), and covariates (Section V(E)), this subsection on "general considerations" is repetitive and could be removed. If this section is to remain in the document, some general considerations need to be added. Specifically, the guidance should be explicit about the need for an empirical assessment of the direction and magnitude of misclassification bias, which would allow researchers to make a reasonable and informed argument about the impact of misclassification on the effect estimates. Furthermore, we are concerned about the distinction made between differential and non-differential misclassification, as this statement has the potential to offer a false sense of security. We note again that empirical assessments such as quantitative bias analyses should be recommended explicitly. While it is important to provide information on sensitivity, specificity, and predictive values, we caution against oversimplifying bias assessment in this context. Likewise, the current recommendations assume dichotomization of the variable of interest and do not mention continuous measures.
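As a minimal sketch of what an empirical assessment of direction and magnitude could look like, the following Python code performs a simple (deterministic) bias analysis for non-differential outcome misclassification in a cohort, back-calculating expected true case counts from assumed sensitivity and specificity. All counts and bias parameters are hypothetical, and the function names are illustrative; a fuller analysis would vary the bias parameters (for example, probabilistically) rather than fix them at single values.

```python
def corrected_cases(observed_cases, n, se, sp):
    """Expected true cases, solving observed = se*true + (1 - sp)*(n - true)."""
    if se + sp <= 1:
        raise ValueError("se + sp must exceed 1 for the correction to be defined")
    return (observed_cases - (1 - sp) * n) / (se + sp - 1)

def risk_ratio(cases1, n1, cases0, n0):
    return (cases1 / n1) / (cases0 / n0)

# Hypothetical cohort with the same outcome Se/Sp in both exposure groups.
n1, n0 = 10_000, 10_000   # exposed, unexposed persons
obs1, obs0 = 300, 200     # recorded outcome counts
se, sp = 0.80, 0.99       # assumed bias parameters from a validation study

rr_obs = risk_ratio(obs1, n1, obs0, n0)
true1 = corrected_cases(obs1, n1, se, sp)
true0 = corrected_cases(obs0, n0, se, sp)
rr_corr = risk_ratio(true1, n1, true0, n0)
print(f"observed RR = {rr_obs:.2f}, bias-corrected RR = {rr_corr:.2f}")
# observed RR = 1.50, bias-corrected RR = 2.00
```

Under these assumed values the observed risk ratio of 1.50 corresponds to a bias-corrected risk ratio of 2.00. If the assumed specificity is too low, the back-calculated counts can become negative, which itself signals implausible bias parameters.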
V. STUDY DESIGN ELEMENTS
A. Definition of Time Periods
The discussion on definition of time periods seems to be a better fit for the future RWE guidance addressing specific issues with study design and analysis. We recommend that any guidance pertaining to time periods in the current document be centered on the availability of sufficient data for the study question of interest. Any further discussion (e.g., of time-dependent covariates) should be addressed elsewhere.
B. Selection of Study Population
Similar to our comment for Section V(A), the discussion of the selection of a specific study population is outside the scope of the current guidance. For purposes of assessing reliability and relevance of RWD, this section should be limited to a brief discussion on the availability of data for the population of interest.
C. Exposure Ascertainment and Validation
1. Definition of Exposure
No comments for this section.
2. Ascertainment of Exposure: Data Source
(Line 608) In alignment with our recommendation of making an explicit distinction for when certain recommendations apply to EHR or claims data, we note that unstructured data can be found in the former but not the latter.
(Line 622) A major consideration that needs to be included with regard to data elements not captured in EHR is that individuals might receive services from health care providers outside the health system being used. We suggest that this issue be explicitly described in the discussion of data sources for the ascertainment of exposure in a given study.
3. Ascertainment of Exposure: Duration
The discussion on duration of exposure seems to be a better fit for the future RWE guidance addressing specific issues with study design and analysis. We recommend that any guidance pertaining to ascertainment of exposure in the current document be centered on the availability of data for the study question of interest.
(Line 645) If Section V(C)(3) is to remain in this guidance, we note that the statement dealing with prescription refills is relevant only to claims data.
4. Ascertainment of Exposure: Dose
If Section V(C)(4) is to remain in this guidance, we note that the discussion on dose needs additional consideration pertaining to EHR data. For instance, a record in inpatient medication orders does not indicate that the drug was actually administered. There are also significant issues with ascertaining exposure to infusions in the hospital (e.g., it is very hard to determine how much medication was provided if "Stop" orders are not coded correctly).
5. Validation of Exposure
No comments for this section.
6. Dosing in Special Populations
No comments for this section.
7. Other Considerations
The discussion on selection of a comparator seems to be a better fit for the future RWE guidance addressing specific issues with study design and analysis. We recommend that any guidance pertaining to appropriate comparators in the current document be centered on the availability of data for the study question of interest.
D. Outcome Ascertainment and Validation
1. Definition of Outcomes of Interest
No comments for this section.
2. Ascertainment of Outcomes
(Line 793) While the document states that "FDA recommends considering the potential impact of outcome misclassification on study validity," it makes no explicit recommendations on how best to approach this issue. The document has an extensive section on assessing sensitivity, specificity, and predictive values, but it does not explicitly state which measures should be optimized and under which circumstances. Furthermore, any such discussion should address the effect measure of interest for the given study (i.e., relative vs. absolute). We strongly recommend that the guidance call for an empirical assessment of potential misclassification using quantitative bias analyses.
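To illustrate why the effect measure matters when deciding which validity measure to optimize, the following Python sketch computes expected observed risks under non-differential outcome misclassification for hypothetical true risks of 4% (exposed) and 2% (unexposed). It relies only on the identity observed risk = Se * true risk + (1 - Sp) * (1 - true risk); the risks and the Se/Sp pairs are illustrative assumptions.

```python
def observed_risk(true_risk, se, sp):
    """Expected recorded risk given outcome sensitivity se and specificity sp."""
    return se * true_risk + (1 - sp) * (1 - true_risk)

p1, p0 = 0.04, 0.02  # hypothetical true risks: true RR = 2.00, true RD = 0.0200

results = {}
for se, sp in [(0.70, 1.00), (1.00, 0.98)]:
    # Same Se and Sp in both exposure groups (non-differential).
    q1, q0 = observed_risk(p1, se, sp), observed_risk(p0, se, sp)
    results[(se, sp)] = (q1 / q0, q1 - q0)
    print(f"Se={se:.2f}, Sp={sp:.2f}: RR {q1 / q0:.2f}, RD {q1 - q0:.4f}")
```

With these hypothetical values, perfect specificity leaves the risk ratio unbiased (2.00) while shrinking the risk difference to 0.0140, whereas perfect sensitivity with 98% specificity leaves the risk difference close to the truth (0.0196) but pulls the risk ratio toward the null (about 1.49).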
3. Validation of Outcomes
Section V(D)(2) and this section, V(D)(3), contain repeated information. We suggest condensing this content and addressing our previous suggestion regarding the need to recommend comprehensive bias analyses.
(Line 900) While it is true that "the impact on the measure of association (...) varies depending on whether the degree of misclassification differs between the exposure groups," this statement needs additional context. It is important to also consider potential differences by subgroups of interest (e.g., sex, age), which can create the appearance of differing treatment effects or obscure real differences.
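A minimal sketch of this point, with hypothetical values: the true risk ratio is 2.0 in both subgroups and misclassification is non-differential with respect to exposure within each subgroup, but outcome specificity is assumed to differ between "subgroup A" and "subgroup B" (labels and parameters are illustrative assumptions).

```python
def observed_risk(true_risk, se, sp):
    """Expected recorded risk given outcome sensitivity se and specificity sp."""
    return se * true_risk + (1 - sp) * (1 - true_risk)

p_exposed, p_unexposed = 0.04, 0.02  # hypothetical true risks; true RR = 2.0 in both subgroups
# Hypothetical outcome-algorithm performance that differs by subgroup
# but not by exposure status within a subgroup: (sensitivity, specificity).
subgroups = {"A": (0.90, 0.999), "B": (0.90, 0.98)}

observed_rr = {}
for name, (se, sp) in subgroups.items():
    observed_rr[name] = observed_risk(p_exposed, se, sp) / observed_risk(p_unexposed, se, sp)
    print(f"subgroup {name}: observed RR {observed_rr[name]:.2f} (true RR 2.00)")
```

Here the observed risk ratios are about 1.95 and 1.47, a contrast produced entirely by subgroup differences in data quality that could be mistaken for effect measure modification.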
(Line 907) Stating that "non-differential misclassification tends to bias the association toward the null" is an oversimplification that should be avoided in this guidance. There is documentation on scenarios where the effects of non-differential misclassification are not predictable./7
The guidance itself later concurs with our concern, stating in line 937 that "when evaluating the implication of potential misclassification on study inference, sponsors should avoid overreliance on non-differential misclassification biasing toward the null." To prevent confusion and misapplication of research methods, we strongly suggest removing the statement in line 907.
(Line 954) This paragraph is the only place in the guidance where quantitative bias analysis is listed. We recommend extending this recommendation to the other sections of the document where we have noted the need for an empirical assessment of bias.
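One scenario of this kind can be sketched with hypothetical numbers: when exposure has three levels, non-differential misclassification between the two exposed categories can bias the intermediate category away from the null. In the Python sketch below, 20% of truly high-exposed individuals are recorded as low-exposed regardless of outcome status; all counts, risks, and names are illustrative assumptions.

```python
# Hypothetical cohort: true exposure level -> (persons, true risk of outcome)
truth = {"none": (1000, 0.10), "low": (1000, 0.15), "high": (1000, 0.30)}
# Hypothetical non-differential error: 20% of truly high-exposed people are
# recorded as low-exposed, with the same probability whether or not they
# develop the outcome.
P_HIGH_RECORDED_AS_LOW = 0.20

def expected_recorded(truth, p_high_as_low):
    """Expected [persons, cases] by recorded exposure level."""
    recorded = {level: [0.0, 0.0] for level in truth}
    for level, (n, risk) in truth.items():
        cases = n * risk
        if level == "high":
            # Split the truly high-exposed between recorded "low" and "high".
            for target, share in (("low", p_high_as_low), ("high", 1 - p_high_as_low)):
                recorded[target][0] += n * share
                recorded[target][1] += cases * share
        else:
            recorded[level][0] += n
            recorded[level][1] += cases
    return recorded

obs = expected_recorded(truth, P_HIGH_RECORDED_AS_LOW)
ref_true = truth["none"][1]
ref_obs = obs["none"][1] / obs["none"][0]
observed_rr = {}
for level in ("low", "high"):
    observed_rr[level] = (obs[level][1] / obs[level][0]) / ref_obs
    print(f"{level}: true RR {truth[level][1] / ref_true:.2f}, "
          f"observed RR {observed_rr[level]:.2f}")
```

In expectation, the observed risk ratio for the low category rises from 1.50 to 1.75, away from the null, even though the error does not depend on outcome status.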
4. Mortality as an Outcome
While mortality is an important outcome for many studies, it is unclear why this guidance dedicates a section to this particular outcome. We recommend that any guidance pertaining to ascertainment of outcomes in the current document be centered on the availability of data for the study question of interest.
E. Covariate Ascertainment and Validation
We note that the guidance does not discuss the data capture and availability of information on competing risks. If competing risks are important for the study design chosen by researchers, the assessment of relevance and data availability in RWD sources should also address whether information is available to capture them.
1. Confounders
No comments for this section.
2. Effect Modifiers
We recommend changing the title of this section to "effect measure modifiers."
3. Validation of Confounders and Effect Modifiers
No comments for this section.
VI. DATA QUALITY DURING DATA ACCRUAL, CURATION, AND TRANSFORMATION INTO THE FINAL STUDY-SPECIFIC DATASET
No comments for this section.
A. Characterizing Data
We note that this section should include a recommendation to address the issues of data accrual, data curation, and data transformation by site.
Several bullet points apply only to EHR data or only to claims data. We recommend that the guidance specify which items apply to each data type.
B. Documentation of the QA/QC Plan
No comments for this section.
C. Documentation of Data Management Process
No comments for this section.
Sincerely,
* * *
Footnotes:
1/ Huang C, Koppel R, McGreevey JD 3rd, et al. Transitions from One Electronic Health Record to Another: Challenges, Pitfalls, and Recommendations. Appl Clin Inform. 2020 Oct;11(5):742-754.
2/ Rivera DR,
3/
4/ Kahn MG, Brown JS, Chun AT, et al. Transparent reporting of data quality in distributed data networks. EGEMS (Wash DC). 2015 Mar 23;3(1):1052.
5/ Young JC,
6/ Wang SV, Patterson OV, Gagne JJ, et al. Transparent Reporting on Research Using Unstructured Electronic Health Record Data to Generate 'Real World' Evidence of Comparative Effectiveness and Safety. Drug Saf. 2019 Nov;42(11):1297-1309.
7/
* * *
The notice can be viewed at: https://www.regulations.gov/document/FDA-2020-D-2307-0002
TARGETED NEWS SERVICE (founded 2004) features non-partisan 'edited journalism' news briefs and information for news organizations, public policy groups and individuals; as well as 'gathered' public policy information, including news releases, reports, speeches. For more information contact


