The FDA's 2023 framework formalizing EHR data as a legitimate source for regulatory submissions was a genuine shift — not a press release. What it opened up in practice is still being negotiated. The hard questions aren't about whether RWE "counts": multiple approvals have already used it as primary or supportive evidence. The hard questions are methodological: how do you handle missing data in EHRs collected for billing rather than research? When does propensity score matching adequately substitute for randomization? And which outcomes can be reliably captured from claims data versus which require structured clinical data collection? These aren't academic debates — they determine whether hybrid trial designs actually work in regulatory submissions or just in academic papers.
This article is for informational purposes only and does not constitute medical advice. Clinical trial eligibility and availability vary. Always consult a qualified healthcare professional before making any medical decisions or considering participation in a clinical trial.
Summary
Real-world evidence — generated from electronic health records, insurance claims, disease registries, and wearable device data — has moved from supplementary context to a primary regulatory input. The 21st Century Cures Act mandated the FDA's RWE program, and multiple drug approvals have since used RWE as primary or supportive evidence. In 2026, hybrid trials that combine randomized treatment assignment with real-world data follow-up and external synthetic control arms are becoming a standard design option — particularly in rare diseases, post-market label extensions, and pediatric programs where placebo-controlled designs are ethically or practically impossible.
What RWD Sources Actually Offer — and Where Each Falls Short
The FDA's RWE framework, published under 21st Century Cures Act Section 3022, evaluates RWD on two dimensions: data relevance (does it capture the right outcomes for the research question?) and data reliability (is it complete, accurate, and consistently collected?). These sound straightforward. In practice, they create significant friction.
| RWD Source | Strengths | Limitations |
|---|---|---|
| EHR / EMR Data | Rich longitudinal data, lab values, diagnoses | Coding inconsistency, missing data, site variation |
| Insurance Claims | Large population, near-complete record of billed treatments | Billing codes ≠ clinical diagnosis, no lab results |
| Disease Registries | Disease-specific, high relevance, curated | Selection bias: covers only patients who enrolled |
| Wearable / Digital | Continuous, high-frequency, objective endpoints | Device variation, adherence gaps, regulatory validation needed |
EHR data is the most commonly used RWD source in regulatory submissions, but it was designed for clinical care and billing, not research. ICD codes are applied inconsistently across health systems. Lab values go missing when tests are ordered outside the capturing network. Socioeconomic and lifestyle confounders are systematically undercaptured. None of this is disqualifying, but each gap requires explicit documentation and analytical handling in a regulatory submission. FDA guidance expects sponsors to characterize missing data patterns and to demonstrate through sensitivity analyses that the primary result is robust to the missingness assumptions made.
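To make the missing-data expectation concrete, here is a minimal Python sketch, not a regulatory-grade method. It tabulates which fields are missing together, then runs a simple tipping-point check: how many patients with unknown outcomes would have to be counted as events before the treated arm's event rate reaches the comparator rate. The field names (`hba1c`, `egfr`), the function names, and all numbers are hypothetical; a real submission would more likely rely on multiple imputation or pattern-mixture models discussed with the agency.

```python
from collections import Counter


def missingness_patterns(records, fields):
    """Count how often each combination of missing fields occurs.

    A record's pattern is the tuple of field names whose value is None.
    """
    patterns = Counter()
    for rec in records:
        pattern = tuple(f for f in fields if rec.get(f) is None)
        patterns[pattern] += 1
    return patterns


def tipping_point(events_observed, n_observed, n_missing, comparator_rate):
    """Smallest number of missing-outcome patients who must be counted as
    events for the treated-arm event rate to reach the comparator rate.

    Returns None if the treated arm stays below the comparator rate even
    when every missing outcome is counted as an event.
    """
    n_total = n_observed + n_missing
    for k in range(n_missing + 1):
        if (events_observed + k) / n_total >= comparator_rate:
            return k
    return None


# Hypothetical EHR extract: None marks a value that was never captured.
records = [
    {"hba1c": 7.1, "egfr": None},
    {"hba1c": None, "egfr": None},
    {"hba1c": 6.5, "egfr": 90},
]
for pattern, count in missingness_patterns(records, ["hba1c", "egfr"]).items():
    print(pattern or "(complete)", count)

# Hypothetical treated arm: 10 events among 80 observed, 20 outcomes unknown.
print(tipping_point(10, 80, 20, comparator_rate=0.25))
```

In the hypothetical arm above, 15 of the 20 unknown outcomes would have to be events before the advantage disappears, far above the observed event rate of 10 in 80. A tipping point close to what the observed rate would predict suggests the primary result is fragile to the missingness assumption.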
Hybrid Trials and Synthetic Control Arms: The Technical Reality
For rare diseases where randomizing patients to placebo is scientifically or ethically untenable, hybrid designs use external control data constructed from RWD. The methodological approaches differ in their assumptions and regulatory track record.
- Natural history studies as external controls: Prospectively collected data on untreated patients enrolled before the investigational drug became available. The FDA has accepted these in several rare pediatric disease approvals; in spinal muscular atrophy, natural history registry data (PNCR, SMArt) was foundational to nusinersen's development. The key requirement is that the natural history cohort was enrolled before treatment access changed: historical controls from an era before any treatment existed are much cleaner than controls from a period when some patients might have received off-label therapy.
- Synthetic control arms (SCAs): Patient-level data from registries or EHRs are used to construct a matched control group using propensity score matching, inverse probability weighting, or Bayesian dynamic borrowing. The regulatory requirement is prospective agreement on the matching algorithm before data lock. Sponsors who choose their matching variables retrospectively, after seeing the trial results, will face significant pushback. The FDA's 2023 draft guidance on externally controlled trials advises finalizing the protocol and statistical analysis plan before the analysis is run and consulting the review division early; in practice, sponsors typically seek that agreement in a Type B meeting before the primary analysis.
- Single-arm trials with RWD comparators: Accepted most readily in oncology accelerated approvals where historical response rates in a specific refractory population are well-established. The investigational treatment must demonstrably exceed the expected historical benchmark, usually by a margin large enough that confounding alone cannot explain the difference. Post-market confirmatory randomized trials are generally required to convert accelerated approval to regular approval.
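The inverse probability weighting mentioned in the list above can be sketched in a few lines of Python. This is an illustration under strong assumptions, not how any particular sponsor builds a synthetic control arm: there is a single pre-specified categorical confounder (`stratum`), and the propensity of being a trial patient is estimated from stratum frequencies rather than a fitted logistic regression, to keep the example dependency-free. All field names, function names, and data are hypothetical.

```python
from collections import defaultdict


def att_weights(rows):
    """Return one weight per row for an ATT-style comparison.

    Trial patients get weight 1. External-control patients get
    ps / (1 - ps), where ps is the share of trial patients in their
    stratum; algebraically this equals n_trial / n_external.
    """
    counts = defaultdict(lambda: {"trial": 0, "external": 0})
    for r in rows:
        counts[r["stratum"]]["trial" if r["in_trial"] else "external"] += 1

    weights = []
    for r in rows:
        c = counts[r["stratum"]]
        weights.append(1.0 if r["in_trial"] else c["trial"] / c["external"])
    return weights


def weighted_mean_outcome(rows, weights, in_trial):
    """Weighted mean of 'outcome' over the rows in the requested arm."""
    pairs = [(w, r["outcome"]) for r, w in zip(rows, weights)
             if r["in_trial"] == in_trial]
    total = sum(w for w, _ in pairs)
    return sum(w * y for w, y in pairs) / total


# Hypothetical data: trial patients skew severe, external controls skew mild.
rows = [
    {"stratum": "severe", "in_trial": True, "outcome": 1},
    {"stratum": "severe", "in_trial": True, "outcome": 1},
    {"stratum": "mild", "in_trial": True, "outcome": 1},
    {"stratum": "severe", "in_trial": False, "outcome": 0},
    {"stratum": "mild", "in_trial": False, "outcome": 1},
    {"stratum": "mild", "in_trial": False, "outcome": 1},
    {"stratum": "mild", "in_trial": False, "outcome": 0},
]
w = att_weights(rows)
print("trial mean:", weighted_mean_outcome(rows, w, in_trial=True))
print("weighted external mean:", weighted_mean_outcome(rows, w, in_trial=False))
```

The weights reshape the external cohort so that its severity mix matches the trial population. In this toy data the single severe external patient is up-weighted and the three mild ones are down-weighted. The variables that define `stratum` are exactly what would need to be fixed in the analysis plan before data lock.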
The Regulatory Threshold: What Actually Gets Through
The FDA has approved drugs using RWE as primary or supportive evidence in a pattern that reveals what regulators actually require — regardless of what guidance documents say in the abstract.
Three things appear consistently in accepted RWE submissions. First, a pre-specified statistical analysis plan registered before data lock: post-hoc RWE analyses are not accepted as primary evidence, full stop. Second, documented data provenance: sponsors must demonstrate where data came from, how it was extracted, what data cleaning steps were applied, and what proportion of expected data fields were complete. Third, and most practically important, a Type B pre-submission meeting with the FDA before the study design is finalized. Investigators who have navigated an RWE submission successfully consistently describe that meeting as the critical step. Regulators can and do signal which specific confounders they're worried about and what sensitivity analyses they'll expect, information that would otherwise emerge only during submission review, too late to address.
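The provenance items listed above (source, cleaning steps, field completeness) could be captured in a small machine-readable manifest. The Python sketch below is one hypothetical way to do that. The manifest layout, function names, and field names are assumptions made for illustration, not an FDA-specified format. The SHA-256 fingerprint of the extract is an added convenience for showing that the analyzed data matches what was locked.

```python
import hashlib
import json


def field_completeness(records, expected_fields):
    """Fraction of records with a non-empty value for each expected field."""
    n = len(records)
    if n == 0:
        return {f: 0.0 for f in expected_fields}
    return {
        f: sum(1 for r in records if r.get(f) not in (None, "")) / n
        for f in expected_fields
    }


def provenance_manifest(records, expected_fields, source, cleaning_steps):
    """Bundle source, cleaning log, completeness, and an extract fingerprint."""
    # sort_keys makes the serialization (and so the hash) deterministic.
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return {
        "source": source,
        "n_records": len(records),
        "extract_sha256": hashlib.sha256(payload).hexdigest(),
        "cleaning_steps": list(cleaning_steps),
        "field_completeness": field_completeness(records, expected_fields),
    }


# Hypothetical extract and cleaning log.
records = [
    {"patient_id": "a1", "diagnosis_code": "G12.0", "weight_kg": 8.2},
    {"patient_id": "a2", "diagnosis_code": "G12.0", "weight_kg": None},
    {"patient_id": "a3", "diagnosis_code": "", "weight_kg": 9.1},
    {"patient_id": "a4", "diagnosis_code": "G12.1", "weight_kg": 7.6},
]
manifest = provenance_manifest(
    records,
    expected_fields=["patient_id", "diagnosis_code", "weight_kg"],
    source="hypothetical-registry-extract",
    cleaning_steps=["dropped duplicate encounters", "normalized weight to kg"],
)
print(json.dumps(manifest, indent=2))
```

Generating the manifest from the locked extract itself, rather than writing it by hand, keeps the reported completeness figures tied to the exact data that was analyzed.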
The EMA has parallel mechanisms through its PRIME pathway and Regulatory Science to Innovation (RSI) program. European RWE submissions face somewhat different standards: the EMA has historically been more conservative about synthetic control arms than the FDA, though recent guidance documents suggest convergence. Sponsors planning global submissions should seek alignment from both agencies before study initiation.
What This Means for Patients Participating in Hybrid Trials
If you're enrolled in a hybrid or decentralized trial that uses wearable monitoring and EHR data linkage, your standard-of-care medical records are likely contributing to the study's external control or safety database — sometimes with explicit consent, sometimes under a broad data use agreement that your institution signed. This is worth understanding before enrollment. The consent document should specify what data is being collected beyond the study protocol visits, how long it will be retained, and whether your identifiable data leaves your institution.
More practically: in EHR-linked trials, missing study visits are less catastrophic than in traditional protocols because the EHR captures ongoing clinical activity. But the data quality of what gets captured depends heavily on which health system you receive care in — a teaching hospital with structured clinical data entry will generate more complete and useful EHR data than a small community practice using legacy systems. This isn't a reason to avoid these trials, but it's context worth having.