The need for a data-quality framework is clear as real-world data (RWD) becomes increasingly complex and diverse. Before focusing on Electronic Health Record (EHR) data quality, let's take a moment to look more closely at RWD derived from the EHR.
Flatiron Health’s RWD is curated from the EHRs of a nationwide network of academic and community cancer clinics. The richest clinical data, such as stage at diagnosis and clinical endpoints, exists in unstructured fields. Extracting that data is challenging and complicated, requiring a combination of human abstraction (including 2,000 abstractors at Flatiron), machine learning, and natural language processing.
And it’s clear that quality matters. Recently, we’ve seen growing regulatory and policy guidance on the use of RWD from the FDA, EMA, NICE, the Duke-Margolis Health Policy Center, and others. It's also clear that quality is not a single concept: it has multiple dimensions, which fall into the categories of relevance and reliability.
Assessing RWD: Relevance.
Relevance of the source data has several sub-dimensions:
These are traditionally assessed in support of a specific research question or use case. But at Flatiron, we must think more broadly to ensure our multi-purpose datasets capture variables that address the most common and important use cases (e.g., natural history, treatment patterns, safety and efficacy).
We also consider relevance as we expand our network – relying not only on community clinics that use our EHR software, OncoEMR®, but also intentionally partnering with academic centers that use other software. This enables us to increase the number of patients represented and to ensure our data reflect where cancer patients actually receive care.
Assessing RWD: Reliability.
Another dimension of quality is reliability, which has several critical sub-dimensions:
At Flatiron, we have developed processes and infrastructure to ensure our data is reliable, with clear operational definitions. Our clinical and scientific experts help establish these processes, whether the data are produced by an ML algorithm or by human abstraction following documented guidance.
How does Flatiron ensure accuracy through validation?
We perform validation at multiple levels throughout the data lifecycle, e.g., at the field level at the time of data entry and at the cohort level. We apply different quantitative and statistical approaches at each level, using a range of metrics that depend on the approach.
Figure 1:
Examples of validation approaches we use at Flatiron Health include:
Figure 2:
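To make the idea of level-specific checks concrete, here is a minimal sketch in Python of a field-level plausibility check and a cohort-level comparison against an external benchmark. The field (birth year), the benchmark median age, and the tolerance are illustrative assumptions, not Flatiron's actual rules or metrics.

from datetime import date
import statistics

# Field-level check at data entry: is a structured value present and within a
# clinically plausible range? (Illustrative field and bounds.)
def validate_birth_year(birth_year, as_of=date(2024, 1, 1)):
    issues = []
    if birth_year is None:
        issues.append("birth_year missing")
    elif not (as_of.year - 120 <= birth_year <= as_of.year):
        issues.append(f"birth_year {birth_year} outside plausible range")
    return issues

# Cohort-level check: does an aggregate statistic fall within an expected band
# around an external benchmark? (Benchmark and tolerance are assumptions.)
def validate_cohort_median_age(ages, benchmark_median=68.0, tolerance=5.0):
    median = statistics.median(ages)
    if abs(median - benchmark_median) > tolerance:
        return [f"cohort median age {median} deviates from benchmark {benchmark_median}"]
    return []

The same pattern generalizes to other fields and other cohort-level statistics; what changes is the metric and the benchmark chosen for comparison.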
Accuracy: Verification Checks.
Using clinical knowledge, we also monitor data over time and address discrepancies and outliers through different types of verification checks.
One example is using clinical expertise to evaluate the temporal plausibility of a patient's timeline of diagnosis, treatment sequences, and follow-up, assessing whether the data are logically believable; a simplified version of such a check is sketched below.
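A minimal sketch of this kind of check, assuming a handful of hypothetical date fields (diagnosis, treatment start, last follow-up, death), might look like the following in Python; the rules shown are illustrative rather than Flatiron's production logic.

from datetime import date
from typing import List, Optional

# Simplified temporal-plausibility check; field names and rules are
# illustrative assumptions, not Flatiron's production logic.
def check_timeline(diagnosis: date,
                   treatment_start: Optional[date] = None,
                   last_follow_up: Optional[date] = None,
                   death: Optional[date] = None) -> List[str]:
    issues = []
    if treatment_start and treatment_start < diagnosis:
        issues.append("treatment starts before diagnosis")
    if last_follow_up and last_follow_up < diagnosis:
        issues.append("last follow-up precedes diagnosis")
    for label, d in (("treatment start", treatment_start),
                     ("last follow-up", last_follow_up)):
        if death and d and d > death:
            issues.append(f"{label} recorded after date of death")
    return issues

# Example: flags a treatment start that precedes the recorded diagnosis.
print(check_timeline(date(2021, 3, 1), treatment_start=date(2021, 1, 15)))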
Accuracy: Completeness.
Completeness is a critical complement to accuracy when assessing reliability. It's not enough for data to be accurate; the data must be present in the first place. We recognize, though, that completeness in EHR-based data is unlikely to reach 100%.
To ensure completeness meets an acceptable level of quality, we put controls and processes in place at multiple levels. Data flows through many channels between the exam room and the final dataset, and each step along the way is a point at which elements may be lost, mislabeled, or inappropriately transformed. Thresholds are based on clinical expectations, and a simple monitoring check is sketched below. In addition, integrating sources within or beyond the EHR can improve completeness.
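As a rough illustration, a field-level completeness monitor can be sketched as follows in Python; the fields and thresholds shown are hypothetical placeholders rather than Flatiron's clinically derived expectations.

# Minimal completeness monitor; fields and thresholds are hypothetical
# placeholders, not Flatiron's clinically derived expectations.
def completeness_report(records, thresholds):
    n = len(records)
    report = {}
    for field, minimum in thresholds.items():
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        rate = filled / n if n else 0.0
        report[field] = {"rate": round(rate, 3),
                         "threshold": minimum,
                         "below_threshold": rate < minimum}
    return report

# Example usage with made-up records: smoking_status falls below its threshold.
records = [{"stage": "III", "smoking_status": None},
           {"stage": "II", "smoking_status": "Former"}]
print(completeness_report(records, {"stage": 0.95, "smoking_status": 0.80}))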
In Summary…
Understanding the quality of RWD is critical to developing the right analytic approach. But quality is not measured by a single number; it requires assessment across multiple dimensions. At Flatiron, cross-disciplinary expertise applied across the data lifecycle, combined with a commitment to data transparency, ensures our data users are equipped with the knowledge they need to generate impactful RWE.