Blog post

Healthcare data challenges

Written by Erin Lutenski

Published on June 11, 2025


Real-world data (RWD) is rapidly becoming a cornerstone of modern healthcare decision-making, enabling insights that support clinical trials, regulatory submissions, market access strategies, and population health management. Yet, realizing the full potential of real-world evidence (RWE) requires overcoming a set of persistent and emerging data challenges.

This article explores the most pressing healthcare data challenges — from methodological limitations and privacy risks to data standardization and underrepresentation — and outlines practical, forward-looking solutions. We also highlight how technologies like confidential computing are enabling privacy-safe collaboration between hospitals, life sciences organizations, and healthtech vendors.

Figure: The barriers to data collaboration in the healthcare space are widespread, but not impossible to overcome.

Key takeaways

Real-world data (RWD) and real-world evidence (RWE) are essential for supporting regulatory decisions, clinical research, and health technology assessments.
Key healthcare data challenges include bias, missing or low-quality data, privacy risks, regulatory ambiguity, and methodological limitations.
Operational hurdles often arise in multi-party collaborations due to inconsistent governance and infrastructure.
Ethical considerations go beyond compliance, requiring robust consent frameworks and transparency.
Linking different types of healthcare data (e.g., claims and genomic data) introduces interoperability and privacy issues.
Emerging tools like synthetic data and confidential computing are creating new possibilities for privacy-preserving analysis.
Decentriq enables secure, compliant collaboration by letting organizations extract insights from sensitive data without exposing it.

Definitions

Real-world data (RWD): Health-related data collected outside traditional clinical trials, including electronic health records (EHRs), claims data, patient registries, and wearable device data.
Real-world evidence (RWE): Clinical evidence derived from the analysis of RWD, used to evaluate the effectiveness, safety, and value of medical interventions.

Why is real-world data important in healthcare?

Despite the enormous potential of real-world data to improve clinical outcomes and support population health management, a staggering 97% of data generated by hospitals each year goes unused. This includes everything from clinical notes and EHR data to lab results and operational datasets — much of which could bridge the gap between research and real-world practice if unlocked safely and responsibly.

Real-world data is important in healthcare because it enables the generation of evidence (RWE) outside the constraints of controlled clinical trials. RWD supports:

Regulatory decision-making: Agencies like the Food and Drug Administration (FDA) and European Medicines Agency (EMA) are increasingly using RWE to support new indications and post-market surveillance.
Health technology assessments (HTAs): Payers rely on RWE to determine the real-world value of interventions.
Clinical research: RWD offers a broader, more diverse view of patient-reported outcomes across settings.
Population health management: Healthcare systems use RWD to track disease trends, optimize resource allocation, and personalize care.

Regulatory bodies are also embracing the value of real-world evidence. By the end of 2020, 90% of new FDA drug approval submissions included RWD, and the EMA has committed to integrating RWE in regulatory and post-marketing safety evaluations by 2025. This shift underscores the growing importance of RWD in accelerating clinical and economic review processes.

By enabling broader, faster, and often more cost-effective data analysis than randomized controlled trials alone, RWD fills key evidence gaps and supports improved patient outcomes. However, as we’ll outline in the following sections, putting it to use for real applications is not without its challenges.

Challenges of using real-world evidence in healthcare

While the potential of real-world evidence to enhance clinical decision-making, regulatory review, and population health is widely acknowledged, unlocking its value is far from simple. Healthcare organizations face a broad spectrum of real-world data challenges — from fragmented systems and inconsistent standards to privacy concerns, bias, and methodological pitfalls.

Overcoming real-world data challenges in healthcare requires not just technical innovation, but also policy alignment, ethical rigor, and scalable infrastructure. Below, we unpack the most pressing barriers to using real-world evidence effectively, and outline actionable strategies to address them.

1: Heterogeneous data formats

Big data has the potential to support unprecedented opportunities and use cases within healthcare. RWD partnerships are already helping organizations reduce the cost of care, increase access to treatments, and improve outcomes. Expanding these partnerships into large-scale, ongoing collaborations that create an ecosystem of connected data could transform healthcare.

Imagine combining information from Internet of Things (IoT) devices with medical records to understand how lifestyle factors impact risk and disease progression. Or pooling siloed data together to identify seemingly unrelated symptoms that could help diagnose rare diseases faster.

Data collaboration solutions that accommodate heterogeneous data types and formats can better align drug development with patient need and lead to broader coverage from payers.

Beyond the fragmentation between systems, many datasets lack adherence to widely accepted healthcare data standards such as FHIR (Fast Healthcare Interoperability Resources) and OMOP (Observational Medical Outcomes Partnership). This incompatibility hinders seamless integration and secondary use of data.

Real-world example:
In the U.S., efforts like the All of Us Research Program attempt to unify disparate data sources from EHRs, biospecimens, and wearables. However, inconsistent adherence to standard terminologies like LOINC and SNOMED CT poses major hurdles for data harmonization and interoperability.

2: Varied rules and regulations

The governance of personally identifiable data varies from country to country. Even under the EU General Data Protection Regulation (GDPR), there are differences in implementation and interpretation across member states.

These differences lead to uncertainty and risks in navigating the legal landscape around collaborating on health data. This can deter manufacturers of digital health products and providers of digital health services from expanding into new markets and limit cross-border collaborations.

Choosing a collaboration solution that keeps data encrypted end to end gives partners strong, verifiable privacy guarantees and makes legal compliance across jurisdictions easier to demonstrate.

3: Fear around how data and insights will be used

Beyond rules and regulations that protect patient privacy, organizations also need to consider how insights will be used. According to a STAT investigation, insurers offering Medicare Advantage plans (privately run alternatives to traditional U.S. Medicare) are allegedly using artificial intelligence algorithms to reduce or deny coverage.

In one example, an insurer cut off coverage of hospital care when a woman exceeded the predicted recovery time, despite medical records showing she was unable to return home.

Even legally compliant projects can raise ethical concerns. Examples include:

Vague or poorly communicated patient consent processes
Use of patient data in ways that communities find exploitative or opaque

Maintaining public trust requires going beyond check-the-box compliance — toward transparency, community engagement, and clear value communication. To provide the necessary level of transparency and control, partners should choose collaboration methods where raw data is never shared with or accessible to external parties.

To participate in data collaboration, organizations need to understand how sensitive data will be used to avoid even the perception that sensitive information was leaked or misused. This is critical in protecting the organization’s reputation, as well as building and maintaining patient trust.

Data custodians should retain control of their data at all times and have transparency around access rights, planned analysis, approved outputs, and how those outputs will be used.

Once these challenges are overcome, the potential for increased RWD use is enormous, because people are often willing to share anonymized health data for research.

Surveys have shown 81% of people with chronic illnesses and 71% of the general population are open to sharing their anonymized health data for research purposes.

Factors that influence this willingness include:

The credibility of the researcher
Belief in the benefit of the research
Transparency around how data will be used
Strong assurances of privacy and security

4: Protecting intellectual property

Privacy concerns also apply to proprietary information that might be exposed during data collaborations or algorithm development. Healthcare organizations and life sciences companies often work with sensitive intellectual property (IP), including novel algorithms, model weights, proprietary feature engineering strategies, and trade-secret datasets.

In these contexts, privacy protection isn't just about regulatory compliance with HIPAA or GDPR — it's about safeguarding competitive advantage and innovation pipelines.

This is particularly critical in collaborative research settings, where multiple institutions may jointly analyze real-world data but have asymmetric stakes. For instance, a pharmaceutical company might train a machine learning model on a hospital's patient data while contributing its proprietary modeling techniques. Sharing too much may risk IP leakage or inadvertent reverse-engineering of algorithms by third parties.

Emerging privacy-preserving technologies such as secure multiparty computation (SMPC), federated learning, and trusted execution environments (TEEs) enable organizations to collaborate without exposing their proprietary data or models. These methods allow algorithm training across distributed datasets while keeping raw data — and model internals — inaccessible to other parties. For example:

Federated learning allows model training to occur locally on each dataset, sharing only updated model weights with a central server. This keeps both the data and IP confined to each institution.
Homomorphic encryption and SMPC enable joint computations on encrypted data, ensuring that no party can infer sensitive information about the other’s inputs.
Confidential computing environments ensure that data and model logic remain encrypted even during processing.
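To make the federated learning idea above more concrete, here is a minimal sketch of federated averaging. It assumes a toy one-feature linear model and two hypothetical hospital datasets; the function names (`local_update`, `federated_average`, `train`) and all numbers are illustrative and not part of any specific product. Real deployments typically add protections such as secure aggregation on top of this basic loop.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (feature x, outcome y) pairs held by one site
Weights = Tuple[float, float]        # (slope w, intercept b) of the shared model


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.05, epochs: int = 20) -> Weights:
    """Run a few gradient-descent steps for y ~ w*x + b on one site's own data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Server step: average site models, weighted by each site's record count."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return w, b


def train(sites: List[Dataset], rounds: int = 50) -> Weights:
    """Only model weights travel between the sites and the server, never records."""
    weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in sites]
        weights = federated_average(updates, [len(d) for d in sites])
    return weights


# Hypothetical data: both sites follow y = 2x + 1 but cover different x ranges.
hospital_a: Dataset = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
hospital_b: Dataset = [(3.0, 7.0), (4.0, 9.0)]
w, b = train([hospital_a, hospital_b])
print(f"shared model: y = {w:.3f} * x + {b:.3f}")
```

In this sketch the server only ever sees the two numbers each site returns per round, which is the property the bullet on federated learning describes.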

Here's how these three options compare:


Technique	Where data lives	What is shared	Privacy benefit	Use case example
Federated learning	Locally at each institution	Updated model parameters (aggregated data)	Patient-level data is condensed into model updates at the source; the raw records stay local.	Hospitals train a shared AI model on their local EHRs.
Secure multiparty computation (SMPC)	Distributed, encrypted	Predefined analysis outputs	All parties compute together without ever seeing each other's inputs.	Pharma firms and hospitals calculate shared statistics.
Confidential computing / Trusted execution environments (TEEs)	Inside a secure enclave	Predefined analysis outputs	Data leaves the source only in encrypted form and is processed inside the enclave, inaccessible to everyone else.	Multiple stakeholders run joint analyses securely.
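As an illustration of the SMPC row, the sketch below uses additive secret sharing, one common building block of SMPC, to compute a joint total without any party revealing its own count. The modulus, the function names, and the patient counts are assumptions chosen for the example; production protocols also handle networking, dishonest parties, and richer computations.

```python
import secrets
from typing import List

PRIME = 2**61 - 1  # arithmetic modulus; must be larger than any possible total


def share(value: int, n_parties: int) -> List[int]:
    """Split a value into n random-looking shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares: List[int]) -> int:
    """Recombine shares; any incomplete subset reveals nothing about the value."""
    return sum(shares) % PRIME


def secure_sum(private_inputs: List[int]) -> int:
    """Each party shares its input; party j only ever adds up the j-th shares."""
    n = len(private_inputs)
    distributed = [share(v, n) for v in private_inputs]
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Only the partial sums are published, and together they yield the total.
    return reconstruct(partial_sums)


# Hypothetical per-institution patient counts for one cohort definition.
counts = [128, 342, 57]
total = secure_sum(counts)
print(total)
```

Each partial sum looks random on its own, so publishing them reveals the combined total but not any single institution's contribution.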


The ability to protect IP while engaging in RWD collaborations is becoming a core requirement for modern data partnerships. Organizations that fail to address this risk not only jeopardize their compliance posture but also their competitive edge in an increasingly data-driven industry.

5: Expensive, time-consuming collaborations don’t scale

Historically, establishing data collaborations has been hard. It involves finding partners who are able and willing to collaborate on their sensitive data, setting up complicated frameworks, and working through data protection requirements.

As a result, collaborations can take at least a year to establish, which is wildly at odds with the pace of technology and the demand for better treatments faster.

Collaboration agreements and technology solutions are so unique to the specific partners that they become single-use solutions. Every new partnership requires another round of time-consuming negotiations and costly setup. This discourages collaboration with more diverse partners that could provide broader data sets.

As noted in challenge 2, choosing a zero-trust solution can shorten this process significantly. Confidential computing and other advanced privacy technologies can provide verifiable proof that data remains confidential, which facilitates compliance with data protection regulations and can remove the need for repeated compliance agreements and documentation.

Solutions that employ these measures can reduce the time needed to establish collaboration frameworks by 90%.

6: Rigid solutions impact data usability

The above challenges can result in very rigid collaboration methods and limited data sets. For example, some patient data might be excluded due to privacy concerns, or hospitals might opt out of the study if the barriers seem too high. Reducing the available data or severely limiting how it can be used restricts the accuracy and scope of the RWE that can be derived.

Collaborators need flexible solutions that allow participants to manage the tradeoff between privacy and utility based on the scope of the study and the comfort levels of the data custodians.

These solutions should be easy to use and able to guarantee that outputs won’t reveal private or confidential information, even if some participating organizations only contribute a few cases. Enabling organizations of all sizes to contribute results in more complete data sets.
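One simple output safeguard of this kind is small-cell suppression: any aggregate based on too few patients is withheld. The sketch below assumes a minimum cell size of 10, which is an illustrative threshold rather than a regulatory constant, and the function name and record counts are hypothetical. Suppression alone does not rule out every inference (for example, by differencing several released totals), so it is usually combined with other controls.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def suppressed_counts(records: Iterable[str],
                      min_cell_size: int = 10) -> Dict[str, Optional[int]]:
    """Count records per group, replacing counts below the threshold with None."""
    counts = Counter(records)
    return {
        group: (n if n >= min_cell_size else None)
        for group, n in counts.items()
    }


# Hypothetical diagnosis codes (ICD-10) contributed by a small clinic.
diagnoses = ["I10"] * 25 + ["E11"] * 12 + ["G35"] * 3
result = suppressed_counts(diagnoses)
print(result)
```

Here the three patients with code G35 are hidden from the output, while the larger groups are still reported.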

7: Data quality and standardization

The integration of RWD from diverse sources — such as electronic medical records, wearable devices, and administrative databases — often leads to inconsistencies in data quality and a lack of standardization. Variations in data collection methods, coding systems, and terminologies can result in incomplete, inaccurate, or non-comparable datasets, hindering effective analysis and decision-making.

For example:

Inconsistent coding systems: Different healthcare institutions may use varying coding standards like ICD-10, SNOMED CT, or LOINC, leading to challenges in data harmonization.
Variable data entry practices: Manual data entry can introduce errors, and the absence of standardized protocols exacerbates discrepancies.
Lack of interoperability: Disparate EHR systems often lack seamless communication, making data exchange and integration difficult.
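As a rough illustration of what harmonizing inconsistent codes involves, the sketch below maps site-specific lab codes to a shared LOINC code and converts units. The site names and local codes are hypothetical; the LOINC codes shown (2345-7 for serum or plasma glucose, 718-7 for blood hemoglobin) are real, and the conversion factor follows from the molar mass of glucose. A production pipeline would rely on curated terminology mappings rather than a hand-written dictionary.

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical mapping from (site, local lab code) to a standard LOINC code.
LOCAL_TO_LOINC: Dict[Tuple[str, str], str] = {
    ("site_a", "GLU"): "2345-7",            # Glucose [Mass/volume] in Serum or Plasma
    ("site_b", "glucose_serum"): "2345-7",
    ("site_a", "HGB"): "718-7",             # Hemoglobin [Mass/volume] in Blood
}

GLUCOSE_MMOL_L_TO_MG_DL = 18.016  # 1 mmol/L of glucose is about 18.016 mg/dL

Record = Dict[str, Union[str, float]]


def harmonize(site: str, local_code: str,
              value: float, unit: str) -> Optional[Record]:
    """Return a record in the shared vocabulary, or None if the code is unmapped."""
    loinc = LOCAL_TO_LOINC.get((site, local_code))
    if loinc is None:
        return None  # flag for manual review instead of guessing a mapping
    if loinc == "2345-7" and unit == "mmol/L":
        value, unit = value * GLUCOSE_MMOL_L_TO_MG_DL, "mg/dL"
    return {"loinc": loinc, "value": round(value, 1), "unit": unit}


row_a = harmonize("site_a", "GLU", 99.0, "mg/dL")
row_b = harmonize("site_b", "glucose_serum", 5.5, "mmol/L")
print(row_a, row_b)
```

After harmonization, both rows carry the same code and unit and can be analyzed together; unmapped codes return None so they can be reviewed rather than silently dropped or misassigned.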

Real-world example:

The National COVID Cohort Collaborative (N3C) in the United States aggregated EHR data from multiple institutions to study COVID-19. However, differences in data standards and collection methods across these institutions led to significant variability in data quality, complicating large-scale analyses and highlighting the need for standardized data practices.

8: Representation, bias, and equity

RWD often fails to adequately represent diverse populations, leading to biases that can affect the generalizability of research findings and perpetuate health disparities. Underrepresentation of certain groups — such as women, racial and ethnic minorities, and rural populations — can result in inequitable healthcare outcomes. This can take the form of:

Sampling bias: Data collected may disproportionately represent certain demographics, skewing results.
Algorithmic bias: Predictive models trained on biased data can perpetuate existing disparities.
Lack of social determinants data: Failure to capture data on social determinants of health (e.g., socioeconomic status, education) limits the ability to address health inequities.
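A first check for sampling bias is to compare each group's share of the dataset with its share of the target population. The sketch below computes a simple representation ratio; the group labels, counts, and population shares are invented for illustration, and the function name is hypothetical.

```python
from typing import Dict


def representation_ratios(sample_counts: Dict[str, int],
                          population_shares: Dict[str, float]) -> Dict[str, float]:
    """Ratio of dataset share to population share; values below 1 mean underrepresented."""
    total = sum(sample_counts.values())
    return {
        group: (sample_counts.get(group, 0) / total) / share
        for group, share in population_shares.items()
    }


# Hypothetical dataset: rural patients are 20% of records but 30% of the population.
sample = {"urban": 800, "rural": 200}
population = {"urban": 0.7, "rural": 0.3}
ratios = representation_ratios(sample, population)
print(ratios)
```

A ratio of about 0.67 for the rural group signals that results may not generalize to it without reweighting or additional data collection.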

Real-world examples:

Pulse oximeters have been found to provide less accurate readings for individuals with darker skin tones, leading to potential underdiagnosis of hypoxemia in these populations.
In the UK, the National Liver Offering Scheme (NLOS) algorithm used for organ transplant allocation has been criticized for disadvantaging younger patients, raising concerns about fairness and equity in automated healthcare decisions.

9: Methodological limitations

One of the most persistent challenges of using real-world evidence in healthcare is the inherent methodological complexity involved in working with observational data. Unlike randomized controlled trials, real-world data is not collected in a controlled environment, making it more prone to confounding variables, selection bias, and inconsistent data collection methods across multiple sources.

Key analytical issues include the lack of standardized endpoints, variability in data granularity, and missing or incomplete patient information. These limitations complicate the ability to generate reliable causal inferences, particularly when assessing comparative effectiveness or safety.

As a result, healthcare data users must apply robust statistical techniques, such as propensity score matching, instrumental variable analysis, or machine learning-based corrections, to extract meaningful insights.

However, the sophistication required often limits the scalability or reproducibility of RWE studies—especially when transparency in methods is lacking or datasets are inaccessible for peer review.
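To show why such adjustments matter, the sketch below uses direct standardization over a single confounder (disease severity). This is a simpler relative of the propensity score methods mentioned above, not propensity score matching itself, and all counts are invented for illustration. In the made-up data, sicker patients are more likely to be treated, so the unadjusted comparison makes a beneficial treatment look harmful.

```python
from typing import Dict, Tuple

# Hypothetical counts per severity stratum: arm -> (patients, deaths).
Strata = Dict[str, Dict[str, Tuple[int, int]]]

strata: Strata = {
    "mild":   {"treated": (100, 5),   "untreated": (400, 40)},
    "severe": {"treated": (400, 120), "untreated": (100, 40)},
}


def crude_risk_difference(data: Strata) -> float:
    """Compare pooled death risks, ignoring severity entirely."""
    def pooled_risk(arm: str) -> float:
        patients = sum(s[arm][0] for s in data.values())
        deaths = sum(s[arm][1] for s in data.values())
        return deaths / patients
    return pooled_risk("treated") - pooled_risk("untreated")


def standardized_risk_difference(data: Strata) -> float:
    """Average the within-stratum risk differences, weighted by stratum size."""
    total = sum(s["treated"][0] + s["untreated"][0] for s in data.values())
    diff = 0.0
    for s in data.values():
        weight = (s["treated"][0] + s["untreated"][0]) / total
        risk_treated = s["treated"][1] / s["treated"][0]
        risk_untreated = s["untreated"][1] / s["untreated"][0]
        diff += weight * (risk_treated - risk_untreated)
    return diff


crude = crude_risk_difference(strata)
adjusted = standardized_risk_difference(strata)
print(f"crude: {crude:+.3f}, adjusted: {adjusted:+.3f}")
```

With these numbers the crude difference is positive (treatment appears to raise mortality) while the severity-adjusted difference is negative. Real studies face many confounders at once, which is where propensity scores and related techniques come in.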

Real-world example:

A study conducted at Bellvitge University Hospital in Spain assessed the effectiveness of a time-dependent treatment for hospitalized COVID-19 patients using observational data. The researchers identified several methodological biases that could distort treatment effect estimates, including:

Immortal-time bias: Patients who received the treatment had to survive until the treatment was administered, which creates a period during which they were effectively "immortal" with respect to the outcome and can lead to an overestimation of treatment effectiveness.
Confounding bias: Differences in baseline characteristics between treated and untreated patients could influence outcomes, making it challenging to attribute effects solely to the treatment.
Competing risks: The presence of other events (e.g., discharge, transfer) that preclude the occurrence of the primary outcome (e.g., in-hospital death) can bias results if not appropriately accounted for.

The study demonstrated that applying naïve analytical methods, such as the Kaplan-Meier estimator, without addressing these biases, led to overestimated treatment benefits. By employing more sophisticated techniques, like the weighted Aalen-Johansen estimator within an emulated trial framework, the researchers obtained more accurate estimates of treatment effects.

This example underscores the necessity of rigorous methodological approaches when analyzing RWD to inform clinical decision-making.
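The immortal-time bias described above can be illustrated with a much simpler technique than the one used in the study: splitting each treated patient's follow-up into untreated time (before treatment started) and treated time (after). The sketch below is not the weighted Aalen-Johansen approach and ignores confounding and competing risks; the six patients and the function name are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Patient(NamedTuple):
    treatment_day: Optional[int]  # day treatment started, or None if never treated
    end_day: int                  # day of death or end of follow-up
    died: bool


def mortality_rate_ratio(patients: List[Patient], split_person_time: bool) -> float:
    """Treated vs. untreated death rate per person-day.

    With split_person_time=False, all follow-up of ever-treated patients counts
    as treated time (the naive analysis). With True, days before treatment
    started are counted as untreated time.
    """
    days = {"treated": 0, "untreated": 0}
    deaths = {"treated": 0, "untreated": 0}
    for p in patients:
        if p.treatment_day is None:
            days["untreated"] += p.end_day
            deaths["untreated"] += int(p.died)
        else:
            pre_treatment = p.treatment_day if split_person_time else 0
            days["untreated"] += pre_treatment
            days["treated"] += p.end_day - pre_treatment
            deaths["treated"] += int(p.died)
    treated_rate = deaths["treated"] / days["treated"]
    untreated_rate = deaths["untreated"] / days["untreated"]
    return treated_rate / untreated_rate


# Hypothetical cohort: two untreated patients die before treatment could start.
cohort = [
    Patient(treatment_day=5, end_day=20, died=False),
    Patient(treatment_day=10, end_day=30, died=True),
    Patient(treatment_day=2, end_day=30, died=False),
    Patient(treatment_day=None, end_day=3, died=True),
    Patient(treatment_day=None, end_day=8, died=True),
    Patient(treatment_day=None, end_day=30, died=False),
]

naive = mortality_rate_ratio(cohort, split_person_time=False)
corrected = mortality_rate_ratio(cohort, split_person_time=True)
print(f"naive: {naive:.2f}, corrected: {corrected:.2f}")
```

In this toy cohort the naive rate ratio is lower than the corrected one, so the naive analysis overstates the apparent benefit, which mirrors the direction of bias the study reported.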

Case study

Challenge

A leading global pharmaceutical company sought to analyze cardiovascular disease data from more than one million patients across multiple European countries.

However, the sensitive nature of patient-level data and the fragmented landscape of data governance frameworks across jurisdictions made direct data sharing unfeasible. Privacy regulations, local compliance constraints, and the need to maintain data residency threatened to delay or derail the initiative entirely.

Solution

To overcome these data challenges in healthcare, the company partnered with Decentriq to enable a privacy-preserving analytics environment. Using Decentriq’s secure data clean rooms, healthcare providers in each country were able to contribute electronic health records (EHRs), administrative data, and other real-world data (RWD) without exposing any patient-level information.

The platform ensured end-to-end encryption, strict access controls, and compliance with local data protection standards — all without centralizing or moving the data.

Results

The collaboration successfully unlocked high-value insights from a combined dataset of over one million patient records — without any raw data ever leaving its original jurisdiction. Researchers were able to identify treatment patterns, comorbidity trends, and real-world clinical outcomes across diverse patient populations.

The project not only enabled scalable and compliant population health research, but also demonstrated a blueprint for multinational RWD collaboration in a highly regulated environment.

Read the full case study here: Decentriq facilitates analysis of data from over one million cardiovascular disease patients

The path forward for privacy-safe collaboration in healthcare

As the healthcare ecosystem becomes increasingly data-driven, privacy and collaboration are no longer trade-offs — they are interdependent. Secure access to high-value patient data across departments, networks, and partners is essential for generating actionable real-world evidence.

Decentriq enables this future through confidential computing, allowing multiple stakeholders to collaborate on sensitive data without exposing it. Our platform supports compliance with strict data protection laws while enabling high-impact analytics.

Explore Decentriq’s data clean rooms for healthcare.

By embracing innovative technologies and addressing the full spectrum of data challenges — from quality and representation to governance and methodology — the healthcare industry can unlock the true potential of real-world data to improve clinical and economic outcomes for all.