How to cultivate trust & connections in life sciences in a time of AI bias.
From drug discovery to supply chain optimization, artificial intelligence (AI) has made inroads into nearly every facet of the life science industry. AI brings many real-world benefits to the life sciences, including streamlined drug candidate selection, more efficient manufacturing, and the advancement of new therapeutic modalities such as gene therapies.1
With increased access for basic researchers, widespread utilization of AI is within arm's reach.2
However, there are concerns about the unfettered development and application of AI in the life sciences, particularly as these systems become more integrated into critical decision-making processes. Bias in training data has drawn particular attention because it has the potential to create AI systems that reinforce systemic, societal biases.
Bias is nothing new in the life sciences. Clinical studies are “blinded” or “double-blinded” precisely to reduce the biases of the trial staff and volunteers involved in a study. These methods are incredibly effective in clinical testing, yet they are not widely used in pre-clinical testing or R&D laboratories.3
Several classic studies have confirmed that observer bias significantly affects the quality of the data collected.4 These and other biases have all played a role in science’s ongoing “reproducibility crisis”: the difficulty, or outright inability, of reproducing many published scientific results.5
This current struggle provides a relevant backdrop for discussing the issues of AI bias.
AI bias is a systematic pattern of divergence in the predictions of a machine learning model, producing accurate projections for some situations yet highly inaccurate projections for others.
In data science, the aim is to construct models whose errors are randomly scattered and independent of one another.
AI bias problems can arise when the training data does not support this random scatter of errors. The algorithm responds with answers that align with the data it was trained on, and its prediction errors may not be scattered at random but instead be linked to certain data attributes.
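As a rough illustration of what checking for that scatter can look like in practice, the Python sketch below simulates predictions whose errors concentrate in an underrepresented group and then tests whether error and group membership are associated. The column names, error rates, and data are entirely hypothetical.

```python
# Minimal sketch: test whether a model's errors are independent of a data attribute.
# The predictions are simulated; column names and error rates are illustrative only.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
n = 5_000
df = pd.DataFrame({
    "group": rng.choice(["A", "B"], size=n, p=[0.9, 0.1]),  # group B underrepresented
    "label": rng.integers(0, 2, size=n),
})
# Simulate a model that is usually right for group A but often wrong for group B.
flip = np.where(df["group"] == "A",
                rng.random(n) < 0.05,   # 5% error rate for group A
                rng.random(n) < 0.30)   # 30% error rate for group B
df["prediction"] = np.where(flip, 1 - df["label"], df["label"])
df["error"] = df["prediction"] != df["label"]

# Randomly scattered errors would show similar rates across groups...
print(df.groupby("group")["error"].mean())
# ...and no detectable association between error and group membership.
chi2, p_value, *_ = chi2_contingency(pd.crosstab(df["group"], df["error"]))
print(f"chi-square p-value for error/group association: {p_value:.3g}")
```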
As mentioned above, humans are biased along many different dimensions. There’s confirmation bias, hindsight bias, anchoring bias, and more. Our own biases and those of others can be difficult to root out. The advantage of AI systems is that we can scrutinize them in detail, identify the roots of their bias, and, when possible, make the corrections needed to eliminate it.
As a society, we need to acknowledge our own shortcomings and allow AI systems to steer us toward a path of more objective decision-making.
Biased data can significantly affect AI systems in multiple ways, perpetuating the social, cultural, and economic biases that already exist in our world. The impact could be far-reaching, leading to unfair decision-making and a reinforcement of systemic bias.
Here are a few ways AI is affected by bias.
Biased AI can produce inaccurate or misleading results. Consider an algorithm that is developed to help identify breast cancer.
Such a model would be trained on data from populations with and without breast cancer and would include features such as age of onset, estrogen and progesterone receptor status, BRCA1 and BRCA2 gene mutations, the position and size of masses in the breast, and other factors known to be clinically relevant.
If this model were used to predict the likelihood of breast cancer in males (less than 1% of all cases), the predictions would likely be highly inaccurate, because the predictive clinical factors are quite different for males. Breast cancer in males is less linked to hormone levels; BRCA1 and BRCA2 mutations are much less common in men and pose a lower risk; the age of onset in men is much later; and the tumors present differently.
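A stylized sketch of that failure mode is shown below, using entirely synthetic data and illustrative feature names (not clinical claims): a classifier fit only on one population performs well on that population but noticeably worse on a second population in which the same features relate to the outcome differently.

```python
# Minimal sketch of a model trained on one population and applied to another.
# All data is synthetic; the feature names are illustrative, not clinical claims.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)

def make_population(n, hormone_weight, onset_shift):
    hormone = rng.normal(size=n)                          # e.g. receptor-related signal
    onset_age = rng.normal(loc=60 + onset_shift, scale=8, size=n)
    X = np.column_stack([hormone, onset_age])
    # Outcome driven by the hormone feature, with a population-specific weight.
    y = (hormone_weight * hormone + 0.02 * (onset_age - 60)
         + 0.3 * rng.normal(size=n) > 0).astype(int)
    return X, y

X_train, y_train = make_population(8_000, hormone_weight=2.0, onset_shift=0)  # training population
X_other, y_other = make_population(500, hormone_weight=0.2, onset_shift=10)   # different biology

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy on the training population:", accuracy_score(y_train, model.predict(X_train)))
print("accuracy on the unseen population:  ", accuracy_score(y_other, model.predict(X_other)))
```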
Bias in AI can skew results and lead to unfairness. There are a number of areas where this can be an issue:
Clinical trial representation: If AI algorithms used in cell and gene therapy research are trained on biased datasets that underrepresent certain populations, it can lead to disparities in the development and evaluation of therapies. This can result in a lack of diversity in clinical trials, limiting the generalizability and effectiveness of treatments for underrepresented groups.
Treatment recommendations: AI systems employed to aid in treatment recommendations for cell and gene therapy may be influenced by biases present in the training data. If the data primarily consists of patients from specific demographics, the recommendations generated by the AI may not adequately address the needs and variations of other populations. This can lead to unequal access to appropriate therapies and suboptimal outcomes for marginalized communities.
Healthcare resource allocation: Biased AI systems utilized in determining resource allocation and decision-making for therapeutics, like cell and gene therapy interventions, may perpetuate existing disparities. If these systems are trained on biased data or biased assumptions, they may inadvertently prioritize certain patient groups over others, exacerbating health inequalities and limiting access to potentially life-saving treatments.
Ethnic and genetic diversity: Cell and gene therapy approaches may differ in their efficacy and safety across diverse ethnicities and genetic backgrounds. If AI algorithms used to predict treatment outcomes or assess risk are not appropriately trained on diverse datasets, it can lead to disparities in therapeutic outcomes, as the algorithms may not account for the unique characteristics and responses of various populations.
When AI systems produce skewed results, users can lose faith in these technologies.
A 2019 study published in Science revealed that a widely used healthcare risk prediction algorithm designed to predict which patients would benefit from extra medical care was significantly biased.6 This algorithm was used to predict health risks for millions of patients in the United States.
However, the problem was that the model's predictions were racially biased. It consistently underestimated the health needs of black patients, even when they were sicker than their white counterparts. This was because the model used healthcare spending as a proxy for health needs, and due to various systemic factors, less money is spent on healthcare for black patients at the same level of need.
The developers of the algorithm hadn't intended to include racial biases, but the bias emerged from the data they used to train their model. The discrepancy wasn't revealed until researchers from the University of California, Berkeley, scrutinized the model.
This example led to a significant erosion of trust in that specific AI model and caused many to question the use of similar models in healthcare.7 It demonstrates the importance of carefully evaluating both the data used to train these models and the proxies used for prediction to avoid unintentional yet harmful biases.
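The mechanism can be illustrated with a small, fully synthetic sketch (this is not the published study's data or method): when a model is trained to predict spending rather than underlying need, and one group incurs lower spending at the same level of need, the model systematically under-scores that group.

```python
# Stylized sketch of proxy-label bias: predicting spending instead of true need.
# All data is synthetic; this illustrates the mechanism, not the published study.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 20_000
group = rng.choice(["majority", "minority"], size=n, p=[0.8, 0.2])
need = rng.gamma(shape=2.0, scale=1.0, size=n)            # true underlying health need
spend_factor = np.where(group == "minority", 0.7, 1.0)    # less is spent at the same need

# Observable features: prior-year spending (already depressed for the minority group)
# and a noisy clinical signal of need.
prior_spending = need * spend_factor + rng.normal(scale=0.2, size=n)
clinical_signal = need + rng.normal(scale=1.0, size=n)
X = np.column_stack([prior_spending, clinical_signal])

# Proxy target: future spending, which is also depressed for the minority group.
future_spending = need * spend_factor + rng.normal(scale=0.2, size=n)

model = LinearRegression().fit(X, future_spending)
score = model.predict(X)

df = pd.DataFrame({"group": group, "need": need, "score": score})
# Among patients with comparably high true need, the minority group is scored lower,
# so it would be under-selected for the extra-care program.
high_need = df[df["need"] > df["need"].quantile(0.9)]
print(high_need.groupby("group")["score"].mean())
```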
Leveraging biased AI models in cell and gene therapy or other therapeutics could lead to several regulatory and legal issues:
Violation of anti-discrimination laws: If an AI model trained on biased data results in different treatment options for different racial or ethnic groups, this could potentially violate anti-discrimination laws. This includes the U.S. Civil Rights Act, which prohibits discrimination on the basis of race, color, national origin, sex, and religion in various settings, including healthcare.
Lack of informed consent: If patients are not made aware that an AI model may be less accurate for their demographic group due to biases in the training data, they could argue that they were unable to provide fully informed consent for their treatment.
HIPAA violations: If bias in AI models leads to inappropriate treatment recommendations, this could potentially lead to breaches of the Health Insurance Portability and Accountability Act (HIPAA) in the U.S., which requires that covered entities take reasonable steps to ensure the minimum necessary use of Protected Health Information.
FDA recalls: The U.S. Food and Drug Administration (FDA) regulates medical devices, a category that includes certain types of AI models used in healthcare. If an AI model is found to be biased and to lead to harm, the model could be recalled or its FDA approval revoked.
Medical malpractice: If biased AI leads to incorrect treatment recommendations and a patient is harmed as a result, the healthcare providers who relied on the AI model could potentially be sued for medical malpractice.
Liability complications: There may also be liability issues associated with the developers of the biased AI model, especially if they were aware of the biases and did not take sufficient steps to correct them.
To avoid these regulatory and legal issues, it's crucial to ensure that AI models used in healthcare are trained on diverse and representative data and are rigorously tested for potential biases.
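One concrete, if partial, way to act on this is a representation audit before training: compare the composition of the dataset against a reference population and flag under-represented strata. A minimal sketch follows, with hypothetical column names and reference proportions.

```python
# Minimal sketch of a pre-training representation audit.
# The column name "ancestry" and the reference proportions are hypothetical.
import pandas as pd

def representation_audit(df: pd.DataFrame, column: str,
                         reference: dict[str, float], min_ratio: float = 0.5) -> pd.DataFrame:
    """Flag categories whose share of the dataset falls well below a reference share."""
    observed = df[column].value_counts(normalize=True)
    report = pd.DataFrame({
        "observed_share": observed,
        "reference_share": pd.Series(reference),
    }).fillna(0.0)
    report["ratio"] = report["observed_share"] / report["reference_share"]
    report["underrepresented"] = report["ratio"] < min_ratio
    return report.sort_values("ratio")

# Example usage with toy data:
train_df = pd.DataFrame({"ancestry": ["EUR"] * 900 + ["AFR"] * 60 + ["EAS"] * 40})
reference = {"EUR": 0.60, "AFR": 0.20, "EAS": 0.20}   # assumed target population mix
print(representation_audit(train_df, "ancestry", reference))
```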
In the field of cell and gene therapy development, AI bias is a major concern. The effectiveness of AI models is inherently tied to the quality of the data they are trained on. The training data typically used is limited and predominantly comprises a small population of patients who are already undergoing treatment for the targeted rare diseases.
Therefore, insufficient training data can lead to bias and ultimately to flawed model assumptions, potentially resulting in serious complications among the treated population. Unrecognized systematic biases could impact the immunogenicity, efficacy, dosing, and even the patient selection or stratification for a clinical trial.
Identifying the presence of bias in data and model predictions often involves meticulous analysis. However, remedying this bias can prove to be an even more complex task.
Take, for example, a model designed to establish the most effective treatment protocols for prostate cancer. The dataset comprises historical data on men diagnosed with prostate cancer within the United States. Various factors, such as ethnicity, culture, and diet, lead to varying responses to treatment protocols amongst different groups.
A comparison of elderly men in southern Louisiana and Inuit males in northern Alaska might reveal significant disparities. Upon close scrutiny, one might find that Louisiana men have access to treatment protocols such as immunotherapy and cryosurgery, options that may not be readily available to their Alaskan counterparts. Alternatively, the effectiveness of procedures like prostatectomies might differ significantly between these two groups.
Without accounting for this type of bias in the training data, an AI model may be inaccurate when challenged with real-world data.
One strategy to alleviate this bias could involve developing distinct models for each identifiable group of people. However, this approach can be problematic as increasingly specific data segregation can enhance the model's relevance for a particular group but at the cost of reduced statistical power and higher error probability.
Integrating additional features into a generalized model to account for treatment protocol, regional, and age factors can lend the model more adaptability, but it can also lead to overfitting and potentially misleading predictions.
The availability of more comprehensive data for a specific subset of people, categorized by a particular treatment protocol, geographic region, and age group, would enable us to construct a superior model. If we could collect enough data to adequately represent each possible group, we could alleviate the bias.
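A rough way to operationalize "enough data for each group" is to count samples in every stratum the model is expected to serve and flag those that fall below a chosen threshold, deferring to clinical judgment for those strata. The sketch below uses hypothetical column names, toy data echoing the Louisiana/Alaska example, and an arbitrary threshold of 200 samples.

```python
# Minimal sketch: flag strata with too little training data to trust the model.
# Column names and the threshold of 200 samples per stratum are assumptions.
import pandas as pd

def thin_strata(df: pd.DataFrame, strata_cols: list[str], min_samples: int = 200) -> pd.DataFrame:
    """Return per-stratum sample counts, marking strata below the threshold."""
    counts = (df.groupby(strata_cols, observed=True)
                .size()
                .reset_index(name="n_samples"))
    counts["defer_to_clinician"] = counts["n_samples"] < min_samples
    return counts.sort_values("n_samples")

# Example usage with toy data:
df = pd.DataFrame({
    "region":   ["south"] * 1_200 + ["arctic"] * 45,
    "age_band": ["65+"] * 1_245,
    "protocol": ["cryosurgery"] * 900 + ["prostatectomy"] * 345,
})
print(thin_strata(df, ["region", "age_band", "protocol"]))
```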
This raises the question of how much data is truly "adequate" to make an "accurate" prediction for a specific patient. Answers to such questions are rarely straightforward and continue to challenge AI systems developers. It's often best to accept that certain subpopulations will inevitably exhibit some degree of error in the model. Thus, while making decisions for these individuals, it's advisable to rely less on AI model predictions and more on the expert opinions of medical professionals in the field.
Currently, the chances of eliminating all bias seem slim.
Inevitably, a generalized AI model will present detectable biases in certain subpopulations when producing predictions. The emergence of these biases could be attributed to various reasons, often linked to unforeseen consequences arising from the historical data used.
However, identifying bias is an important first step toward eliminating it, and progress continues to be made on this front. As AI scientists incorporate this knowledge into their methodologies for constructing, implementing, and continually testing models, we are likely to become more proficient at ensuring the fairness and trustworthiness of AI applications.
Within the healthcare sphere, particularly in the development of cell and gene therapies, AI should be viewed as an instrument that enhances efficiency and precision rather than a substitute for the discernment of researchers, developers, manufacturers, and medical professionals.1 AI, with its capacity to process and evaluate myriad signals concurrently, can be an invaluable asset in the battle against rare diseases.
The intrinsic biases present in the data used to formulate and evaluate treatments will increasingly be understood as time progresses. Although I hold that bias can never be entirely eliminated, AI models can still be instrumental to humanity in several crucial ways, aiding in the advancement of treatments and the provision of care to patients.
Dr. Nipko has been a leading data scientist for more than 25 years, with a current focus on applying the latest AI models to address challenges in cell and gene therapy as Vice President of Artificial Intelligence for Form Bio.