Utilizing natural language processing technologies to improve search capabilities and advance the development of therapies.
Drug development for rare diseases presents a host of challenges, primarily because the number of affected patients is so limited for each rare disease. A rare disease is any disease, disorder, illness, or condition that affects fewer than 200,000 people in the U.S. Of the approximately 7,000 rare diseases that have been identified in the U.S., more than 90 percent do not have a treatment approved by the U.S. Food and Drug Administration. However, rare diseases have a huge impact on our healthcare systems, as around one in 17 people will be affected by a rare disease at some point in their lives.
Challenges come at all stages in drug development. For example, clinical researchers are often unable to conduct randomized trials for certain rare diseases because patient populations are too small. Even when there are enough patients for a randomized clinical trial, rare diseases may present the problem of disease heterogeneity; that is, affected patients exhibit a wide range of differences in symptoms, severity, and progression.
The scarcity of data presents additional challenges for drug discovery and development. To advance understanding of rare diseases, researchers at pharmaceutical companies have traditionally searched large volumes of published data to find nuggets of information that link rare diseases with specific genes and gene variants. This can be a tedious task when done manually. More recently, however, utilizing natural language processing (NLP) technologies, pharma companies have improved their search capabilities, advancing the development and delivery of life-enhancing drugs.
Following are three use cases that showcase how pharmaceutical companies are leveraging NLP to accelerate the development of drugs to treat rare diseases.
Biopharmaceutical company Takeda uses NLP tools to examine gene-disease associations and assess potential disease severity for patients with Hunter Syndrome. This rare genetic disorder is caused by a missing or malfunctioning enzyme and can lead to permanent damage, affecting appearance, mental development, organ function, and physical abilities. Takeda developed an enzyme replacement therapy that showed potential to help young patients with the severe forms of Hunter Syndrome, particularly those affecting brain function. Because the therapy is delivered invasively, the company needed to identify the patients most likely to benefit from it.
Takeda researchers developed a suite of NLP queries to search scientific literature (abstracts and full text papers) for any associations between the relevant gene (iduronate-2-sulfatase, IDS), mutations, and cognitive impairment. The queries identified and classified every published mention of patients with Hunter Syndrome or related symptoms, in addition to associated gene variants and mutations.
These queries enabled researchers to locate specific mutations within the IDS gene and to associate specific mutations with specific phenotypes. The NLP text mining at Takeda produced results that matched or exceeded those from structured genetic databases. The results help clinicians make data-driven decisions by indicating which genetic variants in their pediatric patients are likely to lead to severe cognitive impairment.
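To make the idea of a gene-variant-phenotype query concrete, here is a minimal Python sketch based on sentence-level co-occurrence. It is not Takeda's method or any vendor's query language: the term lists, the variant pattern, the function names, and the sample sentences are all invented for illustration, and real text mining would rely on curated ontologies and linguistic analysis (including negation handling) rather than simple pattern matching.

```python
import re
from collections import defaultdict

# Assumption: tiny hand-written vocabularies; a real system would use curated ontologies.
GENE_PATTERN = re.compile(r"\bIDS\b|\biduronate-2-sulfatase\b")
PHENOTYPE_TERMS = ("cognitive impairment", "neurological involvement")
# Assumption: only simple missense notation is matched, e.g. p.Ala100Val or G200S.
VARIANT_PATTERN = re.compile(r"\b(?:p\.[A-Z][a-z]{2}\d+[A-Z][a-z]{2}|[A-Z]\d+[A-Z])\b")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_associations(text):
    """Map each variant to the phenotype terms that share a sentence with it and the gene."""
    associations = defaultdict(set)
    for sentence in split_sentences(text):
        if not GENE_PATTERN.search(sentence):
            continue  # skip sentences that never mention the gene of interest
        lowered = sentence.lower()
        phenotypes = [term for term in PHENOTYPE_TERMS if term in lowered]
        for variant in VARIANT_PATTERN.findall(sentence):
            for phenotype in phenotypes:
                associations[variant].add(phenotype)
    return dict(associations)


# Invented sentences for illustration only; these are not real findings.
sample_text = (
    "Patient 1 carried the IDS variant p.Ala100Val and showed cognitive impairment. "
    "Patient 2 had the IDS mutation G200S with an attenuated course. "
    "Cognitive impairment was also reported with p.Ala100Val."
)

print(find_associations(sample_text))
```

Only the first sample sentence yields an association, because it is the only one that mentions the gene, a variant, and a phenotype term together. Requiring all three in one sentence trades recall for precision, which is one reason production queries use richer linguistic rules.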
Other drug developers have used NLP to search significant volumes of published literature for small, valuable chunks of information on rare disease patients to uncover associations between different genes and gene variants. For example, Agios Pharmaceuticals developed a virtual portfolio for orphan diseases by using NLP to systematically map the space around inborn errors of metabolism and link diseases to targets. Agios’ rare disease drug discovery program was founded on an understanding of the space gleaned from NLP queries that the company used to identify candidate diseases and candidate target genes.
Sanofi wanted to improve its ability to search for drugs that could be used for rare disease patients. For this indication expansion, or drug repurposing, work, Sanofi needed a rapid, systematic method to discover target-indication links, both direct and indirect. The traditional way of finding new target-indication proposals is generally an ad hoc process that relies on expert knowledge and experimental observations, which are usually limited in scope and time-consuming to gather.
With NLP text mining, the Sanofi team built an internal Rare Genetic Diseases Knowledgebase (RGDKb). NLP queries were developed to capture and integrate all the available information around drug targets from scientific literature, including causal mutations associated with diseases, the underlying causal pathways, cell types and clinical phenotypes, and associations with known drugs. These data were integrated with information from rare and genetic disease databases (Orphanet and OMIM) into RGDKb and visualized in a dashboard, RareView, that shows the relationships between diseases, underlying causal genes, pathways, and drug compounds in a network view.
The approach of combining text mining with network visualizations allowed Sanofi researchers to quickly identify new indication opportunities from target-disease pairs that have various commonalities based on broad scientific, medical and strategic values, and the approach has been applied in several disease areas.
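The target-indication idea can be illustrated with a small Python sketch that joins gene-disease links to drug-target links and proposes any disease not already paired with the drug. This is an assumption-laden toy, not the RGDKb or RareView implementation: the records, identifiers, and function name are invented placeholders, and only direct links through a shared gene are considered, whereas the article also mentions indirect links such as shared pathways.

```python
from collections import defaultdict

# Hypothetical placeholder records; every identifier here is invented.
gene_disease = [
    ("GENE_A", "Disease 1"),
    ("GENE_A", "Disease 2"),
    ("GENE_B", "Disease 3"),
]
drug_target = [("Drug X", "GENE_A"), ("Drug Y", "GENE_B")]
known_indications = {("Drug X", "Disease 1"), ("Drug Y", "Disease 3")}


def propose_indications(gene_disease, drug_target, known_indications):
    """Return (drug, gene, disease) triples where the drug's target gene is linked
    to a disease that is not already a known indication for that drug."""
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    proposals = []
    for drug, gene in drug_target:
        # Sort so the output order is deterministic.
        for disease in sorted(diseases_by_gene.get(gene, ())):
            if (drug, disease) not in known_indications:
                proposals.append((drug, gene, disease))
    return proposals


print(propose_indications(gene_disease, drug_target, known_indications))
```

With these placeholder records, the only proposal is Drug X for Disease 2, reached through GENE_A. In a real knowledge base each such triple would be a starting point for expert review, not a conclusion.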
With small patient populations and a shortage of data, pharmaceutical companies and researchers will continue to struggle to develop drugs for rare diseases. NLP, however, represents a potential game-changer, relieving pharmaceutical researchers of the need to spend hours scouring sometimes-obscure medical literature for scarce jewels of data linking rare diseases with certain genes or gene variants. With NLP, researchers can significantly advance their search capabilities, speeding the drug development process and accelerating the market availability of life-enhancing therapies for patients with rare diseases.
Jane Z. Reed, Director, Life Sciences, Linguamatics, an IQVIA company