AI in the R&D Value Chain

21 Oct 2019

Artificial intelligence (AI) is transforming the entire life science industry from R&D to commercial business functions. According to the Artificial Intelligence in Life Sciences Market — Growth, Trends, and Forecast (2019-2024) report by Mordor Intelligence, the AI market in life sciences is valued at $902.1 million in 2019 and is expected to grow at a CAGR of over 21%[1] for the next 5 years. According to Gartner, various important elements of AI technologies are at their peak in the hype cycle.
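As a rough check on what these figures imply, the 2019 value can be compounded forward at the quoted rate. The sketch below assumes a constant 21% annual rate (the report says "over 21%", so the result is a lower bound); the helper name `project_value` is purely illustrative.

```python
def project_value(start_value, cagr, years):
    """Compound a starting value forward at a constant annual growth rate."""
    return start_value * (1.0 + cagr) ** years


# Figures quoted above: $902.1 million in 2019, 21% CAGR (assumed constant), 5 years.
projected_2024 = project_value(902.1, 0.21, 5)
print(f"Implied 2024 market size: ${projected_2024:,.1f} million")
```

At exactly 21%, the market would be roughly 2.6 times its 2019 size by 2024.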

Compared with other industries, the use of AI in life sciences is modest. Even within life sciences, adoption of AI in regulated environments such as the R&D value chain lags behind other areas, for a number of reasons.

AI in Clinical, Regulatory, and Safety

The scope of our discussion is limited to the clinical, safety, and regulatory functions of pharma companies, which may span the R&D and Chief Medical Officer (CMO) organizations in a typical large pharma company. We identify emerging use cases in these areas and the types of AI technologies that can impact them. In recent years, there has been a surge in data, such as the growing use of real-world data (RWD) across the value chain and the explosion in the number of safety cases. This substantial increase in data makes it all the more worthwhile to assess AI across these areas.

However, AI and machine learning (ML) capabilities are largely misunderstood. Because they sit at the peak of the hype cycle, people often have very high expectations of AI/ML, which inevitably leads to disappointment and weakens trust in AI at the executive level.

Use cases in R&D

Dark data

Pharma companies today analyze and draw inferences from the data that they are mandated to collect and analyze. Even though data in R&D is growing rapidly, companies have not been able to use all of this new data to its full potential for effective decision making. There is an opportunity to generate value by deriving insights from this "dark" data (e.g., data from RWE/RWD, secondary research data, data from patient interactions such as patient services, and data from public sources about regulatory submissions/pathways). According to an International Data Corporation (IDC) report sponsored by Seagate Technology, healthcare data alone will experience a compound annual growth rate (CAGR) of 36% through 2025.[2] Thus, exploring such data and deriving insights from it is the first opportunity to leverage AI.

Some examples are:

  • Shorten drug approval cycle time using HA query analytics: During NDA/MAA approval processes, health authorities (HAs) often raise queries, leading to back-and-forth interaction with pharma companies. This interaction constitutes a significant part of the drug approval life cycle in the regulatory function. Analyzing past queries to find patterns, and then anticipating and preempting similar queries, can shorten the overall drug approval cycle time. This is a reasonable short-term goal for pharma companies using existing AI technologies.

  • Deriving insights from voice data about patients’ perception and usage of products (e.g., PV contact center/patient services): Companies have long interacted with patients through patient support programs and PV/MI contact centers. Natural Language Processing (NLP)/Natural Language Generation (NLG) technologies are now mature enough to be applied to voice-based data: the recordings can be converted automatically to text and mined along different analytical dimensions. These analytics can help in product launch, product positioning, label extension, and regulatory and marketing strategies, because the data is much more comprehensive than third-party analyses of product perception or social media mining alone.

  • Deriving actionable and insightful regulatory intelligence: For units working in regulatory affairs and HA query assessment that are looking to accelerate time-to-submission with high accuracy and compliance levels, advanced technologies can deliver a smart regulatory intelligence platform with features such as regulatory history monitoring for drugs, clinical trial analytics, HA query assessment, competitor drug profiles, and submission history. The platform would use AI/ML algorithms to access all relevant public and internal sources of information, mine insights from the data, and auto-generate reports and visualizations. In processes that today run primarily on human effort, such a solution could reduce the time needed to collect and evaluate information for strategic regulatory decisions by as much as 99%, at close to 100% quality and compliance levels, in a scalable manner.

  • Submission pathway optimization and prediction: The approval cycle of a product can be complex. For example, the EU has several defined procedures for drug approval, including the mutual recognition procedure (MRP), the decentralized procedure (DCP), and the centralized procedure (CP). The complexity increases multifold for large pharma companies, mainly because of the large volume of submissions, which grows further as they expand current indications, update safety information in labels, and so on. In addition, ongoing changes raise further critical questions, such as which pathway is optimal for each submission and which items to bundle with each submission, and the answers vary from one regulatory authority to another. Hence, considerable planning is needed to arrive at the submission plan that best speeds up approvals. Pharma companies tend to make qualitative judgments when determining these pathways, even though they usually have plenty of historical data from past submissions. Using this data to train machines to optimize submission pathways and planning and to predict submission timelines is another high-impact opportunity for pharma companies.

  • Explore disease characteristics/patient characteristics: Pharma companies have increasingly started using external data sources such as real-world data (RWD) to understand disease characteristics, patient characteristics, and so on, so that their research and clinical trials are better informed. Identifying the right disease characteristics for a better understanding of patient cohorts is fundamental and can aid a wide variety of use cases. One such use case is patient recruitment, which is perhaps one of the biggest bottlenecks in clinical trials. For innovator drugs, one needs to understand the population and identify the patients who are best suited for the drug in question. While RWD is central to these use cases, AI techniques based on clustering can shed more light on what a real-world cohort of patients looks like, what comorbidities may be associated with this population, how diagnoses are being made for this population, and so on.


Automation

R&D, being a cost center, is under constant pressure to do more with less, which makes sensible automation an attractive option. In combination with other technologies, AI can play a significant role in reducing the cost of operations. However, automation yields benefits beyond cost savings: it reduces overall processing times, improves compliance, and facilitates scaling of operations. For example, in pharmacovigilance case processing, if a case arrives on day 13 when it is due on day 15, it becomes impossible to handle unless turnaround time can be dramatically improved. Scaling through automation is particularly valuable in regulated areas, where scaling otherwise requires enough lead time to identify qualified resources and train them. Automation can inherently scale up or down as appropriate.

Automation can be performed in various ways; here we focus on using AI for automation. AI is required for automation in areas where decision-making and subjectivity are involved. In the R&D world, several areas are process oriented but have human decision-making embedded in them, making automation impossible without AI.

Some examples are:

  • Clinical trial analytics: Data in clinical trials takes a long time to be cleansed, processed, and compiled, which requires a large workforce. This activity is a great candidate for automation. From creating and using global libraries (GLIB) for designing case record forms (CRFs) to generating data queries (DQs) on collected data, significant human effort is spent in the conduct phase of the trial. Likewise, after the last patient last visit (LPLV), it takes companies significant time to transform the data into the ADaM (Analysis Data Model) dataset format, analyze the data, identify outcomes and issues, and generate tables, listings, and figures (TLFs). This process can be assisted by using machines to perform intelligent CRF design, raise intelligent DQs, and perform predictive analytics on the interim trial data gathered. This offers opportunities to make effective portfolio decisions while at the same time automating manual operations.

  • Intelligent case processing in PV/complaint handling: Safety automation is arguably the most popular use case in R&D today, and several companies have embarked on this journey or are actively evaluating this area. Complaint handling is a similar use case, although the variability of the complaint handling process makes it somewhat harder to automate. Most of the budget in quality and PV is spent on these activities, making this area an ideal candidate for automation.

  • Auto-creation of eCTD documents and labels: Medical content forms the crux of several aspects of the pharma value chain. Typically, content is produced once and consumed repeatedly. For example, content from the clinical study report (CSR) is used across submission documents, and an update to a core company data sheet (CCDS) results in multiple downstream label updates. A lot of medical writers’ time and effort goes into searching for the right documents or updates, reading and collating information, and assembling it in the right template. With current advancements in NLP and NLG technologies, machines can search for the right content, track updates, read the content, and collate the relevant components to create the first draft of the intended document, which offers large scope for automation.

  • Literature search: This is a time- and effort-intensive activity. Multiple attempts have been made to automate the acquisition and monitoring of literature articles across medical affairs, regulatory affairs, safety, and clinical research, but the efforts have been fragmented and disparate. The entire domain of literature search is therefore a good candidate for intelligent automation that creates a one-stop, end-to-end solution by aggregating existing technologies in a sensible manner. Such a solution would be powered by machine learning algorithms that implement word-vector matching, supervised/unsupervised learning based on trends and patterns in search strategies, and AI/NLP to support advanced search through verbatim text or context matching for the best results.
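The matching step behind such a literature search can be sketched in a few lines. This is a minimal illustration, not the solution described above: it uses TF-IDF term vectors and cosine similarity as a simple stand-in for word-vector matching, and the abstracts, query, and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector per document plus the IDF table."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    vectors, idf = tfidf_vectors(docs)
    query_vec = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    return sorted(((cosine(query_vec, v), i) for i, v in enumerate(vectors)), reverse=True)


# Hypothetical abstracts, invented for illustration only.
abstracts = [
    "Hepatotoxicity reported in patients receiving drug X during a phase III trial",
    "Cost effectiveness of a patient support program in diabetes",
    "Case report of liver injury after treatment with drug X",
]
ranked = rank("drug X liver injury", abstracts)
for score, index in ranked:
    print(f"{score:.2f}  {abstracts[index]}")
```

Note that the first abstract ranks below the third even though "hepatotoxicity" and "liver injury" describe the same concept. Purely lexical matching misses this, which is the gap that context-aware word vectors and domain ontologies aim to close.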

Classification by AI technologies

As we explore the use cases above, we try to categorize them into three broad areas of AI, namely NLP, AI-based classification algorithms, and unsupervised learning.

NLP technologies: Natural Language Processing is the technology used to process free-form text and extract structured information. The opposite process, taking structured data and generating human-readable free-form text, is known as Natural Language Generation. At their core, modern NLP systems convert the words in documents into numeric vector representations that can then be computed on. These vectors capture both the words and the context in which they appear. The systems are trained on large numbers of documents. Today, the state of the art in NLP is being pushed forward by algorithms that use increasingly sophisticated mechanisms to process large document collections and create numeric vectors. NLP technologies are being trained across multiple languages and domain-specific ontologies and taxonomies, providing powerful ways of deploying automation to enhance information search and retrieval in document authoring processes. In the medical and regulated realm, NLP can place clinical terms in the right context with very high levels of accuracy.
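The idea that a word's vector reflects its context can be illustrated with simple co-occurrence counts. This is only a toy sketch: modern systems learn dense vectors from far larger collections, and the sentences, window size, and function names below are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of the words found within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sentences, invented for illustration only.
corpus = [
    "patient reported nausea after dose",
    "patient reported headache after dose",
    "submission filed with health authority",
]
vectors = cooccurrence_vectors(corpus)

# Words used in the same contexts end up with similar vectors.
sim_symptoms = cosine(vectors["nausea"], vectors["headache"])
sim_unrelated = cosine(vectors["nausea"], vectors["submission"])
print(sim_symptoms, sim_unrelated)
```

Here "nausea" and "headache" never appear together, yet their vectors match because they occur in the same surrounding words, while "submission" shares no context with either.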

Classification algorithms: Classification is the method used to take structured data and generate business-domain-appropriate metadata that describes the data. There are many algorithms available for classification; however, most models are created through supervised machine learning. In supervised machine learning, subject matter experts annotate historical datasets with descriptive metadata, and this information is used to train the machine. The algorithms are very good at identifying patterns and producing a model that predicts the metadata for new information passed to it. Unlike NLP, which has broad industry-wide applications, classification models tend to be tied more closely to specific business problems. Thus, subject matter expertise at both the business and technical levels is key.

Properly designed classification models are significant tools in analytics and insight generation, acceleration of document authoring through reuse, and information search and retrieval. Classification algorithms require high-quality data and expertise in annotation. Yet, if there is enough data and the data is clean, one can expect high levels of accuracy, often in the range of 80%-95%, depending on the use case.
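The supervised workflow described above (experts annotate historical records, a model is trained, and accuracy is measured on unseen records) can be sketched as follows. This is a minimal sketch under stated assumptions: it uses multinomial naive Bayes, one of many possible algorithms; the features are word tokens for brevity; and the case descriptions, labels, and class name are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical annotated history (the "expert annotation" step), invented for illustration.
train = [
    ("patient hospitalized with liver failure", "serious"),
    ("severe bleeding patient hospitalized", "serious"),
    ("mild headache resolved without treatment", "non-serious"),
    ("mild rash resolved quickly", "non-serious"),
]
held_out = [
    ("patient hospitalized with severe rash", "serious"),
    ("mild headache resolved", "non-serious"),
]

model = NaiveBayesTextClassifier().fit([t for t, _ in train], [l for _, l in train])
predictions = [model.predict(text) for text, _ in held_out]
accuracy = sum(p == label for p, (_, label) in zip(predictions, held_out)) / len(held_out)
print(predictions, accuracy)
```

With only a handful of toy examples the accuracy figure is meaningless; in practice it would be measured on a large held-out set that was annotated to the same standard as the training data.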

Clustering algorithms: Clustering algorithms are good examples of unsupervised learning. They break up data into cohorts with similar characteristics by using techniques like minimizing distance between characteristics of patients within the cohort while maximizing the distance across cohorts.
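A minimal sketch of this idea uses k-means, one common clustering technique, which repeatedly assigns each patient to the nearest cohort centre and then recomputes the centres. The patient features (age, number of comorbidities), the data, and the deterministic choice of initial centres are assumptions made for illustration.

```python
import math


def kmeans(points, k, max_iterations=20):
    """Plain k-means; the first k points serve as the initial centroids."""
    centroids = list(points[:k])
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cohort
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(values) / len(members) for values in zip(*members))
    return assignments, centroids


# Hypothetical patients as (age in years, number of comorbidities).
patients = [(34, 0), (71, 4), (29, 1), (68, 3), (31, 0), (75, 5)]
assignments, centroids = kmeans(patients, k=2)
print(assignments)
print(centroids)
```

In this toy data the two cohorts separate cleanly into younger patients with few comorbidities and older patients with several. On real data, features would normally be scaled first so that a wide-ranging feature such as age does not dominate the distance.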

The table below shows some of the top use cases classified by AI technique:

AI technique | Insights of Dark Data | Automation
NLP | HA Query Analytics | Auto-creation of eCTD Dossiers
NLP | PV Contact Center/Patient Services Analytics | Auto-creation of Label
Classification/Prediction | Regulatory Intelligence | Intelligent PV Case Processing
Classification/Prediction | Optimal Submission Planning | Automated Complaint Handling
Clustering Algorithms | Understanding of Disease and Patient Characteristics | Literature Search


We reiterate the message that AI has a wide range of applications in the clinical, regulatory, and safety realm. However, many of the use cases described have not yet become mainstream.

For more details, please refer to: