#FutureReadyHealthcare

Who We Are

Careers

Reshaping pharmaceutical content atomization and tagging with generative AI: How we transformed NEXT Commercial Content Intelligence

30 Aug 2023

Elevating NEXT Commercial Content Intelligence

The sheer volume of customer-focused and healthcare professional (HCP)-focused content within life sciences organizations presents a significant challenge in effectively managing and utilizing this vast information. As life sciences organizations continuously generate an enormous amount of content, ranging from scientific research and clinical trial data to marketing materials and educational resources, it becomes increasingly difficult to streamline, categorize, and tailor content to meet the specific needs and preferences of diverse audiences.

Indegene’s NEXT Commercial Content Intelligence is a proprietary artificial intelligence (AI)-based solution for content atomization and tagging, specifically engineered for life sciences organizations. By utilizing robust artificial intelligence/machine learning (AI/ML) models, NEXT Commercial Content Intelligence (NCCI) deconstructs an array of assets, including iDetails, representative triggered emails (RTEs), banners, webpages, and videos, into atomized versions and then helps generate relevant tags.

Serving over 10 pharmaceutical majors in the past half a decade, NCCI has demonstrated capabilities in being an automated, consistent, and scalable solution, transforming multiple use cases, including content discoverability, analytics, personalization, and many more.

With generative AI, NEXT Commercial Content Intelligence has now further enhanced its capabilities. This blog post details how NCCI became a generative AI–powered tool, including the opportunities explored, challenges faced, and outcomes achieved.

Areas of impact with Generative AI

While NEXT Commercial Content Intelligence’s current supervised learning–based models were adequate, generative AI offered the opportunity to achieve fully automated atomization and tagging, making the entire journey more efficient and effective. The steps involved in this journey and the potential impact of generative AI are summarized in the table below:

Stage	Activities	Impact of generative AI	Impact level
Taxonomy Design	Use case comprehension, tag requirements, category/definition crafting, overlap analysis, manual tagging, and client revisions	Expedites taxonomy creation, seamlessly handles overlaps, and streamlines revisions	High
Model Development and Deployment	Dataset preparation, model creation, validation, and deployment	Bypasses conventional model generation steps by using intelligent prompts and optimizing model development	High
Machine Tagging	Content upload, running the model, and output generation	-	-
Subject matter expert (SME) Review	Asset audit, content comprehension, and tag validation	Generative AI enables content summarization, key area highlighting, and SME workload reduction	High
Output Generation and Transfer	Reports and API tag transfers	-	-
Model Retraining	Data selection, pipeline run, validation, and new version deployment	Generative AI–driven accuracy reduces retraining needs, allowing the effective use of intelligent prompts for efficient model enhancement	High

Explore how Indegene significantly boosted lead conversion rates by 30% for a major pharma company with AI-driven content personalization.

Getting started on the generative AI path

While powering NEXT Commercial Content Intelligence with generative AI is critical to staying future ready, the path to production involved several challenges. The following section attempts to explain the process of transitioning to generative AI for one core capability, that is, atom-level key message generation.

NEXT Commercial Content Intelligence’s core contained an atom-level key message generation model that was built through supervised learning, utilizing 250,000+ content pieces specific to the pharmaceutical domain. This model included 17 categories (efficacy, safety, brand information, study design, etc.) and was continuously retrained over the past 3 years to attain an accuracy of 83%. The aim was to now improve this model with generative AI.

Pilot experiment

The NEXT Commercial Content Intelligence team experimented using a limited sample size with GPT 3.5 Turbo to generate key messages for 100 atoms. Several prompting techniques were experimented with before narrowing down to the prompt with the highest prediction accuracy.

Trial number	Prompt parameters	Accuracy with GPT-3.5 Turbo
1	Category names and OCR text of the atom	33.2%
2	Category definition and OCR text of the atom	51.8%
3	Category definition, OCR text of the atom, and role definition	62.3%
4	Categories redefined (overlaps were identified and removed), OCR text of the atom, and role definition	89.8%

Since the pilot experiment now had an accuracy higher than the existing model, the team decided to productionize the feature with GPT-3.5 Turbo.

Scaled experimentation

The experiment was then extended to 1,000 atoms, and 3 runs were conducted. The following observations were noted:

Consistency between the runs was only 30%, that is, the same prompt given 3 times resulted in the same output only in 30% of the cases

Unintended responses were received in 20% of the cases. This included

Response outside the given/defined categories

Response with no categories

To improve the response and make the model more deterministic, hyperparameter tuning was used. After several batch tests, the temperature and top P values were optimized. The consistency improved to 87%.

To avoid unintended responses, a condition was set to verify the response before acceptance and a prompt was defined to resend and revise the output. This reduced the unintended responses to <2%. For the cases where the output was still unintended, the platform was modified to provide default values.

Final run and deployment

After hyperparameter tuning, a run for 1000 atoms was conducted and an accuracy of 84.8% was achieved. This model was cleared for deployment because the results were significantly higher than the earlier available model.

The deployment architecture was designed to integrate NEXT Commercial Content Intelligence’s content atomization workflow with the new GPT-3.5 Turbo–based atom-level key message identification. This deployment also took into consideration the data privacy and security concerns of the pharmaceutical majors, making the information exchange completely secure and protected.

Custom taxonomies and use cases

Every NEXT Commercial Content Intelligence customer requires additional custom categories in addition to the default set already present in the core model. Experiments were run to accommodate new categories and modify existing categories to ensure that similar accuracies could be achieved for custom categories with prompt tuning.

Benefits Achieved

Reduced time to deployment: A 70% decrease in time to deployment was achieved because the model could be tuned with just prompts and did not require custom dataset preparation and model customization.

Reduced total costs: The total cost of deployment and operations for 5000 pages of content is summarized in the graph below. While model training, deployment, and retraining costs were lower for GPT-3.5 Turbo, the operational charges were slightly higher. However, the total cost of tagging reduced by a staggering 76%.

Next steps and planned refinements

The NEXT Commercial Content Intelligence team noticed that GPT-3.5 Turbo produced lower accuracy in one key area, that is, key message identification for graphical components with unstructured optical character recognition (OCR) text such as logos, graphs, illustrations, diagrams, and so on. Experiments are now being conducted to improve results in this area.

Additional Generative AI–powered capabilities

With similar replacements to the core models, NEXT Commercial Content Intelligence is now powered by generative AI for most of its capabilities. A few of these key capabilities and their performance are summarized in the table below:

Stage	Activity	Performance^*
Taxonomy Design	Taxonomy category definition Taxonomy definition validation Taxonomy overlap analysis and redefinition	90% reduction in efforts required for a taxonomy design with generative AI support
Model Development and Deployment	Content atomization	New models (e.g., Segment Anything) were able to maintain the contours of graphics and identify atoms with a coverage of 93%, compared with the earlier model’s coverage of 78%
	Brand identification	The GPT-3.5 Turbo–based model improved accuracy from 83% to 93.33%
	Therapy area identification	The GPT-3.5 Turbo–based model improved accuracy from 79% to 83.33%
	Keyword identification –English	The GPT 3.5-Turbo–based model improved coverage from 76% to 93.33%
SME Review	Tag validation	The GPT 3.5-Turbo–based summary generation and attention optimization reduced SME review efforts by 54%

*Results are based on limited experimentation.

The future with generative AI

NEXT Commercial Content Intelligence was built to transform pharmaceutical content tagging and make it more automated, consistent, scalable, and affordable. Today, generative AI presents a transformative opportunity to further enhance these core goals.

However, the implementation of generative AI to deliver consistent and reliable output for business use cases comes with a set of factors that must be considered. These include:

Model reliability: The degree of control and understanding of model characteristics in a generalized large language model (LLM) model is much lower than a supervised learning model. This makes it necessary to regularly run scaled experiments to verify and sustain reliable output.

Version updates: LLM models in the market are continuously updated and upgraded, and as observed during our experimentation, the output from each of these versions is different. This requires continuous monitoring to ensure consistency and continuous improvement.

Consistency-Accuracy balance: When generative AI is used for classification, low randomness affects the accuracy of the prediction, while high randomness affects consistency. Therefore, a balance between accuracy and consistency should be maintained for reliable output.

With newer and more powerful generative AI releases upcoming, the NEXT Commercial Content Intelligence team will continue to identify and experiment with more transformative changes that deliver performance gains to life sciences businesses. Watch this space for more updates from us in the coming months.

Authors

Gokuul Veerasamy

Sivakumar Tumma