Who We Are
Investor Relations

Reshaping pharmaceutical content atomization and tagging with generative AI: How we transformed NEXT Commercial Content Intelligence

30 Aug 2023
Elevating NEXT Commercial Content Intelligence
The sheer volume of customer-focused and healthcare professional (HCP)-focused content within life sciences organizations presents a significant challenge in effectively managing and utilizing this vast information. As life sciences organizations continuously generate an enormous amount of content, ranging from scientific research and clinical trial data to marketing materials and educational resources, it becomes increasingly difficult to streamline, categorize, and tailor content to meet the specific needs and preferences of diverse audiences.
Indegene’s NEXT Commercial Content Intelligence is a proprietary artificial intelligence (AI)-based solution for content atomization and tagging, specifically engineered for life sciences organizations. By utilizing robust artificial intelligence/machine learning (AI/ML) models, NEXT Commercial Content Intelligence (NCCI) deconstructs an array of assets, including iDetails, representative triggered emails (RTEs), banners, webpages, and videos, into atomized versions and then helps generate relevant tags.
Serving over 10 pharmaceutical majors in the past half a decade, NCCI has demonstrated capabilities in being an automated, consistent, and scalable solution, transforming multiple use cases, including content discoverability, analytics, personalization, and many more.
With generative AI, NEXT Commercial Content Intelligence has now further enhanced its capabilities. This blog post details how NCCI became a generative AI–powered tool, including the opportunities explored, challenges faced, and outcomes achieved.
Areas of impact with Generative AI
While NEXT Commercial Content Intelligence’s current supervised learning–based models were adequate, generative AI offered the opportunity to achieve fully automated atomization and tagging, making the entire journey more efficient and effective. The steps involved in this journey and the potential impact of generative AI are summarized in the table below:
StageActivitiesImpact of generative AIImpact level
Taxonomy Design
Use case comprehension, tag requirements, category/definition crafting, overlap analysis, manual tagging, and client revisions
Expedites taxonomy creation, seamlessly handles overlaps, and streamlines revisions
Model Development and Deployment
Dataset preparation, model creation, validation, and deployment
Bypasses conventional model generation steps by using intelligent prompts and optimizing model development
Machine Tagging
Content upload, running the model, and output generation
Subject matter expert (SME) Review
Asset audit, content comprehension, and tag validation
Generative AI enables content summarization, key area highlighting, and SME workload reduction
Output Generation and Transfer
Reports and API tag transfers
Model Retraining
Data selection, pipeline run, validation, and new version deployment
Generative AI–driven accuracy reduces retraining needs, allowing the effective use of intelligent prompts for efficient model enhancement
Getting started on the generative AI path
While powering NEXT Commercial Content Intelligence with generative AI is critical to staying future ready, the path to production involved several challenges. The following section attempts to explain the process of transitioning to generative AI for one core capability, that is, atom-level key message generation.
NEXT Commercial Content Intelligence’s core contained an atom-level key message generation model that was built through supervised learning, utilizing 250,000+ content pieces specific to the pharmaceutical domain. This model included 17 categories (efficacy, safety, brand information, study design, etc.) and was continuously retrained over the past 3 years to attain an accuracy of 83%. The aim was to now improve this model with generative AI.
Pilot experiment
The NEXT Commercial Content Intelligence team experimented using a limited sample size with GPT 3.5 Turbo to generate key messages for 100 atoms. Several prompting techniques were experimented with before narrowing down to the prompt with the highest prediction accuracy.
Trial numberPrompt parametersAccuracy with GPT-3.5 Turbo
Category names and OCR text of the atom
Category definition and OCR text of the atom
Category definition, OCR text of the atom, and role definition
Categories redefined (overlaps were identified and removed), OCR text of the atom, and role definition
Since the pilot experiment now had an accuracy higher than the existing model, the team decided to productionize the feature with GPT-3.5 Turbo.
Scaled experimentation
The experiment was then extended to 1,000 atoms, and 3 runs were conducted. The following observations were noted:
Consistency between the runs was only 30%, that is, the same prompt given 3 times resulted in the same output only in 30% of the cases
Unintended responses were received in 20% of the cases. This included
Response outside the given/defined categories
Response with no categories
To improve the response and make the model more deterministic, hyperparameter tuning was used. After several batch tests, the temperature and top P values were optimized. The consistency improved to 87%.
To avoid unintended responses, a condition was set to verify the response before acceptance and a prompt was defined to resend and revise the output. This reduced the unintended responses to <2%. For the cases where the output was still unintended, the platform was modified to provide default values.
Final run and deployment
After hyperparameter tuning, a run for 1000 atoms was conducted and an accuracy of 84.8% was achieved. This model was cleared for deployment because the results were significantly higher than the earlier available model.
The deployment architecture was designed to integrate NEXT Commercial Content Intelligence’s content atomization workflow with the new GPT-3.5 Turbo–based atom-level key message identification. This deployment also took into consideration the data privacy and security concerns of the pharmaceutical majors, making the information exchange completely secure and protected.
Custom taxonomies and use cases
Every NEXT Commercial Content Intelligence customer requires additional custom categories in addition to the default set already present in the core model. Experiments were run to accommodate new categories and modify existing categories to ensure that similar accuracies could be achieved for custom categories with prompt tuning.
Benefits Achieved
Reduced time to deployment: A 70% decrease in time to deployment was achieved because the model could be tuned with just prompts and did not require custom dataset preparation and model customization.
Reduced total costs: The total cost of deployment and operations for 5000 pages of content is summarized in the graph below. While model training, deployment, and retraining costs were lower for GPT-3.5 Turbo, the operational charges were slightly higher. However, the total cost of tagging reduced by a staggering 76%.
Next steps and planned refinements
The NEXT Commercial Content Intelligence team noticed that GPT-3.5 Turbo produced lower accuracy in one key area, that is, key message identification for graphical components with unstructured optical character recognition (OCR) text such as logos, graphs, illustrations, diagrams, and so on. Experiments are now being conducted to improve results in this area.
Additional Generative AI–powered capabilities
With similar replacements to the core models, NEXT Commercial Content Intelligence is now powered by generative AI for most of its capabilities. A few of these key capabilities and their performance are summarized in the table below:
Taxonomy Design
Taxonomy category definition
Taxonomy definition validation
Taxonomy overlap analysis and redefinition
90% reduction in efforts required for a taxonomy design with generative AI support
Model Development and Deployment
Content atomization
New models (e.g., Segment Anything) were able to maintain the contours of graphics and identify atoms with a coverage of 93%, compared with the earlier model’s coverage of 78%
Brand identification
The GPT-3.5 Turbo–based model improved accuracy from 83% to 93.33%
Therapy area identification
The GPT-3.5 Turbo–based model improved accuracy from 79% to 83.33%
Keyword identification –English
The GPT 3.5-Turbo–based model improved coverage from 76% to 93.33%
SME Review
Tag validation
The GPT 3.5-Turbo–based summary generation and attention optimization reduced SME review efforts by 54%
*Results are based on limited experimentation.
The future with generative AI
NEXT Commercial Content Intelligence was built to transform pharmaceutical content tagging and make it more automated, consistent, scalable, and affordable. Today, generative AI presents a transformative opportunity to further enhance these core goals.
However, the implementation of generative AI to deliver consistent and reliable output for business use cases comes with a set of factors that must be considered. These include:
Model reliability: The degree of control and understanding of model characteristics in a generalized large language model (LLM) model is much lower than a supervised learning model. This makes it necessary to regularly run scaled experiments to verify and sustain reliable output.
Version updates: LLM models in the market are continuously updated and upgraded, and as observed during our experimentation, the output from each of these versions is different. This requires continuous monitoring to ensure consistency and continuous improvement.
Consistency-Accuracy balance: When generative AI is used for classification, low randomness affects the accuracy of the prediction, while high randomness affects consistency. Therefore, a balance between accuracy and consistency should be maintained for reliable output.
With newer and more powerful generative AI releases upcoming, the NEXT Commercial Content Intelligence team will continue to identify and experiment with more transformative changes that deliver performance gains to life sciences businesses. Watch this space for more updates from us in the coming months.


Gokuul Veerasamy
Gokuul Veerasamy
Sivakumar Tumma
Sivakumar Tumma

Insights to build #FutureReadyHealthcare