Using ML and AI to Infer HCP Specialties

By: John Seaner, CMO,

Specialty Physician Identification: Your Best Targets Could Be Hiding In Plain Sight and You May Be Missing Them

With real-world data and modern analytical approaches that use machine learning and artificial intelligence, we can measure the degree to which any HCP behaves like a defined specialty. Yet pharmaceutical sales operations teams often rely on outdated or inaccurate specialty data when prioritizing HCP targets, which limits promotional effectiveness.

Healthcare at the Macro Level
The many obstacles to achieving optimal patient outcomes in today's healthcare marketplace demand action. Total US healthcare expenditures reached close to $4 trillion in 2019, yet the United States outpaces other nations in rates of death considered preventable by timely and effective care. Perhaps the trend patients feel most acutely is the severe shortage of healthcare professionals (HCPs) available to provide needed treatment, and throughout our own communities COVID-19 exposed deep flaws in the system. As demand for healthcare has increased, the number of physicians available to treat patients has decreased, and HCPs are being forced to practice both within and outside their specialties. In just over a decade, the US is projected to face a deficit of more than 120,000 physicians.

The confluence of an aging population, longer average lifespans and increased consumption of care has put HCPs in an impossible position: they are being asked to keep up with runaway demand. This trend is especially pronounced in rural areas and historically underserved communities. In Mississippi, for example, the availability of HCPs has fallen to 31% below the national average. The deficit is not limited to specialists; primary care physicians, who serve on the front lines of patient care, are also in short supply.

Impact on the Life Sciences Industry
The tight supply of HCPs has predictably intensified competition for share of voice among pharmaceutical manufacturers. The number of marketed therapies per indication has risen steadily, and in oncology, to take one specialty as an example, competition is exceptionally fierce. Clinical development spend on cancer therapies has reached a staggering $91.1B, more than central nervous system, musculoskeletal and cardiovascular spend combined. With fewer than 15,000 medical oncologists in the US, manufacturers are finding it particularly hard to differentiate their messages in this space.

Historically, the field sales representative has been the most effective promotional channel for educating HCPs. Today, it is far from certain that a field rep will even make it past the physician’s waiting room. According to AccessMonitor™, a syndicated publication on physician access, fewer than half of HCPs are considered accessible, and in highly competitive therapeutic areas such as rheumatology, cardiology and endocrinology the share can be far lower. Large health systems and integrated delivery networks (IDNs) further restrict access through blanket “no see” policies. Given the rapid acquisition of physician groups and ongoing system consolidation, these access limitations are unlikely to ease.

Potential Shortcomings

Traditionally, the life sciences industry has prioritized HCP targets by analyzing historical diagnosis or prescription data and has deployed sales resources based on various ROI calculations. While these ‘top-down’ HCP valuation approaches have held up well over time, cracks are beginning to emerge as the industry moves toward rare, niche and specialty markets. Current industry dogma holds that HCP lists must be filtered by reported specialty before any data are even reviewed. As a result, highly valuable HCPs are disregarded and in-need patient populations are missed. In response, our team tested three fundamental questions by tapping into the power of longitudinal real-world data (de-identified patient claims) combined with leading-edge machine learning and artificial intelligence capabilities: 1) What is the archetype profile of each specialty? 2) To what extent do HCPs “look like” their stated specialty or other specialties? 3) Can we accurately infer an HCP’s specialty, even for primary care or ‘other’ specialists, based on actual behavioral observations rather than self-reported data?

Our findings uncovered latent insights that challenge both the traditional approach to HCP prioritization and call planning and the conventional understanding of an HCP’s reported specialty.

Why Challenge the Conventional Approach?
We began by questioning the prevailing ‘truth’ in sales operations: pre-determined specialists will provide all relevant care for patients whose diseases fall into their specialty. Instead, we asked a simple question: what does the provider actually do? Could it be the case that specialty records are outdated or, worse, flat-out wrong? In some cases, the answer was yes.

The American Medical Association (AMA) has historically been the gold-standard source for HCP specialty designations. It may come as a surprise, however, that an HCP’s specialty is self-reported and that the AMA is not required to verify submissions with the American Board of Medical Specialties (ABMS). Nor are HCPs required to update their specialty on an ongoing basis: both in theory and in practice, a specialty designation could be stated during residency and never updated thereafter. As a result, reported specialty data can become woefully inaccurate over time.

If we acknowledge that self-reported specialties carry a non-zero risk of error, the question becomes how far they deviate from reality. Our team used machine learning to measure the degree of homogeneity within each specialty. Lookalike models are exceptionally adept at sifting through vast quantities of data and identifying patterns within them. Our approach was to analyze 20 distinct HCP specialties and build a comprehensive profile of each one based on observed activity. Drawing on more than 10 years of clinical history, the model constructed a robust understanding of each specialty type. Three types of attributes, known as features, drove each specialty’s profile:

  1. Patient Population – what are the diseases each specialty commonly treats, and in what volumes?
  2. Procedures – based on Current Procedural Terminology (CPT) codes, what are the medical, surgical and diagnostic procedures performed by each specialty, and in what volumes?
  3. Prescriptions – which drugs, reimbursed through the pharmacy and medical benefits, were prescribed by each specialty, and in what volumes?
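As a rough illustration of how the three feature families above could be turned into model inputs, the sketch below aggregates claim lines into one feature vector per HCP. The record layout `(hcp_id, kind, code, count)`, the `build_features` helper and the choice to normalize counts into shares of each HCP's total activity are assumptions for illustration only, not a description of our production pipeline; the diagnosis, CPT and drug codes are toy examples.

```python
from collections import Counter, defaultdict

# Hypothetical claim layout: (hcp_id, kind, code, count), where kind is
# "dx" (diagnosis), "cpt" (procedure) or "rx" (prescription).
claims = [
    ("hcp_a", "dx", "N18.6", 30),      # end-stage renal disease
    ("hcp_a", "cpt", "90935", 10),     # hemodialysis
    ("hcp_b", "dx", "E11.9", 20),      # type 2 diabetes
    ("hcp_b", "rx", "metformin", 20),
]


def build_features(claims):
    """Aggregate claim volumes into one normalized feature vector per HCP."""
    raw = defaultdict(Counter)
    for hcp_id, kind, code, count in claims:
        raw[hcp_id][f"{kind}:{code}"] += count
    features = {}
    for hcp_id, counts in raw.items():
        total = sum(counts.values())
        # Share of the HCP's total activity, so high- and low-volume
        # providers with the same practice mix get similar vectors.
        features[hcp_id] = {key: value / total for key, value in counts.items()}
    return features


features = build_features(claims)
print(features["hcp_a"])  # {'dx:N18.6': 0.75, 'cpt:90935': 0.25}
```

Normalizing discards absolute volume, which the features described above do track; a fuller model would likely keep volume as well, for example as a separate total-count feature.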

A clear, comprehensive picture of the archetype specialist emerged for each of the 20 specialties under consideration, supported by real-world data on millions of patient-HCP interactions. We then tested the model by suppressing specialty information, using our algorithm to predict each HCP’s specialty instead, and recording how often it was correct on the first, second or third attempt.
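The first-, second- or third-attempt test described above is commonly called top-k accuracy. A minimal sketch, assuming the model emits one score per specialty for each held-out HCP; the function name and the toy scores are hypothetical.

```python
def top_k_accuracy(scores, true_labels, k):
    """Share of HCPs whose reported specialty is among the model's k best guesses.

    scores: one dict per HCP mapping specialty -> model score (higher is better).
    true_labels: the suppressed, self-reported specialty for each HCP.
    """
    hits = 0
    for row, truth in zip(scores, true_labels):
        ranked = sorted(row, key=row.get, reverse=True)  # best guess first
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Toy scores for four held-out HCPs (illustrative only).
scores = [
    {"nephrology": 0.7, "cardiology": 0.2, "oncology": 0.1},
    {"nephrology": 0.1, "cardiology": 0.3, "oncology": 0.6},
    {"nephrology": 0.5, "cardiology": 0.3, "oncology": 0.2},
    {"nephrology": 0.2, "cardiology": 0.7, "oncology": 0.1},
]
truth = ["nephrology", "cardiology", "oncology", "cardiology"]
for k in (1, 2, 3):
    print(k, top_k_accuracy(scores, truth, k))
# 1 0.5
# 2 0.75
# 3 1.0
```

Relaxing k can only keep or raise accuracy, which is why the three-attempt figure is always at least as high as the first-attempt figure.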

On average, the model correctly classified an HCP’s specialty on the first attempt about 80% of the time. When we counted a prediction as correct if the true specialty appeared anywhere in the first three attempts, accuracy reached 90%. This led us to define a metric we call the resemblance score: the degree to which an HCP resembles a given specialty. Some specialties had clear, prominent features: nephrologists stood out for treating a clearly identifiable patient population and performing dialysis, while OB/GYNs performed a very high volume of procedures specific to female reproductive care.
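The text does not specify how the resemblance score is computed. One simple way to express the idea, shown here purely as an assumed stand-in and not as our method, is a nearest-centroid approach: average the feature vectors of a specialty's stated members into an archetype, then score any HCP by cosine similarity to each archetype. The helper names and toy vectors are hypothetical; a trained classifier's predicted probabilities would be another reasonable choice.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Average sparse feature vectors (dicts) into a hypothetical specialty archetype."""
    totals = defaultdict(float)
    for vector in vectors:
        for key, value in vector.items():
            totals[key] += value
    return {key: value / len(vectors) for key, value in totals.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def resemblance_scores(hcp_vector, archetypes):
    """Score one HCP against every specialty archetype."""
    return {spec: cosine(hcp_vector, arch) for spec, arch in archetypes.items()}


archetypes = {
    "nephrology": centroid([
        {"dx:N18.6": 0.8, "cpt:90935": 0.2},
        {"dx:N18.6": 0.6, "cpt:90935": 0.4},
    ]),
    "endocrinology": centroid([{"dx:E11.9": 0.5, "rx:metformin": 0.5}]),
}
scores = resemblance_scores({"dx:N18.6": 0.9, "cpt:90935": 0.1}, archetypes)
print(max(scores, key=scores.get))  # nephrology
```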


Not all specialties were so easily disentangled, however. Oncology subspecialties, understandably, were similar enough to one another to noticeably lower the model’s classification confidence. Hepatologists, who treat diseases of the liver, gallbladder and pancreas, were also particularly challenging to classify because of the heterogeneity of their day-to-day activities. Overall, the exercise revealed both expected and unexpected homogeneity and heterogeneity within and across specialties.

A common diagnostic tool in statistical classification is the confusion matrix, which tabulates a model’s predicted classes against the actual classes. We used this tool to uncover new insights: rather than simply marking an incorrect specialty prediction as “wrong,” we used the pattern of errors to explain the relationships between specialties.
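A confusion matrix can be built from nothing more than paired lists of actual and predicted labels. The sketch below counts each (actual, predicted) pair and then lists the most frequent off-diagonal cells, which is where relationships such as hepatology versus gastroenterology would surface. The function names and labels are toy values for illustration, not our results; scikit-learn's `sklearn.metrics.confusion_matrix` offers an array-based equivalent.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count every (actual, predicted) specialty pair; the diagonal holds correct calls."""
    return Counter(zip(actual, predicted))


def most_confused(actual, predicted, top_n=3):
    """Return the most frequent misclassifications as ((actual, predicted), count)."""
    counts = confusion_counts(actual, predicted)
    off_diagonal = [(pair, n) for pair, n in counts.items() if pair[0] != pair[1]]
    off_diagonal.sort(key=lambda item: item[1], reverse=True)
    return off_diagonal[:top_n]


actual = ["hepatology", "hepatology", "hepatology",
          "plastic_surgery", "plastic_surgery", "nephrology"]
predicted = ["gastroenterology", "gastroenterology", "hepatology",
             "dermatology", "plastic_surgery", "nephrology"]
print(most_confused(actual, predicted))
# [(('hepatology', 'gastroenterology'), 2), (('plastic_surgery', 'dermatology'), 1)]
```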

Perhaps unsurprisingly, hepatologists act like gastroenterologists, their legacy super-specialty. Turning to cosmetic treatments, plastic surgeons had more in common with dermatologists than with orthopedic surgeons. We all understand this intuitively, but it is worth restating: specialists’ behaviors are not mutually exclusive. Considerable variation among HCPs in a given specialty can arise from a host of factors, including patient population, practice location and continuing education. Specialists will generally resemble one another, but we should increasingly expect that HCPs outside a specialty may behave similarly.

After analyzing similarities and differences across specialties and developing a robust understanding of our 20 specialist archetypes, we turned the machine learning algorithm loose on two cohorts of physicians we had not previously profiled: primary care physicians (PCPs) and “Unspecified Specialists,” which we grouped together and named general medicine, or “GenMed.” The model took these imprecise, self-reported designations, analyzed each HCP’s empirical behavior and classified individual GenMed HCPs into more precise specialties.

Within each of the original 20 specialty groups, stated specialists were scored to assess how closely each one ‘looked like’ their stated specialty. This distribution of scores then served as a yardstick for evaluating individual GenMed providers. For example, a GenMed HCP who scored higher in resemblance to the archetype endocrinologist than the median stated endocrinologist would be labeled a “likely endocrinologist.” At this threshold (50th percentile or better), the model identified more than 100,000 GenMed HCPs as likely specialists. We then tightened the criterion: how many GenMed HCPs would score at or above the 75th percentile of stated specialists in a given specialty, and thus qualify as “highly likely” specialists? Put another way, could a family medicine doctor look more ‘neuro-like’ than the top quartile of neurologists? Again, the answer was a definitive “yes”: more than 30,000 GenMed HCPs met this bar.
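The percentile yardstick described above can be sketched as follows, assuming resemblance scores have already been computed for one specialty. Cut-offs come from the stated specialists' score distribution (median for “likely,” 75th percentile for “highly likely”), and GenMed HCPs are labeled against them. The function name, label strings and scores are hypothetical, and `statistics.quantiles` with `method="inclusive"` is just one of several reasonable percentile conventions.

```python
import statistics


def label_genmed(stated_scores, genmed_scores):
    """Label GenMed HCPs against percentile cut-offs from stated specialists."""
    # quantiles(n=4) returns the 25th, 50th and 75th percentile cut points.
    _, p50, p75 = statistics.quantiles(stated_scores, n=4, method="inclusive")
    labels = {}
    for hcp_id, score in genmed_scores.items():
        if score >= p75:
            labels[hcp_id] = "highly likely"
        elif score >= p50:
            labels[hcp_id] = "likely"
        else:
            labels[hcp_id] = "not flagged"
    return labels


# Toy resemblance scores for stated endocrinologists and three GenMed HCPs.
stated_endo = [0.2, 0.4, 0.6, 0.8, 1.0]
genmed = {"gm_1": 0.9, "gm_2": 0.7, "gm_3": 0.3}
print(label_genmed(stated_endo, genmed))
# {'gm_1': 'highly likely', 'gm_2': 'likely', 'gm_3': 'not flagged'}
```

In this sketch anyone labeled “highly likely” also clears the median cut-off, so a “likely or better” count would add both labels together.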


We wanted to understand why these GenMed HCPs scored so highly for such specific specialties, so we isolated the top-scoring GenMeds in each specialty for a closer look. Indeed, these HCPs performed procedures that would ostensibly mark them as specialists. One hematology-oncology lookalike, for example, collected biopsies, infused IV chemotherapies and tested blood protein levels, clearly reinforcing the findings. With little hesitation, we can infer his actual specialty to be hematology-oncology (HemOnc), even though he is reported to be a PCP.

Next, we looked at GenMeds at the state level. We wanted to see whether there was compelling evidence of likely (inferred) specialists in the states with the greatest HCP shortages and, as it turns out, there is direct evidence that this correlation exists. In Nevada, for instance, we found three times as many highly likely hepatologists as self-identified hepatologists. In oncology, we see an especially high volume of GenMed HCPs acting like surgical oncologists in Delaware, Idaho, Indiana, Kansas, Minnesota, New Hampshire, Oklahoma and South Dakota (Exhibit C). Nationally, we see a 10% increase in the number of addressable specialists using our highly likely cut-off (75th percentile), and a staggering 48% increase using our likely cut-off (50th percentile).
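The state-level comparison above reduces to simple counting: for a given specialty, divide the number of GenMed HCPs inferred to practice it by the number of self-identified specialists in the same state. A ratio of 3.0 corresponds to “three times as many,” and a ratio of 0.10 to a 10% increase in addressable specialists. The record layout, function name and toy records below are hypothetical.

```python
from collections import defaultdict


def lift_by_state(records, specialty):
    """Ratio of inferred to self-identified specialists per state.

    records: iterable of (state, stated_specialty, inferred_specialty or None).
    Only states with at least one self-identified specialist appear in the
    result, which avoids dividing by zero.
    """
    stated = defaultdict(int)
    inferred = defaultdict(int)
    for state, stated_spec, inferred_spec in records:
        if stated_spec == specialty:
            stated[state] += 1
        elif inferred_spec == specialty:
            inferred[state] += 1
    return {state: inferred[state] / n for state, n in stated.items()}


records = [
    ("NV", "hepatology", None),
    ("NV", "family_medicine", "hepatology"),
    ("NV", "internal_medicine", "hepatology"),
    ("NV", "unspecified", "hepatology"),
    ("NY", "hepatology", None),
    ("NY", "hepatology", None),
    ("NY", "family_medicine", "hepatology"),
]
print(lift_by_state(records, "hepatology"))  # {'NV': 3.0, 'NY': 0.5}
```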


It is not uncommon for us to spotlight highly valuable HCPs who come from unconventional specialties. Our industry-leading patient prediction capability focuses on finding undiagnosed patients who are likely to meet a nuanced clinical definition, typically in rare disease and oncology, and the HCPs who are treating them. In a recent case, a manufacturer’s list of perceived highest-value HCPs was limited to a few select specialties, and we were able to show that over 80% of patient volume was being lost as a result of these specialty filters. While sales reps in a competitive market battled over the same set of hematologists, many of the highest-volume treaters were not even on their radar.
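The share of patient volume lost to a specialty filter is the fraction of total patient volume attributed to HCPs whose reported specialty falls outside the filter. A minimal sketch with a hypothetical function name and toy volumes, not the client data from the case above:

```python
def volume_outside_filter(hcps, allowed_specialties):
    """Fraction of patient volume held by HCPs excluded by a specialty filter.

    hcps: list of (reported_specialty, patient_volume) pairs.
    """
    total = sum(volume for _, volume in hcps)
    excluded = sum(volume for specialty, volume in hcps
                   if specialty not in allowed_specialties)
    return excluded / total if total else 0.0


hcps = [
    ("hematology", 100),
    ("family_medicine", 250),
    ("internal_medicine", 150),
    ("unspecified", 100),
]
share = volume_outside_filter(hcps, {"hematology"})
print(f"{share:.0%}")  # 83%
```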

As manufacturers look to the future, new analytical models will force sales operations teams to reconcile the traditional approach to physician prioritization with the ultimate goal of improving patient outcomes. New, integrated data models and maturing machine learning and artificial intelligence technology are finally fulfilling the analytical promise of precision medicine. However, applying blanket specialty filters, which can be outdated or inaccurate as illustrated above, will severely blunt the added value and patient benefit that innovative therapeutics would otherwise deliver.

We do not presume that these shifts will be effortless, or even painless. As pharmaceutical manufacturers know, the change management associated with acquiring new data or implementing an analytical strategy can be the most painful part of the process. However, the story the data tell cannot be ignored; there is too much to lose when important patients and HCPs remain unseen. Trust the data to expand your field of vision, and you will be rewarded by breaking away from the crowded waiting room.

About is an Insights as a Service (IaaS) company that empowers the world’s leading life sciences brands to better understand and improve the lives of patients through the research, development and commercialization of new therapies and modalities of care for specialty and rare disease.’s system of insight streamlines patient discovery, treatment journey mapping, referral network intelligence, key opinion leader identification, market segmentation and adherence modeling by utilizing granular-level longitudinal analytics, artificial intelligence and machine learning in conjunction with a pool of real-world data, social determinants of health and outcomes research covering over 300 million de-identified patients.

About the Author

John Seaner

Chief Marketing Officer

John leads a team charged with amplifying our brand, communicating the organization’s business value and earning the loyalty of our clients. He has a 30-year track record of driving innovation, operational improvement, profitable revenue growth and sustainable competitive advantage for technology companies ranging from early-stage startups to global billion-dollar public organizations. John previously served as Chief Marketing Officer of 1010data, provider of analytical intelligence, consumer insights and data sharing solutions; Chief Marketing Officer of Signals Analytics, an early pioneer in the emerging Decision Science as a Service space; and Vice President of Global Growth Marketing at Medidata Solutions, the world’s leading clinical development lifecycle platform utilized by over 1,500 life sciences companies. Throughout his career, his teams have earned notable awards and accolades, most recently for pioneering new approaches in account-based marketing, including the use of digital ethnography to uncover the customer decision journey, the use of data science to optimize demand activation and the application of signals intelligence to enhance customer empathy.


