The Depressing Reality of Depression Screening
The Impact of the Depression Screening Quality Measure on Patients & Doctors
In my previous article, “The Quality of Quality Measurement,” I wrote about the Cynefin framework[1] of decision-making and how it applies to quality measures:
Quality metrics either assume that medical decisions are in the Cynefin framework's “clear” domain—or attempt to exclude medical nuances to recategorize them from the “complicated” and “complex” domains into the “clear” domain.
And how Charles Perrow’s theory of “Normal Accidents”[2] applies to quality measures:
When we attempt to simplify complex medical decisions into clear, measurable targets, we inadvertently create more rigid processes while adding new layers of complexity
…
Rather than improving quality, this approach generated new risks, compromised clinical judgment, and harmed people.
The Depression Screening and Follow-Up Plan Quality Measure is a great example to demonstrate the two fundamental flaws in healthcare measurement when applying quality metrics:
Misclassification: Quality metrics incorrectly categorize complex medical decisions as "clear" under the Cynefin framework, ignoring the nuanced nature of mental health assessment.
Oversimplification: In attempting to make depression screening measurable, these metrics strip away critical clinical context that physicians need for proper diagnosis and treatment.
This oversimplification creates "Normal Accidents," where attempts to reduce complexity paradoxically increase system risks—in this case, patient harm and increased burden on primary care.
Before discussing the problems with this quality measure, let's understand the measure itself.
Anatomy of Screening for Depression and Follow-Up Plan Quality Measure
Quality Measure Description
Percentage of patients aged 12 years and older screened for depression on the date of the encounter or up to 14 days prior to the date of the encounter using an age-appropriate standardized depression screening tool AND, if positive, a follow-up plan is documented on the date of the qualifying encounter.
Denominator:
Age ≥ 12
Valid Patient Encounter based on CPT-I Codes within 12 months
(see PDF at the end of the article for a list of CPT codes)
Numerator:
Depression screening should be done on the day of the encounter or up to 14 days before the encounter
Use standardized screening tools for depression screening, e.g., PHQ-2/PHQ-9[3]
If depression screening is positive, the doctor must document a “follow-up” plan. Examples of a follow-up plan include:
Referral to another provider, such as a psychiatrist or psychologist
Medications
Any other tools to manage depression
Denominator Exclusions:
Any patients with Bipolar disorder are not included in the denominator.
Denominator Exceptions:
This removes people from the denominator if there is an appropriate justification by the physician:
The patient refused to participate in depression screening
Appropriate medical reason for not screening. E.g.,
Dementia
Urgent medical condition
To recap the difference between “exclusion” and “exception” from my previous article, “Solving Obesity with a Quality Metric”:
The difference between “exclusion” and “exception” is that people in the “exclusion” category will never be in the denominator in the first place. People in the “exception” category will be counted in the denominator unless there is documentation on why they should be excluded.
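The denominator, numerator, exclusion, and exception rules above can be sketched in a few lines of Python. This is a minimal illustration of the logic as described in this article, not the official CMS specification: the `Encounter` fields, the outcome labels, and the choice to drop exceptions from the rate are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Encounter:
    # All field names are hypothetical; a real EHR extract will look different.
    age: int
    has_bipolar_dx: bool = False             # denominator exclusion
    exception_reason: Optional[str] = None   # e.g. "patient refused", "dementia"
    screened_in_window: bool = False         # standardized tool, day of visit or up to 14 days before
    screen_positive: Optional[bool] = None   # None when no screening result exists
    follow_up_plan_documented: bool = False


def classify(enc: Encounter) -> str:
    """Return 'not_eligible', 'excluded', 'exception', 'met', or 'not_met'."""
    if enc.age < 12:
        return "not_eligible"
    if enc.has_bipolar_dx:
        return "excluded"  # never enters the denominator
    if enc.screened_in_window and enc.screen_positive is not None:
        # A negative screen needs no plan; a positive screen needs one.
        if not enc.screen_positive or enc.follow_up_plan_documented:
            return "met"
        return "not_met"
    if enc.exception_reason:
        return "exception"  # removed from the denominator because a reason is documented
    return "not_met"


def performance_rate(encounters: List[Encounter]) -> Optional[float]:
    outcomes = [classify(e) for e in encounters]
    denominator = sum(o in ("met", "not_met") for o in outcomes)
    numerator = outcomes.count("met")
    return numerator / denominator if denominator else None


sample = [
    Encounter(age=40, screened_in_window=True, screen_positive=False),
    Encounter(age=40, screened_in_window=True, screen_positive=True),
    Encounter(age=40, has_bipolar_dx=True),
    Encounter(age=70, exception_reason="dementia"),
    Encounter(age=10),
]
print([classify(e) for e in sample])
print(performance_rate(sample))
```

Notice that the second patient fails the measure only because no follow-up plan was documented, regardless of what was actually discussed in the room.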
Reporting Period
While the reporting period in the official specs is not defined, a valid patient encounter is generally between January 1 and December 31.
Data Sources
Billing codes
Medical chart abstraction
Now that we know the details of this quality measure, let’s analyze its rationale and whether it is achieving its intended outcome.
Rationale for Quality Measure
If I had to sum up the rationale for creating a quality measure for depression screening, it would be the following 2 points:
Depression is very common, and the diagnosis is often missed or delayed.
We need to ensure that doctors are screening and treating depression.
This quality measure was also developed to align with the USPSTF Guidelines for “Depression and Suicide Risk in Adults: Screening.”[4]
Problems with Quality Measure
Over-diagnosis and Over-treatment
The most common depression screening tool used in primary care is the PHQ-2/PHQ-9. When a PHQ-9 cutoff of ≥ 10 is used to diagnose major depression, the tool has a sensitivity and specificity of 88%.[5] Assuming a prevalence of 10% in primary care, this leads to a 77% false positive rate.[6] Furthermore, per the quality measure specifications, the cutoff that triggers a documented follow-up plan is 5. Therefore, the false positive rate in real life is probably far higher.
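To see how sensitivity, specificity, and prevalence combine, here is a small worked calculation. It reads “false positive rate” as the share of positive screens that are false (one minus the positive predictive value), which is an interpretation, and the function name is made up for the example. With exactly the three inputs stated above, the share comes out near 55%; a figure near 77% appears when prevalence is closer to 4%, so the cited number presumably rests on different inputs in the source.

```python
def false_share_of_positives(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Fraction of positive screens that are false positives (1 - PPV)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return false_positives / (true_positives + false_positives)


# Inputs as stated in the paragraph above: 88% / 88% / 10% prevalence.
stated = false_share_of_positives(0.88, 0.88, 0.10)
print(f"prevalence 10%: {stated:.0%} of positive screens are false")  # 55%

# The same test looks much worse as the condition gets rarer.
for prevalence in (0.04, 0.10, 0.20):
    share = false_share_of_positives(0.88, 0.88, prevalence)
    print(f"prevalence {prevalence:.0%}: {share:.0%}")
```

Either way, the direction of the argument holds: in a low-prevalence primary care population, a large share of positive screens are false, and lowering the cutoff to 5 can only raise that share.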
The requirement to treat any PHQ score ≥ 5 leads to over-diagnosis and over-treatment. This cut-off score affects people experiencing normal life events like family deaths, divorces, or job losses. Due to a lack of insurance coverage for mental health therapy or the availability of mental health therapists, most patients who screen positive for depression generally end up taking antidepressant medications—often for life.
Key Statistics on PHQ-9 Screening:
False Positive Rate: 77% in primary care settings
Sensitivity: 88% at PHQ-9 ≥10 cutoff
Clinical Impact: Over-diagnosis risk further increases with cutoff PHQ-9 ≥5
Therefore, the depression screening measure attempts to fit the nuanced medical decision of diagnosing “medical depression” from the “complex” and “complicated” domains of the Cynefin framework into the “clear” domain. In primary care, this makes the quality measure overly sensitive, medicalizes normal variations in mood, and forces medical treatment upon people for everyday emotional fluctuations.
Cultural and Language Considerations
While PHQ-9 has been translated into multiple languages, these translations have not been validated. These “literal translations” ignore cultural nuances in how mental health is perceived and expressed.[7]
For example, the meaning of questions like “feeling down” or “little interest in doing things” can vary significantly across cultures. This cultural mismatch can lead to both false positives (over-diagnosing normal cultural expressions of distress) and false negatives (missing depression in patients who express it differently).
And then, there is the problem of implementing workflows to capture depression screening in multiple languages. Most EHRs only offer English and maybe Spanish versions. This creates several workflow challenges:
Medical practices need to find or store a repository of accurately translated PHQ-9 forms that can be printed in the office.
Medical assistants must remember to download the correct language version before the patient arrives.
Medical assistants need to coordinate with interpreters for patients who have trouble completing depression screening forms.
These paper responses must be scanned or manually entered in the English-only EHR fields (assuming that all the questions are in the correct order).
Quick PCP Rant
The process of coordinating with interpreters is particularly frustrating. Here's a common scenario:
We schedule a telephone interpreter for the medical assistant to help the patient complete PHQ-9—but then what? Do we keep the interpreter call on hold for 10-15 minutes until I can see the patient (each minute on hold costs money)? Do we disconnect and call back, potentially waiting another 10-15 minutes for an interpreter to become available? Meanwhile, my schedule backs up, and other patients are left waiting.
End Rant
All these extra steps create time-dependent rigid processes that increase the complexity and cost of primary care offices. According to Charles Perrow's theory of Normal Accidents, this approach creates two problems: a checkbox mentality to quality measures and the potential for over-diagnosing and over-treating depression.
Potential for Harm
As mentioned above, given the lack of insurance coverage for mental health or difficulty in finding mental health professionals, most people diagnosed with depression will end up on antidepressants.
Antidepressants have several side effects, including impacts on the stomach, heart, and brain.[8] As a PCP, I've observed that antidepressants can significantly improve a patient’s quality of life but also cause serious, harmful side effects.
Balancing benefits and risks is crucial in medicine. However, quality measures incentivize a “cookbook” approach, penalizing doctors for not adhering to standardized protocols even if it may be inappropriate or harmful to individual patients.
Insurance Coverage and Cost of Healthcare
According to the Affordable Care Act, preventive care services are provided at no cost to the patient. However, if a person screens positive for depression during an Annual Physical, the doctor needs to diagnose and treat depression to comply with the depression screening quality measure. This becomes a billable medical visit (in addition to the Annual Physical) and can result in substantial out-of-pocket costs for patients, particularly those with high-deductible health plans.
Primary Care Physicians (PCPs) face a dilemma with CPT code 96127, which is used for depression screening. When PCPs bill this code for patients with High-Deductible Health Plans, the cost is passed directly to the patients. This creates a catch-22: PCPs must use this code to receive credit for performing the screening, but doing so often leads to patient complaints about unexpected charges.
Initiating treatment for a false positive depression screening adds a burden to both the patient and the health system:
Increased patient costs for medications, mental health therapy, or other treatments.
This treatment cost is higher for people with high-deductible health plans or without coverage for mental health.
Increased healthcare costs to the system for unnecessary treatments and managing the harms from unnecessary treatment.
Resource allocation challenges due to the use of scarce mental health resources by people scoring false positives on depression screening.
Data Collection and Reporting Burden
Like the “BMI Screening and Follow-Up Plan” quality measure, the “follow-up” plan imposes significant challenges in collecting and reporting data. Since the “follow-up” plan is “free-text” documentation in the physician’s note, it has to be manually abstracted and submitted to the insurance company. If the data is not submitted, the PCP is blamed for not appropriately diagnosing and treating people for depression.
Now that we understand the problems associated with the Depression Screening Quality Measure, let's discuss some tools and techniques that PCPs and ACOs can use to improve performance.
Making Matters Go From Bad to Worse
Not only are PCPs required to screen for depression and act on false-positive results, but we are also subject to another quality measure: “Depression Remission at Twelve Months.”
This second metric measures how many people who screened positive for depression were treated successfully enough that their PHQ-9 score returned to normal, i.e., fell below 5 within 12 months of being diagnosed with depression.
So even if the PCP started a person on treatment, if that patient had a bad day at work before a return visit and scored 6 on the PHQ-9 (i.e., a false positive), the PCP fails this second quality measure.
In effect, PCPs are held responsible for daily mood fluctuations that arise in patients’ personal lives and are completely beyond the PCP’s control.
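The “bad day at work” scenario is easy to reproduce with a toy version of the remission check. This is a sketch under stated assumptions: it uses the cutoff of 5 and the 12-month window as described in this article, and it assumes the most recent score in the window is the one that counts, which the official specification may define differently. The function and variable names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple

REMISSION_CUTOFF = 5  # PHQ-9 below this counts as "normal", per the description above


def remission_met(diagnosis_date: date,
                  followups: List[Tuple[date, int]],
                  window_days: int = 365) -> bool:
    """followups holds (visit date, PHQ-9 score) pairs.

    Assumption: the latest score inside the window decides the outcome.
    """
    deadline = diagnosis_date + timedelta(days=window_days)
    in_window = [(d, s) for d, s in followups if diagnosis_date < d <= deadline]
    if not in_window:
        return False  # no follow-up score at all also fails the measure
    _, latest_score = max(in_window)  # tuples sort by date first
    return latest_score < REMISSION_CUTOFF


diagnosed = date(2024, 1, 10)
visits = [
    (date(2024, 4, 10), 3),   # doing well on treatment
    (date(2024, 11, 20), 6),  # rough week at work
]
print(remission_met(diagnosed, visits))      # False
print(remission_met(diagnosed, visits[:1]))  # True
```

The same patient passes or fails depending on which day the questionnaire happened to be filled out.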
How to Improve Performance on Depression Screening Quality Measure?
There are two components to compliance:
Perform depression screening using a standardized scale such as PHQ-2/PHQ-9
Document the plan if depression screening is abnormal
Performing PHQ-2/9 Depression Screening
A typical workflow would involve creating “encounter plans” for specific visit types, e.g., Annual Physicals, Medicare Annual Wellness Visits, and New Patient visits. The EHR (if capable) can send a link via email or text message to patients, who can then complete PHQ-2/9 as part of the check-in.
The “encounter plan” can also utilize “dotphrases” to automatically insert verbiage in the “Assessment & Plan” section documenting that PHQ-2/PHQ-9 Screening was performed.
CPT-I and ICD-10 Codes
There is no clear guidance on which CPT-I and ICD-10 codes help doctors meet the depression screening quality measure. The official CMS specifications do not specify any codes, while HEDIS specifications are proprietary and must be purchased from NCQA.[9] As far as I know, the following codes are the most relevant:
CPT-I Codes
96127: Brief emotional/behavioral assessment (e.g., PHQ-9, GAD-7), with scoring and documentation, per standardized instrument
Used with commercial insurance companies
G0444: Annual depression screening, 15 minutes
Used with Medicare and Medicare Advantage insurance companies
ICD-10 Codes
Z13.31: Encounter for screening for depression
Z13.89: Encounter for screening for other specified disorders
Suggested Workflow:
EHR should send messages to patients before appointments (e.g., Annual Physical) so they can complete PHQ-9.
Configure the EHR to automatically drop (or queue for provider/biller review) the appropriate CPT-I and ICD-10 codes for the encounter in which PHQ-9 was completed.
This workflow informs the insurance company that a depression screening was done. Some commercial health plans will track only depression screening without requiring a follow-up plan. However, for Medicare and Medicare Advantage plans, doctors still need to report whether the screening was negative or positive and, if positive, whether a follow-up plan was documented.
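The “automatically drop or queue” step can be sketched as a simple lookup. The payer-to-code mapping below only restates the codes listed above (96127 for commercial plans, G0444 for Medicare and Medicare Advantage, Z13.31 as the screening diagnosis); the payer labels and the function name are hypothetical, and a real EHR would implement this inside its own rules engine rather than in Python.

```python
from typing import List

# Codes as listed in this article; payer labels are made up for the example.
SCREENING_PROCEDURE_BY_PAYER = {
    "commercial": "96127",
    "medicare": "G0444",
    "medicare_advantage": "G0444",
}
SCREENING_ICD10 = "Z13.31"  # Encounter for screening for depression


def codes_to_queue(payer_type: str, phq_completed: bool) -> List[str]:
    """Return the codes to queue for provider/biller review for this encounter."""
    if not phq_completed:
        return []
    procedure = SCREENING_PROCEDURE_BY_PAYER.get(payer_type)
    if procedure is None:
        return []  # unknown payer: leave the decision to the biller
    return [procedure, SCREENING_ICD10]


print(codes_to_queue("commercial", True))
print(codes_to_queue("medicare_advantage", True))
print(codes_to_queue("commercial", False))
```

Queuing the codes for review instead of posting them directly keeps a human in the loop, which matters given the patient cost issues with 96127 described earlier.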
HCPCS Codes
Similar to the HCPCS codes used in the BMI Screening quality measure that I discussed in the last article, CMS defines several HCPCS codes for the depression screening measure. These codes indicate to CMS whether the patient should be excluded from the denominator and, if included, whether the doctor met the metric.
I have included the list of HCPCS codes with their description in the PDF at the end of the article.
Suggested workflow during an Annual Physical/Medicare Annual Wellness Visit:
The EHR system could be programmed to automatically drop the appropriate HCPCS codes with the ICD-10 code for depression screening. However, the challenge here is in capturing data on “follow-up” plan documentation. There are 2 ways to automate this:
Use text macros linked to CPT/HCPCS and ICD-10 codes.[10]
Use AI to analyze the doctor's office visit note after it is signed to queue the appropriate HCPCS code.
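The text-macro option can be pictured as a table that ties each selectable phrase to a code. The sketch below is illustrative only: the option names and note wording are invented, and the code values are deliberate placeholders, not real HCPCS codes. The actual codes should come from the measure specification or the list in the PDF.

```python
from typing import Tuple

# PLACEHOLDER values: replace with the real HCPCS codes from the measure specification.
MACRO_OPTIONS = {
    "negative": (
        "Depression screening negative; no follow-up plan required.",
        "HCPCS_NEGATIVE_SCREEN",
    ),
    "positive_with_plan": (
        "Depression screening positive; follow-up plan documented today.",
        "HCPCS_POSITIVE_PLAN_DOCUMENTED",
    ),
    "not_screened_reason": (
        "Depression screening not done; patient or medical reason documented.",
        "HCPCS_NOT_SCREENED_REASON_DOCUMENTED",
    ),
}


def expand_macro(option: str) -> Tuple[str, str]:
    """Return (text inserted into the note, code queued for billing review)."""
    if option not in MACRO_OPTIONS:
        raise ValueError(f"Unknown macro option: {option}")
    return MACRO_OPTIONS[option]


note_text, queued_code = expand_macro("positive_with_plan")
print(note_text)
print(queued_code)
```

The design point is that one click by the doctor produces both the documentation and the reportable code, so nothing has to be abstracted from free text later.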
Supplemental Data Submission
Medical practices can submit PHQ-2/9 completion and scores via supplemental data to health plans using either EHR or data warehouse reporting functionality. The downside of this approach is that there is often no way to capture the “follow-up plan.”
However, this is still useful because a follow-up plan is not required for normal PHQ-2/9, and the PCP office will get credit for performing depression screening.
For the follow-up plan part, doctors' notes containing the depression screening follow-up plan need to be manually abstracted and submitted to the insurance company.
Role of Artificial Intelligence
A potential workflow could be:
A list of depression screening “gaps in care” is loaded into the EHR or data warehouse.
AI finds all patients that have had PHQ-9 and, if positive, a follow-up plan documented in the medical record.
This data is “fed back” into the gaps-in-care list (for example, a spreadsheet) to submit to insurance companies.
Additionally, for proof, AI collates all the medical records containing the positive PHQ-9 and “follow-up” documentation in a format requested by the insurance company for submission.
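The steps above can be sketched as a small pipeline. To keep the example self-contained, a crude keyword search stands in for the AI step; a real implementation would use a language model or NLP service and would need clinical validation. The chart structure, keyword list, and status labels are all assumptions, and the positive cutoff of 5 is the one discussed earlier in this article.

```python
from typing import Dict, List

POSITIVE_CUTOFF = 5  # cutoff discussed earlier in this article
PLAN_KEYWORDS = ("referral", "refer to", "medication started", "therapy", "follow-up plan")


def find_follow_up_plan(note_text: str) -> bool:
    """Keyword stand-in for the AI step; not suitable for real chart review."""
    lowered = note_text.lower()
    return any(keyword in lowered for keyword in PLAN_KEYWORDS)


def close_gaps(gap_list: List[str], charts: Dict[str, dict]) -> List[dict]:
    """Annotate each patient on the gaps-in-care list with a status to report back."""
    rows = []
    for patient_id in gap_list:
        chart = charts.get(patient_id)
        if chart is None or chart.get("phq9") is None:
            status = "open: no screening found"
        elif chart["phq9"] < POSITIVE_CUTOFF:
            status = "closed: negative screen"
        elif find_follow_up_plan(chart.get("note", "")):
            status = "closed: positive screen with plan"
        else:
            status = "review: positive screen, no plan found"
        rows.append({"patient_id": patient_id, "status": status})
    return rows


charts = {
    "A": {"phq9": 3, "note": "Annual physical."},
    "B": {"phq9": 12, "note": "Referral to psychology placed."},
    "C": {"phq9": 8, "note": "Discussed sleep hygiene."},
}
for row in close_gaps(["A", "B", "C", "D"], charts):
    print(row)
```

Anything the automated step cannot confirm goes to a review queue instead of being reported as a failure, which keeps a human responsible for the final submission.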
Electronic Clinical Quality Measure (eCQM)
CMS has an eCQM version of the “Depression Screening and Follow-Up Plan” quality measure. Please see the CMS website for more details.
The general philosophy seems to be:
Diagnosis: e.g., depression NOS, Major depression
Order: e.g., medication, psychiatry or psychologist referral. However, if a person does not want any treatment, you must follow an EHR-specific workflow to “check” that treatment was discussed.
Summary & Downloads
Depression screening does have a role in primary care, but only when it is used with the appropriate patient population. Making depression screening a quality measure and forcing it on all patients seen in primary care causes more harm than good—both for patients and PCPs.
I hope you found the suggested strategies helpful in designing workflows and/or products to help PCP practices meet this useless quality measure.
Up Next
In the next article, we will return to the fundamentals and examine how “disease” is defined, how that has changed in the last several decades, and how these changes impact healthcare costs.
[1] Snowden, D. J., & Boone, M. E. (2007). A leader’s framework for decision making. Harvard Business Review, 85(11), 68–76, 149.
[2] Perrow, C. (1999). Normal Accidents: Living with High Risk Technologies - Updated Edition (REV-Revised). Princeton University Press. https://doi.org/10.2307/j.ctt7srgf
[3] The official quality measure specifications define the various depression screening tools that can be used.
[4] I am not a journalist, but someone should investigate if the authors have received funding from pharmaceutical companies that make depression medications and have appropriately disclosed all conflicts of interest!
[5] Patient Health Questionnaire (PHQ-9 & PHQ-2). (n.d.). American Psychological Association. Retrieved December 10, 2024, from https://www.apa.org/pi/about/publications/caregivers/practice-settings/assessment/tools/patient-health
[6] Thombs, B. D., Markham, S., Rice, D. B., & Ziegelstein, R. C. (2021). Does depression screening in primary care improve mental health outcomes? BMJ (Clinical Research Ed.), 374, n1661. https://doi.org/10.1136/bmj.n1661
[7] Kleinman, A. (2004). Culture and Depression. New England Journal of Medicine, 351(10), 951–953. https://doi.org/10.1056/NEJMp048078
I reviewed the Hindi version of PHQ-9 from a website, www.phqscreeners.com. While it is a technically accurate translation, it uses phrases and words not used in the vernacular, limiting its application to the general population.
My wife (an APRN fluent in Chilean Spanish who lived in Chile for several years) reviewed the Chilean Spanish version of PHQ-9 and thinks it is very good and culturally appropriate.
[8] Side effects - Antidepressants. (2021, February 5). NHS. https://www.nhs.uk/mental-health/talking-therapies-medicine-treatments/medicines-and-psychiatry/antidepressants/side-effects/
[9] Even if I purchased NCQA quality measures to write this article, I may violate their copyright. Doctors are kept in the dark about how to meet the metric!
[10] I worked with Epic EHR several years ago to build this functionality for our ACO. We created a dot phrase called “.qmupdated,” which included several quality metrics. When the appropriate documentation was selected, Epic queued the associated CPT-II or HCPCS code for billing.