7 Metrics WHO Program Managers Use to Evaluate Mobile Health Screening
How WHO program managers evaluate mobile health screening programs using seven metrics, from coverage and cost-effectiveness to data quality and clinical outcomes.

When a WHO country office decides whether to scale a mobile health screening program from pilot to national rollout, the decision comes down to numbers. Not the kind you find in a pitch deck — the kind that show up in monitoring reports six months into implementation, when the initial excitement has worn off and the program either works or it doesn't. WHO's evaluation metrics for mobile health screening have become increasingly standardized over the past few years, but knowing which ones actually drive funding and scale-up decisions is still something of an insider's game.
"Robust monitoring and evaluation plans were specifically identified as essential for the successful scaling of mHealth interventions." — WHO MAPS Toolkit: mHealth Assessment and Planning for Scale, developed by Jessica Rothstein and Tigest Tamrat with guidance from Garrett Mehl at WHO and Alain Labrique at Johns Hopkins University Global mHealth Initiative.
How WHO evaluates mobile health screening programs
The WHO published its MAPS Toolkit (mHealth Assessment and Planning for Scale) to give program managers a structured way to assess whether digital health interventions are ready for broader deployment. Separately, the WHO/ITU Global Digital Health Monitor tracks 23 standard indicators across domains including governance, workforce, standards, and infrastructure. The 2024 State of Digital Health Brief, published in early 2025, noted that these indicators would transition to the WHO Data Hub for global monitoring.
But at the program level — where a country office is evaluating whether a specific mobile screening tool should get another year of funding — the metrics that matter are more granular. Based on WHO frameworks and published evaluations of mHealth screening programs across Sub-Saharan Africa, seven metrics consistently determine whether programs survive past their pilot phase.
The seven metrics
1. Population coverage rate
This is the most basic question: how many people did the program actually reach? Coverage rate is the proportion of the eligible target population in the catchment area that was screened. A hypertension screening program targeting adults over 40 in a district of 200,000 needs to show what percentage of those adults it actually reached.
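The arithmetic itself is trivial, but evaluators want it shown against the eligible denominator, not the whole district. A minimal sketch, where the over-40 share and the screening count are assumed figures for illustration:

```python
# Coverage rate = people screened / eligible target population.
# The 22% over-40 share and the screening count are hypothetical.

district_population = 200_000
eligible = district_population * 0.22   # assumed: ~22% of residents are over 40
screened = 27_500                       # completed screenings in the reporting period

coverage_rate = screened / eligible
print(f"Coverage: {coverage_rate:.1%}")  # Coverage: 62.5%
```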
The numbers vary wildly. A 2024 study published in BMJ Global Health examining community health worker programs for non-communicable diseases in low- and middle-income countries found that CHW-delivered screening programs typically achieved coverage rates between 30% and 65% of target populations, depending on geography, staffing density, and community engagement. Programs that fell below 25% coverage rarely received continued funding regardless of their per-patient outcomes.
For mobile health screening specifically, the reach question has an interesting wrinkle. Smartphone-based tools can theoretically screen more people per day than traditional methods because each measurement takes less time and requires less equipment setup. Whether that theoretical advantage translates to actual coverage gains depends on factors like CHW workload, travel distances, and whether the screening workflow actually fits into existing visit patterns.
2. Cost per screening completed
Program managers divide total program costs — including devices, connectivity, training, supervision, and CHW time — by the number of completed screenings. This sounds straightforward, but the accounting gets complicated fast.
A cost analysis published in PLOS ONE by researchers studying mHealth-facilitated tuberculosis contact investigation in Kampala, Uganda (2014–2017) broke down the real costs of implementation. They found that mHealth-facilitated screening by community health workers cost roughly 40% less per contact investigated than facility-based approaches, but only after the initial investment in devices and training was amortized over the first 18 months.
| Cost component | Traditional screening (per person) | mHealth screening (per person) |
|---|---|---|
| Equipment (amortized) | $3.50–$8.00 | $0.40–$1.20 |
| CHW time | $1.80–$3.00 | $1.20–$2.50 |
| Consumables | $0.50–$2.00 | $0.00–$0.10 |
| Data entry and reporting | $0.80–$1.50 | $0.00–$0.20 (automated) |
| Supervision and quality assurance | $0.60–$1.20 | $0.80–$1.50 |
| Connectivity and data | N/A | $0.15–$0.40 |
| Total range | $7.20–$15.70 | $2.55–$5.90 |
The supervision line is worth noting. It often goes up with mHealth programs, not down, because program managers need to verify that digital tools are being used correctly. That cost increase gets offset by savings everywhere else, but it catches program planners off guard.
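To see why the amortization window matters, here is a minimal sketch of the steady-state calculation. Every figure is hypothetical; a real evaluation would pull them from the program ledger:

```python
# Cost per screening once one-time device costs are spread over 18 months.
# All figures below are hypothetical, for illustration only.

device_fleet_cost = 24_000     # one-time: smartphones and accessories for 60 CHWs
amortization_months = 18
monthly_recurring = 12_000     # CHW time, supervision, connectivity, refresher training
screenings_per_month = 4_500

monthly_device_cost = device_fleet_cost / amortization_months   # ~$1,333/month
cost_per_screening = (monthly_device_cost + monthly_recurring) / screenings_per_month
print(f"Cost per screening: ${cost_per_screening:.2f}")         # ~$2.96
```

Run the same formula in month three, with most of the device outlay still unabsorbed, and the figure looks far worse — which is why pilot-phase cost numbers routinely overstate steady-state costs.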
3. Referral completion rate
Screening only matters if the people identified with abnormal readings actually reach a health facility. Referral completion rate — the percentage of referred individuals who complete a facility visit within a defined timeframe — is where many mobile screening programs fall apart.
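Computationally, the metric is a windowed match between referral records and facility visit records. A minimal sketch with hypothetical records and an assumed 30-day window:

```python
from datetime import date, timedelta

# Referral completion: share of referred clients with a facility visit inside
# the follow-up window. Records and the 30-day window are hypothetical.

WINDOW = timedelta(days=30)

referrals = [
    # (referral_date, facility_visit_date or None)
    (date(2024, 3, 1), date(2024, 3, 9)),
    (date(2024, 3, 2), None),               # never presented at a facility
    (date(2024, 3, 5), date(2024, 4, 20)),  # presented, but outside the window
]

completed = sum(
    1 for referred, visited in referrals
    if visited is not None and visited - referred <= WINDOW
)
print(f"Referral completion: {completed / len(referrals):.0%}")  # 33%
```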
Research published by the University of York in 2024 modeled the impact of leveraging existing HIV primary health systems for hypertension screening in Africa. They found that community health worker screening could improve population-level hypertension control from a mean of 4% to 44%, but only if referral systems functioned. Without adequate referral linkage, screening essentially generated data without generating health outcomes.
WHO program evaluations typically set a minimum referral completion threshold of 60% for programs to be considered effective. Some programs in East Africa have achieved rates above 80% by using the same mobile platform to send SMS reminders and track referral status, turning the screening tool into a follow-up tool as well.
4. Data completeness and quality
A screening program that captures vital signs but loses 30% of records to sync failures, entry errors, or missing metadata is not going to scale. Data completeness measures the proportion of screening encounters that result in a full, usable data record submitted to the health information system.
The WHO's Monitoring and Evaluating Digital Health Interventions guide identifies data quality as one of the most common failure points in mHealth programs. Evaluators typically assess four dimensions:
- Completeness: are all required fields populated?
- Timeliness: does data reach the central system within the expected window?
- Consistency: do values fall within plausible clinical ranges?
- Uniqueness: can individual patients be identified and tracked across encounters?
Programs using paper-based data collection typically achieve completeness rates around 60–75%. Digital tools generally improve this to 85–95%, primarily because mandatory fields and validation rules prevent common entry errors. But the gap between "data captured on device" and "data successfully synced to the central system" can be substantial in areas with poor connectivity. Programs operating in rural Sub-Saharan Africa often report sync failure rates of 10–20% when CHWs work in areas without reliable mobile network coverage.
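A record-level validator covering those four dimensions can be compact. The sketch below is illustrative only: the field names, plausible ranges, and 48-hour sync window are assumptions, not a WHO schema:

```python
# Four data-quality checks applied to one screening record.
# Field names, ranges, and the sync window are hypothetical; timestamps are
# assumed to be epoch seconds.

SYNC_WINDOW_HOURS = 48
REQUIRED = {"patient_id", "age", "systolic", "diastolic", "captured_at", "synced_at"}
PLAUSIBLE = {"systolic": (70, 250), "diastolic": (40, 150), "age": (0, 120)}

def quality_flags(record: dict) -> dict:
    out_of_range = [
        field for field, (lo, hi) in PLAUSIBLE.items()
        if field in record and not lo <= record[field] <= hi
    ]
    sync_delay_h = (record.get("synced_at", 0) - record.get("captured_at", 0)) / 3600
    return {
        "complete": REQUIRED <= record.keys(),             # all required fields present
        "timely": 0 <= sync_delay_h <= SYNC_WINDOW_HOURS,  # reached the server in time
        "consistent": not out_of_range,                    # values clinically plausible
        "identifiable": bool(record.get("patient_id")),    # linkable across encounters
    }

record = {"patient_id": "A-1042", "age": 57, "systolic": 186, "diastolic": 104,
          "captured_at": 1_714_900_000, "synced_at": 1_714_950_000}
print(quality_flags(record))  # all True: synced in ~14h, values in range
```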
5. Sensitivity and specificity of the screening tool
Does the tool correctly identify people who have a health condition (sensitivity) and correctly exclude those who don't (specificity)? For mobile health screening tools, this metric determines clinical credibility.
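Both numbers fall out of a validation study's confusion matrix against a reference device. A minimal sketch with hypothetical counts:

```python
# Sensitivity and specificity from a hypothetical field validation study
# comparing the mobile tool against a clinical-grade reference device.

tp, fn = 84, 16    # condition present: tool flagged it / tool missed it
tn, fp = 870, 30   # condition absent: tool cleared them / tool falsely flagged

sensitivity = tp / (tp + fn)   # share of true cases the tool catches
specificity = tn / (tn + fp)   # share of healthy people the tool clears
print(f"Sensitivity {sensitivity:.0%}, specificity {specificity:.1%}")  # 84%, 96.7%
```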
WHO evaluation frameworks don't require that mobile screening tools match clinical-grade devices. They require that the tools perform adequately for their intended use — which is typically triage and referral rather than diagnosis. The threshold varies by condition: a hypertension screening tool that misses 30% of cases (70% sensitivity) is problematic; a general wellness screening that occasionally reads high on a healthy person is less concerning because the consequence is just an extra clinic visit.
Published validation studies for smartphone-based vital signs measurement show a range of accuracy depending on the specific vital sign, the device, the population, and the measurement conditions. What WHO evaluators look at is whether the tool's performance characteristics are documented, whether they were assessed in the target population (not just a university lab), and whether the error rates are acceptable given the clinical pathway.
6. Health worker adoption and sustained use
A tool that CHWs stop using after three months has zero long-term value. Adoption metrics track both initial uptake (what percentage of trained CHWs actually use the tool) and sustained use (what percentage are still using it at 6 and 12 months).
The Australian Digital Health Agency published a comprehensive mHealth assessment framework in 2024 that identified usability as the single strongest predictor of sustained adoption. The framework draws on the Mobile App Rating Scale (MARS), developed by Stoyanov et al. and now widely referenced in mHealth evaluation literature, which scores apps across engagement, functionality, aesthetics, and information quality.
In practice, WHO program evaluators look at three things, the first of which is sketched in code below:
- Monthly active users among trained CHWs (target: above 80% at 6 months)
- Average screenings per CHW per week (compared to program targets)
- Drop-off curve — how quickly usage declines after initial training
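A minimal sketch of that monthly-active-use check, with hypothetical usage logs and "active" defined as at least one screening in the month:

```python
# Sustained-use tracking: share of trained CHWs active each month.
# Counts are hypothetical; the 80% target mirrors the 6-month benchmark above.

trained_chws = 50
active_by_month = {1: 48, 2: 45, 3: 41, 6: 42, 12: 38}  # CHWs with >= 1 screening

for month, active in active_by_month.items():
    mau = active / trained_chws
    flag = "ok" if mau >= 0.80 else "below target"
    print(f"Month {month:>2}: {mau:.0%} active ({flag})")
```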
Programs that maintain high adoption rates tend to share a few characteristics: the screening workflow takes under two minutes, results are immediately visible to the CHW (not just sent to a server), and the tool replaces something the CHW was already doing rather than adding a new task to their workday.
7. Integration with national health information systems
The final metric is structural rather than operational: can the screening data flow into the country's existing health information system? Most African nations use DHIS2 (District Health Information Software 2) as their primary health data platform. A mobile screening tool that generates its own reports but can't feed data into DHIS2 creates a parallel system that health ministries don't want to maintain.
The WHO Global Digital Health Monitor's 2024 assessment found that interoperability remains one of the weakest domains across low- and middle-income countries. Countries that scored highest on digital health maturity — including Rwanda, Kenya, and Tanzania — had invested in health data exchange standards that allowed mobile tools to submit data through standardized APIs.
For mobile health screening programs, integration typically means mapping screening data fields to the national indicator set, ensuring patient identifiers match between systems, and establishing automated data submission pipelines. Programs that can demonstrate DHIS2 integration from the pilot stage have a significant advantage in scale-up discussions.
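In practice, integration often comes down to posting aggregated counts to DHIS2's dataValueSets endpoint. The sketch below shows the general shape of such a submission; the server URL, credentials, and every UID are placeholders that a real deployment would map to the ministry's own configuration:

```python
import requests

# Push one month of aggregated screening counts into DHIS2.
# URL, credentials, and all UIDs below are placeholders, not real identifiers.

payload = {
    "dataSet": "DATASET_UID",     # placeholder: the ministry's reporting dataset
    "period": "202405",           # monthly period in DHIS2 format
    "orgUnit": "ORGUNIT_UID",     # placeholder: reporting facility or district
    "dataValues": [
        {"dataElement": "HTN_SCREENED_UID", "value": "412"},
        {"dataElement": "HTN_REFERRED_UID", "value": "57"},
    ],
}

resp = requests.post(
    "https://dhis2.example.org/api/dataValueSets",
    json=payload,
    auth=("reporting_user", "secret"),  # placeholder credentials
    timeout=30,
)
resp.raise_for_status()
print(resp.json().get("status"))  # e.g. "SUCCESS" on a clean import
```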
How these metrics interact
No single metric determines program success. WHO evaluation frameworks weight them in combination, and the relative importance shifts depending on the program's stage. During the pilot phase, sensitivity/specificity and adoption get the most scrutiny. During scale-up discussions, coverage and cost dominate. Once a program is running nationally, data quality and system integration become the deciding factors for continued investment.
| Program stage | Primary metrics | Secondary metrics |
|---|---|---|
| Pilot (0–12 months) | Sensitivity/specificity, CHW adoption | Cost per screening, data quality |
| Scale-up decision (12–24 months) | Coverage rate, cost per screening | Referral completion, system integration |
| National operation (24+ months) | Data quality, system integration | Coverage rate, sustained adoption |
A program can have excellent clinical accuracy but fail on adoption because CHWs find the tool cumbersome. Or it can achieve high coverage at low cost but lose credibility because the screening data doesn't sync reliably. The programs that get scaled are the ones that perform adequately across all seven dimensions rather than excelling at two or three while ignoring the rest.
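One way to picture that interaction is as a composite score whose weights shift with program maturity. WHO does not publish numeric weights, so everything below is invented purely to show the mechanics:

```python
# Toy stage-weighted composite score. The weights and scores are invented for
# illustration; WHO frameworks do not publish numeric weightings.

WEIGHTS = {
    "pilot":    {"sensitivity": 0.30, "adoption": 0.30, "cost": 0.15,
                 "data_quality": 0.15, "coverage": 0.10},
    "scale_up": {"coverage": 0.30, "cost": 0.30, "referral": 0.20,
                 "integration": 0.10, "adoption": 0.10},
}

def composite(scores: dict, stage: str) -> float:
    """Weighted mean of per-metric scores (each scaled 0-1) for a given stage."""
    return sum(w * scores.get(metric, 0.0) for metric, w in WEIGHTS[stage].items())

# A program with strong accuracy but weak coverage scores well as a pilot
# and poorly at the scale-up gate.
scores = {"sensitivity": 0.9, "adoption": 0.55, "cost": 0.8,
          "data_quality": 0.85, "coverage": 0.4, "referral": 0.7, "integration": 0.3}
print(f"Pilot: {composite(scores, 'pilot'):.2f}")        # 0.72
print(f"Scale-up: {composite(scores, 'scale_up'):.2f}")  # 0.58
```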
Where mobile screening technology fits
The shift from equipment-heavy screening to smartphone-based approaches changes the math on several of these metrics simultaneously. Cost per screening drops because the hardware investment shrinks. Data quality potentially improves because capture is automated. Coverage can expand because CHWs don't need to carry bags of equipment.
But other metrics get harder. Sensitivity and specificity need to be validated for smartphone-based measurements — a process that takes time and published research. System integration requires software development work that traditional screening equipment doesn't demand. And adoption depends on whether the technology actually works reliably in field conditions, not just in controlled demonstrations.
Companies like Circadify are working on the field-condition problem specifically, building smartphone-based vital signs screening that operates offline and on low-end devices. The approach of capturing heart rate, respiratory rate, and other vital signs through the phone's camera eliminates the equipment supply chain entirely — which, based on these metrics, addresses the cost and coverage dimensions while putting pressure on the accuracy validation dimension.
For WHO program managers evaluating these newer approaches, the framework doesn't change. The seven metrics still apply. What changes is which columns of the cost table go to zero and which rows of the validation table need new data. The programs that will scale in the next few years are the ones that can fill in all seven boxes with credible numbers, regardless of what technology sits underneath.
Frequently asked questions
What is the WHO MAPS Toolkit?
The MAPS Toolkit (mHealth Assessment and Planning for Scale) is a WHO framework developed to help program managers assess whether mobile health interventions are ready for scale-up. It was created by Jessica Rothstein and Tigest Tamrat with guidance from Garrett Mehl at WHO and Alain Labrique at Johns Hopkins. The toolkit identifies monitoring and evaluation as one of the critical requirements for successful mHealth scaling and provides structured assessment criteria across multiple domains.
How does WHO measure mHealth program success differently from donors?
WHO evaluation frameworks tend to emphasize population-level impact and health system integration, while bilateral donors (USAID, the UK's FCDO, GIZ) often focus more heavily on cost-effectiveness ratios and beneficiary reach numbers. In practice, the metrics overlap considerably, but the weighting differs. A program that demonstrates DHIS2 integration and government ownership will score well with WHO even if its cost per screening is slightly higher than alternatives that operate as standalone systems.
What coverage rate do mHealth screening programs need to achieve?
There's no universal threshold, but WHO evaluations consistently flag programs below 25% target population coverage as underperforming. Programs achieving 50% or higher coverage of their defined target population are generally considered successful. The specific target depends on the condition being screened, the population density, and the available CHW workforce. Urban programs typically achieve higher coverage rates than rural ones due to shorter travel distances between households.
Why is referral completion rate considered more important than screening volume?
Because screening without follow-up generates cost without generating health outcomes. A program that screens 100,000 people but only connects 15% of those with abnormal findings to care has limited health impact. WHO evaluations increasingly look at the full cascade from screening through referral to treatment initiation, recognizing that the screening step alone — however efficiently delivered — is only valuable if it leads to clinical action.
