Future of PFS: What happens if PFS evaluations don’t include counterfactuals?

November 2, 2017 - 12:34pm

How important is it to conduct an evaluation that includes a strong counterfactual? The rudimentary medical practice of bloodletting helps answer that question. 

Bloodletting was a technique commonly used by doctors prior to the 20th century to treat patients suffering from a variety of diseases and disorders. Leeches were applied to the patient to remove “impure” blood. Afterwards, any patient improvement was taken as proof that bloodletting was a best practice for healing, whether it was used to treat heart disease or hysteria. If the patient got better, it was assumed that the leech caused the patient’s health to improve; if the patient’s condition worsened, it was assumed that the leech should have been applied earlier.

As you might have guessed, this informal pre-post test did not provide accurate information about bloodletting’s value. In fact, we now know definitively that the practice was harmful to patients, making already sick patients susceptible to infection or even cardiac arrest. That makes us wonder—what if doctors had used a strong counterfactual to assess the effectiveness of bloodletting centuries ago when the practice began? Would they have continued this practice?

This example, which J-PAL’s Quentin Palfrey shared at the National Symposium on the Future of Pay for Success, highlights the need for rigorous evaluations that determine the causal effects of interventions. Strong counterfactuals, which result from rigorous evaluation design, answer the question: Without these services, what would outcomes have looked like for the target population? A strong counterfactual allows PFS stakeholders to causally link participant outcomes to the intervention (or the lack thereof).

Bloodletting is just one example of the problems that arise when an evaluation does not consider causality. More modern examples of the pitfalls of lacking a comparison group come from the criminal justice field in the 1980s and 1990s, as noted by panelist Akiva Liberman. For instance, the declining national crime rate in the 1990s made it seem as if most interventions were effective during that decade, but outside forces may have confounded the results. As a result, determining the actual effectiveness of the interventions implemented was difficult.
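
The crime-rate example can be made concrete with a small simulation. The sketch below is illustrative only: the function name `simulate`, the score scale, and every number in it are assumptions, not data from any real evaluation. It models a program with no true effect during a period when outcomes are falling for everyone, then compares a simple pre-post estimate with an estimate that uses a comparison group.

```python
import random
import statistics


def simulate(n=5000, true_effect=0.0, secular_trend=-5.0, seed=0):
    """Compare a pre-post estimate with a comparison-group estimate.

    All values are hypothetical. `secular_trend` stands in for an outside
    force (such as a falling national crime rate) that shifts everyone's
    outcome, whether or not they received the intervention.
    """
    rng = random.Random(seed)

    treated_pre, treated_post = [], []
    for _ in range(n):
        baseline = rng.gauss(50.0, 10.0)
        treated_pre.append(baseline)
        # Treated outcome = baseline + outside trend + program effect + noise.
        treated_post.append(
            baseline + secular_trend + true_effect + rng.gauss(0.0, 2.0)
        )

    # Comparison group: drawn from the same population, exposed to the same
    # outside trend, but never given the program.
    comparison_post = [
        rng.gauss(50.0, 10.0) + secular_trend + rng.gauss(0.0, 2.0)
        for _ in range(n)
    ]

    # Pre-post estimate: attributes the whole change to the program.
    pre_post = statistics.mean(treated_post) - statistics.mean(treated_pre)
    # Counterfactual estimate: treated outcomes minus comparison outcomes.
    with_counterfactual = (
        statistics.mean(treated_post) - statistics.mean(comparison_post)
    )
    return pre_post, with_counterfactual


if __name__ == "__main__":
    pre_post, with_counterfactual = simulate()
    print(f"Pre-post estimate:       {pre_post:.2f}")
    print(f"Counterfactual estimate: {with_counterfactual:.2f}")
```

Under these assumptions, the pre-post estimate lands near the outside trend of about five points even though the program does nothing, while the comparison-group estimate stays close to zero. The comparison group here is drawn from the same population by construction, which is what random assignment aims to achieve in practice.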

Identifying positive impacts, particularly in pay for success (PFS), is often not enough to satisfy project stakeholders if they cannot tell what caused those impacts in the first place, or whether those impacts would have happened regardless of the intervention. Using a randomized controlled trial (RCT) can boost stakeholders’ confidence that the intervention caused the change. Quasi-experimental designs such as propensity score matching and regression discontinuity also offer comparison groups that can help communities speak to the impacts of their intervention(s).

Many factors contribute to determining an evaluation design, including what the funder is trying to achieve with the project, contextual issues, the project timeline, the evaluation budget, and the political limelight. Among these considerations, a strong counterfactual stands out because it allows outcomes to be causally linked to the intervention, increasing the level of rigor and confidence in the results.

