What will it take to get the value out of evaluation?  That question was in the thought bubble above my head for most of a two-day meeting last week organized by the Institute of Medicine and hosted by the Wellcome Trust in London.  The meeting, with the written-by-committee title of “Evaluation Methods for Large-Scale, Complex, Multi-National Global Health Initiatives,” had a pretty straightforward aim:  to look at several recent evaluations and figure out how to better measure the difference big donor-funded programs are making in people’s lives.

The evaluations we heard about covered programs that account for billions of dollars of donor spending, including the Global Fund to Fight AIDS, TB and Malaria, the President’s Emergency Plan for AIDS Relief, the President’s Malaria Initiative and the Affordable Medicine Facility for Malaria. These programs represent some of the largest and most ambitious global health initiatives; they also are among the most successful, inspirational and innovative efforts ever launched by international donors.  

Big-deal programs deserve big-deal evaluations, and these programs got them.  The evaluations combined interview data from thousands of sources and crunched enormous amounts of monitoring and budget information.  They tried to filter out bias, while at the same time recognizing that those who know most about the performance of programs are often those doing the work (and those living off funding from donors).  The evaluators – some of the best in the business – tried heroically to distinguish health improvements that could be legitimately credited to the program from those that might have happened anyway.  For each of these evaluations, the political stakes were high and the methodological challenges enormous.

Assuming you don’t want to watch two days of webcast content (to be posted here), here are my quick and partial take-aways.  I hope others who were there use the comment feature to offer their own observations.  

  • These evaluation experiences provide hints about ways to create greater political space for serious evaluation; improve the relevance and technical quality of the evaluations; and intensify the use of evaluation findings.
  • Securing the political space to conduct a good evaluation of a high-stakes program requires an “open to learning” stance on the part of program leadership and/or a mandate from on high.  It also requires that the evaluation be separated enough from the program to be genuinely unbiased, and that it be welcomed by the advocacy community even if the news is not always positive.
  • Better technical quality – and this will not be news – requires being able to articulate a theory of change and identify the fundamental assumptions that the evaluation should interrogate.  Given the zillions of possible “interesting” questions one might ask, it also requires take-no-prisoners priority-setting about what the most important evaluation questions are.  This priority-setting needs to take into consideration whether the questions can be answered in a way that is sufficiently persuasive to change the minds of those who are in a position to take decisions.
  • Better technical quality in these sorts of programs (and many others) also requires thinking about the evaluation from the outset – something not done in any of these initiatives, remarkably – and embedding impact evaluation during implementation. And let’s not forget the value of putting both the methods and the data itself out for public comment and reanalysis.  That helps keep everyone’s game up. 
  • Evaluation findings are used most effectively when there has been ongoing, meaningful engagement of both implementers and partner countries, and when those stakeholders have a commitment to learning and adaptation. Use of evaluation findings also benefits from a system for regular public follow-up of the recommendations.  Also important and often neglected:  adequate planning, skills and budget for fit-for-purpose dissemination.  Incredibly, in several of these evaluations the dissemination budget was – wait for it – $0.00.   
  • These experiences, informative as they are, don’t fully prepare us to do things well in the future.  Most if not all of these evaluations were very much in the old-school “donor-recipient” model, which simply will not fly in the future (thank goodness).  As Ian Goldman, head of evaluation and research in the South African government and Board member of the International Initiative for Impact Evaluation, gently but firmly put it in his remarks (paraphrasing here), “You are operating under an outdated partnership model. You have to change.” These evaluations also did not find ways to take advantage of non-traditional data sources, such as on-the-spot client feedback.  Surely this is part of the emerging set of opportunities for evaluators. 

This meeting alone did not change the world, or ensure that we will get more value out of future evaluations.  But it sure made me think about what we at the Hewlett Foundation, long-standing supporters of some of the most pioneering evaluation work in international development, can do to advance the state of this imperfect art.