From impact evaluation to decision-analysis: assessing the extent and quality of evidence on ‘value for money’ in health impact evaluations in low- and middle-income countries [version 1; peer review: 1 approved, 1 approved with reservations]

Background: Health impact evaluations (HIEs) are currently the main way of assessing policy changes in low- and middle-income countries (LMICs). However, evidence on effectiveness alone cannot reliably inform decisions on the allocation of limited resources. Health economic evaluation provides a suitable framework for ‘value for money’ assessments. Methods: In this article we explore the extent to which economic evaluations have been conducted alongside published health impact evaluations, and then assess their quality using criteria from an economic evaluation reference case developed for use in LMICs. Results: Among the 2,419 HIEs stored in the International Initiative for Impact Evaluation (3ie) database, and the 8,155 studies identified by the Ovid MEDLINE database search, only 70 studies included an economic evaluation. When measured against the quality assessment criteria, study quality varied greatly. Many studies did not fulfil the basic requirements for economic evaluation, such as stating the perspective of the budget holder, using generic health measures that can be compared across diseases, or suitably reflecting uncertainty. Conclusions: Greater effort should be directed towards bringing the fields of impact evaluation and economic evaluation together to better inform resource allocation decisions in global health.


Introduction
The demand for better evidence on "what works" in health policy in low- and middle-income countries (LMICs) has soared in recent years (Shi et al., 2015). This demand has been met, at least in part, by considerable growth in impact evaluations (Cameron et al., 2016; Gertler et al., 2016). These are studies that seek to causally attribute changes in an outcome to a given intervention, using an experimental or quasi-experimental design (3ie Impact Evaluation Glossary, 2012). The number of such studies has increased, and their quality has also improved overall as impact evaluations have adopted increasingly rigorous methodologies (McEwan, 2012). As a result, policy makers now have relevant, credible evidence on intervention effectiveness.
However, effectiveness evidence alone is not enough to determine whether to invest in a programme when resources are limited, which is almost always the case. Decision-makers also need to assess the costs and opportunity costs of different policy choices. There is widespread agreement that health economic evaluation methods can suitably inform resource allocation decisions. Yet there is evidence that these tools are under-utilised (Evans, 2016) and often imperfectly applied (McEwan, 2012) in the impact evaluation field. This contrasts with the increasingly routine use of economic evaluation, often with novel and sophisticated methods, to inform decisions on the adoption and reimbursement of drugs, devices and other health care interventions, for example in health technology assessment (HTA) (Drummond et al., 2015).
In this paper we assess the extent to which 'value for money' has been considered in published health-related impact evaluations in LMICs. To our knowledge, this is the first study to assess this aspect comprehensively and rigorously for health-related interventions. For studies that include some form of economic evaluation, we examine the quality of the economic assessment. We use quality criteria derived from the International Decision Support Initiative (iDSI) Reference Case for Economic Evaluation. The Reference Case provides a set of principles and flexible methodological specifications intended to support the planning, conduct and reporting of economic evaluations of health interventions in LMICs (iDSI, 2016; Wilkinson et al., 2016). On the basis of these quality assessments, we provide suggestions for future research and highlight how economic evaluation can be better conducted alongside impact evaluation.

Literature review
Data sources, search and screening criteria. We conducted a purposive literature review using two strategies to identify relevant published impact evaluation and accompanying economic evaluation studies.
Firstly, we used the repository of published impact evaluations from the International Initiative for Impact Evaluation (3ie). The 3ie database aims to hold all published impact evaluations of development interventions. 3ie first conducted a systematic literature search in 2014, covering more than 35 databases of published studies, including journal articles, book chapters, reports and working papers. The database was last fully updated in 2016.

3ie's inclusion criteria require that a study:
1. has been conducted in an LMIC (classified by World Bank criteria in the year of publication);
2. examines the effectiveness of a specific development intervention; and
3. uses an approved experimental design (e.g. randomised controlled trials (RCTs) or cluster RCTs) or quasi-experimental econometric strategy (e.g. differences-in-differences (DiD), propensity score matching (PSM), instrumental variables (IV), regression discontinuity design (RDD), or other methods including multivariate matching and regression approaches).

The database covers interventions in many sectors, including agriculture and rural development, economic policy, energy, and disaster management.
For health, we searched the "Health, Nutrition and Population" segment of the repository for studies published between 2000 and 2016. This segment includes RCTs that go beyond laboratory trials and examine interventions in real-world settings. It excludes trials that only address the biomedical efficacy of a drug or treatment. To identify studies that had conducted any economic evaluation, we searched for articles with the keywords: "economic evaluation" OR "cost-effectiveness" OR "cost-benefit" OR "cost-utility" OR "cost".
Secondly, we conducted a literature search that replicated the search strategy used for the 3ie repository and used the same terms to identify economic evaluations. The aim was to capture any accompanying economic evaluation studies that the 3ie search, which focused on identifying impact evaluations, might have missed. We searched the Ovid MEDLINE database, which has an implicit focus on the health field, and extended the search period to 2017 (see Extended data (Kreif, 2020) for full details). The inclusion and exclusion criteria are summarised in Table 1.

We only included studies that used health outcome measures. Health outcomes were interpreted broadly. They included health benefits such as height gained, weight loss and life years gained, and intermediate outcomes such as the accuracy of a malaria testing kit or patient attendance rates. This criterion led to the exclusion of, for instance, the study by Alatas et al. (2012). That study compared different mechanisms for targeting poor households for social welfare programs, using outcome measures such as the per capita consumption of beneficiaries but no health outcomes.
The resulting articles were screened by reviewing abstracts and full texts. Screening was conducted by one researcher (SK) for the 3ie search and was split between three researchers (NK, AM, JLK) for the Ovid MEDLINE search. Following the categories of economic evaluation outlined in Drummond et al. (2015), we retained only studies that conducted one of the following:
- cost-benefit analyses (CBAs);
- cost-effectiveness analyses (CEAs, including cost-utility analyses);
- cost-consequences analyses (CCAs); or
- cost-minimisation analyses (CMAs).

Definitions of CEAs and CBAs can be fluid. For the purposes of this review, we classified papers that used measures of health as their primary outcomes as CEAs. Papers that valued outcomes monetarily, usually based on some notion of the 'value' of different health effects, were classified as CBAs. CMAs may estimate outcomes in various ways, but their central findings and recommendations are based only on minimising costs.

We included any study in which incremental costs and effects were combined either in ratios or as net benefits, or were presented side by side in a cost-consequences style. We excluded studies that only presented immediate direct intervention costs (e.g. the unit price of a diagnostic test kit) without assessing full delivery costs (e.g. human resource costs) and/or future related costs (e.g. of treatment).
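Incremental costs and effects can be combined either as a ratio or as a net benefit. A minimal sketch illustrates both. It assumes a single intervention compared against one comparator, with a mean cost and mean effect per person in each arm. The `Arm` type, the numbers and the threshold value are hypothetical and are not taken from any reviewed study.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """Mean cost and mean health effect per person in one arm (hypothetical structure)."""
    cost: float    # e.g. US$ per person
    effect: float  # e.g. DALYs averted per person, or any health outcome measure


def icer(intervention: Arm, comparator: Arm) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    d_cost = intervention.cost - comparator.cost
    d_effect = intervention.effect - comparator.effect
    if d_effect == 0:
        raise ValueError("ratio is undefined when the incremental effect is zero")
    return d_cost / d_effect


def net_monetary_benefit(intervention: Arm, comparator: Arm, threshold: float) -> float:
    """Incremental effect valued at `threshold` per unit, minus incremental cost."""
    d_cost = intervention.cost - comparator.cost
    d_effect = intervention.effect - comparator.effect
    return threshold * d_effect - d_cost


# Hypothetical numbers, for illustration only.
status_quo = Arm(cost=10.0, effect=0.00)
programme = Arm(cost=40.0, effect=0.12)
print("ICER:", icer(programme, status_quo))
print("Net monetary benefit at 500 per unit:", net_monetary_benefit(programme, status_quo, 500.0))
```

When the incremental effect is positive, the net monetary benefit is positive exactly when the ratio is below the threshold, so both presentations lead to the same decision. The net-benefit form avoids the awkward interpretation of ratios whose numerator or denominator is negative.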

Quality assessment
We assessed the quality of the economic evaluation studies using criteria adapted from the 11 principles of the iDSI Reference Case (Wilkinson et al., 2016). We excluded one principle on 'evidence', which states that "an economic evaluation should consider all available evidence relevant to the decision problem". It was not deemed applicable to impact evaluations, which generally rely on a single study. Other iDSI principles were merged to create criteria that better fit impact evaluations. The final set of criteria, shown in Table 2, covers:
(1) the transparent statement of the decision problem and the inclusion of appropriate comparators;
(2) details of the analysis, including the measure of health outcomes, time horizon and discounting;
(3) the perspective of the study;
(4) costs;
(5) uncertainty;
(6) heterogeneity;
(7) constraints; and
(8) equity.
We recorded a range of descriptive information from each study: year and journal of publication, country location of the intervention, type of health intervention, impact evaluation method, the type of economic evaluation and the health outcomes measured. For each study, at least two reviewers conducted the assessment and any discrepancies were resolved through discussion with the wider team.

Literature search results
The 3ie impact evaluation repository contained 2,419 impact evaluations in the Health, Nutrition and Population sector published between 2000 and 2016. More than half employed an experimental design (n=1,313). The studies with a non-experimental design applied various econometric methods, including DiD (n=166), PSM (n=135), IV (n=71), RDD (n=20) and other approaches (n=715). Of the entire set of Health, Nutrition and Population studies, only 117 mentioned costs. After screening of abstracts and full texts, only 42 of these had conducted an economic evaluation.

In our Ovid MEDLINE search, 380 studies were identified. At the full-text review stage, studies were typically excluded because they did not qualify as impact evaluations. Either they did not estimate the effectiveness of the intervention (31 excluded studies), or they did not employ an experimental or quasi-experimental method (18 excluded studies). See Figure 1 for a flow chart of the search and screening process.

Table 2. Quality assessment criteria adapted from the iDSI Reference Case principles.

(1) Decision problem
Transparency. Principle: An economic evaluation should be communicated clearly and transparently to enable the decision-maker(s) to interpret the methods and results. Criterion: The decision problem should be fully and accurately described.
Comparators. Principle: The comparator(s) against which costs and effects are measured should accurately reflect the decision problem. Criterion: The intervention(s) currently offered to the population should be the base case comparator.

(2) Analysis
Measure of health outcome. Principle: The measure of health outcome should be appropriate to the decision problem, should capture positive and negative effects on length of life and quality of life, and should be generalizable across disease states. Criterion: Disability-Adjusted Life Years (DALYs) averted or other generic measures that capture length and quality of life (e.g. the QALY) should be used.
Time horizon. Principle: The time horizon used in an economic evaluation should be of sufficient length to capture all costs and effects relevant to the decision problem. Criterion: A lifetime time horizon should be used. Shorter time horizons can be used where it is shown that all costs and effects relevant to the decision problem have been captured.
Discounting. Principle: An appropriate discount rate should be used to discount costs and effects to present value. Criterion: An annual discount rate for costs and effects should be used. When the time horizon is greater than 30 years, the impact of lower discount rates should be explored.

(3) Perspective
Principle: Non-health effects and costs associated with the health interventions that do not accrue to the health budget should be identified where relevant to the decision problem. All costs and effects should be disaggregated, either by sector of the economy or by to whom they accrue.
Criteria:
1. The perspective of the study should be described.
2. The analysis should reflect direct health costs and health outcomes.
3. An additional analysis should adopt a disaggregated societal perspective, to include non-health effects and costs that fall outside the health budget.

(4) Costs
Principle: All differences between the intervention and the comparator in expected resource use and costs of delivery to the target population(s) should be incorporated into the evaluation.
Criteria:
1. Costs of all resources relevant to the decision problem should be considered.
2. Cost implications of a rollout of the program to the population should be considered.
3. Out-of-pocket payments should be considered.

(5) Uncertainty
Principle: The uncertainty associated with an economic evaluation should be appropriately characterised.
Criteria: The economic evaluation should explore uncertainty:
1. in the structure of the analysis in the economic evaluation; and
2. due to the source of parameters and/or the precision with which the parameters of the economic evaluation are estimated (e.g. one-way sensitivity analysis, probabilistic sensitivity analysis).

(6) Heterogeneity
Principle: The costs and effects of the intervention on subpopulations within the decision problem should be explored and the implications appropriately characterised.
Criterion: Heterogeneity in cost-effectiveness should be explored in population subgroups, where subgroup formation is justified by the evidence base regarding differences in relative costs and effects, and the influence on absolute effects.

(7) Constraints
Principle: The impact of implementing the intervention on the health budget and on other constraints should be identified clearly and separately.
Criterion: Budget impact analysis should be performed that provides an estimate of the implications of implementing the intervention on various budgets, or an empirical CEA threshold.

(8) Equity
Principle: An economic evaluation should explore the equity implications of implementing the intervention.
Criterion: Equity implications should be considered at all stages of the economic evaluation, including design, analysis and reporting.
The final core body of 70 studies (see Table 4a and Table 4b) was published in journals from a variety of fields, including medicine and public health, health policy and economics, and as 3ie impact evaluation reports.
Publications covered many geographical areas, with a majority focused on Sub-Saharan Africa and several on Central and South America and East Asia. The interventions ranged widely:
- health interventions, including disease prevention and treatment (e.g. intermittent preventive treatment [IPT] for malaria or a school-based HIV education programme);
- health services (e.g. health facility improvement);
- health promotion (e.g. commitment devices for smoking cessation); and
- non-health interventions with potential health-improving impacts (e.g. a national social fund for development).
The majority of studies (n=56) employed randomised designs. Of the non-experimental studies:
- four used PSM as the primary statistical approach;
- four used DiD designs;
- one performed IV estimation;
- one employed an interrupted time series design; and
- four used other controlled observational designs (e.g. covariate matching).

The economic evaluations were mostly CEAs (n=54), with a minority of CBAs (n=11), CMAs (n=3) and CCAs (n=2).
Results of the quality assessment
Table 3 summarises the results of the quality assessment. Here we discuss how the published studies fared against each principle and highlight some overarching patterns.
Decision problem. A transparent description of the decision problem requires three elements: setting out the need for a policy decision, accurately describing the comparators and target population, and stating the evaluation perspective. We found that 50 studies (71%) clearly described the decision problem. For example, Alfonso et al. (2015) framed their decision problem as a comparison of a voucher scheme combined with obstetrical quality improvements against the status quo.
Because the review was conditioned on studies first conducting an impact evaluation, all included studies considered a comparator. Most often (n=55), this comparator was 'do nothing' or the status quo. An example of a study that did not consider the status quo is Duflo et al. (2007). In the impact evaluation, the authors compared five different types of education interventions for HIV/AIDS prevention, including the current national programme (current practice). In the economic evaluation, however, they compared the incremental cost per teen pregnancy averted only among the alternative interventions, without including current practice as a comparator.
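The Duflo et al. (2007) example turns on which options enter the incremental comparison. A minimal sketch of a standard full incremental analysis shows how including or excluding current practice changes the results. The analysis orders options by cost, drops dominated and extendedly dominated options, and then computes incremental cost-effectiveness ratios between adjacent efficient options. The option names, costs and effects are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical (mean cost, mean effect) per option, e.g. effect = teen pregnancies averted.
options: Dict[str, Tuple[float, float]] = {
    "status_quo": (0.0, 0.0),
    "option_a": (100.0, 10.0),
    "option_b": (300.0, 15.0),
    "option_c": (250.0, 30.0),
}


def _icer(base: Tuple[float, float], alt: Tuple[float, float]) -> float:
    """Incremental cost per extra unit of effect of `alt` relative to `base`."""
    return (alt[0] - base[0]) / (alt[1] - base[1])


def incremental_analysis(opts: Dict[str, Tuple[float, float]]) -> List[Tuple[str, Optional[float]]]:
    """Efficient options ordered by cost, each with its ICER versus the next cheaper efficient option."""
    # Order by cost; for equal costs, put the more effective option first.
    ranked = sorted(opts.items(), key=lambda kv: (kv[1][0], -kv[1][1]))
    # Drop dominated options: at least as costly as a cheaper option but no more effective.
    frontier: List[Tuple[str, Tuple[float, float]]] = []
    for name, (cost, effect) in ranked:
        if frontier and effect <= frontier[-1][1][1]:
            continue
        frontier.append((name, (cost, effect)))
    # Drop extendedly dominated options: ICERs must rise along the frontier.
    i = 1
    while i < len(frontier) - 1:
        if _icer(frontier[i - 1][1], frontier[i][1]) > _icer(frontier[i][1], frontier[i + 1][1]):
            del frontier[i]
            i = max(i - 1, 1)  # re-check the previous triple after a removal
        else:
            i += 1
    result: List[Tuple[str, Optional[float]]] = [(frontier[0][0], None)]
    for prev, cur in zip(frontier, frontier[1:]):
        result.append((cur[0], _icer(prev[1], cur[1])))
    return result


print(incremental_analysis(options))  # all options, including current practice
print(incremental_analysis({k: v for k, v in options.items() if k != "status_quo"}))  # alternatives only
```

In this illustration, including current practice changes both which options are efficient and the size of the reported ratios. Option A drops out through extended dominance, and option C's incremental cost per unit of effect is computed against the status quo rather than against option A.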

Analysis
Measure of health outcome. The assessment criteria highlighted the importance of using generic health measures such as DALYs or QALYs. These measures incorporate both the morbidity and the mortality consequences of an intervention and facilitate cross-programme comparisons. However, only 20 studies used a generic health measure: 18 used DALYs and two used QALYs. The remaining studies used a wide range of outcome measures. These included narrower health indicators (e.g. life years saved, depression-free days, malaria cases detected) and measures of behavioural change (e.g. quitting smoking, number of new adopters attributed to a campaign).
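The appeal of a generic measure is that mortality and morbidity are expressed in one unit. A minimal sketch shows the basic DALY arithmetic: years of life lost (YLL) plus years lived with disability (YLD). DALYs averted are then the difference between the comparator and the intervention. The sketch assumes an undiscounted, incidence-based formulation without age weighting. The life expectancy, disability weight, episode duration and case counts are hypothetical, and applied studies differ in such conventions.

```python
def dalys(deaths: float, life_exp_at_death: float, cases: float,
          disability_weight: float, duration_years: float) -> float:
    """Undiscounted DALYs as years of life lost (YLL) plus years lived with disability (YLD)."""
    yll = deaths * life_exp_at_death                   # premature mortality
    yld = cases * disability_weight * duration_years   # time lived in a state of ill health
    return yll + yld


# Hypothetical outcomes in one cohort under the comparator and under the intervention.
dalys_comparator = dalys(deaths=10, life_exp_at_death=40.0, cases=1000,
                         disability_weight=0.2, duration_years=0.05)
dalys_intervention = dalys(deaths=6, life_exp_at_death=40.0, cases=700,
                           disability_weight=0.2, duration_years=0.05)
dalys_averted = dalys_comparator - dalys_intervention
print(f"DALYs averted: {dalys_averted:.1f}")
```

Because deaths and illness enter the same unit, a programme that mainly prevents deaths can be compared directly with one that mainly shortens illness. Narrower indicators such as malaria cases detected do not allow this comparison.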
The studies that conducted CBA measured health effects and then converted them to monetised benefits (e.g. wage benefit due to effective treatment, value of statistical life saved).

Time horizon.
The assessment criteria treated a lifetime horizon as the gold standard and required a clear justification for any shorter time horizon. The majority of studies (n=47) did not meet this requirement. They either used a shorter time horizon without justification or did not state the time horizon explicitly. Examples include studies that kept to the time horizon of an RCT and its measured intermediate outcomes, such as the accuracy of a diagnosis reported by Bualombai et al. (2003). An example of a study that considered a longer time horizon is the evaluation of a pre-school intervention by Behrman et al. (2004). There, the short-term impact of the intervention, a gain in height, was translated into wages gained later in life via a cost-benefit analysis.
Discounting. Fewer than half of the studies (n=29) applied a discount rate in their economic evaluation (or justified the use of a zero discount rate), and they typically applied the same rate to costs and effects. Many studies used discount rates from recommended guidelines for economic evaluation, usually 3% or 5%.
Perspective. Another 26 studies went further, however, and adopted a wider societal perspective. They either considered costs beyond the health budget or included some non-health benefits of the intervention (e.g. opportunity cost of waiting time, travel fees, wage loss, out-of-pocket payments and education benefits). These choices were rarely justified. CBAs that valued life years saved using the value of statistical life (VSL) or lifetime earnings also implicitly considered benefits beyond the health care sector. An example that incorporates some of these approaches is the study by Abou-Ali et al. (2010). It evaluated the Egyptian Social Fund for Development, a complex nationwide inter-sectoral policy initiative including interventions in education, health, sanitation and microcredit.
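Discounting to present value can be sketched in a few lines, together with the check the assessment criteria recommend for long horizons: exploring lower rates. The sketch assumes annual flows with the first year undiscounted (timing conventions vary between guidelines) and applies the same rate to costs and effects. It uses a hypothetical stream of one DALY averted per year over 40 years. The 3% and 5% rates echo the guideline values mentioned in the Discounting paragraph above; the 1% alternative is purely illustrative.

```python
from typing import Sequence


def present_value(flows: Sequence[float], rate: float) -> float:
    """Discount annual flows to present value; flows[t] falls in year t, with year 0 undiscounted."""
    return sum(x / (1.0 + rate) ** t for t, x in enumerate(flows))


# Hypothetical: one DALY averted per year over a 40-year horizon.
effects = [1.0] * 40
for rate in (0.05, 0.03, 0.01, 0.0):
    print(f"annual rate {rate:.0%}: {present_value(effects, rate):.2f} discounted DALYs averted")
```

Over a 40-year horizon, lowering the rate from 3% to 1% raises the present value of the effect stream considerably. This is why the criteria ask for lower rates to be explored when horizons exceed 30 years.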
Costs. The assessment criteria required that all resource implications relevant to the decision problem were counted and costed. They also required that the implications of a potential rollout or scale-up of a programme were considered. Overall, 49 studies were deemed to have incorporated all relevant costs. This judgement had to rely on what was reported, which was often difficult to interpret, so the figure is likely to be an overestimate.

A minority of studies (n=17) considered a potential scale-up of a programme. For example, Jan et al. (2010) assessed a microfinance intervention addressing partner violence in South Africa, based initially on the implementation costs of the programme and then on likely scale-up costs. The study showed that the per capita cost would fall through economies of scale. It concluded that the cost per DALY averted would be lower when the programme was scaled up beyond the pilot.
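The scale-up argument in Jan et al. (2010) rests on economies of scale. A stylised sketch shows why the average cost per DALY averted falls as coverage grows. It assumes a hypothetical cost function: a fixed component spread over all participants, plus a constant cost per participant. It also assumes a constant effect per participant. The cost function and all figures are assumptions rather than values from that study; in practice, effects and some 'fixed' costs may also change at scale.

```python
def cost_per_daly(n_participants: int, fixed_cost: float, variable_cost: float,
                  dalys_per_participant: float) -> float:
    """Average cost per DALY averted under a hypothetical fixed-plus-variable cost function."""
    total_cost = fixed_cost + variable_cost * n_participants   # fixed costs spread over everyone reached
    total_dalys = dalys_per_participant * n_participants        # assumes a constant effect per participant
    return total_cost / total_dalys


# Hypothetical figures (US$): the same programme at pilot scale and after scale-up.
pilot = cost_per_daly(1_000, fixed_cost=200_000.0, variable_cost=50.0, dalys_per_participant=0.05)
scaled = cost_per_daly(50_000, fixed_cost=200_000.0, variable_cost=50.0, dalys_per_participant=0.05)
print(f"pilot: {pilot:,.0f} per DALY averted; scaled up: {scaled:,.0f} per DALY averted")
```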

Heterogeneity.
Only six studies considered heterogeneity in the cost-effectiveness estimates. One of these, Subramanian et al. (2009), conducted a CEA of visual screening for oral cancer. It showed that cost-effectiveness was more favourable for high-risk individuals than for the overall population of interest. This was because health gains per case were higher, even though the cost per case was also higher. More typically, studies explored heterogeneity in the impact evaluation estimates, for example through subgroup analysis or by estimating other forms of treatment effect heterogeneity. However, they did not carry this through to the economic evaluation component. For example, Barham (2011) estimated heterogeneous impacts of the PROGRESA conditional cash transfer program on infant mortality by pre-intervention municipality characteristics, including access to piped water and the proportion of illiteracy. Yet the CBA used only the overall impact estimate.
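The Subramanian et al. (2009) result follows directly from the arithmetic of incremental cost-effectiveness ratios: a subgroup can have a more favourable ratio despite a higher cost per person. The sketch below computes subgroup-specific ratios and the pooled ratio implied by population shares. The subgroups, shares, costs and effects are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical subgroups: (population share, incremental cost per person, incremental effect per person).
subgroups: Dict[str, Tuple[float, float, float]] = {
    "high_risk": (0.2, 12.0, 0.010),
    "low_risk": (0.8, 8.0, 0.001),
}


def subgroup_icers(groups: Dict[str, Tuple[float, float, float]]) -> Dict[str, float]:
    """Incremental cost per unit of effect in each subgroup, plus the population-weighted pooled ratio."""
    ratios = {name: d_cost / d_effect for name, (_, d_cost, d_effect) in groups.items()}
    pooled_cost = sum(share * d_cost for share, d_cost, _ in groups.values())
    pooled_effect = sum(share * d_effect for share, _, d_effect in groups.values())
    ratios["pooled"] = pooled_cost / pooled_effect
    return ratios


for name, ratio in subgroup_icers(subgroups).items():
    print(f"{name}: {ratio:,.0f} per unit of effect")
```

A single pooled ratio can hide the fact that the programme is much better value in one subgroup than in another. That is exactly the information a decision maker considering targeted delivery would need.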
Uncertainty. The iDSI Reference Case recommends that economic evaluations systematically explore all sources of uncertainty, including choices in the structure of an analysis and the precision of estimated parameters. A relevant source of structural uncertainty in impact evaluations is the choice and specification of the econometric method used to adjust for confounding. Authors often carefully explored how these methodological choices affected their impact estimates, but this exploration was rarely (n=6) carried forward to the economic evaluation stage.

For example, Barham (2011) reported impact estimates from a wide range of model specifications, including different definitions of the intervention variable and different sets of control variables. The results were qualitatively similar across specifications, and the lowest of the estimates was used in the cost-benefit analysis. Abou-Ali et al. (2010) took a somewhat different approach. They used three statistical approaches to obtain impact estimates (regression, nearest neighbour matching and kernel matching) and reported separate cost-benefit estimates based on each method. However, the economic evaluation used only those estimates that were statistically significant.

This latter choice omits the uncertainty attributable to the precision with which the impact parameter is estimated. We found this pattern in many of the included studies: fewer than half (n=30) took any kind of parameter uncertainty into account, and only a few reported a probabilistic sensitivity analysis. Characterising uncertainty due to assumptions about parameter values was more common. For example, Alfonso et al. (2015) and Michaels-Igbokwe et al. (2016) presented tornado diagrams showing how sensitive the cost-effectiveness estimate was to varying assumptions on relevant parameters.
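Carrying the precision of an impact estimate through to the economic evaluation need not be onerous. A minimal probabilistic sensitivity analysis sketch draws repeatedly from the distribution of the effect estimate. It assumes the estimate is approximately normal with its reported standard error and, for simplicity, that the incremental cost is known. For each draw it computes the incremental net monetary benefit (the effect valued at a threshold, minus the incremental cost). It then reports the mean net benefit and the share of draws in which it is positive. All inputs, including the threshold, are hypothetical; a fuller analysis would also sample costs and other parameters from suitable distributions.

```python
import random
import statistics
from typing import Tuple


def psa(effect_mean: float, effect_se: float, incremental_cost: float, threshold: float,
        n_draws: int = 10_000, seed: int = 1) -> Tuple[float, float]:
    """Monte Carlo draws of the effect estimate; returns (mean net monetary benefit, share of draws > 0)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    nmb = []
    for _ in range(n_draws):
        d_effect = rng.gauss(effect_mean, effect_se)          # sample the impact estimate
        nmb.append(threshold * d_effect - incremental_cost)   # value it and subtract the extra cost
    prob_positive = sum(1 for x in nmb if x > 0) / n_draws
    return statistics.mean(nmb), prob_positive


# Hypothetical inputs: effect 0.12 (SE 0.08) DALYs averted per person, extra cost 30, threshold 500.
mean_nmb, prob_ce = psa(effect_mean=0.12, effect_se=0.08, incremental_cost=30.0, threshold=500.0)
print(f"mean net monetary benefit {mean_nmb:.1f}; probability cost-effective {prob_ce:.2f}")
```

In this illustration, the effect estimate would not be statistically significant at the 5% level (z = 1.5). Yet under this normal approximation, roughly three-quarters of the draws favour the programme at the chosen threshold. Discarding non-significant estimates throws this information away.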

Constraints.
Our criteria related to constraints focused primarily on budget impact due to its importance for health financing, although this is not the only constraint that could be considered.
Only three studies performed an explicit budget impact analysis. Among them, Barasa et al. (2012) evaluated a hospital improvement intervention in Kenya. They estimated the budget required to scale up the intervention and compared it with the country's annual health budget. Several other papers attempted to indicate budget impact by reporting the cost per administrative unit; for example, Simwaka et al. (2009) reported the cost per student in a school-based malaria programme.
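A simple budget impact calculation of the kind reported by Barasa et al. (2012) can be sketched as projected net programme spending along a scale-up path, expressed as a share of an annual health budget. The target population, coverage path, unit cost, per-person cost offset and budget below are hypothetical. A real analysis would draw these from local data and would typically report several years and scenarios.

```python
from typing import List, Sequence, Tuple


def budget_impact(target_pop: float, coverage_by_year: Sequence[float], unit_cost: float,
                  offset_per_person: float, annual_budget: float) -> List[Tuple[int, float, float]]:
    """Return (year, net programme cost, share of annual health budget) for each year of scale-up."""
    rows = []
    for year, coverage in enumerate(coverage_by_year, start=1):
        covered = target_pop * coverage                        # people reached this year
        net_cost = covered * (unit_cost - offset_per_person)   # delivery cost minus averted costs
        rows.append((year, net_cost, net_cost / annual_budget))
    return rows


# Hypothetical inputs: 2 million eligible people, three-year scale-up, US$ figures.
rows = budget_impact(target_pop=2_000_000, coverage_by_year=[0.1, 0.3, 0.6],
                     unit_cost=4.0, offset_per_person=1.0, annual_budget=500_000_000.0)
for year, cost, share in rows:
    print(f"year {year}: net cost {cost:,.0f} ({share:.2%} of the annual health budget)")
```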
Opportunity costs that fall on health budgets are sometimes reflected in economic analysis through the cost-effectiveness threshold (Drummond et al., 2015). However, an explicit cost-effectiveness threshold, such as the one used in the UK, is rarely used in LMICs, due in part to the lack of empirical evidence on opportunity costs in these settings. Several of the studies that estimated cost-effectiveness in terms of DALYs compared the resulting incremental cost-effectiveness ratios (ICERs) with thresholds derived from a country's GDP per capita. This followed a traditional, and by now largely disowned, recommendation by the World Health Organization (Bertram et al., 2016). Such thresholds do not reflect, even in part, an appropriate measure of opportunity cost.

Equity.
Several impact evaluation studies implicitly touched upon the equity principle. Some evaluated a programme specifically designed for a deprived population (e.g. Behrman et al., 2004), and others conducted subgroup analysis by level of deprivation (e.g. Barham, 2011). However, only one study formally incorporated equity in the impact evaluation. Abou-Ali et al. (2010) examined the distribution of resources of the Egyptian Social Fund by sector using a Lorenz curve analysis. They found that funds for education and wastewater programmes were allocated disproportionately in favour of the relatively wealthy income group. In contrast, other programmes (e.g. potable water, health and microcredit) allocated relatively more funds to the poor income group. This analysis did not, however, extend to assessing distributional impacts in relation to cost-effectiveness.
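The distributional analysis attributed to Abou-Ali et al. (2010) can be sketched in three steps. First, order income groups from poorest to richest. Second, accumulate their shares of population and of programme funds. Third, summarise the resulting curve with a concentration index equal to one minus twice the area under the curve, computed here with the trapezoidal rule. Strictly, a curve ranked by income rather than by funds received is a concentration curve. The group shares and allocations are hypothetical, and this convention is one common choice, not necessarily the method used in that study.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def concentration_curve(pop_shares: Sequence[float], fund_shares: Sequence[float]) -> List[Point]:
    """Cumulative (population share, funds share) points starting at (0, 0).

    Groups must be ordered from poorest to richest.
    """
    points = [(0.0, 0.0)]
    cum_pop = cum_funds = 0.0
    for p, f in zip(pop_shares, fund_shares):
        cum_pop += p
        cum_funds += f
        points.append((cum_pop, cum_funds))
    return points


def concentration_index(points: Sequence[Point]) -> float:
    """One minus twice the area under the curve (trapezoidal rule); > 0 pro-rich, < 0 pro-poor."""
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return 1 - 2 * area


# Hypothetical allocations across income quintiles (poorest first); each sums to 1.
pop_shares = [0.2] * 5
allocations = {
    "pro_rich_programme": [0.10, 0.15, 0.20, 0.25, 0.30],
    "equal_programme": [0.20] * 5,
    "pro_poor_programme": [0.30, 0.25, 0.20, 0.15, 0.10],
}
for name, shares in allocations.items():
    ci = concentration_index(concentration_curve(pop_shares, shares))
    print(f"{name}: concentration index {ci:+.2f}")
```

Positive values indicate funds concentrated among richer groups and negative values indicate funds concentrated among poorer groups. Pairing such an index with programme-level cost-effectiveness estimates would be one way to link distributional and efficiency findings.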

Summary
The main finding of our study is that too few published impact evaluations include a full economic evaluation, and those that do have economic evaluations of variable quality. Of the 2,419 published impact evaluations in the "Health, Nutrition and Population" category of the 3ie database, only a small fraction (n=42, 2%) had attempted an economic evaluation. The complementary Ovid MEDLINE search identified a further 28 studies with both an impact evaluation and an economic evaluation component, giving a total of 70 studies. When assessed against a set of criteria derived from the iDSI Reference Case, the quality of the economic evaluations varied greatly.

Explanation of our findings
This literature review identified two distinct types of impact evaluation with economic evaluation components. They differ substantially in design, which may explain some of our findings.

The first type comprises studies typically published in economics journals, most of which were identified through the 3ie database. In these studies the economic evaluation is a relatively small part of the overall work. They typically used non-experimental designs (e.g. Barham, 2011; Behrman et al., 2004; Cohen & Dupas, 2010; Giné et al., 2010; Miguel & Kremer, 2004; Nizalova & Vyshnya). The impact evaluation was generally conducted using highly sophisticated econometric methods, addressing the heterogeneity of programme impacts and the sensitivity of the results to the choice of econometric specification. However, the same level of sophistication was not normally applied to the accompanying economic evaluation. It often relied on a single point estimate and did not systematically consider uncertainty or the impact of different methodological choices on the estimates.

Analyses of costs in these studies were typically 'back-of-the-envelope' and insufficient to give decision makers a reliable picture of programme cost-effectiveness. The effects of different assumptions about the input parameters on the resulting cost-effectiveness estimates were likewise not assessed. The lack of detailed cost information and long-term outcomes may in part explain these choices. Even with such constraints, though, it should be possible to give decision makers a fuller picture of the expected cost-effectiveness of such programmes.
A helpful example is the study by Cohen & Dupas (2010), published in the Quarterly Journal of Economics, which evaluated different levels of cost-sharing for distributing insecticide-treated bed nets for malaria prevention. The authors presented their results in a cross-tabulation in which their key assumptions were varied. Often, these studies defined their research question as testing an economic or behavioural hypothesis (e.g. How do people respond to incentives? Does early education work?) rather than as a decision problem for resource allocation. Hence, they often fell short on the criteria requiring a definition of the decision problem and of the comparators.
The second type of study can best be described as "within-trial" economic evaluation (Sculpher et al., 2006). In these studies, the impact evaluation and the economic evaluation were typically conducted as part of an RCT, often with detailed information on costs. These studies were often precise in defining the decision problem, the comparators and even the perspective of the study. However, they fell short on other important assessment criteria. Notably, they rarely looked beyond the trial:
- short time horizons were used;
- the outcomes of interest were typically intermediate outcomes measured in the trial rather than generic health measures (e.g. DALYs); and
- sensitivity analysis was rarely comprehensive.

Future research
One objective of this work was to generate a discussion of methodological gaps that could guide future research on how to better combine economic evaluation and impact evaluation. The need for, and value of, bringing the two fields together has also been noted elsewhere:
- McEwan (2012) outlines the main requirements for CEA and CBA in the health and education sectors;
- Dhaliwal et al. (2013) propose a framework for synthesising published impact estimates to compare the cost-effectiveness of a range of interventions in the education sector; and
- Evans & Popova (2014) assess how uncertainty can be reflected and suggest the use of probabilistic sensitivity analysis in impact evaluation.

In the UK, researchers have also noted the emphasis on effectiveness studies in evaluating pay-for-performance schemes and have called for accompanying assessments of cost-effectiveness (Meacock et al., 2014; Meacock, 2019).
There have been methodological advances in both impact evaluation and economic evaluation that could lead to cross-fertilisation and merging of approaches. An important concern is quantifying generalisability (or 'external validity'), that is, the extent to which analyses undertaken in one setting can inform assessments in another (Dhaliwal et al., 2013; Drummond et al., 2005; McEwan, 2012; Sculpher et al., 2004; Vivalt, 2016).
Our review has focused on single studies for which causal estimates have been obtained. However, combining impact and economic evaluations to meet the needs of policy-making is likely to involve fuller use of decision modelling (Briggs et al., 2006). Decision modelling can enable:
- the synthesis of multiple forms of evidence, including effect estimates from multiple studies (Welton et al., 2012);
- the extrapolation of treatment effects and costs beyond a study's follow-up period, and from intermediate effects to final outcomes; and
- the exploration of the consequences of heterogeneity and uncertainty, and of changes in key parameters such as prices or effect sizes.
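A minimal three-state Markov cohort model (well, sick, dead) illustrates what extrapolating beyond a trial's follow-up involves. The sketch runs the model over a 20-year horizon. It assumes, hypothetically, that an impact evaluation showed the programme halves the annual probability of falling sick. All states, transition probabilities, costs, health weights and the programme cost are invented for illustration. Discounting and half-cycle correction are omitted for brevity.

```python
from typing import Tuple


def run_cohort(p_sick: float, years: int = 20, p_recover: float = 0.3,
               p_die_well: float = 0.01, p_die_sick: float = 0.1,
               cost_sick: float = 100.0, weight_sick: float = 0.6) -> Tuple[float, float]:
    """Hypothetical Markov cohort model with states well, sick and dead (annual cycles, undiscounted).

    Returns total treatment cost and total health-adjusted life years per person over `years` cycles.
    """
    well, sick, dead = 1.0, 0.0, 0.0  # the whole cohort starts in the well state
    total_cost = total_health = 0.0
    for _ in range(years):
        # Accrue costs and health for this cycle's state occupancy.
        total_cost += sick * cost_sick
        total_health += well * 1.0 + sick * weight_sick
        # Move the cohort between states using annual transition probabilities.
        well, sick, dead = (
            well * (1 - p_sick - p_die_well) + sick * p_recover,
            well * p_sick + sick * (1 - p_recover - p_die_sick),
            dead + well * p_die_well + sick * p_die_sick,
        )
    return total_cost, total_health


# Hypothetical: annual probability of falling sick 0.10 without and 0.05 with the programme.
treat_cost_base, health_base = run_cohort(p_sick=0.10)
treat_cost_prog, health_prog = run_cohort(p_sick=0.05)
programme_cost = 50.0  # hypothetical up-front programme cost per person
d_cost = treat_cost_prog + programme_cost - treat_cost_base
d_health = health_prog - health_base
print(f"incremental cost {d_cost:.1f}; incremental health-adjusted life years {d_health:.2f}")
```

The resulting incremental cost and incremental health gain are the inputs to an incremental cost-effectiveness ratio or net benefit calculation. With other, equally hypothetical inputs, averted treatment costs could offset some, all or none of the programme cost.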
Decision-analytic modelling is now a much more common vehicle for economic evaluation than single-study cost-effectiveness analysis, but it is still rarely used as an extension to impact evaluation studies. One reason could be that the interventions assessed in impact evaluations often have the characteristics of what have been described as "complex" interventions, consisting of one or multiple activities that produce multiple outcomes (Masset et al., 2018). Moreover, when these interventions are evaluated at the level of jurisdictions, their effects may be dynamic and involve externalities. Developing methods to address these issues, including how to model them at the level of whole systems, is likely to be a research priority in the coming years. Future research could also better reflect the distributional effects of policies on different socioeconomic groups (Cookson et al., 2017; Welch et al., 2017) and inform cost-effectiveness assessments where costs and benefits fall across multiple sectors (Claxton et al., 2010; Remme et al., 2017).
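One way distributional concerns can enter a cost-effectiveness comparison, drawn from the distributional cost-effectiveness analysis literature, is to summarise the distribution of health across socioeconomic groups with an Atkinson-type "equally distributed equivalent" (EDE) level of health. The EDE equals mean health when health is equally distributed and falls below it as inequality grows, at a rate set by an inequality-aversion parameter ε. The sketch below compares two hypothetical policies with the same mean health gain but different targeting; the helper name `atkinson_ede`, the group health levels, population shares and ε values are illustrative assumptions only.

```python
import math


def atkinson_ede(health, shares, epsilon):
    """Atkinson 'equally distributed equivalent' (EDE) level of health (hypothetical helper).

    health:  average health of each group (e.g. quality-adjusted life expectancy), all > 0
    shares:  population share of each group, summing to 1
    epsilon: inequality aversion; 0 means indifference to inequality (EDE = mean)
    """
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    if epsilon == 1.0:
        # Limiting case: population-weighted geometric mean
        return math.exp(sum(s * math.log(h) for h, s in zip(health, shares)))
    power = 1.0 - epsilon
    return sum(s * h**power for h, s in zip(health, shares)) ** (1.0 / power)


# Hypothetical baseline health by socioeconomic quintile (poorest first), equal shares
baseline = [60.0, 63.0, 66.0, 69.0, 72.0]
shares = [0.2] * 5
# Two hypothetical policies with the same mean gain (0.2 years) but different targeting
policy_a = [baseline[0] + 1.0] + baseline[1:]    # gain goes to the poorest quintile
policy_b = baseline[:-1] + [baseline[-1] + 1.0]  # gain goes to the richest quintile

for eps in (0.0, 1.0, 5.0):
    gain_a = atkinson_ede(policy_a, shares, eps) - atkinson_ede(baseline, shares, eps)
    gain_b = atkinson_ede(policy_b, shares, eps) - atkinson_ede(baseline, shares, eps)
    print(f"epsilon={eps}: EDE gain A={gain_a:.3f}, B={gain_b:.3f}")
```

With ε = 0 the two policies are valued identically, as in a conventional cost-effectiveness analysis; any positive ε favours the policy that reaches the poorest group.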

Limitations
This review has been a first step in examining more closely the linkage (or lack thereof) between the literatures of impact evaluation and economic evaluation. It has two main limitations. First, because the literature search targeted single studies, it excluded by design studies that synthesised evidence from many sources (e.g. meta-analyses, systematic reviews). Second, while the inclusion criteria of the 3ie database and of our Ovid MEDLINE search provided some assurance of the quality of the econometric methodology applied, an in-depth assessment of the econometric methods used was beyond the scope of this study.

Conclusion
The fields of impact evaluation and economic evaluation have largely developed separately, each to a high level of methodological sophistication. More effort should now be directed towards bringing the two fields together, with a view to better informing resource allocation decisions in global health. Research funders, as well as national and international policy institutions, can play an important role in supporting new methodological research to this end.

Data availability
Underlying data
All data underlying the results are available as part of the article and no additional source data are required.

Extended data
York Research Database: Data appendices to the paper "From impact evaluation to decision-analysis: assessing the extent and quality of evidence on 'value for money' in health impact evaluations in low-and middle-income countries". https://doi.org/10.15124/cd49d13f-0553-46e7-ab13-c726fc5c97e5 (Kreif, 2020).

Datadepo.zip contains the following extended data:
• 3ie_QA.csv (quality assessment of articles identified in 3ie)
• Ovid_allscreen.csv (all articles identified in the initial screen of Ovid)

Open Peer Review
Could studies that are included in other segments of the database evaluate interventions which do not primarily affect health but have caused health-related externalities?
Second, the study excluded "trials which only address the biomedical efficacy of a drug or treatment." It is later stated, "Health outcomes were interpreted broadly, including health benefits such as height gained, weight loss, …." A definition and/or example of biomedical efficacy would be informative. It is not immediately clear why some clinical outcomes (e.g. weight loss) are evaluated but others are omitted when both may be intermediaries of disease-related outcomes and health-related quality of life.

Results:
The results section is well-written and contains useful, relevant information. It would provide useful clarification to state that Duflo et al. used average cost-effectiveness ratios in their study of HIV/AIDS prevention.
too cumbersome, to also have the number of studies by type of intervention, measure of health used and type of data used. I believe the authors could also draw important implications for further research from this, perhaps emphasising (though I may have the wrong expectation here) the lack of direct health measures and the underutilisation of routinely collected administrative data. Some of the recommendations could then point to the need to improve routine data to facilitate these evaluations.
Does "medical sector perspective" refer to the health care sector perspective more broadly?

7. It is unclear whether the authors argue for the need for more studies evaluating value for money, or for all studies evaluating health impact to also include an economic evaluation.

8. It is stated only later in the paper that one objective was to generate a discussion of the methodological gaps that could guide future research on how to better combine economic evaluations and impact evaluations. It would be worth highlighting this at the beginning.

9. There is a mention of the need for increased generalisability across settings. It is not clear what "settings" means here: whether across countries, care settings or perhaps types of intervention. I also wonder whether some of the suggestions made point to understanding generalisability through understanding the mechanisms of impact.

10. A final point concerns methodological improvements in further research. Is there a need for validation of health measures in specific settings?