Comparing the cost-per-QALYs gained and cost-per-DALYs averted literatures

Background: We examined the similarities and differences between studies that use two common cost-effectiveness analysis (CEA) metrics: cost per quality-adjusted life year (QALY) gained and cost per disability-adjusted life year (DALY) averted. Methods: We used the Tufts Medical Center CEA Registry, which contains English-language cost-per-QALY gained studies, and the Global Health Cost-Effectiveness Analysis (GHCEA) Registry, which contains cost-per-DALY averted studies. We examined study characteristics, including intervention type, sponsor, country, and primary disease, and also compared the number of published CEAs to disease burden for major diseases and conditions across geographic regions. Results: We identified 6,438 cost-per-QALY and 543 cost-per-DALY studies published through 2016 and observed rapid growth in both literatures. Cost-per-QALY studies most often examined pharmaceuticals and interventions in high-income countries. Cost-per-DALY studies predominantly focused on infectious disease interventions and interventions in low and lower-middle income countries. We found that while diseases imposing a larger burden tend to receive more attention in the cost-effectiveness analysis literature, the number of publications for some diseases and conditions deviates from this pattern, suggesting "under-studied" conditions (e.g., neonatal disorders) and "over-studied" conditions (e.g., HIV and TB). Conclusions: The CEA literature has grown rapidly, with applications to diverse interventions and diseases. The publication of fewer studies than expected for some diseases, given the burden they impose, suggests funding opportunities for future cost-effectiveness research.


Introduction
Researchers conducting cost-effectiveness analyses (CEAs) commonly use quality-adjusted life years (QALYs) or disability-adjusted life years (DALYs) as health outcome measures to account for both longevity and quality of life (or life with disability) 1 . These broadly applicable metrics facilitate the comparison of interventions across conditions and diseases.
Analysts have used these measures in different contexts and settings [2][3][4][5][6] . CEAs using the cost-per-QALY metric, which first appeared in the late 1970s, have typically focused on interventions in higher-income settings 7,8 . In the 1990s, the World Bank and the World Health Organization (WHO) developed the DALY to quantify disease burden, reflecting both years of life lost (YLL) and years lived with disability (YLD) 9,10 . CEAs using DALYs have tended to focus on lower- and middle-income countries 11 . QALYs and DALYs, which both quantify health-related quality of life by assigning a value ranging from zero to one to each year of life, have somewhat different methodological underpinnings 12 . QALY preference weights range from 0 (corresponding to "dead") to 1 (corresponding to a hypothetical state of "perfect health") and reflect a set of health state "attributes," "dimensions," or "domains" (e.g., discomfort, mobility, depression) associated with an individual's health condition. DALY weights have a similar intuitive interpretation, although for DALYs, 1 corresponds to "dead" and 0 corresponds to "perfect health." For DALYs, moreover, each weight corresponds not to a set of health state attributes but to a specific disease 13 . DALY values have in the past depended on the age of the affected populations. "Age-weighting" reflected the idea that an additional life year accrued during childhood or old age has less value than a year accrued during young and middle adulthood, when productivity contributions to societal well-being are typically greatest 14,15 . Because the unequal treatment of different age groups raised substantial ethical concerns, however, the most recent DALY calculation methods omit age-weighting 16 .
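To make the opposite orientation of the two weighting scales concrete, the sketch below computes QALYs gained and DALYs averted for one hypothetical patient. Every number in it (survival times, the 0.7 utility weight, the 0.4 disability weight, the 20-year reference life expectancy) is an illustrative assumption, not a value from either registry. The sketch omits discounting and age-weighting, and because utility weights and disability weights are derived differently, they are chosen independently here rather than as complements of each other.

```python
"""Illustrative QALY vs. DALY arithmetic for one hypothetical patient.

All inputs are made-up assumptions; no discounting or age-weighting is applied.
"""


def qalys(years_lived: float, utility_weight: float) -> float:
    """QALYs accrued: years lived times a preference weight (0 = dead, 1 = perfect health)."""
    return years_lived * utility_weight


def dalys(years_lived: float, reference_life_expectancy: float,
          disability_weight: float) -> float:
    """DALYs incurred: years of life lost (YLL) plus years lived with disability (YLD).

    The disability weight runs the other way: 0 = perfect health, 1 = dead.
    """
    yll = max(reference_life_expectancy - years_lived, 0.0)
    yld = years_lived * disability_weight
    return yll + yld


# Hypothetical scenario: the patient lives with the condition for the rest of life.
UTILITY = 0.7          # assumed QALY preference weight for the health state
DISABILITY = 0.4       # assumed DALY disability weight for the disease
REFERENCE_LE = 20.0    # assumed remaining reference life expectancy (years)
YEARS_WITHOUT = 10.0   # assumed survival without the intervention
YEARS_WITH = 15.0      # assumed survival with the intervention

qalys_gained = qalys(YEARS_WITH, UTILITY) - qalys(YEARS_WITHOUT, UTILITY)
dalys_averted = (dalys(YEARS_WITHOUT, REFERENCE_LE, DISABILITY)
                 - dalys(YEARS_WITH, REFERENCE_LE, DISABILITY))

print(f"QALYs gained:  {qalys_gained:.2f}")
print(f"DALYs averted: {dalys_averted:.2f}")
```

Under these assumptions the same intervention yields 3.5 QALYs gained but 3.0 DALYs averted, a small illustration of why the choice of metric can matter even when both scales run from zero to one.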
We analyzed the cost-per-QALY gained and cost-per-DALY averted literatures to examine their growth and regional variation, and to investigate the extent to which the focus of each literature corresponds to those diseases and conditions imposing the largest burden on the population.

Data
The cost-effectiveness analysis literature. We analyzed two databases maintained by the Center for the Evaluation of Value and Risk in Health at Tufts Medical Center in Boston, Massachusetts: the Cost-Effectiveness Analysis (CEA) Registry (www.cearegistry.org), which contains information on cost-per-QALY studies, and the Global Health CEA Registry (www.ghcearegistry.org), which houses information on cost-per-DALY studies. Both registries contain information on PubMed-indexed, English-language CEAs published through 2016. Previous publications further detail the search strategies, data collection processes, and review methods, which are similar for the two registries 5,6 . We received an ethics exemption for this study because it did not involve human subjects. Data from these registries used in this analysis appear in Dataset 1 and Dataset 2; Supplementary File 1 and Supplementary File 2 document the variables in these datasets.

Disease burden.
Dataset 3 contains population disease burden estimates (total DALYs incurred), as reported by the Institute for Health Metrics and Evaluation (IHME), and stratified by Global Burden of Disease (GBD) Super Region 17 . Within each Super Region, we sub-stratified population burden by GBD level two disease category. Dataset 3 also lists the number of articles from the cost-per-QALY literature and from the cost-per-DALY literature for each of these strata and substrata. We counted articles in more than one of the Table 2 strata if, for example, they focused on two countries belonging to two distinct GBD Super Regions.
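A minimal sketch of this counting rule, assuming each article is represented as a list of study countries and using a hypothetical, deliberately incomplete country-to-Super-Region lookup (the real assignment follows the GBD classification). Reading the rule as "once per distinct Super Region", an article on two countries in the same Super Region adds one to that region, while an article spanning two Super Regions adds one to each.

```python
from collections import Counter

# Hypothetical country -> GBD Super Region lookup covering only the example countries.
SUPER_REGION = {
    "Canada": "High-income",
    "United States": "High-income",
    "Kenya": "Sub-Saharan Africa",
    "Thailand": "Southeast Asia, East Asia, and Oceania",
}


def count_articles_by_super_region(study_countries: list[list[str]]) -> Counter:
    """Count each article once per distinct Super Region it covers.

    Countries missing from the lookup are skipped.
    """
    counts: Counter = Counter()
    for countries in study_countries:
        # A set collapses several countries from the same region into one entry.
        regions = {SUPER_REGION[c] for c in countries if c in SUPER_REGION}
        counts.update(regions)
    return counts


articles = [
    ["Canada", "United States"],  # both High-income: counted once
    ["Kenya", "Canada"],          # two Super Regions: counted in each
    ["Thailand"],
]
print(count_articles_by_super_region(articles))
```

Because of the double counting, the regional totals can exceed the number of distinct articles (four counts from three articles here).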

Analysis
Study characteristics. Using data from Dataset 1 and Dataset 2, and definitions from the World Bank and the GBD initiative, we stratified studies by GBD Super Region, World Bank income level, intervention type, study funding source category, prevention stage, and GBD category. As detailed in Table 1, some of these categories are mutually exclusive, while others are not. We computed the proportion of studies in each stratum using the total article counts for the cost-per-QALY and cost-per-DALY literatures from Dataset 1 and Dataset 2, respectively.
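Because the denominator is the total article count, proportions for categories that are not mutually exclusive can sum to more than 100%. The sketch below illustrates this, assuming each study is stored with a set of labels for one characteristic; the funding-source labels and the four studies are hypothetical.

```python
from collections import Counter


def stratum_proportions(study_labels: list[set[str]]) -> dict[str, float]:
    """Share of all studies that fall in each stratum.

    A study with several labels (e.g., two funding sources) counts once in
    every stratum it belongs to, while the denominator stays the total number
    of studies, so the shares can sum to more than 1.
    """
    total = len(study_labels)
    counts = Counter(label for labels in study_labels for label in labels)
    return {label: n / total for label, n in counts.items()}


# Hypothetical funding-source labels for four studies.
funding = [
    {"Government"},
    {"Government", "Foundation"},
    {"Pharmaceutical/device company"},
    {"Foundation"},
]
shares = stratum_proportions(funding)
for label, share in sorted(shares.items()):
    print(f"{label}: {share:.0%}")
print(f"Sum of shares: {sum(shares.values()):.0%}")
```

For a mutually exclusive characteristic (one label per study), the same function returns shares that sum to exactly 100%.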

Amendments from Version 1
Our revised version 2 addresses a number of important comments raised by reviewers.
First, we provide a more complete set of potential explanations for why QALY-based CEAs are more prevalent in high income countries than in low income and lower-middle income countries.
Second, we have revised the figures characterizing the relationship between disease burden and number of CEAs, putting number of studies on the vertical axis. This switch makes the figures easier to understand because "number of studies" (vertical axis) is best understood as a "response" to disease burden (horizontal axis). This rearrangement makes it clear that data points above the regression line represent diseases and conditions that are relatively over-studied.
We have also added a Table 2, which compares actual studies conducted to predicted number of studies for each of the seven GBD regions.
Finally, we have made generation of the data used in this paper more easily reproducible by eliminating all manual steps and replacing those steps with computer code that we are making publicly available. Dataset 1 has been replaced with the following file: Dataset 1. Cleaned QALY Database. Dataset 2 has been replaced with the following file: Dataset 2. Cleaned DALY Database. Dataset 3 has been replaced with the following file: Dataset 3. Regional and Disease Level Stratified Dataset. These datasets are available in our "Version 2" folder at OSF.
We likewise report all code used for analysis.
Tertiary prevention (treatment) dominates the cost-per-QALY registry (62%), whereas the cost-per-DALY registry focuses far more on primary prevention (59%). Conditions most frequently addressed by studies in the cost-per-QALY literature include noncommunicable diseases, such as cancer (18%) and cardiovascular diseases (17%), whereas most cost-per-DALY registry studies target infectious diseases.
Foundations are the single largest source of non-governmental support for cost-per-DALY studies (27%), while pharmaceutical and device companies are the single largest source of non-governmental support for cost-per-QALY studies (28%).
In Figure 3A, we excluded one study classified as "international." We excluded 145 studies because the country of study was unclear.
In Figure 3B, we excluded 13 studies classified as "international." We excluded 17 studies because the country of study was unclear.
Literature coverage vs. disease burden. Neoplasms were the most studied diseases in Southeast Asia, East Asia, and Oceania (Figure 4A), while mental and behavioral disorders were less studied relative to their burden. High-income countries had relatively few studies addressing mental and behavioral disorders and injuries (Figure 4B). Relative to burden, HIV/AIDS and tuberculosis were the most studied diseases in Sub-Saharan Africa, while this region had fewer studies on nutritional deficiencies (Figure 4C). Table 2 reports studentized residuals from the ordinary least squares regression for each region, along with the mean and median of these residuals for each disease across all seven GBD regions.
These results suggest that a number of conditions are uniformly "under-studied," because their residuals are negative in all seven regions (e.g., unintentional injuries, transport injuries, liver cirrhosis). Positive residuals across most regions indicate that other conditions (e.g., HIV and TB, neoplasms) generally receive more attention than their burden appears to warrant.
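The residual-based classification can be sketched as follows: within each region, regress the number of studies on disease burden by ordinary least squares, compute a studentized residual for each disease, and then summarize each disease's residuals across regions. This pure-Python sketch rests on stated assumptions: the two regions, five diseases, and all values are hypothetical; the regression is assumed to use untransformed counts and burden; and it computes internally studentized (standardized) residuals, whereas statistical packages also report an externally studentized variant that can differ slightly.

```python
import math
import statistics


def studentized_residuals(burden: list[float], studies: list[float]) -> list[float]:
    """Internally studentized OLS residuals for the model studies = a + b * burden.

    Each raw residual e_i is divided by s * sqrt(1 - h_ii), where
    s^2 = SSE / (n - 2) and h_ii = 1/n + (x_i - x_bar)^2 / Sxx is the
    leverage of point i in a simple regression with an intercept.
    """
    n = len(burden)
    x_bar = statistics.fmean(burden)
    y_bar = statistics.fmean(studies)
    sxx = sum((x - x_bar) ** 2 for x in burden)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(burden, studies)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(burden, studies)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in burden]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]


def summarize(residuals_by_region: dict[str, dict[str, float]]) -> dict[str, dict]:
    """Mean and median residual per disease across regions, plus a flag for
    diseases whose residual is negative in every region."""
    diseases = next(iter(residuals_by_region.values()))
    summary = {}
    for disease in diseases:
        values = [region[disease] for region in residuals_by_region.values()]
        summary[disease] = {
            "mean": statistics.fmean(values),
            "median": statistics.median(values),
            "negative_everywhere": all(v < 0 for v in values),
        }
    return summary


# Hypothetical inputs: disease burden (e.g., millions of DALYs) and number of
# published CEAs for five diseases in two regions.
DISEASES = ["D1", "D2", "D3", "D4", "D5"]
DATA = {
    "Region A": ([1, 2, 3, 4, 5], [2, 4, 9, 8, 10]),
    "Region B": ([1, 2, 3, 4, 5], [3, 3, 9, 6, 11]),
}

residuals_by_region = {
    region: dict(zip(DISEASES, studentized_residuals(burden, studies)))
    for region, (burden, studies) in DATA.items()
}
summary = summarize(residuals_by_region)
for disease, row in summary.items():
    flag = "negative in every region" if row["negative_everywhere"] else ""
    print(f"{disease}: mean {row['mean']:+.2f}, median {row['median']:+.2f} {flag}")
```

In this toy data, D3 sits above both regression lines (relatively over-studied), while D2 and D4 have negative residuals in every region, the pattern described above as uniformly under-studied.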
Each Figure 4 panel displays results for the top 10 diseases and includes a diagonal line representing the average number of studies published as a function of disease burden within each Super Region. A point plotted to the "northwest" of (above) this line indicates a disease that is relatively "over-studied" within that region, because the number of published studies exceeds the number published, on average, for other diseases imposing the same burden on the population. A point plotted to the "southeast" of (below) the line indicates a disease that is relatively "under-studied."

Figure 3. Cost-per-QALY studies (Figure 3A) and cost-per-DALY studies (Figure 3B) for each country. Gray indicates countries with no associated studies. If a study reported a cost-effectiveness estimate for two or more countries, we counted a CEA for each country (e.g., if a study reported an intervention's cost-effectiveness ratio for both Canada and the United States, we incremented the study count for both countries). If a study reported a "global" cost-effectiveness ratio, we excluded it from all country counts. We also excluded from these counts studies that did not clearly specify an applicable country or region.
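The country-counting rules described for Figure 3 can be expressed as a short function. This is a sketch under assumptions: each study is stored as a list of country names, an empty list marks a study whose country is unclear, and the hypothetical labels "global" and "international" mark ratios not tied to specific countries.

```python
from collections import Counter

# Hypothetical labels for ratios that are not tied to specific countries.
NON_COUNTRY_LABELS = {"global", "international"}


def count_studies_by_country(study_countries: list[list[str]]) -> Counter:
    """Count one CEA for every country a study reports a ratio for.

    Studies reporting a global/international ratio, or with no clearly
    specified country (empty list), are excluded from every country count.
    """
    counts: Counter = Counter()
    for countries in study_countries:
        if not countries or any(c.lower() in NON_COUNTRY_LABELS for c in countries):
            continue  # excluded: global/international ratio or unclear country
        counts.update(set(countries))  # each listed country counted once per study
    return counts


studies = [
    ["Canada", "United States"],  # counted for both countries
    ["Canada"],
    ["Global"],                   # excluded from all country counts
    [],                           # country unclear: excluded
    ["Kenya"],
]
print(count_studies_by_country(studies))
```

Since multi-country studies add to several countries, summing the map's country counts overstates the number of distinct studies.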

Discussion
Our review reveals a notable increase in the publication of cost-per-QALY and cost-per-DALY studies since 2000, thus making ever more cost-effectiveness information available to aid decision makers in their efforts to prioritize resources. The literature spans a wide range of interventions, diseases, and geographic regions.
The data demonstrate key differences between the cost-per-QALY and cost-per-DALY literatures (Table 1). For example, the cost-per-QALY literature tends to focus on high-income countries, while cost-per-DALY studies focus more on lower- and middle-income nations. Differences extend to the diseases represented: cost-per-QALY studies tend to address diseases prevalent in wealthier countries (e.g., cardiovascular disease and cancer), while cost-per-DALY studies address diseases more prevalent in low-income countries (e.g., infectious diseases such as tuberculosis and HIV). The two literatures also differ in the interventions on which they focus: more cost-per-QALY studies evaluate pharmaceuticals, while cost-per-DALY studies focus more often on immunizations.
Several factors may explain why cost-per-QALY studies predominate in high-income countries, while cost-per-DALY studies are more common in lower- and middle-income countries. The differences could, for example, reflect the availability of the health utility weights needed to estimate QALYs in high-income countries and the lack of such information in lower-income settings. Researchers conducting CEAs in countries with limited data capacity may find it easier and less expensive to use the cost-per-DALY metric.
The differences could also reflect the preferences and traditions of organizations that fund CEA studies. Foundations funding global health research may prefer the DALY metric, given the historic use of DALYs to measure global disease burden. In contrast, health authorities in high-income countries (e.g., the National Institute for Health and Care Excellence (NICE) in the United Kingdom) have tended to recommend the use of QALYs in CEAs. The geographic differences between the cost-per-QALY and cost-per-DALY literature deserve further investigation, as our effort did not gather information on why authors used these measures.
Our data also indicate inconsistencies between literature coverage and disease burden. Some diseases and conditions (e.g., cardiovascular disease and mental health in Southeast Asia, East Asia, and Oceania) are relatively "under-studied," while others (e.g., HIV and TB in all regions) are relatively "over-studied."
There is no clear explanation for these inconsistencies. As we have noted elsewhere, decisions to fund or conduct economic evaluations reflect not just the disease burden imposed by the targeted condition, but also the number of promising interventions or programs 19,20 . Because specialty drugs for diseases such as cancer represent important new interventions in high-income countries, and because pharmaceutical companies have the resources and incentive to characterize value for those interventions, much of the cost-per-QALY literature has recently focused on specialty drug therapies. These financial incentives are less pronounced in the lower- and middle-income countries that are much more the focus of the cost-per-DALY literature. In addition to disease burden, priorities in the cost-per-DALY literature may reflect the visibility and emotional salience of diseases, the influence of advocacy groups, the vagaries of reimbursement decisions 19 , and the institutional priorities of the organizations sponsoring the research.
In any case, the incongruities we observed between literature coverage and disease burden raise important questions about opportunities for the re-direction of future CEA research funding so that resources for such research can generate the highest return on investment.
Our work has the following limitations. First, the databases we used are restricted to English-language articles indexed in PubMed. Because a smaller proportion of the cost-per-DALY literature focuses on English-speaking countries, this restriction may have reduced the number of cost-per-DALY studies we identified proportionally more than the number of cost-per-QALY studies. Second, categorizing studies (e.g., deciding whether an intervention targets primary or secondary prevention) requires judgment, and other researchers may have classified articles differently.
In the future it will be important to further explore trends in the CEA literature in terms of diseases and geographic regions covered, funding patterns among donor organizations, the country of origin of study authors, the prevalence and patterns of CEAs published in languages other than English, the variation in methods used in analyses, and whether published studies address society's most pressing needs 21 . It will also be useful to continue to investigate the methodological underpinnings of QALYs and DALYs and how much the choice of metric influences CEA results and the decisions based on them 22,23 .

Data availability
We have made the data used in this analysis available through the Open Science Framework (OSF): http://doi.org/10.17605/OSF.IO/3BEK5 24 .

Dataset 1. Cleaned QALY Database.
Includes the cost-per-QALY data used in this paper.

Dataset 2. Cleaned DALY Database.
Includes the cost-per-DALY data used in this paper.
Dataset 3. Regional and disease level stratification dataset.
Includes disease burden and literature coverage data used in this paper.

Competing interests
No competing interests were disclosed.

Grant information: Bill and Melinda Gates Foundation [OPP1171680].
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Supplementary material
Supplementary File 1. Cost-per-QALY manual. Documents the variables collected in the cost-per-QALY database.

Supplementary File 2. Cost-per-DALY manual.
Documents the variables collected in the cost-per-DALY database.

The authors have provided a useful summary of the up-to-date contents of the Tufts Medical Center CEA Registry and Global Health CEA Registry, which they manage. In particular, they contrast the contents of the two databases in regard to number of studies, geographic, disease burden, and disease-specific content. This provides a useful, if somewhat simplistic, overview of the availability and contents of current CEA studies. A few comments regarding the results as presented are provided below, along with a few suggestions about additional ways to interrogate the databases.

Open Peer Review
I am interested to see the time series of cost per QALY and cost per DALY studies presented in Figure 1. There is nothing especially surprising here for anyone working in the field, and it is heartening to see the steady increase in economic evaluations for health. I would appreciate the authors highlighting a few aspects of the databases that help one interpret the data. The methods clearly state that the databases draw from English-language articles indexed in PubMed, but it would be worth underscoring that selection creates a downward bias on the true number of cost per DALY studies. It would be very interesting to know if any literature has assessed the change over time of economic evaluations in local-language journals, which would provide an additional signal of the state of economic capacity in LMIC regions.
The authors have a rich longitudinal database that could be further analyzed to assess such questions as how changes in the funding sources or disease patterns over time affect the number of cost per QALY or cost per DALY studies. Looking specifically at the cost per DALY numbers over time, how does one understand the growth in study numbers? Is it strongly correlated with growth in global health funding? (I would guess so), and are numbers of disease-specific studies correlated with change in disease burden? (I would guess not). The authors could show rates of growth year by year which would make comparisons across years and types of studies easier.
The metric of "under-" and "over-" studied as determined by the DALY burden is also interesting and mostly unsurprising. Perhaps more could be said about the countries and sub-regions that show up green on both maps, such as North Africa, Middle East, and parts of Latin America. Those are the regions truly deficient in economic evaluations. Another point about the literature coverage relative to disease burden is to consider the demographics of the respective Super Regions. Since sub-Saharan Africa and South Asia have younger populations, they also merit more analysis of childhood conditions. If an age-specific disease burden measure were used as the scalar, would the conclusions about "over-" and "under-" studied be the same?
Like other reviewers, I find some issue with the statement about "historic proclivities" driving the choice between cost per QALY and cost per DALY, but for a different reason. The methodological underpinnings of the two measures require different types of data, some of which are culturally or contextually determined. Measuring disease prevalence is more straightforward (albeit not simple) than measuring attributes and states of health, and such data are therefore more readily available in countries with limited data capacity, thus creating the means to produce more cost per DALY studies.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
No competing interests were disclosed.

Comment #2. The authors have a rich longitudinal database that could be further analyzed to assess such questions as how changes in the funding sources or disease patterns over time affect the number of cost per QALY or cost per DALY studies. Looking specifically at the cost per DALY numbers over time, how does one understand the growth in study numbers? Is it strongly correlated with growth in global health funding? (I would guess so), and are numbers of disease-specific studies correlated with change in disease burden? (I would guess not). The authors could show rates of growth year by year which would make comparisons across years and types of studies easier.
Response: These are interesting questions, although we believe they go beyond the scope of what we set out to address. We have added text to the Discussion section of the paper to note areas for future research, including trends in the CEA literature in terms of diseases and geographic regions covered, funding patterns among donor organizations, and whether published studies correspond to society's most pressing needs.

Comment #3. The metric of "under-" and "over-" studied as determined by the DALY burden is also interesting and mostly unsurprising. Perhaps more could be said about the countries and sub-regions that show up green on both maps -such as North Africa, Middle East, and parts of Latin America. Those are the regions truly deficient in economic evaluations. Another point about the literature coverage relative to disease burden is to consider the demographics of the respective Super Regions. Since sub-Saharan Africa and South Asia have younger populations, they also merit more analysis of childhood conditions. If an age-specific disease burden measure were used as the scalar, would the conclusions about "over-" and "under-" studied be the same?
Response: These points are likewise interesting. We defer to future researchers to organize the data as needed and conduct these analyses.

Response:
We have added text offering further explanation for the discrepancy between use of QALYs and DALYs by country wealth level (see response to Comment #1 from Michael Drummond).
No competing interests were disclosed.

This is a useful and timely study presenting informative analyses of the cost-per-QALY gained and cost-per-DALY averted literatures, relating them to geographical regions and to diseases and conditions prominent in these regions, as well as to wealth levels, types of intervention assessed and funding source. The authors draw conclusions as to the respective literatures' evolution over the years and to things such as how well these link to disease burdens in their respective geographies.
We would challenge the classification of pharmaceuticals as "tertiary prevention/treatment". According to WHO, pharmaceuticals make up the bulk of OOP spending in most LICs (~77% based on the 2011 World Medicines Situation) and given fees, access and availability of facilities, self-medication is a major component of healthcare systems in LICs and LMICs.
We were surprised that almost half of the DALY studies have received government funding. Is it possible to tell whether this is national governments of LICs and LMICs or donor governments? Given DALYs are mostly used in LICs and LMICs, are the governments of these countries commissioning this work? According to another study which we believe is worth citing, BMGF seems to be the single most commonly cited funder of DALY studies in LMICs. Our analysis as part of this paper (unpublished data) found that LIC government funded studies in malaria, TB and HIV (using DALYs (mostly) as an outcome measure) made up only 13%, 5% and 7% of the total in each disease area, respectively. Perhaps a more nuanced analysis of funding source (e.g., broken down by global and domestic decision maker) may reveal important messages, assuming the data are available? Such a study would nicely supplement the PLoS paper cited earlier.
Though probably not for this paper, perhaps a discussion as to why the discrepancy between QALYs and DALYs by wealth level and what the message may be for transitioning countries, is warranted. So the database could be expanded perhaps in the future to include data on the country of origin of authors, which would in turn allow capturing a likely (but unproven) trend from poorer countries where publications come from mostly western authors funded by foreign donor

Figure 1 is useful, but it would be helpful to show the time trends not just in the counts of the papers but in the composition of papers by GBD category and super-region, as well as intervention. A more developed Figure 1 would set up nicely a discussion of what the future might hold and which we touch on earlier in our discussion regarding transition.
One of us (AM) has downloaded the cost/QALY data set to take a look and found that for 5895 out of 6438 records the field for publication year is blank. This info is needed to generate Figure 1 and so it should be there. It considerably lowers confidence in the integrity of the analysis when one discovers these things within a few seconds of downloading the database.

Figure 4 is also very useful but why not show similar analysis for the other GBD regions? Why so few regions? In an increasingly multipolar world it seems highly appropriate to conclude that research priorities in each region should be different. If one could generate graphs for all the GBD regions that would strengthen that case and give a sense of the extent of global diversity.
The authors write: "The contrast seems to reflect the historical proclivities rather than any inherent advantages for one metric's use for a particular category of countries". What might such "inherent advantages" be, or indeed the historical proclivities (might the funding source have a role to play, given the preference of BMGF, a major funder of this work, for DALYs; see https://beta.nice.org.uk/Media/Default/About/what-we-do/NICE-International/projects/MEEP-report.pdf)? Why not standardise on the QALY, as both the methodological (see Airoldi and Morton) and empirical foundations of the method are more well-established and there is evidence that when domestic payers make investment decisions in HICs and UMICs, QALY is their preferred outcome?
The findings relating to under-and over-studied conditions seem to us to be very interesting and relevant (perhaps more so than the QALY/DALY debate). Could the paper be retitled and/or the abstract rewritten to give these findings more prominence?

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly
No competing interests were disclosed.

Competing Interests:
We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Comment #1. This is a useful and timely study presenting informative analyses of the cost-per-QALY gained and cost-per-DALY averted literatures, relating them to geographical regions and to diseases and conditions prominent in these regions, as well as to wealth levels, types of intervention assessed and funding source. The authors draw conclusions as to the respective literatures' evolution over the years and to things such as how well these link to disease burdens in their respective geographies.
Response: No response needed.

Comment #2. We would challenge the classification of pharmaceuticals as "tertiary prevention/treatment". According to WHO, pharmaceuticals make up the bulk of OOP spending in most LICs (~77% based on the 2011 World Medicines Situation) and given fees, access and availability of facilities, self-medication is a major component of healthcare systems in LICs and LMICs.
Response: Note that we make no assumptions about an intervention's prevention stage based on its type. For example, we do not assume that pharmaceuticals are tertiary treatments. Instead, we assign the prevention level based on how the article describes the disease and the treatment.
While we had intended the original text to provide examples of typical primary and tertiary treatments, we see that the presentation of the results may have been confusing. We have therefore eliminated those examples and just report the overall proportion of articles in two categories. The text now reads: Tertiary prevention (treatment) dominates the cost-per-QALY registry (62%), whereas the cost-per-DALY registry focuses far more on primary prevention (59%).

Response:
We do not have the information needed to assess whether the governments of these countries are commissioning the work. The final paragraph of the Discussion now cites both papers identified by the reviewer and notes the need for further research on this and on other issues.

Response:
We have added text offering further explanation for the discrepancy between use of QALYs and DALYs by country wealth level (see response to Comment #1 from Michael Drummond) and on the need for further research in this area.

Response:
We appreciate that providing time trends for other study characteristics, including GBD category and super-region, would be useful and could provide insight into the direction of the literature. In the revised paper, we have noted this as an area for future research, and we believe that as the cost-per-DALY literature in particular grows, the inferences that can be drawn will increase.

Comment #5. One of us (AM) has downloaded the cost/QALY data set to take a look and found that for 5895 out of 6438 records the field for publication year is blank. This info is needed to generate Figure 1 and so it should be there. It considerably lowers confidence in the integrity of the analysis when one discovers these things within a few seconds of downloading the database.

Response:
We very much appreciate the reviewers pointing out errors in our data extract. In response, we have regenerated the data extract, this time doing so by implementing all steps in a computer program to reduce the risk of introducing errors through manual manipulation of the original data. We are posting the computer program (written in STATA) and the extracted data. We have checked the distributions of the extracted data to make sure they appear to be reasonable.
Note that because we used the original dataset for our statistical analysis in version 1 of this paper, the errors in the extract did not affect the results.
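A completeness check of the kind discussed in this exchange can be automated so that a blank field is caught before an extract is released. The sketch below is written in Python rather than the Stata used for the actual extract and is not the authors' program; the column names (`article_id`, `pub_year`, `disease`) and the three records are hypothetical.

```python
import csv
import io


def count_blank(rows: list[dict[str, str]], field: str) -> int:
    """Number of records whose value for `field` is missing or whitespace-only."""
    return sum(1 for row in rows if not (row.get(field) or "").strip())


# Hypothetical three-record extract; real column names may differ.
extract = io.StringIO(
    "article_id,pub_year,disease\n"
    "1,2015,HIV\n"
    "2,,Cancer\n"
    "3,2016,TB\n"
)
rows = list(csv.DictReader(extract))
blank = count_blank(rows, "pub_year")
print(f"{blank} of {len(rows)} records have a blank pub_year")
```

Running such a check on every exported field, and failing the export when a key field such as publication year is mostly blank, would flag a problem like the one in Comment #5 before release.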

Response:
We have chosen to include figures for only the three regions with the largest number of studies. But to address the reviewer's comment, we have also added a table that reports the standardized residual for each disease in each super region, relative to the regression line. We also report the mean and median residual for each disease (across all seven super regions) to characterize which diseases tend to be over- and under-studied in general.
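The residual table described above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that within each super region the log of the number of published CEAs (plus one) is regressed on the log of disease burden, and that "standardized residual" means the internally studentized residual of that simple regression. The regression specification, the function names `standardized_residuals` and `residual_table`, and the toy numbers are all illustrative assumptions; this is not the authors' Stata program, and the toy input is not registry data.

```python
import math
from statistics import mean, median


def standardized_residuals(x, y):
    """Internally studentized residuals from a simple OLS fit of y on x.

    r_i = e_i / (s * sqrt(1 - h_ii)), where s is the residual standard error
    and h_ii the leverage of observation i. Needs at least 3 points that do
    not all lie exactly on one line.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    return [
        e / (s * math.sqrt(1 - (1 / n + (xi - x_bar) ** 2 / sxx)))
        for xi, e in zip(x, resid)
    ]


def residual_table(regions):
    """regions: {region: {disease: (burden, n_studies)}} (hypothetical layout).

    Assumed specification: within each region, regress log(n_studies + 1) on
    log(burden). A negative residual means fewer studies than the burden
    predicts ("under-studied"); a positive one means "over-studied".
    """
    by_disease = {}
    for region, diseases in regions.items():
        names = sorted(diseases)
        x = [math.log(diseases[d][0]) for d in names]
        y = [math.log(diseases[d][1] + 1) for d in names]
        for d, r in zip(names, standardized_residuals(x, y)):
            by_disease.setdefault(d, {})[region] = r
    table = {}
    for d, rs in by_disease.items():
        vals = list(rs.values())
        # Summarize each disease across regions, as in the response text.
        table[d] = {"by_region": rs, "mean": mean(vals), "median": median(vals)}
    return table


if __name__ == "__main__":
    # Made-up illustrative numbers (burden, number of CEAs); NOT registry data.
    toy = {
        "Region A": {"D1": (50, 40), "D2": (40, 30), "D3": (90, 5), "D4": (60, 20)},
        "Region B": {"D1": (20, 25), "D2": (30, 20), "D3": (70, 3), "D4": (10, 4)},
    }
    ranked = sorted(residual_table(toy).items(), key=lambda kv: kv[1]["mean"])
    for disease, summary in ranked:
        print(f"{disease}: mean={summary['mean']:+.2f} median={summary['median']:+.2f}")
```

Sorting by the mean residual puts the diseases that look most under-studied (relative to the fitted line) first; with real data, the choice of transformation and of residual definition would need to match whatever the authors actually used.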

Comment #7. The authors write: "The contrast seems to reflect the historical proclivities rather than any inherent advantages for one metric's use for a particular category of countries". What might such "inherent advantages" be, or indeed the historical proclivities? Might the funding source have a role to play, given the preference of BMGF, a major funder of this work, for DALYs (https://beta.nice.org.uk/Media/Default/About/what-we-do/NICE-International/projects/MEEP-report.pdf)? Why not standardise on the QALY, as both the methodological (see Airoldi and Morton 2) and empirical foundations of the method are better established, and there is evidence that when domestic payers make investment decisions in HICs and UMICs, the QALY is their preferred outcome?
Response: Making recommendations as to what measure the field should use is beyond the scope of this paper. We do, however, provide expanded text in an effort to explain why these measures are each used, and why the QALY measure is used more in high-income countries, and the DALY measure more in lower- and middle-income countries. See response to Comment #1 from Michael Drummond.

Comment #8. The findings relating to under- and over-studied conditions seem to us to be very interesting and relevant (perhaps more so than the QALY/DALY debate). Could the paper be retitled and/or the abstract rewritten to give these findings more prominence?
Response: As this paper is the first comprehensive effort to describe the cost-per-DALY literature and compare it to the cost-per-QALY literature, we prefer to stick with emphasizing this aspect of the work in the title.

have tended to focus on low and lower-middle income countries. This is likely to reflect the greater availability of preference values for health states in higher income countries and the preference of international donors, such as WHO and the World Bank, for studies estimating DALYs in lower income countries. Secondly, while the literature in both cost per QALY and cost per DALY studies is growing over time, there are more than 10 times the number of studies using QALYs than those using DALYs. This is likely to reflect the higher number of economist researchers and greater availability of funding for studies in high-income countries.
However, another finding of the research is not so easily explained. While the focus of research topics differs, with tertiary prevention (treatment) for studies using QALYs and primary prevention for studies using DALYs, it is surprising that the literature coverage is not closely aligned to disease burden in either high income or low income countries. Neumann et al. suggest that 'the most commonly studied diseases, regions and interventions may reflect the financial interests of the CEA funders'. One can see why this might be the case in higher income countries, where many studies are funded by pharmaceutical companies, but it's not clear why international donors might be favouring some diseases over others in lower income countries.
The analysis by Neumann et al. cannot directly answer that question, but one important factor driving economic evaluation in all countries is the number of promising interventions or programmes to evaluate.
In this sense, the literature on economic evaluation mostly follows the research priorities of technology manufacturers or public health specialists. For example, in recent years the research priorities of pharmaceutical companies in higher income countries have focused on specialty drugs for diseases such as cancer. This could be driven by discoveries in basic research or the pursuit of profits, or both. However, in all countries one might expect priorities for research to be driven not by the absolute level of disease burden, but by the potential for modifying that burden through the development and implementation of health care treatments and programmes.
One final issue touched on in the paper by Neumann et al. concerns the analytic choice between QALYs and DALYs in conducting economic evaluations. In commenting on the contrast in approach between higher and lower income countries, the authors state that 'this contrast seems to reflect the historic proclivities of health economist researchers, rather than any inherent advantages for one metric's use for a particular category of countries'. In my view this issue deserves much deeper investigation.
In many lower income countries, health economist researchers may not have a realistic choice of approach, as QALYs may not exist for the country concerned. But which approach should the analyst use in a country for which both QALYs and DALYs are available? Comparisons between QALYs and DALYs and the implications for health policy decisions have been discussed in the papers by Airoldi and Morton (2009) and Robberstad (2005), with the conclusion that different decisions might be reached.
Although there are some minor differences in the theoretical constructs of QALYs and DALYs, two practical issues may be critical to the choice of approach. On the one hand, QALYs are likely to be more 'bespoke' to the country where the study is being conducted and are more likely to reflect the health state preferences in the country concerned. On the other hand, there is considerable variability in the methods used to elicit the preferences for health states in QALYs, which may threaten any standardized approach to decision-making. This issue has been recognized by the National Institute for Health and Care Excellence (NICE) in the United Kingdom, which, while recommending the use of QALYs, specifies the characteristics of the instrument that should be used to estimate them (NICE, 2013). By extension of the same argument, an international donor requiring some standardization of approach to evaluation across several countries is likely to recommend the use of DALYs.

Response:
We agree with the reviewer comments and have revised the Discussion to incorporate these points. We have added the following text to the Discussion: Several factors may explain why cost-per-QALY studies predominate in high-income countries, while cost-per-DALY studies are more popular in lower- and middle-income countries. The differences could, for example, reflect the availability of health utility weights in high-income countries and the lack of such information in lower-income settings. Researchers conducting CEAs in countries with limited data capacity may find it easier and less expensive to use the cost-per-DALY metric.
The differences could also reflect the preferences and traditions of organizations that fund CEA studies. Foundations funding global health research may prefer the DALY metric, given the historic use of DALYs to measure global disease burden. In contrast, health authorities in high-income countries (e.g., the National Institute for Health and Care Excellence (NICE) in the United Kingdom) have tended to recommend the use of QALYs in CEAs. The geographic differences between the cost-per-QALY and cost-per-DALY literatures deserve further investigation, as our effort did not gather information on why authors used these measures.

Comment #2. However, another finding of the research is not so easily explained. While the focus of research topics differs, with tertiary prevention (treatment) for studies using QALYs and primary prevention for studies using DALYs, it is surprising that the literature coverage is not closely aligned to disease burden in either high income or low income countries. Neumann et al. suggest that 'the most commonly studied diseases, regions and interventions may reflect the financial interests of the CEA funders'. One can see why this might be the case in higher income countries, where many studies are funded by pharmaceutical companies, but it's not clear why international donors might be favouring some diseases over others in lower income countries.
Response: We agree with the reviewer and have added the following paragraph to the Discussion: There is no clear explanation for these inconsistencies. As we have noted elsewhere, decisions to fund or conduct economic evaluations reflect not just the disease burden imposed by the targeted condition, but also the number of promising interventions or programs 19, 20. Because specialty drugs for diseases such as cancer represent important new interventions in high-income countries, and because pharmaceutical companies have the resources and incentive to characterize value for those interventions, much of the cost-per-QALY literature has recently focused on specialty drug therapies. These financial incentives are less pronounced in the lower- and middle-income countries that are much more the focus of the cost-per-DALY literature. In addition to disease burden, priorities in the cost-per-DALY literature may reflect the visibility and emotional salience of diseases, the influence of advocacy groups, the vagaries of reimbursement decisions, and institutional priorities of the organizations sponsoring the research.

Comment #3. The analysis by Neumann et al. cannot directly answer that question, but one important factor driving economic evaluation in all countries is the number of promising interventions or programmes to evaluate. In this sense, the literature on economic evaluation mostly follows the research priorities of technology manufacturers or public health specialists. For example, in recent years the research priorities of pharmaceutical companies in higher income countries have focused on specialty drugs for diseases such as cancer. This could be driven by discoveries in basic research or the pursuit of profits, or both. However, in all countries one might expect priorities for research to be driven not by the absolute level of disease burden, but by the potential for modifying that burden through the development and implementation of health care treatments and programmes.
Response: See response to Comment #2.

Comment #4. One final issue touched on in the paper by Neumann et al. concerns the analytic choice between QALYs and DALYs in conducting economic evaluations. In commenting on the contrast in approach between higher and lower income countries, the authors state that 'this contrast seems to reflect the historic proclivities of health economist researchers, rather than any inherent advantages for one metric's use for a particular category of countries'. In my view this issue deserves much deeper investigation.