<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">Gates Open Res</journal-id>
            <journal-title-group>
                <journal-title>Gates Open Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2572-4754</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/gatesopenres.13107.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Method Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>A grid-based sample design framework for household surveys</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 approved, 1 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Boo</surname>
                        <given-names>Gianluca</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-4078-8221</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Darin</surname>
                        <given-names>Edith</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Thomson</surname>
                        <given-names>Dana R.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Tatem</surname>
                        <given-names>Andrew J.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>WorldPop, School of Geography and Environmental Science, University of Southampton, Southampton, SO17 1BJ, UK</aff>
                <aff id="a2">
                    <label>2</label>Department of Social Statistics and Demography, University of Southampton, Southampton, SO17 1BJ, UK</aff>
                <aff id="a3">
                    <label>3</label>Flowminder Foundation, Stockholm, 11355, Sweden</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:gianluca.boo@soton.ac.uk">gianluca.boo@soton.ac.uk</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>27</day>
                <month>1</month>
                <year>2020</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2020</year>
            </pub-date>
            <volume>4</volume>
            <elocation-id>13</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>20</day>
                    <month>1</month>
                    <year>2020</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Boo G et al.</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://gatesopenresearch.org/articles/4-13/pdf"/>
            <abstract>
                <p>Traditional sample designs for household surveys are contingent upon the availability of a representative primary sampling frame. This is defined using enumeration units and population counts retrieved from decennial national censuses that can become rapidly inaccurate in highly dynamic demographic settings. To tackle the need for representative sampling frames, we propose an original grid-based sample design framework introducing essential concepts of spatial sampling in household surveys. In this framework, the sampling frame is defined based on gridded population estimates and formalized as a bi-dimensional random field, characterized by spatial trends, spatial autocorrelation, and stratification. The sampling design reflects the characteristics of the random field by combining contextual stratification and proportional to population size sampling. A nonparametric estimator is applied to evaluate the sampling design and inform sample size estimation. We demonstrate an application of the proposed framework through a case study developed in two provinces located in the western part of the Democratic Republic of the Congo. We define a sampling frame consisting of settled cells with associated population estimates. We then perform a contextual stratification by applying a principal component analysis (PCA) and 
                    <italic toggle="yes">k</italic>-means clustering to a set of gridded geospatial covariates, and sample settled cells proportionally to population size. Lastly, we evaluate the sampling design by contrasting the empirical cumulative distribution function for the entire population of interest and its weighted counterpart across different sample sizes and identify an adequate sample size using the Kolmogorov-Smirnov distance between the two functions. The results of the case study underscore the strengths and limitations of the proposed grid-based sample design framework and foster further research into the application of spatial sampling concepts in household surveys.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Demography</kwd>
                <kwd>Household Surveys</kwd>
                <kwd>Sample Design</kwd>
                <kwd>Spatial Sampling</kwd>
                <kwd>Gridded Population</kwd>
                <kwd>Democratic Republic of the Congo</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1" xlink:href="http://dx.doi.org/10.13039/501100002992">
                    <funding-source>Department for International Development, UK Government</funding-source>
                    <award-id>OPP1182408</award-id>
                </award-group>
                <award-group id="fund-2" xlink:href="http://dx.doi.org/10.13039/100000865">
                    <funding-source>Gates Foundation</funding-source>
                    <award-id>OPP1182408</award-id>
                </award-group>
                <funding-statement>This work was supported by the Gates Foundation and the United Kingdom Department for International Development (DFID) [OPP1182408].</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>Research and policymaking often require demographic data, such as population enumerations and age and sex structures. While these data have been historically derived from national censuses
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>, the past 40 years have witnessed an increasing interest in the use of household surveys for demographic estimations
                <sup>
                    <xref ref-type="bibr" rid="ref-2">2</xref>
                </sup>. Starting from 2000, for instance, the US Census adopted the dual system estimation that complements the national census with a richer set of demographic and socio-economic characteristics captured using household surveys
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>
                </sup>. This kind of survey provides a cost-effective way to access an extensive range of attributes that can be ultimately generalized to a larger population of interest
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup>. Generalization is especially valuable in low- and middle-income countries with outdated, inaccurate or incomplete censuses, where a sample of representative households can be used to estimate demographic data
                <sup>
                    <xref ref-type="bibr" rid="ref-5">5</xref>
                </sup>.</p>
            <p>Traditional sample designs for household surveys build on three pillars &#x2014; the sampling frame, sampling design, and estimator
                <sup>
                    <xref ref-type="bibr" rid="ref-6">6</xref>
                </sup>. The sampling frame consists of a list of all potential sampling units
                <sup>
                    <xref ref-type="bibr" rid="ref-7">7</xref>
                </sup>, the sampling design defines the probability that any given unit is sampled
                <sup>
                    <xref ref-type="bibr" rid="ref-8">8</xref>
                </sup>, and the estimator determines the rule to generalize the estimate (for example, recovering the mean characteristics of the population of interest using the mean characteristics of the sampled households)
                <sup>
                    <xref ref-type="bibr" rid="ref-6">6</xref>
                </sup>. In low- and middle-income countries, these sample designs are generally set up in two stages because of logistical and financial considerations
                <sup>
                    <xref ref-type="bibr" rid="ref-9">9</xref>
                </sup>. This form of multistage design involves the initial sampling from the primary frame, which is composed of non-overlapping enumeration units. Following the definition of a secondary frame resulting from the enumeration of all households in the sampled enumeration units, households are finally sampled
                <sup>
                    <xref ref-type="bibr" rid="ref-9">9</xref>
                </sup>.</p>
            <p>The primary frame is an essential aspect of two-stage sampling designs because it is meant to provide an accurate, complete, and up-to-date representation of the distribution of the population of interest
                <sup>
                    <xref ref-type="bibr" rid="ref-7">7</xref>
                </sup>. This is defined using enumeration units and population counts retrieved from the most recent national census, an exercise that, in the best-case scenario, is carried out on a decadal basis
                <sup>
                    <xref ref-type="bibr" rid="ref-10">10</xref>
                </sup>. Census data therefore become outdated rapidly, given that no more than two years should typically elapse between the definition of the sampling frame and the implementation of the sample design
                <sup>
                    <xref ref-type="bibr" rid="ref-7">7</xref>
                </sup>. As a consequence, sample designs for household surveys are increasingly relying on alternative sampling frames, typically derived from gridded population estimates
                <sup>
                    <xref ref-type="bibr" rid="ref-10">10</xref>
                </sup>. These estimates are produced through top-down spatial disaggregation of national census data
                <sup>
                    <xref ref-type="bibr" rid="ref-11">11</xref>
                </sup> or bottom-up spatial interpolation based on household survey data collected within small geographic areas
                <sup>
                    <xref ref-type="bibr" rid="ref-12">12</xref>
                </sup>.</p>
            <p>Adopting a gridded sampling frame requires adjusting the three pillars of household sample design conceived for one-dimensional listings to a bi-dimensional geographic space
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup>. This adjustment can be achieved by considering the three core concepts of spatial sampling &#x2014; the random field, the design, and the estimator
                <sup>
                    <xref ref-type="bibr" rid="ref-13">13</xref>
                </sup>. The notion of random field formalizes the population of interest through a bi-dimensional random process characterized by errors, trends, autocorrelation, and stratification
                <sup>
                    <xref ref-type="bibr" rid="ref-14">14</xref>
                </sup>; the design reflects the specificities of the random field in the selection of sampling units; and the estimator defines the generalization of the estimate retrieved from the sampling units to the entire sampling frame
                <sup>
                    <xref ref-type="bibr" rid="ref-15">15</xref>
                </sup>. Despite the need for bridging sample designs for household surveys and spatial sampling, explicit joint methodological frameworks are currently still rare
                <sup>
                    <xref ref-type="bibr" rid="ref-10">10</xref>
                </sup>.</p>
            <p>To fill this knowledge gap, we propose a grid-based sample design for household surveys that embeds the three core concepts of spatial sampling
                <sup>
                    <xref ref-type="bibr" rid="ref-13">13</xref>
                </sup>. In doing so, the gridded sampling frame is formalized as a bi-dimensional random field
                <sup>
                    <xref ref-type="bibr" rid="ref-13">13</xref>
                </sup>; the design considers spatial trends, spatial autocorrelation, and stratification through a contextually stratified
                <sup>
                    <xref ref-type="bibr" rid="ref-16">16</xref>
                </sup> proportional to population size sampling
                <sup>
                    <xref ref-type="bibr" rid="ref-5">5</xref>
                </sup>; a nonparametric estimator is used to assess the sampling design and inform sample size estimation
                <sup>
                    <xref ref-type="bibr" rid="ref-17">17</xref>
                </sup>. We demonstrate the application of this sample design framework with a case study developed in two provinces located in the western part of the Democratic Republic of the Congo. This country had its last census over 30 years ago, and sampling frames for household surveys are still based on these extremely outdated population figures
                <sup>
                    <xref ref-type="bibr" rid="ref-18">18</xref>
                </sup>. The results of the case study provide valuable insights into the implementation of the proposed framework and foster further research into grid-based sample designs.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <sec>
                <title>The grid-based sample design framework</title>
                <p>
                    <xref ref-type="fig" rid="f1">Figure 1</xref> shows the proposed grid-based sample design framework, which embeds the core concepts of spatial sampling into the three pillars of household sample design. First, the sampling frame (
                    <xref ref-type="fig" rid="f1">Figure 1A</xref>) is formalized as a bi-dimensional random field, defined by superimposing a square grid on the study area, where the presence of settled area defines the sampling cells. The sampling design (
                    <xref ref-type="fig" rid="f1">Figure 1B</xref>) reflects the characteristics of the random field, namely, spatial autocorrelation and spatial heterogeneity, by combining contextual stratification and proportional to population size sampling techniques. Lastly, an estimator (
                    <xref ref-type="fig" rid="f1">Figure 1C</xref>) of nonparametric nature, namely the cumulative distribution function (CDF), is used to evaluate the sampling design and guide sample size estimation in a simulation study. The three elements of the framework are presented in detail in the following sections. The framework can be implemented using the R statistical language
                    <sup>
                        <xref ref-type="bibr" rid="ref-19">19</xref>
                    </sup> in 
                    <ext-link ext-link-type="uri" xlink:href="https://rstudio.com/products/rstudio/">RStudio</ext-link> 3.5.2
                    <sup>
                        <xref ref-type="bibr" rid="ref-20">20</xref>
                    </sup>, using the following packages &#x2014; 
                    <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/web/packages/gridsample/index.html">gridsample</ext-link> 0.2.1
                    <sup>
                        <xref ref-type="bibr" rid="ref-21">21</xref>
                    </sup>, 
                    <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/web/packages/raster/index.html">raster</ext-link> 3.0-7
                    <sup>
                        <xref ref-type="bibr" rid="ref-22">22</xref>
                    </sup>, 
                    <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/web/packages/sf/index.html">sf</ext-link> 0.8-0
                    <sup>
                        <xref ref-type="bibr" rid="ref-23">23</xref>
                    </sup>, and 
                    <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/web/packages/spatstat/index.html">spatstat</ext-link> 1.61-0
                    <sup>
                        <xref ref-type="bibr" rid="ref-24">24</xref>
                    </sup>.</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>The grid-based sample design framework.</title>
                        <p>The key elements of this framework are the sampling frame (
                            <bold>A</bold>) defined by deriving from the study area (
                            <bold>A1</bold>) the gridded sampling frame (
                            <bold>A2</bold>); the sampling design (
                            <bold>B</bold>) consisting of contextual stratification (
                            <bold>B1</bold>) and sampling proportional to population size (
                            <bold>B2</bold>); and the estimator (
                            <bold>C</bold>) where the empirical cumulative distribution function and the weighted empirical cumulative distribution function are used to evaluate the design (
                            <bold>C1</bold>) and estimate sample size (
                            <bold>C2</bold>).</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14272/7878f20e-8765-4c7e-b614-7bcc9d727ab0_figure1.gif"/>
                </fig>
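                <p>As a minimal sketch of the evaluation step in panel C, and not the authors' R implementation, the Python snippet below compares the empirical cumulative distribution function (ECDF) of a population with the weighted ECDF of a sample and returns the Kolmogorov-Smirnov distance between the two. The function names <monospace>ecdf</monospace> and <monospace>ks_distance</monospace>, the toy values, and the use of caller-supplied sampling weights (for instance, inverse inclusion probabilities) are assumptions made for illustration.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
def ecdf(values, weights=None):
    """Return F(t), the (weighted) share of values that are <= t."""
    if weights is None:
        weights = [1.0] * len(values)  # unweighted case: every unit counts once
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equally long")
    total = float(sum(weights))
    pairs = list(zip(values, weights))

    def F(t):
        return sum(w for v, w in pairs if v <= t) / total

    return F


def ks_distance(population_values, sample_values, sample_weights):
    """Largest absolute gap between the population ECDF and the weighted sample ECDF."""
    f_population = ecdf(population_values)
    f_sample = ecdf(sample_values, sample_weights)
    # Both curves are step functions that only jump at observed values,
    # so the largest gap is reached at one of those values.
    points = set(population_values) | set(sample_values)
    return max(abs(f_population(t) - f_sample(t)) for t in points)


# Hypothetical cell-level population counts and a weighted sample of them.
population_values = [12, 40, 40, 75, 130, 210]
sample_values = [40, 130, 210]
sample_weights = [3.0, 1.5, 1.0]  # assumed weights, e.g. 1 / inclusion probability
distance = ks_distance(population_values, sample_values, sample_weights)
```
]]></code>
                <p>Repeating this computation over many simulated samples of increasing size would show how quickly the distance shrinks, which is one way such a measure could inform the choice of sample size.</p>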
            </sec>
            <sec>
                <title>Sampling frame</title>
                <p>The notion of sampling frame is at the core of household sample design because it ensures that every household has a known probability of being surveyed
                    <sup>
                        <xref ref-type="bibr" rid="ref-7">7</xref>
                    </sup>. This concept, however, is not frequently adopted in other disciplines, such as environmental sciences, because full listings are considered impractical or even impossible
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>. To overcome this issue, in the domain of geostatistics, the complete listing of the population of interest is replaced by a listing of the geographical locations where it can be found
                    <sup>
                        <xref ref-type="bibr" rid="ref-16">16</xref>
                    </sup>. For this purpose, a regular geometric grid with square or hexagonal patterns is overlaid on the study area to enable equal sampling probability
                    <sup>
                        <xref ref-type="bibr" rid="ref-25">25</xref>
                    </sup>. Given the heterogeneous geographic distribution of the human population, the use of gridded sampling frames has historically been discouraged for household surveys
                    <sup>
                        <xref ref-type="bibr" rid="ref-16">16</xref>
                    </sup>. However, other spatially explicit sampling frames, for instance, based on parcel boundaries
                    <sup>
                        <xref ref-type="bibr" rid="ref-26">26</xref>
                    </sup> or air pollution levels
                    <sup>
                        <xref ref-type="bibr" rid="ref-16">16</xref>
                    </sup>, have already been adopted in the past for household sampling.</p>
                <p>Gridded population sampling frames are being increasingly adopted in household sampling carried out in low- and middle-income countries with outdated census frames
                    <sup>
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup>. This is because, in some instances, traditional sampling frames lack complete geographic coverage, well-defined geographic boundaries and up-to-date population data
                    <sup>
                        <xref ref-type="bibr" rid="ref-9">9</xref>
                    </sup>. Conversely, a gridded sampling frame provides comprehensive coverage of well-defined regular sampling units &#x2014; the grid cell
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>
                    </sup>. The increasing availability of high-resolution gridded population estimates, with cells measuring between 30
                    <sup>
                        <xref ref-type="bibr" rid="ref-27">27</xref>
                    </sup> and 250 meters
                    <sup>
                        <xref ref-type="bibr" rid="ref-28">28</xref>
                    </sup>, also enables deriving sampling frames of relatively fine spatial resolution. Although gridded population estimates have known inaccuracies connected with the quality of the input datasets
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>
                    </sup> and selected spatial disaggregation techniques
                    <sup>
                        <xref ref-type="bibr" rid="ref-11">11</xref>
                    </sup>, they are generally considered to provide a more accurate approximation of the geographical distribution of population counts than outdated census enumerations
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>,
                        <xref ref-type="bibr" rid="ref-11">11</xref>
                    </sup>.</p>
                <p>While most gridded population estimates are constrained to settled areas
                    <sup>
                        <xref ref-type="bibr" rid="ref-11">11</xref>
                    </sup>, WorldPop top-down estimates provide a continuous population-count value across all land masses to ensure that sparsely-populated areas are not omitted
                    <sup>
                        <xref ref-type="bibr" rid="ref-29">29</xref>
                    </sup>. This dataset also offers the advantage of a systematic global coverage and an accuracy assessment
                    <sup>
                        <xref ref-type="bibr" rid="ref-29">29</xref>
                    </sup>. Furthermore, a gridded sampling frame derived from WorldPop top-down estimates can be refined using global settlement datasets such as the Global Urban Footprint (GUF)
                    <sup>
                        <xref ref-type="bibr" rid="ref-30">30</xref>
                    </sup> and the Global Human Settlement Layer (GHSL)
                    <sup>
                        <xref ref-type="bibr" rid="ref-28">28</xref>
                    </sup> using the settled area as a limiting ancillary variable
                    <sup>
                        <xref ref-type="bibr" rid="ref-31">31</xref>
                    </sup>. The sampling frame, defined based on the population counts within settled cells, can be formalized as a random field (&#x211c;), where the population count in a settled cell (
                    <italic toggle="yes">X</italic>) is distributed across a bi-dimensional parameter space (&#x211d;
                    <sup>2</sup>) as a function of its geographic coordinates (
                    <italic toggle="yes">l</italic>) (
                    <xref ref-type="other" rid="M1">Equation 1</xref>).</p>
                <p>Equation 1
                    <inline-formula>
                        <mml:math display="inline" id="M1">
                            <mml:mspace width="20em"/>
                            <mml:mrow>
                                <mml:mi>&#x211c;</mml:mi>
                                <mml:mo>=</mml:mo>
                                <mml:mrow>
                                    <mml:mo>{</mml:mo>
                                    <mml:mrow>
                                        <mml:mi>X</mml:mi>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:mi>l</mml:mi>
                                        <mml:mo stretchy="false">)</mml:mo>
                                        <mml:mo>,</mml:mo>
                                        <mml:mi>l</mml:mi>
                                        <mml:mo>&#x2208;</mml:mo>
                                        <mml:msup>
                                            <mml:mi>&#x211d;</mml:mi>
                                            <mml:mn>2</mml:mn>
                                        </mml:msup>
                                    </mml:mrow>
                                    <mml:mo>}</mml:mo>
                                </mml:mrow>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula>
                </p>
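                <p>To make the formalization concrete, the following Python sketch builds a sampling frame from a small population grid and a settlement mask, keeping only settled cells and recording each cell location together with its population count. The function name <monospace>build_sampling_frame</monospace>, the toy grids, and the rescaling rule (population outside settled cells is reallocated proportionally so that the grid total is preserved) are assumptions for illustration and do not reproduce the authors' procedure.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
def build_sampling_frame(population, settled):
    """Return the settled cells of a grid as a list of ((row, col), count) records.

    population: list of rows with estimated population counts per cell.
    settled:    list of rows with True where the cell contains settled area.
    """
    same_shape = len(population) == len(settled) and all(
        len(p_row) == len(s_row) for p_row, s_row in zip(population, settled)
    )
    if not same_shape:
        raise ValueError("population and settled grids must have the same shape")

    # Keep only settled cells: the frame is discrete, not a continuous surface.
    frame = [
        ((r, c), population[r][c])
        for r, row in enumerate(settled)
        for c, is_settled in enumerate(row)
        if is_settled
    ]
    grid_total = sum(sum(row) for row in population)
    settled_total = sum(count for _, count in frame)
    if settled_total <= 0:
        raise ValueError("no population falls within settled cells")

    # Assumption: counts outside settled cells are reallocated to settled cells
    # in proportion to their own counts, so the grid total is preserved.
    scale = grid_total / settled_total
    return [(location, count * scale) for location, count in frame]


# Hypothetical 2 x 3 grid of population estimates and a settlement mask.
population = [[10.0, 0.0, 5.0],
              [20.0, 15.0, 0.0]]
settled = [[True, False, False],
           [True, True, False]]
frame = build_sampling_frame(population, settled)
```
]]></code>
                <p>Each record in <monospace>frame</monospace> pairs a location with its count, which mirrors the reading of the random field as population counts indexed by geographic coordinates.</p>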
                <p>The population count within a settled cell (
                    <italic toggle="yes">X</italic>(
                    <italic toggle="yes">l</italic>)) is influenced by the following features. First, spatial autocorrelation, or second-order non-stationarity, since 
                    <italic toggle="yes">X</italic>(
                    <italic toggle="yes">l</italic>) is expected to be similar when the settled cells are close to one another
                    <sup>
                        <xref ref-type="bibr" rid="ref-32">32</xref>
                    </sup>. This condition violates the underlying assumption of an independently distributed population governing probabilistic sampling and involves a loss of sampling efficiency
                    <sup>
                        <xref ref-type="bibr" rid="ref-33">33</xref>
                    </sup>. Second, spatial heterogeneity, or first-order non-stationarity, as 
                    <italic toggle="yes">X</italic>(
                    <italic toggle="yes">l</italic>) is likely to differ across 
                    <italic toggle="yes">l</italic> in different geographic contexts, such as urban/rural or mountainous/flat areas
                    <sup>
                        <xref ref-type="bibr" rid="ref-34">34</xref>
                    </sup>. This situation also contravenes a crucial assumption of probabilistic sampling, namely, the presence of an identically distributed population
                    <sup>
                        <xref ref-type="bibr" rid="ref-35">35</xref>
                    </sup>. Third, discreteness, as 
                    <italic toggle="yes">X</italic>(
                    <italic toggle="yes">l</italic>) is not continuous across all potential 
                    <italic toggle="yes">l</italic> but limited to settled areas only
                    <sup>
                        <xref ref-type="bibr" rid="ref-31">31</xref>
                    </sup>. This last characteristic implies that traditional spatial sampling techniques are not directly applicable because the sampling frame is not a continuous surface but constrained to settled cells only
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>.</p>
            </sec>
            <sec>
                <title>Sampling design</title>
                <p>In contrast to geostatistics, household surveys adopt design-based sampling strategies because the population of interest is considered unknown but fixed and entirely measurable
                    <sup>
                        <xref ref-type="bibr" rid="ref-4">4</xref>
                    </sup>. Among the available design strategies, household surveys in low- and middle-income countries are often based on two-stage sampling designs
                    <sup>
                        <xref ref-type="bibr" rid="ref-9">9</xref>
                    </sup>. This design involves drawing enumeration units from a primary sampling frame with probability proportional to population size and then surveying a random selection of households within each selected unit
                    <sup>
                        <xref ref-type="bibr" rid="ref-7">7</xref>
                    </sup>. First-stage sampling is crucial to improve sampling efficiency because it can incorporate characteristics of the random field
                    <sup>
                        <xref ref-type="bibr" rid="ref-6">6</xref>
                    </sup>. For example, enumeration areas may be selected with probabilities proportional to their population sizes to better account for spatial heterogeneity and to include densely populated areas that would likely be excluded from a random sample. However, the limited accuracy of the population enumerations retrieved from the last census and the definition of coarse strata can reduce the efficiency of proportional to population size sampling for household surveys
                    <sup>
                        <xref ref-type="bibr" rid="ref-36">36</xref>
                    </sup>.</p>
                <p>Stratified sampling assumes that the population of interest can be partitioned into more homogeneous subpopulations, or strata
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>. The aim is to capture the spatial heterogeneity in the population of interest globally and, consequently, to reduce the in-sample spatial autocorrelation
                    <sup>
                        <xref ref-type="bibr" rid="ref-6">6</xref>
                    </sup>. Stratification can be based on prior knowledge, pre-sampling, or proxy variables
                    <sup>
                        <xref ref-type="bibr" rid="ref-37">37</xref>
                    </sup>. In household sampling, strata often consist of a proxy reflecting the urban/rural divide
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup>, a strategy that is reproduced in existing grid-based sampling designs to provide independent estimates for planning and decision-making
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>
                    </sup>. The use of two-dimensional gridded sampling frames enables finer contextual stratification by incorporating information on geographic phenomena influencing the distribution of the population of interest
                    <sup>
                        <xref ref-type="bibr" rid="ref-16">16</xref>
                    </sup>. This can be achieved by accessing ancillary gridded datasets related to socio-economic (e.g., distance to major roads and urban centres) or physical characteristics (e.g., terrain and climate) that are embedded in top-down population models
                    <sup>
                        <xref ref-type="bibr" rid="ref-38">38</xref>
                    </sup>.</p>
                <p>For each ancillary dataset, the cell values intersecting the settled cells define a high-dimensional space describing geographical context. This approach makes it possible to define contextual strata by combining two popular methods for dimensionality reduction
                    <sup>
                        <xref ref-type="bibr" rid="ref-39">39</xref>
                    </sup> &#x2014; principal component analysis (PCA)
                    <sup>
                        <xref ref-type="bibr" rid="ref-40">40</xref>
                    </sup> and 
                    <italic toggle="yes">k</italic>-means classification
                    <sup>
                        <xref ref-type="bibr" rid="ref-41">41</xref>
                    </sup>. PCA transforms a set of correlated random variables into a smaller set of linearly uncorrelated principal components
                    <sup>
                        <xref ref-type="bibr" rid="ref-42">42</xref>
                    </sup>. The number of principal components can be selected by assessing the proportion of the total variance explained, which should generally be above 80&#x2013;90%
                    <sup>
                        <xref ref-type="bibr" rid="ref-43">43</xref>
                    </sup>. The principal components of the high-dimensional contextual space can be further reduced using a 
                    <italic toggle="yes">k</italic>-means classification
                    <sup>
                        <xref ref-type="bibr" rid="ref-39">39</xref>
                    </sup>. This method captures intrinsic structures by minimizing heterogeneity within clusters and maximizing heterogeneity across clusters based on the mean of the principal components. The number of clusters can be assessed using the &#x201c;elbow&#x201d; method applied to the within-cluster sum of squares (i.e., the variance left unexplained by the clusters)
                    <sup>
                        <xref ref-type="bibr" rid="ref-44">44</xref>
                    </sup>, but also by inspecting whether the spatial distribution of the resulting clusters produces meaningful contextual strata.</p>
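                <p>As an illustration only, the two selection rules described above can be sketched in Python using the standard library. The helper names (<monospace>n_components_for</monospace>, <monospace>kmeans</monospace>), the toy eigenvalues and scores, and the deterministic choice of initial cluster centres are assumptions made for this example and are not part of the framework; an applied analysis would normally rely on an established statistical library.</p>
                <code language="python"><![CDATA[
```python
import math


def n_components_for(eigenvalues, threshold=0.8):
    """Smallest number of leading components whose cumulative share of
    the total variance reaches `threshold` (hypothetical helper)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ordered)


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm on a list of equal-length tuples.

    Initial centres are k points spread evenly through the input order
    (an assumption made here for reproducibility). Returns the cluster
    labels and the within-cluster sum of squares (WCSS).
    """
    step = len(points) / k
    centres = [points[int(j * step)] for j in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centre by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centres[j]) ** 2)
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:
            break
        labels = new_labels
    wcss = sum(math.dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, wcss


# Toy inputs: four PCA eigenvalues and six two-component scores.
print(n_components_for([4.0, 3.0, 2.0, 1.0], threshold=0.8))
scores = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
for k in (1, 2, 3):
    print(k, round(kmeans(scores, k)[1], 3))
```
]]></code>
                <p>Comparing the WCSS returned for successive values of <italic toggle="yes">k</italic> gives the curve on which the &#x201c;elbow&#x201d; is sought; the resulting labels would still need to be mapped to check that the strata are geographically meaningful.</p>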
                <p>Within each stratum, proportional to population size sampling has a straightforward implementation in gridded sampling designs through dedicated software packages
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>
                    </sup> and web platforms (e.g., 
                    <ext-link ext-link-type="uri" xlink:href="https://gridsample.org">https://gridsample.org</ext-link>). The crucial feature of proportional to population size sampling is the use of gridded population datasets. For this purpose, several top-down gridded population datasets are currently available globally (e.g., GHS-POP
                    <sup>
                        <xref ref-type="bibr" rid="ref-28">28</xref>
                    </sup>, GPWv4
                    <sup>
                        <xref ref-type="bibr" rid="ref-45">45</xref>
                    </sup>, LandScan
                    <sup>
                        <xref ref-type="bibr" rid="ref-27">27</xref>
                    </sup>, and WorldPop
                    <sup>
                        <xref ref-type="bibr" rid="ref-29">29</xref>,
                        <xref ref-type="bibr" rid="ref-46">46</xref>
                    </sup>), while bottom-up datasets are only being produced in a limited number of countries
                    <sup>
                        <xref ref-type="bibr" rid="ref-12">12</xref>
                    </sup>. These datasets have different characteristics and fitness for use that should be carefully considered in the sampling design implementation
                    <sup>
                        <xref ref-type="bibr" rid="ref-11">11</xref>
                    </sup>. </p>
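                <p>A minimal sketch of a within-stratum draw with probability proportional to gridded population counts is given here for illustration. The function name <monospace>pps_sample_by_stratum</monospace>, the tuple layout, and the toy counts are hypothetical, and the draw is made with replacement, which is a simplification assumed for this example rather than a property of the software packages cited above.</p>
                <code language="python"><![CDATA[
```python
import random


def pps_sample_by_stratum(cells, n_per_stratum, seed=0):
    """Draw cells within each stratum with probability proportional to
    their gridded population count.

    `cells` is a list of (cell_id, stratum, population) tuples and
    `n_per_stratum` maps each stratum to its number of draws. Draws are
    made WITH replacement via random.choices (assumption of this sketch).
    """
    rng = random.Random(seed)
    sample = {}
    for stratum, n_draws in n_per_stratum.items():
        members = [c for c in cells if c[1] == stratum]
        weights = [c[2] for c in members]  # population counts as weights
        drawn = rng.choices(members, weights=weights, k=n_draws)
        sample[stratum] = [c[0] for c in drawn]
    return sample


# Toy sampling frame: four settled cells in two contextual strata.
toy_cells = [("a", "urban", 60), ("b", "urban", 20),
             ("c", "rural", 15), ("d", "rural", 5)]
print(pps_sample_by_stratum(toy_cells, {"urban": 2, "rural": 1}))
```
]]></code>
                <p>Because the weights are the gridded population counts themselves, the choice of gridded population dataset directly determines which cells are likely to be drawn.</p>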
                <p>The probability scheme resulting from stratified proportional to population size sampling 
                    <inline-formula>
                        <mml:math display="inline" id="M2">
                            <mml:mrow>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:msubsup>
                                    <mml:mi>&#x03c0;</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mrow>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:mi>S</mml:mi>
                                        <mml:mi>P</mml:mi>
                                        <mml:mi>P</mml:mi>
                                        <mml:mi>S</mml:mi>
                                        <mml:mo stretchy="false">)</mml:mo>
                                    </mml:mrow>
                                </mml:msubsup>
                                <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula> can be summarized as the joint probability of stratified sampling 
                    <inline-formula>
                        <mml:math display="inline" id="M3">
                            <mml:mrow>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:msubsup>
                                    <mml:mi>&#x03c0;</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mrow>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:mi>S</mml:mi>
                                        <mml:mo stretchy="false">)</mml:mo>
                                    </mml:mrow>
                                </mml:msubsup>
                                <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula> and sampling proportional to population size 
                    <inline-formula>
                        <mml:math display="inline" id="M4">
                            <mml:mrow>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:msubsup>
                                    <mml:mi>&#x03c0;</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mrow>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:mi>P</mml:mi>
                                        <mml:mi>P</mml:mi>
                                        <mml:mi>S</mml:mi>
                                        <mml:mo stretchy="false">)</mml:mo>
                                    </mml:mrow>
                                </mml:msubsup>
                                <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula> (
                    <xref ref-type="other" rid="M5">Equation 2</xref>).</p>
                <p>Equation 2
                    <inline-formula>
                        <mml:math display="inline" id="M5">
                            <mml:mspace width="20em"/>
                            <mml:mrow>
                                <mml:msubsup>
                                    <mml:mi>&#x03c0;</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mrow>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:mi>S</mml:mi>
                                        <mml:mi>P</mml:mi>
                                        <mml:mi>P</mml:mi>
                                        <mml:mi>S</mml:mi>
                                        <mml:mo stretchy="false">)</mml:mo>
                                    </mml:mrow>
                                </mml:msubsup>
                                <mml:mo>=</mml:mo>
                                <mml:msubsup>
                                    <mml:mi>&#x03c0;</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mrow>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:mi>S</mml:mi>
                                        <mml:mo stretchy="false">)</mml:mo>
                                    </mml:mrow>
                                </mml:msubsup>
                                <mml:mo>&#x00d7;</mml:mo>
                                <mml:msubsup>
                                    <mml:mi>&#x03c0;</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mrow>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:mi>P</mml:mi>
                                        <mml:mi>P</mml:mi>
                                        <mml:mi>S</mml:mi>
                                        <mml:mo stretchy="false">)</mml:mo>
                                    </mml:mrow>
                                </mml:msubsup>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula>
                </p>
                <p>The probability of selecting a specific cell 
                    <italic toggle="yes">X
                        <sub>i</sub>
                    </italic> in the design 
                    <inline-formula>
                        <mml:math display="inline" id="M6">
                            <mml:mrow>
                                <mml:msubsup>
                                    <mml:mi>&#x03c0;</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mrow>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:mi>S</mml:mi>
                                        <mml:mo stretchy="false">)</mml:mo>
                                    </mml:mrow>
                                </mml:msubsup>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula> is contingent on the size of the stratum it belongs to (
                    <italic toggle="yes">S
                        <sub>i</sub>
                    </italic>), where 
                    <italic toggle="yes">n
                        <sub>S</sub>
                    </italic> is the number of sampled settled cells in the stratum 
                    <italic toggle="yes">S
                        <sub>i</sub>
                    </italic> and 
                    <italic toggle="yes">m
                        <sub>S</sub>
                    </italic> the total number of settled cells in the stratum 
                    <italic toggle="yes">S
                        <sub>i</sub>
                    </italic> (
                    <xref ref-type="other" rid="M7">Equation 3</xref>).</p>
                <p>Equation 3
                    <inline-formula>
                        <mml:math display="inline" id="M7">
                            <mml:mspace width="23em"/>
                            <mml:mrow>
                                <mml:msubsup>
                                    <mml:mi>&#x03c0;</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mrow>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:mi>S</mml:mi>
                                        <mml:mo stretchy="false">)</mml:mo>
                                    </mml:mrow>
                                </mml:msubsup>
                                <mml:mo>=</mml:mo>
                                <mml:mfrac>
                                    <mml:mrow>
                                        <mml:msub>
                                            <mml:mi>n</mml:mi>
                                            <mml:mi>S</mml:mi>
                                        </mml:msub>
                                    </mml:mrow>
                                    <mml:mrow>
                                        <mml:msub>
                                            <mml:mi>m</mml:mi>
                                            <mml:mi>S</mml:mi>
                                        </mml:msub>
                                    </mml:mrow>
                                </mml:mfrac>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula>
                </p>
                <p>The probability of selecting a specific cell 
                    <italic toggle="yes">X
                        <sub>i</sub>
                    </italic> in the design 
                    <inline-formula>
                        <mml:math display="inline" id="M8">
                            <mml:mrow>
                                <mml:msubsup>
                                    <mml:mi>&#x03c0;</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mrow>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:mi>P</mml:mi>
                                        <mml:mi>P</mml:mi>
                                        <mml:mi>S</mml:mi>
                                        <mml:mo stretchy="false">)</mml:mo>
                                    </mml:mrow>
                                </mml:msubsup>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula> is the ratio of its population size to the total size of the population, that is, the sum of the population counts across cells 
                    <inline-formula>
                        <mml:math display="inline" id="M9">
                            <mml:mrow>
                                <mml:mstyle displaystyle="false">
                                    <mml:msubsup>
                                        <mml:mo>&#x2211;</mml:mo>
                                        <mml:mrow>
                                            <mml:mi>l</mml:mi>
                                            <mml:mo>=</mml:mo>
                                            <mml:mn>1</mml:mn>
                                        </mml:mrow>
                                        <mml:mrow>
                                            <mml:msub>
                                                <mml:mi>n</mml:mi>
                                                <mml:mi>S</mml:mi>
                                            </mml:msub>
                                        </mml:mrow>
                                    </mml:msubsup>
                                    <mml:mrow>
                                        <mml:msub>
                                            <mml:mi>X</mml:mi>
                                            <mml:mi>l</mml:mi>
                                        </mml:msub>
                                    </mml:mrow>
                                </mml:mstyle>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula> (
                    <xref ref-type="other" rid="M10">Equation 4</xref>).</p>
                <p>Equation 4
                    <inline-formula>
                        <mml:math display="inline" id="M10">
                            <mml:mspace width="20em"/>
                            <mml:mrow>
                                <mml:msubsup>
                                    <mml:mi>&#x03c0;</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mrow>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:mi>P</mml:mi>
                                        <mml:mi>P</mml:mi>
                                        <mml:mi>S</mml:mi>
                                        <mml:mo stretchy="false">)</mml:mo>
                                    </mml:mrow>
                                </mml:msubsup>
                                <mml:mo>=</mml:mo>
                                <mml:mfrac>
                                    <mml:mrow>
                                        <mml:msub>
                                            <mml:mi>X</mml:mi>
                                            <mml:mi>i</mml:mi>
                                        </mml:msub>
                                    </mml:mrow>
                                    <mml:mrow>
                                        <mml:mstyle displaystyle="false">
                                            <mml:msub>
                                                <mml:mo>&#x2211;</mml:mo>
                                                <mml:mrow>
                                                    <mml:mi>l</mml:mi>
                                                    <mml:mo>&#x2208;</mml:mo>
                                                    <mml:msup>
                                                        <mml:mi>&#x211d;</mml:mi>
                                                        <mml:mn>2</mml:mn>
                                                    </mml:msup>
                                                </mml:mrow>
                                            </mml:msub>
                                            <mml:mrow>
                                                <mml:msub>
                                                    <mml:mi>X</mml:mi>
                                                    <mml:mi>l</mml:mi>
                                                </mml:msub>
                                            </mml:mrow>
                                        </mml:mstyle>
                                    </mml:mrow>
                                </mml:mfrac>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula>
                </p>
                <p>Based on the probability scheme specified above, it is possible to produce an unbiased estimator that can be used to evaluate the sampling design and inform sample size estimation.</p>
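                <p>The probability scheme in Equations 2 to 4 can be illustrated with a small worked example. The sketch assumes integer population counts and takes the denominator of Equation 4 as the total over all settled cells in the frame, as the equation is written; the function name <monospace>inclusion_probabilities</monospace>, the stratum labels, and the toy counts are hypothetical.</p>
                <code language="python"><![CDATA[
```python
from fractions import Fraction


def inclusion_probabilities(cells, n_sampled):
    """Return {cell_id: (pi_S, pi_PPS, pi_SPPS)} following Equations 2 to 4.

    `cells` is a list of (cell_id, stratum, population) tuples with
    integer populations; `n_sampled` maps each stratum to n_S, the
    number of sampled settled cells in that stratum.
    """
    m = {}  # m_S: total number of settled cells per stratum
    for _, stratum, _ in cells:
        m[stratum] = m.get(stratum, 0) + 1
    # Denominator of Equation 4 (assumed: all settled cells in the frame).
    total_population = sum(pop for _, _, pop in cells)
    result = {}
    for cell_id, stratum, pop in cells:
        pi_s = Fraction(n_sampled[stratum], m[stratum])  # Equation 3
        pi_pps = Fraction(pop, total_population)         # Equation 4
        result[cell_id] = (pi_s, pi_pps, pi_s * pi_pps)  # Equation 2
    return result


toy_cells = [("a", "urban", 60), ("b", "urban", 20),
             ("c", "rural", 15), ("d", "rural", 5)]
for cell_id, probs in inclusion_probabilities(
        toy_cells, {"urban": 1, "rural": 1}).items():
    print(cell_id, *probs)
```
]]></code>
                <p>Exact fractions are used so that each factor of the product in Equation 2 can be checked by hand.</p>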
            </sec>
            <sec>
                <title>Estimator</title>
                <p>In household sampling design, the estimand is a parameter summarizing the random variable of interest, such as the mean, variance, or total
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup>. Typical examples are the mean proportion of children under five years old or the number of women of child-bearing age. In this setting, the estimator is built using a parametric attribute of the random variable of interest
                    <sup>
                        <xref ref-type="bibr" rid="ref-47">47</xref>
                    </sup>. However, nonparametric estimators make it possible to retrieve the characteristics of the entire random variable
                    <sup>
                        <xref ref-type="bibr" rid="ref-48">48</xref>,
                        <xref ref-type="bibr" rid="ref-49">49</xref>
                    </sup>. In the case of sample design for household surveys, the random variable consists of the population count across settled cells, where a large number of cells have medium-to-low population counts and only a few have high population counts. To capture the characteristics of the entire population of interest, the estimand becomes the full probability distribution of the random variable through its CDF
                    <sup>
                        <xref ref-type="bibr" rid="ref-50">50</xref>
                    </sup>. The CDF (
                    <italic toggle="yes">F
                        <sub>X</sub>
                    </italic>(
                    <italic toggle="yes">x</italic>)) gives the probability that the population count within a settled cell (
                    <italic toggle="yes">X
                        <sub>i</sub>
                    </italic>) is lower than or equal to 
                    <italic toggle="yes">x</italic>. Given the law of large numbers, the CDF can be approximated using the empirical CDF (ECDF) 
                    <inline-formula>
                        <mml:math display="inline" id="M11">
                            <mml:mrow>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:msub>
                                    <mml:mover accent="true">
                                        <mml:mi>F</mml:mi>
                                        <mml:mo>^</mml:mo>
                                    </mml:mover>
                                    <mml:mi>m</mml:mi>
                                </mml:msub>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mi>x</mml:mi>
                                <mml:mo stretchy="false">)</mml:mo>
                                <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula> for a number 
                    <italic toggle="yes">m</italic> of sampling frame cells (
                    <xref ref-type="other" rid="M12">Equation 5</xref>).</p>
                <p>Equation 5
                    <inline-formula>
                        <mml:math display="inline" id="M12">
                            <mml:mspace width="10em"/>
                            <mml:mrow>
                                <mml:msub>
                                    <mml:mover accent="true">
                                        <mml:mi>F</mml:mi>
                                        <mml:mo>^</mml:mo>
                                    </mml:mover>
                                    <mml:mi>m</mml:mi>
                                </mml:msub>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mi>x</mml:mi>
                                <mml:mo stretchy="false">)</mml:mo>
                                <mml:mo>=</mml:mo>
                                <mml:mfrac>
                                    <mml:mn>1</mml:mn>
                                    <mml:mi>m</mml:mi>
                                </mml:mfrac>
                                <mml:mstyle displaystyle="false">
                                    <mml:msubsup>
                                        <mml:mo>&#x2211;</mml:mo>
                                        <mml:mrow>
                                            <mml:mi>i</mml:mi>
                                            <mml:mo>=</mml:mo>
                                            <mml:mn>1</mml:mn>
                                        </mml:mrow>
                                        <mml:mi>m</mml:mi>
                                    </mml:msubsup>
                                    <mml:mrow>
                                        <mml:mi>I</mml:mi>
                                        <mml:mrow>
                                            <mml:mo>{</mml:mo>
                                            <mml:mrow>
                                                <mml:msub>
                                                    <mml:mi>X</mml:mi>
                                                    <mml:mi>i</mml:mi>
                                                </mml:msub>
                                                <mml:mo>&#x2264;</mml:mo>
                                                <mml:mi>x</mml:mi>
                                            </mml:mrow>
                                            <mml:mo>}</mml:mo>
                                        </mml:mrow>
                                    </mml:mrow>
                                </mml:mstyle>
                                <mml:mo>,</mml:mo>
                                <mml:mtext>&#x2009;</mml:mtext>
                                <mml:mspace width="0.1em"/>
                                <mml:mtext>where</mml:mtext>
                                <mml:mspace width="0.5em"/>
                                <mml:mi>I</mml:mi>
                                <mml:mrow>
                                    <mml:mo>{</mml:mo>
                                    <mml:mrow>
                                        <mml:msub>
                                            <mml:mi>X</mml:mi>
                                            <mml:mi>i</mml:mi>
                                        </mml:msub>
                                        <mml:mo>&#x2264;</mml:mo>
                                        <mml:mi>x</mml:mi>
                                    </mml:mrow>
                                    <mml:mo>}</mml:mo>
                                </mml:mrow>
                                <mml:mo>=</mml:mo>
                                <mml:mrow>
                                    <mml:mo>{</mml:mo>
                                    <mml:mtable columnalign="left">
                                        <mml:mtr>
                                            <mml:mtd>
                                                <mml:mn>1</mml:mn>
                                                <mml:mspace width="0.5em"/>
                                                <mml:mtext>if</mml:mtext>
                                                <mml:mspace width="0.5em"/>
                                                <mml:msub>
                                                    <mml:mi>X</mml:mi>
                                                    <mml:mi>i</mml:mi>
                                                </mml:msub>
                                                <mml:mo>&#x2264;</mml:mo>
                                                <mml:mi>x</mml:mi>
                                            </mml:mtd>
                                        </mml:mtr>
                                        <mml:mtr>
                                            <mml:mtd>
                                                <mml:mn>0</mml:mn>
                                                <mml:mspace width="0.5em"/>
                                                <mml:mtext>otherwise</mml:mtext>
                                            </mml:mtd>
                                        </mml:mtr>
                                    </mml:mtable>
                                </mml:mrow>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula>
                </p>
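                <p>Equation 5 translates directly into a few lines of Python. The sketch below is illustrative only: the function name <monospace>ecdf</monospace> and the toy population counts, chosen to mimic a frame with many low counts and one high count, are assumptions.</p>
                <code language="python"><![CDATA[
```python
def ecdf(counts, x):
    """Equation 5: share of the m sampling frame cells whose population
    count X_i is lower than or equal to x."""
    m = len(counts)
    # The generator yields 1 whenever the indicator I{X_i <= x} is 1.
    return sum(1 for value in counts if value <= x) / m


# Toy frame of m = 5 settled cells with a skewed distribution.
toy_counts = [3, 8, 8, 15, 120]
for x in (2, 8, 15, 120):
    print(x, ecdf(toy_counts, x))
```
]]></code>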
                <p>Given that the proposed sample design is not simple random but based on unequal selection probabilities, the estimator needs to be weighted according to the respective probability scheme
                    <sup>
                        <xref ref-type="bibr" rid="ref-51">51</xref>
                    </sup>. Typical parametric estimators, such as the mean or total, can be weighted using the Horvitz-Thompson estimator, by implementing the inverse of the probability scheme
                    <sup>
                        <xref ref-type="bibr" rid="ref-47">47</xref>
                    </sup>. This concept can be extended to nonparametric estimators by weighting the ECDF with the inverse of the selection probabilities, which produces a weighted empirical cumulative distribution function (WECDF) 
                    <inline-formula>
                        <mml:math display="inline" id="M13">
                            <mml:mrow>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:msub>
                                    <mml:mover accent="true">
                                        <mml:mi>G</mml:mi>
                                        <mml:mo>^</mml:mo>
                                    </mml:mover>
                                    <mml:mi>n</mml:mi>
                                </mml:msub>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mi>x</mml:mi>
                                <mml:mo stretchy="false">)</mml:mo>
                                <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula> for the number of sampled cells (
                    <italic toggle="yes">n</italic>) (
                    <xref ref-type="other" rid="M14">Equation 6</xref>).</p>
                <p>Equation 6
                    <inline-formula>
                        <mml:math display="inline" id="M14">
                            <mml:mspace width="10em"/>
                            <mml:mrow>
                                <mml:msub>
                                    <mml:mover accent="true">
                                        <mml:mi>G</mml:mi>
                                        <mml:mo>^</mml:mo>
                                    </mml:mover>
                                    <mml:mi>n</mml:mi>
                                </mml:msub>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mi>x</mml:mi>
                                <mml:mo stretchy="false">)</mml:mo>
                                <mml:mo>=</mml:mo>
                                <mml:mfrac>
                                    <mml:mn>1</mml:mn>
                                    <mml:mi>n</mml:mi>
                                </mml:mfrac>
                                <mml:mstyle displaystyle="false">
                                    <mml:msubsup>
                                        <mml:mo>&#x2211;</mml:mo>
                                        <mml:mrow>
                                            <mml:mi>i</mml:mi>
                                            <mml:mo>=</mml:mo>
                                            <mml:mn>1</mml:mn>
                                        </mml:mrow>
                                        <mml:mi>n</mml:mi>
                                    </mml:msubsup>
                                    <mml:mrow>
                                        <mml:msub>
                                            <mml:mi>W</mml:mi>
                                            <mml:mi>i</mml:mi>
                                        </mml:msub>
                                        <mml:mspace width="0.1em"/>
                                        <mml:mi>I</mml:mi>
                                        <mml:mrow>
                                            <mml:mo>{</mml:mo>
                                            <mml:mrow>
                                                <mml:msub>
                                                    <mml:mi>X</mml:mi>
                                                    <mml:mi>i</mml:mi>
                                                </mml:msub>
                                                <mml:mo>&#x2264;</mml:mo>
                                                <mml:mi>x</mml:mi>
                                            </mml:mrow>
                                            <mml:mo>}</mml:mo>
                                        </mml:mrow>
                                        <mml:mo>,</mml:mo>
                                        <mml:mspace width="0.5em"/>
                                        <mml:mtext>where</mml:mtext>
                                        <mml:mspace width="0.5em"/>
                                        <mml:msub>
                                            <mml:mi>W</mml:mi>
                                            <mml:mi>i</mml:mi>
                                        </mml:msub>
                                        <mml:mo>=</mml:mo>
                                        <mml:mn>1</mml:mn>
                                        <mml:mo>/</mml:mo>
                                        <mml:msubsup>
                                            <mml:mi>&#x03c0;</mml:mi>
                                            <mml:mi>i</mml:mi>
                                            <mml:mrow>
                                                <mml:mo stretchy="false">(</mml:mo>
                                                <mml:mi>S</mml:mi>
                                                <mml:mi>P</mml:mi>
                                                <mml:mi>P</mml:mi>
                                                <mml:mi>S</mml:mi>
                                                <mml:mo stretchy="false">)</mml:mo>
                                            </mml:mrow>
                                        </mml:msubsup>
                                    </mml:mrow>
                                </mml:mstyle>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula>
                </p>
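                <p>As a minimal sketch of Equation 6, the following Python function evaluates the WECDF at a threshold x from sampled cell values and their weights. The function name wecdf, the normalize option, and the four-cell sample are hypothetical and introduced only for illustration; the weights are assumed to have been computed beforehand as the inverse of the selection probabilities. The optional normalization by the sum of the weights is an assumption of this sketch and is not part of Equation 6.</p>
                <code language="python"><![CDATA[
```python
def wecdf(values, weights, x, normalize=False):
    """Weighted ECDF at x: (1/n) * sum_i W_i * I{X_i <= x} (Equation 6).

    With normalize=True the sum is divided by sum(W_i) instead of n.
    This variant is an assumption of the sketch, not part of Equation 6;
    it forces the curve to end at exactly 1.
    """
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal in length")
    # Sum the weights of all sampled cells whose value does not exceed x.
    weighted_hits = sum(w for v, w in zip(values, weights) if v <= x)
    denominator = sum(weights) if normalize else len(values)
    return weighted_hits / denominator


# Hypothetical sample of n = 4 settled cells (illustrative values only).
counts = [12.0, 40.0, 40.0, 95.0]   # X_i: population count per sampled cell
weights = [1.5, 1.0, 1.0, 0.5]      # W_i: assumed inverse selection probabilities

g_at_40 = wecdf(counts, weights, 40.0)
```
]]></code>
                <p>With the illustrative weights above, which sum to n = 4, g_at_40 evaluates to 0.875, and equal weights of one reduce the function to the unweighted ECDF.</p>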
                <p>In household surveys, the sample size is typically determined using a power analysis applied to the parametric estimator, which is assumed to be normally distributed for large sample sizes
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup>. For nonparametric estimators, such as the WECDF, a simulation study can be used to evaluate the sample size required to represent the population of interest accurately across the different strata
                    <sup>
                        <xref ref-type="bibr" rid="ref-17">17</xref>
                    </sup>. For this purpose, the same gridded population data used for sampling proportional to population size can serve as a proxy for the entire population of interest. The population counts across sampling frame cells are used to derive the ECDF for the entire population of interest and the WECDF for different sample sizes, and to compare the two distributions using a nonparametric statistic, the Kolmogorov-Smirnov distance (
                    <italic toggle="yes">D
                        <sub>m,n</sub>
                    </italic>)
                    <sup>
                        <xref ref-type="bibr" rid="ref-52">52</xref>
                    </sup> (
                    <xref ref-type="other" rid="M15">Equation 7</xref>).</p>
                <p>Equation 7
                    <inline-formula>
                        <mml:math display="inline" id="M15">
                            <mml:mspace width="15em"/>
                            <mml:mrow>
                                <mml:msub>
                                    <mml:mi>D</mml:mi>
                                    <mml:mrow>
                                        <mml:mi>m</mml:mi>
                                        <mml:mo>,</mml:mo>
                                        <mml:mi>n</mml:mi>
                                    </mml:mrow>
                                </mml:msub>
                                <mml:mo>=</mml:mo>
                                <mml:msub>
                                    <mml:mrow>
                                        <mml:mi>sup</mml:mi>
                                        <mml:mo>&#x2061;</mml:mo>
                                    </mml:mrow>
                                    <mml:mi>x</mml:mi>
                                </mml:msub>
                                <mml:mo>|</mml:mo>
                                <mml:msub>
                                    <mml:mover accent="true">
                                        <mml:mi>F</mml:mi>
                                        <mml:mo>^</mml:mo>
                                    </mml:mover>
                                    <mml:mi>m</mml:mi>
                                </mml:msub>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mi>x</mml:mi>
                                <mml:mo stretchy="false">)</mml:mo>
                                <mml:mo>&#x2212;</mml:mo>
                                <mml:msub>
                                    <mml:mover accent="true">
                                        <mml:mi>G</mml:mi>
                                        <mml:mo>^</mml:mo>
                                    </mml:mover>
                                    <mml:mi>n</mml:mi>
                                </mml:msub>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mi>x</mml:mi>
                                <mml:mo stretchy="false">)</mml:mo>
                                <mml:mo>|</mml:mo>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula>
                </p>
                <p>
                    <italic toggle="yes">D
                        <sub>m,n</sub>
                    </italic> is the maximum absolute difference between the ECDF 
                    <inline-formula>
                        <mml:math display="inline" id="M16">
                            <mml:mrow>
                                <mml:msub>
                                    <mml:mover accent="true">
                                        <mml:mi>F</mml:mi>
                                        <mml:mo>^</mml:mo>
                                    </mml:mover>
                                    <mml:mi>m</mml:mi>
                                </mml:msub>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mi>x</mml:mi>
                                <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula> for the entire population of interest across 
                    <italic toggle="yes">m</italic> settled cells, and the WECDF 
                    <inline-formula>
                        <mml:math display="inline" id="M17">
                            <mml:mrow>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:msub>
                                    <mml:mover accent="true">
                                        <mml:mi>G</mml:mi>
                                        <mml:mo>^</mml:mo>
                                    </mml:mover>
                                    <mml:mi>n</mml:mi>
                                </mml:msub>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mi>x</mml:mi>
                                <mml:mo stretchy="false">)</mml:mo>
                                <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula> for the population within a varying number of sampled cells 
                    <italic toggle="yes">n</italic>. As 
                    <italic toggle="yes">n</italic> increases iteratively, it is possible to assess the associated changes in 
                    <italic toggle="yes">D
                        <sub>m,n</sub>
                    </italic>. However, given that 
                    <italic toggle="yes">D
                        <sub>m,n</sub>
                    </italic> is extremely sensitive to the shape of the two distributions, the process of sampling 
                    <italic toggle="yes">n</italic> settled cells should be replicated and the resulting distances averaged to provide a robust assessment of 
                    <italic toggle="yes">D
                        <sub>m,n</sub>
                    </italic>. The use of nonparametric estimators (i.e., the ECDF and the WECDF) and of a nonparametric statistic (i.e., the Kolmogorov-Smirnov distance) typically requires large sample sizes to capture the entire range and variability of population counts within settled cells. This process can be optimized by estimating the sample size for each stratum independently
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>.</p>
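                <p>The simulation described above could be sketched as follows, under several assumptions that are not taken from the case study: the population counts are synthetic lognormal draws, cells are drawn with replacement with probability proportional to their count, and the WECDF is normalized by the sum of the weights so that it ends at one. The function names ks_distance and mean_ks and all parameter values are hypothetical.</p>
                <code language="python"><![CDATA[
```python
import random
from bisect import bisect_right
from itertools import accumulate


def ks_distance(population, sample):
    """sup_x |F_m(x) - G_n(x)| for the population ECDF and the sample WECDF.

    Counts must be positive. Weights are taken proportional to 1 / X_i
    (inverse of a size-proportional selection probability) and rescaled
    to sum to one, which is an assumption of this sketch.
    """
    pop_sorted = sorted(population)
    m = len(pop_sorted)
    pairs = sorted((x, 1.0 / x) for x in sample)
    xs = [x for x, _ in pairs]
    cum_w = list(accumulate(w for _, w in pairs))
    total_w = cum_w[-1]
    d = 0.0
    # Both curves are right-continuous step functions whose jumps lie at
    # population values, so checking every distinct value is sufficient.
    for x in set(pop_sorted):
        f = bisect_right(pop_sorted, x) / m
        k = bisect_right(xs, x)
        g = cum_w[k - 1] / total_w if k else 0.0
        d = max(d, abs(f - g))
    return d


def mean_ks(population, n, replicates, rng):
    """Average D_{m,n} over replicated size-proportional draws of n cells."""
    total = 0.0
    for _ in range(replicates):
        # Draw n cells with replacement, probability proportional to count.
        sample = rng.choices(population, weights=population, k=n)
        total += ks_distance(population, sample)
    return total / replicates


rng = random.Random(42)
# Synthetic stand-in for gridded population counts in m = 2000 settled cells.
population = [rng.lognormvariate(3.0, 1.0) + 1.0 for _ in range(2000)]
curve = {n: mean_ks(population, n, replicates=50, rng=rng) for n in (25, 100, 400)}
```
]]></code>
                <p>Under these assumptions, the averaged distance stored in curve is expected to shrink as n grows, which is the pattern used to judge when additional sampled cells stop improving the representation of the population distribution. Running the same loop on the cells of each stratum separately would give stratum-specific sample sizes.</p>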
            </sec>
        </sec>
        <sec>
            <title>Case study</title>
            <p>We demonstrate an application of the proposed grid-based sample design framework in two provinces in the western part of the DRC. The DRC is the second-largest country by area and the fourth most populous in Africa. However, up-to-date official population figures are lacking because the last census was carried out in 1984, more than thirty years ago. Demographic data are routinely derived from population estimates and projections (e.g., 
                <ext-link ext-link-type="uri" xlink:href="https://population.un.org/wpp">https://population.un.org/wpp</ext-link>), as well as national surveys
                <sup>
                    <xref ref-type="bibr" rid="ref-18">18</xref>
                </sup>. Six national surveys have been carried out in the DRC since 2004 &#x2014; two 
                <ext-link ext-link-type="uri" xlink:href="https://dhsprogram.com/data/available-datasets.cfm">Demographic and Health Surveys</ext-link> (DHS) in 2013&#x2013;2014 and 2017&#x2013;2018, a 
                <ext-link ext-link-type="uri" xlink:href="https://www.unicef.org/drcongo/en/reports/multiple-indicator-cluster-survey-2010">Multiple Indicator Cluster Survey (MICS) from UNICEF in 2010</ext-link>, two 
                <ext-link ext-link-type="uri" xlink:href="http://ghdx.healthdata.org/organizations/national-institute-statistics-congo-dr">Enqu&#x00ea;te 1-2-3 Surveys from the Congolese National Statistics Office</ext-link> in 2005 and 2012, and a 
                <ext-link ext-link-type="uri" xlink:href="https://www.wfp.org/publications/democratic-republic-congo-comprehensive-food-security-vulnerability-analysis-january-2014">Comprehensive Food Security and Vulnerability Analysis (CFSVA) from the World Food Programme</ext-link> in 2011&#x2013;2012. These surveys have been developed using outdated sampling frames based on the census data of 1984, which has been shown to introduce uncertainty in both the collected survey data and the derived demographic information
                <sup>
                    <xref ref-type="bibr" rid="ref-18">18</xref>
                </sup>.</p>
            <sec>
                <title>Study area</title>
                <p>The study area covers the Kongo-Central and Kinshasa provinces, in the Democratic Republic of the Congo. Together, these provinces constitute the most dynamic socio-economic region of the country. In this region, approximately 80% of the population lives in urban areas &#x2014; in the capital city of Kinshasa, the cities of Boma and Matadi, and a number of smaller cities and towns
                    <sup>
                        <xref ref-type="bibr" rid="ref-53">53</xref>
                    </sup>. 
                    <xref ref-type="fig" rid="f2">Figure 2</xref> shows that urban areas extend from south-west to north-east, from the harbour town of Moanda, across the Congo river basin, to the vast agglomeration of the capital city Kinshasa. The remainder of the study area lies on a sparsely populated plateau, where smaller towns (e.g., Kinganga and Mbankana) act as sub-regional centres for the surrounding villages and hamlets. In this sector, the vegetation is denser than in the Congo river basin, with rain forest prominent in the north-west and savannah in the south-east. These particular urbanization patterns, and the consequent geographic distribution of population, are connected with the diverse socio-economic, infrastructural, environmental, physical, and climatic characteristics of the study area
                    <sup>
                        <xref ref-type="bibr" rid="ref-53">53</xref>
                    </sup>.</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>The study area comprising the Kongo-Central and Kinshasa provinces.</title>
                        <p>Cities and towns are concentrated along the Congo river basin, while smaller towns are found on the sparsely populated plateau in the north-west and south-east of the study area. At elevated locations, vegetation is prominent, with rain forest in the north-west and savannah in the south-east.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14272/7878f20e-8765-4c7e-b614-7bcc9d727ab0_figure2.gif"/>
                </fig>
            </sec>
            <sec>
                <title>Gridded sampling frame</title>
                <p>We accessed a settlement layer produced by the Oak Ridge National Laboratory using feature extraction from high-resolution imagery for population modelling work undertaken in the Kinshasa and Kongo-Central provinces. The settlement layer consists of settlement polygons at approximately 7-meter resolution that were subsequently subset to the official province boundaries provided by the Central Bureau of the Census (BCR) of the Democratic Republic of the Congo. Comprehensive metadata are provided in 
                    <xref ref-type="table" rid="T1">Table 1</xref>. The polygons were rasterized onto a reference grid with a resolution of 3 arc-seconds, approximately 90 meters. A cell containing at least one settlement polygon was designated a settled cell, that is, a gridded sampling unit. 
                    <xref ref-type="fig" rid="f3">Figure 3</xref> shows the gridded sampling frame, which comprises 211,831 settled cells. A large number of settled cells can be observed in the cities of Kinshasa, Boma, and Matadi, while more scattered settlement patterns can be observed in the rest of the study area. In more urbanized areas, such as in the city of Boma (
                    <xref ref-type="fig" rid="f3">Figure 3A</xref>), the settled cells tend to match the extent of the settlement layer. Conversely, in suburban areas (
                    <xref ref-type="fig" rid="f3">Figure 3C</xref>), towns (
                    <xref ref-type="fig" rid="f3">Figure 3D</xref>), and rural areas (
                    <xref ref-type="fig" rid="f3">Figure 3B</xref>) the gaps between the settlement layer and the settled cells become larger because the built-up area is more scattered.</p>
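                <p>The designation of settled cells could be sketched as follows. This is a simplified illustration rather than the workflow used to build the sampling frame: each polygon is reduced to its bounding box, which is a reasonable approximation only because the footprints are much smaller than a grid cell, and the grid origin, the function name settled_cells, and the two footprints are hypothetical.</p>
                <code language="python"><![CDATA[
```python
import math

CELL = 3.0 / 3600.0  # 3 arc-seconds expressed in decimal degrees


def settled_cells(polygons, origin_lon, origin_lat, cell=CELL):
    """Return the set of (row, col) grid cells touched by any polygon.

    Each polygon is a list of (lon, lat) vertices and is approximated by
    its bounding box (an assumption of this sketch). Rows count southwards
    from origin_lat; columns count eastwards from origin_lon.
    """
    cells = set()
    for ring in polygons:
        lons = [lon for lon, _ in ring]
        lats = [lat for _, lat in ring]
        col_min = math.floor((min(lons) - origin_lon) / cell)
        col_max = math.floor((max(lons) - origin_lon) / cell)
        row_min = math.floor((origin_lat - max(lats)) / cell)
        row_max = math.floor((origin_lat - min(lats)) / cell)
        # One polygon is enough to mark every cell its bounding box overlaps.
        for row in range(row_min, row_max + 1):
            for col in range(col_min, col_max + 1):
                cells.add((row, col))
    return cells


# Two hypothetical footprints (lon, lat), each roughly 7 m across.
footprints = [
    [(13.05001, -5.85001), (13.05007, -5.85001),
     (13.05007, -5.85007), (13.05001, -5.85007)],
    [(13.05003, -5.85003), (13.05009, -5.85003),
     (13.05009, -5.85009), (13.05003, -5.85009)],
]
frame = settled_cells(footprints, origin_lon=13.0, origin_lat=-5.0)
```
]]></code>
                <p>Both hypothetical footprints fall in the same 3 arc-second cell, so frame holds a single settled cell. This mirrors the rule that one polygon suffices to designate a cell as settled, regardless of how many polygons it contains.</p>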
                <table-wrap id="T1" orientation="portrait" position="anchor">
                    <label>Table 1. </label>
                    <caption>
                        <title>Metadata for the datasets used in the case study.</title>
                        <p>The column &#x201c;Type&#x201d; indicates the characteristics addressed. The column &#x201c;Format&#x201d; describes the type of input data. The column &#x201c;Variable&#x201d; defines the type of variable. The column &#x201c;Source&#x201d; reports the links to the datasets used in the case study.</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1">Type</th>
                                <th align="left" colspan="1" rowspan="1">Name</th>
                                <th align="left" colspan="1" rowspan="1">Provider</th>
                                <th align="left" colspan="1" rowspan="1">Year</th>
                                <th align="left" colspan="1" rowspan="1">Format</th>
                                <th align="left" colspan="1" rowspan="1">Variable</th>
                                <th align="left" colspan="1" rowspan="1">Source</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">SE</td>
                                <td align="left" colspan="1" rowspan="1">Distance to
                                    <break/>conflict points</td>
                                <td align="left" colspan="1" rowspan="1">Armed Conflict
                                    <break/>Location and
                                    <break/>Event Data
                                    <break/>(ACLED) Project</td>
                                <td align="left" colspan="1" rowspan="1">2016</td>
                                <td align="left" colspan="1" rowspan="1">VECT</td>
                                <td align="left" colspan="1" rowspan="1">CONT</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.acleddata.com/data/">https://www.acleddata.com/data/</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">SE</td>
                                <td align="left" colspan="1" rowspan="1">Travel distance
                                    <break/>to cities</td>
                                <td align="left" colspan="1" rowspan="1">Malaria Atlas
                                    <break/>Project (MAP)</td>
                                <td align="left" colspan="1" rowspan="1">2015</td>
                                <td align="left" colspan="1" rowspan="1">RAST</td>
                                <td align="left" colspan="1" rowspan="1">CONT</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="http://doi.org/10.1038/nature25181">http://doi.org/10.1038/nature25181</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">INF</td>
                                <td align="left" colspan="1" rowspan="1">Distance to
                                    <break/>major roads</td>
                                <td align="left" colspan="1" rowspan="1">OSM/WorldPop</td>
                                <td align="left" colspan="1" rowspan="1">2016</td>
                                <td align="left" colspan="1" rowspan="1">RAST</td>
                                <td align="left" colspan="1" rowspan="1">CONT</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.worldpop.org/doi/10.5258/SOTON/WP00644">https://www.worldpop.org/doi/10.5258/SOTON/WP00644</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">INF</td>
                                <td align="left" colspan="1" rowspan="1">Light intensity at
                                    <break/>night</td>
                                <td align="left" colspan="1" rowspan="1">VIIRS/WorldPop</td>
                                <td align="left" colspan="1" rowspan="1">2016</td>
                                <td align="left" colspan="1" rowspan="1">RAST</td>
                                <td align="left" colspan="1" rowspan="1">CONT</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.worldpop.org/doi/10.5258/SOTON/WP00644">https://www.worldpop.org/doi/10.5258/SOTON/WP00644</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">ENV</td>
                                <td align="left" colspan="1" rowspan="1">Degree of
                                    <break/>urbanization</td>
                                <td align="left" colspan="1" rowspan="1">GHS-SMOD</td>
                                <td align="left" colspan="1" rowspan="1">2015</td>
                                <td align="left" colspan="1" rowspan="1">RAST</td>
                                <td align="left" colspan="1" rowspan="1">CAT</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="https://ghsl.jrc.ec.europa.eu/ucdb2018visual.php">https://ghsl.jrc.ec.europa.eu/ucdb2018visual.php</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">ENV</td>
                                <td align="left" colspan="1" rowspan="1">Land cover</td>
                                <td align="left" colspan="1" rowspan="1">ESA-CCI</td>
                                <td align="left" colspan="1" rowspan="1">2015</td>
                                <td align="left" colspan="1" rowspan="1">RAST</td>
                                <td align="left" colspan="1" rowspan="1">CAT</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.esa-landcover-cci.org">https://www.esa-landcover-cci.org</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">PHY</td>
                                <td align="left" colspan="1" rowspan="1">Elevation</td>
                                <td align="left" colspan="1" rowspan="1">SRTM/WorldPop</td>
                                <td align="left" colspan="1" rowspan="1">2000</td>
                                <td align="left" colspan="1" rowspan="1">RAST</td>
                                <td align="left" colspan="1" rowspan="1">CONT</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.worldpop.org/doi/10.5258/SOTON/WP00644">https://www.worldpop.org/doi/10.5258/SOTON/WP00644</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">PHY</td>
                                <td align="left" colspan="1" rowspan="1">Slope</td>
                                <td align="left" colspan="1" rowspan="1">SRTM/WorldPop</td>
                                <td align="left" colspan="1" rowspan="1">2000</td>
                                <td align="left" colspan="1" rowspan="1">RAST</td>
                                <td align="left" colspan="1" rowspan="1">CONT</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.worldpop.org/doi/10.5258/SOTON/WP00644">https://www.worldpop.org/doi/10.5258/SOTON/WP00644</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">CLIM</td>
                                <td align="left" colspan="1" rowspan="1">Rainfall</td>
                                <td align="left" colspan="1" rowspan="1">WorldClim</td>
                                <td align="left" colspan="1" rowspan="1">1960&#x2013;2000</td>
                                <td align="left" colspan="1" rowspan="1">RAST</td>
                                <td align="left" colspan="1" rowspan="1">CONT</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="http://worldclim.org/version2">http://worldclim.org/version2</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">CLIM</td>
                                <td align="left" colspan="1" rowspan="1">Temperature</td>
                                <td align="left" colspan="1" rowspan="1">WorldClim</td>
                                <td align="left" colspan="1" rowspan="1">1960&#x2013;2000</td>
                                <td align="left" colspan="1" rowspan="1">RAST</td>
                                <td align="left" colspan="1" rowspan="1">CONT</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="http://worldclim.org/version2">http://worldclim.org/version2</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">&#x2014;</td>
                                <td align="left" colspan="1" rowspan="1">Population
                                    <break/>counts</td>
                                <td align="left" colspan="1" rowspan="1">WorldPop</td>
                                <td align="left" colspan="1" rowspan="1">2016</td>
                                <td align="left" colspan="1" rowspan="1">RAST</td>
                                <td align="left" colspan="1" rowspan="1">CONT</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.worldpop.org/doi/10.5258/SOTON/WP00645">https://www.worldpop.org/doi/10.5258/SOTON/WP00645</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">&#x2014;</td>
                                <td align="left" colspan="1" rowspan="1">Settlement layer</td>
                                <td align="left" colspan="1" rowspan="1">ORNL/WorldPop</td>
                                <td align="left" colspan="1" rowspan="1">2016</td>
                                <td align="left" colspan="1" rowspan="1">VECT</td>
                                <td align="left" colspan="1" rowspan="1">CAT</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.3562191">https://doi.org/10.5281/zenodo.3562191</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">&#x2014;</td>
                                <td align="left" colspan="1" rowspan="1">Administrative
                                    <break/>boundaries</td>
                                <td align="left" colspan="1" rowspan="1">Central Bureau
                                    <break/>of the Census
                                    <break/>(BCR)</td>
                                <td align="left" colspan="1" rowspan="1">2018</td>
                                <td align="left" colspan="1" rowspan="1">VECT</td>
                                <td align="left" colspan="1" rowspan="1">CAT</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <xref ref-type="other" rid="TFN1">*</xref>
                                </td>
                            </tr>
                        </tbody>
                    </table>
                    <table-wrap-foot>
                        <fn id="TFN1">
                            <p>*Datasets not publicly available.</p>
                        </fn>
                        <fn id="FN2">
                            <p>SE, socio-economic; INF, infrastructural; ENV, environmental; PHY, physical; CLIM, climatic; VECT, vector; RAST, raster; CONT, continuous; CAT, categorical.</p>
                        </fn>
                    </table-wrap-foot>
                </table-wrap>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>The settled cells constituting the gridded sampling frame.</title>
                        <p>The gaps between the settlement layer and the settled cells vary considerably across the urban area of Boma (A), the suburban areas at the outskirts of Kinshasa (C), the town of Mbankana (D), and the rural area north of the town of Kimpese (B).</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14272/7878f20e-8765-4c7e-b614-7bcc9d727ab0_figure3.gif"/>
                </fig>
            </sec>
            <sec>
                <title>Contextual stratification</title>
                <p>We retrieved ten gridded datasets describing the socio-economic (i.e., distance to conflict points and light intensity at night), infrastructural (i.e., distance to major roads and travel distance to cities), environmental (i.e., land cover and degree of urbanization), physical (i.e., elevation and slope), and climatic (i.e., temperature and rainfall) characteristics of the study area. These datasets were selected because they represent key geospatial covariates in top-down population models developed by WorldPop
                    <sup>
                        <xref ref-type="bibr" rid="ref-38">38</xref>
                    </sup>. Comprehensive metadata are provided in 
                    <xref ref-type="table" rid="T1">Table 1</xref>. Gridded dataset attributes were extracted for the cells intersecting the settled cells, and categorical variables were converted to dummy (indicator) variables. A PCA performed on the resulting 16 gridded data attributes produced nine principal components that, together, explain 91.36% of the original variance. The nine principal components were then fed into a 
                    <italic toggle="yes">k</italic>-means clustering algorithm. 
                    <xref ref-type="fig" rid="f4">Figure 4</xref> shows the reduction in the within-cluster sum of squares as the number of clusters increases from one to ten. The &#x201c;elbow&#x201d; method suggests that three, five and eight clusters, with respectively 60.30%, 46.15% and 35.48% of the principal components&#x2019; variance explained, provide the best scenarios for capturing the variance in the principal components.</p>
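                <p>The elbow heuristic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: it skips the PCA step, runs a plain Lloyd's k-means on synthetic two-dimensional points that stand in for principal-component scores, and uses a deterministic farthest-point initialisation chosen only for reproducibility. The names <monospace>toy_scores</monospace>, <monospace>kmeans_wcss</monospace> and <monospace>wcss_curve</monospace> are hypothetical.</p>
                <code language="python"><![CDATA[
```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_wcss(points, k, n_iter=25):
    """Lloyd's k-means; returns the within-cluster sum of squares (WCSS)."""
    # Farthest-point initialisation (an assumption made for reproducibility).
    centres = [points[0]]
    while k > len(centres):
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centre.
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centres[j]))
            groups[nearest].append(p)
        # Update step: move each centre to its group mean (keep it if the group is empty).
        centres = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centres)
        ]
    return sum(min(sq_dist(p, c) for c in centres) for p in points)


rng = random.Random(42)
# Synthetic stand-in for principal-component scores: three well-separated groups.
blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
toy_scores = [
    (cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
    for cx, cy in blobs
    for _ in range(50)
]

wcss_curve = [kmeans_wcss(toy_scores, k) for k in range(1, 7)]
total = wcss_curve[0]  # with k = 1 the WCSS equals the total sum of squares
for k, wcss in enumerate(wcss_curve, start=1):
    print(f"k={k}: WCSS={wcss:9.1f}  share of total={wcss / total:6.1%}")
```
]]></code>
                <p>In this toy example the curve flattens after three clusters because the points were generated around three centres. With real covariates the elbow is usually less clear-cut, which is consistent with several candidate scenarios being retained for visual comparison.</p>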
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>Within-cluster sum of squares reduction for 
                            <italic toggle="yes">k</italic>-means clusters spanning between one and ten.</title>
                        <p>Three, five, and eight clusters are the best scenarios, according to the &#x201c;elbow&#x201d; method, for capturing the variance in the nine principal components derived from the gridded data attributes.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14272/7878f20e-8765-4c7e-b614-7bcc9d727ab0_figure4.gif"/>
                </fig>
                <p>
                    <xref ref-type="fig" rid="f5">Figure 5</xref> contrasts the spatial distribution of three, five and eight clusters across the urban area (
                    <xref ref-type="fig" rid="f5">Figure 5A</xref>), suburban area (
                    <xref ref-type="fig" rid="f5">Figure 5C</xref>), town (
                    <xref ref-type="fig" rid="f5">Figure 5D</xref>), and rural area (
                    <xref ref-type="fig" rid="f5">Figure 5B</xref>) presented in 
                    <xref ref-type="fig" rid="f3">Figure 3</xref>. The legends show the ratio of settled cells allocated to the different clusters. Overall, the three scenarios produce comparable results, with a clear distinction between urban and suburban areas versus towns and rural areas. However, within urban and suburban areas, five and eight clusters seem to produce less realistic geographic patterns, with improbably sharp cluster boundaries (
                    <xref ref-type="fig" rid="f5">Figure 5A5</xref>) and prominent &#x201c;salt and pepper&#x201d; effects (
                    <xref ref-type="fig" rid="f5">Figure 5C8</xref>). Some of these patterns persist across the three scenarios, for instance, the sharp cluster boundaries occurring in the suburban area (
                    <xref ref-type="fig" rid="f5">Figure 5C</xref>) and town (
                    <xref ref-type="fig" rid="f5">Figure 5D</xref>). Among the three scenarios, the three-cluster scenario appears to produce the most realistic contextual strata, which seem to reflect high (in red), medium (in blue), and low (in green) urban status.</p>
                <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                    <label>Figure 5. </label>
                    <caption>
                        <title>The spatial distribution of three, five and eight clusters for selected locations.</title>
                        <p>The legends show the ratio of settled cells allocated to the different clusters. Overall, the spatial patterns resulting from the three scenarios produce comparable outputs, with a clear distinction between the urban (Boma &#x2014; A) and suburban (outskirts of Kinshasa &#x2014; C) areas versus the town (Mbankana &#x2014; D) and rural area (North of Kimpese &#x2014; B).</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14272/7878f20e-8765-4c7e-b614-7bcc9d727ab0_figure5.gif"/>
                </fig>
            </sec>
            <sec>
                <title>Probability proportional to population size</title>
                <p>We accessed high-resolution gridded population estimates for 2016 from WorldPop and allocated population figures to the corresponding settled cells. Comprehensive metadata are provided in 
                    <xref ref-type="table" rid="T1">Table 1</xref>. 
                    <xref ref-type="fig" rid="f6">Figure 6</xref> shows the distribution of the population counts per settled cell across the contextual strata derived from the three-cluster scenario. Contextual strata labelled as high, medium, and low urban status include 26.91%, 40.14%, and 32.95% of the settled cells, respectively. Overall, the distribution of population counts per settled cell varies considerably across the three contextual strata, and this is consistent with the allocated labels of high, medium, and low urban status. The stratum characterized by high urban status has the highest median population count per cell (55.58) and the largest outliers, with a maximum of 1109.41. Conversely, the stratum characterized by low urban status shows a very low median population count per cell (0.15), with a maximum value of 13.97. The stratum with medium urban status also has a low median population count per cell (1.39), but its outliers are relatively large, with a maximum value of 146.74.</p>
                <fig fig-type="figure" id="f6" orientation="portrait" position="float">
                    <label>Figure 6. </label>
                    <caption>
                        <title>Distribution of population counts per sampling-frame cell across the contextual strata defined by the three-cluster scenario.</title>
                        <p>The large horizontal black lines show the median, the boxes the interquartile range, the whiskers the minimum and maximum, and the dots the outliers.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14272/7878f20e-8765-4c7e-b614-7bcc9d727ab0_figure6.gif"/>
                </fig>
            </sec>
            <sec>
                <title>Sampling evaluation</title>
                <p>We sampled settled cells from each contextual stratum proportionally to the respective population counts. 
                    <xref ref-type="fig" rid="f7">Figure 7</xref> contrasts the ECDF (black lines) with the WECDF (coloured lines). For each stratum, the ECDF lines depict the cumulative distribution of the population counts across all the settled cells, while the WECDF lines show the cumulative distributions of the population counts for samples of between 1 and 1000 settled cells. Overall, the WECDF lines become less dispersed towards higher values and are mostly above the ECDF lines. Conversely, the WECDF lines tend to be more scattered for low-to-medium values and are mostly located below the ECDF lines. These results reflect the oversampling of settled cells with the highest population counts that results from sampling proportionally to population size. This expected pattern is predominant in the stratum characterized by high urban status, while it appears to be negligible in the strata with medium and low urban status.</p>
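                <p>The comparison between the ECDF and the WECDF can be illustrated with a short sketch. Several choices here are assumptions made for illustration and are not taken from the article: the population counts are synthetic, cells are drawn with replacement, and the WECDF weights are taken to be inversely proportional to the draw probability. The helper names <monospace>ecdf</monospace> and <monospace>weighted_ecdf</monospace> are hypothetical.</p>
                <code language="python"><![CDATA[
```python
import random
from bisect import bisect_right


def ecdf(values):
    """Unweighted ECDF: share of values that do not exceed x."""
    ordered = sorted(values)
    return lambda x: bisect_right(ordered, x) / len(ordered)


def weighted_ecdf(values, weights):
    """Weighted ECDF: share of the total weight carried by values that do not exceed x."""
    pairs = sorted(zip(values, weights))
    xs = [v for v, _ in pairs]
    total = sum(w for _, w in pairs)
    cumulative, running = [], 0.0
    for _, w in pairs:
        running += w
        cumulative.append(running / total)

    def f(x):
        i = bisect_right(xs, x)
        return cumulative[i - 1] if i else 0.0

    return f


rng = random.Random(1)
# Hypothetical population counts for the settled cells of one stratum.
cell_pop = [rng.lognormvariate(1.0, 1.0) for _ in range(5000)]

# Probability proportional to size: draw 100 cells with replacement (an assumption).
sample = rng.choices(cell_pop, weights=cell_pop, k=100)

baseline = ecdf(cell_pop)
unweighted = ecdf(sample)
# Assumed weights: inverse of the draw probability, up to a constant.
weighted = weighted_ecdf(sample, [1.0 / v for v in sample])

x = sorted(cell_pop)[len(cell_pop) // 2]  # median population count
print(f"population ECDF at the median : {baseline(x):.2f}")
print(f"sample ECDF (unweighted)      : {unweighted(x):.2f}")
print(f"sample WECDF (weighted)       : {weighted(x):.2f}")
```
]]></code>
                <p>The unweighted sample ECDF sits well below the population ECDF at the median because densely populated cells are drawn more often. How closely a weighted curve tracks the baseline depends on the weighting scheme and on the sample size.</p>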
                <fig fig-type="figure" id="f7" orientation="portrait" position="float">
                    <label>Figure 7. </label>
                    <caption>
                        <title>Empirical cumulative distribution function (ECDF) and weighted ECDF (WECDF).</title>
                        <p>The ECDFs are depicted as black lines and the WECDFs as coloured lines. Sample sizes for the WECDFs span between 1 and 1000. Settled cells are selected independently for each contextual stratum (high, medium, and low urban status) by sampling proportionally to population size.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14272/7878f20e-8765-4c7e-b614-7bcc9d727ab0_figure7.gif"/>
                </fig>
            </sec>
            <sec>
                <title>Sample size estimation</title>
                <p>We computed the Kolmogorov-Smirnov distance between the baseline ECDF and the WECDF for sample sizes spanning between 1 and 1000 across the different strata. We replicated this procedure 1000 times for the different sample sizes and averaged the distance metrics to provide a robust assessment of the distance between the two functions. 
                    <xref ref-type="fig" rid="f8">Figure 8</xref> shows the mean Kolmogorov-Smirnov distance for sample sizes spanning between 1 and 1000 across the different contextual strata. Overall, average distances show similar patterns across the strata: they are low for extremely small sample sizes, then spike, and then gradually decrease as sample size increases. This suggests that, even after discarding very small sample sizes (which recover the reference population poorly) and very large sample sizes (which provide negligible improvements), it is difficult to identify an ideal sample size. However, 
                    <xref ref-type="fig" rid="f8">Figure 8</xref> suggests that a sample size threshold can be defined based on sensible distance values (e.g., between 0.10 and 0.20), and sample size can be allocated across strata to provide similar sampling performances. 
                    <xref ref-type="fig" rid="f8">Figure 8</xref> shows that, in order to achieve a sampling performance of 0.15, 139 settled cells should be sampled from the stratum with high urban status, 171 from the stratum with medium urban status, and 83 from the stratum with low urban status (0.25%, 0.20%, and 0.12% of the respective settled cells).</p>
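                <p>The procedure of averaging Kolmogorov-Smirnov distances over repeated draws and reading off a sample size for a target distance can be sketched as below. The sketch rests on assumptions that are not taken from the article: synthetic population counts, draws with replacement, inverse-probability weights for the WECDF, a short list of tested sample sizes, and 30 repetitions instead of 1000 to keep the run time short. The function names <monospace>ks_distance</monospace> and <monospace>mean_ks</monospace> are hypothetical.</p>
                <code language="python"><![CDATA[
```python
import random
from bisect import bisect_right


def ks_distance(population_sorted, sample, weights):
    """Largest vertical gap between the population ECDF and the weighted sample ECDF."""
    pairs = sorted(zip(sample, weights))
    xs = [v for v, _ in pairs]
    total = sum(w for _, w in pairs)
    cumulative, running = [], 0.0
    for _, w in pairs:
        running += w
        cumulative.append(running / total)
    n_pop = len(population_sorted)
    gap = 0.0
    # Both step functions only jump at population values, so checking those is enough.
    for x in population_sorted:
        pop_cdf = bisect_right(population_sorted, x) / n_pop
        i = bisect_right(xs, x)
        sample_cdf = cumulative[i - 1] if i else 0.0
        gap = max(gap, abs(pop_cdf - sample_cdf))
    return gap


def mean_ks(population_sorted, n, reps, rng):
    """Average KS distance over repeated PPS draws of n cells (with replacement)."""
    total = 0.0
    for _ in range(reps):
        sample = rng.choices(population_sorted, weights=population_sorted, k=n)
        total += ks_distance(population_sorted, sample, [1.0 / v for v in sample])
    return total / reps


rng = random.Random(7)
# Hypothetical population counts for one stratum.
stratum = sorted(rng.lognormvariate(1.0, 0.75) for _ in range(2000))

sizes = [5, 10, 25, 50, 100, 200, 400]
curve = {n: mean_ks(stratum, n, reps=30, rng=rng) for n in sizes}
target = 0.15  # distance threshold used in the text
smallest = next((n for n in sizes if target >= curve[n]), None)
for n in sizes:
    print(f"n={n:4d}  mean KS distance={curve[n]:.3f}")
print("smallest tested size reaching the target:", smallest)
```
]]></code>
                <p>Running the same routine separately for each stratum and picking the sample size that reaches a common target distance is one way to allocate sample sizes so that strata have similar sampling performance.</p>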
                <fig fig-type="figure" id="f8" orientation="portrait" position="float">
                    <label>Figure 8. </label>
                    <caption>
                        <title>Average Kolmogorov-Smirnov distance for each contextual stratum.</title>
                        <p>For each sample size between 1 and 1000, 1000 repetitions were carried out and the resulting distances averaged to produce a more robust assessment. The box highlights sample sizes resulting in reasonable distance metrics. The circles show the sample sizes resulting in a distance of 0.15.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14272/7878f20e-8765-4c7e-b614-7bcc9d727ab0_figure8.gif"/>
                </fig>
            </sec>
            <sec>
                <title>Sampled locations</title>
                <p>To obtain similar sampling performances, we sampled 139, 171 and 83 settled cells from the strata with high, medium, and low urban status, respectively, proportionally to population size. 
                    <xref ref-type="fig" rid="f9">Figure 9</xref> shows the sampled locations across the three strata and the sampling weights to be embedded in the estimator. The highest weights can be observed for the stratum of medium urban status, mostly across sparsely populated areas. Higher weights are also present in the stratum with high urban status, especially at the outskirts of Kinshasa. In this sector, the urban transition results in substantially lower population counts per settled cell than elsewhere in the same stratum. The lowest weights can be observed across the stratum with low urban status because its total population is by far the lowest.</p>
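                <p>The dependence of the sampling weights on the population count of a cell and on the stratum total can be made concrete with a small example. The weight formula is not restated in this section, so the sketch assumes the standard weight for probability-proportional-to-size draws with replacement, namely the inverse of the expected number of selections of a cell. All figures and the name <monospace>pps_weight</monospace> are hypothetical.</p>
                <code language="python"><![CDATA[
```python
def pps_weight(cell_pop, stratum_pop, n_draws):
    """Inverse of the expected number of times a cell is drawn under PPS with replacement."""
    return stratum_pop / (n_draws * cell_pop)


# Hypothetical stratum total and hypothetical population counts of four sampled cells.
stratum_pop = 50_000.0
cells = [120.0, 40.0, 5.0, 0.5]

weights = [pps_weight(p, stratum_pop, len(cells)) for p in cells]
for p, w in zip(cells, weights):
    print(f"cell population {p:6.1f} -> weight {w:10.1f}")

# Weighted cell populations add up to the stratum total under this assumed scheme.
estimated_total = sum(w * p for w, p in zip(weights, cells))
print(f"estimated stratum population: {estimated_total:.1f}")
```
]]></code>
                <p>Under this assumed formula, weights grow as the population count of a cell shrinks and as the stratum total grows, which matches the pattern described above: sparsely populated cells in populous strata receive the highest weights, and a stratum with a small total population receives low weights.</p>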
                <fig fig-type="figure" id="f9" orientation="portrait" position="float">
                    <label>Figure 9. </label>
                    <caption>
                        <title>Sampled settled cells across the different contextual strata.</title>
                        <p>The resulting sampling weights vary considerably across strata. Higher weights can be observed in areas of lower population counts per settled cell within the medium urban status stratum, while lower weights can be found in the sparsely populated low urban status stratum.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14272/7878f20e-8765-4c7e-b614-7bcc9d727ab0_figure9.gif"/>
                </fig>
            </sec>
        </sec>
        <sec sec-type="discussion | conclusions">
            <title>Discussion and conclusions</title>
            <sec>
                <title>Limits of traditional sample designs</title>
                <p>In low- and middle-income countries, sample designs for household surveys are traditionally set up in two stages for logistical and financial reasons
                    <sup>
                        <xref ref-type="bibr" rid="ref-9">9</xref>
                    </sup>. This form of multistage sampling involves an initial sampling from the primary frame, which consists of non-overlapping enumeration units defined proportionally to population size
                    <sup>
                        <xref ref-type="bibr" rid="ref-7">7</xref>
                    </sup>. These enumeration units are typically derived from the last national census, which is usually carried out on a decadal basis
                    <sup>
                        <xref ref-type="bibr" rid="ref-54">54</xref>
                    </sup>. In reality, the time-spans between censuses can be even larger as, according to the United Nations&#x2019; Department of Economic and Social Affairs, 
                    <ext-link ext-link-type="uri" xlink:href="https://unstats.un.org/unsd/demographic-social/census/censusdates">23 countries held their last census more than ten years ago</ext-link>. Even when collected regularly, census data rapidly become outdated because, typically, no more than two years should elapse between the definition of the sampling frame and the household survey sampling and implementation
                    <sup>
                        <xref ref-type="bibr" rid="ref-7">7</xref>
                    </sup>. For this reason, traditional sample designs for household surveys are to be considered representative only at sporadic frequencies and for relatively short periods.</p>
                <p>The uncertainty associated with non-representative sampling frames propagates through the sampling design to the estimator
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup>. As a consequence, the resulting household surveys can limit the accuracy of the derived demographic data
                    <sup>
                        <xref ref-type="bibr" rid="ref-18">18</xref>
                    </sup>. To tackle this issue, research in the domain of household sample design recently started to focus on the use of gridded population data to produce actionable sampling frames
                    <sup>
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup>. Given the geographically explicit nature of gridded sampling frames, sample designs for household surveys can arguably benefit from spatial sampling techniques traditionally applied in natural sciences
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>. To date, only a limited number of sample designs for household surveys have explicitly drawn on spatial sampling through the concepts of random field, sampling design, and estimator. Two such studies reflect the characteristics of the random field in sample design using parcel boundaries
                    <sup>
                        <xref ref-type="bibr" rid="ref-26">26</xref>
                    </sup> and air pollution levels
                    <sup>
                        <xref ref-type="bibr" rid="ref-16">16</xref>
                    </sup>. However, neither of these studies explicitly considered the geographic distribution of the reference population in its sample design.</p>
            </sec>
            <sec>
                <title>Adopting gridded sampling frames</title>
                <p>To tackle the limits of traditional sample designs, we proposed an innovative grid-based sample design framework for household surveys. This framework is centred around the concept of a gridded sampling frame, which has traditionally been adopted in natural sciences
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup> and, more recently, in sampling for household surveys
                    <sup>
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup>. The use of geographically explicit sampling units enabled us to revise the three pillars of traditional sample design (sampling frame, sampling design, and estimator) through the core components of spatial sampling
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>. A key element of the proposed framework is formalizing the population distribution as a random field and tackling spatial trends, spatial autocorrelation, and stratification of the reference population. These considerations are embedded in the sampling design, where contextual stratification
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup> and population-weighted sampling
                    <sup>
                        <xref ref-type="bibr" rid="ref-36">36</xref>
                    </sup> are used jointly to improve sampling efficiency. Both the sampling design and the sample size are evaluated with a nonparametric estimator to assess generalization to the entire reference population
                    <sup>
                        <xref ref-type="bibr" rid="ref-48">48</xref>,
                        <xref ref-type="bibr" rid="ref-49">49</xref>
                    </sup>.</p>
                <p>We demonstrated an application of our proposed sample design framework with a case study developed in two provinces in the western part of DRC. In this country, existing sampling frames are typically developed based on outdated census figures dating from 1984. As a result, much demographic information produced through the six national surveys carried out since 2004 is highly uncertain
                    <sup>
                        <xref ref-type="bibr" rid="ref-18">18</xref>
                    </sup>. We built a gridded sampling frame for the study area consisting of settled cells of approximately 90 meters spatial resolution. We then defined the two essential elements of our sampling design, namely the contextual strata based on a combination of PCA and the 
                    <italic toggle="yes">k</italic>-means algorithm, and the probability proportional to population size per settled cell retrieved from recent gridded population estimates. While the estimates are arguably uncertain because they are based on projections from the last national census, their geographic distribution is a reasonable approximation of the geographic distribution of population across the study area
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>,
                        <xref ref-type="bibr" rid="ref-11">11</xref>
                    </sup>. We assessed the sampling design by contrasting the ECDF for the population with the WECDFs for different sample sizes across the contextual strata. We also examined how sample size affects the recovery of the characteristics of the entire reference population across the different contextual strata. Lastly, we documented and described the geographic distribution of the sampled cells and the relative sampling weights to be embedded in the estimator.</p>
            </sec>
            <sec>
                <title>Challenges and next steps</title>
                <p>The case study underscores some challenges of the proposed grid-based sample design. First, the spatial accuracy of a gridded sampling frame is contingent upon the quality of the input settlement layer. The case study showed that the settlement layer makes it possible to detect settlement patterns at high spatial resolution across urban and rural locations. The use of settlement data of lower spatial resolution would reduce the accuracy of the sampling frame, especially in regions where the built-up area is more scattered. Second, the dimensionality reduction techniques employed to define contextual strata suffer from inherent limitations in detecting complex dimensionality structures. Alternative unsupervised classification methods should be tested
                    <sup>
                        <xref ref-type="bibr" rid="ref-55">55</xref>
                    </sup>. The sampling design can also be affected by the quality of the gridded population data used to define the probability scheme. Even if these gridded data are argued to be more accurate than the related administrative counts
                    <sup>
                        <xref ref-type="bibr" rid="ref-21">21</xref>
                    </sup>, their fitness for use is contingent upon a number of criteria listed elsewhere
                    <sup>
                        <xref ref-type="bibr" rid="ref-11">11</xref>
                    </sup>. The use of a nonparametric estimator to assess sampling efficiency also demonstrated systematic oversampling of settled cells with higher population counts when sampling proportionally to population size. This implies that larger sample sizes are required within heterogeneous strata.</p>
                <p>The proposed grid-sampling design inspired the selection of household survey locations in the Kongo-Central and Kinshasa provinces in 2018 as part of the 
                    <ext-link ext-link-type="uri" xlink:href="https://www.grid3.org/">Geo-Referenced Infrastructure and Demographic Data for Development</ext-link> (GRID3) project. In this project, household survey data collected across small and well-defined geographic areas were used as input data for bottom-up population models to predict basic demographic characteristics across the study area. The survey work conducted as part of this project enabled us to identify critical next steps in the household survey implementation. First, carrying out household surveys within grid cells can be challenging if clear guidelines are not defined in the survey protocol. Such guidelines include, for instance, assigning buildings to a cell using the location of their entrance door. The survey work also highlighted the difficulty of detecting square grid boundaries in complex settings, as they do not reflect identifiable physical boundaries on the ground (e.g., roads and water bodies). In addition, surveying individual grid cells can be resource-inefficient in sparsely populated areas. For this reason, a minimum population-count threshold could be enforced by aggregating neighbouring grid cells prior to the sampling design
                    <sup>
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup>. This feature was recently suggested by a tool for the automatic delineation of enumeration units
                    <sup>
                        <xref ref-type="bibr" rid="ref-56">56</xref>
                    </sup> and implemented in the latest update of the online version of GridSample, available at 
                    <ext-link ext-link-type="uri" xlink:href="https://gridsample.org/">https://gridsample.org/</ext-link>.</p>
            </sec>
        </sec>
        <sec>
            <title>Data availability</title>
            <sec>
                <title>Source data</title>
                <p>Most of the data used in our case study are freely available and can be accessed following the references presented in 
                    <xref ref-type="table" rid="T1">Table 1</xref>. The official administrative boundaries for the Kongo-Central and Kinshasa provinces are owned by the Central Bureau of the Census (BCR) of the Democratic Republic of the Congo and can be accessed upon reasonable request made to 
                    <email xlink:href="mailto:bcrinfo@ins-rdc.org">bcrinfo@ins-rdc.org</email>. Further information on the data created by the BCR is available at 
                    <ext-link ext-link-type="uri" xlink:href="http://ins-rdc.org">http://ins-rdc.org</ext-link>.</p>
            </sec>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgements</title>
            <p>This work is part of the GRID3 project (Geo-Referenced Infrastructure and Demographic Data for Development) funded by the Bill and Melinda Gates Foundation and the United Kingdom Department for International Development (DFID) [OPP1182408]. The project is a collaboration between WorldPop at the University of Southampton, the Flowminder Foundation, the United Nations Population Fund (UNFPA), and the Center for International Earth Science Information Network (CIESIN) within the Earth Institute at Columbia University. We thank the UCLA-DRC Health Research and Training Program, the Kinshasa School of Public Health (KSPH), and the DRC Bureau Central du Recensement (BCR) for coordinating and conducting the micro-census survey in Kongo-Central and Kinshasa provinces, for which this sample design framework was developed. We also acknowledge the help of Douglas R. Leasure, Maksym Bondarenko, Warren C. Jochem, and Heather R. Chamberlain at WorldPop, and Eric M. Weber at Oak Ridge National Laboratory.</p>
        </ack>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Robey</surname>
                            <given-names>B</given-names>
                        </name>
</person-group>:
                    <article-title>Two hundred years and counting: the 1990 census.</article-title>
                    <source>

                        <italic toggle="yes">Popul Bull.</italic>
</source>
                    <year>1989</year>;<volume>44</volume>(<issue>1</issue>):<fpage>3</fpage>&#x2013;<lpage>43</lpage>.
                    <pub-id pub-id-type="pmid">12282080</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Corsi</surname>
                            <given-names>DJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Neuman</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Finlay</surname>
                            <given-names>JE</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Demographic and health surveys: a profile.</article-title>
                    <source>

                        <italic toggle="yes">Int J Epidemiol.</italic>
</source>
                    <year>2012</year>;<volume>41</volume>(<issue>6</issue>):<fpage>1602</fpage>&#x2013;<lpage>1613</lpage>.
                    <pub-id pub-id-type="pmid">23148108</pub-id>
                    <pub-id pub-id-type="doi">10.1093/ije/dys184</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wright</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <article-title>Sampling and Census 2000: The Concepts.</article-title>
                    <source>

                        <italic toggle="yes">Am Sci.</italic>
</source>
                    <year>1998</year>;<volume>86</volume>(<issue>3</issue>):<fpage>245</fpage>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.jstor.org/stable/27857024?seq=1">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Delmelle</surname>
                            <given-names>EM</given-names>
                        </name>
</person-group>:
                    <article-title>Spatial Sampling.</article-title> In
                    <italic toggle="yes">Handbook of Regional Science</italic>; Fischer, M.M., Nijkamp, P., Eds.; Springer Berlin Heidelberg: Berlin, Heidelberg, <year>2014</year>; <fpage>1385</fpage>&#x2013;<lpage>1399</lpage>.
                    <pub-id pub-id-type="doi">10.1007/978-3-642-23430-9_73</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Thomson</surname>
                            <given-names>DR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Stevens</surname>
                            <given-names>FR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ruktanonchai</surname>
                            <given-names>NW</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>
                        <italic toggle="yes">GridSample</italic>: an R package to generate household survey primary sampling units (PSUs) from gridded population data.</article-title>
                    <source>

                        <italic toggle="yes">Int J Health Geogr.</italic>
</source>
                    <year>2017</year>;<volume>16</volume>(<issue>1</issue>):<fpage>25</fpage>.
                    <pub-id pub-id-type="pmid">28724433</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12942-017-0098-4</pub-id>
                    <pub-id pub-id-type="pmcid">5518145</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cochran</surname>
                            <given-names>WG</given-names>
                        </name>
</person-group>:
                    <article-title>Sampling techniques.</article-title> Wiley series in probability and mathematical statistics; 3rd ed.; Wiley: New York, <year>1977</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://archive.org/details/Cochran1977SamplingTechniques_201703/mode/2up">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Turner</surname>
                            <given-names>AG</given-names>
                        </name>
</person-group>:
                    <article-title>Sampling frames and master samples.</article-title> In
                    <italic toggle="yes">Designing Household Survey Samples: Practical Guidelines</italic>; UN: New York, <year>2008</year>; <fpage>75</fpage>&#x2013;<lpage>97</lpage>.
                    <ext-link ext-link-type="uri" xlink:href="https://books.google.co.in/books/about/Designing_Household_Survey_Samples.html?id=VDqJW6O5JYYC&amp;pg=PA75">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Turner</surname>
                            <given-names>AG</given-names>
                        </name>
</person-group>:
                    <article-title>Sampling strategies.</article-title> In
                    <italic toggle="yes">Designing Household Survey Samples: Practical Guidelines</italic>; UN: New York, <year>2003</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://mdgs.un.org/unsd/demographic/meetings/egm/Sampling_1203/docs/no_2.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yansaneh</surname>
                            <given-names>IS</given-names>
                        </name>
</person-group>:
                    <article-title>Overview of sample design issues for household surveys in developing and transition countries.</article-title> In UN Department of Economic and Social Affairs, Statistics Division: Household sample surveys in developing and transition countries; UN: New York, <year>2005</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://unstats.un.org/unsd/hhsurveys/pdf/chapter_2.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Thomson</surname>
                            <given-names>DR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rhoda</surname>
                            <given-names>DA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tatem</surname>
                            <given-names>AJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Gridded Population Survey Sampling: A Review of the Field and Strategic Research Agenda.</article-title>
                    <source>

                        <italic toggle="yes">Preprints.</italic>
</source>
                    <year>2019</year>; 2019110072.
                    <pub-id pub-id-type="doi">10.20944/preprints201911.0072.v1</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Leyk</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gaughan</surname>
                            <given-names>AE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Adamo</surname>
                            <given-names>SB</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The spatial allocation of population: a review of large-scale gridded population data products and their fitness for use.</article-title>
                    <source>

                        <italic toggle="yes">Earth Syst Sci Data.</italic>
</source>
                    <year>2019</year>;<volume>11</volume>(<issue>3</issue>):<fpage>1385</fpage>&#x2013;<lpage>1409</lpage>.
                    <pub-id pub-id-type="doi">10.5194/essd-11-1385-2019</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Weber</surname>
                            <given-names>EM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Seaman</surname>
                            <given-names>VY</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Stewart</surname>
                            <given-names>RN</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Census-independent population mapping in northern Nigeria.</article-title>
                    <source>

                        <italic toggle="yes">Remote Sens Environ.</italic>
</source>
                    <year>2018</year>;<volume>204</volume>:<fpage>786</fpage>&#x2013;<lpage>798</lpage>.
                    <pub-id pub-id-type="pmid">29302127</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.rse.2017.09.024</pub-id>
                    <pub-id pub-id-type="pmcid">5738969</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>JF</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Stein</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gao</surname>
                            <given-names>BB</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A review of spatial sampling.</article-title>
                    <source>

                        <italic toggle="yes">Spat Stat.</italic>
</source>
                    <year>2012</year>;<volume>2</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>14</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.spasta.2012.08.001</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>JF</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>TL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fu</surname>
                            <given-names>BJ</given-names>
                        </name>
</person-group>:
                    <article-title>A measure of spatial stratified heterogeneity.</article-title>
                    <source>

                        <italic toggle="yes">Ecol Indic.</italic>
</source>
                    <year>2016</year>;<volume>67</volume>:<fpage>250</fpage>&#x2013;<lpage>256</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.ecolind.2016.02.052</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>JF</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Haining</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cao</surname>
                            <given-names>Z</given-names>
                        </name>
</person-group>:
                    <article-title>Sample surveying to estimate the mean of a heterogeneous surface: reducing the error variance through zoning.</article-title>
                    <source>

                        <italic toggle="yes">Int J Geogr Inf Sci.</italic>
</source>
                    <year>2010</year>;<volume>24</volume>(<issue>4</issue>):<fpage>523</fpage>&#x2013;<lpage>543</lpage>.
                    <pub-id pub-id-type="doi">10.1080/13658810902873512</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kumar</surname>
                            <given-names>N</given-names>
                        </name>
</person-group>:
                    <article-title>Spatial Sampling Design for a Demographic and Health Survey.</article-title>
                    <source>

                        <italic toggle="yes">Popul Res Policy Rev.</italic>
</source>
                    <year>2007</year>;<volume>26</volume>(<issue>5&#x2013;6</issue>):<fpage>581</fpage>&#x2013;<lpage>599</lpage>.
                    <pub-id pub-id-type="doi">10.1007/s11113-007-9044-7</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Massey</surname>
                            <given-names>FJ</given-names>
                        </name>
</person-group>:
                    <article-title>The Kolmogorov-Smirnov Test for Goodness of Fit.</article-title>
                    <source>

                        <italic toggle="yes">J Am Stat Assoc.</italic>
</source>
                    <year>1951</year>;<volume>46</volume>(<issue>253</issue>):<fpage>68</fpage>&#x2013;<lpage>78</lpage>.
                    <pub-id pub-id-type="doi">10.1080/01621459.1951.10500769</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Marivoet</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>De Herdt</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <article-title>Tracing Down Real Socio-Economic Trends From Household Data With Erratic Sampling Frames: The Case of the Democratic Republic of the Congo.</article-title>
                    <source>

                        <italic toggle="yes">J Asian Afr Stud.</italic>
</source>
                    <year>2018</year>;<volume>53</volume>(<issue>4</issue>):<fpage>532</fpage>&#x2013;<lpage>552</lpage>.
                    <pub-id pub-id-type="doi">10.1177/0021909617698842</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-19">
                <label>19</label>
                <mixed-citation publication-type="book">
                    <collab>R Core Team</collab>:
                    <article-title>R: A Language and Environment for Statistical Computing.</article-title> R Foundation for Statistical Computing: Vienna, Austria, <year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.r-project.org/index.html">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-20">
                <label>20</label>
                <mixed-citation publication-type="book">
                    <collab>RStudio Team</collab>:
                    <article-title>RStudio: Integrated Development Environment for R.</article-title> RStudio, Inc.: Boston, MA, US, <year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.kdnuggets.com/2011/03/rstudio-ide-for-r.html">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Thomson</surname>
                            <given-names>DR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Stevens</surname>
                            <given-names>FR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Castro</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>gridsample: Tools for Grid-Based Survey Sampling Design.</article-title> <year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://rdrr.io/cran/gridsample/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hijmans</surname>
                            <given-names>RJ</given-names>
                        </name>
</person-group>:
                    <article-title>raster: Geographic Data Analysis and Modeling.</article-title> <year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://rdrr.io/cran/raster/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pebesma</surname>
                            <given-names>E</given-names>
                        </name>
</person-group>:
                    <article-title>Simple features for R: Standardized support for spatial vector data.</article-title>
                    <source>

                        <italic toggle="yes">R J.</italic>
</source>
                    <year>2018</year>;<volume>10</volume>(<issue>1</issue>):<fpage>439</fpage>&#x2013;<lpage>446</lpage>.
                    <ext-link ext-link-type="uri" xlink:href="https://journal.r-project.org/archive/2018/RJ-2018-009/RJ-2018-009.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Baddeley</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Spatial Point Process Modelling and Its Applications.</article-title> Universitat Jaume I, <year>2004</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://books.google.co.in/books?id=o5W9Odp0AXwC&amp;printsec=frontcover&amp;source=gbs_ge_summary_r&amp;cad=0#v=onepage&amp;q&amp;f=false">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Matheron</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>:
                    <article-title>Principles of geostatistics.</article-title>
                    <source>

                        <italic toggle="yes">Econ Geol.</italic>
</source>
                    <year>1963</year>;<volume>58</volume>(<issue>8</issue>):<fpage>1246</fpage>&#x2013;<lpage>1266</lpage>.
                    <pub-id pub-id-type="doi">10.2113/gsecongeo.58.8.1246</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lee</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Moudon</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Courbois</surname>
                            <given-names>JY</given-names>
                        </name>
</person-group>:
                    <article-title>Built environment and behavior: spatial sampling using parcel data.</article-title>
                    <source>

                        <italic toggle="yes">Ann Epidemiol.</italic>
</source>
                    <year>2006</year>;<volume>16</volume>(<issue>5</issue>):<fpage>387</fpage>&#x2013;<lpage>394</lpage>.
                    <pub-id pub-id-type="pmid">16005246</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.annepidem.2005.03.003</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-27">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rose</surname>
                            <given-names>AN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bright</surname>
                            <given-names>E</given-names>
                        </name>
</person-group>:
                    <article-title>The LandScan Global Population Distribution Project: Current State of the Art and Prospective Innovation.</article-title>
                    <source>

                        <italic toggle="yes">PAAA Proc.</italic>
</source>
                    <year>2014</year>;<fpage>21</fpage>.
                    <ext-link ext-link-type="uri" xlink:href="https://pdfs.semanticscholar.org/dbec/08b982769c197b8b891390e55e055581c5db.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-28">
                <label>28</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Freire</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pesaresi</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>GHS population grid, derived from GPW4, multitemporal (1975, 1990, 2000, 2015).</article-title> Documentation for the GHS Population Grid (GHS-POP); European Commission, Joint Research Centre (JRC): Ispra, Italy, <year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://data.europa.eu/euodp/en/data/dataset/jrc-ghsl-ghs_pop_gpw4_globe_r2015a">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-29">
                <label>29</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Stevens</surname>
                            <given-names>FR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gaughan</surname>
                            <given-names>AE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Linard</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data.</article-title>
                    <source>

                        <italic toggle="yes">PLoS One.</italic>
</source>
                    <year>2015</year>;<volume>10</volume>(<issue>2</issue>):<fpage>e0107042</fpage>.
                    <pub-id pub-id-type="pmid">25689585</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0107042</pub-id>
                    <pub-id pub-id-type="pmcid">4331277</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-30">
                <label>30</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Esch</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Heldens</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hirner</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Breaking new ground in mapping human settlements from space &#x2013; The Global Urban Footprint.</article-title>
                    <source>

                        <italic toggle="yes">ISPRS J Photogramm Remote Sens.</italic>
</source>
                    <year>2017</year>;<volume>134</volume>:<fpage>30</fpage>&#x2013;<lpage>42</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.isprsjprs.2017.10.012</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-31">
                <label>31</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Eicher</surname>
                            <given-names>CL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Brewer</surname>
                            <given-names>CA</given-names>
                        </name>
</person-group>:
                    <article-title>Dasymetric Mapping and Areal Interpolation: Implementation and Evaluation.</article-title>
                    <source>

                        <italic toggle="yes">Cartogr Geogr Inf Sci.</italic>
</source>
                    <year>2001</year>;<volume>28</volume>(<issue>2</issue>):<fpage>125</fpage>&#x2013;<lpage>138</lpage>.
                    <pub-id pub-id-type="doi">10.1559/152304001782173727</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-32">
                <label>32</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tobler</surname>
                            <given-names>WR</given-names>
                        </name>
</person-group>:
                    <article-title>A Computer Movie Simulating Urban Growth in the Detroit Region.</article-title>
                    <source>

                        <italic toggle="yes">Econ Geogr.</italic>
</source>
                    <year>1970</year>;<volume>46</volume>:<fpage>234</fpage>&#x2013;<lpage>240</lpage>.
                    <pub-id pub-id-type="doi">10.2307/143141</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-33">
                <label>33</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Griffith</surname>
                            <given-names>DA</given-names>
                        </name>
</person-group>:
                    <article-title>Effective Geographic Sample Size in the Presence of Spatial Autocorrelation.</article-title>
                    <source>

                        <italic toggle="yes">Ann Assoc Am Geogr.</italic>
</source>
                    <year>2005</year>;<volume>95</volume>(<issue>4</issue>):<fpage>740</fpage>&#x2013;<lpage>760</lpage>.
                    <pub-id pub-id-type="doi">10.1111/j.1467-8306.2005.00484.x</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-34">
                <label>34</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Brunsdon</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fotheringham</surname>
                            <given-names>AS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Charlton</surname>
                            <given-names>ME</given-names>
                        </name>
</person-group>:
                    <article-title>Geographically Weighted Regression: A Method for Exploring Spatial Nonstationarity.</article-title>
                    <source>

                        <italic toggle="yes">Geogr Anal.</italic>
</source>
                    <year>1996</year>;<volume>28</volume>(<issue>4</issue>):<fpage>281</fpage>&#x2013;<lpage>298</lpage>.
                    <pub-id pub-id-type="doi">10.1111/j.1538-4632.1996.tb00936.x</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-35">
                <label>35</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Delmelle</surname>
                            <given-names>EM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Goovaerts</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>Second-Phase Sampling Designs for Non-Stationary Spatial Variables.</article-title>
                    <source>

                        <italic toggle="yes">Geoderma.</italic>
</source>
                    <year>2009</year>;<volume>153</volume>(<issue>1&#x2013;2</issue>):<fpage>205</fpage>&#x2013;<lpage>216</lpage>.
                    <pub-id pub-id-type="pmid">20625537</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.geoderma.2009.08.007</pub-id>
                    <pub-id pub-id-type="pmcid">2901132</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-36">
                <label>36</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Elsey</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Poudel</surname>
                            <given-names>AN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ensor</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Improving household surveys and use of data to address health inequities in three Asian cities: protocol for the Surveys for Urban Equity (SUE) mixed methods and feasibility study.</article-title>
                    <source>

                        <italic toggle="yes">BMJ Open.</italic>
</source>
                    <year>2018</year>;<volume>8</volume>(<issue>11</issue>):<fpage>e024182</fpage>.
                    <pub-id pub-id-type="pmid">30478123</pub-id>
                    <pub-id pub-id-type="doi">10.1136/bmjopen-2018-024182</pub-id>
                    <pub-id pub-id-type="pmcid">6254496</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-37">
                <label>37</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rodeghiero</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cescatti</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Spatial variability and optimal sampling strategy of soil respiration.</article-title>
                    <source>

                        <italic toggle="yes">For Ecol Manag.</italic>
</source>
                    <year>2008</year>;<volume>255</volume>(<issue>1</issue>):<fpage>106</fpage>&#x2013;<lpage>112</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.foreco.2007.08.025</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-38">
                <label>38</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lloyd</surname>
                            <given-names>CT</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sorichetta</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tatem</surname>
                            <given-names>AJ</given-names>
                        </name>
</person-group>:
                    <article-title>High resolution global gridded data for use in population studies.</article-title>
                    <source>

                        <italic toggle="yes">Sci Data.</italic>
</source>
                    <year>2017</year>;<volume>4</volume>:<fpage>170001</fpage>.
                    <pub-id pub-id-type="pmid">28140386</pub-id>
                    <pub-id pub-id-type="doi">10.1038/sdata.2017.1</pub-id>
                    <pub-id pub-id-type="pmcid">5283062</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-39">
                <label>39</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ding</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>He</surname>
                            <given-names>X</given-names>
                        </name>
</person-group>:
                    <article-title>
                        <italic toggle="yes">K</italic>-means clustering via principal component analysis. </article-title>In
                    <source>
                        <italic toggle="yes">Proceedings of the twenty-first international conference on Machine learning</italic>.</source>ACM.<year>2004</year>;<fpage>29</fpage>.
                    <pub-id pub-id-type="doi">10.1145/1015330.1015408</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-40">
                <label>40</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pearson</surname>
                            <given-names>K</given-names>
                        </name>
</person-group>:
                    <article-title>LIII. 
                        <italic toggle="yes">On lines and planes of closest fit to systems of points in space</italic>.</article-title>
                    <source>

                        <italic toggle="yes">Lond Edinb Dublin Philos Mag J Sci.</italic>
</source>
                    <year>1901</year>;<volume>2</volume>(<issue>11</issue>):<fpage>559</fpage>&#x2013;<lpage>572</lpage>.
                    <pub-id pub-id-type="doi">10.1080/14786440109462720</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-41">
                <label>41</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tou</surname>
                            <given-names>JT</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gonzalez</surname>
                            <given-names>RC</given-names>
                        </name>
</person-group>:
                    <article-title>Pattern Recognition Principles. </article-title>Addison-Wesley.; Reading, MA.<year>1977</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://books.google.co.in/books/about/Pattern_Recognition_Principles.html?id=Bb9QAAAAYAAJ&amp;source=kp_book_description&amp;redir_esc=y">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-42">
                <label>42</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dem&#x0161;ar</surname>
                            <given-names>U</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Harris</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Brunsdon</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Principal Component Analysis on Spatial Data: An Overview.</article-title>
                    <source>

                        <italic toggle="yes">Ann Assoc Am Geogr.</italic>
</source>
                    <year>2013</year>;<volume>103</volume>(<issue>1</issue>):<fpage>106</fpage>&#x2013;<lpage>128</lpage>.
                    <pub-id pub-id-type="doi">10.1080/00045608.2012.689236</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-43">
                <label>43</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jolliffe</surname>
                            <given-names>IT</given-names>
                        </name>
</person-group>:
                    <article-title>Choosing a Subset of Principal Components or Variables.</article-title>In
                    <italic toggle="yes">Principal Component Analysis</italic>; Springer Series in Statistics; Springer: New York, NY.<year>2002</year>;<fpage>111</fpage>&#x2013;<lpage>149</lpage>.
                    <pub-id pub-id-type="doi">10.1007/0-387-22440-8_6</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-44">
                <label>44</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hartigan</surname>
                            <given-names>JA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wong</surname>
                            <given-names>MA</given-names>
                        </name>
</person-group>:
                    <article-title>Algorithm AS 136: A K-Means Clustering Algorithm.</article-title>
                    <source>

                        <italic toggle="yes">Appl Stat.</italic>
</source>
                    <year>1979</year>;<volume>28</volume>(<issue>1</issue>):<fpage>100</fpage>&#x2013;<lpage>108</lpage>.
                    <pub-id pub-id-type="doi">10.2307/2346830</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-45">
                <label>45</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Balk</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yetman</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>:
                    <article-title>The global distribution of population: evaluating the gains in resolution refinement.</article-title>
                    <source>

                        <italic toggle="yes">N Y Cent Int Earth Sci Inf Netw CIESIN Columbia Univ.</italic>
</source>
                    <year>2004</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://sedac.ciesin.columbia.edu/downloads/docs/gpw-v3/gpw3_documentation_final.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-46">
                <label>46</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tatem</surname>
                            <given-names>AJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Noor</surname>
                            <given-names>AM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>von Hagen</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>High resolution population maps for low income nations: combining land cover and census in East Africa.</article-title>
                    <source>

                        <italic toggle="yes">PLoS One.</italic>
</source>
                    <year>2007</year>;<volume>2</volume>(<issue>12</issue>):<fpage>e1298</fpage>.
                    <pub-id pub-id-type="pmid">18074022</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0001298</pub-id>
                    <pub-id pub-id-type="pmcid">2110897</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-47">
                <label>47</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Horvitz</surname>
                            <given-names>DG</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Thompson</surname>
                            <given-names>DJ</given-names>
                        </name>
</person-group>:
                    <article-title>A generalization of sampling without replacement from a finite universe.</article-title>
                    <source>

                        <italic toggle="yes">J Am Stat Assoc.</italic>
</source>
                    <year>1952</year>;<volume>47</volume>(<issue>260</issue>):<fpage>663</fpage>&#x2013;<lpage>685</lpage>.
                    <pub-id pub-id-type="doi">10.2307/2280784</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-48">
                <label>48</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chao</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Nonparametric Estimation of the Number of Classes in a Population.</article-title>
                    <source>

                        <italic toggle="yes">Scand J Stat.</italic>
</source>
                    <year>1984</year>;<volume>11</volume>(<issue>4</issue>):<fpage>265</fpage>&#x2013;<lpage>270</lpage>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.jstor.org/stable/4615964?seq=1">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-49">
                <label>49</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bollinger</surname>
                            <given-names>CR</given-names>
                        </name>
</person-group>:
                    <article-title>Measurement Error in the Current Population Survey: A Nonparametric Look.</article-title>
                    <source>

                        <italic toggle="yes">J Labor Econ.</italic>
</source>
                    <year>1998</year>;<volume>16</volume>(<issue>3</issue>):<fpage>576</fpage>&#x2013;<lpage>594</lpage>.
                    <pub-id pub-id-type="doi">10.1086/209899</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-50">
                <label>50</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Monti</surname>
                            <given-names>KL</given-names>
                        </name>
</person-group>:
                    <article-title>Folded Empirical Distribution Function Curves-Mountain Plots.</article-title>
                    <source>

                        <italic toggle="yes">Am Stat.</italic>
</source>
                    <year>1995</year>;<volume>49</volume>(<issue>4</issue>):<fpage>342</fpage>&#x2013;<lpage>345</lpage>.
                    <pub-id pub-id-type="doi">10.2307/2684570</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-51">
                <label>51</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Heeringa</surname>
                            <given-names>SG</given-names>
                        </name>

                        <name name-style="western">
                            <surname>West</surname>
                            <given-names>BT</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Berglund</surname>
                            <given-names>PA</given-names>
                        </name>
</person-group>:
                    <article-title>Applied survey data analysis. </article-title>CRC Press.; Boca Raton.<year>2017</year>.
                    <pub-id pub-id-type="doi">10.1201/9781315153278</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-52">
                <label>52</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Smirnov</surname>
                            <given-names>N</given-names>
                        </name>
</person-group>:
                    <article-title>Table for estimating the goodness of fit of empirical distributions.</article-title>
                    <source>

                        <italic toggle="yes">Ann Math Stat.</italic>
</source>
                    <year>1948</year>;<volume>19</volume>(<issue>2</issue>):<fpage>279</fpage>&#x2013;<lpage>281</lpage>.
                    <pub-id pub-id-type="doi">10.1214/aoms/1177730256</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-53">
                <label>53</label>
                <mixed-citation publication-type="journal">
                    <collab>The World Bank Group</collab>:
                    <article-title> Democratic Republic of Congo Urbanization Review &#x2014; Productive and Inclusive Cities for an Emerging Democratic Republic of Congo. </article-title>Directions in Development; Washington DC US.<year>2017</year>;<fpage>89</fpage>.
                    <pub-id pub-id-type="doi">10.1596/978-1-4648-1203-3</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-54">
                <label>54</label>
                <mixed-citation publication-type="journal">
                    <collab>United Nations</collab>:
                    <article-title> Principles and recommendations for population and housing censuses. </article-title>Department of Economic and Social Affairs, Statistics Division; UN: New York.<year>2008</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.un-ilibrary.org/population-and-demography/principles-and-recommendations-for-population-and-housing-censuses_be1ae14b-en">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-55">
                <label>55</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Duda</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Canty</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Unsupervised classification of satellite imagery: Choosing a good algorithm.</article-title>
                    <source>

                        <italic toggle="yes">Int J Remote Sens.</italic>
</source>
                    <year>2002</year>;<volume>23</volume>(<issue>11</issue>):<fpage>2193</fpage>&#x2013;<lpage>2212</lpage>.
                    <pub-id pub-id-type="doi">10.1080/01431160110078467</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-56">
                <label>56</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Qader</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lefebvre</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ninneman</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A Novel Approach to the Automatic Designation of Predefined Census Enumeration Areas and Population Sampling Frames: A Case Study in Somalia. </article-title>Policy Research Working Papers; The World Bank.<year>2019</year>.
                    <pub-id pub-id-type="doi">10.1596/1813-9450-8972</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report28546">
        <front-stub>
            <article-id pub-id-type="doi">10.21956/gatesopenres.14272.r28546</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>De Herdt</surname>
                        <given-names>Tom</given-names>
                    </name>
                    <xref ref-type="aff" rid="r28546a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-5288-7446</uri>
                </contrib>
                <aff id="r28546a1">
                    <label>1</label>Institute of Development Policy (IOB), University of Antwerp, Antwerp, Belgium</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>12</day>
                <month>3</month>
                <year>2020</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 De Herdt T</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport28546" related-article-type="peer-reviewed-article" xlink:href="10.12688/gatesopenres.13107.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The paper presents an innovative alternative to "bottom-up" sampling for household surveys: a two-stage "top-down" sampling technique that makes use of all the available geo-datasets and corrects, as far as possible, for potential errors.</p>
            <p>I find the exercise generally very convincing and also welcome, especially in cases like the DRC where bottom-up data are virtually absent.</p>
            <p>I also find the paper particularly well developed. It clearly indicates the original data sources, allowing, and almost inviting, readers to engage in further inquiry or replication.</p>
            <p>The only element I found lacking, perhaps, is a performance test of the new method against the sampling used in one or more existing surveys: would the "newly sampled" results, in the end, differ significantly from the results derived from the usual method? Such an exercise might give a good indication of the value and usefulness of this new sampling method.</p>
            <p>Is the rationale for developing the new method (or application) clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the method technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions about the method and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>If any results are presented, are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Are sufficient details provided to allow replication of the method development and its use by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>economics, experience in analysing household surveys, particularly in the DRC.</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report28545">
        <front-stub>
            <article-id pub-id-type="doi">10.21956/gatesopenres.14272.r28545</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Goovaerts</surname>
                        <given-names>Pierre</given-names>
                    </name>
                    <xref ref-type="aff" rid="r28545a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r28545a1">
                    <label>1</label>BioMedware, Inc., Ann Arbor, MI, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>11</day>
                <month>3</month>
                <year>2020</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Goovaerts P</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport28545" related-article-type="peer-reviewed-article" xlink:href="10.12688/gatesopenres.13107.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>This well-illustrated paper proposes a grid-based sample design framework in which contextual stratification and proportional to population size sampling are combined to achieve representative sampling for household surveys. The framework is targeted at low- and middle-income countries and is illustrated with a case study developed in two provinces located in the western part of the Democratic Republic of Congo.</p>
            <p> I only have a few suggestions to improve the paper: 
                <list list-type="order">
                    <list-item>
                        <p>The spatial nature of the data could be incorporated into the classification algorithm using spatially constrained clustering, either by incorporating a measure of geographical proximity directly into the computation of the dissimilarity matrix (e.g., Oliver and Webster, 1989
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-28545-1">1</xref>
                            </sup>) or by applying a contiguity-constrained hierarchical agglomerative clustering approach (e.g., Recchia, 2010
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-28545-2">2</xref>
                            </sup>). This should reduce the salt-and-pepper effect observed by the authors.</p>
                    </list-item>
                    <list-item>
                        <p>It might be worth exploring a minimum separation distance between sampling units to ensure a spatially representative sample while satisfying the other constraints (contextual stratification, proportional-to-population-size sampling).</p>
                    </list-item>
                    <list-item>
                        <p>The caption of Fig. 7 should be modified as follows: &#x201c;the 
                            <underline>W</underline>ECDF as coloured lines&#x201d;.</p>
                    </list-item>
                </list>
            </p>
            <p>Is the rationale for developing the new method (or application) clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the method technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions about the method and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>If any results are presented, are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Are sufficient details provided to allow replication of the method development and its use by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>geostatistics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-28545-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
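                        <person-group person-group-type="author">
                            <name>
                                <surname>Oliver</surname>
                                <given-names>MA</given-names>
                            </name>,
                            <name>
                                <surname>Webster</surname>
                                <given-names>R</given-names>
                            </name>
                        </person-group>: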
                        <article-title>A geostatistical basis for spatial weighting in multivariate classification</article-title>.
                        <source>
                            <italic>Mathematical Geology</italic>
                        </source>. <year>1989</year>;<volume>21</volume>(<issue>1</issue>):
                        <fpage>15</fpage>-<lpage>35</lpage>.
                        <pub-id pub-id-type="doi">10.1007/BF00897238</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-28545-2">
                    <label>2</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author">
                            <name>
                                <surname>Recchia</surname>
                                <given-names>A</given-names>
                            </name>
                        </person-group>:
                        <article-title>Contiguity-Constrained Hierarchical Agglomerative Clustering Using SAS</article-title>.
                        <source>
                            <italic>Journal of Statistical Software</italic>
                        </source>.<year>2010</year>;<volume>33</volume>(<issue>Code Snippet 2</issue>) :
                        <elocation-id>10.18637/jss.v033.c02</elocation-id>
                        <pub-id pub-id-type="doi">10.18637/jss.v033.c02</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report28510">
        <front-stub>
            <article-id pub-id-type="doi">10.21956/gatesopenres.14272.r28510</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Stein</surname>
                        <given-names>Alfred</given-names>
                    </name>
                    <xref ref-type="aff" rid="r28510a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-9456-1233</uri>
                </contrib>
                <aff id="r28510a1">
                    <label>1</label>Faculty of Geo-information Science and Earth Observation (ITC), University of Twente, Enschede, The Netherlands</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>4</day>
                <month>2</month>
                <year>2020</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Stein A</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport28510" related-article-type="peer-reviewed-article" xlink:href="10.12688/gatesopenres.13107.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>A review of &#x2018;A grid-based sample design framework for household surveys&#x2019;, by Gianluca Boo et al.</p>
            <p>The paper describes the setup and implementation of a household survey carried out in western Congo. It is a study of clear interest and relevance, although relatively simple in its different aspects. In fact, the introduction promises much more than the paper delivers. For instance, the role of geostatistics (and hence of spatial dependencies) disappears shortly after Equation 1. What comes out of it in the end, i.e. the implementation, can nevertheless serve as a &#x2018;framework&#x2019;. The case study also has its merits, and Figure 7 in particular is convincing. The following changes should be made to make the manuscript acceptable for indexing:
                <list list-type="bullet">
                    <list-item>
                        <p>Adjust the introduction such that it becomes more realistic and in line with the framework as presented.</p>
                    </list-item>
                    <list-item>
                        <p>The terms &#x2018;frame&#x2019; and &#x2018;framework&#x2019; need a definition.</p>
                    </list-item>
                    <list-item>
                        <p>The left side of Figure 8 shows a strange red line, increasing from about 0.18 to 0.35. This software artifact should be removed.</p>
                    </list-item>
                    <list-item>
                        <p>The discussion section mentions representative and non-representative samples. This deserves further consideration, as so far the sampling is done mainly in a design-based frame. There is literature, notably by Brus et al., that integrates design-based sampling with model-based sampling. I would appreciate it if the authors could add a paragraph on this frame to the discussion section.</p>
                    </list-item>
                    <list-item>
                        <p>Also: much is 
                            <italic>not</italic> considered in this paper, such as costs, cost-effectiveness, a justification for the choice of the KS-distance, the role of PCA, and (as often happens in developing countries) extending the sampling to more than one variable. This puts other constraints on the framework. The authors should address these aspects as well.</p>
                    </list-item>
                </list>
            </p>
            <p>Is the rationale for developing the new method (or application) clearly explained?</p>
            <p>Partly</p>
            <p>Is the description of the method technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions about the method and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>If any results are presented, are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Are sufficient details provided to allow replication of the method development and its use by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Spatial statistics, spatial sampling</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
    </sub-article>
</article>
