<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">Gates Open Res</journal-id>
            <journal-title-group>
                <journal-title>Gates Open Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2572-4754</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/gatesopenres.12999.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>"What is the best method of family planning for me?": a text mining analysis of messages between users and agents of a digital health service in Kenya</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Green</surname>
                        <given-names>Eric P</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Whitcomb</surname>
                        <given-names>Alexandra</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Kahumbura</surname>
                        <given-names>Cynthia</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Rosen</surname>
                        <given-names>Joseph G</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-4991-4033</uri>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Goyal</surname>
                        <given-names>Siddhartha</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Achieng</surname>
                        <given-names>Daphine</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Bellows</surname>
                        <given-names>Ben</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-9205-6623</uri>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Duke Global Health Institute, Duke University, Durham, NC, 27708, USA</aff>
                <aff id="a2">
                    <label>2</label>Nivi, 40 Tall Pine Drive, Sudbury, MA, 01776, USA</aff>
                <aff id="a3">
                    <label>3</label>AskNivi Limited, Windsor House, University Way, Nairobi, PO Box 34430-00100, Kenya</aff>
                <aff id="a4">
                    <label>4</label>Population Council, Plot #3670, Mwaleshi Rd, Lusaka, 10101, Zambia</aff>
                <aff id="a5">
                    <label>5</label>Population Council, 4301 Connecticut Ave NW # 280, Washington, DC, 20008, USA</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:eric.green@duke.edu">eric.green@duke.edu</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>All Nivi-affiliated authors have a financial interest in Nivi, the company that created the software that generated the data for this analysis.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>29</day>
                <month>5</month>
                <year>2019</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2019</year>
            </pub-date>
            <volume>3</volume>
            <elocation-id>1475</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>21</day>
                    <month>5</month>
                    <year>2019</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Green EP et al.</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://gatesopenresearch.org/articles/3-1475/pdf"/>
            <abstract>
                <p>
                    <bold>Background</bold>: Text message-based interventions have been shown to have consistently positive effects on health improvement and behavior change. Some studies suggest that personalization, tailoring, and interactivity can increase efficacy. With the rise in artificial intelligence and its incorporation into interventions, there is an opportunity to rethink how these characteristics are designed for greater effect. A key step in this process is to better understand how users engage with interventions. In this paper, we apply a text mining approach to characterize the ways that Kenyan men and women communicated with the first iterations of 
                    <italic toggle="yes">askNivi</italic>, a free sexual and reproductive health information service. </p>
                <p>
                    <bold>Methods</bold>: We tokenized and processed more than 179,000 anonymized messages that users exchanged with live agents, enabling us to count word frequency overall, by sex, and by age/sex cohorts. We also conducted two manual coding exercises: (1) we classified the intent of 3,834 user messages in a training dataset; and (2) we coded all conversations involving a random subset of 100 users who engaged in extended chats. </p>
                <p>
                    <bold>Results</bold>: Between September 2017 and January 2019, 28,021 users (mean age 22.5 years, 63% female) sent 87,180 messages to 
                    <italic toggle="yes">askNivi</italic>, and 18 agents sent 92,429 replies. Users wrote most often about family planning methods, contraception, side effects, pregnancy, menstruation, and sex, but we observed different patterns by sex and age. User intents largely reflected the marketing focus on reproductive health, but other topics emerged. Most users sought factual information, but requests for advice and symptom reports were common. </p>
                <p>
                    <bold>Conclusions</bold>: Young people in Kenya have a great desire for accurate and trustworthy information on health and wellbeing that is easy to access. Text mining is one way to better understand how users engage with interventions like 
                    <italic toggle="yes">askNivi</italic> and maximize what artificial intelligence has to offer.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>digital health</kwd>
                <kwd>reproductive health</kwd>
                <kwd>sms</kwd>
                <kwd>text mining</kwd>
                <kwd>kenya</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>Merck for Mothers</funding-source>
                </award-group>
                <award-group id="fund-2" xlink:href="http://dx.doi.org/10.13039/100000865">
                    <funding-source>Gates Foundation</funding-source>
                    <award-id>OPP1181398</award-id>
                </award-group>
                <funding-statement>This work was supported by the Gates Foundation [OPP1181398]. This work was supported by Merck for Mothers.</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>An active area of research and practice falling under the umbrella of digital health is the development and evaluation of mobile text message-based interventions for health improvement and behavior change. Health topics and behaviors targeted in these interventions have included disease management, medication adherence, smoking cessation, weight loss, sexual health, and contraception, among many others (
                <xref ref-type="bibr" rid="ref-8">Hall 
                    <italic toggle="yes">et al</italic>., 2015</xref>). Until recently, the most common channel of communication in these interventions has been short message service, better known as SMS or text messaging, a feature available on all mobile phones, which lets users read and compose alphanumeric messages of up to 160 characters. Since at least 2015, however, mobile phone messaging applications such as WhatsApp and Facebook Messenger have eclipsed SMS in terms of daily message volume (
                <xref ref-type="bibr" rid="ref-17">Perez, 2016</xref>; 
                <xref ref-type="bibr" rid="ref-24">Sparkes, 2015</xref>).</p>
            <p>Research findings pointing to the efficacy of text message-based interventions have been summarized in more than a dozen systematic reviews and meta-analyses, as well as several systematic reviews of reviews (
                <xref ref-type="bibr" rid="ref-8">Hall 
                    <italic toggle="yes">et al</italic>., 2015</xref>; 
                <xref ref-type="bibr" rid="ref-10">Househ, 2016</xref>). Meta-analyses have consistently reported average standardized effect sizes of 0.24 (0.16&#x2013;0.32; 
                <xref ref-type="bibr" rid="ref-1">Armanasco 
                    <italic toggle="yes">et al</italic>., 2017</xref>), 0.29 (0.22&#x2013;0.36; 
                <xref ref-type="bibr" rid="ref-16">Orr &amp; King, 2015</xref>), and 0.33 (0.27&#x2013;0.39; 
                <xref ref-type="bibr" rid="ref-9">Head 
                    <italic toggle="yes">et al</italic>., 2013</xref>) across prevention and health promotion interventions targeting diverse health topics. Given the low unit cost of communication via text message, these modest effect sizes suggest that text message-based interventions have the potential to be highly cost-effective.</p>
            <p>While the evidence is mixed (
                <xref ref-type="bibr" rid="ref-1">Armanasco 
                    <italic toggle="yes">et al</italic>., 2017</xref>), several reviews report that intervention personalization (e.g., including a user&#x2019;s name), tailoring (e.g., outbound messages are determined by a user&#x2019;s previous responses), and interactivity (e.g., two-way vs one-way messaging) may boost the efficacy of these interventions (
                <xref ref-type="bibr" rid="ref-8">Hall 
                    <italic toggle="yes">et al</italic>., 2015</xref>). It is important to consider, however, that these findings represent an initial signal from 
                <italic toggle="yes">first generation</italic> text message-based interventions. A 
                <italic toggle="yes">second generation</italic> of interventions is emerging with the rapid integration of artificial intelligence (AI) into health applications, which could enable greater personalization, smarter tailoring, and more engaging interactivity (
                <xref ref-type="bibr" rid="ref-20">Shaban-Nejad 
                    <italic toggle="yes">et al</italic>., 2018</xref>; 
                <xref ref-type="bibr" rid="ref-25">USAID, 2019</xref>; 
                <xref ref-type="bibr" rid="ref-26">Wahl 
                    <italic toggle="yes">et al</italic>., 2018</xref>).</p>
            <p>A well-known challenge in creating AI-informed applications for health is the need for large amounts of training data. A related challenge is that the training data must be relevant to the context of the intended end users. For instance, systems trained on data from US-based patients may not generalize to the needs of users living in low- and middle-income countries (
                <xref ref-type="bibr" rid="ref-25">USAID, 2019</xref>). To create this new generation of applications that will benefit diverse populations, we should apply design thinking principles (
                <xref ref-type="bibr" rid="ref-4">Brown, 2008</xref>) and seek to understand how users engage with our interventions.</p>
            <p>One approach for exploring and understanding user engagement is text mining. Text mining is the practice of using automated tools to examine large amounts of free-form text, summarize the contents, uncover interesting patterns, and generate new insights from the data. The most active areas of research in text mining for health have been the analysis of electronic medical records (
                <xref ref-type="bibr" rid="ref-11">Koleck 
                    <italic toggle="yes">et al</italic>., 2019</xref>; 
                <xref ref-type="bibr" rid="ref-12">Kreimeyer 
                    <italic toggle="yes">et al</italic>., 2017</xref>) and social media messages (
                <xref ref-type="bibr" rid="ref-22">Sinnenberg 
                    <italic toggle="yes">et al</italic>., 2017</xref>). There have been few published analyses of two-way communication transcripts. 
                <xref ref-type="bibr" rid="ref-27">Ye 
                    <italic toggle="yes">et al</italic>. (2010)</xref> published a systematic review of studies that examined e-mail communication between patients and providers. 
                <xref ref-type="bibr" rid="ref-3">Blanc 
                    <italic toggle="yes">et al</italic>. (2016)</xref> conducted a content analysis of messages that users in Nigeria sent to a sexual and reproductive health question and answer service called 
                <italic toggle="yes">MyQuestion</italic>. Despite a large and growing body of evidence around the efficacy of text message-based interventions, there has been scant attention to the study of user engagement through text mining.</p>
            <p>In this paper, we conduct a text mining analysis of inbound and outbound messages exchanged with 
                <italic toggle="yes">askNivi</italic>, a free sexual and reproductive health information service currently operating in Kenya and India (
                <xref ref-type="bibr" rid="ref-13">Nivi, 2018</xref>). Users can send free-form messages to 
                <italic toggle="yes">askNivi</italic> via SMS or Facebook Messenger and interact with automated conversation modules or live customer success agents. The aim of 
                <italic toggle="yes">askNivi</italic> is to provide health information, referrals to health products and services, and encouragement to take action that will promote health and well-being. The objective of this analysis is to characterize the ways that Kenyan men and women communicated with the first iterations of 
                <italic toggle="yes">askNivi</italic> about their health inquiries to inform future content development, tailoring, and automation.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <p>The data for this secondary analysis come from a query of the Kenya 
                <italic toggle="yes">askNivi</italic> database for all valid inbound and outbound SMS messages handled by customer success agents between September 2017 and January 2019. The query resulted in 179,609 total messages (87,180 inbound and 92,429 outbound). The data were anonymized prior to analysis.</p>
            <sec>
                <title>Language detection</title>
                <p>We conducted all data processing and analysis in R version 3.5 (
                    <xref ref-type="bibr" rid="ref-18">R Core Team, 2018</xref>). To detect the language of each message, we used the 
                    <monospace>cld2</monospace> package (v1.2; 
                    <xref ref-type="bibr" rid="ref-14">Ooms, 2018a</xref>) to access Google&#x2019;s Compact Language Detector 2 (
                    <xref ref-type="bibr" rid="ref-23">Sites, 2013</xref>), a na&#x00ef;ve Bayes classifier that probabilistically detects 83 languages, including English and Swahili. The classifier labeled 47% of inbound messages as English and 25% as Swahili, and it failed to detect either language in the remaining 28%. When the language of a message could not be detected automatically, we set the missing label to the dominant language detected in the same user&#x2019;s inbound messages during the same calendar week. For instance, if a person sent 10 messages in one week (six in English, two in Swahili, and two with no language detected), we set the two missing language labels to English.</p>
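                <p>The imputation rule described above can be sketched in a few lines. This is a minimal illustration in Python, not the R code used in the analysis; the record fields (user, week, lang) and the function name are hypothetical, and the tie-breaking behavior (the language seen first wins a tie) is an assumption because the text does not specify one.</p>
                <code language="python">
```python
from collections import Counter, defaultdict


def impute_language(messages):
    """Fill missing language labels with the dominant language detected
    in the same user's messages during the same calendar week.

    Each message is a dict with hypothetical keys "user", "week", "lang";
    "lang" is None when no language was detected.
    """
    # Count detected languages per (user, week) pair.
    counts = defaultdict(Counter)
    for m in messages:
        if m["lang"] is not None:
            counts[(m["user"], m["week"])][m["lang"]] += 1

    imputed = []
    for m in messages:
        lang = m["lang"]
        if lang is None:
            week_counts = counts.get((m["user"], m["week"]))
            if week_counts:
                # Assumption: ties go to the language encountered first.
                lang = week_counts.most_common(1)[0][0]
        # Messages with no detected language that week stay unlabeled.
        imputed.append({**m, "lang": lang})
    return imputed


# The worked example from the text: six English, two Swahili, two undetected.
example = (
    [{"user": "u1", "week": "2018-W20", "lang": "en"}] * 6
    + [{"user": "u1", "week": "2018-W20", "lang": "sw"}] * 2
    + [{"user": "u1", "week": "2018-W20", "lang": None}] * 2
    + [{"user": "u2", "week": "2018-W20", "lang": None}]
)
result = impute_language(example)
u1_languages = Counter(m["lang"] for m in result if m["user"] == "u1")
u2_languages = [m["lang"] for m in result if m["user"] == "u2"]
```
                </code>
                <p>In this sketch, a message keeps a missing label when the same user has no message with a detected language in that week, which corresponds to the small share of messages that remain unlabeled after imputation.</p>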
            </sec>
            <sec>
                <title>Text mining</title>
                <p>We used the 
                    <monospace>tidytext</monospace> package (v0.2.0; 
                    <xref ref-type="bibr" rid="ref-21">Silge &amp; Robinson, 2016</xref>) to analyze word frequency and relationships in all inbound messages. The first step was to tokenize each message into its component words, strip all punctuation, and convert the tokens to lowercase. We filtered out 1,149 English stop words (e.g., "a", "the", "and") from three lexicons compiled in the 
                    <monospace>tidytext</monospace> package and 74 Swahili stop words from a multi-language collection of stop words (
                    <xref ref-type="bibr" rid="ref-5">Diaz, 2017</xref>). We added custom stop words to these lists for both languages based on our initial review of the tokens. See 
                    <italic toggle="yes">Extended data</italic> for a full list (
                    <xref ref-type="bibr" rid="ref-7">Green, 2019</xref>).</p>
                <p>Prior to counting the frequency of individual word tokens, we also tokenized each message by consecutive words to identify the most common bigrams. This allowed us to tally key terms as pairs rather than as individual words. For instance, when the word &#x201c;family&#x201d; immediately preceded the word &#x201c;planning&#x201d;, we tallied &#x201c;family&#x201d; as part of the bigram &#x201c;family planning&#x201d;. However, if &#x201c;family&#x201d; occurred on its own, e.g., &#x201c;I do not want to start a family&#x201d;, then we tallied family as an individual term.</p>
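                <p>The bigram-first tallying rule can be illustrated with a short sketch. It is written in Python for illustration only and does not reproduce the tidytext workflow used in the analysis; the set of key bigrams, the tokenizer, and the function names are hypothetical, and stop word removal is omitted for brevity.</p>
                <code language="python">
```python
import re
from collections import Counter

# Hypothetical list of key bigrams; the analysis derived its own list
# from the most common bigrams in the corpus.
KEY_BIGRAMS = {("family", "planning")}


def tokenize(message):
    """Lowercase a message and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", message.lower())


def tally_terms(tokens, key_bigrams=KEY_BIGRAMS):
    """Count key bigrams as single terms and all other tokens individually."""
    counts = Counter()
    i = 0
    while i != len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in key_bigrams:
            # Both words are consumed by the bigram, so neither is
            # also counted as an individual term.
            counts[" ".join(pair)] += 1
            i += 2
        else:
            counts[tokens[i]] += 1
            i += 1
    return counts


counts = tally_terms(
    tokenize("I want family planning. I do not want to start a family.")
)
```
                </code>
                <p>Here the first occurrence of the word family is tallied as part of the bigram, and the second occurrence, which stands alone, is tallied as an individual term.</p>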
                <p>We used the 
                    <monospace>hunspell</monospace> package (v3.0; 
                    <xref ref-type="bibr" rid="ref-15">Ooms, 2018b</xref>) to detect possible misspellings and suggest corrections. Given our concerns about the reliability of the suggestions, we ultimately accepted corrections only for English words that appeared fewer than four times in the corpus. We did not accept any Swahili spelling suggestions.</p>
                <p>We used the 
                    <monospace>textstem</monospace> package (v0.1.4; 
                    <xref ref-type="bibr" rid="ref-19">Rinker, 2018</xref>) to lemmatize the English words, that is, to reduce each word to its base form (its lemma). We chose lemmatization over stemming because stemming can produce stems that are not words. Following this process, we ran an initial word frequency count and conducted a manual review to identify synonyms that could be combined into one label (e.g., &#x201c;period&#x201d; and &#x201c;menses&#x201d; combined into &#x201c;period&#x201d;; see 
                    <italic toggle="yes">Extended data</italic> for the full list; 
                    <xref ref-type="bibr" rid="ref-7">Green, 2019</xref>). After combining like terms, we conducted another frequency analysis using the 
                    <monospace>tidytext</monospace> package to get the final count of each word or key bigram in the corpus of messages.</p>
            </sec>
            <sec>
                <title>Intent analysis</title>
                <p>When a user sends a question or statement to 
                    <italic toggle="yes">askNivi</italic>, a natural language processing algorithm trained on past submissions classifies the user&#x2019;s intent. For instance, a user might send a message that reads, &#x201c;I want to find a good method of family planning&#x201d;. The intent behind this message (what the user wants to know or do) is &#x201c;find a method of contraception&#x201d;. To develop the training set for an automated intent classification model, we built a simple web application that enabled our agents to read each question and manually label the intent (
                    <xref ref-type="bibr" rid="ref-6">Green, 2018</xref>). Each question was presented for classification until two different agents agreed on the best intent label. Through this process, we manually classified 3,834 English and Swahili language messages in the training dataset. In this paper, we present a descriptive summary of user intent.</p>
            </sec>
            <sec>
                <title>Conversation analysis</title>
                <p>To analyze the structure of extended exchanges between users and agents, we selected a random sample of 50 men and 50 women who sent at least seven English messages to 
                    <italic toggle="yes">askNivi</italic> during one calendar week. A member of the team read the collection of 2,590 messages and qualitatively coded the number of distinct conversations each user had with agents, the topics discussed in each conversation, and the message-level components of each exchange (e.g., questions, responses, greetings).</p>
            </sec>
            <sec>
                <title>Preparation of data for sharing</title>
                <p>Anonymized message meta-data with calendar week time stamps is archived along with the tokenized word frequencies (
                    <xref ref-type="bibr" rid="ref-7">Green, 2019</xref>). To prevent accidental sharing of private information due to possible imperfect anonymization, we omitted terms that appear fewer than three times in the corpus.</p>
            </sec>
            <sec>
                <title>Ethical statement</title>
                <p>Our study protocol was screened by the Duke University Institutional Review Board and determined to be exempt from further review.</p>
            </sec>
        </sec>
        <sec sec-type="results">
            <title>Results</title>
            <sec>
                <title>Descriptive summary of 
                    <italic toggle="yes">askNivi</italic> usage</title>
                <p>
                    <bold>
                        <italic toggle="yes">Users.</italic>
                    </bold> Between week 38 of 2017 and week 5 of 2019, 28,021 users sent 87,180 messages to 
                    <italic toggle="yes">askNivi</italic>, and 18 agents sent 92,429 replies (
                    <xref ref-type="bibr" rid="ref-7">Green, 2019</xref>). Questions that required input from a medical advisor were handled by nurses at a local maternity hospital. Nurse replies accounted for 1.4% of these outbound messages. 
                    <xref ref-type="table" rid="T1">Table 1</xref> displays user characteristics. Nearly two-thirds of users were female, and roughly half indicated they preferred to receive messages in English. The average user was 22.5 years old (SD=6.4).</p>
                <table-wrap id="T1" orientation="portrait" position="anchor">
                    <label>Table 1. </label>
                    <caption>
                        <title>Characteristics of users.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Variable</th>
                                <th align="right" colspan="1" rowspan="1" valign="top">Values</th>
                                <th align="right" colspan="1" rowspan="1" valign="top">Missing</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Users, N</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">28,021</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">0</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Female, %</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">63.2</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">12,528</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Prefers English, %</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">46.7</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">303</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Mean Age (SD)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">22.5 (6.4)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">13,801</td>
                            </tr>
                        </tbody>
                    </table>
                    <table-wrap-foot>
                        <fn>
                            <p>

                                <italic toggle="yes">Note.</italic> User sex was not routinely captured during the entire period covered by the dataset.</p>
                        </fn>
                        <fn>
                            <p>SD, standard deviation</p>
                        </fn>
                    </table-wrap-foot>
                </table-wrap>
                <p>
                    <bold>
                        <italic toggle="yes">Message-level language.</italic>
                    </bold> Of the 87,180 inbound messages received by 
                    <italic toggle="yes">askNivi</italic>, 63% were written in English, 36% in Swahili, and 2% could not be labeled even after imputing the dominant language. Nearly one out of five users (18%) sent at least one message in a language that did not match their stated language preference, and in 14% of all inbound messages the detected language differed from the user&#x2019;s stated preference.</p>
                <p>
                    <bold>
                        <italic toggle="yes">Patterns of engagement.</italic>
                    </bold> The median user sent 2.0 messages during the period covered by the dataset (M=3.1, SD=4.3). Inbound message volume fluctuated over time. 
                    <xref ref-type="fig" rid="f1">Figure 1</xref> shows a spike in total inbound message volume during the middle of 2018, followed by a drop-off that corresponded to an intentional pause in marketing of the service. 
                    <italic toggle="yes">askNivi</italic> ended the study period with roughly 2,000 inbound messages per week.</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>Volume of inbound messages per week.</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14107/9c3dc769-df57-47ce-81c9-4c16ff262820_figure1.gif"/>
                </fig>
                <p>
                    <xref ref-type="fig" rid="f2">Figure 2</xref> shows three main patterns in communication: 40% of users sent one message and received one reply (orange), 9% sent multiple messages and received one reply (purple), and 46% sent and received multiple messages (green). This pattern was broadly similar for women and men, but men were 1.7 times as likely as women to send only one message to 
                    <italic toggle="yes">askNivi</italic>.</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>Alluvial diagram of inbound and outbound messages.</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14107/9c3dc769-df57-47ce-81c9-4c16ff262820_figure2.gif"/>
                </fig>
            </sec>
            <sec>
                <title>Text analysis</title>
                <p>
                    <xref ref-type="fig" rid="f3">Figure 3</xref> shows the 25 most frequently occurring pairs of adjacent words, or bigrams (A), and single words (B) in English-language messages sent to 
                    <italic toggle="yes">askNivi</italic>. Users wrote most often about family planning methods, contraception, side effects, pregnancy, menstruation, and sex.</p>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <p>Most frequently occurring bigrams (
                            <bold>A</bold>) and words (
                            <bold>B</bold>) in incoming messages (English). ksh, Kenyan shillings; HIV, human immunodeficiency virus; AIDS, acquired immunodeficiency syndrome.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14107/9c3dc769-df57-47ce-81c9-4c16ff262820_figure3.gif"/>
                </fig>
                <p>As 
                    <xref ref-type="fig" rid="f4">Figure 4</xref> shows, the most frequently used terms are broadly similar between English and Swahili messages. Eight of the top 10 English terms fell within the Swahili top 15 ranking.</p>
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>Rank order of the most common words used in incoming messages by language.</title>
                        <p>This is a forced-rank chart: words with the same frequency of usage receive distinct ranks, so no ties are awarded.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14107/9c3dc769-df57-47ce-81c9-4c16ff262820_figure4.gif"/>
                </fig>
                <p>It is informative to examine differences in word frequency by sex and age. 
                    <xref ref-type="fig" rid="f5">Figure 5</xref> presents age-disaggregated data for men, and 
                    <xref ref-type="fig" rid="f6">Figure 6</xref> presents age-disaggregated data for women. An easy way to read these plots is to start at the top left, which represents the word used most frequently by adolescents.</p>
                <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                    <label>Figure 5. </label>
                    <caption>
                        <title>Rank order of the most common words used in incoming messages (English) by age category, men.</title>
                        <p>This is a forced-rank chart: words with the same frequency of usage receive distinct ranks, so no ties are awarded.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14107/9c3dc769-df57-47ce-81c9-4c16ff262820_figure5.gif"/>
                </fig>
                <fig fig-type="figure" id="f6" orientation="portrait" position="float">
                    <label>Figure 6. </label>
                    <caption>
                        <title>Rank order of the most common words used in incoming messages (English) by age category, women.</title>
                        <p>This is a forced-rank chart: words with the same frequency of usage receive distinct ranks, so no ties are awarded.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14107/9c3dc769-df57-47ce-81c9-4c16ff262820_figure6.gif"/>
                </fig>
                <p>In the case of 
                    <xref ref-type="fig" rid="f5">Figure 5</xref>, the most frequent English word used by adolescent boys was &#x2018;sex&#x2019;. Follow this line across to see that it remains the most frequent word used by men of all age categories. The same is not true for the word &#x2018;pregnant&#x2019;, which was the third most common word used by adolescent boys (as in preventing pregnancy) but fell to eighth among men in their 20s and mid-30s, and all the way to a rank of 241 among older men. The chart can also be read from right to left, starting with the most frequent terms used by older men and tracing back through younger age groups. Doing so reveals that &#x2018;cancer&#x2019; and &#x2018;prostate&#x2019; were frequent topics for older men, but less so among younger men.</p>
                <p>
                    <xref ref-type="fig" rid="f6">Figure 6</xref> presents the same ranking of English word frequency for female age groups. The terms &#x2018;period&#x2019; and &#x2018;pregnant&#x2019; were common topics across all female age categories. However, young women were much more likely to write about how to identify their safe and unsafe days compared to older women. Conversely, older women chatted more frequently about family planning, conception, and cancers of the breast and cervix. Compared to men, women chatted less about sex overall, and the frequency declined with age.</p>
                <p>
                    <xref ref-type="fig" rid="f7">Figure 7</xref> visualizes common English bigrams as a network graph. The points (nodes) represent words, the lines (edges) represent the most frequent connections between words, and the arrows indicate the temporal ordering of the words. For instance, several words (&#x2018;unprotected&#x2019;, &#x2018;safe&#x2019;, &#x2018;oral&#x2019;, and &#x2018;play&#x2019;) point to the word &#x2018;sex&#x2019;, reflecting the different ways that users chatted about sex. The word &#x2018;prevent&#x2019; bridges two topic clusters that people want to avoid: pregnancy and HIV. For example, users asked, &#x201c;How can i prevent unwanted pregnancies?&#x201d; and &#x201c;Is the use of condoms during sex completely prevent any HIV infection??&#x201d;.</p>
                <fig fig-type="figure" id="f7" orientation="portrait" position="float">
                    <label>Figure 7. </label>
                    <caption>
                        <title>Network graph of most frequent English bigrams.</title>
                        <p>The points (nodes) represent words, the lines (edges) represent the most frequent connections between words, and the arrows indicate the temporal ordering of the words. KSH, Kenyan Shillings; p2, Postinor-2; HIV, human immunodeficiency virus; AIDS, acquired immunodeficiency syndrome; FP, family planning.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14107/9c3dc769-df57-47ce-81c9-4c16ff262820_figure7.gif"/>
                </fig>
                <p>In addition to exploring the relationships between pairs of adjacent words, we also examined how English words co-occur in conversations, regardless of their position in individual messages. 
                    <xref ref-type="fig" rid="f8">Figure 8</xref> shows the words that are most associated with several key terms such as 'contraception' and 'period'. For instance, when asking about contraception and method options, users often want to know about side effects and to learn which methods are effective. Conversations about periods often involve descriptive words like 'irregular', 'normal', and 'pain', and include questions about the possibility of pregnancy with missed periods. For instance, users asked questions like, &#x201c;Why are my periods irregular?&#x201d; and &#x201c;What are some ways that can reduce abdominal pains during menstrual periods?&#x201d;.</p>
                <fig fig-type="figure" id="f8" orientation="portrait" position="float">
                    <label>Figure 8. </label>
                    <caption>
                        <title>Top pairwise correlations (phi) between English words.</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14107/9c3dc769-df57-47ce-81c9-4c16ff262820_figure8.gif"/>
                </fig>
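                <p>For reference, the phi coefficient reported in Figure 8 is conventionally computed from a 2&#x00D7;2 table of co-occurrence counts. The following is a sketch of the standard definition, assuming that conversations are the unit of co-occurrence; the count symbols are notation introduced here for illustration and do not appear elsewhere in this article. For two words X and Y, let n<sub>11</sub> be the number of conversations containing both words, n<sub>00</sub> the number containing neither, n<sub>10</sub> the number containing X but not Y, and n<sub>01</sub> the number containing Y but not X:
                    <disp-formula>
                        <tex-math><![CDATA[\phi = \frac{n_{11} n_{00} - n_{10} n_{01}}{\sqrt{(n_{11}+n_{10})(n_{01}+n_{00})(n_{11}+n_{01})(n_{10}+n_{00})}}]]></tex-math>
                    </disp-formula>
                    Values near 1 indicate that two words tend to appear in the same conversations, and values near 0 indicate little association.</p>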
            </sec>
            <sec>
                <title>Intent analysis</title>
                <p>In order to build a training dataset that would enable automated classification of user intent, we manually classified a subset of 3,834 initial utterances from users (English and Swahili). 
                    <xref ref-type="fig" rid="f9">Figure 9</xref> displays the distribution of intents and indicates whether the intent was part of an 
                    <italic toggle="yes">askNivi</italic> marketing campaign. This figure shows that the most frequent intents were related to contraception, a major focus of 
                    <italic toggle="yes">askNivi</italic> marketing. It also shows, however, that users asked questions on topics that 
                    <italic toggle="yes">askNivi</italic> was not marketing at the time, including sexually transmitted infections, symptoms, sexual health, and relationships.</p>
                <fig fig-type="figure" id="f9" orientation="portrait" position="float">
                    <label>Figure 9. </label>
                    <caption>
                        <title>Distribution of classified intents by marketing history.</title>
                        <p>STI, sexually transmitted infection; UTI, urinary tract infection.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14107/9c3dc769-df57-47ce-81c9-4c16ff262820_figure9.gif"/>
                </fig>
                <p>
                    <italic toggle="yes">askNivi</italic> was initially marketed to adolescent girls and young women, but high demand among men led to an expansion of the intended market audience. 
                    <xref ref-type="fig" rid="f10">Figure 10</xref> shows the distribution of user intent by sex. Compared to women, men were more interested in questions related to sexual health, relationships, and sexually transmitted infections. However, overall, contraception was still the dominant theme among men and women.</p>
                <fig fig-type="figure" id="f10" orientation="portrait" position="float">
                    <label>Figure 10. </label>
                    <caption>
                        <title>Distribution of classified intents by sex.</title>
                        <p>STI, sexually transmitted infection; UTI, urinary tract infection.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14107/9c3dc769-df57-47ce-81c9-4c16ff262820_figure10.gif"/>
                </fig>
            </sec>
            <sec>
                <title>Conversation analysis</title>
                <p>We manually coded 2,590 English language messages exchanged between 
                    <italic toggle="yes">askNivi</italic> agents and 100 users (50 men, 50 women), selected at random from the pool of high-engagement users. The average age of this subset of users was 21.4 years (SD=3.7). 
                    <xref ref-type="table" rid="T2">Table 2</xref> summarizes key characteristics of these conversations.</p>
                <table-wrap id="T2" orientation="portrait" position="anchor">
                    <label>Table 2. </label>
                    <caption>
                        <title>Characteristics of conversations.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Group</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">
                                    <italic toggle="yes">N</italic>
                                </th>
                                <th align="center" colspan="1" rowspan="1" valign="top">Messages</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">Conversations</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">Convos/person
                                    <break/>Mean (SD)</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">Messages/convo
                                    <break/>Mean (SD)</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">Topics/convo
                                    <break/>Mean (SD)</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">All</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">100</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">2590</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">207</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">2.1 (1.8)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">12.5 (8.9)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">2.6 (1.8)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Men</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">50</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">1276</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">101</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">2.0 (1.5)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">12.6 (8.8)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">2.8 (1.9)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Men, 15&#x2013;19</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">10</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">271</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">18</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">1.8 (1.6)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">15.1 (9.6)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">3.1 (2.1)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Men, 20&#x2013;24</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">29</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">699</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">57</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">2.0 (1.5)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">12.3 (9.0)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">2.6 (1.9)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Men, 25&#x2013;35</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">8</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">259</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">22</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">2.8 (1.7)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">11.8 (8.3)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">3.1 (2.1)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Women</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">50</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">1314</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">106</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">2.1 (2.0)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">12.4 (9.1)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">2.3 (1.5)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Women, 15&#x2013;19</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">21</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">579</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">51</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">2.4 (2.7)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">11.4 (8.2)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">2.3 (1.5)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Women, 20&#x2013;24</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">19</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">506</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">41</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">2.2 (1.4)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">12.3 (9.0)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">2.4 (1.8)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Women, 25&#x2013;35</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">7</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">181</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">10</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">1.4 (0.8)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">18.1 (13.5)</td>
                                <td align="right" colspan="1" rowspan="1" valign="top">2.6 (1.0)</td>
                            </tr>
                        </tbody>
                    </table>
                    <table-wrap-foot>
                        <fn>
                            <p>

                                <italic toggle="yes">Note</italic>. Sex x Age 
                                <italic toggle="yes">N</italic>'s do not sum to 50 because of missing age values.</p>
                        </fn>
                        <fn>
                            <p>Convos/person, conversations per person; messages/convo, messages per conversation; topics/convo, topics per conversation; SD, standard deviation</p>
                        </fn>
                    </table-wrap-foot>
                </table-wrap>
                <p>We determined that these 100 users engaged in a total of 207 distinct conversations, for an average of 2.1 conversations per person (SD=1.8, median=1.0). On average, conversations consisted of 12.5 messages (SD=8.9): 6.6 messages sent by users and 5.9 replies sent by agents. Seventy-two percent of user messages came in the form of questions or requests. We classified these questions as shown in 
                    <xref ref-type="table" rid="T3">Table 3</xref>. Most user questions (60.1%) sought factual information, such as requests to define or explain concepts and questions about causes. Among the 516 requests for information, 4.7% asked about common myths (e.g., &#x201c;Is it true that if one have a kiss with someone positive, the are high chances of being affected?&#x201d;).</p>
                <table-wrap id="T3" orientation="portrait" position="anchor">
                    <label>Table 3. </label>
                    <caption>
                        <title>Distribution of user questions/requests.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Category</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Example</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Percent</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">1. Requests for factual information about causes</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Can someone using an implant conceive immediately?</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">48.9</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">2. Requests for factual information about the
                                    <break/>meaning of concepts/terms </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">What is family planning?</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">11.2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">3. Requests for advice</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">What are the best contraceptive options for avoiding
                                    <break/>pregnancy?</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">26.0</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">4. Questions about access to services and
                                    <break/>products</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Which facility should I visit please?</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1.5</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">5. Reporting symptoms, requesting diagnosis</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">I gave birth 1year ago from then I have never seen my
                                    <break/>periods am I safe or I have a problem?</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">10.6</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">6. Other</td>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">1.9</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>Over half (128) of the 207 conversations involved multiple topics. In the average conversation, users and agents discussed 2.6 topics (
                    <italic toggle="yes">SD</italic>=1.8). As shown in 
                    <xref ref-type="fig" rid="f11">Figure 11</xref>, the topics that appeared most frequently in multiple-topic conversations were contraception, fertility, sexually transmitted infection (STI), relationships, and sex pains. For instance, in multiple-topic conversations, contraception was most frequently paired with discussions of unsafe days, menstruation, and emergency contraception.</p>
                <fig fig-type="figure" id="f11" orientation="portrait" position="float">
                    <label>Figure 11. </label>
                    <caption>
                        <title>Distribution of topics that appeared most frequently in multiple-topic conversations, top 25 topics shown.</title>
                        <p>STI, sexually transmitted infection; HIV, human immunodeficiency virus.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/14107/9c3dc769-df57-47ce-81c9-4c16ff262820_figure11.gif"/>
                </fig>
            </sec>
        </sec>
        <sec sec-type="discussion">
            <title>Discussion</title>
            <p>This paper presents the results of a text mining analysis of 179,609 SMS messages, which users exchanged with customer success agents of 
                <italic toggle="yes">askNivi</italic>, a free digital health service operating in Kenya and India. First and foremost, our descriptive findings from several initial iterations of the Kenya 
                <italic toggle="yes">askNivi</italic> service are useful internally as we seek to improve and expand the product. In particular, content analysis is informing the development of automated conversations and the creation of an ontology that links user intents and topics to a cascade of conversations, thereby allowing us to further tailor the content of 
                <italic toggle="yes">askNivi</italic> and make automated, personalized recommendations.</p>
            <p>Externally, this paper makes two contributions. First, the methodology presented here and documented in the data and code repository (
                <xref ref-type="bibr" rid="ref-7">Green, 2019</xref>) offers researchers and practitioners a reproducible example of how to apply text mining techniques to text message-based interventions. We have not encountered any similar examples in the evaluation literature, which is largely focused on issues of feasibility, acceptability, and efficacy. Certainly, the techniques presented here could be useful for all three of these aims. Second, the analysis adds to our understanding of how our Kenyan users, in particular adolescents and young adults (mean age of 22.5 years), converse on private networks about topics related to sexual and reproductive health.</p>
            <p>We can compare our results directly to a comprehensive qualitative analysis of text messages sent to a question and answer service in Nigeria called 
                <italic toggle="yes">MyQuestion</italic> (
                <xref ref-type="bibr" rid="ref-3">Blanc 
                    <italic toggle="yes">et al</italic>., 2016</xref>). The most common topics on both platforms followed the dominant marketing focus. 
                <xref ref-type="bibr" rid="ref-3">Blanc 
                    <italic toggle="yes">et al</italic>. (2016)</xref> reported that 
                <italic toggle="yes">MyQuestion</italic> began as a platform for HIV/AIDS information; 
                <italic toggle="yes">askNivi</italic> was originally developed to help women learn about and access family planning. Both services also observed a substantial volume of messages about topics not marketed, such as health symptoms and diseases like cancer.</p>
            <p>The majority of messages sent to both services took the form of questions seeking factual information about the meaning of health concepts and causes of health conditions. In the case of 
                <italic toggle="yes">askNivi</italic>, many users wanted to talk about different aspects of contraception, from how to find the best method to effectiveness and side effects. We found that nearly all messages not requesting information could be grouped into Blanc 
                <italic toggle="yes">et al.</italic>&#x2019;s classification scheme of requests for advice, questions about access to services and products, and reports of symptoms/requests for a diagnosis.</p>
            <p>This paper adds to the 
                <italic toggle="yes">MyQuestion</italic> analysis by 
                <xref ref-type="bibr" rid="ref-3">Blanc 
                    <italic toggle="yes">et al</italic>. (2016)</xref> because the anonymized 
                <italic toggle="yes">askNivi</italic> messages were linked to data on users&#x2019; sex and age, allowing us to disaggregate the results. Doing so revealed differences along both demographic dimensions. For instance, young people chatted more about how to avoid pregnancy and practice safe sex, whereas older users asked more about symptoms and health issues like cancer. Men more often wanted information and advice on sexual health and relationships, while women more commonly sought contraception recommendations. These insights can be used to create targeted marketing campaigns and to develop content that will increase user engagement.</p>
            <p>Although we demonstrate that users who communicated in Swahili chatted about topics very similar to those of English users, a limitation of our paper is that we did not fully replicate each analysis for the Swahili corpus of messages. This is because we found that existing open source text mining tools for Swahili are less developed than the English-language tools. Another limitation of this study is that we may not have a representative sample of Kenyan users overall or by demographic group. Although the service was free to use, mobile phone access is nearly universal in Kenya, and marketing efforts included online digital marketing and offline community mobilization, 
                <italic toggle="yes">askNivi</italic> users are likely to be more educated on average than the Kenyan population. Furthermore, nearly two-thirds of users were female, following the initial marketing of the service. The men who took the initiative to contact 
                <italic toggle="yes">askNivi</italic> might be different from the general population of potential male users in unmeasured ways. Despite these limitations, the similarities observed with the 
                <italic toggle="yes">MyQuestion</italic> analysis in Nigeria are modest evidence for generalizability in an Anglophone African context.</p>
        </sec>
        <sec sec-type="conclusions">
            <title>Conclusions</title>
            <p>The early 
                <italic toggle="yes">askNivi</italic> experience demonstrates that young people in Kenya have a great need for accurate, trustworthy, and easily accessible information on health and wellbeing, replicating what has been observed in other contexts like Nigeria (
                <xref ref-type="bibr" rid="ref-3">Blanc 
                    <italic toggle="yes">et al</italic>., 2016</xref>). As services like 
                <italic toggle="yes">askNivi</italic> and 
                <italic toggle="yes">MyQuestion</italic> increase in popularity, users go beyond the marketed offering to reveal unmet needs for information, recommendations, and referrals. Text mining is a relatively simple approach for exploring these trends and probing how user needs and interests differ across groups like age cohorts and sex. As artificial intelligence is increasingly incorporated into text message-based interventions like 
                <italic toggle="yes">askNivi</italic>, the opportunities for intervention personalization, tailoring, and interaction will grow. Text mining is one way to better understand how users engage with these interventions and maximize what artificial intelligence has to offer.</p>
        </sec>
        <sec>
            <title>Data availability</title>
            <sec>
                <title>Underlying data</title>
                <p>Anonymized message metadata with calendar-week time stamps is archived along with the tokenized word frequencies. Raw message content is not available because of privacy concerns and restrictions on sharing user information publicly. However, the data repository includes (1) a tutorial with a sample of anonymized raw data to demonstrate how to conduct the same analysis on new data; and (2) anonymized message metadata and tokenized word frequencies to reproduce the analysis, figures, and tables presented in this manuscript.</p>
                <p>Zenodo: ericpgreen/asknivi-text-mining-2019: zenodo. 
                    <ext-link ext-link-type="uri" xlink:href="https://dx.doi.org/10.5281/zenodo.2653865">https://doi.org/10.5281/zenodo.2653865</ext-link> (
                    <xref ref-type="bibr" rid="ref-7">Green, 2019</xref>)</p>
                <p>This project contains the following underlying data within the &#x2018;ericpgreen-asknivi-text-mining-2019-v1-2\input&#x2019; folder:</p>
                <list list-type="bullet">
                    <list-item>
                        <label>-</label>
                        <p>convos_coded.csv (conversation data)</p>
                    </list-item>
                    <list-item>
                        <label>-</label>
                        <p>df_tok_en_wordUser.csv (words by user)</p>
                    </list-item>
                    <list-item>
                        <label>-</label>
                        <p>intents.csv (intent classifications)</p>
                    </list-item>
                    <list-item>
                        <label>-</label>
                        <p>marketed.csv (counts of marketed intents)</p>
                    </list-item>
                    <list-item>
                        <label>-</label>
                        <p>metadata.csv (metadata about messages)</p>
                    </list-item>
                    <list-item>
                        <label>-</label>
                        <p>tok_bi_en.csv (bigram counts, English)</p>
                    </list-item>
                    <list-item>
                        <label>-</label>
                        <p>tok_en.csv (single word frequency, English)</p>
                    </list-item>
                    <list-item>
                        <label>-</label>
                        <p>tok_en_f.csv (single word frequency, females by age category, English)</p>
                    </list-item>
                    <list-item>
                        <label>-</label>
                        <p>tok_en_g.csv (single word frequency, by gender, English)</p>
                    </list-item>
                    <list-item>
                        <label>-</label>
                        <p>tok_en_m.csv (single word frequency, males by age category, English)</p>
                    </list-item>
                    <list-item>
                        <label>-</label>
                        <p>tok_sw.csv (single word frequency, Swahili)</p>
                    </list-item>
                </list>
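                <p>As an illustration of how group-level word frequency files of this kind might be compared, the following Python sketch ranks words by the log ratio of their smoothed relative frequencies in two groups. It is not the code used for this manuscript. The counts are invented toy values rather than values from the archived files, and the function name log_ratio and the add-one smoothing are assumptions made for this sketch.</p>
                <code language="python">
```python
import math
from collections import Counter

# Hypothetical toy counts for illustration only; NOT taken from the archived files.
counts_by_group = {
    "female": Counter({"method": 40, "pregnancy": 25, "side": 20, "effects": 20, "sex": 5}),
    "male": Counter({"method": 10, "pregnancy": 15, "sex": 20, "relationship": 15}),
}


def log_ratio(counts_a, counts_b, smoothing=1.0):
    """Return the log2 ratio of smoothed relative frequencies (group a over group b)."""
    vocab = set(counts_a) | set(counts_b)
    # Additive smoothing keeps words that are absent from one group from producing log(0).
    total_a = sum(counts_a.values()) + smoothing * len(vocab)
    total_b = sum(counts_b.values()) + smoothing * len(vocab)
    ratios = {}
    for word in vocab:
        p_a = (counts_a[word] + smoothing) / total_a
        p_b = (counts_b[word] + smoothing) / total_b
        ratios[word] = math.log2(p_a / p_b)
    return ratios


ratios = log_ratio(counts_by_group["female"], counts_by_group["male"])
# Sort from most characteristic of the first group to most characteristic of the second.
ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
for word, value in ranked:
    print(f"{word:14s}{value:+.2f}")
```
                </code>
                <p>Positive values mark words used relatively more often by the first group, and negative values mark words used relatively more often by the second group.</p>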
            </sec>
            <sec>
                <title>Extended data</title>
                <p>Zenodo: ericpgreen/asknivi-text-mining-2019: zenodo. 
                    <ext-link ext-link-type="uri" xlink:href="https://dx.doi.org/10.5281/zenodo.2653865">https://doi.org/10.5281/zenodo.2653865</ext-link> (
                    <xref ref-type="bibr" rid="ref-7">Green, 2019</xref>)</p>
                <p>This project contains the following extended data within the &#x2018;ericpgreen-asknivi-text-mining-2019-v1-2&#x2019; folder:</p>
                <list list-type="bullet">
                    <list-item>
                        <label>-</label>
                        <p>README.md (list of all R packages, instructions required to reproduce the analysis, and a tutorial for running the message tokenization on a sample of raw data)</p>
                    </list-item>
                    <list-item>
                        <label>-</label>
                        <p>manuscript.Rmd (text and code to reproduce the analysis and manuscript)</p>
                    </list-item>
                </list>
                <p>This project contains the following extended data within the &#x2018;ericpgreen-asknivi-text-mining-2019-v1-2\input\example&#x2019; folder:</p>
                <list list-type="bullet">
                    <list-item>
                        <label>-</label>
                        <p>modifications-en.csv (custom modifications to relabel words and collapse synonyms)</p>
                    </list-item>
                    <list-item>
                        <label>-</label>
                        <p>stop.csv (custom stop words)</p>
                    </list-item>
                </list>
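                <p>The role of custom files like these in tokenization can be sketched as follows. This Python example is a minimal illustration under stated assumptions, not the tutorial code from the repository: the stop word set, the relabeling map, the second sample message, and the function name tokenize are hypothetical stand-ins, because the contents and column layout of stop.csv and modifications-en.csv are not reproduced here.</p>
                <code language="python">
```python
import re
from collections import Counter

# Hypothetical stand-in for a custom stop word list such as stop.csv.
STOP_WORDS = {"the", "is", "what", "of", "for", "me", "a", "i", "to", "can", "get"}

# Hypothetical stand-in for a relabeling file such as modifications-en.csv:
# each key is rewritten to a canonical label so that synonyms collapse into one token.
MODIFICATIONS = {"fp": "family_planning", "condoms": "condom", "preg": "pregnancy"}


def tokenize(message):
    """Lowercase a message, split it into words, relabel synonyms, and drop stop words."""
    words = re.findall(r"[a-z']+", message.lower())
    relabeled = [MODIFICATIONS.get(word, word) for word in words]
    return [word for word in relabeled if word not in STOP_WORDS]


# The first message echoes the article title; the second is invented for illustration.
messages = ["What is the best method of FP for me?", "Can I get condoms?"]
freq = Counter(token for message in messages for token in tokenize(message))
print(freq.most_common())
```
                </code>
                <p>Relabeling before stop word removal means that a canonical label can itself be listed as a stop word if an analyst wants to exclude it.</p>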
                <p>Data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International license</ext-link> (CC-BY 4.0).</p>
            </sec>
            <sec>
                <title>Software availability</title>
                <p>Source code available from: 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/ericpgreen/asknivi-text-mining-2019">https://github.com/ericpgreen/asknivi-text-mining-2019</ext-link>
                </p>
                <p>Archived source code at time of publication: 
                    <ext-link ext-link-type="uri" xlink:href="https://dx.doi.org/10.5281/zenodo.2653865">https://doi.org/10.5281/zenodo.2653865</ext-link> (
                    <xref ref-type="bibr" rid="ref-7">Green, 2019</xref>)</p>
                <p>License: 
                    <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International license</ext-link> (CC-BY 4.0)</p>
            </sec>
        </sec>
    </body>
    <back>
        <ref-list>
            <ref id="ref-1">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Armanasco</surname>
                            <given-names>AA</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Miller</surname>
                            <given-names>YD</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Fjeldsoe</surname>
                            <given-names>BS</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>Preventive Health Behavior Change Text Message Interventions: A Meta-analysis.</article-title>
                    <source>
				
                        <italic toggle="yes">Am J Prev Med.</italic>
			</source>
                    <year>2017</year>;<volume>52</volume>(<issue>3</issue>):<fpage>391</fpage>&#x2013;<lpage>402</lpage>.
                    <pub-id pub-id-type="pmid">28073656</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.amepre.2016.10.042</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Blanc</surname>
                            <given-names>AK</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Glazer</surname>
                            <given-names>K</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Ofomata-Aderemi</surname>
                            <given-names>U</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>Myths and Misinformation: An Analysis of Text Messages Sent to a Sexual and Reproductive Health Q&amp;A Service in Nigeria.</article-title>
                    <source>
				
                        <italic toggle="yes">Stud Fam Plann.</italic>
			</source>
                    <year>2016</year>;<volume>47</volume>(<issue>1</issue>):<fpage>39</fpage>&#x2013;<lpage>53</lpage>.
                    <pub-id pub-id-type="pmid">26952714</pub-id>
                    <pub-id pub-id-type="doi">10.1111/j.1728-4465.2016.00046.x</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Brown</surname>
                            <given-names>T</given-names>
                        </name>
			</person-group>:
                    <article-title>Design thinking.</article-title>
                    <source>
				
                        <italic toggle="yes">Harv Bus Rev.</italic>
			</source>
                    <year>2008</year>;<volume>86</volume>(<issue>6</issue>):<fpage>84</fpage>&#x2013;<lpage>92</lpage>, 141.
                    <pub-id pub-id-type="pmid">18605031</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Diaz</surname>
                            <given-names>G</given-names>
                        </name>
			</person-group>:
                    <article-title>Swahili stopwords collection</article-title>.<year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/stopwords-iso/stopwords-sw">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Green</surname>
                            <given-names>E</given-names>
                        </name>
			</person-group>:
                    <article-title>Building your own MTurk-style app in R using Shiny+Flexdashboard</article-title>.
                    <italic toggle="yes">Medium</italic>.<year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.webcitation.org/77zth3aaf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Green</surname>
                            <given-names>E</given-names>
                        </name>
			</person-group>:
                    <article-title>ericpgreen/asknivi-text-mining-2019: zenodo (Version v1.1)</article-title>.
                    <source>
				
                        <italic toggle="yes">Zenodo.</italic>
			</source>
                    <year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.2653865">http://www.doi.org/10.5281/zenodo.2653865</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Hall</surname>
                            <given-names>AK</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Cole-Lewis</surname>
                            <given-names>H</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Bernhardt</surname>
                            <given-names>JM</given-names>
                        </name>
			</person-group>:
                    <article-title>Mobile text messaging for health: a systematic review of reviews.</article-title>
                    <source>
				
                        <italic toggle="yes">Annu Rev Public Health.</italic>
			</source>
                    <year>2015</year>;<volume>36</volume>:<fpage>393</fpage>&#x2013;<lpage>415</lpage>.
                    <pub-id pub-id-type="pmid">25785892</pub-id>
                    <pub-id pub-id-type="doi">10.1146/annurev-publhealth-031914-122855</pub-id>
                    <pub-id pub-id-type="pmcid">4406229</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Head</surname>
                            <given-names>KJ</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Noar</surname>
                            <given-names>SM</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Iannarino</surname>
                            <given-names>NT</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>Efficacy of text messaging-based interventions for health promotion: a meta-analysis.</article-title>
                    <source>
				
                        <italic toggle="yes">Soc Sci Med.</italic>
			</source>
                    <year>2013</year>;<volume>97</volume>:<fpage>41</fpage>&#x2013;<lpage>48</lpage>.
                    <pub-id pub-id-type="pmid">24161087</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.socscimed.2013.08.003</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Househ</surname>
                            <given-names>M</given-names>
                        </name>
			</person-group>:
                    <article-title>The role of short messaging service in supporting the delivery of healthcare: An umbrella systematic review.</article-title>
                    <source>
				
                        <italic toggle="yes">Health Informatics J.</italic>
			</source>
                    <year>2016</year>;<volume>22</volume>(<issue>2</issue>):<fpage>140</fpage>&#x2013;<lpage>150</lpage>.
                    <pub-id pub-id-type="pmid">25038203</pub-id>
                    <pub-id pub-id-type="doi">10.1177/1460458214540908</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Koleck</surname>
                            <given-names>TA</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Dreisbach</surname>
                            <given-names>C</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Bourne</surname>
                            <given-names>PE</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>Natural language processing of symptoms documented in free-text narratives of electronic health records: a systematic review.</article-title>
                    <source>
				
                        <italic toggle="yes">J Am Med Inform Assoc.</italic>
			</source>
                    <year>2019</year>;<volume>26</volume>(<issue>4</issue>):<fpage>364</fpage>&#x2013;<lpage>379</lpage>.
                    <pub-id pub-id-type="pmid">30726935</pub-id>
                    <pub-id pub-id-type="doi">10.1093/jamia/ocy173</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Kreimeyer</surname>
                            <given-names>K</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Foster</surname>
                            <given-names>M</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Pandey</surname>
                            <given-names>A</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review.</article-title>
                    <source>
				
                        <italic toggle="yes">J Biomed Inform.</italic>
			</source>
                    <year>2017</year>;<volume>73</volume>:<fpage>14</fpage>&#x2013;<lpage>29</lpage>.
                    <pub-id pub-id-type="pmid">28729030</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.jbi.2017.07.012</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <mixed-citation publication-type="journal">
                    <collab>Nivi</collab>:
                    <article-title>Ask</article-title>.<year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.webcitation.org/77zqfMqGn">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Ooms</surname>
                            <given-names>J</given-names>
                        </name>
			</person-group>:
                    <article-title>Cld2: Google&#x2019;s compact language detector 2</article-title>.<year>2018a</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=cld2">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-15">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Ooms</surname>
                            <given-names>J</given-names>
                        </name>
			</person-group>:
                    <article-title>Hunspell: High-performance stemmer, tokenizer, and spell checker</article-title>.<year>2018b</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=hunspell">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-16">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Orr</surname>
                            <given-names>JA</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>King</surname>
                            <given-names>RJ</given-names>
                        </name>
			</person-group>:
                    <article-title>Mobile phone SMS messages can enhance healthy behaviour: a meta-analysis of randomised controlled trials.</article-title>
                    <source>
				
                        <italic toggle="yes">Health Psychol Rev.</italic>
			</source>
                    <year>2015</year>;<volume>9</volume>(<issue>4</issue>):<fpage>397</fpage>&#x2013;<lpage>416</lpage>.
                    <pub-id pub-id-type="pmid">25739668</pub-id>
                    <pub-id pub-id-type="doi">10.1080/17437199.2015.1022847</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-17">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Perez</surname>
                            <given-names>S</given-names>
                        </name>
			</person-group>:
                    <article-title>Facebook Messenger and WhatsApp combined see 3 times more messages than SMS</article-title>. TechCrunch.<year>2016</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://techcrunch.com/2016/04/12/facebook-messenger-and-whatsapp-combined-see-3-times-more-messages-than-sms/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-18">
                <mixed-citation publication-type="journal">
                    <collab>R Core Team</collab>:
                    <article-title>R: A language and environment for statistical computing</article-title>. Vienna, Austria: R Foundation for Statistical Computing.<year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.R-project.org/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-19">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Rinker</surname>
                            <given-names>TW</given-names>
                        </name>
			</person-group>:
                    <article-title>textstem: Tools for stemming and lemmatizing text</article-title>. Buffalo, New York.<year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://github.com/trinker/textstem">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-20">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Shaban-Nejad</surname>
                            <given-names>A</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Michalowski</surname>
                            <given-names>M</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Buckeridge</surname>
                            <given-names>DL</given-names>
                        </name>
			</person-group>:
                    <article-title>Health intelligence: How artificial intelligence transforms population and personalized health.</article-title>
                    <source>
				
                        <italic toggle="yes">NPJ Digit Med.</italic>
			</source>
                    <year>2018</year>;<volume>1</volume>(<issue>1</issue>):<fpage>53</fpage>.
                    <pub-id pub-id-type="doi">10.1038/s41746-018-0058-9</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-21">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Silge</surname>
                            <given-names>J</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Robinson</surname>
                            <given-names>D</given-names>
                        </name>
			</person-group>:
                    <article-title>Tidytext: Text mining and analysis using tidy data principles in R.</article-title>
                    <source>
				
                        <italic toggle="yes">J Open Source Softw.</italic>
			</source>
                    <year>2016</year>;<volume>1</volume>(<issue>3</issue>):<fpage>37</fpage>.
                    <pub-id pub-id-type="doi">10.21105/joss.00037</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-22">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Sinnenberg</surname>
                            <given-names>L</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Buttenheim</surname>
                            <given-names>AM</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Padrez</surname>
                            <given-names>K</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>Twitter as a Tool for Health Research: A Systematic Review.</article-title>
                    <source>
				
                        <italic toggle="yes">Am J Public Health.</italic>
			</source>
                    <year>2017</year>;<volume>107</volume>(<issue>1</issue>):<fpage>e1</fpage>&#x2013;<lpage>e8</lpage>.
                    <pub-id pub-id-type="pmid">27854532</pub-id>
                    <pub-id pub-id-type="doi">10.2105/AJPH.2016.303512</pub-id>
                    <pub-id pub-id-type="pmcid">5308155</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-23">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Sites</surname>
                            <given-names>D</given-names>
                        </name>
			</person-group>:
                    <article-title>Compact Language Detector 2</article-title>. Google.<year>2013</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/CLD2Owners/cld2">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-24">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Sparkes</surname>
                            <given-names>M</given-names>
                        </name>
			</person-group>:
                    <article-title>WhatsApp overtakes text messages</article-title>.<year>2015</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.telegraph.co.uk/technology/news/11340321/WhatsApp-overtakes-text-messages.html">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-25">
                <mixed-citation publication-type="journal">
                    <collab>USAID</collab>:
                    <article-title>Artificial intelligence in global health: Defining a collective path forward</article-title>. Washington, D.C.: USAID. <year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.usaid.gov/sites/default/files/documents/1864/AI-in-Global-Health_webFinal_508.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-26">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Wahl</surname>
                            <given-names>B</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Cossy-Gantner</surname>
                            <given-names>A</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Germann</surname>
                            <given-names>S</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>Artificial intelligence (AI) and global health: how can AI contribute to health in resource-poor settings?</article-title>
                    <source>
				
                        <italic toggle="yes">BMJ Glob Health.</italic>
			</source>
                    <year>2018</year>;<volume>3</volume>(<issue>4</issue>):<fpage>e000798</fpage>.
                    <pub-id pub-id-type="pmid">30233828</pub-id>
                    <pub-id pub-id-type="doi">10.1136/bmjgh-2018-000798</pub-id>
                    <pub-id pub-id-type="pmcid">6135465</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-27">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
				
                        <name name-style="western">
                            <surname>Ye</surname>
                            <given-names>J</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Rust</surname>
                            <given-names>G</given-names>
                        </name>
				
                        <name name-style="western">
                            <surname>Fry-Johnson</surname>
                            <given-names>Y</given-names>
                        </name>
				
                        <etal/>
			</person-group>:
                    <article-title>E-mail in patient-provider communication: a systematic review.</article-title>
                    <source>
				
                        <italic toggle="yes">Patient Educ Couns.</italic>
			</source>
                    <year>2010</year>;<volume>80</volume>(<issue>2</issue>):<fpage>266</fpage>&#x2013;<lpage>273</lpage>.
                    <pub-id pub-id-type="pmid">19914022</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.pec.2009.09.038</pub-id>
                    <pub-id pub-id-type="pmcid">4127895</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report27486">
        <front-stub>
            <article-id pub-id-type="doi">10.21956/gatesopenres.14107.r27486</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Temmerman</surname>
                        <given-names>Marleen</given-names>
                    </name>
                    <xref ref-type="aff" rid="r27486a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2069-8752</uri>
                </contrib>
                <aff id="r27486a1">
                    <label>1</label>International Centre for Reproductive Health, Mombasa, Kenya</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>8</day>
                <month>8</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Temmerman M</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport27486" related-article-type="peer-reviewed-article" xlink:href="10.12688/gatesopenres.12999.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>This is an excellent, well-written paper addressing an important new approach in the field of health education, more specifically sexual and reproductive health, using not only AI but also a methodology based on transcripts of two-way SMS communication.</p>
            <p>The methodology is clearly established, the sample size is impressive, and the results and discussion sections are well presented. References are adequate.</p>
            <p> A few remarks/questions: 
                <list list-type="order">
                    <list-item>
                        <p>How do the authors and askNivi differentiate between &#x2018;Family Planning&#x2019; and &#x2018;Contraceptives&#x2019;, given that both terms are used for the same concept? Contraceptives are methods to prevent unplanned pregnancies, hence to &#x2018;plan the family&#x2019;.</p>
                    </list-item>
                    <list-item>
                        <p>For medical questions, advice is sought from a nurse at a local maternity hospital. How does the project validate the correctness of the nurse&#x2019;s responses? Some questions are difficult to address, and the evidence is not always readily available. Is there a training/education/validation process in place to validate the responses?</p>
                    </list-item>
                </list> Conclusion: positive advice for publication.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>SRHR, Women and Child Health, Adolescent health</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report27264">
        <front-stub>
            <article-id pub-id-type="doi">10.21956/gatesopenres.14107.r27264</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>McCarthy</surname>
                        <given-names>Ona L.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r27264a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-9902-6248</uri>
                </contrib>
                <aff id="r27264a1">
                    <label>1</label>Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>4</day>
                <month>6</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 McCarthy OL</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport27264" related-article-type="peer-reviewed-article" xlink:href="10.12688/gatesopenres.12999.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>This is a clearly and concisely written report on a text mining analysis conducted on short messages sent via an SRH digital service (
                <italic>askNivi</italic>). The analysis included inbound and outbound messages exchanged between male and female users and 
                <italic>askNivi </italic>agents. While I do not have experience in text mining, the techniques appear to be appropriate to answer the research question. I found the analysis fascinating. In addition, it is clear that AI is the way forward in the design of interactive and tailored digital interventions. The results (particularly the differences by gender and age) are useful for anyone planning to develop or analyse a similar service. One suggestion for additional information to include is to state how the random sample of 100 users was selected for the conversation analysis.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Not applicable</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Sexual and reproductive health epidemiology</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
</article>
