<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">Gates Open Res</journal-id>
            <journal-title-group>
                <journal-title>Gates Open Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2572-4754</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/gatesopenres.14416.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Software Tool Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Clinical Trial Risk Tool: software application using natural language processing to identify the risk of trial uninformativeness</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 1 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Wood</surname>
                        <given-names>Thomas A</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-8962-8571</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>McNair</surname>
                        <given-names>Douglas</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-0965-883X</uri>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Fast Data Science Ltd, London, England, N5 2UP, UK</aff>
                <aff id="a2">
                    <label>2</label>Integrated Development, Bill &amp; Melinda Gates Foundation, Seattle, Washington, 98109, USA</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:thomas@fastdatascience.com">thomas@fastdatascience.com</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>18</day>
                <month>4</month>
                <year>2023</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2023</year>
            </pub-date>
            <volume>7</volume>
            <elocation-id>56</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>3</day>
                    <month>4</month>
                    <year>2023</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2023 Wood TA and McNair D</copyright-statement>
                <copyright-year>2023</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://gatesopenresearch.org/articles/7-56/pdf"/>
            <abstract>
                <p>
                    <bold>Background</bold>: A large proportion of clinical trials end without delivering results that are useful for clinical, policy, or research decisions. This problem is called &#x201c;uninformativeness&#x201d;. Some high-risk indicators of uninformativeness can be identified at the stage of drafting the protocol; however, the necessary information can be hard to find in unstructured text documents.</p>
                <p>
                    <bold>Methods</bold>: We have developed a browser-based tool which uses natural language processing to identify and quantify the risk of uninformativeness. The tool reads and parses the text of trial protocols and identifies key features of the trial design, which are fed into a risk model. The application runs in a browser and features a graphical user interface that allows a user to drag and drop the PDF of the trial protocol and visualize the risk indicators and their locations in the text. The user can correct inaccuracies in the tool&#x2019;s parsing of the text. The tool outputs a PDF report listing the key features extracted. The tool is focused on HIV and tuberculosis trials but could be extended to more pathologies in future.</p>
                <p>
                    <bold>Results:</bold> On a manually tagged dataset of 300 protocols, the tool was able to identify the condition of a trial with 100% area under curve (AUC), presence or absence of statistical analysis plan with 87% AUC, presence or absence of effect estimate with 95% AUC, number of subjects with 69% accuracy, and simulation with 98% AUC. On a dataset of 11,925 protocols downloaded from ClinicalTrials.gov, the tool was able to identify trial phase with 75% accuracy, number of arms with 58% accuracy, and the countries of investigation with 87% AUC.</p>
                <p>
                    <bold>Conclusion</bold>: We have developed and validated a natural language processing tool for identifying and quantifying risks of uninformativeness in clinical trial protocols. The software is open-source and can be accessed at the following link: 
                    <ext-link ext-link-type="uri" xlink:href="https://app.clinicaltrialrisk.org/">https://app.clinicaltrialrisk.org/</ext-link>
                </p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Clinical trial protocol</kwd>
                <kwd>risk</kwd>
                <kwd>natural language processing</kwd>
                <kwd>uninformativeness</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1" xlink:href="http://dx.doi.org/10.13039/100000865">
                    <funding-source>Gates Foundation</funding-source>
                    <award-id>INV-050345</award-id>
                </award-group>
                <funding-statement>This work was supported, in whole or in part, by the Gates Foundation [INV-050345]. </funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <sec>
                <title>Uninformative trials</title>
                <p>The goal of conducting a clinical trial is to produce evidence that can inform clinical and policy decisions. Each year, more than half a million clinical trials are run
                    <sup>
                        <xref ref-type="bibr" rid="ref-1">1</xref>
                    </sup>, each one seeking to gather information on an intervention such as a drug, device, or behavioral intervention.</p>
                <p>However, the majority of clinical trials do not prove or disprove a hypothesis
                    <sup>
                        <xref ref-type="bibr" rid="ref-2">2</xref>
                    </sup>. A 2022 study of 125 clinical trials in heart disease, diabetes, and lung cancer by Hutchinson 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-3">3</xref>
                    </sup> found that just over a quarter ended informatively, and other estimates are even lower
                    <sup>
                        <xref ref-type="bibr" rid="ref-4">4</xref>
                    </sup>.</p>
                <p>The Declaration of Helsinki, a key set of ethical principles on clinical trials, states that &#x201c;Medical research involving human subjects may only be conducted if the importance of the objective outweighs the risks and burdens to the research subjects&#x201d;
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>
                    </sup>. In other words, research is considered unethical if there is no benefit to science.</p>
                <p>In 2019, Deborah Zarin and colleagues addressed the problem of uninformative clinical trials in the Journal of the American Medical Association
                    <sup>
                        <xref ref-type="bibr" rid="ref-6">6</xref>
                    </sup>. Zarin 
                    <italic toggle="yes">et al.</italic> stated that for a trial to be informative, it must fulfill five conditions:</p>
                <list list-type="bullet">
                    <list-item>
                        <label>1. </label>
                        <p>The study hypothesis must address an important and unresolved question</p>
                    </list-item>
                    <list-item>
                        <label>2. </label>
                        <p>The study must be designed to provide evidence related to this question</p>
                    </list-item>
                    <list-item>
                        <label>3. </label>
                        <p>The study must be feasible</p>
                    </list-item>
                    <list-item>
                        <label>4. </label>
                        <p>The study must be conducted and analyzed in a scientifically valid manner</p>
                    </list-item>
                    <list-item>
                        <label>5. </label>
                        <p>The study must report results accurately and promptly.</p>
                    </list-item>
                </list>
                <p>When a trial does not fulfill all of the above conditions, it is likely to be uninformative. In that case, the time and money spent on the trial, not to mention the subjects&#x2019; good intentions in enrolling, are wasted.</p>
                <p>The most common reason that trials end uninformatively is underpowering, that is, an inadequate sample size. Other common reasons are safety and commercial factors
                    <sup>
                        <xref ref-type="bibr" rid="ref-7">7</xref>,
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup>.</p>
                <p>At the stage of drafting the trial protocol, it is possible to identify a number of indicators of a high risk of uninformativeness. These include a smaller than typical sample size, the lack of a statistical analysis plan, the use of non-standard endpoints, or the use of cluster randomization. Conversely, low-risk trials are often run by well-known institutions with external funding and an international or intercontinental array of sites. These indicators can be referred to as features or parameters.</p>
            </sec>
            <sec>
                <title>Definitions of uninformativeness</title>
                <p>In contrast to easily measured metrics, such as cost, informativeness is a somewhat subjective and even philosophical concept. One definition is that of &#x201c;moral efficiency&#x201d;: does the trial improve clinical practice?</p>
                <p>There are a number of ways of quantifying informativeness. For example, Hutchinson 
                    <italic toggle="yes">et al.</italic> measured informativeness in a retrospective longitudinal study, following up trials and identifying events that occurred after trial completion, such as whether the trial influenced clinical practice guidelines or was cited in a high-quality systematic review
                    <sup>
                        <xref ref-type="bibr" rid="ref-3">3</xref>
                    </sup>. These indicators can be subject to bias: a trial with industry funding is more likely to be cited, all other factors being equal, but may not necessarily be more informative. Likewise, a trial which shows that a drug is unsafe may not change guidelines or be cited, but prevents further uninformative trials being run, and is therefore informative. Hutchinson 
                    <italic toggle="yes">et al.</italic> acknowledged that the longitudinal technique is limited to use as a retrospective &#x2018;thermometer&#x2019; and is not suited to prospective use.</p>
                <p>In this paper, we use the concept of the &#x201c;risk of ending uninformatively&#x201d;, rather than the uninformativeness directly. An expert reviewer reads a protocol and classifies it as high, medium, or low risk of ending uninformatively, based solely on the content of the document. This approach is less data-driven and also subject to human biases, but does not require a longitudinal study to implement.</p>
            </sec>
            <sec>
                <title>Quantifying trial risk of uninformativeness</title>
                <p>A number of initiatives have been introduced to assist investigators in assessing the risk of a clinical trial. For example, the British National Institute for Health and Care Research has published a risk assessment tool as part of their Clinical Trials Toolkit
                    <sup>
                        <xref ref-type="bibr" rid="ref-9">9</xref>
                    </sup>. In 2017, a team funded by the Wellcome Trust developed a single-page clinical trials risk assessment tool
                    <sup>
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup>, and in 2019 the company Mediana Inc released an R package for clinical trial simulations with an aim of reducing trial risk
                    <sup>
                        <xref ref-type="bibr" rid="ref-11">11</xref>
                    </sup>.</p>
                <p>New methods for simulating trials at the planning stage are becoming more popular, such as calculating the &#x201c;assurance&#x201d; of a trial, introduced by O&#x2019;Hagan 
                    <italic toggle="yes">et al.</italic> in 2005, which is the unconditional probability that the trial will yield a positive outcome
                    <sup>
                        <xref ref-type="bibr" rid="ref-12">12</xref>
                    </sup>. The assurance method can be used to choose a sample size, and has been implemented as an R package
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>. In 2013, Wang 
                    <italic toggle="yes">et al.</italic> proposed a similar Bayesian method for calculating what they called the &#x2018;probability of study success&#x2019; (PrSS)
                    <sup>
                        <xref ref-type="bibr" rid="ref-14">14</xref>
                    </sup>.</p>
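                <p>To make the notion of assurance concrete, the following is a minimal sketch for the simplest textbook case. It is not taken from the cited works or packages. It assumes a two-arm trial with a normally distributed outcome, a known standard deviation, a one-sided z-test, and a normal prior on the true treatment effect. The function names and all numerical inputs are illustrative assumptions.</p>
                <code language="python"><![CDATA[
```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def standard_error(sigma, n_per_arm):
    """Standard error of the difference in means between two equal-sized arms."""
    return sigma * sqrt(2.0 / n_per_arm)


def power(true_effect, sigma, n_per_arm, alpha=0.025):
    """Probability of a significant result if the true effect were known exactly."""
    se = standard_error(sigma, n_per_arm)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    return STD_NORMAL.cdf(true_effect / se - z_crit)


def assurance(prior_mean, prior_sd, sigma, n_per_arm, alpha=0.025):
    """Power averaged over a normal prior on the true effect.

    Unconditionally, the estimated effect is normal with mean prior_mean and
    variance prior_sd**2 + se**2, so the average has a closed form.
    """
    se = standard_error(sigma, n_per_arm)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    return STD_NORMAL.cdf((prior_mean - z_crit * se) / sqrt(prior_sd**2 + se**2))


if __name__ == "__main__":
    # Illustrative inputs only: hoped-for effect 0.5, outcome SD 1.0, 100 per arm.
    print("power at prior mean:", round(power(0.5, 1.0, 100), 3))
    print("assurance:", round(assurance(0.5, 0.3, 1.0, 100), 3))
```
]]></code>
                <p>With these illustrative inputs, the assurance is lower than the power evaluated at the prior mean, because the prior admits the possibility that the true effect is smaller than hoped. When the prior standard deviation is zero, the two quantities coincide.</p>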
                <p>There have also been proposals to use probability distributions for making &#x201c;go/no-go&#x201d; decisions at clinical milestones such as the end of Phase I, IIa, or IIb
                    <sup>
                        <xref ref-type="bibr" rid="ref-15">15</xref>
                    </sup>. In 2009, Rosen 
                    <italic toggle="yes">et al.</italic> proposed the use of process maps throughout the trial to monitor whether the trial is being conducted according to the Standard Operating Procedures (SOPs)
                    <sup>
                        <xref ref-type="bibr" rid="ref-16">16</xref>
                    </sup>.</p>
                <p>In 2018, Wong 
                    <italic toggle="yes">et al.</italic> published an analysis of 406,038 entries of clinical trial data, and calculated trial success rates by trial phase, pathology, year, industry, and other factors. They found a number of interesting patterns, for example, that trials that used biomarkers as part of their selection criteria have higher overall success probabilities than trials without biomarkers
                    <sup>
                        <xref ref-type="bibr" rid="ref-17">17</xref>
                    </sup>.</p>
                <p>In 2022, a team at the Tufts Center for the Study of Drug Development analyzed 187 protocols and subsequent trials, and found that oncology and rare disease protocols have significantly longer clinical trial cycle times and face challenges in recruitment and retention
                    <sup>
                        <xref ref-type="bibr" rid="ref-18">18</xref>
                    </sup>. Phase III oncology trials are particularly troublesome, as they often deal with very small differences between arms. 62% of Phase III oncology trials fail to deliver results with statistical significance
                    <sup>
                        <xref ref-type="bibr" rid="ref-19">19</xref>
                    </sup>.</p>
                <p>Trial complexity is a metric analogous to trial risk. There are several numerical tools for estimating the complexity (and, by extension, cost) of clinical trials using simple scoring mechanisms in the style of the Apgar score, which is a well-known formula for evaluating the health of newborn babies
                    <sup>
                        <xref ref-type="bibr" rid="ref-20">20</xref>&#x2013;
                        <xref ref-type="bibr" rid="ref-22">22</xref>
                    </sup>.</p>
            </sec>
            <sec>
                <title>Natural language processing</title>
                <p>In recent years, natural language processing (NLP) tools have disrupted a number of industries where large unstructured text documents are commonplace
                    <sup>
                        <xref ref-type="bibr" rid="ref-23">23</xref>,
                        <xref ref-type="bibr" rid="ref-24">24</xref>
                    </sup>. The advent of models such as convolutional neural networks and transformer neural networks has enabled the development of AI systems which can understand complex natural language documents, such as contracts, or insurance claims
                    <sup>
                        <xref ref-type="bibr" rid="ref-25">25</xref>&#x2013;
                        <xref ref-type="bibr" rid="ref-28">28</xref>
                    </sup>. Clinical trial protocols may be several hundred pages long and require a large time investment by highly qualified people to interpret fully.</p>
                <p>In the legal industry, NLP software is commonplace for organizing, tracking, and performing advanced predictive analytics, clustering, and discovery on legal cases which are formed of bundles of documents. Examples of advanced legal NLP software include Luminance
                    <sup>
                        <xref ref-type="bibr" rid="ref-29">29</xref>
                    </sup> and Everlaw
                    <sup>
                        <xref ref-type="bibr" rid="ref-30">30</xref>
                    </sup>, which facilitate the manual review of legal contracts, leases, messages, depositions, interview transcripts, and other documents. We are not aware of a counterpart to these toolkits in the pharmaceutical industry, although there has been research and development on applications of NLP in the field
                    <sup>
                        <xref ref-type="bibr" rid="ref-31">31</xref>&#x2013;
                        <xref ref-type="bibr" rid="ref-33">33</xref>
                    </sup>.</p>
                <p>In 2020, Richard 
                    <italic toggle="yes">et al.</italic> published a comparison of NLP techniques for use on clinical trial protocol deviations, focusing on term-frequency inverse-document-frequency (Tf*Idf), support vector machines (SVM), and word vector embeddings
                    <sup>
                        <xref ref-type="bibr" rid="ref-34">34</xref>
                    </sup>. In the same year, Chen 
                    <italic toggle="yes">et al.</italic> performed a topic modeling analysis of the literature on NLP techniques in clinical trial texts, identifying key trends in NLP-enhanced clinical trial processing and research
                    <sup>
                        <xref ref-type="bibr" rid="ref-35">35</xref>
                    </sup>.</p>
            </sec>
            <sec>
                <title>Reducing the risk of a trial when the protocol is drafted</title>
                <p>The Bill &amp; Melinda Gates Foundation created a group called DAC (Design, Analyze, Communicate) which is intended to optimize the informativeness of research and includes resources on best practices to reduce the risk of trials ending uninformatively. An investigator choosing to work with DAC can submit their protocol draft, and receive a list of recommendations on 16 best practices for informativeness
                    <sup>
                        <xref ref-type="bibr" rid="ref-4">4</xref>
                    </sup>.</p>
                <p>In 2018, David Fogel described a number of common causes of trial failure, such as underpowering, safety issues, and lack of funding. Fogel proposed several opportunities for applying artificial intelligence, in particular NLP, to identify these factors. He suggested using NLP to mine available literature and previous trials in order to determine if a trial is using appropriate endpoints, eligibility criteria, and sample sizes, and to check for internal inconsistencies in a protocol
                    <sup>
                        <xref ref-type="bibr" rid="ref-36">36</xref>
                    </sup>. Fogel also suggested using other areas of AI and machine learning to profile patients to reduce the probability of attrition, modeling patient drop-out rates with neural networks.</p>
                <p>In 2023, Chang 
                    <italic toggle="yes">et al.</italic> used a contrast mining framework to identify the key indicators of successful and unsuccessful cancer trials, and used NLP to extract eligibility criteria from protocol documents, among other techniques
                    <sup>
                        <xref ref-type="bibr" rid="ref-37">37</xref>
                    </sup>.</p>
                <p>We are unaware of any existing automated tool using only NLP to extract risk factors from trial protocols.</p>
                <p>Although protocols are written in technical English, they are not constrained by any particular standard. Protocols from within a given organization generally follow a rough pattern, but there are many ways that a particular data point can be communicated: the sample size could be referred to as the &#x201c;number of participants,&#x201d; or &#x201c;
                    <italic toggle="yes">N</italic> = 90,&#x201d; or the researchers could write simply &#x201c;we plan to enroll up to 100 subjects per site,&#x201d; and leave it to the reader to infer the sample size.</p>
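                <p>The variety of phrasings can be illustrated with a few regular expressions. The sketch below is purely illustrative and is not the extraction method used by the tool. The pattern list and the function name <monospace>find_sample_size_candidates</monospace> are hypothetical, and real protocols would need many more patterns as well as a way of choosing among competing candidates.</p>
                <code language="python"><![CDATA[
```python
import re

# A whole number, with or without thousands separators (e.g. "90" or "1,200").
NUMBER = r"(\d{1,3}(?:,\d{3})+|\d+)"

# Hypothetical patterns for illustration only.
SAMPLE_SIZE_PATTERNS = [
    # "N = 90" or "n=90"
    re.compile(r"\bN\s*=\s*" + NUMBER, re.IGNORECASE),
    # "sample size of 120", "sample size: 120", "sample size is 120"
    re.compile(
        r"\bsample size(?:\s+(?:of|is|will be))?\s*:?\s*" + NUMBER,
        re.IGNORECASE,
    ),
    # "enroll up to 100 subjects", "recruit 250 participants"
    re.compile(
        r"\b(?:enrol|enroll|recruit|randomi[sz]e)\w*\s+"
        r"(?:up to\s+|approximately\s+)?" + NUMBER +
        r"\s+(?:subjects|participants|patients)",
        re.IGNORECASE,
    ),
]


def find_sample_size_candidates(text):
    """Return (number, matched phrase) pairs for every pattern hit, in text order."""
    hits = []
    for pattern in SAMPLE_SIZE_PATTERNS:
        for match in pattern.finditer(text):
            number = int(match.group(1).replace(",", ""))
            hits.append((match.start(), number, match.group(0)))
    hits.sort()
    return [(number, phrase) for _, number, phrase in hits]


if __name__ == "__main__":
    example = "N = 90 overall; we plan to enroll up to 100 subjects per site."
    for number, phrase in find_sample_size_candidates(example):
        print(number, "from", repr(phrase))
```
]]></code>
                <p>Note that the second candidate in the example is a per-site figure, so a pattern match alone does not settle the total sample size. This is one reason simple keyword search is insufficient.</p>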
                <p>The result is that a person reading a protocol who simply wants to find the sample size, effect size, prevalence, or other figure must search the entire text for a number of possible keywords, refer to the contents page, or, even worse, read the entire document from start to finish. This is a time-consuming and error-prone process, and far from the best use of a professional&#x2019;s time.</p>
                <p>In this paper, we present a software tool called the Clinical Trial Risk Tool
                    <sup>
                        <xref ref-type="bibr" rid="ref-38">38</xref>
                    </sup>, which parses the PDF of a trial protocol and identifies key features within the text, such as the number of participants (the sample size), or the presence or absence of the Statistical Analysis Plan. The Clinical Trial Risk Tool then passes these features into a simple linear risk model and calculates a risk level, which is presented to the user as a traffic light indicating a high, medium, or low risk of ending uninformatively. The features of the risk model can be adjusted manually and saved by the user. The tool generates a PDF or Excel report which can be shared within the organization. The tool has been designed and trained with a focus on HIV and tuberculosis (TB) trials but could be adapted to more pathologies.</p>
                <p>The tool has been open-sourced under the 
                    <ext-link ext-link-type="uri" xlink:href="https://opensource.org/license/mit/">MIT License</ext-link> and deployed to the internet at the following link: 
                    <ext-link ext-link-type="uri" xlink:href="https://app.clinicaltrialrisk.org/">https://app.clinicaltrialrisk.org</ext-link>.</p>
            </sec>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <sec>
                <title>Implementation</title>
                <p>The Clinical Trial Risk Tool is a web application written in Python
                    <sup>
                        <xref ref-type="bibr" rid="ref-39">39</xref>
                    </sup>, using the graphical interface library Plotly Dash
                    <sup>
                        <xref ref-type="bibr" rid="ref-40">40</xref>
                    </sup>, and the machine learning libraries NLTK
                    <sup>
                        <xref ref-type="bibr" rid="ref-41">41</xref>
                    </sup>, spaCy
                    <sup>
                        <xref ref-type="bibr" rid="ref-42">42</xref>
                    </sup>, and Scikit-Learn
                    <sup>
                        <xref ref-type="bibr" rid="ref-43">43</xref>
                    </sup>. The tool was developed as a Docker container
                    <sup>
                        <xref ref-type="bibr" rid="ref-44">44</xref>
                    </sup> and can be used from any web browser.</p>
                <p>The tool is architected as a Python server, which contains most of the logic and connects to a Java-based server running Tika for PDF parsing (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>).</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>The user&#x2019;s browser connects to the Python server, running Dash and other logic (NLP, machine learning).</title>
                        <p>This server also connects to a separate server running Java and Tika for PDF parsing.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/15729/e4262a86-6875-4352-876f-3606244677d5_figure1.gif"/>
                </fig>
                <p>Users can log in to create and save profiles.</p>
                <p>The tool allows a user to upload a trial protocol in PDF format. The tool converts the PDF into plain text using Apache Tika
                    <sup>
                        <xref ref-type="bibr" rid="ref-45">45</xref>
                    </sup>, and identifies features in the document content which indicate high or low risk of uninformativeness.</p>
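                <p>The article does not describe how the Python server calls Tika. As a minimal sketch of one possibility, the code below sends a PDF to a Tika server over its REST interface and reads back plain text. The endpoint address, the function names, and the use of the Python standard library instead of a dedicated client are assumptions made for illustration.</p>
                <code language="python"><![CDATA[
```python
import urllib.request

# Assumption: a Tika server is listening here; 9998 is Tika's default port.
TIKA_ENDPOINT = "http://localhost:9998"


def build_tika_request(pdf_bytes, endpoint=TIKA_ENDPOINT):
    """Build a PUT request asking the Tika server for a plain-text rendering."""
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/tika",
        data=pdf_bytes,
        method="PUT",
        headers={"Content-Type": "application/pdf", "Accept": "text/plain"},
    )


def pdf_to_text(pdf_path, endpoint=TIKA_ENDPOINT):
    """Send a PDF file to the Tika server and return the extracted text."""
    with open(pdf_path, "rb") as handle:
        request = build_tika_request(handle.read(), endpoint)
    # This call needs a running Tika server; it raises URLError otherwise.
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")
```
]]></code>
                <p>Keeping the request construction separate from the network call makes the former easy to check without a running Tika server.</p>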
                <p>The tool extracts the following eight features:</p>
                <list list-type="bullet">
                    <list-item>
                        <label>1. </label>
                        <p>Pathology (limited to HIV and TB at this stage)</p>
                    </list-item>
                    <list-item>
                        <label>2. </label>
                        <p>Trial phase</p>
                    </list-item>
                    <list-item>
                        <label>3. </label>
                        <p>Is a Statistical Analysis Plan (SAP) present?</p>
                    </list-item>
                    <list-item>
                        <label>4. </label>
                        <p>Is the effect estimate disclosed?</p>
                    </list-item>
                    <list-item>
                        <label>5. </label>
                        <p>Number of subjects (sample size)</p>
                    </list-item>
                    <list-item>
                        <label>6. </label>
                        <p>Number of arms</p>
                    </list-item>
                    <list-item>
                        <label>7. </label>
                        <p>Countries of investigation</p>
                    </list-item>
                    <list-item>
                        <label>8. </label>
                        <p>Does the trial use simulation for sample size determination?</p>
                    </list-item>
                </list>
                <p>The features are then passed into a scoring formula that scores the protocol from 0 to 100, and the protocol is flagged as HIGH, MEDIUM, or LOW risk accordingly.</p>
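                <p>The general shape of such a scoring step can be sketched as follows. This is an illustration of a linear score with cut-offs, not the tool&#x2019;s actual formula. Every weight, threshold, and name below is hypothetical, only a subset of the eight features is used, and the assumption that a higher score means a lower risk is ours.</p>
                <code language="python"><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class ProtocolFeatures:
    """A subset of the extracted features, kept small for illustration."""
    has_sap: bool
    has_effect_estimate: bool
    uses_simulation: bool
    num_subjects: int
    num_countries: int


# HYPOTHETICAL values: the tool's real weights and cut-offs are not given here.
BASE_SCORE = 20.0
WEIGHTS = {
    "has_sap": 30.0,
    "has_effect_estimate": 20.0,
    "uses_simulation": 10.0,
    "large_sample": 15.0,
    "multi_country": 5.0,
}
LARGE_SAMPLE_MIN = 500  # hypothetical sample size treated as "large"


def score_protocol(features):
    """Linear score clamped to 0..100; higher is assumed to mean lower risk."""
    score = BASE_SCORE
    score += WEIGHTS["has_sap"] * features.has_sap
    score += WEIGHTS["has_effect_estimate"] * features.has_effect_estimate
    score += WEIGHTS["uses_simulation"] * features.uses_simulation
    score += WEIGHTS["large_sample"] * (features.num_subjects >= LARGE_SAMPLE_MIN)
    score += WEIGHTS["multi_country"] * (features.num_countries > 1)
    return max(0.0, min(100.0, score))


def risk_level(score, high_below=40.0, medium_below=70.0):
    """Map a 0..100 score to a traffic-light label using hypothetical cut-offs."""
    if score < high_below:
        return "HIGH"
    if score < medium_below:
        return "MEDIUM"
    return "LOW"


if __name__ == "__main__":
    protocol = ProtocolFeatures(True, False, False, 40, 1)
    score = score_protocol(protocol)
    print(score, risk_level(score))
```
]]></code>
                <p>A linear form of this kind keeps each feature&#x2019;s contribution visible, which suits a tool in which the user can correct extracted features and adjust thresholds by hand.</p>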
                <p>The tool allows for risk profiles and thresholds to be saved and loaded if the user is logged in (
                    <xref ref-type="fig" rid="f2">Figure 2</xref>).</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>Screenshot of the tool in operation, classifying a clinical trial protocol.</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/15729/e4262a86-6875-4352-876f-3606244677d5_figure2.gif"/>
                </fig>
            </sec>
            <sec>
                <title>Feature selection exercise</title>
                <p>Before developing the AI models, we first needed to identify which features would be most useful for the tool to extract. It would be futile to spend considerable time developing a model to extract a particular feature, only to discover that it has little or no influence on the overall risk rating of a trial.</p>
                <p>We conducted a survey of two subject matter experts within the Bill &amp; Melinda Gates Foundation (BMGF) to gather information on which features would be helpful to focus on, independently of the technical difficulty of extracting that feature from the protocol text using NLP. Participants were sent a link to a questionnaire using the cloud-based software SurveyMonkey
                    <sup>
                        <xref ref-type="bibr" rid="ref-46">46</xref>
                    </sup>, which asked them to rank features of a protocol text in order of their level of influence on the protocol&#x2019;s success or failure in terms of informativeness (original questionnaire available 
                    <ext-link ext-link-type="uri" xlink:href="https://www.surveymonkey.com/r/TTVTZZG">here</ext-link>). A screenshot of the survey interface is given in 
                    <xref ref-type="fig" rid="f8">Figure 8</xref>. The consensus was that the SAP is by far the most informative feature, so it was decided to focus initially on developing an NLP model to identify the presence or absence of an SAP. This survey, although qualitative, may be useful in future if the tool is developed further.</p>
                <p>We took inspiration from the table of indicators of risk of uninformativeness given in Zarin 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-6">6</xref>
                    </sup>.</p>
                <p>The results of our survey, ranked in descending order of importance, are given in 
                    <xref ref-type="table" rid="T1">Table 1</xref>.</p>
                <table-wrap id="T1" orientation="portrait" position="anchor">
                    <label>Table 1. </label>
                    <caption>
                        <title>Results of a qualitative survey of feature importance for determining risk.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="bottom">Informativeness feature</th>
                                <th align="left" colspan="1" rowspan="1" valign="bottom">Mean
                                    <break/> score</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Has a Statistical Analysis Plan</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">100%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Effect estimate not disclosed or unreliable</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">84%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Tertile of sample size by domain by phase</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">75%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Tertile of number of sites by domain by phase</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">72%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Composite product of tertile of Primary Duration times
                                    <break/> tertile of Sample Size</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">72%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Tertile of number of (co-)primary endpoints by domain
                                    <break/> by phase</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">72%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Number of endpoints</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">66%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Multiple countries (Y/N)</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">56%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Uses simulation for sample size</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">56%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Prevalence estimate not disclosed or unreliable</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">53%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Is a master protocol or a subset or derivative of a 
                                    <break/>master protocol</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">53%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Is part of a platform trial</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">53%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Number of visits</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">50%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Duration of trial</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">50%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Multiple sites in a single country trial (Y/N)</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">47%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Number of countries with at least one site</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">47%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Uses model-informed drug development</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">47%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Tertile of primary duration</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">44%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Number of arms</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">44%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Patient consortium or trial consortium prominently
                                    <break/> involved</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">44%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Is an adaptive design</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">41%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Takes place in a hospital</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">38%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Phase-in-domain</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">38%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Recency of protocol vs today's date</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">38%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Recent dates in prevalence/burden citations</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">38%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Indicates intention or willingness to make changes at
                                    <break/> interim</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">38%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Number of trial sites in entire trial</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">31%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Number of procedures</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">31%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Includes analysis of real world data</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">28%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">More than one drug in the intervention</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">28%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Number of mentions of the word policy</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">25%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Case report form pages-all trial</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">6%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Case report form pages per variable</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">0%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Duration of follow up (in months)</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">0%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Two-level binomial lowest tertile of sample size by
                                    <break/> domain by phase</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">0%</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
            </sec>
            <sec>
                <title>Datasets used for training and validation</title>
                <p>Two datasets were used to train and validate the tool.</p>
                <p>
                    <bold>
                        <italic toggle="yes">Manual dataset</italic>.</bold> A set of 300 protocols, some supplied by the BMGF and some downloaded from ClinicalTrials.gov, was read through individually and annotated with key features: the sample size, pathology, number of arms, phase, intervention type, countries of investigation, presence of an SAP, effect estimate, and use of simulation. The number of protocols manually annotated per parameter varied between 100 and 300.</p>
                <p>
                    <bold>
                        <italic toggle="yes">ClinicalTrials.gov dataset</italic>.</bold> The ClinicalTrials.gov dataset was a much larger set of 11,925 protocols, downloaded from the ClinicalTrials.gov AACT data dump on 4 Nov 2022
                    <sup>
                        <xref ref-type="bibr" rid="ref-47">47</xref>
                    </sup>. The data dump came in the form of a PostgreSQL database
                    <sup>
                        <xref ref-type="bibr" rid="ref-48">48</xref>
                    </sup> and included the protocol PDFs and metadata on the National Clinical Trial (NCT) ID, phase, pathology, presence or absence of an SAP, number of arms and number of subjects. However, these values were voluntarily provided by the researchers and in many cases are out of date or inaccurate.</p>
                <p>Using the two datasets together made it possible to obtain some of the advantages of a large dataset alongside some of the advantages of a smaller, more accurate one.</p>
            </sec>
            <sec>
                <title>Breakdown of the individual machine learning models used</title>
                <p>Each parameter is identified in the document by a stand-alone component using machine learning and, optionally, some manually coded rules. The machine learning techniques used were Na&#x00ef;ve Bayes classifiers, random forest classifiers, and convolutional neural networks. An example of a manual rule is &#x201c;a number followed by a unit such as &#x2018;mmHg&#x2019; cannot be a sample size&#x201d;. Country names were identified using a dictionary lookup approach with some exceptions, such as &#x201c;a mention of &#x2018;Georgia&#x2019; is most likely to be the US state unless other words occur in the vicinity which indicate Georgia the country&#x201d;.</p>
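                <p>As an illustration of how such a manual rule might be expressed, the following minimal Python sketch discards numeric candidates that are immediately followed by a measurement unit. The function name, the unit list and the whitespace tokenisation are assumptions made for illustration; they are not taken from the source code of the tool.</p>
                <code language="python">
```python
# Hypothetical unit list; the tool's actual rule set is not reproduced here.
MEASUREMENT_UNITS = {"mmhg", "mg", "kg", "ml"}


def sample_size_candidates(text):
    """Return integers in `text` that are not followed by a measurement unit."""
    tokens = text.split()
    candidates = []
    for i, token in enumerate(tokens):
        cleaned = token.strip(".,;:()")
        if not cleaned.isdecimal():
            continue
        # Look at the next token, if there is one, to apply the unit rule.
        has_next = i + 1 != len(tokens)
        following = tokens[i + 1].strip(".,;:()").lower() if has_next else ""
        if following in MEASUREMENT_UNITS:
            continue  # e.g. "140 mmHg" cannot be a sample size
        candidates.append(int(cleaned))
    return candidates


example = "We will enrol 300 adults with systolic pressure above 140 mmHg."
print(sample_size_candidates(example))  # prints [300]
```
                </code>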
                <p>The output values from these components are then fed into the linear risk model, which calculates the overall risk score of the protocol.</p>
                <p>An additional Na&#x00ef;ve Bayes classifier was used to obtain a baseline performance on each parameter before more advanced models were trained.</p>
                <p>
                    <bold>
                        <italic toggle="yes">Pathology (condition)</italic>.</bold> The pathology of a trial is identified using a three-way Na&#x00ef;ve Bayes classifier operating on the text of the whole document at token level, which classifies documents into HIV, TB, or Other. It treats HIV and TB as mutually exclusive, although in future work more pathologies could be covered and the tool could assign a document to multiple pathologies.</p>
                <p>To develop this, protocols in the manual dataset were manually tagged as HIV, TB or Other, and the three-class classifier learnt which words are indicative of each pathology.</p>
                <p>The tool also identifies key words and phrases throughout the document which are related to pathology and presents these to the user.</p>
                <p>
                    <bold>
                        <italic toggle="yes">Trial phase</italic>.</bold> Trial phase is represented in the model by a floating-point number (a whole number, or a whole number plus 0.5) between 0 and 4, where 1.5 means Phase I/II. The model for extracting the phase was implemented as an ensemble of two models: a convolutional neural network text classifier, implemented using the NLP library 
                    <ext-link ext-link-type="uri" xlink:href="https://spacy.io/">spaCy</ext-link>, and a rule-based pattern matching algorithm combined with a rule-based feature extraction stage and a random forest binary classifier, implemented using Scikit-Learn (RRID:SCR_002577). Both models in the ensemble output an array of probabilities, and the two arrays were averaged to produce a final array. The phase returned by the ensemble was the one with the highest averaged probability.</p>
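                <p>The averaging step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list of phase labels is taken from the rows of Table 2, and the function name and the example probabilities are hypothetical rather than taken from the tool.</p>
                <code language="python">
```python
# Assumed phase labels, following the half-step encoding and the rows of Table 2.
PHASES = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0]


def ensemble_phase(cnn_probs, forest_probs):
    """Average two probability arrays and return the most probable phase."""
    if len(cnn_probs) != len(PHASES) or len(forest_probs) != len(PHASES):
        raise ValueError("each model must output one probability per phase")
    averaged = [(a + b) / 2 for a, b in zip(cnn_probs, forest_probs)]
    best_index = max(range(len(PHASES)), key=lambda i: averaged[i])
    return PHASES[best_index], averaged


# Made-up probabilities for illustration only.
cnn = [0.05, 0.05, 0.10, 0.40, 0.30, 0.05, 0.03, 0.02]
forest = [0.00, 0.00, 0.10, 0.30, 0.50, 0.05, 0.05, 0.00]
phase, averaged = ensemble_phase(cnn, forest)
print(phase)  # prints 2.0
```
                </code>
                <p>In this made-up example the two models disagree (the first favours Phase I/II, the second Phase II), and averaging resolves the disagreement in favour of the phase with the larger combined probability.</p>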
                <p>
                    <bold>
                        <italic toggle="yes">Presence or absence of statistical analysis plan (SAP)</italic>.</bold> The presence or absence of an SAP is identified 
                    <italic toggle="yes">via</italic> a Na&#x00ef;ve Bayes classifier operating on the text of the whole document at word level. In addition, candidate pages which are likely to be part of the SAP are highlighted to the user using a Na&#x00ef;ve Bayes classifier operating on the text of each page individually.</p>
                <p>
                    <bold>
                        <italic toggle="yes">Effect estimate</italic>.</bold> A rule-based component written in spaCy identifies candidate values for the effect estimate from the numeric substrings present in the document. These can be presented as percentages, fractions, or take other surface forms. A weighted Na&#x00ef;ve Bayes classifier is then applied to a window of 20 tokens around each candidate number found in the document, and the highest-ranking effect estimate candidates are returned. The values are displayed to the user, but only the binary value of the presence or absence of an effect estimate enters into the risk calculation.</p>
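                <p>To make the windowing step concrete, the sketch below collects the context tokens around each numeric candidate. It is a simplified illustration: splitting the window evenly on both sides of the candidate, the plain string test for numbers, and the function name are all assumptions, and the classifier that would score each context is not shown.</p>
                <code language="python">
```python
def candidate_windows(tokens, window=20):
    """Yield (candidate, context) pairs for every numeric token.

    The context holds up to `window` tokens around the candidate;
    splitting the window evenly on both sides is an assumption.
    """
    half = window // 2
    for i, token in enumerate(tokens):
        # Accept simple surface forms such as "30", "30%" or "0.5".
        if token.rstrip("%").replace(".", "", 1).isdecimal():
            start = max(0, i - half)
            context = tokens[start:i] + tokens[i + 1:i + 1 + half]
            yield token, context


tokens = "we expect a 30% reduction in incidence".split()
for candidate, context in candidate_windows(tokens, window=4):
    print(candidate, context)
```
                </code>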
                <p>
                    <bold>
                        <italic toggle="yes">Number of subjects (sample size)</italic>.</bold> A rule-based component written in spaCy identifies candidate values for the sample size from the numeric substrings present in the document. These values are then passed to a random forest classifier, which ranks them by likelihood of being the true sample size. Substrings such as &#x201c;per arm&#x201d; or &#x201c;per cohort&#x201d; are also identified, so that the value can be multiplied by the number of arms where applicable.</p>
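                <p>The per-arm adjustment could look roughly like the following Python sketch. The regular expression, the function name and the decision to leave the value unchanged when the number of arms is unknown are assumptions for illustration only.</p>
                <code language="python">
```python
import re

# Hypothetical pattern for phrases indicating a per-arm or per-cohort figure.
PER_GROUP = re.compile(r"\bper\s+(arm|cohort)\b", re.IGNORECASE)


def total_sample_size(value, context, num_arms):
    """Scale `value` by `num_arms` when `context` marks it as a per-arm figure."""
    if num_arms and PER_GROUP.search(context):
        return value * num_arms
    return value


print(total_sample_size(50, "50 participants per arm", 3))  # prints 150
```
                </code>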
                <p>
                    <bold>
                        <italic toggle="yes">Number of arms</italic>.</bold> The number of arms is identified using an ensemble of machine learning and rule-based components, built with the NLP library spaCy and a Scikit-Learn random forest model.</p>
                <p>
                    <bold>
                        <italic toggle="yes">Countries of investigation</italic>.</bold> The countries of investigation are identified using an ensemble of machine learning and rule-based components using regular expressions and Keras convolutional neural networks, which are combined using a Scikit-Learn random forest model.</p>
                <p>
                    <bold>
                        <italic toggle="yes">Simulation used for sample size determination</italic>.</bold> This feature is identified by a Na&#x00ef;ve Bayes classifier operating on the text of each page individually. If a page contains information about simulation being used for sample size, the classifier assigns that page to class 1, otherwise to class 0. If any page in the whole document is classified as class 1, then the protocol is considered to have used simulation for sample size determination.</p>
                <p>Although trials may use simulation at various points, the data tagged for simulation includes only trials using simulation specifically for sample size planning. Trials using simulation for later stages of statistical analysis are excluded.</p>
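                <p>The page-to-document aggregation described above amounts to a logical OR over the page-level predictions. A minimal sketch follows; the keyword-based function is only a stand-in for the trained page classifier, and all names are hypothetical.</p>
                <code language="python">
```python
def uses_simulation(pages, classify_page):
    """Return 1 if any page is classified as class 1, otherwise 0."""
    return int(any(classify_page(page) == 1 for page in pages))


def keyword_stand_in(page_text):
    # Stand-in for the trained page-level classifier (an assumption):
    # flags pages that mention both simulation and sample size.
    text = page_text.lower()
    return int("simulat" in text and "sample size" in text)


pages = [
    "1. Introduction and background.",
    "The sample size was determined by simulation of 1000 trials.",
]
print(uses_simulation(pages, keyword_stand_in))  # prints 1
```
                </code>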
                <p>
                    <bold>
                        <italic toggle="yes">Sample size tertiles</italic>.</bold> The sample size is not fed directly into the risk model, but is converted into a value of 0, 1, or 2, representing the tertile of that sample size within a dataset of comparable trials (same phase and pathology).</p>
                <p>The default sample size tertile threshold values are given in 
                    <xref ref-type="table" rid="T2">Table 2</xref>, but the user can change these values.</p>
                <table-wrap id="T2" orientation="portrait" position="anchor">
                    <label>Table 2. </label>
                    <caption>
                        <title>Default sample size tertiles for HIV and tuberculosis (TB).</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="bottom">Phase</th>
                                <th align="left" colspan="1" rowspan="1" valign="bottom">HIV
                                    <break/> lower
                                    <break/> tertile</th>
                                <th align="left" colspan="1" rowspan="1" valign="bottom">HIV
                                    <break/> upper
                                    <break/> tertile</th>
                                <th align="left" colspan="1" rowspan="1" valign="bottom">TB 
                                    <break/>lower
                                    <break/> tertile</th>
                                <th align="left" colspan="1" rowspan="1" valign="bottom">TB 
                                    <break/>upper
                                    <break/> tertile</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">10</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">15</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">10</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">15</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">0.5</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">40</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">130</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">30</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">60</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">40</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">130</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">30</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">60</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">1.5</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">80</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">280</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">40</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">80</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">2</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">100</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">300</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">50</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">100</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">2.5</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">1000</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">2000</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">500</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">1500</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">3</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">1000</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">2000</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">500</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">1500</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">4</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">3000</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">4000</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">3000</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">4000</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>The default sample size tertiles were derived from a sample of 21 trials in LMICs, but have been rounded and manually adjusted based on statistics from ClinicalTrials.gov data.</p>
                <p>The tertiles were first calculated using the training dataset, but for a number of phase and pathology combinations the data were too sparse, so tertile values from ClinicalTrials.gov had to be used instead. The ClinicalTrials.gov data dump of 28 Feb 2022 was used for this purpose.</p>
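                <p>The conversion from a raw sample size to a tertile value of 0, 1 or 2 can be sketched as a simple threshold lookup using the default values in Table 2. The treatment of sample sizes that fall exactly on a threshold (counted here as belonging to the higher tertile) is an assumption, as are the function and variable names.</p>
                <code language="python">
```python
# Default thresholds copied from Table 2: (phase, pathology) -> (lower, upper).
DEFAULT_TERTILES = {
    (0, "HIV"): (10, 15), (0, "TB"): (10, 15),
    (0.5, "HIV"): (40, 130), (0.5, "TB"): (30, 60),
    (1, "HIV"): (40, 130), (1, "TB"): (30, 60),
    (1.5, "HIV"): (80, 280), (1.5, "TB"): (40, 80),
    (2, "HIV"): (100, 300), (2, "TB"): (50, 100),
    (2.5, "HIV"): (1000, 2000), (2.5, "TB"): (500, 1500),
    (3, "HIV"): (1000, 2000), (3, "TB"): (500, 1500),
    (4, "HIV"): (3000, 4000), (4, "TB"): (3000, 4000),
}


def sample_size_tertile(sample_size, phase, pathology, thresholds=DEFAULT_TERTILES):
    """Return 0, 1 or 2 for the tertile of `sample_size` among comparable trials."""
    lower, upper = thresholds[(phase, pathology)]
    # Assumption: a value equal to a threshold counts towards the higher tertile.
    if sample_size >= upper:
        return 2
    if sample_size >= lower:
        return 1
    return 0


print(sample_size_tertile(200, 2, "HIV"))  # prints 1
```
                </code>
                <p>Because the thresholds are passed in as a parameter, a user-edited table can be supplied in place of the defaults.</p>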
                <p>
                    <bold>
                        <italic toggle="yes">Linear risk model</italic>.</bold> The features extracted by the NLP components are fed into a linear scoring formula, which was designed for this software.</p>
                <p>Each parameter is converted into an integer or floating-point number and multiplied by an associated weight, and the weighted values are summed to give a score between 0 and 100. From this score, the protocol is flagged as HIGH, MEDIUM or LOW risk. For example, a protocol scores 26 points for having a completed SAP, and a protocol scoring 50 points or more in total across all features is considered low risk. The linear formula has a bias term of -7.</p>
                <p>Protocols scoring 50 or above are considered by default to be low risk. Protocols which score 40 or above but below 50 are marked as medium risk, and scores below 40 are high risk.</p>
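                <p>The default banding described above can be written as a short Python function. The function and parameter names are hypothetical; the threshold defaults of 50 and 40 are those stated in the text, and they are exposed as parameters because the tool allows thresholds to be changed.</p>
                <code language="python">
```python
def risk_level(score, low_threshold=50, medium_threshold=40):
    """Map a score between 0 and 100 to a risk flag using the given thresholds."""
    if score >= low_threshold:
        return "LOW"
    if score >= medium_threshold:
        return "MEDIUM"
    return "HIGH"


for score in (62, 45, 12):
    print(score, risk_level(score))
```
                </code>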
                <p>Our formula can be summarized as follows:</p>
                <disp-formula id="e">
                    <mml:math display="block" id="math">
                        <mml:mrow>
                            <mml:mi>s</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mn>26</mml:mn>
                            <mml:msub>
                                <mml:mi>x</mml:mi>
                                <mml:mrow>
                                    <mml:mi>S</mml:mi>
                                    <mml:mi>A</mml:mi>
                                    <mml:mi>P</mml:mi>
                                </mml:mrow>
                            </mml:msub>
                            <mml:mo>+</mml:mo>
                            <mml:mn>16</mml:mn>
                            <mml:msub>
                                <mml:mi>x</mml:mi>
                                <mml:mrow>
                                    <mml:mi>e</mml:mi>
                                    <mml:mi>f</mml:mi>
                                    <mml:mi>f</mml:mi>
                                    <mml:mi>e</mml:mi>
                                    <mml:mi>c</mml:mi>
                                    <mml:mi>t</mml:mi>
                                    <mml:mspace width="0.2em"/>
                                    <mml:mi>e</mml:mi>
                                    <mml:mi>s</mml:mi>
                                    <mml:mi>t</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mi>m</mml:mi>
                                    <mml:mi>a</mml:mi>
                                    <mml:mi>t</mml:mi>
                                    <mml:mi>e</mml:mi>
                                </mml:mrow>
                            </mml:msub>
                            <mml:mo>+</mml:mo>
                            <mml:mn>10</mml:mn>
                            <mml:msub>
                                <mml:mi>x</mml:mi>
                                <mml:mrow>
                                    <mml:mi>s</mml:mi>
                                    <mml:mi>a</mml:mi>
                                    <mml:mi>m</mml:mi>
                                    <mml:mi>p</mml:mi>
                                    <mml:mi>l</mml:mi>
                                    <mml:mi>e</mml:mi>
                                    <mml:mspace width="0.2em"/>
                                    <mml:mi>s</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mi>z</mml:mi>
                                    <mml:mi>e</mml:mi>
                                    <mml:mspace width="0.2em"/>
                                    <mml:mi>t</mml:mi>
                                    <mml:mi>e</mml:mi>
                                    <mml:mi>r</mml:mi>
                                    <mml:mi>t</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mi>l</mml:mi>
                                    <mml:mi>e</mml:mi>
                                </mml:mrow>
                            </mml:msub>
                            <mml:mo>+</mml:mo>
                            <mml:mn>10</mml:mn>
                            <mml:msub>
                                <mml:mi>x</mml:mi>
                                <mml:mrow>
                                    <mml:mi>i</mml:mi>
                                    <mml:mi>n</mml:mi>
                                    <mml:mi>t</mml:mi>
                                    <mml:mi>e</mml:mi>
                                    <mml:mi>r</mml:mi>
                                    <mml:mi>n</mml:mi>
                                    <mml:mi>a</mml:mi>
                                    <mml:mi>t</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mi>o</mml:mi>
                                    <mml:mi>n</mml:mi>
                                    <mml:mi>a</mml:mi>
                                    <mml:mi>l</mml:mi>
                                </mml:mrow>
                            </mml:msub>
                            <mml:mo>+</mml:mo>
                            <mml:mn>10</mml:mn>
                            <mml:msub>
                                <mml:mi>x</mml:mi>
                                <mml:mrow>
                                    <mml:mi>s</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mi>m</mml:mi>
                                    <mml:mi>u</mml:mi>
                                    <mml:mi>l</mml:mi>
                                    <mml:mi>a</mml:mi>
                                    <mml:mi>t</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mi>o</mml:mi>
                                    <mml:mi>n</mml:mi>
                                </mml:mrow>
                            </mml:msub>
                            <mml:mo>+</mml:mo>
                            <mml:mn>5</mml:mn>
                            <mml:msub>
                                <mml:mi>x</mml:mi>
                                <mml:mrow>
                                    <mml:mi>p</mml:mi>
                                    <mml:mi>h</mml:mi>
                                    <mml:mi>a</mml:mi>
                                    <mml:mi>s</mml:mi>
                                    <mml:mi>e</mml:mi>
                                </mml:mrow>
                            </mml:msub>
                            <mml:mo>+</mml:mo>
                            <mml:mn>2</mml:mn>
                            <mml:msub>
                                <mml:mi>x</mml:mi>
                                <mml:mrow>
                                    <mml:mi>a</mml:mi>
                                    <mml:mi>r</mml:mi>
                                    <mml:mi>m</mml:mi>
                                    <mml:mi>s</mml:mi>
                                </mml:mrow>
                            </mml:msub>
                            <mml:mo>&#x2212;</mml:mo>
                            <mml:mn>7</mml:mn>
                        </mml:mrow>
                    </mml:math> </disp-formula>
                <p>Where the 
                    <italic toggle="yes">x
                        <sub>i</sub>
                    </italic> are the features extracted from the text. All features are binary variables except for sample size tertile (0 = small trial, 1 = medium trial, 2 = large trial), phase, and number of arms (which is capped at 5 to avoid distortions caused by any trials with an unusually large number of arms).</p>
                <p>Our formula can be seen as a form of linear regression in which the weights were set by human reasoning rather than fitted by minimizing a loss function.</p>
                <p>The weights were arrived at through a qualitative process in consultation with subject matter experts, who identified the features that they would look for when manually assessing a protocol for risk. The consensus was that the SAP is by far the strongest predictor of risk (a trial lacking an SAP is extremely unlikely to succeed).</p>
                <p>The default feature weights are given in 
                    <xref ref-type="table" rid="T3">Table 3</xref>.</p>
                <table-wrap id="T3" orientation="portrait" position="anchor">
                    <label>Table 3. </label>
                    <caption>
                        <title>High and low risk thresholds for the total protocol score, and the default weights.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="bottom">Feature</th>
                                <th align="left" colspan="1" rowspan="1" valign="bottom">Value
                                    <break/> or 
                                    <break/>weight</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">High risk threshold</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">40</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Low risk threshold</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">50</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Number of arms</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Trial phase</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">5</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">SAP completed?</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">26</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Effect Estimate disclosed?</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">16</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Number of subjects low/
                                    <break/>medium/high</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">10</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Trial is international?</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">10</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Trial uses simulation?</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">10</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">Constant (bias)</td>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">&#x2212;7</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
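                <p>The scoring formula and the thresholds in Table 3 can be sketched in a few lines of Python. This is a minimal illustration, not the source code of the tool: the names <monospace>DEFAULT_WEIGHTS</monospace>, <monospace>risk_score</monospace> and <monospace>risk_label</monospace> are hypothetical, the phase is assumed to be passed as a number, and the treatment of scores that fall exactly on a threshold is an assumption, since the text does not specify it.</p>
                <preformat preformat-type="code"><![CDATA[
```python
# Default weights taken from the scoring formula and Table 3.
DEFAULT_WEIGHTS = {
    "sap": 26,
    "effect_estimate": 16,
    "sample_size_tertile": 10,
    "international": 10,
    "simulation": 10,
    "phase": 5,
    "arms": 2,
}
BIAS = -7
HIGH_RISK_THRESHOLD = 40
LOW_RISK_THRESHOLD = 50
MAX_ARMS = 5  # the number of arms is capped at 5


def risk_score(features, weights=DEFAULT_WEIGHTS, bias=BIAS):
    """Weighted sum of the protocol features plus the constant term."""
    capped = dict(features)
    capped["arms"] = min(capped["arms"], MAX_ARMS)
    return sum(weights[name] * capped[name] for name in weights) + bias


def risk_label(score):
    """Map a score to a label; a higher score means a lower risk.

    Assumption: a score exactly on a threshold falls into the safer band.
    """
    if score < HIGH_RISK_THRESHOLD:
        return "HIGH"
    if score < LOW_RISK_THRESHOLD:
        return "MEDIUM"
    return "LOW"


# Hypothetical protocol: SAP and effect estimate present, large sample,
# international, no simulation, phase 3, two arms.
example = {
    "sap": 1, "effect_estimate": 1, "sample_size_tertile": 2,
    "international": 1, "simulation": 0, "phase": 3, "arms": 2,
}
score = risk_score(example)
print(score, risk_label(score))  # 84 LOW
```
]]></preformat>
                <p>For the hypothetical protocol in the sketch, the score is 26 + 16 + 20 + 10 + 0 + 15 + 4 &#x2212; 7 = 84, which lies above the low risk threshold of 50.</p>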
            </sec>
            <sec>
                <title>Operation</title>
                <p>The Clinical Trial Risk Tool can be accessed 
                    <italic toggle="yes">via</italic> any web browser 
                    <ext-link ext-link-type="uri" xlink:href="https://app.clinicaltrialrisk.org/">here</ext-link>. All computations are conducted remotely on a Python server. The software has an embedded video tutorial to ease the learning process. The user interface contains mouseover tooltips with layperson-friendly explanations of the options in the tool.</p>
                <p>The user can adjust the sample size tertile thresholds and weights associated with the features in the user interface and save this as a configuration file.</p>
                <p>A user has the option to click on the Login button to create a user account and save their configuration on the server. Authentication is managed by the third-party authentication provider Auth0.com.</p>
                <p>If a user wishes to use the application anonymously, all functionality is still available without logging in, but the user is not able to save and retrieve profiles at a later date.</p>
            </sec>
            <sec>
                <title>Workflow</title>
                <p>A user uploads a PDF file of a clinical trial protocol, either by dragging and dropping it into the tool, or by using a file selector dialog. On the server side, the tool parses the raw PDF file into plain text, and then presents the user with the features that were identified in the text: pathology, phase, SAP, effect estimate, sample size, sample size tertile, number of arms, countries of investigation, and simulation.</p>
                <p>The user can then correct these features by clicking on dropdowns and selecting or typing the correct value in the GUI.</p>
                <p>The features are fed into the risk model in real time, and the model presents the protocol&#x2019;s risk level as a color-coded HIGH, MEDIUM, or LOW label.</p>
                <p>The GUI includes a graph view of the key terms&#x2019; locations within the document by page number, allowing the user to quickly identify pages which are heavy in statistical content or other relevant terms. The tool&#x2019;s analysis of the protocol of an HIV trial in 
                    <xref ref-type="bibr" rid="ref-49">49</xref> is shown in 
                    <xref ref-type="fig" rid="f3">Figure 3</xref>.</p>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>The graphical user interface showing the graph view of key statistical analysis plan-related terms by page number in the document.</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/15729/e4262a86-6875-4352-876f-3606244677d5_figure3.gif"/>
                </fig>
                <p>The user can export the risk assessment with all explanations and key figures to an Excel or PDF file.</p>
                <p>Finally, if the user has changed the sample size tertile thresholds or feature weights, this configuration can be saved on the server (if the user is authenticated), or to the user&#x2019;s local machine.</p>
            </sec>
        </sec>
        <sec sec-type="results">
            <title>Results</title>
            <sec>
                <title>User testing</title>
                <p>The tool was tested by internal and external subject matter experts, who provided feedback throughout the project. In this way, inaccuracies and pain points could be identified and fixed in an iterative process.</p>
            </sec>
            <sec>
                <title>Validation</title>
                <p>For validation on the manual dataset, cross-validation was used. For the ClinicalTrials.gov dataset, trials whose numeric NCT ID has a third digit of 0&#x2013;7 were used for training, those with a third digit of 8 were used for validation, and those with a third digit of 9 were held out as a future test set.</p>
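                <p>The split by NCT ID digit can be sketched as follows. The function name <monospace>assign_split</monospace> is hypothetical, the IDs in the loop are made up to show the format, and the sketch assumes that digits are counted from the left after the &#x201c;NCT&#x201d; prefix is removed, which the text does not state explicitly.</p>
                <preformat preformat-type="code"><![CDATA[
```python
import re


def assign_split(nct_id):
    """Assign a trial to a split from the third digit of its numeric ID.

    Assumption: digits are counted from the left after removing the
    "NCT" prefix, so "NCT01234567" has third digit 2.
    """
    match = re.fullmatch(r"NCT(\d{8})", nct_id)
    if match is None:
        raise ValueError("not a valid NCT ID: " + repr(nct_id))
    third_digit = int(match.group(1)[2])
    if third_digit <= 7:
        return "train"
    if third_digit == 8:
        return "validation"
    return "test"


# Made-up identifiers, used only to illustrate the three outcomes.
for example_id in ["NCT01234567", "NCT00812345", "NCT00912345"]:
    print(example_id, assign_split(example_id))
```
]]></preformat>
                <p>A rule of this kind is deterministic, so the same trial always lands in the same split without a stored list of assignments. If the chosen digit is roughly uniformly distributed, the split is close to 80% training, 10% validation, and 10% test.</p>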
            </sec>
            <sec>
                <title>Validation scores for manual dataset</title>
                <p>The validation scores on the small manually labeled dataset (about 100 protocols labeled for most features, and 300 for the number of subjects) are given in 
                    <xref ref-type="table" rid="T4">Table 4</xref>.</p>
                <table-wrap id="T4" orientation="portrait" position="anchor">
                    <label>Table 4. </label>
                    <caption>
                        <title>Validation scores on manual dataset.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Component</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Accuracy 
                                    <break/>&#x2013; manual 
                                    <break/>validation 
                                    <break/>dataset</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">AUC
                                    <break/> &#x2013; manual 
                                    <break/>validation 
                                    <break/>dataset</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Technique</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Condition 
                                    <break/>(Na&#x00ef;ve Bayes)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">88%</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">100%</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Na&#x00ef;ve Bayes</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Statistical 
                                    <break/>analysis plan 
                                    <break/>(Na&#x00ef;ve Bayes)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">85%</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">87%</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Na&#x00ef;ve Bayes</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Effect
                                    <break/> Estimate</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">73%</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">95%</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Ensemble:
                                    <break/> rule based +
                                    <break/> Na&#x00ef;ve Bayes</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Number of
                                    <break/> Subjects</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">69% (71%
                                    <break/> within 10%
                                    <break/> margin)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">N/A</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Ensemble: 
                                    <break/>rule based 
                                    <break/>+ Random
                                    <break/> Forest</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Simulation</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">94%</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98%</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Na&#x00ef;ve Bayes</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>Each component was validated using accuracy and Area Under the Curve (AUC).</p>
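                <p>The two metrics can be computed as in the following sketch. It assumes that AUC refers to the area under the receiver operating characteristic curve, uses made-up labels and scores rather than data from the study, and the function names are hypothetical.</p>
                <preformat preformat-type="code"><![CDATA[
```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive example receives a higher
    score than a random negative example (ties count as one half)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up binary labels and classifier scores for illustration.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(labels, predictions))  # 0.75
print(roc_auc(labels, scores))        # 0.75
```
]]></preformat>
                <p>Accuracy depends on the decision threshold (0.5 in the sketch), whereas AUC depends only on how the scores rank the examples. This is why a component can have a high AUC and a noticeably lower accuracy.</p>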
            </sec>
            <sec>
                <title>Validation scores for ClinicalTrials.gov dataset</title>
                <p>Accuracy figures are reported in 
                    <xref ref-type="table" rid="T5">Table 5</xref>, together with the accuracy of a Na&#x00ef;ve Bayes model trained on the ClinicalTrials.gov training dataset, which serves as a baseline for comparison.</p>
                <table-wrap id="T5" orientation="portrait" position="anchor">
                    <label>Table 5. </label>
                    <caption>
                        <title>Validation scores on the ClinicalTrials.gov dataset.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Component</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Accuracy &#x2013; 
                                    <break/>ClinicalTrials.
                                    <break/>gov validation
                                    <break/> dataset</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Baseline
                                    <break/> Accuracy 
                                    <break/>(Na&#x00ef;ve Bayes)
                                    <break/> &#x2013; ClinicalTrials.
                                    <break/>gov validation 
                                    <break/>dataset</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Technique</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Phase</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">75%</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">45%</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Ensemble:
                                    <break/> rule based
                                    <break/> + Random
                                    <break/> Forest</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">SAP</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">82%</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">82%</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Na&#x00ef;ve
                                    <break/> Bayes</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Number of 
                                    <break/>Subjects</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">13%</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6%</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Ensemble:
                                    <break/> rule based
                                    <break/> + Random 
                                    <break/>Forest</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Number of 
                                    <break/>Arms</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">58%</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">52%</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Ensemble:
                                    <break/> rule based 
                                    <break/>+ Random 
                                    <break/>Forest</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Countries of
                                    <break/> Investigation</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">AUC 87%</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">N/A</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Ensemble:
                                    <break/> rule based
                                    <break/> + CNN +
                                    <break/> Random
                                    <break/> Forest
                                    <break/> + Na&#x00ef;ve
                                    <break/> Bayes</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
            </sec>
            <sec>
                <title>Validation scores for Hutchinson 
                    <italic toggle="yes">et al.</italic> dataset</title>
                <p>In addition to validating the performance of the NLP components of the tool, it was also necessary to validate the risk model.</p>
                <p>We took the dataset of 125 trials analyzed by Hutchinson 
                    <italic toggle="yes">et al.</italic> in their 2022 analysis, where they attempted to establish the proportion of RCTs that inform clinical practice
                    <sup>
                        <xref ref-type="bibr" rid="ref-3">3</xref>
                    </sup>. Unfortunately, only six of the protocols in that study could be located on ClinicalTrials.gov.</p>
                <p>We passed these six protocols through the tool and compared the risk output of the tool to whether or not Hutchinson 
                    <italic toggle="yes">et al.</italic> considered the trials informative (
                    <xref ref-type="table" rid="T6">Table 6</xref>). On this small dataset, the tool predicted informativeness with 100% AUC (the two trials scoring 60 or below were not informative). This was a useful sanity check for the risk model, although the test set is far too small for this test to be scientifically rigorous.</p>
                <table-wrap id="T6" orientation="portrait" position="anchor">
                    <label>Table 6. </label>
                    <caption>
                        <title>Validation scores on Hutchinson 
                            <italic toggle="yes">et al.</italic> dataset.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">NCT</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Disease</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Informative 
                                    <break/>(Hutchinson 
                                    <break/>
                                    <italic toggle="yes">et al.</italic>)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Risk
                                    <break/> score
                                    <break/> from
                                    <break/> tool</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Risk
                                    <break/> label 
                                    <break/>from
                                    <break/> tool</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">NCT00946712</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">LUNG</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">72</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">LOW</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">NCT01032629</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">DIAB</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">60</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">LOW</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">NCT01107626</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">LUNG</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">62</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">LOW</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">NCT01144338</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">DIAB</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">48</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">MEDIUM</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">NCT01205776</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">CVS</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">69</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">LOW</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">NCT01206062</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">CVS</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">69</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">LOW</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
            </sec>
        </sec>
        <sec sec-type="discussion">
            <title>Discussion</title>
            <p>We have validated the individual components of the tool separately. Accuracies vary among the datasets and components validated.</p>
            <p>In particular, the accuracy of the sample size component on the ClinicalTrials.gov dataset was very low (13%). We attribute this to the lack of a reliable gold standard in that dataset rather than to poor performance of the tool itself. Sample size identification was nevertheless particularly challenging and required the manual labeling of 300 protocols to reach a performance that was acceptable in user testing. This is because the sample size cannot be reduced to a simple three- or four-way classification problem like many of the other features; it is a data extraction problem with many confounding factors, such as false positives.</p>
            <p>It is fortunate that some of the most important features, such as the presence or absence of the SAP, were relatively easy to identify with machine learning, since detecting an SAP can be reduced to a binary classification problem, one of the easiest kinds of problem to solve in machine learning.</p>
            <p>We were able to look inside the parameters of the models that are used to extract the individual features, in order to search for any potential improvements. For example, the sample size extraction component identifies candidate sample sizes in the text using a set of manually created rules, and calculates features for each of them (distance in tokens to the term &#x201c;sample size&#x201d;, 
                <italic toggle="yes">etc.</italic>). The Random Forest model allows us to visualize its feature importances, and we see at a glance that the strongest indicators that a number in the text is the true sample size are its distances to the terms &#x201c;sample size&#x201d; and &#x201c;number of subjects&#x201d;, followed by num_occurrences (the number of times that number occurs in the text). The feature importances of the sample size classifier are shown in 
                <xref ref-type="fig" rid="f4">Figure 4</xref>.</p>
            <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                <label>Figure 4. </label>
                <caption>
                    <title>Feature importances for sample size extractor (random forest).</title>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/15729/e4262a86-6875-4352-876f-3606244677d5_figure4.gif"/>
            </fig>
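            <p>As an illustration of the kind of per-candidate features described above, the following minimal sketch computes, for each number in a text, its distance in tokens to the nearest anchor phrase and its number of occurrences. It is not the tool's implementation: the tokenizer, the function names and the example text are assumptions made for illustration, and the Random Forest that would score the candidates is omitted.</p>
            <code language="python">
```python
import re

# Anchor phrases named in the text above; the list itself is an assumption.
ANCHORS = ("sample size", "number of subjects")


def tokenize(text):
    """Lower-case the text and split it into word and number tokens."""
    return re.findall(r"[a-z]+|\d+", text.lower())


def anchor_positions(tokens, phrase):
    """Return the start index of every occurrence of a multi-word phrase."""
    words = phrase.split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def candidate_features(text):
    """Compute illustrative features for every number found in the text."""
    tokens = tokenize(text)
    counts = {}
    for tok in tokens:
        if tok.isdigit():
            counts[tok] = counts.get(tok, 0) + 1
    rows = []
    for pos, tok in enumerate(tokens):
        if not tok.isdigit():
            continue
        row = {"value": int(tok), "position": pos, "num_occurrences": counts[tok]}
        for phrase in ANCHORS:
            starts = anchor_positions(tokens, phrase)
            # None means the phrase never occurs in the document.
            dist = min((abs(pos - s) for s in starts), default=None)
            row["distance_to_" + phrase.replace(" ", "_")] = dist
        rows.append(row)
    return rows


# Made-up protocol excerpt, for illustration only.
protocol = (
    "The sample size is 120 participants. Visits occur at week 4 and week 12. "
    "A total of 120 subjects gives 80 percent power."
)
features = candidate_features(protocol)
for row in features:
    print(row)
```
            </code>
            <p>In this toy excerpt the first mention of 120 lies three tokens from &#x201c;sample size&#x201d; and occurs twice, whereas the visit weeks 4 and 12 are further away and occur once each, which is the kind of signal a downstream classifier could exploit.</p>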
            <p>Likewise, the feature importances for the component that extracts mentions of &#x201c;simulation&#x201d; are shown in 
                <xref ref-type="fig" rid="f5">Figure 5</xref>.</p>
            <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                <label>Figure 5. </label>
                <caption>
                    <title>Feature importances for simulation extractor (random forest).</title>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/15729/e4262a86-6875-4352-876f-3606244677d5_figure5.gif"/>
            </fig>
            <p>We have also examined the performance of the models in more detail than AUC and accuracy alone allow. For example, 
                <xref ref-type="fig" rid="f6">Figure 6</xref> shows the confusion matrix of the phase extractor. The most common phases in the dataset are 2 and 3, and phase 2 is likely to be confused with phase 1.5 (I/II).</p>
            <fig fig-type="figure" id="f6" orientation="portrait" position="float">
                <label>Figure 6. </label>
                <caption>
                    <title>Confusion matrix for phase extractor (ensemble model).</title>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/15729/e4262a86-6875-4352-876f-3606244677d5_figure6.gif"/>
            </fig>
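            <p>For readers unfamiliar with confusion matrices, the sketch below shows how one can be tallied from paired true and predicted labels. The labels are invented for illustration and do not reproduce the data behind Figure 6; the function name is likewise an assumption.</p>
            <code language="python">
```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) pairs; return the sorted labels and a nested dict."""
    pairs = Counter(zip(true_labels, predicted_labels))
    labels = sorted(set(true_labels) | set(predicted_labels))
    matrix = {t: {p: pairs[(t, p)] for p in labels} for t in labels}
    return labels, matrix


# Made-up phase labels for illustration only; 1.5 denotes Phase I/II.
y_true = [2, 2, 2, 3, 3, 1.5, 1.5, 1]
y_pred = [2, 1.5, 2, 3, 3, 1.5, 2, 1]

labels, matrix = confusion_matrix(y_true, y_pred)
# Rows are true phases, columns are predicted phases.
for t in labels:
    print(t, [matrix[t][p] for p in labels])
```
            </code>
            <p>Off-diagonal cells, such as the row for phase 2 and the column for phase 1.5, count the misclassifications that a single accuracy figure hides.</p>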
            <p>The confusion matrix visualization also makes it clear how much harder the sample size identification is compared to the other features that the tool extracts from the protocol text. 
                <xref ref-type="fig" rid="f7">Figure 7</xref> shows the confusion matrix for the sample size detection component.</p>
            <fig fig-type="figure" id="f7" orientation="portrait" position="float">
                <label>Figure 7. </label>
                <caption>
                    <title>Confusion matrix for sample size extractor (ensemble model).</title>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/15729/e4262a86-6875-4352-876f-3606244677d5_figure7.gif"/>
            </fig>
            <fig fig-type="figure" id="f8" orientation="portrait" position="float">
                <label>Figure 8. </label>
                <caption>
                    <title>The survey on informativeness features.</title>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/15729/e4262a86-6875-4352-876f-3606244677d5_figure8.gif"/>
            </fig>
            <p>In our accuracy calculations, we have considered a sample size to be correct only when it is exactly equal to the true value, so a predicted value of 61 for a ground truth of 62 would be considered an error. For the purposes of the confusion matrix, we allowed a tolerance of 1 significant figure. We can see at a glance that low sample sizes (10&#x2013;30) are the ones most likely to be confused by the model.</p>
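            <p>The difference between the two matching criteria can be sketched as follows. This is one possible reading of &#x201c;a tolerance of 1 significant figure&#x201d;, namely that two values match when they agree after rounding to one significant figure; the function names are assumptions and the exact rule used to draw Figure 7 may differ.</p>
            <code language="python">
```python
import math


def round_sig(value, figures=1):
    """Round an integer to the given number of significant figures."""
    if value == 0:
        return 0
    magnitude = math.floor(math.log10(abs(value)))
    # A negative ndigits rounds to tens, hundreds, and so on.
    return int(round(value, figures - 1 - magnitude))


def exact_match(predicted, truth):
    """Criterion used for the accuracy figures: values must be identical."""
    return predicted == truth


def match_to_one_sig_fig(predicted, truth):
    """Looser criterion (assumed): values agree to one significant figure."""
    return round_sig(predicted) == round_sig(truth)


print(exact_match(61, 62))            # strict criterion
print(match_to_one_sig_fig(61, 62))   # both round to 60
print(match_to_one_sig_fig(61, 140))  # 60 versus 100
```
            </code>
            <p>Under this reading, a prediction of 61 for a true value of 62 counts as an error for accuracy but falls in the same cell of the confusion matrix. Values that straddle a rounding boundary, such as 64 and 66, would still be separated.</p>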
            <p>We have provided Jupyter notebooks in the repository to run the validation and reproduce the results.</p>
            <p>It was not possible to conduct a thorough analysis of the linear risk model, because data on the &#x201c;informativeness&#x201d; of clinical trials are hard to obtain and the intersection of those data with the available trial protocol documents is small. Further studies are needed to validate the risk modeling part of the tool.</p>
        </sec>
        <sec>
            <title>Example scenarios and user journeys with the Clinical Trial Risk Tool</title>
            <sec>
                <title>Scenario 1: triage</title>
                <p>A funding organization receives large volumes of incoming protocols. They have a team of reviewers who are reading the documents and categorizing them as &#x2018;go&#x2019; or &#x2018;no-go&#x2019;. The majority of protocols are not accepted for funding, because they do not meet some of the funder&#x2019;s criteria. The organization would prefer to spend less time on the high-risk protocols.</p>
                <p>Using the Clinical Trial Risk Tool, the reviewers would be able to quickly identify the incoming protocols which should not be considered for funding, such as those which are missing key statistical information. This frees up more of their time to process the high-quality protocols.</p>
            </sec>
            <sec>
                <title>Scenario 2: standardization of review</title>
                <p>When protocols are passed to reviewers, each reviewer typically comes from a different background and brings with them their own way of viewing a protocol. The reviewing team could use the tool to calibrate and standardize their review processes for greater consistency. For example, they could agree on a standard set of weights and parameters for the model and save it on an organizational or departmental level.</p>
            </sec>
            <sec>
                <title>Scenario 3: pre-submission vetting</title>
                <p>An investigator is preparing a trial protocol for submission as part of a funding application. Each funding organization has its own checklist of key &#x2018;must-haves&#x2019; and &#x2018;should-haves&#x2019; in a trial. The applicant uses the Clinical Trial Risk Tool to vet their protocol and identify any weak points. For example, the tool may flag the trial as high risk because the expected effect estimate is not clearly stated. This gives the investigator an opportunity to correct the issue before submission, increasing the chances of acceptance.</p>
            </sec>
            <sec>
                <title>Scenario 4: training</title>
                <p>The tool can be used for education and training of investigators or reviewers on what makes a robust protocol, facilitating the upskilling of junior reviewers.</p>
            </sec>
            <sec>
                <title>Scenario 5: auto-populating risk questionnaire</title>
                <p>Some funding organizations, such as the Bill &amp; Melinda Gates Foundation, require a risk assessment questionnaire (the DAC risk assessment questionnaire) to be submitted together with the protocol. If the tool is exposed as an application programming interface (API), it can be used to auto-populate the risk assessment questionnaire. This streamlines the submission process, as the tool can retrieve important information from the PDF in seconds, freeing the applicant to do other tasks.</p>
            </sec>
            <sec>
                <title>Scenario 6: adapting source code for a new domain</title>
                <p>A pharmaceutical company may wish to use the tool to estimate the cost of an oncology trial. The tool is open source, so the company can engage a developer to modify it to estimate a dollar value for the trial. New features have to be added, such as cancer stage and number of chemotherapy cycles, but the developer can &#x2018;recycle&#x2019; the code that currently identifies trial phase for these purposes. The company has a database of past trials, together with confidential and sensitive industry data on their cost over the last ten years, which is used to train a regression model to predict the cost. The model&#x2019;s performance can be validated on the most recent trials, provided that these have been withheld from the training data. The pharmaceutical company now has a customized in-house cost estimation tool. Because the Clinical Trial Risk Tool is released under the MIT License, the company is not obligated to share its in-house cost model, which contains industry-sensitive data, but it chooses to release publicly the oncology-specific NLP features that it has added to the tool.</p>
            </sec>
        </sec>
        <sec sec-type="conclusions">
            <title>Conclusions</title>
            <p>We have developed a software tool which we believe is unique in using natural language processing to provide a risk profile of a clinical trial protocol.</p>
            <p>The tool can assist a human in assessing the risk of uninformativeness of a trial, and understanding which factors contribute to the risk of uninformativeness. With the use of this tool, reviewers may be able to assess trials more rapidly, and the tool could be used to inform stakeholders about the most impactful features for risk of uninformativeness. The tool can also assist reviewers in assessing trials more consistently, and investigators may use it to validate their draft protocols before submitting them to a funding organization.</p>
            <p>The tool is intuitive to use, and the software is open source and can be accessed 
                <italic toggle="yes">via</italic> any web browser, allowing clinical trial investigators who do not have expertise in software or programming to use it.</p>
            <p>Since the software is open source under an MIT License, an investigator can easily fork the project and extend it to another field such as oncology, or to predict trial cost or complexity, with relatively little effort.</p>
            <p>Validation of the tool has been complex because each component of the tool has been designed independently, and because the data on ClinicalTrials.gov are not entirely accurate, since they depend on researchers updating their records manually. Manually annotating large numbers of protocols was time-consuming, but further manual labeling could pave the way for further improvements in accuracy. There is still considerable scope for improvement of several features, especially sample size.</p>
            <p>The tool is trained to detect only two pathologies, HIV and TB. However, if a user uploads a protocol from a different pathology, they could still use the tool for risk assessment, but they would need to set appropriate values for the feature weights and sample size tertiles. For some high-risk pathologies, such as oncology or cardiovascular disease, we would not expect the tool to be as accurate at identifying risk, because of the importance of other features, such as biomarkers, enrolment criteria, toxicity of treatment, and chemotherapy cycles, which are not currently handled in the tool, but which are important for these pathologies
                <sup>
                    <xref ref-type="bibr" rid="ref-7">7</xref>
                </sup>.</p>
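            <p>As a hedged illustration of how a user might derive sample size tertiles for a new pathology, the sketch below computes two cut points from a list of historical sample sizes and assigns a protocol to the lower, middle or upper tertile. The sample sizes are invented, the function names are assumptions, and the tool's own configuration format is not shown.</p>
            <code language="python">
```python
import statistics


def tertile_cut_points(sample_sizes):
    """Return the two cut points that split the data into three equal groups."""
    low, high = statistics.quantiles(sample_sizes, n=3)
    return low, high


def tertile_of(sample_size, cut_points):
    """Return 0, 1 or 2 for the lower, middle or upper tertile."""
    low, high = cut_points
    if sample_size > high:
        return 2
    if sample_size > low:
        return 1
    return 0


# Invented sample sizes of past trials in a hypothetical new pathology.
historical = [20, 30, 45, 60, 100, 150, 240, 400, 1200]
cuts = tertile_cut_points(historical)

print(cuts)
print(tertile_of(25, cuts), tertile_of(100, cuts), tertile_of(500, cuts))
```
            </code>
            <p>The cut points depend entirely on the reference trials chosen, so they should be derived from protocols in the same pathology and, ideally, the same phase.</p>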
            <p>Future work on this project could involve broadening the scope to more pathologies, or altering the tool to predict cost, complexity or other key metrics of a trial. If we were to extract further features from the text using NLP, candidate features would include the number of endpoints, the prevalence estimate not being disclosed, the trial being a platform trial, the protocol being a master protocol, and more.</p>
            <p>User-requested features include support for multi-document protocols (e.g. protocol and SAP in separate PDFs), support for processing multiple documents at the same time, and exposing the tool as an API or library.</p>
            <p>One potential future extension of the project would see the tool developed further into a case management system, which would ingest protocols, SAPs, questionnaires, and regulatory paperwork, and track the associated metadata on trial level, similar to the legal case management systems described in the Introduction.</p>
        </sec>
        <sec>
            <title>Ethics and consent</title>
            <p>No ethical approval was sought for this study due to the very low risk nature of the survey conducted, where no personal or identifiable information was collected.</p>
            <p>Completion of the survey implied consent for data collection. Written informed consent for publication and use of their data was obtained from each participant before publication of this manuscript.</p>
        </sec>
        <sec>
            <title>Abbreviations</title>
            <list list-type="bullet">
                <list-item>
                    <p>SAP: Statistical Analysis Plan</p>
                </list-item>
                <list-item>
                    <p>NLP: Natural Language Processing</p>
                </list-item>
                <list-item>
                    <p>HIV: Human Immunodeficiency Virus</p>
                </list-item>
                <list-item>
                    <p>TB: Tuberculosis</p>
                </list-item>
                <list-item>
                    <p>CNN: Convolutional Neural Network</p>
                </list-item>
                <list-item>
                    <p>NLTK: Natural Language Toolkit</p>
                </list-item>
                <list-item>
                    <p>Tf*Idf: term frequency*inverse document frequency</p>
                </list-item>
                <list-item>
                    <p>AI: Artificial Intelligence</p>
                </list-item>
                <list-item>
                    <p>GUI: Graphical User Interface</p>
                </list-item>
                <list-item>
                    <p>AUC: Area Under the [ROC] Curve</p>
                </list-item>
                <list-item>
                    <p>ROC: Receiver Operating Characteristic</p>
                </list-item>
                <list-item>
                    <p>PDF: Portable Document Format</p>
                </list-item>
                <list-item>
                    <p>API: Application Programming Interface</p>
                </list-item>
                <list-item>
                    <p>RCT: Randomized Clinical Trial</p>
                </list-item>
            </list>
        </sec>
    </body>
    <back>
        <sec sec-type="data-availability">
            <title>Data availability</title>
            <sec>
                <title>Source data</title>
                <p>A set of protocols in text format, and accompanying metadata, were used for the training and evaluation of the tool. The majority of protocols used in training were taken from ClinicalTrials.gov, and the source repository of this tool contains instructions on downloading the data. A small number of protocols are not available on the internet and are internal to the Bill &amp; Melinda Gates Foundation.</p>
                <p>The list of 125 protocols used for validation of the risk model was taken from Hutchinson 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-3">3</xref>
                    </sup>. Their dataset is available at 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.17605/OSF.IO/3EGKU">https://doi.org/10.17605/OSF.IO/3EGKU</ext-link>.</p>
            </sec>
            <sec>
                <title>Underlying data</title>
                <p>Zenodo: Feature weights for protocol informativeness. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.7769176">https://doi.org/10.5281/zenodo.7769176</ext-link>
                    <sup>
                        <xref ref-type="bibr" rid="ref-50">50</xref>
                    </sup>.</p>
                <p>This project contains the following underlying data:</p>
                <list list-type="bullet">
                    <list-item>
                        <label>- </label>
                        <p>v1 BMGF DAC Feature Weights Informativeness.xlsx (Responses to SurveyMonkey questionnaire)</p>
                    </list-item>
                </list>
                <p>Data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">Creative Commons Zero "No rights reserved" data waiver</ext-link> (CC0 1.0 Public domain dedication).</p>
            </sec>
        </sec>
        <sec>
            <title>Software availability</title>
            <p>Software available from: 
                <ext-link ext-link-type="uri" xlink:href="https://app.clinicaltrialrisk.org/">https://app.clinicaltrialrisk.org/</ext-link>
            </p>
            <p>Source code: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/fastdatascience/clinical_trial_risk">https://github.com/fastdatascience/clinical_trial_risk</ext-link>
            </p>
            <p>Archived source code at time of publication: 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.7633872">https://doi.org/10.5281/zenodo.7633872</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-38">38</xref>
                </sup>
            </p>
            <p>License: 
                <ext-link ext-link-type="uri" xlink:href="https://opensource.org/license/mit/">MIT</ext-link>.</p>
        </sec>
        <sec>
            <title>Acknowledgements</title>
            <p>We would like to acknowledge the help and advice given by Shawn Dolley and Dr. Thea C. Norman.</p>
        </sec>
        <sec>
            <title>Appendix</title>
            <p>Candidate features for future versions of the tool:</p>
            <list list-type="bullet">
                <list-item>
                    <p>If the NCT number is found in the protocol, sample size data can be retrieved from the ClinicalTrials.gov API.</p>
                </list-item>
                <list-item>
                    <p>Number of sites</p>
                </list-item>
                <list-item>
                    <p>Primary duration</p>
                </list-item>
                <list-item>
                    <p>Number of primary endpoints</p>
                </list-item>
                <list-item>
                    <p>Prevalence estimate not disclosed</p>
                </list-item>
                <list-item>
                    <p>Is a master protocol or a subset or derivative of a master protocol</p>
                </list-item>
                <list-item>
                    <p>Is part of a platform trial</p>
                </list-item>
                <list-item>
                    <p>Number of visits</p>
                </list-item>
                <list-item>
                    <p>Duration of trial</p>
                </list-item>
                <list-item>
                    <p>Multiple sites in a single country trial</p>
                </list-item>
                <list-item>
                    <p>Number of countries with at least one site</p>
                </list-item>
                <list-item>
                    <p>Uses model-informed drug development</p>
                </list-item>
                <list-item>
                    <p>Tertile of primary duration</p>
                </list-item>
                <list-item>
                    <p>Patient consortium or trial consortium prominently involved</p>
                </list-item>
                <list-item>
                    <p>Is an adaptive design</p>
                </list-item>
                <list-item>
                    <p>Takes place in a hospital</p>
                </list-item>
                <list-item>
                    <p>phase-in-domain</p>
                </list-item>
                <list-item>
                    <p>Recency of protocol vs today's date</p>
                </list-item>
                <list-item>
                    <p>Recent dates in prevalence/burden citations</p>
                </list-item>
                <list-item>
                    <p>Indicates intention or willingness to make changes at interim</p>
                </list-item>
                <list-item>
                    <p>Number of trial sites in entire trial /</p>
                </list-item>
                <list-item>
                    <p>Number of procedures</p>
                </list-item>
                <list-item>
                    <p>Includes analysis of real-world data</p>
                </list-item>
                <list-item>
                    <p>More than 1 drug in the intervention cocktail</p>
                </list-item>
                <list-item>
                    <p>Number of mentions of the word policy</p>
                </list-item>
                <list-item>
                    <p>Case report form pages - all trial</p>
                </list-item>
                <list-item>
                    <p>Case report form pages per variable</p>
                </list-item>
                <list-item>
                    <p>Duration of follow up (in months)</p>
                </list-item>
                <list-item>
                    <p>External sponsorship</p>
                </list-item>
                <list-item>
                    <p>Non-standard endpoint</p>
                </list-item>
                <list-item>
                    <p>Trial uses cluster sampling</p>
                </list-item>
                <list-item>
                    <p>No trial database used</p>
                </list-item>
                <list-item>
                    <p>High number of follow-up appointments</p>
                </list-item>
                <list-item>
                    <p>Strict recruitment criteria (age, medical history)</p>
                </list-item>
                <list-item>
                    <p>Crossover design</p>
                </list-item>
                <list-item>
                    <p>Multiple consents, tests and forms for participants to fill out</p>
                </list-item>
                <list-item>
                    <p>Multiple randomisation steps</p>
                </list-item>
                <list-item>
                    <p>Extended investigational treatment or lengthy regimen until progression</p>
                </list-item>
                <list-item>
                    <p>Low disease prevalence</p>
                </list-item>
                <list-item>
                    <p>Trial takes place in hospital</p>
                </list-item>
                <list-item>
                    <p>Trial is a platform trial</p>
                </list-item>
                <list-item>
                    <p>Trial has sub-studies</p>
                </list-item>
                <list-item>
                    <p>Trial used model informed approach</p>
                </list-item>
                <list-item>
                    <p>Complex age criteria in recruitment</p>
                </list-item>
            </list>
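            <p>The first item in the list above depends on locating the trial's registry identifier in the protocol text. A minimal sketch of that step is given below, relying on the fact that ClinicalTrials.gov identifiers consist of the letters NCT followed by eight digits. The function name and the example identifiers are made up for illustration, and the subsequent registry lookup is deliberately not shown.</p>
            <code language="python">
```python
import re

# ClinicalTrials.gov identifiers: "NCT" followed by exactly eight digits.
NCT_PATTERN = re.compile(r"\bNCT\d{8}\b")


def find_nct_ids(text):
    """Return the distinct NCT identifiers in order of first appearance."""
    seen = []
    for match in NCT_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


# Made-up identifiers, for illustration only.
text = "Protocol registered as NCT01234567. See also NCT01234567 and NCT07654321."
ids = find_nct_ids(text)
print(ids)
```
            </code>
            <p>A protocol may cite several identifiers, for example those of related trials, so a real implementation would need a rule for deciding which one refers to the protocol itself.</p>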
        </sec>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <ext-link ext-link-type="uri" xlink:href="https://www.who.int/observatories/global-observatory-on-health-research-and-development/monitoring/number-of-trial-registrations-by-year-location-disease-and-phase-of-development">https://www.who.int/observatories/global-observatory-on-health-research-and-development/monitoring/number-of-trial-registrations-by-year-location-disease-and-phase-of-development</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yordanov</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dechartres</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Porcher</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Avoidable waste of research related to inadequate methods in clinical trials.</article-title>
                    <source>

                        <italic toggle="yes">BMJ.</italic>
</source>
                    <year>2015</year>;<volume>350</volume>:<fpage>h809</fpage>.
                    <pub-id pub-id-type="pmid">25804210</pub-id>
                    <pub-id pub-id-type="doi">10.1136/bmj.h809</pub-id>
                    <pub-id pub-id-type="pmcid">4372296</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hutchinson</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Moyer</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zarin</surname>
                            <given-names>DA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The proportion of randomized controlled trials that inform clinical practice.</article-title>
                    <source>

                        <italic toggle="yes">eLife.</italic>
</source>
                    <year>2022</year>;<volume>11</volume>:<fpage>e79491</fpage>.
                    <pub-id pub-id-type="pmid">35975784</pub-id>
                    <pub-id pub-id-type="doi">10.7554/eLife.79491</pub-id>
                    <pub-id pub-id-type="pmcid">9427100</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <collab>Bill &amp; Melinda Gates Foundation</collab>:
                    <article-title>Uninformative research is the global health crisis you&#x2019;ve never heard of.</article-title>
                    <year>2023</year>; retrieved 12 Feb 2023.
                    <ext-link ext-link-type="uri" xlink:href="https://www.gatesfoundation.org/ideas/articles/deworm3-clinical-trials-show-the-value-of-informed-research">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <collab>World Medical Association</collab>:
                    <article-title>Declaration of Helsinki.</article-title> (1964, rev. 2022).
                    <ext-link ext-link-type="uri" xlink:href="https://www.wma.net/what-we-do/medical-ethics/declaration-of-helsinki/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zarin</surname>
                            <given-names>DA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Goodman</surname>
                            <given-names>SN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kimmelman</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <article-title>Harms from uninformative clinical trials.</article-title>
                    <source>

                        <italic toggle="yes">JAMA.</italic>
</source>
                    <year>2019</year>;<volume>322</volume>(<issue>9</issue>):<fpage>813</fpage>&#x2013;<lpage>814</lpage>.
                    <pub-id pub-id-type="pmid">31343666</pub-id>
                    <pub-id pub-id-type="doi">10.1001/jama.2019.9892</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Grignolo</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pretorius</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Phase III trial failures: Costly, but preventable.</article-title>
                    <source>

                        <italic toggle="yes">Appl Clin Trials.</italic>
</source>
                    <year>2016</year>;<volume>25</volume>(<issue>8</issue>):<fpage>36</fpage>&#x2013;<lpage>42</lpage>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.appliedclinicaltrialsonline.com/view/phase-iii-trial-failures-costly-preventable">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hwang</surname>
                            <given-names>TJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Carpenter</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lauffenburger</surname>
                            <given-names>JC</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Failure of investigational drugs in late-stage clinical development and publication of trial results.</article-title>
                    <source>

                        <italic toggle="yes">JAMA Intern Med.</italic>
</source>
                    <year>2016</year>;<volume>176</volume>(<issue>12</issue>):<fpage>1826</fpage>&#x2013;<lpage>1833</lpage>.
                    <pub-id pub-id-type="pmid">27723879</pub-id>
                    <pub-id pub-id-type="doi">10.1001/jamainternmed.2016.6008</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <collab>National Institute for Health and Care Research</collab>:
                    <article-title>Clinical Trials Toolkit: Risk Assessment.</article-title> (retrieved 12 Feb 2023).
                    <ext-link ext-link-type="uri" xlink:href="https://www.ct-toolkit.ac.uk/routemap/risk-assessment/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Fuller</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Developing a study risk assessment tool.</article-title> UKCRF Network study risk assessment tool group, <year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://cambridge.crf.nihr.ac.uk/wp-content/uploads/2017/11/Developing-a-study-risk-assessment-tool_Ver-2.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dressler</surname>
                            <given-names>E</given-names>
                        </name>
</person-group>:
                    <article-title>Clinical Trial Optimization Using R.</article-title> Alex Dmitrienko and Erik Pulkstenis. Boca Raton, FL: Chapman &amp; Hall/CRC Press, <year>2019</year>;<volume>73</volume>(<issue>2</issue>):<fpage>210</fpage>&#x2013;<lpage>211</lpage>.
                    <pub-id pub-id-type="doi">10.1080/00031305.2019.1603479</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>O&#x2019;Hagan</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Stevens</surname>
                            <given-names>JW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Campbell</surname>
                            <given-names>MJ</given-names>
                        </name>
</person-group>:
                    <article-title>Assurance in clinical trial design.</article-title>
                    <source>

                        <italic toggle="yes">Pharm Stat.</italic>
</source>
                    <year>2005</year>;<volume>4</volume>(<issue>3</issue>):<fpage>187</fpage>&#x2013;<lpage>201</lpage>.
                    <pub-id pub-id-type="doi">10.1002/pst.175</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Alhussain</surname>
                            <given-names>ZA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Oakley</surname>
                            <given-names>JE</given-names>
                        </name>
</person-group>:
                    <article-title>Assurance for clinical trial design with normally distributed outcomes: Eliciting uncertainty about variances.</article-title>
                    <source>

                        <italic toggle="yes">Pharm Stat.</italic>
</source>
                    <year>2020</year>;<volume>19</volume>(<issue>6</issue>):<fpage>827</fpage>&#x2013;<lpage>839</lpage>.
                    <pub-id pub-id-type="pmid">32537910</pub-id>
                    <pub-id pub-id-type="doi">10.1002/pst.2040</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fu</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kulkarni</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Evaluating and utilizing probability of study success in clinical development.</article-title>
                    <source>

                        <italic toggle="yes">Clin Trials.</italic>
</source>
                    <year>2013</year>;<volume>10</volume>(<issue>3</issue>):<fpage>407</fpage>&#x2013;<lpage>413</lpage>.
                    <pub-id pub-id-type="pmid">23471634</pub-id>
                    <pub-id pub-id-type="doi">10.1177/1740774513478229</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chuang-Stein</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>French</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kirby</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A quantitative approach for making Go/No-Go decisions in drug development.</article-title>
                    <source>

                        <italic toggle="yes">Therapeutic Innovation &amp; Regulatory Science.</italic>
</source>
                    <year>2011</year>;<volume>45</volume>:<fpage>187</fpage>&#x2013;<lpage>202</lpage>.
                    <pub-id pub-id-type="doi">10.1177/009286151104500213</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rosen</surname>
                            <given-names>DH</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Johnson</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kebaabetswe</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Process maps in clinical trial quality assurance.</article-title>
                    <source>

                        <italic toggle="yes">Clin Trials.</italic>
</source>
                    <year>2009</year>;<volume>6</volume>(<issue>4</issue>):<fpage>373</fpage>&#x2013;<lpage>377</lpage>.
                    <pub-id pub-id-type="pmid">19625329</pub-id>
                    <pub-id pub-id-type="doi">10.1177/1740774509338429</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wong</surname>
                            <given-names>CH</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Siah</surname>
                            <given-names>KW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lo</surname>
                            <given-names>AW</given-names>
                        </name>
</person-group>:
                    <article-title>Estimation of clinical trial success rates and related parameters.</article-title>
                    <source>

                        <italic toggle="yes">Biostatistics.</italic>
</source>
                    <year>2019</year>;<volume>20</volume>(<issue>2</issue>):<fpage>273</fpage>&#x2013;<lpage>286</lpage>.
                    <pub-id pub-id-type="pmid">29394327</pub-id>
                    <pub-id pub-id-type="doi">10.1093/biostatistics/kxx069</pub-id>
                    <pub-id pub-id-type="pmcid">6409418</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Getz</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Smith</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kravet</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Protocol design and performance benchmarks by phase and by oncology and rare disease subgroups.</article-title>
                    <source>

                        <italic toggle="yes">Ther Innov Regul Sci.</italic>
</source>
                    <year>2023</year>;<volume>57</volume>(<issue>1</issue>):<fpage>49</fpage>&#x2013;<lpage>56</lpage>.
                    <pub-id pub-id-type="pmid">35960455</pub-id>
                    <pub-id pub-id-type="doi">10.1007/s43441-022-00438-5</pub-id>
                    <pub-id pub-id-type="pmcid">9373886</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Amiri-Kordestani</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fojo</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <article-title>Why do phase III clinical trials in oncology fail so often?</article-title>
                    <source>

                        <italic toggle="yes">J Natl Cancer Inst.</italic>
</source>
                    <year>2012</year>;<volume>104</volume>(<issue>8</issue>):<fpage>568</fpage>&#x2013;<lpage>569</lpage>.
                    <pub-id pub-id-type="pmid">22491346</pub-id>
                    <pub-id pub-id-type="doi">10.1093/jnci/djs180</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Apgar</surname>
                            <given-names>V</given-names>
                        </name>
</person-group>:
                    <article-title>A proposal for a new method of evaluation of the newborn infant.</article-title>
                    <source>

                        <italic toggle="yes">Curr Res Anesth Analg.</italic>
</source>
                    <year>1953</year>;<volume>32</volume>(<issue>4</issue>):<fpage>260</fpage>&#x2013;<lpage>267</lpage>.
                    <pub-id pub-id-type="pmid">13083014</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Calvin-Lamas</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pita-Fernandez</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pertega-Diaz</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A complexity scale for clinical trials from the perspective of a pharmacy service.</article-title>
                    <source>

                        <italic toggle="yes">Eur J Hosp Pharm.</italic>
</source>
                    <year>2018</year>;<volume>25</volume>(<issue>5</issue>):<fpage>251</fpage>&#x2013;<lpage>256</lpage>.
                    <pub-id pub-id-type="pmid">31157035</pub-id>
                    <pub-id pub-id-type="doi">10.1136/ejhpharm-2017-001282</pub-id>
                    <pub-id pub-id-type="pmcid">6452378</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <collab>Metrics Champion Consortium Protocol Operational Complexity Scoring Tool</collab>:
                    <article-title>Clinical Trial Risk &amp; Performance Management vSummit.</article-title>
                    <year>2020</year>; retrieved 3 March 2023.
                    <ext-link ext-link-type="uri" xlink:href="https://cms.centerwatch.com/mcc-summit-2020">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Forbes</surname>
                            <given-names>MK</given-names>
                        </name>
</person-group>:
                    <article-title>Distilling Constituent Symptoms and Patterns of Repetition in the Diagnostic Criteria of the DSM-5.</article-title> OSF, Web, <year>2023</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://osf.io/r5vqk/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yadav</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kar</surname>
                            <given-names>AK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kashiramka</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Artificial Intelligence Adoption for FinTech Industries-An Exploratory Study About the Disruptions, Antecedents and Consequences.</article-title> The Role of Digital Technologies in Shaping the Post-Pandemic World: 21st IFIP WG 6.11 Conference on e-Business, e-Services and e-Society, I3E 2022, Newcastle upon Tyne, UK, September 13&#x2013;14, 2022, Proceedings. Cham: Springer International Publishing, <year>2022</year>.
                    <pub-id pub-id-type="doi">10.1007/978-3-031-15342-6_1</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chalkidis</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fergadiotis</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Malakasiotis</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>LEGAL-BERT: The muppets straight out of law school.</article-title> arXiv preprint arXiv:2010.02559, <year>2020</year>.
                    <pub-id pub-id-type="doi">10.48550/arXiv.2010.02559</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Matsuda</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ohtomo</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tomizawa</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Incorporating Unstructured Patient Narratives and Health Insurance Claims Data in Pharmacovigilance: Natural Language Processing Analysis of Patient-Generated Texts About Systemic Lupus Erythematosus.</article-title>
                    <source>

                        <italic toggle="yes">JMIR Public Health Surveill.</italic>
</source>
                    <year>2021</year>;<volume>7</volume>(<issue>6</issue>):<fpage>e29238</fpage>.
                    <pub-id pub-id-type="pmid">34255719</pub-id>
                    <pub-id pub-id-type="doi">10.2196/29238</pub-id>
                    <pub-id pub-id-type="pmcid">8278300</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-27">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Fernando</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kumarage</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Thiyaganathan</surname>
                            <given-names>V</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Automated vehicle insurance claims processing using computer vision, natural language processing.</article-title>
                    <source>

                        <italic toggle="yes">2022 22nd International Conference on Advances in ICT for Emerging Regions (ICTer).</italic>
                    </source> IEEE, <year>2022</year>.
                    <pub-id pub-id-type="doi">10.1109/ICTer58063.2022.10024089</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-28">
                <label>28</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Eliot</surname>
                            <given-names>LB</given-names>
                            <prefix>Dr</prefix>
                        </name>
</person-group>:
                    <article-title>Generative pre-trained transformers (GPT-3) pertain to AI in the law.</article-title>
                    <year>2021</year>.
                    <pub-id pub-id-type="doi">10.2139/ssrn.3974887</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-29">
                <label>29</label>
                <mixed-citation publication-type="journal">
                    <article-title>Luminance.</article-title> Software, retrieved 27 Feb 2023.
                    <ext-link ext-link-type="uri" xlink:href="https://www.luminance.com/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-30">
                <label>30</label>
                <mixed-citation publication-type="journal">
                    <article-title>Everlaw.</article-title> Software, retrieved 27 Feb 2023.
                    <ext-link ext-link-type="uri" xlink:href="https://www.everlaw.com/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-31">
                <label>31</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Luo</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Thompson</surname>
                            <given-names>WK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Herr</surname>
                            <given-names>TM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Natural Language Processing for EHR-Based Pharmacovigilance: A Structured Review.</article-title>
                    <source>

                        <italic toggle="yes">Drug Saf.</italic>
</source>
                    <year>2017</year>;<volume>40</volume>(<issue>11</issue>):<fpage>1075</fpage>&#x2013;<lpage>1089</lpage>.
                    <pub-id pub-id-type="pmid">28643174</pub-id>
                    <pub-id pub-id-type="doi">10.1007/s40264-017-0558-6</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-32">
                <label>32</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dutton</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>:
                    <article-title>Big Pharma Reads Big Data, Sees Big Picture: Linguamatics Brings Natural Language Processing to Non-Experts, Expediting Drug Development.</article-title>
                    <source>

                        <italic toggle="yes">Genet Eng Biotechnol News.</italic>
</source>
                    <year>2018</year>;<volume>38</volume>(<issue>1</issue>):<fpage>8</fpage>&#x2013;<lpage>9</lpage>.
                    <pub-id pub-id-type="doi">10.1089/gen.38.01.05</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-33">
                <label>33</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Viswanath</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fennell</surname>
                            <given-names>JW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Balar</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>An industrial approach to using artificial intelligence and natural language processing for accelerated document preparation in drug development.</article-title>
                    <source>

                        <italic toggle="yes">J Pharm Innov.</italic>
</source>
                    <year>2021</year>;<volume>16</volume>:<fpage>302</fpage>&#x2013;<lpage>316</lpage>.
                    <pub-id pub-id-type="doi">10.1007/s12247-020-09449-x</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-34">
                <label>34</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Richard</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Reddy</surname>
                            <given-names>B</given-names>
                        </name>
</person-group>:
                    <article-title>Text classification for clinical trial operations: evaluation and comparison of natural language processing techniques.</article-title>
                    <source>

                        <italic toggle="yes">Ther Innov Regul Sci.</italic>
</source>
                    <year>2021</year>;<volume>55</volume>(<issue>2</issue>):<fpage>447</fpage>&#x2013;<lpage>453</lpage>.
                    <pub-id pub-id-type="pmid">33125616</pub-id>
                    <pub-id pub-id-type="doi">10.1007/s43441-020-00236-x</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-35">
                <label>35</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Xie</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cheng</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Trends and features of the applications of natural language processing techniques for clinical trials text analysis.</article-title>
                    <source>

                        <italic toggle="yes">Appl Sci.</italic>
</source>
                    <year>2020</year>;<volume>10</volume>(<issue>6</issue>):<fpage>2157</fpage>.
                    <pub-id pub-id-type="doi">10.3390/app10062157</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-36">
                <label>36</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Fogel</surname>
                            <given-names>DB</given-names>
                        </name>
</person-group>:
                    <article-title>Factors associated with clinical trials that fail and opportunities for improving the likelihood of success: A review.</article-title>
                    <source>

                        <italic toggle="yes">Contemp Clin Trials Commun.</italic>
</source>
                    <year>2018</year>;<volume>11</volume>:<fpage>156</fpage>&#x2013;<lpage>164</lpage>.
                    <pub-id pub-id-type="pmid">30112460</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.conctc.2018.08.001</pub-id>
                    <pub-id pub-id-type="pmcid">6092479</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-37">
                <label>37</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chang</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mitchem</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Understanding Common Key Indicators of Successful and Unsuccessful Cancer Drug Trials Using A Contrast Mining Framework on ClinicalTrials.gov.</article-title>
                    <source>

                        <italic toggle="yes">J Biomed Inform.</italic>
</source>
                    <year>2023</year>;<volume>139</volume>:<fpage>104321</fpage>.
                    <pub-id pub-id-type="pmid">36806327</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.jbi.2023.104321</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-38">
                <label>38</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wood</surname>
                            <given-names>TA</given-names>
                        </name>
</person-group>:
                    <article-title>Clinical Trial Risk Tool (0.1).</article-title> Zenodo. [Code], <year>2023</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.7633872">http://www.doi.org/10.5281/zenodo.7633872</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-39">
                <label>39</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Van Rossum</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Drake</surname>
                            <given-names>F</given-names>
                        </name>
</person-group>:
                    <article-title>Python 3 Reference Manual.</article-title>CreateSpace,<year> 2009</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.bibsonomy.org/bibtex/29b5f994afefacb2b63709afbc4aaad55/msteininger">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-40">
                <label>40</label>
                <mixed-citation publication-type="journal">
                    <collab>Plotly Technologies Inc</collab>:
                    <article-title> Collaborative data science.</article-title>
                    <year> 2015</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://plotly.com/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-41">
                <label>41</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bird</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Klein</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Loper</surname>
                            <given-names>E</given-names>
                        </name>
</person-group>:
                    <article-title>Natural language processing with Python: analyzing text with the natural language toolkit.</article-title>O&#x2019;Reilly Media, Inc.<year> 2009</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://books.google.co.in/books?hl=en&amp;lr=&amp;id=KGIbfiiP1i4C&amp;oi=fnd&amp;pg=PR5&amp;dq=9.%09Bird,+S.,+Klein,+E.,+%26+Loper,+E.+(2009).+Natural+language+processing+with+Python:+analyzing+text+with+the+natural+language+toolkit.+%22+O%E2%80%99Reilly+Media,+Inc.%22&amp;ots=Y4GfB8HDL-&amp;sig=QQDHzE28QQP0CXSk1JucuEJaSw8&amp;redir_esc=y#v=onepage&amp;q=9.%09Bird%2C%20S.%2C%20Klein%2C%20E.%2C%20%26%20Loper%2C%20E.%20(2009).%20Natural%20language%20processing%20with%20Python%3A%20analyzing%20text%20with%20the%20natural%20language%20toolkit.%20%22%20O%E2%80%99Reilly%20Media%2C%20Inc.%22&amp;f=false">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-42">
                <label>42</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Honnibal</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Montani</surname>
                            <given-names>I</given-names>
                        </name>
</person-group>:
                    <article-title>spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing.</article-title>
                    <year> 2017</year>.</mixed-citation>
            </ref>
            <ref id="ref-43">
                <label>43</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pedregosa</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Varoquaux</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gramfort</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Scikit-learn: Machine Learning in Python.</article-title>JMLR.<year> 2011</year>;<volume>12</volume>:<fpage>2825</fpage>&#x2013;<lpage>2830</lpage>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.jmlr.org/papers/volume12/pedregosa11a/pedregosa11a.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-44">
                <label>44</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Merkel</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>:
                    <article-title>Docker: lightweight Linux containers for consistent development and deployment.</article-title>Linux Journal,<year> 2014</year>;<volume>2014</volume>(<issue>239</issue>):<fpage>2</fpage>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.seltzer.com/margo/teaching/CS508.19/papers/merkel14.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-45">
                <label>45</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mattmann</surname>
                            <given-names>CA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zitting</surname>
                            <given-names>JL</given-names>
                        </name>
</person-group>:
                    <article-title>Tika in action.</article-title>
                    <year> 2012</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://sisis.rz.htw-berlin.de/inh2012/12422815.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-46">
                <label>46</label>
                <mixed-citation publication-type="journal">
                    <article-title>SurveyMonkey.</article-title>software, retrieved 25 March 2022.
                    <ext-link ext-link-type="uri" xlink:href="https://www.surveymonkey.com/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-47">
                <label>47</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tasneem</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Aberle</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ananth</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The database for aggregate analysis of ClinicalTrials.gov (AACT) and subsequent regrouping by clinical specialty.</article-title>
                    <source>

                        <italic toggle="yes">PLoS One.</italic>
</source>Database dump taken from,<year> 2012</year>;<volume>7</volume>(<issue>3</issue>):<fpage>e33677</fpage>.
                    <pub-id pub-id-type="pmid">22438982</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0033677</pub-id>
                    <pub-id pub-id-type="pmcid">3306288</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-48">
                <label>48</label>
                <mixed-citation publication-type="journal">
                    <collab>PostgreSQL Global Development Group</collab>:
                    <article-title> PostgreSQL 12.13.</article-title>
                    <year> 2022</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.postgresql.org/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-49">
                <label>49</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sharp</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Corp</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>:
                    <article-title>A Single-Dose Clinical Trial to Study the Safety, Tolerability, Pharmacokinetics, and Anti-Retroviral Activity of MK-8591 Monotherapy in Anti-Retroviral Therapy (ART)-Na&#x00ef;ve, HIV-1 Infected Patients.</article-title>In:
                    <italic toggle="yes">ClinicalTrials.gov.</italic>[cited 21 Dec 2016].
                    <ext-link ext-link-type="uri" xlink:href="https://clinicaltrials.gov/ct2/show/NCT02217904">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-50">
                <label>50</label>
                <mixed-citation publication-type="data">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wood</surname>
                            <given-names>TA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Douglas</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <data-title>Feature weights for protocol informativeness [Data set].</data-title>
                    <source>
                        <italic toggle="yes">Zenodo</italic>
                    </source>.<year>2023</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.7769176">http://www.doi.org/10.5281/zenodo.7769176</ext-link>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report34936">
        <front-stub>
            <article-id pub-id-type="doi">10.21956/gatesopenres.15729.r34936</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Idnay</surname>
                        <given-names>Betina</given-names>
                    </name>
                    <xref ref-type="aff" rid="r34936a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r34936a1">
                    <label>1</label>Columbia University, New York, New York, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>29</day>
                <month>9</month>
                <year>2023</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2023 Idnay B</copyright-statement>
                <copyright-year>2023</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport34936" related-article-type="peer-reviewed-article" xlink:href="10.12688/gatesopenres.14416.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>In addressing the prevalent issue of trial uninformativeness, the manuscript introduces a browser-based, natural language processing tool designed to identify and quantify the risk of uninformativeness in clinical trials. The tool, initially focusing on human immunodeficiency virus (HIV) and tuberculosis (TB) trials, parses trial protocols, extracts key design features, and inputs them into a risk model, demonstrating high accuracy in identifying various trial conditions and features. Users can interactively upload, visualize, and correct the tool&#x2019;s interpretations. The study validates the tool&#x2019;s efficacy using manually tagged datasets and a large dataset from ClinicalTrials.gov, showcasing promising results, such as 100% Area Under Curve (AUC), in identifying the condition of a trial. The tool, open-source and accessible at https://app.clinicaltrialrisk.org, offers significant potential for future expansion to other pathologies and advancement in the field.</p>
            <p> The manuscript presents a distinctive and essential contribution to addressing the challenge of trial uninformativeness, a pervasive issue impacting the quality of evidence generated from clinical trials. By focusing on the early identification of risks of uninformativeness in trial protocols, the authors are addressing a critical gap in ensuring the optimal allocation of resources and efforts toward generating high-quality evidence. Developing a browser-based tool using natural language processing is particularly original, as it enables the automated extraction and analysis of key features from unstructured text documents, which is a significant advancement in this field. Furthermore, the tool&#x2019;s open-source nature and accessibility contribute to its significance by facilitating widespread adoption and adaptation, ultimately aiming to elevate the quality of clinical trials and the evidence they produce. This work is thus both timely and imperative, aligning with the pressing need for high-quality evidence in clinical, policy, and research decisions.</p>
            <p> 
                <underline>
                    <bold>Major comments:</bold>
                </underline>
            </p>
            <p> a. 
                <bold>Introduction</bold>: Several aspects could be further clarified to enhance the reader&#x2019;s understanding and the overall impact of the manuscript: 
                <list list-type="bullet">
                    <list-item>
                        <p>
                            <bold>Specificity on Commercial Factors</bold>: The manuscript mentions that commercial factors can lead to trials ending uninformatively. However, it would benefit the reader if this point could be elaborated on more specifically. Providing concrete examples or detailing how commercial factors contribute to trial uninformativeness would add depth to the discussion and enhance the overall comprehension of the issue.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Clarification on Trial Complexity and Apgar Score</bold>: The paragraph discussing trial complexity and its relation to uninformativeness was somewhat perplexing. A more thorough explanation of how trial complexity contributes to uninformativeness would be valuable for the reader. Additionally, mentioning the Apgar score seemed tangential and did not directly contribute to the main discussion. A reconsideration of its inclusion or a more explicit connection to the main topic might be warranted to maintain focus and clarity.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Rationale for Focus on HIV and TB</bold>: The manuscript specifies that the tool initially focuses on HIV and tuberculosis trials. It would be enlightening to understand the justification for this specific focus. Clarifying whether this decision was due to the prevalence of these diseases, the availability of data, or other reasons would provide valuable context and strengthen the significance of the work.</p>
                    </list-item>
                </list> b. 
                <bold>Methods</bold>: several aspects could benefit from further clarification and detail to ensure thorough understanding and reproducibility: 
                <list list-type="bullet">
                    <list-item>
                        <p>
                            <bold>Feature Selection and Subject Matter Expert Analysis</bold>: The feature selection exercise needs more explicit detailing. While the features are visualized in a table, mentioning the number and selection criteria in the text would enhance clarity. The analysis of subject matter expert ranking and the criteria for choosing these experts must be made more explicit. Were the experts specialized in HIV/TB or knowledgeable in clinical trial protocol development? Additionally, Table 1 presents identical data in separate categories (i.e., &#x201c;tertile_of_sample_size by domain by phase&#x201d; and &#x201c;Tertile of number of sites by domain by phase&#x201d;); an explanation for this separation is necessary. The results included in this section seem misplaced and would be more appropriately discussed in the Results section.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Datasets for Training and Validation</bold>: The selection process for the protocols used in training and validation is ambiguous. It is unclear whether the manual dataset included protocols from ClinicalTrials.gov and if there was a possibility of duplication. Clarifying whether the datasets were stratified for training and validation or if the manual dataset was for training and AACT for validation is essential for understanding the validation process. It should also be stated how the models were tested.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Annotation of ClinicalTrials.gov Dataset</bold>: It needs to be clarified whether the ClinicalTrials.gov dataset was annotated. Explicit mention of the annotation status of this dataset would remove ambiguity and aid in understanding the methodology.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Annotation Process</bold>: The statement, &#x201c;The number of protocols manually annotated per parameter varied between 100 and 300,&#x201d; is somewhat confusing. More clarity on the number of protocols annotated, the identity and number of annotators, and the annotation process is necessary for reproducibility. Providing the annotation guideline as supplementary material would be highly beneficial, if not essential, for reproducibility.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Machine Learning Models</bold>: A detailed breakdown of the machine learning models used and the rationale behind the chosen score cutoff are needed. Exploring why other ML models were not considered would provide insight into the model selection process.</p>
                    </list-item>
                </list> c. 
                <bold>Results</bold>: there are a couple of areas where further information and restructuring would enhance clarity and coherence: 
                <list list-type="bullet">
                    <list-item>
                        <p>
                            <bold>User Testing Method and Feedback</bold>: This section mentions user testing, but the method used is not detailed in the Methods section, making it challenging to understand the context of the results. It would benefit the reader to gain insight into the user testing methods, the feedback received, the subsequent results, and any implications or changes made based on this feedback.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Placement of Validation Content</bold>: The subsections titled &#x201c;Validation,&#x201d; &#x201c;Validation scores for the manual dataset,&#x201d; &#x201c;Validation scores for ClinicalTrials.gov dataset,&#x201d; and &#x201c;Validation scores for Hutchinson et al. dataset&#x201d; seem to contain content more suited to the Methods section as they describe the methodology used for validation. It would enhance the clarity and flow of the manuscript if this content were relocated to the Methods section, with the Results section expounded upon to focus solely on discussing and interpreting the validation outcomes rather than detailing the method used. Presenting tables and highlighting the key findings would align with this section's purpose and contribute to a more balanced and informative manuscript.</p>
                    </list-item>
                </list> d. 
                <bold>Discussion and Conclusion</bold>: Several elements need addressing to ensure clarity, coherence, and comprehensive representation of the study&#x2019;s findings and implications: 
                <list list-type="bullet">
                    <list-item>
                        <p>
                            <bold>Claim on Tool&#x2019;s Intuitiveness</bold>: The manuscript concludes that the use of the tool is intuitive; however, there is a noticeable absence of supporting results regarding the tool's intuitiveness and usefulness within the Results section. Providing concrete findings or user feedback to substantiate this claim would bolster the credibility of this assertion and give the reader a clearer understanding of the tool's practicality and user-friendliness.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Inclusion of Scenarios</bold>: The inclusion of scenarios illustrating the potential applications and usefulness of the tool is a valuable addition to the manuscript. It helps contextualize the tool's practicality and provides insight into its real-world implications, contributing to a more rounded and impactful discussion.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Confidentiality of Clinical Trial Protocols</bold>: Clinical trial protocols often contain confidential information, especially for Scenario 3. The manuscript must address how the web application ensures the privacy and security of the uploaded protocols. Clarifying this aspect is crucial for user trust and adherence to data protection regulations.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Placement of Results</bold>: Some results appear in the Discussion and Conclusion sections. For clarity and structure, it would be beneficial to relocate these findings to the Results section, ensuring a clear delineation between the presentation of results and their interpretation and implications.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Limitations Section</bold>: The manuscript would greatly benefit from a more explicit and organized discussion of the limitations of both the study and the web application. A dedicated subsection detailing the limitations would provide the reader with a balanced view of the research and help contextualize the findings, allowing for a more nuanced understanding of the study's implications and areas for future improvement.</p>
                    </list-item>
                </list> e. 
                <bold>Clarity and organisation</bold>: The manuscript, while addressing a topic of significant importance and utility, presents several areas where clarity and organization could be enhanced to convey the research&#x2019;s value and findings better: 
                <list list-type="bullet">
                    <list-item>
                        <p>
                            <bold>Defined Sections and Appropriate Placement</bold>: There is a noticeable overlap of content throughout the manuscript, with methods detailed in the results and discussion sections and results interspersed within the discussion. A more precise delineation and structuring of content according to the designated sections would aid in the reader&#x2019;s comprehension and the logical flow of the manuscript.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>Detailing and Specificity</bold>: Several instances across the introduction, methods, and results sections indicate a need for more specificity and detailing. Clarifications on commercial factors leading to trial uninformativeness, confidentiality measures, feature selection, user testing methods, and annotation processes would contribute to a more thorough and transparent representation of the research conducted.</p>
                    </list-item>
                    <list-item>
                        <p>
                            <bold>User-Friendly Presentation</bold>: Given the technical nature of the tool developed, ensuring a user-friendly presentation of information, such as clear and well-ordered tables, is essential. Addressing inconsistencies in table presentation and providing narrative explanations for significance and implications would enhance the manuscript&#x2019;s accessibility and impact.</p>
                    </list-item>
                </list> 
                <underline>
                    <bold>Minor comments:</bold>
                </underline> 
                <list list-type="bullet">
                    <list-item>
                        <p>Abstract: The following sentence has a grammatical error: &#x201c;The tool is focused HIV and tuberculosis trials but could be extended to more pathologies in future.&#x201d;</p>
                    </list-item>
                    <list-item>
                        <p>Introduction: The acronyms "AI," "HIV," and "PDF" were not spelled out upon their first mention in the text. To ensure clarity and accessibility for all readers, consider providing the full forms of these acronyms at their initial appearance.</p>
                    </list-item>
                    <list-item>
                        <p>Methods: Including a flow diagram of the methods would significantly enhance the reader&#x2019;s understanding of the research process and contribute to the overall clarity of the manuscript. The acronym "AACT" was not spelled out upon its first mention; please provide its full form and double-check the other abbreviations.</p>
                    </list-item>
                    <list-item>
                        <p>Figures and Tables: For ease of reference and a smoother reading experience, it is suggested that tables and figures be presented immediately after they are referenced in the text. This adjustment will prevent readers from having to search through the manuscript and will contribute to a more organized and user-friendly presentation.</p>
                    </list-item>
                </list> In conclusion, the manuscript under review presents a commendable effort in addressing the crucial issue of clinical trial uninformativeness through the development of a novel tool. The application of natural language processing in identifying and quantifying risks associated with trial protocols exhibits significant potential to enhance the quality of clinical research. However, a prominent area of concern is the lack of adequate user testing reported in the manuscript. While the authors have undertaken meticulous training and validation of the tool, the inclusion of thorough user testing is paramount to ensuring the tool&#x2019;s efficacy, user-friendliness, and practical applicability in real-world scenarios. Further, it is imperative that the authors undertake comprehensive testing of the tool, beyond training and validation, to affirm its reliability and robustness. Addressing these aspects, along with the aforementioned comments on clarity, organization, and minor adjustments, will significantly contribute to the manuscript&#x2019;s coherence, impact, and overall quality, ultimately aiding in the realization of its potential to improve the landscape of clinical trials.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>No</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>No</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>No</p>
            <p>Reviewer Expertise:</p>
            <p>Clinical research in neurological disorders, primarily Alzheimer's and related dementias, clinical research informatics, NLP systems adoption to improve clinical research, clinical research recruitment, protocol development</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
    </sub-article>
</article>
