<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="data-paper" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">Gates Open Res</journal-id>
            <journal-title-group>
                <journal-title>Gates Open Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2572-4754</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/gatesopenres.12772.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Data Note</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                    <subj-group>
                        <subject>Applied Microbiology</subject>
                    </subj-group>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>The Microbe Directory: An annotated, searchable inventory of microbes&#x2019; characteristics</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 1 approved, 3 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no" equal-contrib="yes">
                    <name>
                        <surname>Shaaban</surname>
                        <given-names>Heba</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no" equal-contrib="yes">
                    <name>
                        <surname>Westfall</surname>
                        <given-names>David A.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-4980-6238</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Mohammad</surname>
                        <given-names>Rawhi</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-4292-3159</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Danko</surname>
                        <given-names>David</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Bezdan</surname>
                        <given-names>Daniela</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Afshinnekoo</surname>
                        <given-names>Ebrahim</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a6">6</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Segata</surname>
                        <given-names>Nicola</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a7">7</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Mason</surname>
                        <given-names>Christopher E.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-1850-1642</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a8">8</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA</aff>
                <aff id="a2">
                    <label>2</label>The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10065, USA</aff>
                <aff id="a3">
                    <label>3</label>CUNY Hunter College, New York, NY, 10065, USA</aff>
                <aff id="a4">
                    <label>4</label>School of Medicine, Weill Cornell Medicine, New York, NY, 10065, USA</aff>
                <aff id="a5">
                    <label>5</label>CUNY College of Staten Island, Staten Island, NY, 10314, USA</aff>
                <aff id="a6">
                    <label>6</label>School of Medicine, New York Medical College, Valhalla, NY, 10595, USA</aff>
                <aff id="a7">
                    <label>7</label>Centre for Integrative Biology, University of Trento, Trento, 38122, Italy</aff>
                <aff id="a8">
                    <label>8</label>The Feil Family Brain and Mind Research Institute, New York, NY, 10065, USA</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:chm2042@med.cornell.edu">chm2042@med.cornell.edu</email>
                </corresp>
                <fn id="fn1">
                    <p>*Equal contributors</p>
                </fn>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>5</day>
                <month>1</month>
                <year>2018</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2018</year>
            </pub-date>
            <volume>2</volume>
            <elocation-id>3</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>21</day>
                    <month>12</month>
                    <year>2017</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2018 Shaaban H et al.</copyright-statement>
                <copyright-year>2018</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://gatesopenresearch.org/articles/2-3/pdf"/>
            <abstract>
                <p>The Microbe Directory is a collective research effort to profile and annotate more than 7,500 unique microbial species from the MetaPhlAn2 database, including bacteria, archaea, viruses, fungi, and protozoa. By collecting and summarizing data on microbes&#x2019; characteristics, the project provides a database that can be used downstream of large-scale metagenomic taxonomic analyses, allowing researchers to interpret and explore their taxonomic classifications and gain a deeper understanding of the microbial ecosystems they study. These characteristics include, but are not limited to, optimal pH, optimal temperature, Gram stain, biofilm formation, spore formation, antimicrobial resistance, and COGEM class risk rating. The database has been manually curated by trained student-researchers from Weill Cornell Medicine and CUNY Hunter College, and its curation remains an ongoing, open-source effort to which others can contribute. Available in SQL, JSON, and CSV (e.g. for Excel) formats, the Microbe Directory can be queried for the aforementioned parameters by a microorganism&#x2019;s taxonomy. In addition to the raw database, the Microbe Directory has an online counterpart (
                    <ext-link ext-link-type="uri" xlink:href="https://microbe.directory/">https://microbe.directory/</ext-link>) that provides a user-friendly interface for storage, retrieval, and analysis, into which other microbial database projects could be incorporated. The Microbe Directory was primarily designed as a resource for researchers conducting metagenomic analyses, but its web interface should also prove useful to anyone who wishes to learn more about a particular microbe.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Microbe</kwd>
                <kwd>Metagenomics</kwd>
                <kwd>Microbiome</kwd>
                <kwd>Next-Generation Sequencing</kwd>
                <kwd>Metadata</kwd>
                <kwd>Database</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1" xlink:href="http://dx.doi.org/10.13039/100007204">
                    <funding-source>Vallee Foundation</funding-source>
                </award-group>
                <award-group id="fund-2" xlink:href="http://dx.doi.org/10.13039/100006984">
                    <funding-source>Irma T. Hirschl Trust</funding-source>
                </award-group>
                <award-group id="fund-3" xlink:href="http://dx.doi.org/10.13039/100000879">
                    <funding-source>Alfred P. Sloan Foundation</funding-source>
                    <award-id>G-2015-13964</award-id>
                </award-group>
                <award-group id="fund-4">
                    <funding-source>The Pershing Square Sohn Cancer Research Alliance</funding-source>
                </award-group>
                <award-group id="fund-5">
                    <funding-source>WorldQuant Foundation</funding-source>
                </award-group>
                <award-group id="fund-6" xlink:href="http://dx.doi.org/10.13039/100000865">
                    <funding-source>Gates Foundation</funding-source>
                    <award-id>OPP1151054</award-id>
                </award-group>
                <award-group id="fund-7" xlink:href="http://dx.doi.org/10.13039/100000104">
                    <funding-source>National Aeronautics and Space Administration</funding-source>
                    <award-id>NNX14AH50G</award-id>
                    <award-id>NNX17AB26G</award-id>
                </award-group>
                <award-group id="fund-8" xlink:href="http://dx.doi.org/10.13039/100000002">
                    <funding-source>National Institutes of Health</funding-source>
                    <award-id>R25EB020393</award-id>
                    <award-id>R01NS076465</award-id>
                    <award-id>R21AI129851</award-id>
                </award-group>
                <award-group id="fund-9">
                    <funding-source>Monique Weill-Caulier Charitable Trust</funding-source>
                </award-group>
                <funding-statement>We would like to thank the Epigenomics Core Facility at Weill Cornell Medicine, funding from the Irma T. Hirschl and Monique Weill-Caulier Charitable Trusts, Bert L and N Kuggie Vallee Foundation, the WorldQuant Foundation, The Pershing Square Sohn Cancer Research Alliance, NASA (NNX14AH50G, NNX17AB26G), the National Institutes of Health (R25EB020393, R01NS076465, R21AI129851), the Gates Foundation (OPP1151054), and the Alfred P. Sloan Foundation (G-2015-13964).  </funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>With the advent of next-generation sequencing technologies, the last decade has seen a surge of metagenomic and microbiome studies, ranging from the human microbiome
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup> to environmental samples (water and soil)
                <sup>
                    <xref ref-type="bibr" rid="ref-2">2</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-5">5</xref>
                </sup> and city surfaces
                <sup>
                    <xref ref-type="bibr" rid="ref-6">6</xref>,
                    <xref ref-type="bibr" rid="ref-7">7</xref>
                </sup>. All of these studies depend heavily on bioinformatic analyses that translate the recovered sequences into taxonomic profiles for each sample. However, an immediate challenge is interpreting these taxonomic outputs. Knowing a microorganism&#x2019;s properties, such as its optimal pH and temperature, presence in the human microbiome, ability to form spores or biofilms, and antimicrobial susceptibility, among many others, is key to understanding the biochemical and ecological dynamics of the microbiomes under study. Several databases include some of this information, such as 
                <ext-link ext-link-type="uri" xlink:href="https://microbewiki.kenyon.edu/">MicrobeWiki</ext-link>, 
                <ext-link ext-link-type="uri" xlink:href="https://www.patricbrc.org/">PATRIC</ext-link>, 
                <ext-link ext-link-type="uri" xlink:href="https://ardb.cbcb.umd.edu/">ARDB</ext-link>, and 
                <ext-link ext-link-type="uri" xlink:href="https://img.jgi.doe.gov/">IMG-JGI</ext-link>, but they are either incomplete or focus on a single characteristic (e.g. antimicrobial resistance). The Microbe Directory seeks to fill this gap with an online tool that aggregates these data and expands their annotations, providing a resource for exploring the functional, medical, or biological traits found in any microbial community.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <sec>
                <title>MetaPhlAn2 list of species</title>
                <p>The list of distinct species to be curated was generated from the 
                    <ext-link ext-link-type="uri" xlink:href="https://bitbucket.org/biobakery/metaphlan2">MetaPhlAn2 database</ext-link>. MetaPhlAn2 is a computational tool for profiling the composition of microbial communities from sequencing data; it relies on unique clade-specific marker genes identified from more than 16,000 reference genomes from NCBI and RefSeq
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup>. It provides a consistent taxonomic characterization of known domains of life from kingdom to strain level, and its database currently contains &gt;7,500 unique species. This database was chosen for the Microbe Directory because of its widespread use in microbiome and metagenomic studies
                    <sup>
                        <xref ref-type="bibr" rid="ref-9">9</xref>
                    </sup>, which allows researchers to apply the Microbe Directory directly to MetaPhlAn output
                    <sup>
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup>. Researchers can also contribute to and expand the Microbe Directory beyond the species currently curated in the database (see 
                    <italic toggle="yes">Using the Microbe Directory</italic>).</p>
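                <p>As a minimal sketch of how the Microbe Directory could be applied downstream of MetaPhlAn2, the Python example below keeps the species-level rows of a MetaPhlAn2 profile, in which ranks are joined by "|" and species names carry an "s__" prefix, and looks each species up in an annotation table. The profile lines, the in-memory directory dictionary, and its field names are hypothetical illustrations, not the Microbe Directory&#x2019;s actual schema or interface.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
# Minimal sketch, assuming a MetaPhlAn2-style profile (tab-separated clade
# name and relative abundance). The `directory` contents and field names
# below are hypothetical, not the Microbe Directory's actual schema.


def species_from_profile(lines):
    """Return {binomial name: relative abundance} for species-level rows."""
    species = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and header/comment lines
        clade, abundance = line.rstrip("\n").split("\t")[:2]
        last_rank = clade.split("|")[-1]
        # Keep species rows only; strain rows end in a "t__" rank and are skipped.
        if last_rank.startswith("s__"):
            name = last_rank[len("s__"):].replace("_", " ")
            species[name] = float(abundance)
    return species


def annotate(species, directory):
    """Pair each species' abundance with its annotations (None if uncurated)."""
    return {name: (abundance, directory.get(name))
            for name, abundance in species.items()}


profile = [
    "#SampleID\tMetaphlan2_Analysis",
    "k__Bacteria\t100.0",
    "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales"
    "|f__Enterobacteriaceae|g__Escherichia|s__Escherichia_coli\t60.0",
    "k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales"
    "|f__Bacillaceae|g__Bacillus|s__Bacillus_subtilis\t40.0",
]
# Hypothetical annotation using Table 1-style codes (1 = yes; Gram-positive = 1).
directory = {"Bacillus subtilis": {"spore_formation": 1, "gram_stain": 1}}

for name, (abundance, notes) in annotate(species_from_profile(profile), directory).items():
    print(name, abundance, notes)
```
]]></code>
                <p>Species that have not yet been curated come back with no annotations rather than being dropped, so they remain visible as candidates for new contributions.</p>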
            </sec>
            <sec>
                <title>Selection and training of researchers</title>
                <p>The Microbe Directory database was curated by a team of trained undergraduate, graduate, and medical students from City University of New York (CUNY) Hunter College, Macaulay Honors College, and Weill Cornell Medicine (see full list of students in 
                    <italic toggle="yes">Acknowledgements</italic>). The student-researchers were selected from a pool of applicants and underwent a three-hour training session that a) explained the objective of the research project and the desired outcome, b) provided a detailed explanation of each parameter to be researched, and c) gave clear instructions on how to search online sources for each parameter for each species. They were also given a tutorial on how to conduct the research, using a sample of 10 species. They were given a list of annotation websites to assist their research but were not limited to those sites (see 
                    <italic toggle="yes">Annotation Tutorial and Guidelines</italic> in 
                    <xref ref-type="other" rid="SF1">Supplementary File 1</xref>).</p>
                <p>For every entry, students added citation links to the sources they used. Each student-researcher independently worked 4&#x2013;5 hours per week to curate the parameters for 10 species per week, for a total of 20 weeks. To minimize curation errors, the project leads closely monitored the first three weeks of the project and manually checked every entry for inaccuracies. After this three-week trial period, the leads manually checked two randomly selected species from each student&#x2019;s weekly submission of 10 species. If a spot check revealed a considerable error rate (the threshold being 3 or more incorrect annotations out of 10), the student had to resubmit the entire set of 10 species the following week. Although manually curated databases are always prone to human error, anyone can create a Microbe Directory account and submit edits and corrections to the information hosted in the database. This allows the Microbe Directory to continue to grow and expand while also helping to keep errors to a minimum.</p>
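                <p>The spot-check rule described above can be sketched in Python as follows. This is an illustration, not the project&#x2019;s actual tooling: it assumes the threshold of 3 incorrect annotations is applied to the total found across the species actually checked, and the function names and the reviewer callback are hypothetical.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
import random

# Parameters taken from the text; the interpretation of the threshold
# (errors summed over the species actually checked) is an assumption.
SPECIES_PER_WEEK = 10
TRIAL_WEEKS = 3        # every entry checked during the first three weeks
SPOT_CHECK_SIZE = 2    # species checked per weekly submission afterwards
ERROR_THRESHOLD = 3    # 3 or more incorrect annotations -> resubmit all 10


def species_to_check(submission, week, rng):
    """Check every species during the trial, then a random sample of two."""
    if week <= TRIAL_WEEKS:
        return list(submission)
    return rng.sample(list(submission), SPOT_CHECK_SIZE)


def needs_resubmission(submission, week, count_errors, rng):
    """Return True if the checked species contain too many incorrect annotations.

    `count_errors` is a hypothetical reviewer callback returning the number
    of incorrect annotations found for one species.
    """
    checked = species_to_check(submission, week, rng)
    return sum(count_errors(s) for s in checked) >= ERROR_THRESHOLD


rng = random.Random(0)
submission = [f"species_{i}" for i in range(SPECIES_PER_WEEK)]
errors = {"species_0": 2, "species_1": 1}  # hypothetical review findings
# Week 2 is within the trial, so all 10 species are checked: 2 + 1 = 3 errors.
print(needs_resubmission(submission, 2, lambda s: errors.get(s, 0), rng))  # True
```
]]></code>
                <p>Passing a seeded random generator makes the spot-check sample reproducible, which is convenient when auditing which species were reviewed in a given week.</p>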
            </sec>
            <sec>
                <title>Building the Microbe Directory</title>
                <p>
                    <xref ref-type="table" rid="T1">Table 1</xref> defines the microbial characteristics and categories of information that were curated to build the Microbe Directory. The chosen parameters are strictly objective microbial features that help researchers interpret the findings and context of the microbiome they are studying. The Microbe Directory can be expanded, and researchers can contribute further characteristics of these microbes, such as native location, industrial applications, and associated symptoms or diseases; these features were considered for inclusion but were omitted because of their subjective nature, to maintain the quality control outlined above. Several databases were used to collect this information, including 
                    <ext-link ext-link-type="uri" xlink:href="http://www.cogem.net/index.cfm/en">COGEM</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://microbewiki.kenyon.edu/index.php/MicrobeWiki">MicrobeWiki</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="http://wishart.biology.ualberta.ca/BacMap/">BacMap</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.lgcstandards-atcc.org/?geo_country=gb">ATCC</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.patricbrc.org/">PATRIC</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://ardb.cbcb.umd.edu/">ARDB</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://gold.jgi.doe.gov/">GOLD</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="http://www.homd.org/">HOMD</ext-link>, and 
                    <ext-link ext-link-type="uri" xlink:href="https://www.beiresources.org/">BEI Resources</ext-link> (see 
                    <italic toggle="yes">Annotation Tutorial and Guidelines and Links</italic> in 
                    <xref ref-type="other" rid="SF1">Supplementary File 1</xref>). These peer-reviewed resources and databases are well established in the literature as reliable sources of information for researchers. The Microbe Directory brings this information together in one place, allowing more efficient and comprehensive interpretation of microbiome analyses. 
                    <xref ref-type="fig" rid="f1">Figure 1</xref> is a heatmap summarizing the current information hosted in the Microbe Directory&#x2019;s database across all species and parameters.</p>
                <table-wrap id="T1" orientation="portrait" position="anchor">
                    <label>Table 1. </label>
                    <caption>
                        <title>The Microbe Directory inventory parameters and descriptions.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="center" colspan="1" rowspan="1" valign="top">Parameter</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">Definition and notes</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Optimal pH</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">The pH at which the species grows optimally. If the species was not widely studied, the storage pH recommended by the American Type Culture Collection (ATCC) was used. If two widely separated pH ranges were reported, their average was taken. </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Optimal</bold>
                                    <break/>
                                    <bold>temperature</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">The temperature at which the species grows optimally. If the species was not widely studied, the storage temperature recommended by the ATCC was used. If two widely separated temperature ranges were reported, their average was taken. </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>COGEM</bold>
                                    <break/>
                                    <bold>pathogenicity</bold>
                                    <break/>
                                    <bold>rating</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">In 2011, COGEM released a comprehensive pathogenicity assessment of around 2,575 bacterial species
                                    <sup>
                                        <xref ref-type="bibr" rid="ref-10">10</xref>
                                    </sup>. It ranks the pathogenicity of each species on a scale of 1 to 4, where 1 denotes a species that does not belong to a recognized group of disease-causing agents in humans or animals and has an extended history of safe use, and 4 denotes a species that can cause very serious human disease for which no prophylaxis is known.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Antimicrobial</bold>
                                    <break/>
                                    <bold>susceptibility</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Are there any known antibiotics that this species is sensitive to? 
                                    <bold>No = 0, Yes = 1</bold>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Spore-formation</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Is the species spore-forming? 
                                    <bold>No = 0, Yes = 1</bold>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Biofilm-formation</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Is the species biofilm-forming? 
                                    <bold>No = 0, Yes = 1</bold>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Extremophile</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Extremophiles are organisms that live in extreme environments, as opposed to organisms that live in moderate (mesophilic) environments. This category includes acidophiles, thermophiles, osmophiles, halophiles, oligotrophs, and others. 
                                    <bold>Mesophile = 0, Extremophile = 1</bold>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Gram-stain</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Negative = 0, Positive = 1, Indeterminate = 2</bold>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Found in human</bold>
                                    <break/>
                                    <bold>microbiome</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Does the species live anywhere in the human body without being pathogenic to humans (i.e. without causing human disease)? 
                                    <bold>No = 0, Yes = 1</bold>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Plant pathogen</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Does the species cause disease in plants? 
                                    <bold>No = 0, Yes = 1</bold>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Animal pathogen</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Does the species cause disease in animals? 
                                    <bold>No = 0, Yes = 1</bold>
                                </td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
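                <p>To illustrate how the integer codes in Table 1 could be applied programmatically, the following minimal Python sketch maps curated text labels to those codes. The field names (for example <monospace>gram_stain</monospace>) and the accepted label spellings are hypothetical assumptions, not the database's actual schema. Empty cells are deliberately kept as missing instead of being coded as 0.</p>
                <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical field names and label spellings; the integer codes follow Table 1.
TABLE1_CODES = {
    "extremophile": {"mesophile": 0, "extremophile": 1},
    "gram_stain": {"negative": 0, "positive": 1, "indeterminate": 2},
    "human_microbiome": {"no": 0, "yes": 1},
    "plant_pathogen": {"no": 0, "yes": 1},
    "animal_pathogen": {"no": 0, "yes": 1},
}


def encode(field, label):
    """Return the Table 1 integer code for a label, or None if the cell is empty."""
    if label is None or not label.strip():
        return None  # keep missing annotations missing instead of coding them as 0
    codes = TABLE1_CODES[field]
    key = label.strip().lower()
    if key not in codes:
        raise ValueError(f"unexpected value {label!r} for field {field!r}")
    return codes[key]


def encode_record(record):
    """Encode every Table 1 field for one microbe, given a dict of text labels."""
    return {field: encode(field, record.get(field)) for field in TABLE1_CODES}


if __name__ == "__main__":
    row = {"gram_stain": "Positive", "plant_pathogen": "no", "animal_pathogen": "Yes"}
    print(encode_record(row))
```
]]></preformat>
                <p>Keeping missing values distinct from 0 matters here, because 0 records a confirmed negative (for example, "not a plant pathogen") rather than an absent annotation.</p>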
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>Microbe Directory heatmap.</title>
                        <p>Annotation types (x-axis) are represented for microbes across the online database, and the number of microbes in each category (y-axis, left side) is shown: Viroids (purple), Viruses (yellow), Eukaryotes (blue), Prokaryotes (green), and Fungi (red). The scale for each type of metadata (right) is also shown, for both binary classifications (black, white) and quantitative traits (red scales). The heatmap was constructed using R (version 3) and Adobe Illustrator.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://gatesopenresearch-files.f1000.com/manuscripts/13832/90c0dbbf-71db-4b6b-b180-4e8fc1401fa7_figure1.gif"/>
                </fig>
                <p>
                    <bold>
                        <italic toggle="yes">Pre-search</italic>.</bold> Before assignments were given to the student-researchers, the databases listed above were pre-searched to collect as much information as possible about each microbe. Each website's search page was queried with the species name, and the HTML results page was parsed using regular expressions. The first search result that contained both the microbe's binomial name and a link to the website's entry for that microbe was taken as the pre-search result. These links were compiled for each microbe and given to each student with his or her weekly assignments. The student-researchers received only the link to the entry and then had to find the relevant information (e.g. "optimal pH") manually. This system allowed the students to confirm that the pre-search had identified the correct entry for the microbe and not just a microbe with a similar name. We also supplemented the manual curation by parsing MicrobeWiki for common keywords that could indicate particular features. With this approach we could extract useful data for pathogenicity, biofilm formation, microbe shape, halophilicity, spore formation, and metabolism, and we extracted some subset of these features for 331 of the manually curated microbes.</p>
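                <p>The pre-search matching rule and the MicrobeWiki keyword scan described above can be sketched with Python's standard <monospace>re</monospace> module. This is an illustrative sketch under stated assumptions, not the script distributed as Supplementary File 2: the link pattern assumes that search results are plain anchor elements, and the keyword lists are hypothetical, because the terms actually used are not listed here.</p>
                <preformat preformat-type="code"><![CDATA[
```python
import re

# Assumed shape of a result link: an anchor tag with an href attribute.
# Real search pages differ per website, so this pattern is illustrative only.
LINK_RE = re.compile(r'<a\s[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")

# Hypothetical keyword lists for a few MicrobeWiki-derived features.
FEATURE_KEYWORDS = {
    "pathogenic": ["pathogen", "virulen"],
    "biofilm_forming": ["biofilm"],
    "spore_forming": ["spore-forming", "endospore", "sporulat"],
    "halophile": ["halophil"],
}


def first_matching_entry(results_html, binomial):
    """Return the href of the first result link whose text contains the binomial name."""
    target = " ".join(binomial.split()).lower()
    for href, link_text in LINK_RE.findall(results_html):
        # Strip nested markup such as <i>...</i> and collapse whitespace before comparing.
        plain = " ".join(TAG_RE.sub(" ", link_text).split()).lower()
        if target in plain:
            return href
    return None  # no plausible entry: leave this microbe to a fully manual search


def keyword_features(page_text):
    """Flag each feature whose keywords occur anywhere in a page's text."""
    lowered = page_text.lower()
    return {
        feature: any(word in lowered for word in words)
        for feature, words in FEATURE_KEYWORDS.items()
    }
```
]]></preformat>
                <p>Substring matching of this kind is deliberately loose (for example, "non-pathogenic" still contains "pathogen", and "Bacillus subtilisin" contains "Bacillus subtilis"), so links and flags produced this way are best treated as hints for manual curation rather than as final annotations.</p>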
                <p>
                    <bold>
                        <italic toggle="yes">Text validation and normalization</italic>.</bold> Student-researchers filled out the columns for each microbe in an Excel spreadsheet. Because every entry was typed as free-form text, the text later had to be validated and normalized. Valid column types included positive real numbers (e.g. optimal pH), ranges of positive real numbers (e.g. a range of optimal pH values), series of ranges (e.g. multiple optimal pH ranges), binary values (e.g. spore-forming or non-spore-forming), ternary values (e.g. Gram-positive, Gram-negative, Gram-indeterminate), and quaternary values (e.g. COGEM Classes 1&#x2013;4). Regular expressions (RegEx) were used to check that each column entry conformed to the correct type (validation); validated entries were then transformed into a common form (normalization), which is the form stored in the database.</p>
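                <p>A minimal Python sketch of this validate-then-normalize step for the column types listed above follows. The accepted separators (hyphen, en dash, or the word "to"), the categorical vocabularies, and the normalized forms (floats, (low, high) tuples, and integer codes) are assumptions made for illustration; the common forms actually used in the database may differ.</p>
                <preformat preformat-type="code"><![CDATA[
```python
import re

NUM = r"\d+(?:\.\d+)?"          # unsigned decimal number such as "7" or "6.5"
SEP = r"\s*(?:-|\u2013|to)\s*"  # hyphen, en dash, or the word "to"
REAL_RE = re.compile(rf"^({NUM})$")
RANGE_RE = re.compile(rf"^({NUM}){SEP}({NUM})$", re.IGNORECASE)
COGEM_RE = re.compile(r"^(?:class\s*)?([1-4])$", re.IGNORECASE)

# Hypothetical vocabularies for binary and ternary columns.
VOCABULARIES = {
    "binary": {"no": 0, "yes": 1},
    "ternary": {"negative": 0, "positive": 1, "indeterminate": 2},
}


def normalize(value, column_type):
    """Validate one free-text cell against its column type and return its common form.

    Raises ValueError if the text does not conform to the declared type.
    """
    text = value.strip()
    if column_type == "real":
        m = REAL_RE.match(text)
        if m:
            return float(m.group(1))
    elif column_type == "range":
        m = RANGE_RE.match(text)
        if m:
            low, high = sorted((float(m.group(1)), float(m.group(2))))
            return (low, high)
    elif column_type == "ranges":
        # A series of ranges separated by semicolons or commas, e.g. "5-6; 7-8".
        parts = [p for p in re.split(r"[;,]", text) if p.strip()]
        if parts:
            return [normalize(p, "range") for p in parts]
    elif column_type == "quaternary":
        m = COGEM_RE.match(text)
        if m:
            return int(m.group(1))
    elif column_type in VOCABULARIES:
        code = VOCABULARIES[column_type].get(text.lower())
        if code is not None:
            return code
    raise ValueError(f"{value!r} is not a valid {column_type!r} entry")
```
]]></preformat>
                <p>The function raises an error instead of guessing when a cell does not match its declared type, so non-conforming entries can be sent back to curators for correction.</p>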
            </sec>
            <sec>
                <title>Using the Microbe Directory</title>
                <p>The Microbe Directory can be accessed online at 
                    <ext-link ext-link-type="uri" xlink:href="https://microbe.directory">https://microbe.directory</ext-link>. This interface lets individual users browse and search the directory&#x2019;s contents interactively, which should be useful for researchers who need information about a particular microbe. While viewing the page for a given microbe, registered users can also submit edits to that microbe&#x2019;s data. Individuals can register to contribute to the Microbe Directory by signing up on the 
                    <ext-link ext-link-type="uri" xlink:href="https://www.microbe.directory/register">registration page</ext-link>. Submitted edits are queued for review by The Microbe Directory team (HS, DAW, RS).</p>
                <p>In addition to the interactive web interface, the main website provides links to the project&#x2019;s 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/microbe-directory/microbe-directory">GitHub</ext-link> and 
                    <ext-link ext-link-type="uri" xlink:href="https://bitbucket.org/account/signin/?next=/microbedb/microbedb">BitBucket</ext-link> repositories. From the GitHub repository, users can download the SQLite database that powers the website, as well as JSON and CSV (Excel-compatible) representations of the database, which are generated automatically from the SQLite database using Python scripts. Because the Microbe Directory is meant to grow and expand over time, researchers who want to make more substantial contributions can suggest changes to the database through our GitHub page; requested changes will be merged as appropriate and may be incorporated into future releases. The GitHub repository also includes a tutorial showing how to use the JSON version of the database together with a MetaPhlAn2 output file. Finally, the code for the web interface can be accessed and modified through a separate BitBucket repository, which is also linked from the main website.</p>
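                <p>As a sketch of how JSON and CSV exports like those described above could be generated from the SQLite file, the following Python function uses only the standard <monospace>sqlite3</monospace>, <monospace>json</monospace>, and <monospace>csv</monospace> modules. It is not the project&#x2019;s own export script, and any table name passed to it (for example <monospace>microbes</monospace>) is a hypothetical placeholder, since the schema of the released database may differ.</p>
                <preformat preformat-type="code"><![CDATA[
```python
import csv
import json
import sqlite3


def export_table(db_path, table, json_path, csv_path):
    """Write one SQLite table to JSON (a list of row objects) and CSV (header plus rows).

    Returns the number of exported rows.
    """
    # Identifiers cannot be bound as SQL parameters, so quote the table name explicitly.
    quoted = '"' + table.replace('"', '""') + '"'
    con = sqlite3.connect(db_path)
    try:
        cursor = con.execute(f"SELECT * FROM {quoted}")
        columns = [description[0] for description in cursor.description]
        rows = cursor.fetchall()
    finally:
        con.close()

    records = [dict(zip(columns, row)) for row in rows]
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2, ensure_ascii=False)
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(columns)
        writer.writerows(rows)
    return len(records)
```
]]></preformat>
                <p>For example, a call such as <monospace>export_table("microbe_directory.sqlite", "microbes", "microbes.json", "microbes.csv")</monospace> (all names hypothetical) would write both representations from a single query, so the JSON and CSV files stay consistent with each other.</p>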
                <p>The Microbe Directory was designed to help researchers in the microbiome and metagenomics fields learn more about the organisms they identify through their bioinformatic analyses. Although this is only version 1.0, the database can readily incorporate contributions that expand the microbial features included in the inventory. For more information on how to contribute to the project, visit 
                    <ext-link ext-link-type="uri" xlink:href="https://microbe.directory/">https://microbe.directory/</ext-link>.</p>
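                <p>To illustrate how the directory might be combined with the output of a taxonomic profiler, the following Python sketch joins species-level rows from a MetaPhlAn2-style profile with annotation records. It assumes a tab-separated layout in which each row holds a "|"-delimited lineage (for example <monospace>k__Bacteria|...|s__Escherichia_coli</monospace>) followed by a relative abundance, and it assumes a JSON file that maps species names to annotation dictionaries; both are assumptions, and the tutorial in the GitHub repository documents the format actually used.</p>
                <preformat preformat-type="code"><![CDATA[
```python
import json


def load_directory(path):
    """Load a JSON file assumed to map species names to annotation dicts."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def species_abundances(profile_lines):
    """Yield (species name, relative abundance) pairs from MetaPhlAn2-style profile lines.

    Assumes tab-separated rows of a '|'-joined lineage and an abundance value;
    lines starting with '#' are treated as headers.
    """
    for line in profile_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        leaf = fields[0].split("|")[-1]
        # Keep species rows only; higher ranks and strain rows (t__) are skipped.
        if leaf.startswith("s__") and len(fields) >= 2:
            yield leaf[3:].replace("_", " "), float(fields[1])


def annotate(profile_lines, directory):
    """Attach the directory record for each detected species (None when it is missing)."""
    return [
        {"species": name, "abundance": abundance, "annotations": directory.get(name)}
        for name, abundance in species_abundances(profile_lines)
    ]


if __name__ == "__main__":
    demo_profile = [
        "#SampleID\tMetaphlan2_Analysis",
        "k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Bacillaceae|g__Bacillus|s__Bacillus_subtilis\t12.0",
    ]
    demo_directory = {"Bacillus subtilis": {"gram_stain": 1}}  # hypothetical record
    print(annotate(demo_profile, demo_directory))
```
]]></preformat>
                <p>Returning None for species absent from the directory, rather than dropping them, makes gaps in the inventory visible; these are the organisms for which community contributions would be most useful.</p>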
            </sec>
        </sec>
        <sec>
            <title>Data availability</title>
            <p>The web interface for the Microbe Directory can be found at 
                <ext-link ext-link-type="uri" xlink:href="https://microbe.directory/">https://microbe.directory/</ext-link>.
            </p>
            <p>The database and other files are also available from the GitHub repository at 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/microbe-directory/microbe-directory">https://github.com/microbe-directory/microbe-directory</ext-link> and the BitBucket repository at 
                <ext-link ext-link-type="uri" xlink:href="https://bitbucket.org/account/signin/?next=/microbedb/microbedb">https://bitbucket.org/account/signin/?next=/microbedb/microbedb</ext-link>. 
                <italic toggle="yes">Note:</italic> BitBucket requires a login, but account creation is free and open to anyone.</p>
            <p>Archived code as at time of publication:</p>
            <list list-type="bullet">
                <list-item>
                    <label/>
                    <p>GitHub: 
                        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.1069858">https://doi.org/10.5281/zenodo.1069858</ext-link>
                        <sup>
                            <xref ref-type="bibr" rid="ref-12">12</xref>
                        </sup>
                    </p>
                </list-item>
                <list-item>
                    <label/>
                    <p>Bitbucket: 
                        <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.1069860">https://doi.org/10.5281/zenodo.1069860</ext-link>
                        <sup>
                            <xref ref-type="bibr" rid="ref-13">13</xref>
                        </sup>
                    </p>
                </list-item>
            </list>
            <p>License: MIT</p>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgements</title>
            <p>We would like to thank all the student-researchers who helped curate the data for the Microbe Directory, without whom this project would not have been possible (student names are ordered by their contribution to the database): Sophie Dornbaum (Weill Cornell Medicine), Rabab Shaddoud (CUNY Hunter College), Sadia Chowdhury (CUNY Hunter College), Sarah Chebli (CUNY Hunter College), Christopher Chiang (CUNY Hunter College), Ellen Koag (CUNY Hunter College), Sophia Tam (CUNY Hunter College), Christopher Campbell (CUNY Hunter College), Timothy Lau (CUNY Hunter College), Camille Derderian (CUNY Hunter College), Elyas Amin (CUNY Hunter College), Nicole Rakhmanova (CUNY Hunter College), Amina Durakovic (CUNY Hunter College), Jereen Chowdhury (CUNY Hunter College), Catherine Ng (CUNY Hunter College), Jasmine Wong (CUNY Hunter College), Phuong Vo (CUNY Hunter College), Calvin Herman (CUNY Hunter College), Silva Baburyan (CUNY Hunter College), Kevin Londono (CUNY Hunter College), Julianna Romeo (CUNY Hunter College), Leah Katz (CUNY Hunter College), Valentina Bedoya (CUNY Hunter College), Juan Cambeiro (CUNY Hunter College), Amzad Chowdhury (CUNY Hunter College), Rangon Islam (CUNY Hunter College), Bibi Begum (CUNY Hunter College), Frances Chung (CUNY Hunter College), Mimi Fellner (CUNY Hunter College), Phillip Ye (CUNY Hunter College), Madeleine Winter (Poly Prep High School), Raghav Pant (Millburn High School), Kriti Devasenapathy (California Institute of Technology), Halime Sena Bastug (Istanbul University Cerrahpasa Medical Faculty), Chou Chou (Weill Cornell Medicine), Jasmine Sharron (Columbia Secondary School), Laolu Ogunnaike (Johns Hopkins University), Alina Sheikh (Adelphi University), Carol Apai (Rutgers New Jersey Medical School), Salama Chaker (Weill Cornell Medicine Qatar), Caleb Gordon (Bowdoin College), Michael Pineda (Arizona State), Dara Pierre (CUNY John Jay), Scott Kulm (Weill Cornell Medicine), Ike Lewis (Weill Cornell Medicine), Mustafa Hakyemezoglu (Weill Cornell Medicine).</p>
        </ack>
        <sec sec-type="supplementary-material">
            <title>Supplementary material</title>
            <p id="SF1">
                <bold>Supplementary File 1: Microbial Annotations &#x2013; Tutorial and guidelines for student researchers, and useful links and tips.</bold>
            </p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://gatesopenresearch.s3.amazonaws.com/supplementary/12772/1a7798a3-a13e-402b-9f1d-52f341001aa9.pdf">Click here to access the data.</ext-link></p>
            <p>
                <bold>Supplementary File 2: Annotations_Automator.py &#x2013; Python script used for automated research.</bold>
            </p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://gatesopenresearch.s3.amazonaws.com/supplementary/12772/8e523c01-dfc5-4d2e-b692-52f58ace0982.py">Click here to access the data.</ext-link></p>
        </sec>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <collab>The NIH HMP Working Group, </collab>
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Peterson</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Garges</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The NIH Human Microbiome Project.</article-title>
                    <source>

                        <italic toggle="yes">Genome Res.</italic>
</source>
                    <year>2009</year>;<volume>19</volume>(<issue>12</issue>):<fpage>2317</fpage>&#x2013;<lpage>2323</lpage>.
                    <pub-id pub-id-type="pmid">19819907</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.096651.109</pub-id>
                    <pub-id pub-id-type="pmcid">2792171</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gilbert</surname>
                            <given-names>JA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jansson</surname>
                            <given-names>JK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Knight</surname>
                            <given-names>R</given-names>
                        </name>
</person-group>:
                    <article-title>The Earth Microbiome project: successes and aspirations.</article-title>
                    <source>

                        <italic toggle="yes">BMC Biol.</italic>
</source>
                    <year>2014</year>;<volume>12</volume>:<fpage>69</fpage>.
                    <pub-id pub-id-type="pmid">25184604</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12915-014-0069-1</pub-id>
                    <pub-id pub-id-type="pmcid">4141107</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Thompson</surname>
                            <given-names>LR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sanders</surname>
                            <given-names>JG</given-names>
                        </name>

                        <name name-style="western">
                            <surname>McDonald</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A communal catalogue reveals Earth's multiscale microbial diversity.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2017</year>;<volume>551</volume>(<issue>7681</issue>):<fpage>457</fpage>&#x2013;<lpage>463</lpage>.
                    <pub-id pub-id-type="pmid">29088705</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature24621</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tighe</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Afshinnekoo</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rock</surname>
                            <given-names>TM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Genomic methods and microbiological technologies for profiling novel and extreme environments for the Extreme Microbiome Project (XMP).</article-title>
                    <source>

                        <italic toggle="yes">J Biomol Tech.</italic>
</source>
                    <year>2017</year>;<volume>28</volume>(<issue>1</issue>):<fpage>31</fpage>&#x2013;<lpage>39</lpage>.
                    <pub-id pub-id-type="pmid">28337070</pub-id>
                    <pub-id pub-id-type="doi">10.7171/jbt.17-2801-004</pub-id>
                    <pub-id pub-id-type="pmcid">5345951</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yooseph</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Andrews-Pfannkoch</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tenney</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A metagenomic framework for the study of airborne microbial communities.</article-title>
                    <source>

                        <italic toggle="yes">PLoS One.</italic>
</source>
                    <year>2013</year>;<volume>8</volume>(<issue>12</issue>):<fpage>e81862</fpage>.
                    <pub-id pub-id-type="pmid">24349140</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0081862</pub-id>
                    <pub-id pub-id-type="pmcid">3859506</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Afshinnekoo</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Meydan</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chowdhury</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Geospatial resolution of human and bacterial diversity with city-scale metagenomics.</article-title>
                    <source>

                        <italic toggle="yes">Cell Syst.</italic>
</source>
                    <year>2015</year>;<volume>1</volume>(<issue>1</issue>):<fpage>72</fpage>&#x2013;<lpage>87</lpage>.
                    <pub-id pub-id-type="pmid">26594662</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cels.2015.01.001</pub-id>
                    <pub-id pub-id-type="pmcid">4651444</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <collab>MetaSUB International Consortium</collab>:
                    <article-title>The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium inaugural meeting report.</article-title>
                    <source>

                        <italic toggle="yes">Microbiome.</italic>
</source>
                    <year>2016</year>;<volume>4</volume>(<issue>1</issue>):<fpage>24</fpage>.
                    <pub-id pub-id-type="pmid">27255532</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s40168-016-0168-z</pub-id>
                    <pub-id pub-id-type="pmcid">4894504</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Truong</surname>
                            <given-names>DT</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Franzosa</surname>
                            <given-names>EA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tickle</surname>
                            <given-names>TL</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>MetaPhlAn2 for enhanced metagenomic taxonomic profiling.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2015</year>;<volume>12</volume>(<issue>10</issue>):<fpage>902</fpage>&#x2013;<lpage>903</lpage>.
                    <pub-id pub-id-type="pmid">26418763</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.3589</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>McIntyre</surname>
                            <given-names>AB</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ounit</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Afshinnekoo</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Comprehensive benchmarking and ensemble approaches for metagenomic classifiers.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2017</year>;<volume>18</volume>(<issue>1</issue>):<fpage>182</fpage>.
                    <pub-id pub-id-type="pmid">28934964</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-017-1299-7</pub-id>
                    <pub-id pub-id-type="pmcid">5609029</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pasolli</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schiffer</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Manghi</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Accessible, curated metagenomic data through ExperimentHub.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2017</year>;<volume>14</volume>(<issue>11</issue>):<fpage>1023</fpage>&#x2013;<lpage>1024</lpage>.
                    <pub-id pub-id-type="pmid">29088129</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.4468</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Van Belkum</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>COGEM Research Report: Classification of Bacterial Pathogens. Department of Medical Microbiology and Infectious Diseases</article-title>. The Netherlands, <year>2011</year>.</mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Shaaban</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Westfall</surname>
                            <given-names>DA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mohammad</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Microbe Directory Data v1.0.0 (Version v1.0.0) [Data set].</article-title>
                    <source>

                        <italic toggle="yes">Zenodo. </italic>
                    </source>
                    <year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5281/zenodo.1069858">Data Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Shaaban</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Westfall</surname>
                            <given-names>DA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mohammad</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Microbe Directory Website v1.0.0 (Version v1.0.0).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo. </italic>
                    </source>
                    <year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5281/zenodo.1069860">Data Source</ext-link>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report26308">
        <front-stub>
            <article-id pub-id-type="doi">10.21956/gatesopenres.13832.r26308</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Bik</surname>
                        <given-names>Elisabeth M.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r26308a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-5477-0324</uri>
                </contrib>
                <aff id="r26308a1">
                    <label>1</label>uBiome, San Francisco, CA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>I work at uBiome, a microbial sequencing company.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>22</day>
                <month>3</month>
                <year>2018</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2018 Bik EM</copyright-statement>
                <copyright-year>2018</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport26308" related-article-type="peer-reviewed-article" xlink:href="10.12688/gatesopenres.12772.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Shaaban 
                <italic>et al.</italic> describe The Microbe Directory, a database with more than 7,500 microbial species. This is a great initiative, in which a group of academic researchers, helped by a team of (under)graduate students have annotated bacteria, archaea, viruses and eukaryotic microbes taken from the MetaPhlAn2 database, with respect to pathogenicity, growth characteristics, and presence in the human microbiome. The paper is well written, and the initiative is very welcome. Although the initial list of fields is small, this database has the potential to grow, and there is an option for registered users to add missing data.&#x00a0;</p>
            <p> Comments on the manuscript:</p>
            <p> &#x00a0; 
                <list list-type="order">
                    <list-item>
                        <p>A typo in the Introduction: "taxonomoic"</p>
                    </list-item>
                    <list-item>
                        <p>In the Methods, section &#x201c;MetaPhlAn2 list of species&#x201d;, it reads "7-level (kingdom to strain)". However, based on the MetaPhlAn2 documentation, this should read "kingdom to species" (1, Kingdom; 2, phylum; 3, class; 4, order; 5, family; 6, genus; 7, species).</p>
                    </list-item>
                    <list-item>
                        <p>Figure 1. 
                            <list list-type="order">
                                <list-item>
                                    <p>There is a discrepancy between the color codes described in the legend text and those in the key on the top right of the figure. Viruses and viroids have opposite colors, and "prokaryotes" and fungi are not mentioned in the key.</p>
                                </list-item>
                                <list-item>
                                    <p>"Prokaryotes" is not a good term, as it defines the two groups by something they lack and implies that archaea and bacteria form a single evolutionary group. See e.g. Norm Pace's essay
                                        <sup>
                                            <xref ref-type="bibr" rid="rep-ref-26308-1">1</xref>
                                        </sup>.</p>
                                </list-item>
                                <list-item>
                                    <p>What is meant by "Microbiome Location"? The field is binary, suggesting that "Found in Human Microbiome" (as used in Table 1) might be a better label. &#x201c;Location&#x201d; suggests that this field stores the anatomical site where the species has been found, which would also be a useful field to have, but is not what is meant here.</p>
                                </list-item>
                                <list-item>
                                    <p>A similar question applies to "Antimicrobial": does "yes" mean that the organism is sensitive or resistant? That antimicrobial properties are known? Or that it produces an antimicrobial? Table 1 answers this question, but it might be worth addressing it here as well.</p>
                                </list-item>
                                <list-item>
                                    <p>&#x201c;Optimal Ph" should be &#x201c;Optimal pH".</p>
                                </list-item>
                                <list-item>
                                    <p>The order of the last four categories differs between the labels under the heatmap and the key on the right.</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>A useful addition to the paper would be a description of potential fields that could be added to the database. In its current form, the usability of the database is very limited, and it would probably be faster to look up the information in, e.g., Wikipedia. However, the strength of this database is that it can grow, both in number of entries and in number of fields. Some suggestions: a link to the organism's draft genome, number of chromosomes, linear/circular chromosome, and RNA/DNA genome for viruses.</p>
                    </list-item>
                    <list-item>
                        <p>The paper could also acknowledge that some information that appears simple at first glance, such as pathogenicity, might not be simple at all. For example, Escherichia coli and Clostridium difficile can each be either a peaceful member of the human gut microbiota or a human pathogen, depending on the presence of toxin genes. Herpesvirus infections are so common in humans, and usually latent, that one could argue they are part of the human microbiome. The ability to form biofilms might also be more complicated than a simple yes/no. The paper would be stronger if it acknowledged the difficulty of capturing these subtleties in simple binary answers.</p>
                    </list-item>
                </list></p>
            <p> Comments on the database: 
                <list list-type="order">
                    <list-item>
                        <p>Of course, this is only a first version, and the database will hopefully grow quickly, but the current data feel very sparse. For example, as also pointed out by other reviewers, pH and temperature information was missing for well-studied microbes such as Salmonella enterica, Agrobacterium tumefaciens, Candida albicans, Schizosaccharomyces pombe, or Yersinia pestis. For others, an entry was present in the database but all fields were empty (e.g. Rhinovirus A, Bacillus cereus thuringiensis).</p>
                    </list-item>
                    <list-item>
                        <p>There are important classification errors such as: 
                            <list list-type="order">
                                <list-item>
                                    <p>Yersinia pestis, which causes disease in rodents, is not listed as an animal pathogen.</p>
                                </list-item>
                                <list-item>
                                    <p>Magnaporthe oryzae, the causative agent of one of the most destructive diseases of rice, is not listed as a plant pathogen.</p>
                                </list-item>
                                <list-item>
                                    <p>Candida albicans is listed as not susceptible to antimicrobials, with a reference from 1999.</p>
                                </list-item>
                                <list-item>
                                    <p>Influenza B virus is classified as biofilm-forming based on a paper showing that Influenza A virus can disperse Streptococcus pneumoniae biofilms.</p>
                                </list-item>
                                <list-item>
                                    <p>Agrobacterium tumefaciens, a well-studied plant pathogen, is listed as an animal pathogen, although it infects animals only under laboratory conditions, not in nature.</p>
                                </list-item>
                                <list-item>
                                    <p>Human herpesvirus 4 (Epstein-Barr virus, although a search for that name returned no results) is not listed as a human pathogen.</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>The &#x201c;Found in the human microbiome&#x201d; category is defined as &#x201c;Microbes that live anywhere in the human body and are not pathogenic to humans (i.e. capable of causing human disease) No = 0, Yes = 1&#x201d;. However, both Escherichia coli and Clostridium difficile, each of which can be either a pathogen or a symbiotic member of the microbiome, are classified as &#x201c;yes&#x201d;. Perhaps the definition should be refined and the clause about not being pathogenic to humans removed?</p>
                    </list-item>
                    <list-item>
                        <p>Links: Fields with data are marked with an &#x201c;i&#x201d; that leads to the source. Some of the i&#x2019;s are yellow, while others are black/white. The yellow i&#x2019;s lead to a URL that is not a hyperlink, while the white i&#x2019;s are hyperlinks. As one of the other reviewers pointed out, it would be nice if all links were hyperlinks. These links also make it very noticeable that the database was compiled from contributions by many different users, each with their own way of adding links. In some cases, the yellow i&#x2019;s give a citation that is not very helpful. Examples: 
                            <list list-type="order">
                                <list-item>
                                    <p>Akkermansia muciniphila: all information links give &#x201c;Everard, Amandine, et al&#x201d;, without DOI, year, or working link.</p>
                                </list-item>
                                <list-item>
                                    <p>Cryptococcus neoformans: all i&#x2019;s lead to &#x201c;Todd W., Larimer, Frank W., Lippmeier, J. Casey, Lucas, Susan, Medina&#x201d;, with no link, year, or DOI. Which paper is that? I could probably find it, but that defeats the purpose of having a reference database.</p>
                                </list-item>
                                <list-item>
                                    <p>Magnaporthe oryzae: the source for Abx susceptibility is an unhelpful &#x201c;Choi J et al&#x201d;. A PubMed search for that author returns 19,067 papers.</p>
                                </list-item>
                                <list-item>
                                    <p>Escherichia coli: the source for the optimal pH field is a biotech company that sells culture media, but the link is broken.</p>
                                </list-item>
                                <list-item>
                                    <p>All i&#x2019;s for Candida albicans refer to Staab, J. F. (1999)
                                        <sup>
                                            <xref ref-type="bibr" rid="rep-ref-26308-2">2</xref>
                                        </sup>, which, although a paper about Candida, is an older paper about a specific protein rather than a general review.</p>
                                </list-item>
                                <list-item>
                                    <p>The &#x201c;Pathogenicity&#x201d; field appears to always use the COGEM 2011 list as a source, but this source is annotated in many different ways; it seems that each (under)graduate student used a different description. Sometimes it is a working link to the COGEM document, but in other cases it is a cryptic, unhelpful pop-up text such as &#x201c;CGM2011&#x201d;, &#x201c;CGM PDF&#x201d;, &#x201c;COGEM&#x201d;, or &#x201c;CGM 2011-07 Bijlage I Algemene text&#x201d;, without a hyperlink.</p>
                                </list-item>
                                <list-item>
                                    <p>Pseudomonas aeruginosa: the Plant Pathogen field links to a personal file (
                                        <ext-link ext-link-type="uri" xlink:href="">file:///Users/catherineng/Downloads/54028.pdf</ext-link>) that does not work for other users.</p>
                                </list-item>
                                <list-item>
                                    <p>Haloferax denitrificans: another personal link: file:///C:/Users/Maddie/Downloads/35960.pdf</p>
                                </list-item>
                                <list-item>
                                    <p>Human herpesvirus 4: none of the yellow i&#x2019;s appears to show any text.</p>
                                </list-item>
                                <list-item>
                                    <p>Some of the listed sources consist of two URLs separated by a space, which are tricky to copy and paste correctly into a browser. For example, Methanobrevibacter smithii: &#x201c;https://microbewiki.kenyon.edu/index.php/Methanobrevibacter_Smithii http://bacmap.wishartlab.com/organisms/525&#x201d;</p>
                                </list-item>
                                <list-item>
                                    <p>The i under Bacillus anthracis lists about 10 URLs, but most are private search terms, so they are not useful to anyone else. The complete list is: &#x201c;http://medschool.creighton.edu/fileadmin/user/medicine/MMI/Files/Bacteria_Table.pdf ; http://www.life.umd.edu/classroom/bsci424/PathogenDescriptions/PathogenList.htm#D; https://www.patricbrc.org/portal/portal/patric/SpecialtyGeneList?cType=genome&amp;cId=1297866.3&amp;kw=source:%22Victors%22 ; http://ardb.cbcb.umd.edu/cgi/search.cgi?db=R&amp;term=YP_001373621 ; http://ardb.cbcb.umd.edu/cgi/search.cgi?db=R&amp;term=ZP_02395450 ; http://ardb.cbcb.umd.edu/cgi/search.cgi?db=R&amp;term=ZP_02391336 ; http://ardb.cbcb.umd.edu/cgi/search.cgi?db=R&amp;term=ZP_03108029 ; https://microbewiki.kenyon.edu/index.php/Bacillus ; https://en.wikipedia.org/wiki/Bacillus_anthracis ; https://gold.jgi.doe.gov/organisms?id=3251&#x201d;</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>Layout: 
                            <list list-type="order">
                                <list-item>
                                    <p>I found the use of upper and lower case for the microorganism names a bit distracting. Bacterial names are shown entirely in lower case in the dark top bar but in upper case in the bottom part. This might look good from a design point of view, but it is not how most of us are used to writing bacterial names.</p>
                                </list-item>
                                <list-item>
                                    <p>When browsing taxonomically, entries are listed in non-alphabetical order, making it hard to find the correct one. For example, Picornaviridae-Enterovirus and the genus Bacillus both lead to such a list.</p>
                                </list-item>
                                <list-item>
                                    <p>In &#x201c;Optimal Temerature&#x201d; there is a &#x201c;p&#x201d; missing.</p>
                                </list-item>
                                <list-item>
                                    <p>The name of the category &#x201c;Microbiome location&#x201d; suggests that this field would contain a location (such as gut/mouth/skin), while it is a yes/no field.</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>Suggestions for future additions: 
                            <list list-type="order">
                                <list-item>
                                    <p>A short line describing what the organism is known for.</p>
                                </list-item>
                                <list-item>
                                    <p>Additional categories: chromosome information (linear, circular, how many, number of ribosomal operon copies, RNA/DNA for viruses) and use in the food industry (brewing, bread making, probiotics).</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                </list>
            </p>
            <p>Are sufficient details of methods and materials provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Is the rationale for creating the dataset(s) clearly described?</p>
            <p>Yes</p>
            <p>Are the datasets clearly presented in a useable and accessible format?</p>
            <p>Yes</p>
            <p>Are the protocols appropriate and is the work technically sound?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Human microbiome analysis, biotech industry</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-26308-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"><name><surname>Pace</surname><given-names>NR</given-names></name></person-group>:
                        <article-title>Time for a change.</article-title>
                        <source>
                            <italic>Nature</italic>
                        </source>.<year>2006</year>;<volume>441</volume>(<issue>7091</issue>):
                        <fpage>289</fpage>
                        <pub-id pub-id-type="pmid">16710401</pub-id>
                        <pub-id pub-id-type="doi">10.1038/441289a</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-26308-2">
                    <label>2</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"><name><surname>Staab</surname><given-names>JF</given-names></name><etal/></person-group>:
                        <article-title>Adhesive and mammalian transglutaminase substrate properties of Candida albicans Hwp1.</article-title>
                        <source>
                            <italic>Science</italic>
                        </source>.<year>1999</year>;<volume>283</volume>(<issue>5407</issue>):<fpage>1535</fpage>-<lpage>1538</lpage>
                        <pub-id pub-id-type="pmid">10066176</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment3119-26308">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Westfall</surname>
                            <given-names>David</given-names>
                        </name>
                        <aff/>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>11</day>
                    <month>9</month>
                    <year>2018</year>
                </pub-date>
            </front-stub>
            <body>
                <p>Hello Dr. Bik,</p>
                <p> Thank you for your comments on the manuscript and database. Please see below for responses to your points.</p>
                <p> Manuscript: 
                    <list list-type="order">
                        <list-item>
                            <p>Yes, "taxonomoic" was a typo and should be corrected to "taxonomic".</p>
                        </list-item>
                        <list-item>
                            <p>Yes, neither MetaPhlAn2 nor The Microbe Directory supports strain-level resolution. The wording was an error on our part.</p>
                        </list-item>
                        <list-item>
                            <p>Figure 1 
                                <list list-type="order">
                                    <list-item>
                                        <p>Yes, the color codes are incorrect as you described.</p>
                                    </list-item>
                                    <list-item>
                                        <p>Yes, there is a discrepancy between the color codes and legend text. The text should be changed to reflect the key in the top-right of the figure.</p>
                                    </list-item>
                                    <list-item>
                                        <p>Yes, "Microbiome Location" is a binary value. The label should be changed to "Found in Human Microbiome."</p>
                                    </list-item>
                                    <list-item>
                                        <p>Yes, changing the label from &#x201c;Antimicrobial&#x201d; to &#x201c;Susceptible to Antimicrobials&#x201d; would be clearer.</p>
                                    </list-item>
                                    <list-item>
                                        <p>Yes, the capitalization should be changed to &#x201c;pH&#x201d; and not &#x201c;Ph&#x201d;.</p>
                                    </list-item>
                                    <list-item>
                                        <p>Yes, the ordering of the labels in the legend text should be changed to reflect that of the key in the top-right.</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                        <list-item>
                            <p>Yes, the utility of the database is currently limited by its columns and data, but we agree that it has the potential to grow. In the next version of the database, we want to add features that will allow researchers to add columns such as those you have suggested.</p>
                        </list-item>
                        <list-item>
                            <p>Yes, many of the database columns involve subtleties that are not addressed (e.g. whether or not an organism forms biofilms is more complicated than a binary yes/no). In the future, we hope to add columns that better address these subtleties. This could be handled by allowing more complex data types in the database, in addition to having columns with more restrictive definitions. We ultimately decided to reduce such complexities to simpler values so that the database would be more computationally useful. That said, our current approach could be improved as discussed.</p>
                            <p> As far as pathogenicity is concerned, we relied on the COGEM pathogenicity classification in order to provide a common set of definitions. That being said, we agree that pathogenicity is a complex topic with subtleties that are difficult to represent with a simple number.</p>
                        </list-item>
                    </list> </p>
                <p> Database 
                    <list list-type="order">
                        <list-item>
                            <p>Yes, the current database is a work in progress and has many unfilled fields. That said, we hope the community will participate in filling the database&#x2019;s gaps and that it will grow as you mentioned. Moreover, we would like to add data for commonly studied organisms via computational techniques (e.g. data scraping or natural language processing).</p>
                        </list-item>
                        <list-item>
                            <p>Unfortunately, it is very difficult for us to catch all errors such as these. We hope to improve our community-editing functionality, so that members of the scientific community can help us with data quality. Ensuring data integrity in a resource of this size is something that can only be accomplished with the help of the community, and we hope to make the editing process easier in the next phase of the database. We also would like to introduce some sort of &#x201c;quality rating&#x201d; that helps determine the reliability of a particular data value, but this is currently a work-in-progress.</p>
                        </list-item>
                        <list-item>
                            <p>Yes, the "found in the human microbiome" category is unable to capture all of the nuances of such microbes. In the future, we want to create functionality that will allow the community to create additional column types. We could then clarify the column definition, the possible data types in the column, or both, which would allow for the representation of more nuanced relationships.</p>
                        </list-item>
                        <list-item>
                            <p>Yes, the citation infrastructure needs to be improved and is one of our main focuses for the next iteration of the database. In its current form, many of the citations are stored as free text, which makes them difficult to parse and display intelligently to end users. With a more robust citation infrastructure, we can eliminate many of the issues you have mentioned.</p>
                        </list-item>
                        <list-item>
                            <p>Layout 
                                <list list-type="order">
                                    <list-item>
                                        <p>Yes, scientific norms should trump aesthetics and microbe names should be represented as per normal scientific conventions.</p>
                                    </list-item>
                                    <list-item>
                                        <p>Yes, viruses should be listed in alphabetical order when browsing taxonomically.</p>
                                    </list-item>
                                    <list-item>
                                        <p>Yes, &#x201c;Optimal Temerature" is misspelled and should be fixed.</p>
                                    </list-item>
                                    <list-item>
                                        <p>Yes, "microbiome location" should be changed to better indicate that it is a binary value, as previously discussed.</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                        <list-item>
                            <p>Suggestions 
                                <list list-type="order">
                                    <list-item>
                                        <p>Yes, we agree that a short free-text summary would be a useful column to add.</p>
                                    </list-item>
                                    <list-item>
                                        <p>Yes, we want to add additional column types as previously discussed. We agree that columns such as these would be useful additions to the database.</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                    </list> Thank you again for giving the database such a close look. We hope to incorporate as much of this as possible into the next iteration of the database.</p>
                <p> Best,</p>
                <p> David Westfall</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report26309">
        <front-stub>
            <article-id pub-id-type="doi">10.21956/gatesopenres.13832.r26309</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Vega</surname>
                        <given-names>Nicole M.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r26309a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-9929-6109</uri>
                </contrib>
                <aff id="r26309a1">
                    <label>1</label>Biology Department, Emory University, Atlanta, GA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>15</day>
                <month>3</month>
                <year>2018</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2018 Vega NM</copyright-statement>
                <copyright-year>2018</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport26309" related-article-type="peer-reviewed-article" xlink:href="10.12688/gatesopenres.12772.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>In this manuscript, the authors describe the creation and construction of the Microbe Directory, a resource for profiling and annotating species after large-scale metagenomic taxonomic analyses.</p>
            <p> I very much like the idea of the Microbe Directory and think that this could be a valuable resource for the field. The manuscript describing the Directory&#x2019;s construction and curation to date was clear and understandable.</p>
            <p> I think that the ability of researchers to add information to the database directly is a great feature. Is there also a plan and/or schedule for incorporating database updates from the sources described in the paper?</p>
            <p> I did not download the database, but I did try the web interface and found it fairly intuitive to use. The Browse function was a little odd; it would be helpful if the options for each clade were presented alphabetically or in some other obvious order.</p>
            <p>Are sufficient details of methods and materials provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Is the rationale for creating the dataset(s) clearly described?</p>
            <p>Yes</p>
            <p>Are the datasets clearly presented in a useable and accessible format?</p>
            <p>Yes</p>
            <p>Are the protocols appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Microbiology, microbial ecology</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
        <sub-article article-type="response" id="comment3120-26309">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Westfall</surname>
                            <given-names>David</given-names>
                        </name>
                        <aff/>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>11</day>
                    <month>9</month>
                    <year>2018</year>
                </pub-date>
            </front-stub>
            <body>
                <p>Hello Dr. Vega,</p>
                <p> Thank you for your review. I have addressed your comments below:</p>
                <p> There is currently no plan to update the database automatically from its sources. Much of the data was curated manually, which makes real-time updates difficult. That said, we agree that automatically updating data from sources that provide an API would be useful, and this could be considered for future versions of the database.</p>
                <p> </p>
                <p> And yes, the browse function could be improved. Representing the clades in alphabetical order would be a useful addition that would greatly improve usability. We hope to improve the usability of our website in the next version in order to address issues such as these.</p>
                <p> </p>
                <p> Best,</p>
                <p> David Westfall</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report26238">
        <front-stub>
            <article-id pub-id-type="doi">10.21956/gatesopenres.13832.r26238</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>McDonald</surname>
                        <given-names>James E.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r26238a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-6328-3752</uri>
                </contrib>
                <aff id="r26238a1">
                    <label>1</label>School of Biological Sciences, Bangor University, Bangor, UK</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>15</day>
                <month>3</month>
                <year>2018</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2018 McDonald JE</copyright-statement>
                <copyright-year>2018</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport26238" related-article-type="peer-reviewed-article" xlink:href="10.12688/gatesopenres.12772.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>
                <bold>Concept:</bold>
            </p>
            <p> </p>
            <p> The Microbe Directory is an excellent concept and aims to provide phenotypic and ecological profiles of approx. 7500 microbial species represented in the MetaPhlAn2 database. Although some information is present in other repositories, the Microbe Directory aggregates information on the functional, biological or medical traits of these organisms into a single source where the profiles may be further expanded to represent a useful resource to better interpret the functional and ecological properties of taxonomic data. If the directory continues to grow and expand with additional information, it would be a fantastic and heavily-utilised resource for the wider community. In particular, integration of data for bacteria, archaea, viruses, fungi, and protozoa in the same database is a positive strategy.</p>
            <p> </p>
            <p> An important feature of the Microbe Directory is that while there is always the potential for human error in manually curated databases, scientists can generate an account and submit edits and changes to the information hosted in the database. I hope that the wider scientific community engages with and contributes to the directory in order to enable it to reach its potential as an important resource for microbiologists.</p>
            <p> </p>
            <p> 
                <bold>Manuscript:</bold>
            </p>
            <p> </p>
            <p> Abstract. The abstract focusses heavily on the application of the directory &#x2018;downstream of large-scale metagenomic taxonomic analysis&#x2019; and &#x2018;designed to serve as a resource for researchers conducting metagenomic analysis&#x2019;, but perhaps this is too narrow a focus on the utility of the directory. I can see several other uses for the directory in other areas of microbiology: to inform/validate the potential phenotypic and ecological properties of a microbial isolate, or as an information source on a specific microorganism for an undergraduate student after a lab class, for example. Maybe it&#x2019;s worth re-wording this to broaden the potential for wider adoption of the resource.</p>
            <p> </p>
            <p> Average values. Table 1 describes the microbial features currently listed in the directory. However, in instances where more than one optimal temperature and pH could be found for different strains of a species, an average value has been taken. This would mask the range of optimal temperatures across the strains, which is useful information if you are using the resource to find the best temperature(s) at which to grow a species. Could the recorded range of temperatures not also be provided as an additional source of information?</p>
            <p> </p>
            <p> 
                <bold>Website and sources of information:</bold>
            </p>
            <p> </p>
            <p> The website looks good and was generally easy to navigate.</p>
            <p> </p>
            <p> Information. I performed a few searches for microorganisms that we work on, some of which are not well-characterised at present, and was not surprised to see that, for most of these, many of the categories were blank. However, I then looked at some very well-studied organisms and also found lots of blank categories. For example, only 3/11 categories were complete for <italic>Bacillus subtilis</italic>, when a quick Google search reveals primary research articles that provide information on several of these categories (e.g. that it is a spore former). <italic>E. coli</italic> also has no data for biofilm formation, which can again be verified with a quick Google search. Going forward, additional buy-in will therefore be needed to ensure that the information is as complete as possible.</p>
            <p> </p>
            <p> For some species, the information links didn&#x2019;t work (I got an &#x2018;error 404&#x2019; code), which made it difficult to find the source of information, but others worked fine.</p>
            <p> </p>
            <p> Where the links did work, many of the sources of information were webpages (e.g. wiki pages) that did refer to primary literature that could be accessed to verify the information. However, in my view, if it was possible to provide links to more than one reference in the primary literature, and to allow others to add links to primary research articles, you could very quickly generate a set of primary literature that described those key attributes, which would be very beneficial as a reference source and for validation of the information.</p>
            <p> </p>
            <p> Some links to information about microbial species took me to website articles that, although apparently peer-reviewed, had collated information from primary research articles. However, if it is possible to incorporate them, direct links to the research articles themselves would in my view be more useful.</p>
            <p>Are sufficient details of methods and materials provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Is the rationale for creating the dataset(s) clearly described?</p>
            <p>Yes</p>
            <p>Are the datasets clearly presented in a useable and accessible format?</p>
            <p>Yes</p>
            <p>Are the protocols appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Microbial ecology</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment3121-26238">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Westfall</surname>
                            <given-names>David</given-names>
                        </name>
                        <aff/>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>11</day>
                    <month>9</month>
                    <year>2018</year>
                </pub-date>
            </front-stub>
            <body>
                <p>Hello Dr. McDonald,</p>
                <p> </p>
                <p> Thank you very much for your review. Please find responses to your comments below:</p>
                <p> </p>
                <p> Manuscript: 
                    <list list-type="order">
                        <list-item>
                            <p>Abstract 
                                <list list-type="order">
                                    <list-item>
                                        <p>The directory was worded as being used &#x2018;downstream of large-scale metagenomic taxonomic analysis&#x2019; largely to distinguish it from other resources. There are other resources out there, such as MicrobeWiki, that focus on free-text information and do a great job at that. With this in mind, we decided to focus on quantifiable sources of information, both for ease of data collection and for ease of data application. That being said, there are certainly additional use cases for such quantifiable microbial information, and we could broaden the wording to reflect them.</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                        <list-item>
                            <p>Average Values 
                                <list list-type="order">
                                    <list-item>
                                        <p>We took average values for optimal temperature and pH for a few different technical reasons. The current implementation reflects two limitations: (1) the directory does not have strain-level resolution, and (2) the database was not designed to support complex data types. We certainly agree that it would be useful to have more granularity for these values, and we hope to support this in future versions of the database (e.g. by including entire growth curves).</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                    </list> </p>
                <p> Website: 
                    <list list-type="order">
                        <list-item>
                            <p>Current Data 
                                <list list-type="order">
                                    <list-item>
                                        <p>Yes, we agree that the data is incomplete and that much work remains to be done. We hope to make it easier for users to add data in order to increase community buy-in.</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                        <list-item>
                            <p>Citations 
                                <list list-type="order">
                                    <list-item>
                                        <p>We absolutely agree that one of the biggest areas for improvement lies in the citations. We plan to greatly improve this infrastructure in future releases, largely with the idea of preferring primary sources. We also plan to record timestamps for citations in order to prevent (or at the very least address) 404 errors. Moreover, we want to improve the usability of the citations: they are currently stored as free text, and once we improve this infrastructure it will be easier for us to display the relevant links from each citation.</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                    </list>
                </p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report26191">
        <front-stub>
            <article-id pub-id-type="doi">10.21956/gatesopenres.13832.r26191</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Coil</surname>
                        <given-names>David A.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r26191a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r26191a1">
                    <label>1</label>Genome Center, University of California, Davis, Davis, CA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>Our lab participated in the worldwide MetaSUB project which was run by Chris Mason and his lab. I wasn't the primary point of contact for the project, but I supervised the person that was.  Our involvement was collecting samples and sending them to his lab (as did dozens of other labs).  In addition, I once sent him some bacterial DNA that might or might not someday be part of a future publication.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>12</day>
                <month>1</month>
                <year>2018</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2018 Coil DA</copyright-statement>
                <copyright-year>2018</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport26191" related-article-type="peer-reviewed-article" xlink:href="10.12688/gatesopenres.12772.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>I love the idea behind &#x201c;The Microbe Directory&#x201d;. I think this information will be of great value and I really like the way it was generated with the help of students. With the ability to expand the database, I think this could become an important resource. In particular I&#x2019;m curious if the authors have considered working with the folks at Traitar? They have a really useful tool but their underlying phenotypic data is lacking and is heavily biased towards human pathogens in a way that the Microbe Directory appears not to be.</p>
            <p> </p>
            <p> However, my attempts to use the website were somewhat frustrating. It&#x2019;s not clear to me in a review of this nature whether I should be reviewing just the paper (basically fine) or the software itself (needs work before going live).</p>
            <p> 
                <bold>Paper </bold>
            </p>
            <p> The paper is well-written and clear. I only had very minor suggestions (harder without line numbers!)</p>
            <p> </p>
            <p> The phrase starting with &#x201c;these features were considered&#x201d; in the &#x201c;Building the microbe directory&#x201d; section needs grammatical revision.</p>
            <p> </p>
            <p> In the same section the sentence &#x201c;These peer-reviewed resources and databases&#x2026;&#x201d; is a bit misleading since most of the listed resources are not peer-reviewed.</p>
            <p> </p>
            <p> Not sure the &#x201c;(RegEx)&#x201d; abbreviation is useful since it&#x2019;s never used again.</p>
            <p> </p>
            <p> &#x201c;The edits are then put in a queue to be later reviewed by the Microbe Directory team&#x201d;. I would like to see a bit more about what the criteria for review are. Will they just be checking for spam or will they actually verify information using the reference(s)?</p>
            <p> 
                <bold>Website</bold>
            </p>
            <p> The site is very clean and easy to navigate. However, when I attempted to do things I encountered some snags.</p>
            <p> </p>
            <p> Firstly, I went and looked at some microbes. The first thing I noticed is that reference links aren&#x2019;t clickable links; they are just plain text, which is a bit off-putting and requires careful copying and pasting. In cases where there are multiple links it becomes a mess (see 
                <ext-link ext-link-type="uri" xlink:href="https://s3-eu-west-1.amazonaws.com/gatesopenresearch/linked/178823.Slide2.JPG">screenshot</ext-link>). It seems like some better way to parse links is required, and having them be clickable would be awesome. The first one I tried also led to a 404 error; is there some way that links could be automatically checked?</p>
            <p> </p>
            <p> After I created an account, I clicked on &#x201c;Contribute&#x201d;, which took me to the login page; after logging in, it dropped me back at the main page, which seems not ideal, and I then had to navigate back to the microbe I was interested in. After clicking on &#x201c;Contribute&#x201d; again, I was faced with some fields, the first of which is the Microbe ID, but it doesn&#x2019;t say that anywhere; there&#x2019;s just a number there. This is a bit confusing. Perhaps this field could be labeled?</p>
            <p> </p>
            <p> Then I wanted to add something about Gram staining, but there&#x2019;s no key for the &#x201c;Values&#x201d; field (see 
                <ext-link ext-link-type="uri" xlink:href="https://s3-eu-west-1.amazonaws.com/gatesopenresearch/linked/178821.Slide1.JPG">screenshot</ext-link>). I had to open a new window and pull up another organism to learn that &#x201c;1&#x201d; is what I wanted for Gram-positive. Is it possible to display the key for the values on the contribute page, or to have a small key appear for the selected attribute? Some way to access the key within the &#x201c;Contribute&#x201d; page would help.</p>
            <p> I was surprised to see an entry like &#x201c;
                <italic>Porphyrobacter</italic> sp AAP82&#x201d;. Is there a rationale for including isolates that don&#x2019;t have a species-level identification? I&#x2019;m not sure how useful this sort of thing is for the stated purpose of the database. For example, in this case there are only two pieces of data: the COGEM listing (which can&#x2019;t be verified, since the reference &#x201c;link&#x201d; just says &#x201c;COGEM&#x201d;) and the Gram stain field, which links to the listing for that genus at bacterio.net. That listing actually doesn&#x2019;t contain information about Gram staining and in any case wouldn&#x2019;t cover an isolate like this that doesn&#x2019;t have a species name. How does something like this end up in the database?</p>
            <p> </p>
            <p> In a similar vein, there are microbes in the database that have no information whatsoever, is this expected?</p>
            <p> </p>
            <p> I really wanted to try adding a new species to the database, but couldn&#x2019;t find any way to do so. The paper sort of implies that this is possible, but I didn&#x2019;t see any such option on the website. I then went to the &#x201c;Contact&#x201d; page, figuring that I would send a ping with this question, but there&#x2019;s no general contact address for the project. Seems like that might be useful? I wasn&#x2019;t sure which specific person would be appropriate for this question.</p>
            <p>Are sufficient details of methods and materials provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Is the rationale for creating the dataset(s) clearly described?</p>
            <p>Yes</p>
            <p>Are the datasets clearly presented in a useable and accessible format?</p>
            <p>Yes</p>
            <p>Are the protocols appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Microbiology, Microbial Ecology, Bacterial Genomics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment3059-26191">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Shaaban</surname>
                            <given-names>Heba</given-names>
                        </name>
                        <aff>Weill Cornell Medical Center, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>4</day>
                    <month>2</month>
                    <year>2018</year>
                </pub-date>
            </front-stub>
            <body>
                <p>Thank you for your review, Dr. Coil.</p>
                <p>Paper:</p>
                <p>The edits you suggested regarding the paper will be incorporated in version 2 of the manuscript.</p>
                <p>As for the edits that contributors will be making to the site, we will review them one-by-one to confirm that the sources/citations submitted are reliable and that the data inputted is verified by the reference. We have an administrative view that our team uses to accept/reject edits. We will expand on this in the paper, as suggested.</p>
                <p>Website:</p>
                <p>We are sorry that you had a difficult experience using our website. The links are now clickable if they are validated, and you should be redirected to the link right away. If more than one link was cited, you can click on the icon to see all the links; for now, however, these multiple links must be copied and pasted to be visited. We eventually want to change this, but due to the non-standard format of some of the citations, this is not currently possible. We will need to manually re-curate the citations in order to provide this functionality, and we are planning to do so in the near future. As for the broken link you encountered, the project has been ongoing for about two years now, and some of the links are indeed broken. We plan to add timestamps to each citation, but this is not yet implemented. If a link is broken, Microbe Directory users can manually submit edits using the &#x201c;Contribute&#x201d; interface.</p>
                <p>We also programmed the site to redirect you to the microbe you were attempting to edit after you create an account. Thank you for bringing that to our attention. The microbe ID was actually an in-house cataloging number for admins and we understand why it might be confusing. It is just meant to make it easier to share species on social media without having to type in the species name. Also, the key for values will now appear on the contribute page. We also created a drop-down menu for each value, which should make it easier to edit.</p>
                <p>As for including isolates that don&#x2019;t have a species-level identification, we made the decision to be compatible with MetaPhlAn2, so that researchers using this popular metagenomics analysis tool would have their code work &#x201c;out-of-the-box.&#x201d; If, in the future, researchers decide to work with these strains or species, they will already be cataloged in the database. We are aware that there are microbes for which there is no information. We included these to be conformant with MetaPhlAn2 and also to provide a scaffold for users to fill in additional information.</p>
                <p>Additionally, users can now add new species to the database by using the link on the &#x201c;Contribute&#x201d; page. We also updated the contact page to include our role descriptions in the project, so that we may be contacted accordingly. We also improved our &#x201c;Help&#x201d; page to include more instructions on contributing and to address some troubleshooting.</p>
                <p>Lastly, this is just version 1 of the Microbe Directory. As the project expands, we plan to make changes to the site format and configuration as necessary, in addition to maintaining proper quality control. As you suggested with Traitar, we want to collaborate with organizations that have their own databases and incorporate them into the Microbe Directory. We think v1 of the Microbe Directory provides a scaffold for expansion, and we really want to see this project grow over time.</p>
                <p>Thank you for your time and efforts in reviewing our project.</p>
            </body>
        </sub-article>
    </sub-article>
</article>
