Whole blood genome-wide transcriptome profiling and metagenomics next-generation sequencing in young infants with suspected sepsis in low- and middle-income countries: A study protocol [version 1; peer review: 2 approved with reservations]

Conducting collaborative and comprehensive epidemiological research on neonatal sepsis in low- and middle-income countries (LMICs) is challenging due to a lack of diagnostic tests.

Gates Open Research 2020, 4:139. Last updated: 19 OCT 2020


Introduction
Sepsis is defined as life-threatening organ dysfunction caused by a dysregulated host response to infection. Each year, over 11 million people die of sepsis worldwide, with young infants accounting for nearly a quarter of these deaths [1][2][3]. By far, the greatest burden of sepsis mortality in infants occurs in low- and middle-income countries (LMICs). Concerted global efforts over three decades have resulted in a 50% reduction in sepsis-related mortality in children under 5 years of age. However, health gains in young infants lag behind those in older children 4. Furthermore, precise epidemiological data on the incidence of, and risk factors for, neonatal sepsis are still lacking, especially in LMICs, where the vast majority of cases occur 1,2,5,6.
Effective public health measures and treatment guidelines rely on robust evidence and on case identification using clear diagnostic criteria, both of which are lacking in young infants 7. This makes the management of sepsis in this age group challenging. Early clinical signs are subtle and non-specific and often overlap with other conditions, making sepsis difficult to recognize on clinical criteria alone. The difficulty is compounded by the need for prompt antibiotic treatment to ensure survival from bacterial sepsis in infants. Currently, blood cultures are the gold standard for the diagnosis of bacterial sepsis. However, the sensitivity and specificity of blood cultures are low in young infants. Blood culture contamination can also be frequent in low-resource settings because of a lack of resources and disinfection policies 8. As such, a positive blood culture does not necessarily indicate a true infection, because samples may be contaminated during blood collection when aseptic procedures are not strictly followed. Furthermore, differentiating contamination from true infection can be problematic, particularly because bacteremia involving organisms that are usually skin commensals is common in newborns 9. In addition to limiting the epidemiological understanding of the problem, these challenges greatly complicate the management of infants with sepsis in LMICs 10 and seriously limit the development of effective prevention and treatment guidelines for antibiotic use.
Ribonucleic acid (RNA) profiling of whole blood using deep sequencing captures the host immune response to an infection. This approach provides useful information about the etiology and, potentially, the severity of sepsis in humans. Studies in the United Kingdom 11,12, Spain 13 and the United States of America 14,15 have shown that host transcriptome signatures can discriminate between bacterial and viral causes of sepsis, as well as other syndromes, in infants under 3 months of age. To date, very few studies have used RNA sequencing (RNA-Seq) for case identification of bacterial sepsis in neonates or to understand the etiology of sepsis in LMICs, which represent over 90% of the global disease burden. We therefore hypothesize that RNA-Seq of host immune responses in whole blood will inform the "true" prevalence and epidemiology of bacterial sepsis in infants in an LMIC setting. In addition, we hypothesize that metagenomic next-generation sequencing (mNGS) may be useful in this study population for detecting etiologic pathogens and genes conferring antimicrobial resistance (AMR), further augmenting our epidemiological knowledge [16][17][18][19].
To help fill these knowledge gaps, we developed a prospective study protocol that aims to:
1) Determine the prevalence of bacterial sepsis in infants under three months of age evaluated for suspected sepsis at a regional hospital in Lilongwe, Malawi, Africa.
2) Establish whether molecular RNA signatures in blood can, in this setting, more accurately identify young infants with bacterial causes among those in whom sepsis is suspected.
3) Provide proof of concept that mNGS can be used to detect pathogens and AMR genes in infants with bacterial sepsis.

Study protocol
Study design
This will be a prospective, longitudinal cohort study.

Setting
Kamuzu Central Hospital (KCH), in Lilongwe, is the largest referral hospital for the central region of Malawi (a country of ~18 million people) and delivers adult and pediatric clinical care for ~5 million inhabitants. At KCH, infants under 2 weeks of age are admitted to a dedicated Neonatal Unit from the maternity ward, home, or another hospital facility. Infants between 2 weeks and 3 months of age are admitted from home or another hospital to the Special Care Nursery in the main pediatric ward, or to a High-Dependency Unit if they require oxygen therapy.

Participants
Infants younger than 3 months of age, of any sex or ethnic group, with suspected sepsis who present to the Neonatal Unit, pediatric Special Care Nursery or High-Dependency Unit at KCH are eligible for inclusion if they have received antibiotics for less than 4 hours prior to enrollment. Infants younger than 3 months in whom sepsis is sufficiently unlikely that antibiotics will not be administered, who have not received antibiotics within 72 hours prior to enrollment, and who require blood sampling for clinical indications will be included as controls. Written informed consent (see Consent Form in Extended data 20) will be obtained from parents/legal guardians in English or Chichewa (the local language).

Study procedures
Active recruitment will begin in June 2018, after a 2-week period of study training with the Malawi team. Infants will be screened daily for eligibility at the time of initial presentation for suspected sepsis. Eligibility will be determined on-site by a trained study nurse or clinical officer, who will also obtain consent in English or Chichewa, collect the history of the presenting illness and perform a physical examination (with vital signs), following a pre-specified Data Collection Form (see Extended data 20). Consent will take place in a location as private as possible, near the bedside or in a separate room. Given the typical congestion of the hospital environment, complete privacy will not always be possible. Research staff will be trained in International Conference on Harmonization Good Clinical Practice and will follow the highest possible standards of privacy and confidentiality.
A log of all infants approached for consent will be maintained. Additionally, unit census data from all admissions will be reviewed at the end of the study to determine the overall number of eligible infants. For participants, a separate paper log of participant IDs and names will be maintained.
Once consent has been granted, additional verbal permission will be obtained from parents to record a short video (~30 seconds) of the infant, using a High Definition iPad camera. Vital signs (heart rate, respiratory rate, oxygen saturation, temperature and blood pressure) will be recorded by a study nurse or clinical officer on admission, and daily thereafter by a dedicated vital sign assistant. Heart rate, respiratory rate and oxygen saturation will be captured using a custom application developed by the Digital Health Innovation Lab at the BC Children's Hospital Research Institute and Centre for International Child Health (DD, GD, JMA) that employs a saturation probe connected to an Android device 21 . Blood pressure will be measured via automated monitors (General Electric Dinamap Pro 300V2 monitor) using a standard blood pressure protocol (see Extended data 20 ). Axillary body temperatures will be obtained using electronic thermometers (Welch Allyn SureTemp, model 692) by trained staff. Blood pressure monitors and electronic thermometers were provided by the study investigators at the beginning of the study.
A complete blood count with differential, blood glucose and blood culture will be collected, and a lumbar puncture performed (as determined by the medical team), at the time of initial assessment, as per the standard of care for infants with suspected sepsis in Malawi 22. A blood sample (0.5 mL) for RNA studies will also be collected in RNAlater™ (Invitrogen) during the same blood draw, to minimize infants' discomfort.
An additional 2 mL of blood, a rectal swab from the infant, and a vaginal swab from the mother will be collected in 100 infants weighing more than 2.5 kg (for safety reasons), for mNGS analyses.
Prior to initiating the study, clinical staff will be trained on a specific blood culture sampling protocol (see Extended data 20 ) designed and implemented in conjunction with the clinical team at KCH to minimize blood culture contamination and provide at least 2 mL of blood for pathogen detection.
Blood will be inoculated into BACTEC PEDS Plus bottles and incubated in a BD BACTEC 9050 instrument. Cerebrospinal fluid samples will be plated onto sheep blood and chocolate agar media and into a thioglycollate broth tube, and incubated for five days. Gram stains (Fisher Healthcare protocol Gram stain set with stabilized iodine) will be performed according to the manufacturer's instructions.
Blood culture bottles flagged by the instrument as possibly positive will be further analyzed by Gram stain. Samples will be plated onto appropriate media based on the organism morphology seen on the Gram stain. Organisms growing in cerebrospinal fluid cultures will be identified using biochemical tests, bioMerieux API and BD Crystal kits. Antimicrobial susceptibility testing will be performed by disk diffusion (BD BBL Susceptibility Disks) and/or MIC (bioMerieux E-Test), in accordance with the Clinical and Laboratory Standards Institute (CLSI) M100 Performance Standards for Antimicrobial Susceptibility Testing and the manufacturers' instructions. If no growth is detected after five days, the culture result will be finalized.
Complete blood counts with cell differential will be performed on EDTA whole blood samples using a Beckman Coulter AcT5 Diff analyzer. Samples will be tested within 24 hours of collection. Samples that are clotted or show 3-4+ hemolysis will be rejected.
The processing of blood counts, blood cultures and cerebrospinal fluid (as indicated), as well as the storage (-80°C) of the RNA-protected research whole blood samples, will be done on-site by the University of North Carolina (UNC) Project-Malawi laboratory. At the end of the study, research blood samples and all bacterial isolates will be shipped in a single batch, on dry ice, to Vancouver, Canada.
The standard treatment for sepsis at KCH is intravenous (IV) benzylpenicillin 50,000 International Units (IU) per kg of body weight, given twice a day to neonates younger than 7 days and 4 times a day to infants between 7 days and 3 months of age. In addition, IV gentamicin is given once a day at 3 mg per kg of body weight to low birth weight infants, 5 mg per kg to appropriately grown term infants, and 7.5 mg per kg to infants between 7 days and 3 months of age. The total duration of antibiotic treatment depends on the clinical course. If the infant tolerates oral feeds, is afebrile and is otherwise clinically well, oral antibiotics are given after 3 days of IV treatment, using either amoxicillin 125 mg every 8 hours or erythromycin 125 mg every 6 hours, for an additional 5 days. In cases of atypical pneumonia, azithromycin 10 mg once a day for 3 days is used. For suspected meningitis, the dose of penicillin is increased to 100,000 IU, with the same dosing interval. IV ceftriaxone 100 mg per kg once a day is used as second-line treatment if there is no response to first-line treatment, or in cases of suspected meningitis. During hospitalization, clinical interventions, including antibiotic treatment, will be provided by the medical team and will follow the standard of care at KCH 22.
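The weight- and age-based rules above can be expressed as a short function, which makes the branching explicit. This is an illustrative Python sketch of the regimen as written, not a clinical dosing tool. The function and field names are hypothetical, and three points the text leaves open are labeled as assumptions in the comments: the low birth weight cut-off (2.5 kg), that the 3 and 5 mg/kg gentamicin doses apply before day 7, and that the 100,000 IU meningitis dose is per kg.

```python
from dataclasses import dataclass

# Illustrative encoding of the first-line regimen described in the protocol text.
# Hypothetical helper names; not a validated clinical dosing tool.

LBW_THRESHOLD_KG = 2.5  # assumption: low birth weight = birth weight below 2.5 kg


@dataclass
class Dose:
    drug: str
    amount: float        # amount per administration
    unit: str
    times_per_day: int


def first_line_doses(weight_kg: float, age_days: int, birth_weight_kg: float,
                     suspected_meningitis: bool = False) -> list:
    """Return IV benzylpenicillin and gentamicin doses for an infant under 3 months."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if not 0 <= age_days < 90:  # assumption: 3 months approximated as 90 days
        raise ValueError("regimen is described for infants younger than 3 months")

    # Benzylpenicillin: 50,000 IU/kg; assumption: the meningitis dose of
    # 100,000 IU is also per kg. Twice daily before day 7, 4 times daily after.
    iu_per_kg = 100_000 if suspected_meningitis else 50_000
    penicillin_freq = 2 if age_days < 7 else 4
    doses = [Dose("benzylpenicillin", iu_per_kg * weight_kg, "IU", penicillin_freq)]

    # Gentamicin once daily. Assumption: the 3 and 5 mg/kg doses apply before
    # day 7, split by birth weight; 7.5 mg/kg applies from day 7 to 3 months.
    if age_days >= 7:
        gent_mg_per_kg = 7.5
    elif birth_weight_kg < LBW_THRESHOLD_KG:
        gent_mg_per_kg = 3.0
    else:
        gent_mg_per_kg = 5.0
    doses.append(Dose("gentamicin", gent_mg_per_kg * weight_kg, "mg", 1))
    return doses
```

For example, `first_line_doses(3.0, 3, 3.0)` returns 150,000 IU of benzylpenicillin twice daily and 15 mg of gentamicin once daily for a 3-day-old, 3.0 kg term infant. Oral step-down and second-line (ceftriaxone) therapy are deliberately left out of the sketch.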

Ethical considerations
The study received approval on October 3, 2017 from the National Health Sciences Research Committee at the Ministry of Health in Lilongwe, Malawi (study #17/8/1819; title: "Improving the Early Diagnosis of Neonatal Sepsis"; amended October 4, 2019 to include mNGS in a subgroup of infants), and on October 18, 2017 from the UBC Children's & Women's Research Ethics Board (certificate #H16-02639; Vancouver, Canada).
The study will provide a standardized blood culture and complete blood count with differential to all participants, as these tests are often not available clinically due to laboratory resource limitations. Cerebrospinal fluid and urine cultures will also be provided whenever the clinical team determines these tests are indicated as per the standards of clinical care in Malawi.

Data collection
Data will be prospectively collected at the time of presentation and during hospitalization until final disposition, as detailed in the Data Collection Form (see Extended data 20 ), using standardized paper and electronic forms. Variables are designed to be largely self-explanatory with no attempt made at prespecifying definitions. However, all history and physical exam data will be captured by clinical staff members trained to the study protocol during the 2-week run-in period, under the supervision of a single clinical officer (BT). Gestational maturity will be estimated visually using a Ballard assessment when the information cannot be provided from the caregiver or the chart.
De-identified data will be entered into a password-protected REDCap database using a password-protected iPad device 23. Electronic REDCap data (including videos) will be uploaded weekly via a dedicated Wi-Fi network to a secure database hosted at the BC Children's Hospital Research Institute (Vancouver, Canada). No information that discloses the identity of participants will be recorded on the mobile study devices during data collection. No personal information will be published. A list of study personnel and their delegated tasks will also be maintained by the study coordinator in Vancouver.
During the study, paper-based data collection forms, including consent forms, will be stored at KCH in a secured place, under the responsibility of the site PI (MC). Access to the data will be limited to co-investigators and study members directly involved in the study via secured access to the main server in Vancouver, Canada. At the end of the study, records will be reviewed and the data verified by at least two study investigators for accuracy and completeness. All study-related documents will be kept for at least 5 years according to policies from the University of British Columbia. De-identified data will be made publicly available, following approval by the Vancouver and Malawi research ethics boards.

Partnerships
Study staff will be hired from the pool of clinical staff dedicated to the neonatal unit at KCH, through a partnership with the Pediatric and Child Health Initiative (PACHI). The metagenomic sequencing sub-study (pathogen and AMR gene detection) will be conducted in partnership with UNC Project-Malawi and the University of North Carolina (UNC). mNGS data will be analyzed through the Chan Zuckerberg Initiative.

Study size
Precise a priori power calculations are difficult due to the absence of transcriptome data in similar LMIC cohorts. However, based on communications with local study investigators, we expect that ~20% of the infants in the study will have a positive blood culture. Therefore, we estimate that enrolling 300 infants will yield about 60 bacterial sepsis cases. This will provide 80% power to detect differences in expression for ~40 gene markers, considering previous studies 24,25 , using a 5% false-discovery rate method of adjustment for multiple comparisons. An additional 100 infants will be enrolled for the mNGS objective. As this is exploratory, no formal sample size calculation was performed for the mNGS sub-study.
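The case-yield arithmetic behind the enrollment target can be checked directly. The Python sketch below assumes a normal approximation to the binomial to show the plausible spread around the expected 60 cases, and adds a hypothetical 15% positivity scenario; neither the spread nor the scenario is part of the protocol's calculation. The gene-level power estimate itself is not reproduced, because it depends on effect sizes from the cited studies.

```python
import math


def expected_cases(n_enrolled: int, positivity: float, z: float = 1.96):
    """Expected number of culture-positive infants and an approximate 95% range.

    Uses the normal approximation to a binomial count (an assumption; the
    protocol only states the point estimate).
    """
    if not 0 < positivity < 1:
        raise ValueError("positivity must be between 0 and 1")
    mean = n_enrolled * positivity
    sd = math.sqrt(n_enrolled * positivity * (1 - positivity))
    return mean, (mean - z * sd, mean + z * sd)


def enrollment_needed(target_cases: int, positivity: float) -> int:
    """Smallest enrollment whose expected case count reaches the target."""
    return math.ceil(target_cases / positivity)


# Protocol assumption: ~20% blood-culture positivity among 300 infants.
mean, (low, high) = expected_cases(300, 0.20)
print(f"expected cases: {mean:.0f} (approx. {low:.0f}-{high:.0f})")

# Hypothetical lower positivity, to show how sensitive the yield is.
print(f"at 15% positivity: {expected_cases(300, 0.15)[0]:.0f} cases; "
      f"{enrollment_needed(60, 0.15)} infants needed for 60 cases")
```

The sketch makes explicit that the yield of roughly 60 cases is only as reliable as the assumed 20% positivity.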

Data analysis
Data will be coded to facilitate analysis. The cohort will initially be analyzed descriptively, listing baseline demographic and clinical variables with mean ± standard deviation, median with interquartile range, and proportions (with 95% confidence intervals) depending on the data distribution. Bacterial species and antimicrobial resistance patterns for positive blood cultures will also be reported.
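A minimal sketch of these descriptive summaries using Python's standard library follows. The protocol does not name a confidence interval method for proportions; the Wilson score interval used here is an assumption, chosen because it behaves better than the simple Wald interval when proportions are small. The quartile method (`method="inclusive"`) is likewise an assumption.

```python
import math
import statistics


def summarize_continuous(values):
    """Mean and SD, and median with interquartile range, for a continuous variable."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": median,
        "iqr": (q1, q3),
    }


def wilson_ci(successes: int, n: int, z: float = 1.96):
    """Proportion with a Wilson score 95% confidence interval."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, (max(0.0, centre - half_width), min(1.0, centre + half_width))
```

For instance, `summarize_continuous([2.1, 2.8, 3.0, 3.2, 3.6])` summarizes a set of birth weights and `wilson_ci(12, 60)` gives a proportion of 0.2 with its interval; these inputs are hypothetical, not study data.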

Definitions.
The following definitions will be used to classify sepsis cases in the study:
• Culture-proven bacterial sepsis: Infants with a positive blood or cerebrospinal fluid culture for a known bacterial pathogen who present with at least one of the following clinical signs: ill appearance (based on physician assessment), poor feeding (according to the parent/caregiver), severe recessions when breathing, convulsions, abdominal distension or lethargy. Infants who meet these criteria and are severely ill (based on physician assessment) are classified as having severe sepsis.
• Clinical sepsis: Infants who meet the above-listed clinical criteria in the absence of a positive blood culture.
• Contaminants: Infants who have a positive blood culture but who are feeding well (according to the parent/caregiver), appear clinically well (by physician assessment), and do not show severe recessions when breathing, lethargy, convulsions or abdominal distension.
• Non-sepsis controls: Infants who evolve clinically well without having received antibiotics.
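To apply these case definitions consistently across the dataset, they can be encoded as a classification function. The sketch below is a hypothetical Python encoding: the field names are invented, whether an organism counts as a "known bacterial pathogen" is assumed to be decided upstream, severe sepsis is treated as a subset of culture-proven sepsis, and the order in which the rules are checked is an assumption, because the definitions as written do not resolve every overlap.

```python
from dataclasses import dataclass

# Clinical signs listed in the case definitions (hypothetical field names).
CLINICAL_SIGNS = frozenset({
    "ill_looking", "poor_feeding", "severe_recessions",
    "convulsions", "abdominal_distension", "lethargy",
})


@dataclass
class Infant:
    signs: frozenset = frozenset()
    blood_culture_positive: bool = False
    csf_culture_positive: bool = False
    known_pathogen: bool = False          # assumption: set by laboratory/clinical review
    severely_ill: bool = False
    received_antibiotics: bool = True
    clinically_well_at_outcome: bool = False


def classify(infant: Infant) -> str:
    """Assign one study group; the order of the checks is an assumption."""
    unknown = infant.signs - CLINICAL_SIGNS
    if unknown:
        raise ValueError(f"unrecognised signs: {sorted(unknown)}")
    has_sign = bool(infant.signs)
    culture_positive = infant.blood_culture_positive or infant.csf_culture_positive

    if culture_positive and infant.known_pathogen and has_sign:
        return "severe sepsis" if infant.severely_ill else "culture-proven bacterial sepsis"
    if infant.blood_culture_positive and not has_sign:
        return "contaminant"
    if (not infant.received_antibiotics and infant.clinically_well_at_outcome
            and not culture_positive):
        return "non-sepsis control"
    if has_sign and not infant.blood_culture_positive:
        return "clinical sepsis"
    # e.g. a symptomatic infant whose blood culture grew a non-pathogen
    return "unclassified"
```

Returning "unclassified" rather than forcing a group makes the cases that the written definitions do not cover visible for adjudication.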
Differences in baseline demographics (gestational age, birth weight, age at presentation, etc.) between the aforementioned groups (sepsis, severe sepsis, clinical sepsis, contaminants and non-sepsis) will be compared. Significant independent associations of candidate risk factors with culture-proven bacterial and/or clinical sepsis (versus controls), or with mortality, will be determined using multivariable models adjusting for gestational age or birth weight, sex and age at presentation, plus other significant covariates.

Matching.
To identify a gene signature of bacterial sepsis, RNA-Seq will first be run on a subset of infants with culture-proven bacterial sepsis and matched controls. Matching will be performed using a semi-parametric propensity score algorithm 26, by identifying the main confounders of the outcome of sepsis. Propensity scores will be estimated from a generalized linear model (GLM), and a nearest-neighbor propensity matching algorithm with replacement will be used to generate a 1:1 match between controls and sepsis cases. The sensitivity of the matches will be assessed using a varying number of potential confounders from the following: sex, gestational age, age and birth weight.
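The matching step can be sketched as follows, assuming NumPy is available. A logistic GLM for case status is fitted by Newton-Raphson (iteratively reweighted least squares) to obtain propensity scores, and each sepsis case is then paired with the control whose score is closest, allowing a control to be reused. This is an illustrative re-implementation of standard nearest-neighbor propensity matching, not necessarily the cited semi-parametric algorithm; the function names are hypothetical.

```python
import numpy as np


def propensity_scores(X: np.ndarray, is_case: np.ndarray, n_iter: int = 50) -> np.ndarray:
    """Fit a logistic GLM (case vs control) by Newton-Raphson and return P(case | X)."""
    X1 = np.column_stack([np.ones(len(X)), X])       # add an intercept column
    y = is_case.astype(float)
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        w = p * (1.0 - p)                              # IRLS weights
        hessian = X1.T @ (X1 * w[:, None])
        gradient = X1.T @ (y - p)
        step = np.linalg.solve(hessian, gradient)
        beta += step
        if np.max(np.abs(step)) < 1e-10:               # converged
            break
    return 1.0 / (1.0 + np.exp(-X1 @ beta))


def nearest_neighbour_match(scores: np.ndarray, is_case: np.ndarray) -> list:
    """1:1 nearest-neighbour matching on the propensity score, with replacement.

    Each case is paired with the control whose score is closest; a control
    may be reused for several cases.
    """
    case_idx = np.flatnonzero(is_case)
    control_idx = np.flatnonzero(~is_case)
    pairs = []
    for i in case_idx:
        j = control_idx[np.argmin(np.abs(scores[control_idx] - scores[i]))]
        pairs.append((int(i), int(j)))
    return pairs
```

Refitting with different covariate columns in `X` (for example sex and gestational age only, then adding age and birth weight) corresponds to the sensitivity analysis described above. The plain Newton fit assumes the data are not perfectly separable; a production analysis would normally rely on an established matching package.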
For RNA-Seq, total RNA will be extracted from whole blood using the RiboPure RNA Purification kit. Quantification and quality assessment of total RNA will be performed on an Agilent 2100 Bioanalyzer. Samples with a sufficiently high RNA Integrity Number will be considered for sequencing. Poly-adenylated RNA will be captured using the NEBNext Poly(A) mRNA Magnetic Isolation Module. Strand-specific cDNA libraries will be generated from polyadenylated RNA using the KAPA Stranded RNA-Seq Library Preparation kit and sequenced on a HiSeq 2500 (Illumina; San Diego, CA). Sequence quality will be assessed using FastQC and MultiQC 1.8.1. The FASTQ sequence reads will be aligned to the human genome (Ensembl GRCh38.98) using STAR v2.7 and mapped to Ensembl GRCh38 transcripts.
Read counts will be generated using htseq-count (HTSeq 0.11.2-1). Data processing and subsequent differential gene expression analysis will be performed using the latest versions of R and DESeq2 27. Genes with very low counts (fewer than 10 counts in the smallest number of biological replicates within each group) and globin transcripts will be filtered out prior to analysis.
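The low-count filter can be illustrated with a short Python sketch. It reads the rule as "keep a gene only if at least as many samples as there are in the smallest group have 10 or more counts", which is similar to the pre-filtering example in the DESeq2 vignette. This interpretation, the function name, and the example list of globin gene symbols are assumptions, since the protocol does not enumerate the globin transcripts to be removed.

```python
from collections import Counter

# Example globin gene symbols to exclude (assumption: not listed in the protocol).
GLOBIN_GENES = frozenset({"HBA1", "HBA2", "HBB", "HBD", "HBG1", "HBG2",
                          "HBM", "HBQ1", "HBZ", "HBE1"})


def filter_low_counts(counts: dict, groups: list, min_count: int = 10,
                      exclude=GLOBIN_GENES) -> dict:
    """Keep genes with >= min_count reads in at least as many samples as the smallest group.

    counts: gene symbol -> per-sample read counts (same sample order as `groups`).
    groups: group label per sample, e.g. "sepsis" or "control".
    """
    smallest_group = min(Counter(groups).values())
    kept = {}
    for gene, row in counts.items():
        if len(row) != len(groups):
            raise ValueError(f"{gene}: expected {len(groups)} counts, got {len(row)}")
        if gene in exclude:              # drop globin transcripts outright
            continue
        if sum(c >= min_count for c in row) >= smallest_group:
            kept[gene] = row
    return kept
```

With two sepsis and two control samples, a gene needs at least two samples with 10 or more reads to be retained.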
Classifiers.
We will derive a set of gene classifiers from the RNA-Seq data obtained from matched culture-proven bacterial sepsis and control cases. These classifiers will first be derived from differentially expressed genes, using the Wald test to identify the top 100 differentially expressed genes between groups. Differentially expressed genes will be compared with the published literature (Table 1) to define a final list of curated markers. Additionally, we will apply machine learning approaches to identify potential biomarkers specific to neonatal sepsis from the blood transcriptome. Models from different machine learning approaches 28,29 will be compared on accuracy, precision and recall. These classifiers will then be applied to culture-negative clinical sepsis cases. Given that sepsis outcomes are strongly linked to infants' sex 30, postnatal age 15,27 and other factors such as breastfeeding, we will also explore how gene signatures are influenced by these variables, and how sex-related transcript profiles may alter disease severity.
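The ranking step can be sketched in Python, assuming per-gene log2 fold changes and standard errors have already been estimated (for example, by DESeq2). The Wald statistic is the fold change divided by its standard error, referred to a standard normal distribution, and p-values are adjusted with the Benjamini-Hochberg procedure, which DESeq2 also applies by default. Combining the top-100 cut-off with the 5% false-discovery rate mentioned under Study size is an assumption about how the two criteria interact; the function names are hypothetical.

```python
import math


def wald_p_value(log2_fold_change: float, std_error: float) -> float:
    """Two-sided p-value for z = LFC / SE under a standard normal reference."""
    z = log2_fold_change / std_error
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values: list) -> list:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):            # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def top_genes(results: dict, n: int = 100, fdr: float = 0.05) -> list:
    """Rank genes by Wald p-value and keep up to n with BH-adjusted p < fdr.

    results: gene -> (log2 fold change, standard error).
    """
    genes = list(results)
    p = [wald_p_value(*results[g]) for g in genes]
    padj = benjamini_hochberg(p)
    ranked = sorted(zip(genes, p, padj), key=lambda t: t[1])
    return [(g, q) for g, _, q in ranked if q < fdr][:n]
```

The resulting short list is what would then be compared against the published markers.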
For mNGS, both DNA and RNA will be extracted using Zymo Quick DNA/RNA kits and sequenced on an Illumina iSeq platform, targeting a depth of 8 million reads for DNA and 4 million reads for RNA. The data will be analyzed using IDseq. To determine whether bacterial AMR genes can be linked to maternal vaginal flora, we will sequence DNA from the maternal vaginal swab and check whether the same bacterial AMR genes are present. In an exploratory analysis, we will assess whether the same bacterial strain can be identified in mother and infant, using approaches similar to StrainSifter 31.
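Two bookkeeping steps implied by this paragraph can be sketched in Python: flagging libraries sequenced below the stated target depths, and listing AMR genes detected in both an infant and the paired maternal swab. The function names and input formats are hypothetical, and gene names in the usage example (such as `blaCTX-M-15`) are illustrative only. A shared gene is not, on its own, evidence of transmission, which is why the strain-level comparison is needed.

```python
# Target sequencing depths stated in the protocol.
TARGET_READS = {"DNA": 8_000_000, "RNA": 4_000_000}


def samples_below_target(samples: list) -> list:
    """Return IDs of libraries sequenced below the target depth.

    samples: (sample_id, nucleic_acid, read_count) tuples, nucleic_acid "DNA" or "RNA".
    """
    low = []
    for sample_id, nucleic_acid, reads in samples:
        if nucleic_acid not in TARGET_READS:
            raise ValueError(f"{sample_id}: unknown nucleic acid type {nucleic_acid!r}")
        if reads < TARGET_READS[nucleic_acid]:
            low.append(sample_id)
    return low


def shared_amr_genes(infant_genes, maternal_genes) -> list:
    """AMR genes detected in both the infant and the paired maternal vaginal swab."""
    return sorted(set(infant_genes) & set(maternal_genes))
```

For example, `shared_amr_genes(["blaCTX-M-15", "tet(A)"], ["tet(A)", "sul1"])` returns `["tet(A)"]`.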

Prediction models.
We will test the ability of clinical variables, and of a limited set of top-discriminating gene markers, to predict in-hospital mortality from bacterial sepsis. Clinical variables will include features extracted from the point-of-care vital sign photoplethysmogram and from the infants' videos. Univariate analyses will first be carried out to determine each variable's association with mortality. Continuous variables will be assessed for model fit using the Hosmer-Lemeshow test 32. Missing data will be imputed using multivariate imputation by chained equations 33. Following univariate analysis, candidate models will be generated using a stepwise selection procedure minimizing Akaike's Information Criterion (AIC), which is considered asymptotically equivalent to cross-validation and bootstrapping 34,35. All models generated in this sequence with AIC values within 10% of the lowest value will be considered reasonable candidates. The final model will be selected on parsimony (the simpler the better), availability of the predictors (with respect to minimal resources and cost), and the attained sensitivity. We will aim for a predictive model with an area under the receiver operating characteristic curve (AUROC) of >0.75-0.8, favoring sensitivity over specificity whenever required. Analyses will be conducted using SAS 9.4 (Cary, NC, USA) and R 3.1.3 (Vienna, Austria; http://www.R-project.org).
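The model-shortlisting rule and the discrimination target can be made concrete in a short Python sketch. It computes AIC as 2k - 2 ln L, keeps models within 10% of the lowest AIC, and estimates the area under the ROC curve as the probability that an infant who died scored higher than one who survived. The stepwise search itself is not shown, and the function names, model names and log-likelihoods in the usage line are hypothetical. For a binary outcome such as mortality, the log-likelihood is at most zero, so AIC is positive and a relative 10% window is well defined.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def shortlist(models: dict, tolerance: float = 0.10) -> list:
    """Models whose AIC is within `tolerance` (10%) of the lowest AIC, best first.

    models: name -> (maximized log-likelihood, number of estimated parameters).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores.values())
    if best <= 0:
        raise ValueError("a relative 10% window assumes positive AIC values")
    keep = [name for name, s in scores.items() if s <= best * (1 + tolerance)]
    return sorted(keep, key=scores.get)


def auc(scores_died: list, scores_survived: list) -> float:
    """AUROC as P(score of a death > score of a survivor); ties count 1/2."""
    wins = 0.0
    for d in scores_died:
        for s in scores_survived:
            wins += 1.0 if d > s else 0.5 if d == s else 0.0
    return wins / (len(scores_died) * len(scores_survived))
```

For example, `shortlist({"vitals": (-100.0, 4), "vitals+genes": (-95.0, 6), "genes": (-130.0, 3)})` keeps `["vitals+genes", "vitals"]` (AIC 202 and 208) and drops `"genes"` (AIC 266); the final choice among the shortlisted models would then rest on parsimony, predictor availability and sensitivity.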

Future data availability
At the end of the study period, de-identified data will be made publicly accessible, following approval from the University of British Columbia Children's and Women's Research Ethics Board and the National Health Sciences Research Committee of Malawi. A demonstration version of the data collection Android app (without upload to REDCap) will be available from the Pediatric Sepsis Data CoLab website: https://dataverse.scholarsportal.info/dataverse/Pedi_SepsisCoLab. RNA-Seq data will be deposited in the National Center for Biotechnology Information Gene Expression Omnibus.

Discussion
This study will provide robust epidemiological data in an area at high risk for sepsis. This will help address an important global health problem that affects the lives of millions of infants around the world. In the long term, these data could help improve the triage, diagnosis and immediate clinical management of young infants with suspected sepsis in both a local and a global context. They could also inform more judicious antibiotic use. As antimicrobial resistance is a major and rising global health concern, identifying truly septic patients may reduce unnecessary empirical antibiotic use in infants with non-bacterial infections. This study will also directly benefit infants at KCH, by enabling access to standard-of-care investigations for suspected sepsis and by providing human resource support for all infants admitted to the neonatal unit, given the current staffing limitations.

Strengths
The prospective nature of this study is a major strength, and the data collection is informed by rigorous literature reviews 41,42. Studies of neonatal sepsis in LMICs have mostly been retrospective, often starting with case selection by a positive blood culture. In addition, a variety of definitions of sepsis have been used, without congruence or consistent biological confirmation 10. In our study, robust microbiological methods, complemented by whole blood RNA-Seq, may help address these gaps and allow a more precise estimation of the incidence of sepsis. RNA-Seq has not been reported in full-term infants with sepsis, and this study will provide these data in an LMIC setting.
As the mNGS component of this study will be carried out locally by UNC Project-Malawi in Lilongwe, the study will build capacity for using this technology in Malawi, for research and eventually for diagnosis.

Limitations
There are some limitations in the study protocol. First, the choice of a single regional hospital for recruitment may not be representative of infants assessed for suspected sepsis, for example, in a rural setting. Second, following an infant's disposition only until discharge will limit an assessment of long-term mortality and morbidity post-discharge. Third, we anticipate a number of challenges for this study: some lead investigators are located in geographically remote time zones, which could make troubleshooting and real-time study monitoring more challenging; access to blood sampling outside normal business hours; ensuring a sufficient supply of study supplies and staff in a resource-limited hospital environment; and inconsistent wi-fi/cell network which could complicate data transfer. These challenges will be specifically considered, discussed and addressed throughout the study duration.

Conclusion
In conclusion, this study protocol aims to address the gap in epidemiological data on the prevalence of sepsis in infants in an LMIC and to contribute to advancing diagnostic precision using RNA-Seq and mNGS. The specific protocols derived for this study are outlined, followed by the planned data analysis. The discussion considers the potential impact of the study as well as its strengths and limitations. Ultimately, the data generated by the study will provide an opportunity to advance knowledge of sepsis in infants, particularly in LMICs, where it has the most substantial impact.

Data availability
Underlying data
No underlying data are associated with this article.

… pathogens, but potentially bacteria that are generally associated with normal commensals on the skin and contamination of cultures).

Extended data
Para 1: The opening statement of "Sepsis is defined as life-threatening organ dysfunction caused by a dysregulated host response to infection" is made without a reference. Given that the issue of sepsis definitions in infants and children is under review by a variety of working groups, it may be important to provide the reference (from adult sepsis groups).
○ Methods -Participants: The authors will be including infants with no indication for antibiotic therapy as control subjects. The protocol states that these infants will be admitted to the study if they need blood sampling for clinical indications. It would be really useful to understand what possible clinical indications there will be for taking blood, and specifically how consent for the study will be taken from the parents of these infants.

○ Study procedures: I would appreciate an explanation of how active recruitment for the study can be started in June 2018, when the protocol is up for review now.
○ Para 4: It is noted that differential counts will be done using a Coulter counter; however, is there any capacity to check very high counts using manual techniques? In patients with high red cell precursor counts, will white cell counts be corrected for this (I have noted that specimens from patients with evidence of extensive hemolysis will be rejected)?
○ Data collection: Given that patients up to 3 months of age will be admitted to the study, how will gestational age be estimated in older infants (I am not concerned that Ballard scores may not be valid after a few days of age)?
○ Definitions: It would be interesting to consider the items included and not included in the development of the definitions. Factors such as apnea (may overlap with the lethargy, but not necessarily), abnormalities of temperature (either hyper-or hypothermia) have not been included.
○ It is clear that the group defined as clinical sepsis could be infected with non-bacterial pathogens. However, the authors have not addressed the group who may have been given antibiotics prior to the collection of blood culture specimens (they would be admitted to the study if antibiotics have been given within 4 hours prior to being consented). How likely are patients to fall into this category, and if there are such patients, how will they be defined? Is there a reason for a 4-hour cutoff, and how likely is antibiotic exposure before culture to reduce culture positivity rates?
○ Clearly, infants with conditions such as hypoglycemia, dehydration with associated electrolyte and acid-base abnormalities, or congenital cardiac problems (probably a very small group of infants) could fall into the clinical sepsis group but may not have sepsis (as defined by bacterial infection).

Julia Johnson
Division of Neonatology, Department of Pediatrics, Johns Hopkins University, Baltimore, MD, USA
This is an overall well-articulated and clear study protocol for a research study seeking to assess whole blood genome-wide transcriptome profiling and metagenomics NGS in infants with suspected sepsis. The authors lay out the rationale for the study and have selected the appropriate study population and design to answer this question. Thank you for the opportunity to review the protocol of this interesting study.
A few comments:
○ The control population could be described in more detail. It seems that only infants with suspected sepsis will be considered for this study, and that the subset of infants who initially present with suspected sepsis but are judged unlikely to have it and are not given antibiotics will serve as controls. This could lead to misclassification of infants, including those who initially appear fairly well but become ill during admission and are ultimately treated with antibiotics and/or are found to have culture-confirmed bacterial sepsis. How would these infants be handled? An alternative approach would be to include infants presenting with other chief complaints, without concern for serious bacterial infection (SBI) or sepsis, as true controls.
○ Would suggest inclusion of urine samples in young infants, as urinary tract infections are a common cause of serious bacterial infection and associated sepsis in this age group. Urine samples should be obtained by catheter specimen to be of utility.

○ The protocol states that rectal swabs will only be obtained in infants weighing more than 2.5 kg for safety reasons, but does not state what the specific safety concerns are. Rectal or peri-rectal swabs have been obtained in smaller infants as part of research studies and clinical care, and with appropriate procedures this should not be an issue. Avoidance in very small, extremely preterm infants may be indicated, but this does not seem to be the likely population for this study. If there is concern, perhaps stool samples could be obtained instead, if available, in infants below a weight or age cutoff.
○ It may be beneficial to describe what is known about neonatal/infant sepsis at the study site based on microbiology data, which would help the reader judge whether the described microbiologic procedures (and standard antibiotic therapy) are likely to capture the most common causes of sepsis in this population. This may not be necessary for the protocol but should be included in a future manuscript.

○ This may exceed the scope of the study, but to adequately capture etiologic pathogens of infants with clinical sepsis without identified bacterial pathogen, the authors could consider adding at least a limited investigation for viral pathogens.
○ Would recommend review of this protocol by an additional reviewer with specific expertise in NGS.

○ Comments on supplemental materials: Would recommend specifying the time for betadine/iodine to air dry (as is provided for alcohol), as inadequate drying time is a common lapse in infection prevention and control (IPC) practice during blood culture collection that leads to contamination.
○ Would specify acceptable sites for blood draws.
○ For BP measurements, it currently states to recheck if low in 30 minutes. Depending on how low the BP is, this may be a critical finding and waiting 30 minutes could be life-threatening. Recommend rechecking within 5 minutes or less to assess validity of finding. Also, consider providing BP norms by age/weight to assist providers, unless these are readily known.

○ For the birth history, it states to enter full term as 40 weeks. Full term includes 37-40 weeks. This granularity of data may not be needed, but if descriptive statistics, such as median GA, are performed, entering 40 weeks for all term infants may be misleading. Consider creating a separate preterm (yes/no) question, with a GA question as a follow-up only for preterm infants.

○ For type of delivery, what does "bre" stand for? Breech? Would this mean breech extraction/vaginal delivery or C-section for breech? Would expand most abbreviations to avoid any confusion.

○ Describe what resuscitation at birth means: any intervention at all by the medical team? Only the need for respiratory intervention such as oxygen, PPV or intubation, or the need for compressions? It is unclear, for example, whether the need for suctioning would be considered resuscitation for this form's purpose.

○ Number of days in hospital: consider collection of precise admission date and discharge date to avoid any errors in calculation or inadvertent incorrect interpretation of partial days.
○ Is the rationale for, and objectives of, the study clearly described? Yes
Is the study design appropriate for the research question?