Maré cohort-profile: a prospective cohort study based in a socially vulnerable community during the COVID-19 pandemic in Rio de Janeiro, Brazil [version 1; peer review: 1 approved with reservations]

Background: Socially vulnerable populations were vastly affected by the COVID-19 pandemic. The pandemic significantly impacted Brazil,


Introduction
Since December 2019, the world population, health workers, and governments have been dealing with the challenges imposed by the COVID-19 pandemic. The burden across the globe was not only perceived through high morbidity and mortality rates but also by its effects on economic and social dimensions. Vulnerable populations, defined as groups of individuals susceptible to physical, biological, psychological, and socioeconomic stressors while lacking adequate resources to cope with these stressors 1 , were vastly affected by the COVID-19 pandemic, both in high-income and low-and-middle-income countries (LMICs) 2 .
One of the countries most affected by the pandemic was Brazil, where the health system was under increased demand and pressure for several months, and the mortality rate was high, even among the young population 3 . During the pandemic, initiatives articulated between social organisations and academia were essential to provide health and socioeconomic support to vulnerable populations, such as those living in favelas or "slums" (Brazilian urban communities that share some features with slums in other countries, comprising highly populated areas with different levels of urbanisation and informal settlements, historically neglected by government), usually crowded areas with several unmet basic needs 4 .
Cohort studies associated with disease surveillance are essential for understanding pathogens' circulation, surrogates of protection, disease burden, and demand for health services, especially in poor communities in LMIC. By following individuals over time, cohorts provide granular data for monitoring disease trends 5,6 and the long-term impacts of the pandemic. Prospective studies may fill knowledge gaps regarding the burden of disease and risk factors in a vulnerable population 7 .
Here, we describe the protocol and the baseline demographic profile of a cohort design, which started recruiting during the COVID-19 pandemic in Complexo da Maré, a neighborhood of Rio de Janeiro and one of the largest groups of favelas in Brazil. The Maré Cohort is embedded in several initiatives recently implemented in Maré, such as socioeconomic support, health care, and surveillance, managed by an innovative organisational arrangement involving non-governmental organisations (NGOs), research institutions, private funders, and local government 8-10 .

Objectives
We aim to assess clinical, epidemiological and genomic profiles, outcomes and the impact of the COVID-19

Study Status
We planned five waves at approximately six-month intervals.
The study is planned to finish by the beginning of 2024. Data collection is ongoing; the first and second waves have been completed, and the third was initiated in December 2022.

Setting
This prospective cohort study was designed to be conducted among the residents of Maré in Rio de Janeiro municipality, an area composed of 16 favelas and one of the largest sets of favelas in Brazil. Maré is the city's ninth most populous neighbourhood, with more than 140,000 inhabitants 11 spread over 4.3 km² 12 . Its population is mainly young (51.9% under 30 years old) 13 and almost twice the size of Rio's second largest favela 12 .
Maré is located near the city centre and is surrounded by major highways. Although the region presents informal settlements with substandard housing, most residences have running water, access to a sewage network and public lighting 13 . The chaotic urbanisation of favelas has resulted in a heterogeneous population spread within the limited area, generating a high population density and vulnerable regions. With respect to educational and healthcare facilities, 44 public schools were serving the territory in 2013 13 , and the public health system provides seven primary healthcare units, three psychosocial centres, a mobile outreach clinic, an emergency unit, and a reference public hospital to the area (Figure 1).
The community is among the most vulnerable neighbourhoods in the municipality (Table 1), evidenced by its low Human Development Index (HDI: 0.686), ranked 123rd out of 126 in Rio de Janeiro. The GINI index of 0.39 may suggest a more homogenous and less unequal community 12 . Compared with the municipality estimates, Maré has nearly three times as many illiterate residents and one-third of the monthly per capita income (USD 294). In terms of health indicators, Maré has a higher incidence of tuberculosis (155.7 per 100,000 population), a higher infant mortality rate (16.7 per 100,000 live births), and a higher mortality rate due to external causes (105.8 standardised deaths per 100,000 population) than the municipality (Table 1).

Governance and Management
This cohort study is part of a platform composed of several initiatives developed in the territory to assist residents during the COVID-19 pandemic 8-10 , with diverse actions and partnerships (Figure 2). A steering committee (SC) composed of the principal investigator and one member from each of four institutions was assembled to manage the cohort. The SC has held weekly meetings to plan and decide on support action in the territory. Furthermore, two coordination units report directly to the SC, one for field-related actions and the other for data-related activities. A communication team was constituted to perform direct engagement and targeted communication activities, composed of different professionals: coordinator, designer, illustrator, social media manager and content producer, web designer, photographer, video maker, and image editor. The field team consists of 17 community mobilisers, seven of them with technical training in nursing, and 14 community health workers (CHWs) responsible for direct contact with the cohort participants, interviewing, laboratory testing and clarifying questions about the project, and reporting directly to the field coordination. The CHWs were recruited specifically for the project on a part-time basis. First, five candidates were selected by the manager of each primary healthcare unit. Then, after an interview with the field coordinator, two were recruited. Selection was based on technical training in nursing and the longest work experience in the unit.
In addition, the data working group is divided into management and analytics teams, which report to the data coordination team. The former is responsible for curating and integrating the data, and the latter consists of epidemiologists, data scientists and vaccinologists performing research and data analysis.
The corresponding author is the study principal investigator (PI) and part of the steering committee, along with three co-authors. Two authors are part of the field coordination team, and eight compose the data working group.
The laboratory testing structure has two major components to support the surveillance actions in Maré. "Dados do Bem", the Brazilian symptom tracking app 14 , was designed by the PI of this project with a team of developers (Zoox Smart Data) to detect symptomatic cases, track contacts, and identify areas at increased risk of SARS-CoV-2 infection. The app was free of charge and available in iOS and Android systems. The app manages the user registration and SARS-CoV-2 testing process,

Population selection and participants
The cohort comprises participants who reside in Maré. To be included, applicants had to be permanent residents of Maré and complete an Informed Consent (IC) form before enrolment. We excluded residents with incomplete identification data, such as invalid unique identifier numbers or incomplete addresses and telephone numbers.
We calculated the required number of responders using the formula for prevalence studies 15 . The appropriate sample size required for different prevalence and precision levels is shown in Figure 3. We estimated that an overall minimum sample size of approximately 2,400 participants, with an alpha of 0.05 and precision of 2%, is sufficient to estimate prevalence with the intended precision.
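As a rough check of this figure, the standard single-proportion formula n = z²·p(1−p)/d² can be evaluated directly. The sketch below is illustrative only: it assumes simple random sampling and a worst-case expected prevalence of 50% (an assumption made here, since Figure 3 covers a range of prevalences), and the helper name `prevalence_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def prevalence_sample_size(p: float, d: float, alpha: float = 0.05) -> int:
    """Minimum n to estimate a prevalence p within +/- d (simple random sampling)."""
    # Two-sided critical value of the standard normal (about 1.96 for alpha = 0.05).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Round up, since a fractional participant cannot be recruited.
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


# Assumed worst case p = 0.5 with the stated precision of 2% and alpha of 0.05.
print(prevalence_sample_size(p=0.5, d=0.02))  # 2401
```

Under these assumptions the result is 2,401, in line with the approximately 2,400 participants stated above. Lower or higher expected prevalences give smaller minimum sizes.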

Participant recruitment and consent
The initial round of recruitment occurred during the large-scale COVID-19 vaccination campaign from July 29th, 2021 to August 1st, 2021, when approximately 36,000 first doses of ChAdOx1 were administered 16 . Since then, study participants have been monitored via household follow-up.
The enrolment strategy has three stages. In the first stage, a trained interviewer invites participants, explains the research purpose and testing procedures, clarifies questions, and obtains informed consent. For participants who are illiterate, the researcher reads the consent form, supervised by a family member of the participant, and a thumbprint is taken.
In the second stage, the participant answers a questionnaire filled out by a trained mobiliser. In the third stage, they register in the app (for illiterate participants, the mobiliser completes the registration) and take a serological test, for which a specialised technician collects the material (dried blood spot). For children, parental consent is obtained, and the guardian answers a simplified and adapted questionnaire. Participants can withdraw from the study at any time, even after providing written informed consent. This enrolment strategy will be used in all waves of data collection.
The Maré cohort is based on household units. Following the enrolment of an index adult, the mobilisers and CHWs conducted home visits to map and recruit new family members, following the same steps executed during the recruitment of the first adult (i.e., applying an informed consent form to the new family member and collecting information as a new entry).

Representativeness
Participants were pragmatically included in the cohort since recruitment occurred during the pandemic. Our efforts at mitigating selection bias and achieving representativeness combined social, spatial, and demographic characteristics, recruiting participants at different times and days of the week and spatial locations within Maré. We plan to adjust cohort estimates using sampling weights derived from characteristics reported in the 2013 Maré survey, where we have a complete demographic cross-sectional evaluation of the entire community 13 .
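One common way to derive such weights is post-stratification, where each participant is weighted by the ratio of the reference-population share to the sample share of their stratum. The sketch below is a minimal illustration under that assumption; the protocol does not specify the weighting method, the function name is hypothetical, and the strata and shares are made-up numbers, not values from the cohort or the 2013 Maré survey.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """Return one weight per participant: population share / sample share of the stratum."""
    counts = Counter(sample_strata)
    n = len(sample_strata)
    weights = []
    for stratum in sample_strata:
        sample_share = counts[stratum] / n
        weights.append(population_shares[stratum] / sample_share)
    return weights


# Made-up illustrative numbers, NOT values from the cohort or the Maré survey.
sample = ["<18", "18-39", "18-39", "40+"]
reference = {"<18": 0.30, "18-39": 0.40, "40+": 0.30}
weights = poststratification_weights(sample, reference)
print(weights)
```

Strata that are under-represented in the sample receive weights above 1 and over-represented strata receive weights below 1, so the weights sum to the sample size.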

Follow-up
This cohort study will be divided into waves, each corresponding to different stages of analyses and data collection (Figure 4). The study was initially planned to hold five waves at approximately six-month intervals and is expected to end at the beginning of 2024. The first wave occurred from August 2021 to January 2022, and we evaluated COVID-19 infection and the socio-demographic characteristics of the target population. The second wave ran from February to July 2022, and the analyses focused on socioeconomic factors and the response to the COVID-19 vaccine.
The first two waves have been completed at the time of writing this study protocol, and the third is underway. Following an enrolment and data collection strategy similar to that of the first and second waves, the third wave focuses on understanding the residents' perspectives on vaccination, primary care services, and socioeconomic aspects during the pandemic. Future (fourth and fifth) waves are planned to address the conduct of COVID-19 surveillance and other health problems prevalent in vulnerable communities in low-and-middle-income countries.
These new waves will follow the same strategy of recruitment, data collection, communication, and engagement used in waves one to three. The questionnaire will be adapted to incorporate the dimensions of other diseases and the follow-up questions from previous waves, using the same teams and structure already in place in the territory.

Data Collection
The protocol includes interviews, blood sampling and record linkage to secondary data associated with cohort participants.

Data access
Once the data have been collected, access to individual data will require a formal request to the authors and the cohort's Steering Committee. This route of access follows the guidelines and practices of the Brazilian General Data Protection Regulation (Lei Geral de Proteção de Dados 13.709/2018) for data integrity and security.

Questionnaires Development
A group of 11 specialists, consisting of clinical researchers, epidemiologists, psychologists, social workers, and vaccinologists, developed the standardised questionnaires to collect data from the cohort participants. They were responsible for creating the protocol, and eight of them are authors of this paper. The questionnaires were validated with local community members, followed by interviewers' training. Questionnaire data were collected directly on a standardised form at each contact via an electronic data capture system (REDCap, Research Electronic Data Capture) 17 . The data were collected using a mobile device (tablet) during the interview, uploaded into REDCap at the end of the day, and then deleted from the device to protect the participants' confidentiality.
Questionnaires applied to adults included variables of sociodemographic characteristics, socioeconomic levels, general health, vaccines (including adverse events), well-being, search for medical assistance, the self-reported status of COVID-19 infection, and risk factors for exposure to COVID-19. Wellbeing questions addressed generalised anxiety and depression using the Patient Health Questionnaire-4 (PHQ-4) 18 , a standardised screening instrument. We also included an item measuring happiness 19 and two items from the Life Orientation Test 20 , indicating overall optimism levels. A specific questionnaire for those under 18 years contains variables of socio-demographic characteristics, education level status, general health and medical assistance.
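For reference, the PHQ-4 is conventionally scored as the sum of four items rated 0 to 3, with the first two items forming an anxiety subscale and the last two a depression subscale. The sketch below follows the scoring conventions published for the instrument (total score bands of 0-2, 3-5, 6-8 and 9-12, and a subscale score of 3 or more as a positive screen). It is an illustration only: how the cohort codes and analyses these items is not described here, and the function name is hypothetical.

```python
def score_phq4(items):
    """Score PHQ-4 responses; items are four integers from 0 to 3 in questionnaire order."""
    if len(items) != 4 or any(i not in (0, 1, 2, 3) for i in items):
        raise ValueError("PHQ-4 needs four item scores between 0 and 3")
    anxiety = items[0] + items[1]      # first two items (anxiety subscale)
    depression = items[2] + items[3]   # last two items (depression subscale)
    total = anxiety + depression
    if total <= 2:
        category = "normal"
    elif total <= 5:
        category = "mild"
    elif total <= 8:
        category = "moderate"
    else:
        category = "severe"
    return {
        "total": total,
        "category": category,
        "anxiety_positive": anxiety >= 3,
        "depression_positive": depression >= 3,
    }


print(score_phq4([2, 2, 0, 1]))
```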
Data from different sources were linked directly using a unique identifier or probabilistic linkage when applicable. After data preparation and integration, participants were de-identified and anonymised for analysis. Data governance and curation activities were described in a Data Protection Assessment document that followed the Brazilian General Data Protection Regulation (Lei Geral de Proteção de Dados) 21 .
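The two-step linkage logic can be sketched as follows: an exact match on the unique identifier is attempted first, and a similarity-based fallback is used only when no identifier match exists. This is a simplified stand-in for probabilistic linkage (it scores name similarity among records sharing a birth date rather than estimating match probabilities), and the field names, threshold and records are hypothetical and fictitious.

```python
from difflib import SequenceMatcher


def link_record(participant, registry, threshold=0.85):
    """Return (registry record, method) or (None, None) if no link is found."""
    # Step 1: deterministic link on the unique identifier.
    for rec in registry:
        if participant["id"] and participant["id"] == rec["id"]:
            return rec, "deterministic"
    # Step 2: fuzzy fallback on name similarity among records with the same birth date.
    best, best_score = None, 0.0
    for rec in registry:
        if rec["birth_date"] != participant["birth_date"]:
            continue
        score = SequenceMatcher(
            None, participant["name"].lower(), rec["name"].lower()
        ).ratio()
        if score > best_score:
            best, best_score = rec, score
    if best is not None and best_score >= threshold:
        return best, "fuzzy"
    return None, None


# Fictitious records for illustration only.
registry = [
    {"id": "111", "name": "Maria da Silva", "birth_date": "1990-01-01"},
    {"id": "222", "name": "Joao Souza", "birth_date": "1985-05-05"},
]
print(link_record({"id": "", "name": "Maria da Silvo", "birth_date": "1990-01-01"}, registry))
```

In practice, the threshold and the blocking variable would need to be tuned and validated against a manually reviewed sample of links.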

Testing Strategy
We collected five blood spots on filter paper with a lancet for finger-prick in each wave of data collection. The primary data analysed were the serological tests for anti-SARS-CoV-2 IgG antibodies. We used the Perkin-Elmer (Rio de Janeiro, RJ, Brazil) kit to identify IgG immunoglobulin class antibodies to SARS-CoV-2, a fully automated, dried-blood-spot-based system able to test a large population efficiently 22,23 .

Vaccination campaign
On January 17th, 2021, the Brazilian government launched a nationwide vaccination campaign against COVID-19 following an age-based priority 24 . The Brazilian National Immunisation Programme (PNI) is responsible for the vaccination schedule and the administration of free doses of the following vaccines: Sinovac-CoronaVac, ChAdOx1-S/nCoV-19 (Oxford-AstraZeneca), BNT162b2 (Pfizer/BioNTech) and Ad26.COV2.S (Janssen). Six months after the vaccination roll-out started in the country (July 2021), a mass vaccination campaign (Vacina Maré) was carried out in Complexo da Maré, which increased the first dose coverage from 38% to 93% in four days. Along with this campaign, the cohort study started the recruitment phase in the territory.
The COVID-19 immunisation data from cohort participants were made available for research and analysis purposes through the PNI Information System (Sistema de Informação do Programa Nacional de Imunização, [SI-PNI]), a secondary data source made available by the city health department. SI-PNI is Brazil's central data archive for vaccine doses administered. It contains information on demographics (age, sex and place of residence), vaccine platform, the number of doses received, date of administration and location.

Secondary data
We plan to perform deterministic record linkage followed by probabilistic record linkage between cohort participants and administrative data collected from the Brazilian health systems, such as COVID-19-related surveillance systems for mild and severe cases.

Preliminary data
Overall, 6,429 residents of Maré answered the questionnaire during the first wave: 5,923 (92%) were adults and 506 (8%) were under 18 (Table 2). The sample meets the predicted minimum size, considering an alpha of 0.05 and a precision of 2%.
A brief description of the baseline characteristics of the cohort, the Maré neighbourhood Census 2019 and the Rio de Janeiro municipality Census 2010 is shown in Table 2. The majority of participants are up to 39 years old. The cohort has a lower proportion of children (newborn to nine years) and a higher proportion of the elderly than Maré. Females are the majority in all three populations; almost half of the cohort and of Maré are self-declared Pardo.
Evaluating anti-SARS-CoV-2 antibody prevalence is part of the next step of this cohort analysis, and at the time of writing this study protocol, we have results from 6,272 (97.6% of 6,429) tests for IgG. A total of 157 (2.4%) individuals not tested or with inconclusive results were excluded at this stage (Figure 5).
Immunisation information was obtained for 5,780 participants. We excluded 92 adults because of inconsistencies in the system registry data, leaving 5,688 with complete immunisation status.
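The counts reported in this section are internally consistent, as the short check below shows. It uses only the figures stated above; the variable names are illustrative.

```python
enrolled = 6429
adults, minors = 5923, 506
tested = 6272
immunisation_records, registry_exclusions = 5780, 92

# Adults and participants under 18 add up to the enrolled total.
assert adults + minors == enrolled

print(round(100 * adults / enrolled))                    # 92
print(round(100 * minors / enrolled))                    # 8
print(round(100 * tested / enrolled, 1))                 # 97.6
print(enrolled - tested)                                 # 157
print(round(100 * (enrolled - tested) / enrolled, 1))    # 2.4
print(immunisation_records - registry_exclusions)        # 5688
```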
Full baseline data will be included in the final dataset.

Outcomes
The primary outcome of this study will be the seroprevalence of SARS-CoV-2 antibodies in the population. Secondary outcomes will include symptomatic and asymptomatic COVID-19 infections and severe cases with hospitalisations and deaths related to COVID-19, and long-term impacts, such as the prevalence of long COVID-19 symptoms and mental health conditions.

Data analysis
We will describe seroprevalence, socio-demographic characteristics, and the burden of the disease for the baseline and each wave using descriptive statistics. Categorical variables will be displayed in absolute and relative frequencies, and continuous variables will be described as means and standard deviation (SD) or medians and interquartile range (p25-p75). When applicable, missing values will be imputed using multiple imputation by chained equations (MICE) 25 .
Direct comparisons between baseline and wave characteristics will be performed using statistical hypothesis testing, such as Pearson's chi-square test, Fisher's exact test, Student's t-test or Wilcoxon-Mann-Whitney test. Correlation between variables will be calculated using Spearman's rho correlation.
We will use regression modelling approaches to estimate the association between socioeconomic characteristics and seroprevalence of SARS-CoV-2 infection. The response variable will be the seroprevalence outcome; therefore, we will apply generalised linear models accounting for repeated measures when necessary. We have considered study and work status, number of healthcare workers, the density of residents in households, use of public transport, and number of vaccinated residents as exposure variables. Numerical variables will be modelled using cubic splines. All statistical analyses are to be performed in R 26 . A significance level of 0.05 will be considered. We will follow STROBE recommendations. Data dictionaries for the first and second waves are available in the extended data.
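The descriptive summaries named above can be sketched in a few lines. The example is written in Python for illustration only (the study analyses are to be performed in R); the function names are hypothetical and the input values are made up.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev


def describe_categorical(values):
    """Absolute and relative (%) frequency of each category."""
    n = len(values)
    return {k: (c, round(100 * c / n, 1)) for k, c in Counter(values).items()}


def describe_continuous(values):
    """Mean (SD) and median with interquartile range (p25-p75)."""
    p25, _, p75 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation
        "median": median(values),
        "p25": p25,
        "p75": p75,
    }


# Made-up values for illustration only.
print(describe_categorical(["female", "female", "male", "female"]))
print(describe_continuous([10, 20, 30, 40, 50]))
```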
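As an illustration of one of these tests, Pearson's chi-square statistic for a 2x2 table can be computed from the observed and expected counts. The sketch below is in Python for illustration only (the study analyses are to be performed in R), applies no continuity correction, uses a hypothetical function name, and is run on made-up counts.

```python
import math


def chi_square_2x2(table):
    """Pearson's chi-square test (no continuity correction) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts for illustration only.
chi2, p_value = chi_square_2x2([[10, 20], [20, 10]])
print(chi2, p_value)
```

When expected cell counts are small, Fisher's exact test is the usual alternative.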

Dissemination of study results
Monthly meetings with local leaders, agents from the NGO Redes Maré and the primary healthcare units' workers are to be held throughout the study to notify and discuss, for example, the main self-reported comorbidities, vaccine status, number of tests and data collection at each phase. The meetings' objectives are to show the scientific results from the cohort to stakeholders and create a space for scientific discussions about how this study can improve healthcare in a vulnerable population. Newsletters addressing the pandemic context and initiatives in the community were regularly distributed on print and digital platforms, targeting the cohort participants, healthcare workers and the Maré population.

Conclusions/discussion
Cohort studies conducted in urban poor communities are essential to understand the spread of pathogens, protective strategies, disease burden and health resource needs in areas where epidemiological information is usually limited.
This study aims to contribute a new source of data on infectious and poverty-related diseases for researchers, policymakers and programme implementers who wish to improve health services for a vulnerable population in an urban area of an LMIC. The results will be used to support the establishment of a public health data repository and communication and mobilisation activities that can promote data-driven decision-making and public health action in other vulnerable communities, especially in LMICs.
Data and results will be limited to the follow-up of participants during wave collections. To mitigate potential dropout, engagement and communication activities will be performed with Maré residents. The communication team will be in charge of dissemination and information on different platforms, with weekly actions in social media, newsletters, posters and cards. In parallel, the field team will be in direct contact with the participants during waves. We also plan to conduct events such as seminars to disseminate the insights from the study.
Moreover, record linkage with secondary data, used to enrich participant and community information, may present inconsistencies since external databases offer different input patterns and data quality. It should be emphasised that participants enrolled in the cohort will be mostly vaccinated, as enrolment started after the vaccination campaign.
The engagement process is challenging in a vulnerable population due to difficulties in maintaining contact with participants, as they frequently change their mobile phone numbers and addresses. This is combined with low adherence to other COVID-19 vaccine doses and the potential refusal to participate in the subsequent waves of data collection.

Underlying data
No underlying data are associated with this article.

Open Peer Review
Excellent methodology and protocol. All important elements of the research are stated correctly. The sample is representative. The statistical methods, which have been used so far, and will continue to be used, are relevant and guarantee correct results of the research. Problematic, to some extent, is that a part of the respondents change and it will be tricky to get as close as possible to the results which would represent the numbers if the sample of respondents would be about the same. However, such a problem is always expected when the survey population and sampling frame are like in this case. Altogether, I find the research going very well and with promising correct expected results.

Vesselin Blagoev
Competing Interests: I am not a participant, or co-author, or linked in whatever way to the sponsors of this research.