A comment piece exploring what factors led to limitations in some of the work we produced during the COVID-19 pandemic, whether this work was good enough, and some potential ideas for improving future work of a similar kind.
The COVID-19 pandemic has seen an unprecedented use of infectious disease research conducted over short time scales as a tool for setting health policy. However, it has been widely recognised that some of this work was flawed. These flaws were often acknowledged by the authors, but in some cases were not. Here we explore some of the underlying reasons for these limitations using examples drawn from our own work. We do not highlight limitations in the work of others, but we expect they would be similar, meaning that decisions made using a wide array of evidence sources may still be vulnerable to these issues. We welcome, and encourage, similar reflections from other groups, contextualised using their own work.
We find that key issues include: lack of technical and statistical knowledge, poor availability of domain-specific tools, limited evaluation of available methods and tools, difficulty in maintaining long-term projects, and lack of support for gradual advance outside periods of crisis. We conclude that researchers should be more open and realistic about the limitations of their work and the reasons for them and that more focus and support should be given to evaluation, incremental improvement, and the development of domain-specific tools if we wish to improve the quality of evidence available in real-time for future outbreaks.
This piece was originally written in August 2021 and shared with colleagues but, as we see it, little progress has been made since on the issues we raise. The recent Monkeypox outbreak and response highlight this lack of progress and the limited lessons learned so far from the COVID-19 pandemic.
In February 2022, SA gave a talk at the LSHTM symposium “How can mathematical and statistical models combine with big data to improve our response to pandemics” (S. C. Whitty and McLean, n.d.) which touched on some of the same issues and for which a recording is available¹. SA, along with Sebastian Funk, has also written a related reflection on our efforts to produce real-time effective reproduction number estimates (Abbott and Funk 2022). Since writing this piece we have also become involved in the Epinowcast community project (“Welcome to the Epinowcast Community,” n.d.), which aims to address many of the issues we raise here. If you are interested in supporting this project, whether through contributions, community participation, or funding, please reach out.
A great deal of the scientific response to the COVID-19 pandemic involved estimating, as quickly as possible, the most pertinent biological and epidemiological properties of the novel respiratory virus SARS-CoV-2. This research often fed into government policy decisions related to responding to the outbreak, and therefore needed to be as accurate as possible. However, there is an inevitable trade-off between speed and accuracy in scientific research, as developing a thorough and nuanced analysis whose accuracy you can be confident in takes time and resources. Scientists are required to judge when their work has reached a level of accuracy that is "good enough" to be shared with policymakers or advisory panels so that a decision can be made. Working longer would often increase the accuracy of the work but may not add further information that would aid decision-making. This attitude is reflected in this quote from Chris Whitty, the UK Government Chief Medical Adviser (C. J. M. Whitty 2015):
An 80% right paper before a policy decision is made is worth ten 95% right papers afterwards, provided the methodological limitations imposed by doing it fast are made clear.
This quote, as well as the attitude it represents, is commonly referenced in much of the academic discourse (Brooks-Pollock et al. 2021) around real-time analyses conducted to inform COVID-19 health policy. Less attention is commonly given to what counts as 80% right, or to the conditional statement that limitations must be clearly stated. Here, using examples from our policy-adjacent COVID-19 work, we explore what it means for work to be 80% good enough, and highlight the issues that prevented it from being better or its limitations from being more clearly stated.
The reproduction number has been a key epidemiological measurement during the COVID-19 pandemic. Estimates of the basic reproduction number from early data emerging from Wuhan, China, indicated the infectiousness, and pandemic potential, of the newly emerged pathogen (Abbott, Hellewell, Munday, et al. 2020). The effective reproduction number has also been extensively used to track the course of the pandemic and evaluate the impact of interventions in many countries. We started estimating the effective reproduction number in real-time in February 2020, with a focus on China, before expanding to include national estimates globally, as well as estimates at smaller scales in multiple countries and from a range of data sources (Abbott, Hellewell, Thompson, et al. 2020a, 2020b). Our estimates for the United Kingdom fed into the Scientific Pandemic Influenza Group on Modelling (SPI-M) aggregated reproduction number estimates and short-term forecasts (Sherratt et al. 2021; Funk et al. 2020), and our site presenting these estimates had over 500,000 unique users, including public health decision-makers (Abbott and Funk 2022). Estimates were produced daily for many different countries across multiple geographic aggregations and surveillance data sources. The open-source tools we developed as part of this project have also been used widely by other research groups and public health practitioners for a range of real-time analyses (Abbott, Hellewell, Sherratt, et al. 2020).
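For context, these estimates, like most real-time effective reproduction number estimates, rest on some form of the renewal equation, which links new infections to recent infections through the generation time distribution. A minimal discrete-time sketch (leaving aside the observation model that maps infections to delayed, partially ascertained case reports) is

$$
I_t \sim \mathrm{Poisson}\!\left(R_t \sum_{s=1}^{S} w_s I_{t-s}\right),
$$

where $I_t$ is the number of new infections on day $t$, $R_t$ is the effective reproduction number, and $w_s$ is the probability that the generation time is $s$ days. Much of the difficulty in practice comes from the additional layers needed to connect $I_t$ to the surveillance data actually observed.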
As these estimates were produced in an automated fashion on a routine basis, at scale across highly varied settings, and by a resource-constrained research group, they had many limitations. In our published work, we reported a range of these limitations, including the lack of location-specific data on the time from infection to case report, lack of clarity on the interaction between the generation time and the reproduction number, difficulty in extrapolating from current reported cases to infer the dynamics of current infections, and the general assumption that all parameters other than the effective reproduction number were static in time.
However, our initial work also included an unknown, and hence unreported, limitation: it assumed that the distribution of the time delay from infection to report could be treated as reversible (Abbott, Hellewell, Thompson, et al. 2020a). This assumption has since been investigated and shown to result in over-smoothed, biased estimates that lag behind real-world changes in transmission (Gostic et al. 2020). Though the issue has since been mitigated via methodological developments (Abbott, Hellewell, Thompson, et al. 2020b), it likely resulted in flawed inferences. The primary reason for our initial flawed implementation was our lack of familiarity with the relevant literature on real-time reproduction number estimation and incidence curve reconstruction, compounded by the general conflation of real-time and retrospective methodologies in the public health community.
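To illustrate why this assumption matters, here is a minimal sketch using simulated data (the numbers, the delay distribution, and the helper functions `convolve_forward` and `transition_width` are invented for illustration and are not our production pipeline). Reported cases are infections convolved forward with a reporting delay; "reversing" that delay by shifting reports back by sampled delays convolves the series a second time rather than deconvolving it, so abrupt changes in transmission are blurred further, and in genuine real-time use the most recent estimates also lag because the relevant future reports have not yet been observed.

```python
import numpy as np

# True infections: constant incidence with an abrupt drop at day 60
# (e.g. a lockdown). All numbers here are invented for illustration.
days = 120
change_point = 60
infections = np.where(np.arange(days) < change_point, 200.0, 60.0)

# Assumed delay from infection to report, as a discrete probability mass function.
delay_pmf = np.array([0.05, 0.15, 0.25, 0.25, 0.15, 0.10, 0.05])


def convolve_forward(series, pmf):
    """Each count on day t contributes to days t, t + 1, ... according to the PMF."""
    out = np.zeros(len(series))
    for t, x in enumerate(series):
        for d, p in enumerate(pmf):
            if t + d < len(series):
                out[t + d] += x * p
    return out


# Expected reported cases: infections convolved forward with the reporting delay.
reports = convolve_forward(infections, delay_pmf)

# "Reversible delay" reconstruction: shift reports back by the same delay
# distribution (equivalent to convolving with the reversed PMF). This smooths
# the already-smoothed report series a second time instead of deconvolving it.
naive_infections = convolve_forward(reports[::-1], delay_pmf)[::-1]


def transition_width(series, hi=200.0, lo=60.0):
    """Days spent between 10% and 90% of the way through the drop."""
    upper, lower = hi - 0.1 * (hi - lo), lo + 0.1 * (hi - lo)
    return int(np.sum((series < upper) & (series > lower)))


# Look at a window around the change point to avoid boundary artefacts.
window = slice(30, 100)
for label, series in [
    ("true infections", infections),
    ("reported cases", reports),
    ("naive reconstruction", naive_infections),
]:
    print(f"{label:22s} transition width: {transition_width(series[window])} days")
```

Running this prints the number of days each series spends mid-transition: zero for the true step change, several for the reported cases, and more still for the naive reconstruction, which adds smoothing rather than removing it.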
Methods to estimate infections from reported cases were widely used during the HIV/AIDS epidemic (Gostic et al. 2020), but the literature available at the start of the pandemic was, as far as we are aware, largely historical and had seen limited continued development. Whilst tools existed to estimate the effective reproduction number in real-time settings, these had generally not been developed to handle the realities of delayed reporting and time-varying ascertainment (Cori et al. 2021). Other tools and methods had been explored to deal with reporting delays (Salmon, Schumacher, and Höhle 2016; Meyer, Held, and Höhle 2017), but in general these had not been fully evaluated, were often not robust to real-time data challenges, and were difficult to link together to answer real-time questions of interest. Much research has been done for specific settings, but this rarely appears to lead to generalisable methods, and in many cases it is difficult to verify how well the approaches work outside the context for which they were developed. This limitation was a result of a lack of knowledge within our research group, but unacknowledged limitations of this kind will remain unless methodological developments are propagated into robust, well-evaluated tooling, since much of the real-time analysis was, and will continue to be, done by early career researchers who may lack the breadth of experience necessary.
Other key issues with this project were its continuous, repetitive nature, the lack of discrete research outcomes, and its large scope. Due to its scale, it required detailed knowledge of multiple statistical and mechanistic modelling approaches, software engineering skills, and insight into each data source used. Initially, whilst the scope of the project was somewhat limited, it was possible to incentivise sufficient contributions. As the project continued, the scope widened, more maintenance was required, and attention moved on to other problems, meaning that it became more difficult to source sufficient resources. This was compounded once the initial research paper on the methodology had been published and traditional academic credit was no longer available (Abbott, Hellewell, Thompson, et al. 2020a). The result was breakdowns in estimation, a lack of innovation in the underlying methodology, limited evaluation, less than optimal software engineering that made reuse challenging, and flawed estimates in some geographies where the data had unaccounted-for eccentricities.
Ideally, these problems could have been resolved by building collaborations with those developing new methodologies, making use of our estimates, or developing effective reproduction number estimation pipelines of their own. In practice, this was difficult to negotiate despite our spending significant time and resources with this aim in mind. A range of real-time estimation methods (as well as retrospective methods that can be repurposed for real-time use) now exists. Generally, these methods (including our own) have only been partially evaluated, and they are rarely designed with reusability, robustness, or routine usage in mind. The situation now is unfortunately not too different from that at the start of the pandemic, despite significant apparent progress in methodology over the past two years. A researcher without previous experience in this area looking to perform a real-time analysis would likely still find it time-consuming either to properly evaluate the breadth of different methods themselves, or to develop a robust software implementation of an existing method where one does not already exist. We note that projects now exist to develop tooling to try to overcome some of these limitations, but in general these are independent, siloed efforts that do not build on the work of others. We feel this approach is unlikely to lead to success given its repeated failure in similar contexts in the past.
Our second example is two linked studies, conducted in December 2020 and June 2021, that estimated the transmissibility advantage of the Alpha and Delta variants of SARS-CoV-2 (Davies et al. 2021; Abbott, Kucharski, and Funk 2021). Both analyses used the same underlying regression framework to model the contribution to the effective reproduction number from the variant of concern compared to all other circulating variants (though this was only a small part of the analyses in Davies et al. (2021)). This gives a multiplicative transmission advantage estimate for the variant of concern relative to other circulating variants, after adjusting for static and time-varying variables influencing transmission. These analyses were presented to SPI-M; the first, after being aggregated with estimates from other sources, was widely reported in the media, often as only a point estimate.
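To make the structure of such a model concrete, a simplified illustration (not the exact specification used in either study) treats the overall effective reproduction number in area $i$ at time $t$ as a mixture of the variant and non-variant reproduction numbers, weighted by the variant's share of cases $f_{it}$ and with the variant scaled by a multiplicative advantage $\alpha$:

$$
R_{it} = (1 - f_{it})\, R^{\mathrm{other}}_{it} + f_{it}\, \alpha\, R^{\mathrm{other}}_{it} = R^{\mathrm{other}}_{it}\left[1 + f_{it}\left(\alpha - 1\right)\right],
$$

where $R^{\mathrm{other}}_{it}$ is itself modelled in terms of static and time-varying covariates. Estimating $\alpha$ then amounts to regressing reproduction number estimates on the variant proportion, and most of the limitations listed below concern how $R_{it}$ and $f_{it}$ are measured and how their uncertainty is propagated.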
These analyses reported a range of limitations, including the use of the mean reproduction number estimate and the mean proportion of cases with the variant of concern, the use of S-gene target failure as a proxy for variant status, the lack of UK-specific generation time estimates, the lack of explicit modelling of auto-correlation, the use of non-parametric reproduction number estimates as an input rather than fitting a single joint model, and the assumption that the generation time was unchanged between variants. These issues were likely to have led to spuriously precise results, though the magnitude of this is difficult to quantify. Bias may have been introduced by using the mean estimated reproduction numbers and the mean proportion of cases with the variant of concern, as this weights all areas equally. Bias may also have been introduced by the assumption that the generation time was unchanged between variants. It is unclear to what extent these limitations affect the accuracy of these analyses and the resulting decisions that may have been partially based on them.
This example highlights a general problem in real-time COVID-19 analysis: the quantification of uncertainty (Zelner et al. 2021), or lack thereof. It is difficult to assess the impact of the limitations of the transmission advantage analysis without replicating these studies with improved methods. The analysis was repeated effectively unchanged when concerns were raised about the Delta variant, despite several months having passed in which improvements could have been made and it being entirely predictable that similar methodologies would be needed in the future. The reason for this repetition was a lack of capacity, in terms of both researcher time and technical infrastructure such as compute resources. Some small progress has since been made in improving this approach (Abbott and Funk, n.d.) but it has not been possible to prioritise it over other, better incentivised, research questions. Unfortunately, this is a commonly repeated pattern: doing an analysis that you hope is 80% correct but cannot check due to limited time and resources, and then, when a similar question re-emerges, doing the same analysis again without any progress towards knowing whether you were 80% correct or not.
In this instance, the lack of capacity was in large part due to a deluge of other concerns arising between the emergence of the two variants. It can often be difficult to justify method evaluation as novel and important work, even though it can require substantially more complex methods than the original analysis. Given that new variants requiring the same work were likely to emerge, not prioritising an evaluation of the accuracy of this analysis after its first application was an oversight. Other research groups have made progress in this area, but effective reproduction number estimation solutions for co-circulating variants that are methodologically robust, well-engineered, thoroughly evaluated, and easily generalisable are not as openly available as they need to be should another pandemic-scale outbreak occur in the short term.
Our examples have highlighted common themes that prevented us from performing fast, accurate real-time analyses. The demand for real-time analyses during the pandemic exposed gaps in the knowledge of traditional research groups, even those theoretically geared towards this area, including limited domain knowledge, statistical knowledge, software engineering skills, and experience putting ad-hoc analyses into routine production. These knowledge gaps are compounded by the lack of well-developed, and well-evaluated, software tools designed to be used flexibly for real-world analysis questions. Many of the currently available non-domain-specific open-source tools, whilst generally high quality, require a substantial level of user knowledge, and even relatively common domain-specific problems can require significant researcher time and expertise to produce useful results. Most of the available domain-specific tools have significant limitations, are not continuously supported, and have not been robustly evaluated across a broad enough range of contexts. A further theme we identify that compounds these issues is the difficulty, mainly caused by a lack of incentives, of collaborating across groups on rapidly developing problems, and on method and tool development more generally.
Whilst it is difficult to define what is "good enough" for real-time analyses, we have summarised some of the common causes of the limitations in our work during the COVID-19 pandemic. We contend that real-time analysis, because it needs to be performed quickly and with confidence in its level of accuracy, should be undertaken, structured, and incentivised differently from other academic infectious disease research. A suitable analogy might be that firefighters, upon receiving a report that a fire has begun, do not then begin designing and building the tools they will use to fight it. The emergency response to a novel pathogen outbreak requires robust, well-understood, well-documented, and thoroughly evaluated methodologies and tools to perform the tasks that we can foresee will be needed each time such an event occurs. Efforts have been made historically to address some of these concerns, mainly for low-level tooling, but we saw little use of these tools during the COVID-19 pandemic (RECON-R Epidemics Consortium, n.d.). We must learn lessons from these initiatives to make progress in improving the response to future outbreaks.
We feel that the issues discussed here that stand in the way of good real-time analysis are due to persistent structural flaws within the academic system (Kucharski, Funk, and Eggo 2020) which, while they affect all research, particularly affect real-time analysis, where there is less tolerance for inefficiency and flawed results. Many of the issues we have highlighted are exacerbated by an academic culture that promotes apparent individual scientific achievement over collaborative work, that does not promote a culture of ongoing technical and statistical training, and that does not reward or value these skills in applied researchers beyond early career stages.
All of the examples in this piece were published with open-source code, and the majority are contained in commonly used open-source software tools. In theory, this should allow other researchers to explore these analyses and mitigate some of the limitations, but in practice incentive structures favour new work over incremental improvements and evaluation. We feel it is vital that we and others evaluate real-time analyses, improve on the methodology used, and expand the supply of domain-specific tooling so that future work can be of higher quality. This can most easily be done when researchers are open and honest about the limitations of their work and the causes of those limitations, and when funders understand and are supportive of these goals.
We thank Sebastian Funk for his feedback on an earlier version of this piece.
¹ Slides are also available here: https://samabbott.co.uk/presentations/2022/how-can-governments-prepare-for-the-next-pandemic.pdf
For attribution, please cite this work as
Abbott & Hellewell (2022, Sept. 19). Sam Abbott: What is 80% good enough for real-time infectious disease analyses? Retrieved from https://samabbott.co.uk/posts/2022-09-19-80-percent-good-enough/
BibTeX citation
@misc{abbott2022what,
  author = {Abbott, Sam and Hellewell, Joel},
  title = {Sam Abbott: What is 80% good enough for real-time infectious disease analyses?},
  url = {https://samabbott.co.uk/posts/2022-09-19-80-percent-good-enough/},
  year = {2022}
}