Data, Models, and Modellers during a Global Pandemic
Notes on my points and answers from the Data, Models, and Modellers during a Global Pandemic discussion session at the International Indian Statistical Association 2022 meeting. Also includes the recording of my talk on our recent paper “Evaluating an epidemiologically motivated surrogate model of a multi-model ensemble”, given as part of the Pandemic forecasting: Lessons learnt from COVID-19 session.
As my last work commitment of 2022 I took part in a discussion session on Data, Models, and Modellers during a Global Pandemic with Swapnil Mishra and Rukmini S. (whose book I am looking forward to reading). I thought some interesting points were made but sadly, as far as I am aware, there is no recording. As a very poor substitute for what was an interesting discussion I have included the notes for my points and answers below. I aimed to give an overview of how the key topics (data, models, and modellers) changed for me over the last three years.
I also gave a talk on our recent paper “Evaluating an epidemiologically motivated surrogate model of a multi-model ensemble” and I’ve included the recording below. As ever, I sound quite insane to myself when listening back but hopefully you can push past that as I think the paper is interesting and worth a read.
Thanks to Bhramar Mukherjee for the invitation and for hosting the discussion so ably. Apologies for my terrible organisational abilities - I promise it isn’t just a technique to make myself appear like an academic stereotype (perhaps it would be better if it were).
If anyone listening to this thought “Oh my, what an engaging chap. I wish we could have him come and talk”, the answer is yes, you can! Just give me a ping (and then maybe another one a short time later).
Pandemic forecasting: Lessons learnt from COVID-19 session
If interested in the code repository, you can check it out here.
Data, Models, and Modellers during a Global Pandemic
Introduction
Firstly, thank you very much for inviting me to this panel; I am looking forward to the discussion.
Brief intro to me and my work.
A brief overview of my experience of each of the main areas of this panel over the course of the pandemic, highlighting in particular how this changed with time.
Me
Background in mathematics and theoretical physics (this was a bit hard for me). Moved into mathematical biology.
PhD in Bristol: infectious disease modelling, BCG for tuberculosis.
A short trip into finance/fintech
Started at LSHTM on the 6th of January 2020 as part of a team working on real-time infectious disease modelling, focussed on cholera.
Switched to COVID on the 12th of January.
Early work estimating the reproduction number and the feasibility of contact tracing.
Moved to estimation of the reproduction number and short-term forecasting, both for UK stakeholders and globally, on a daily basis for 4k+ locations (our platform ended the pandemic with > 1 million unique users).
All work was developed as open source software packages, now widely used by public health departments, the WHO, the CDC, and the ECDC for a range of diseases (for example by the CDC for the recent monkeypox outbreak).
For the last 3 years I kept iterating on this type of work, along with reactive analysis (i.e. transmissibility of variants etc.) and more traditional academic work (most recently particularly focussed on nowcasting methodology). Now shifting to writing this up as more traditional academic output.
My main aim is to democratise access to robust methodology and generally improve the standard of real-time work, which historically is often of lower quality than more traditional retrospective work.
Data
2020
I started work on COVID-19 on the 12th of January. At this time we still only had case data coming out of China. This was sparsely reported and often delayed.
Reports of detected cases in other countries started coming in and we started to use these both to understand the outbreak in China and to try and get a handle on local transmission.
Data became commonly available by day of test and through official sources.
We set up our own data collection methodology (covidregionaldata) to service our modelling pipeline, as no open-access/open-methodology sources were available.
As the outbreak progressed into a pandemic, UK data sources began to come online with line list data (though this was often very incomplete). Government data with reduced delays and more granularity also started being provided to us.
As testing became more widespread, surveillance moved from hospitalised cases to community test-positive cases (though in reality we used a blend of routine data).
Data from other countries (at least at the national level) became easier to source but we still relied on our own framework for sub-national data.
2021
Appreciation of the unreliability of much testing data.
Pivot back to the use of hospitalisations in the UK.
Use of excess deaths data to highlight the global impact of the pandemic.
Increased reliance on novel data sources like the ONS infection survey.
More use (by us) of sequence data sources built by community contributions (like GISAID).
Discussion of novel data sources (like cycle threshold values, e.g. work by James Hay).
2022
Testing data less likely to be available daily.
Testing data increasingly unreliable and other surveillance sources increasingly sparsely reported.
Increased reliance on serosurveillance, household studies, and prevalence studies.
Shift in focus to working with public health departments on their very rich data sources.
Models
2020
Available methods and software were not useful for the vast majority of the real-world questions we faced. We largely did not make use of epi domain tooling (but general statistical tooling was very useful).
Reactive simple models, designed around available data sources and particular short-term analysis questions.
Over time these developed into several routine “pipelines” (i.e. chains of models) that we used to produce estimates of interest both in the UK for stakeholders and publicly worldwide.
In mid-2020 colleagues (Katie Gostic and co-authors) alerted us to issues with our pipeline approach and we pivoted to a generative Bayesian model for our routine work (a minimal sketch of the underlying idea is given below).
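As background for readers less familiar with these models: the generative approach we moved to is built around the renewal equation, which links expected incidence to the time-varying reproduction number through the generation interval. Below is a minimal Python sketch of that core idea with purely illustrative numbers; it is not our actual implementation, which was a full Bayesian model fit to noisy, delayed data.

```python
import numpy as np

def renewal_incidence(rt, generation_pmf, seed_incidence):
    """Expected incidence from a time-varying reproduction number R_t
    via the renewal equation: I_t = R_t * sum_s I_{t-s} * w_s,
    where w is the generation interval distribution."""
    w = np.asarray(generation_pmf)
    incidence = list(seed_incidence)
    for r in rt:
        # Convolve recent incidence with the generation interval
        # (most recent day first, matching w_1, w_2, ...).
        recent = incidence[-len(w):][::-1]
        force = sum(i * ws for i, ws in zip(recent, w))
        incidence.append(r * force)
    return np.array(incidence[len(seed_incidence):])

# Illustrative values only: a short generation interval and an R_t
# trajectory falling from 1.5 to 0.8 over a month.
gen_pmf = [0.2, 0.5, 0.3]  # probability mass on delays of 1-3 days
rt = np.linspace(1.5, 0.8, 30)
expected = renewal_incidence(rt, gen_pmf, seed_incidence=[10, 12, 15])
```

In the real models this simulation sits inside a Bayesian framework, with R_t given a prior (e.g. a random walk) and observations linked to incidence through delay and reporting processes.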
2021
The detection and spread of new variants (which of course also started in late 2020) could not be easily modelled (i.e. not in the timeline required by policy makers) using our routine methods.
We (and many colleagues) therefore had to go back to the model pipeline approach, feeding output from one model into another as a data source (i.e. to estimate variant transmissibility; see the sketch after this list).
Over time many specialist models were developed to deal with this particular question (both by us and others).
More generally we began to develop more specialised models for particular high quality data sources (such as the ONS infection survey).
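To make the pipeline idea above concrete: a common first-pass estimate of a variant's transmission advantage fits logistic growth to its frequency among sequenced samples and converts the per-day growth in log-odds into an approximate ratio of reproduction numbers. The Python sketch below uses made-up numbers and a fixed generation time, which is a strong simplifying assumption; it illustrates the general technique rather than any specific team's method.

```python
import numpy as np

def growth_advantage(days, variant_counts, total_counts, gen_time=5.0):
    """Fit logistic growth to variant frequency: the log-odds of the
    variant are roughly linear in time, and the slope can be converted
    into an approximate multiplicative transmission advantage given an
    assumed generation time."""
    freq = np.asarray(variant_counts) / np.asarray(total_counts)
    # Clip away frequencies of exactly 0 or 1 so log-odds stay finite.
    freq = np.clip(freq, 1e-3, 1 - 1e-3)
    log_odds = np.log(freq / (1 - freq))
    slope, _ = np.polyfit(days, log_odds, 1)  # per-day log-odds growth
    return np.exp(slope * gen_time)  # approximate R ratio vs resident strain

# Illustrative sequencing data only (weekly samples of 500 sequences).
days = np.arange(0, 35, 7)
advantage = growth_advantage(
    days,
    variant_counts=[5, 20, 80, 220, 400],
    total_counts=[500, 500, 500, 500, 500],
)
```

In practice the inputs themselves were often model outputs (debiased frequencies, Rt estimates), which is exactly the pipeline pattern described above, with all its attendant uncertainty-propagation problems.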
2022
The spread of Omicron posed challenges for these models, in particular the potential for immune evasion and a shorter generation time.
Of the specialised models developed in 2021 that had not been abandoned or become inoperable, most could not account for these features.
Early analysis (by us and others) therefore again fell back on the pipeline approach of using model outputs as data in order to evaluate the evidence for a different generation time and immune escape.
We also developed more detailed statistical modelling approaches to work with the detailed line list data from partners (i.e. nowcasting, cycle threshold modelling). This seems to be a better pathway to impacting decision making.
Modellers
2020
Huge community effort with many people from other fields.
I personally worked ~ 18 hours a day for about 6 months.
Often hard to collaborate on work especially across knowledge silos. This led to lots of poorly linked efforts and much repetition.
2021
Many colleagues, especially those from other fields, pivoted away from COVID-19.
This meant that in many ways the amount of work to do actually increased.
We received very little support for our routine modelling.
As modelling became more complex we had to fall back on simpler methods due to lack of resources.
2022
Shift in focus to learning lessons from the pandemic and getting ready for the future.
Trying to build a community of practice as community >> methods or models alone.
As we move into 2023, hitting pause a little to take a breath and identify what the key challenges are and how the system can be made to recognise and respond to them.
Questions
How did the data science work during the pandemic influence your post-pandemic academic scholarly work?
Profoundly. I would say I am now borderline no longer an academic in the traditional sense, and I am not sure I am interested in moving back in this direction.
Very interested in thinking about how to develop a community of practice and a community supported suite of outbreak surveillance tools. Also interested in why efforts in this direction keep failing.
The lessons learnt about when an off-the-shelf method is useful and when we need more custom methodology were very important. In general I now try to make all work as modular as possible and as easy for others to hack on as possible.
The other vital realisation is how important being at least 80% right but on time was (there is a nice paper by Chris Whitty (UK chief medical officer) on this). We need methods that allow us to do this for as many questions as we are likely to have in a similar scale outbreak (which could look very different in ways that are hard to predict).
What do you think are the biggest takeaways in terms of scientific communication with the public and stakeholders? Some do's and some don'ts.
Be as open as possible.
Always clearly state the limitations of your work (ideally first).
No point estimates!
Technical detail needs to be present but often not in the main report (just list the main assumptions and limitations in plain text).
Joint statements across teams are very useful for public facing work as generally there is a broad consensus with different views on the details. If results are presented individually this can get lost and it incentivises being first out (and maybe therefore cutting corners). We saw a lot of rushed work during the pandemic because of poorly aligned incentive structures.
I personally didn’t feel it was useful for me to directly interact with the public/media. No training. No time. Explaining the evidence is not my speciality; better to leave it to others for whom it is.
How do we reward such contributions in a more traditional academic setting?
This is really a question for those making these decisions. I have yet to see much movement on this, which is a shame, and it is nearly too late for those who worked on the pandemic.
There has been a lack of leadership here from senior academics and funders who have done well in the current system.
There was a large amount of funding in some locations for pandemic-adjacent work but most of this was assigned using very traditional approaches.
I think a good start would be valuing outputs other than peer reviewed papers more (like software and support of public health departments etc). There is pushback on this because it is “hard” but ultimately if you want to incentivise better work then more effort is needed by those allocating resources.
What is your next project?
{epinowcast}: Aiming to deal with some of the issues I outlined above and provide a consistent set of tooling to answer outbreak and surveillance questions in real time without having to reduce the quality of the work below the 80% target threshold.
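{epinowcast} approaches this with a fully Bayesian model of the reporting process. As a much simpler illustration of the underlying problem it targets, the Python sketch below scales up recent, partially reported counts using the empirical reporting-delay distribution; this naive method is shown only to clarify what nowcasting means and is not the package's approach.

```python
import numpy as np

def simple_nowcast(reported_by_delay, max_delay):
    """Naive nowcast: divide partially observed recent counts by the
    empirical reporting completeness at each delay.

    reported_by_delay[t, d] = cases with event on day t reported d days
    later; the last `max_delay` rows are right-truncated (later reports
    have not arrived yet)."""
    counts = np.asarray(reported_by_delay, dtype=float)
    # Empirical reporting CDF estimated from fully observed older days.
    complete = counts[:-max_delay]
    delay_cdf = np.cumsum(complete.sum(axis=0)) / complete.sum()
    nowcast = counts.sum(axis=1)
    for i in range(1, max_delay + 1):
        # The day i-from-last has only delays 0..(i - 1) observed so far.
        observed = counts[-i, :i].sum()
        nowcast[-i] = observed / delay_cdf[i - 1]
    return nowcast

# Illustrative reporting triangle: 10 days, delays of 0-3 days,
# with the last 3 days right-truncated.
rng = np.random.default_rng(1)
triangle = rng.poisson(lam=[40.0, 30.0, 20.0, 10.0], size=(10, 4)).astype(float)
for i in range(1, 4):
    triangle[-i, i:] = 0.0  # reports that have not yet arrived
estimates = simple_nowcast(triangle, max_delay=3)
```

A proper nowcast also needs to quantify the (often large) uncertainty in the unreported counts, which is the part this naive multiplication-factor approach gets badly wrong and a generative Bayesian treatment handles naturally.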
Is the pandemic over?
For me, yes, but not globally.
Corrections
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Citation
For attribution, please cite this work as
Abbott (2023, Jan. 4). Sam Abbott: Data, Models, and Modellers during a Global Pandemic. Retrieved from https://samabbott.co.uk/posts/2023-01-04-iisa-data-models-and-modellers-during-a-global-pandemic/
BibTeX citation
@misc{abbott2023data,
author = {Abbott, Sam},
title = {Sam Abbott: Data, Models, and Modellers during a Global Pandemic},
url = {https://samabbott.co.uk/posts/2023-01-04-iisa-data-models-and-modellers-during-a-global-pandemic/},
year = {2023}
}