Overview

This vignette outlines the data sources for the packaged datasets. It also summarises the cleaning process used for each dataset, although these details should be checked against the coded implementations.

Bovine tuberculosis (TB) incidence estimates

Humans

Raw data

  • Sourced from a systematic review by Muller et al.
  • Each record is a study estimating the proportion of all TB cases that are zoonotic TB. Variables include the number of zoonotic TB cases and the sample size (number of TB cases).
  • Data is restricted to 61 countries.
  • Data quality is extremely low, with multiple issues.

Data cleaning

Initial data cleaning
  • Added a logical variable indicating whether each study is multi-year.
  • Standardised missing data to NA.
  • Extracted the country name, assuming it comes before the first comma in the area string.
  • Extracted the study end year, assuming (with some manual post-processing) that the last 4 digits in the study period are the final year of the study.
  • Replaced NA with 0 zoonotic TB cases for studies that have no TB cases.
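A minimal base-R sketch of these initial steps is below. The column names (`area`, `study_period`, `tb_cases`, `zoonotic_cases`), the toy rows, and the rule that a study counts as multi-year when its period mentions more than one distinct 4-digit year are all assumptions for illustration; the packaged cleaning code may differ.

```r
# Toy raw records; column names and values are illustrative assumptions
raw <- data.frame(
  area = c("Mexico, Jalisco", "Nigeria", "Tanzania, Arusha region"),
  study_period = c("1995-1997", "2001", "Jan 2004 - Mar 2005"),
  tb_cases = c(120, 0, 85),
  zoonotic_cases = c(9, NA, 3),
  stringsAsFactors = FALSE
)

# All 4-digit years mentioned in each study-period string
years <- regmatches(raw$study_period, gregexpr("[0-9]{4}", raw$study_period))

# Multi-year flag (assumed rule: more than one distinct year mentioned)
raw$multi_year <- vapply(years, function(y) length(unique(y)) > 1, logical(1))

# Country name: text before the first comma in the area string
raw$country <- trimws(sub(",.*$", "", raw$area))

# Study end year: the last 4-digit year in the period string
raw$end_year <- vapply(
  years,
  function(y) if (length(y) > 0) as.integer(y[length(y)]) else NA_integer_,
  integer(1)
)

# Studies with no TB cases cannot have zoonotic TB cases, so NA becomes 0
no_cases <- which(raw$tb_cases == 0 & is.na(raw$zoonotic_cases))
raw$zoonotic_cases[no_cases] <- 0
```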
Exclusions
  • Replace missing sample sizes with the number of TB cases.
  • If the number of zoonotic TB cases is unknown, attempt to back-calculate it from the sample size and the proportion of cases that have zoonotic TB.
  • Drop studies with an unknown study period.
  • Drop studies that cover the EU as a whole.
  • Drop all studies from the US due to low-quality data.
  • Drop all studies that look at a subset of ages.
  • Drop all studies that look at a specific gender.
  • Exclude studies that did not sample the general population, identified TB cases, hospital populations, or identified TB cases in hospital populations.
  • Drop studies that are not country-wide if country-wide data is available for that country (a major assumption that could be re-evaluated).
  • If a study reports data on TB cases, drop the duplicated data it also reports on the general/hospital population.
  • Drop all studies that do not list a sample size (this may be too restrictive, as some studies may use the overall population size as the sample size).
  • Split the dataset into country-wide and non-country-wide studies for further cleaning.
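Several of these imputation and filtering rules can be sketched in base R as below. This is a hypothetical illustration rather than the package's cleaning script: the columns (`country_wide`, `prop_zoonotic`, and so on) and values are made up, `prop_zoonotic` is assumed to be a fraction rather than a percentage, and back-calculated counts are assumed to be rounded to whole cases. Only a subset of the listed rules is shown.

```r
# Toy study records; all columns and values are illustrative assumptions
studies <- data.frame(
  country = c("Mexico", "Mexico", "Nigeria", "Tanzania", "Ethiopia"),
  country_wide = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  end_year = c(1997L, 2001L, NA, 2005L, 2003L),
  tb_cases = c(120, 60, 40, 85, 200),
  sample_size = c(120, NA, 40, NA, 150),
  zoonotic_cases = c(9, 3, NA, NA, 12),
  prop_zoonotic = c(0.075, 0.05, 0.1, 0.035, 0.08),
  stringsAsFactors = FALSE
)

# Fill missing sample sizes with the number of TB cases
studies$sample_size <- ifelse(is.na(studies$sample_size),
                              studies$tb_cases, studies$sample_size)

# Back-calculate unknown zoonotic counts as sample size x proportion (rounded)
missing_cases <- is.na(studies$zoonotic_cases)
studies$zoonotic_cases[missing_cases] <- round(
  studies$sample_size[missing_cases] * studies$prop_zoonotic[missing_cases]
)

# Drop studies with an unknown study period
studies <- studies[!is.na(studies$end_year), ]

# Drop non-country-wide studies from countries that also have country-wide data
has_national <- unique(studies$country[studies$country_wide])
studies <- studies[studies$country_wide | !(studies$country %in% has_national), ]

# Drop studies that still have no sample size
studies <- studies[!is.na(studies$sample_size), ]

# Split into country-wide and non-country-wide datasets
split_data <- split(studies,
                    ifelse(studies$country_wide, "country_wide", "sub_national"))
```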
Country-wide data
  • Excluded studies in sub-populations or high-risk populations via manual inspection.
  • Dropped data from New Zealand because one study is duplicated with differing sample sizes.
  • Excluded duplicate entries for France and Ireland.
  • Final dataset has data from 36 countries across 18 years (1990-2007).
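Two kinds of duplicate are handled in the country-wide data: a country is dropped entirely when one of its studies appears with conflicting sample sizes, while exact duplicate entries are simply de-duplicated. A hypothetical sketch of both checks follows; the `study_id` column and the toy rows are assumptions, not the packaged data.

```r
# Toy country-wide records; study_id and values are illustrative assumptions
national <- data.frame(
  country = c("New Zealand", "New Zealand", "France", "France", "Ireland", "Ireland"),
  study_id = c("nz_1", "nz_1", "fr_1", "fr_1", "ie_1", "ie_1"),
  sample_size = c(500, 650, 300, 300, 120, 120),
  zoonotic_cases = c(10, 12, 4, 4, 2, 2),
  stringsAsFactors = FALSE
)

# Studies entered more than once with conflicting sample sizes
n_sizes <- tapply(national$sample_size, national$study_id,
                  function(x) length(unique(x)))
conflicting <- names(n_sizes)[n_sizes > 1]

# Drop every country containing a conflicting study
drop_countries <- unique(national$country[national$study_id %in% conflicting])
national <- national[!(national$country %in% drop_countries), ]

# Remove exact duplicate rows
national <- national[!duplicated(national), ]
```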
Non-country wide data
  • No exclusions based on sub-populations, as all studies are effectively sampling sub-populations (risk, location, etc.). This needs to be appropriately adjusted for at the analysis stage.
  • Excluded duplicate entries from Sierra Leone and Argentina.
  • Final dataset has data from 14 countries, with 15 years of data between 1991 and 2008.
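Year coverage can be reported either as the number of distinct years with data or as the length of the covered range, and the two differ whenever there are gaps. The hypothetical helper below (`summarise_coverage()`, not part of EstZoonoticTB) computes both from toy data with placeholder country names.

```r
# Hypothetical helper (not part of EstZoonoticTB): country and year coverage
summarise_coverage <- function(country, year) {
  years <- sort(unique(year[!is.na(year)]))
  list(
    n_countries = length(unique(country)),
    n_distinct_years = length(years),
    first_year = years[1],
    last_year = years[length(years)],
    # Inclusive length of the range, counting years with no data
    range_length = years[length(years)] - years[1] + 1
  )
}

# Toy example: three distinct years spanning an 18-year range
coverage <- summarise_coverage(
  country = c("Country A", "Country B", "Country B", "Country C"),
  year = c(1991L, 1995L, 1995L, 2008L)
)
str(coverage)
```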

Limitations

  • Studies are often not representative of overall TB and zoonotic TB incidence.
  • Zoonotic TB incidence is highly heterogeneous: it is more likely to occur in rural areas and in populations that routinely consume raw (unpasteurised) dairy products.

Animals

  • Sourced from the OIE via personal communication.
  • Data cleaning details can be found in /data-raw/zoonotic_tb_animals.R.
  • Data is available for 2018, split into half-years.
  • Data is available for domesticated and wild animals, with status recorded as present, limited, suspected, limited + suspected, not present, or missing.
  • Neither incidence nor incidence rates are available in this dataset.
  • Data quality is low.
  • Last recorded status is also available, but this field is currently excluded due to its poor quality.
  • Half-year data has been combined into annual summaries.
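How the two half-year statuses are merged is not stated here. One plausible rule, sketched below purely as an assumption, is to keep the most severe status observed in either half of 2018, using a hypothetical severity ordering in which `missing` ranks lowest so that any recorded status takes precedence.

```r
# Hypothetical severity ordering, lowest to highest (an assumption, not the package's rule)
status_levels <- c("missing", "not present", "suspected", "limited",
                   "limited + suspected", "present")

# Toy half-year records for 2018
half_years <- data.frame(
  country = c("A", "A", "B", "B"),
  animal = c("domestic", "domestic", "wild", "wild"),
  half = c(1, 2, 1, 2),
  status = c("suspected", "present", "missing", "not present"),
  stringsAsFactors = FALSE
)

# Convert status to an integer severity so it can be aggregated
half_years$severity <- match(half_years$status, status_levels)

# Keep the most severe status per country and animal group across both halves
annual <- aggregate(severity ~ country + animal, data = half_years, FUN = max)
annual$status <- status_levels[annual$severity]
annual
```

Aggregating an integer severity and mapping it back to labels avoids relying on how `aggregate()` combines factor results across R versions.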

Tuberculosis (TB) incidence

  • Sourced from the WHO using {getTBinR}.
  • Extracted TB incidence, TB incidence rates (+ CIs), the proportion with extra-pulmonary TB, and the proportion HIV positive (+ CIs).
  • Data prior to 2000 is dropped due to the high proportion of missing data.
  • Data sourced using EstZoonoticTB::tb_data(). See ?EstZoonoticTB::tb_data for documentation.
EstZoonoticTB::tb_data()
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=estimates
#> Saving data to: /tmp/RtmpTZSQOf/tb_burden.rds
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=mdr_rr_estimates
#> Saving data to: /tmp/RtmpTZSQOf/mdr_tb.rds
#> Joining TB burden data and MDR TB data.
#> Getting additional dataset: Latent TB infection
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=ltbi_estimates
#> Saving data to: /tmp/RtmpTZSQOf/latent_tb_infection.rds
#> Getting additional dataset: Notification
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=notifications
#> Saving data to: /tmp/RtmpTZSQOf/notification.rds
#> Getting additional dataset: Drug resistance surveillance
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=dr_surveillance
#> Saving data to: /tmp/RtmpTZSQOf/drug_resistance_surveillance.rds
#> Getting additional dataset: Non-routine HIV surveillance
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=tbhivnonroutinesurv
#> Saving data to: /tmp/RtmpTZSQOf/non-routine_hiv_surveillance.rds
#> Getting additional dataset: Outcomes
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=outcomes
#> Saving data to: /tmp/RtmpTZSQOf/outcomes.rds
#> Getting additional dataset: Budget
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=budget
#> Saving data to: /tmp/RtmpTZSQOf/budget.rds
#> Getting additional dataset: Expenditure and utilisation
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=expenditure_utilisation
#> Saving data to: /tmp/RtmpTZSQOf/expenditure_and_utilisation.rds
#> Getting additional dataset: Policies and services
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=policies
#> Saving data to: /tmp/RtmpTZSQOf/policies_and_services.rds
#> Getting additional dataset: Community engagement
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=community
#> Saving data to: /tmp/RtmpTZSQOf/community_engagement.rds
#> Getting additional dataset: Laboratories
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=labs
#> Saving data to: /tmp/RtmpTZSQOf/laboratories.rds
#> Joining TB burden data and additional datasets.
#> # A tibble: 4,474 x 12
#>    country iso3  g_whoregion  year tb_cases tb_inc tb_inc_lo tb_inc_hi
#>    <chr>   <chr> <chr>       <int>    <int>  <dbl>     <dbl>     <dbl>
#>  1 Afghan… AFG   Eastern Me…  2000    39000    190       123       271
#>  2 Afghan… AFG   Eastern Me…  2001    41000    189       123       271
#>  3 Afghan… AFG   Eastern Me…  2002    43000    189       122       270
#>  4 Afghan… AFG   Eastern Me…  2003    45000    189       122       270
#>  5 Afghan… AFG   Eastern Me…  2004    47000    189       122       270
#>  6 Afghan… AFG   Eastern Me…  2005    48000    189       122       270
#>  7 Afghan… AFG   Eastern Me…  2006    50000    189       122       270
#>  8 Afghan… AFG   Eastern Me…  2007    51000    189       122       270
#>  9 Afghan… AFG   Eastern Me…  2008    52000    189       122       270
#> 10 Afghan… AFG   Eastern Me…  2009    54000    189       123       270
#> # … with 4,464 more rows, and 4 more variables: prop_tb_ep <dbl>,
#> #   prop_hiv <dbl>, prop_hiv_lo <dbl>, prop_hiv_hi <dbl>
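The pre-2000 cut-off is motivated by missingness. A sketch of how the share of missing values per year could be checked before applying the filter is below; it uses a toy data frame whose column names mirror the printed output (`year`, `prop_hiv`, `prop_tb_ep`) but whose values are made up, so it does not reproduce the WHO data.

```r
# Toy country-year data; column names mirror the printed output, values are made up
tb <- data.frame(
  iso3 = rep(c("AAA", "BBB"), each = 3),
  year = rep(c(1999L, 2000L, 2001L), times = 2),
  prop_hiv = c(NA, 0.01, 0.02, NA, NA, 0.10),
  prop_tb_ep = c(NA, 0.20, 0.25, 0.30, 0.30, 0.35),
  stringsAsFactors = FALSE
)

vars <- c("prop_hiv", "prop_tb_ep")

# Proportion of missing values in each row, then averaged within each year
row_missing <- rowMeans(is.na(tb[vars]))
missing_by_year <- tapply(row_missing, tb$year, mean)
missing_by_year

# Drop years before 2000, mirroring the cut-off described above
tb_2000 <- tb[tb$year >= 2000, ]
```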