In this vignette, the latest data on the proportion of TB cases that are zoonotic, along with data on potential correlates, are rapidly explored using an EDA approach modelled on that of the {DataExplorer}
package. This analysis is indicative only, and any findings should be evaluated more carefully using more robust techniques.
By default, only the latest available data are used; see data sources and data linkage for details. To explore the complete dataset instead, comment out the get_latest_combined_data
call and rerun the analysis.
This step also drops variables that are unlikely to be of interest in an initial EDA: confidence intervals, standard errors, country names, country codes, and regions. In addition, countries with fewer than 100 TB cases have been removed, along with countries with incidence rates below 1 per 100,000 population (effective TB elimination), because data for these countries are unlikely to provide meaningful insights for a global model of zoonotic TB incidence.
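The filtering rules described above can be sketched in base R as follows. This is a minimal illustration only, not the package's actual implementation: the toy data frame, the column names (`tb_cases`, `tb_inc_rate`, `tb_inc_rate_lo`, `country`) and the helper `filter_for_eda` are all hypothetical, and dropping rows with missing case counts or rates is an assumption of this sketch.

```r
# Toy data standing in for the combined dataset (all names are hypothetical)
toy <- data.frame(
  country        = c("A", "B", "C", "D"),
  tb_cases       = c(50, 1200, 300, 5000),
  tb_inc_rate    = c(12, 0.5, 45, 210),   # per 100,000 population
  tb_inc_rate_lo = c(10, 0.3, 40, 190),   # lower confidence bound
  prop_zoonotic  = c(0.01, 0.02, 0.05, 0.03)
)

# Hypothetical helper: keep countries with at least `min_cases` TB cases and an
# incidence rate of at least `min_inc_rate`, then drop columns not needed for EDA
filter_for_eda <- function(df, min_cases = 100, min_inc_rate = 1,
                           drop_cols = c("country", "tb_inc_rate_lo")) {
  keep_rows <- !is.na(df$tb_cases) & df$tb_cases >= min_cases &
    !is.na(df$tb_inc_rate) & df$tb_inc_rate >= min_inc_rate
  df[keep_rows, setdiff(names(df), drop_cols), drop = FALSE]
}

filtered <- filter_for_eda(toy)
```

In this toy example country A is removed for having fewer than 100 cases and country B for having an incidence rate below 1 per 100,000.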
knitr::kable(t(DataExplorer::introduce(latest_data)),
             row.names = TRUE,
             col.names = "",
             format.args = list(big.mark = ","))
|                      |        |
|:---------------------|-------:|
| rows                 |    170 |
| columns              |     19 |
| discrete_columns     |      4 |
| continuous_columns   |     15 |
| all_missing_columns  |      0 |
| total_missing_values |    784 |
| complete_rows        |     33 |
| total_observations   |  3,230 |
| memory_usage         | 28,488 |
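Several of these summary figures are simple functions of the data, for example total_observations is rows times columns (170 × 19 = 3,230). The sketch below reproduces three of them with base R on a made-up data frame; `toy` is a hypothetical stand-in for `latest_data`, and the base R expressions are assumed to correspond to what the summary reports rather than being taken from the package source.

```r
# Toy data with a few missing values (hypothetical stand-in for latest_data)
toy <- data.frame(
  x = c(1, NA, 3, 4),
  y = c("a", "b", NA, "d"),
  z = c(0.5, 1.5, 2.5, 3.5)
)

# Total number of cells: rows times columns
total_observations <- nrow(toy) * ncol(toy)

# Number of missing cells across the whole data frame
total_missing_values <- sum(is.na(toy))

# Number of rows with no missing values in any column
complete_rows <- sum(complete.cases(toy))
```

Only 33 of the 170 rows are complete, which matters for any step that relies on `na.omit()`, because it works with complete rows only.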
plot <- DataExplorer::plot_correlation(na.omit(latest_data), type = "c") +
  ggplot2::theme(legend.position = "none")
## Warning in cor(x = structure(list(tb_year = c(2018, 2018, 2018, 2018,
## 2018, : the standard deviation is zero
## Warning: Removed 100 rows containing missing values (geom_text).
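The "standard deviation is zero" warning is what `cor()` emits when a column is constant across the rows it receives, which can happen here if, for instance, `tb_year` takes a single value once incomplete rows are dropped. One way to avoid it, sketched below under that assumption, is to remove constant columns before computing correlations. The helper `drop_constant_columns` and the toy data are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: drop columns with at most one distinct non-missing value
drop_constant_columns <- function(df) {
  is_constant <- vapply(df, function(col) {
    length(unique(col[!is.na(col)])) <= 1
  }, logical(1))
  df[, !is_constant, drop = FALSE]
}

# Toy data standing in for na.omit(latest_data); tb_year is constant
toy <- data.frame(
  tb_year = rep(2018, 5),
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 8.1, 9.8)
)

toy_varying <- drop_constant_columns(toy)

# With the constant column removed, cor() no longer encounters zero variance
cor_matrix <- cor(toy_varying)
```

The same filtered data frame could then be passed to `DataExplorer::plot_correlation()` in place of `na.omit(latest_data)`.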