Posts - Sam Abbott

Tue, Sep 3, 2019 4 min read R

getTBinR 0.7.0 released - more data, {ggplot2} best practices and bug fixes

getTBinR 0.7.0 should now be available on CRAN. This release includes some new experimental data (TB incidence by age and sex) that for now is only partly supported by {getTBinR}. It also brings {getTBinR} into line with new (or new to me) {ggplot2} best practices. This involved two major changes (plans are also afoot for an S3 plot method): moving from a full @import of {ggplot2} to only using @importFrom for required functions (something that I had previously been too lazy to do…).

Thu, Jul 4, 2019 10 min read R

Celebrating £1.4bn for the Global Fund - Trying out gganimate on Tuberculosis data from getTBinR

Last week the UK pledged to contribute £467m a year for three years to the Global Fund. The money will be spent on: providing tuberculosis (TB) treatment for more than two million people; 90 million mosquito nets to protect people from malaria; and treatment for more than three million people living with HIV. This funding will drastically improve many people's lives and needs to be celebrated even if it comes from a broadly unpopular source.

Thu, May 23, 2019 7 min read R

getTBinR 0.6.0 now on CRAN - from 80 variables to more than 450!

getTBinR 0.6.0 is now on CRAN and should be available on a mirror near you shortly! This update includes multiple new Tuberculosis datasets, increasing the number of variables available through getTBinR from 80 to over 450. To help support these new datasets, the package now contains a dataframe listing the available datasets, and search_data_dict can now also be used to search the data dictionary for variables by dataset. On top of this, the update contains changes suggested by reviewers (@rrrlw and @strengejacke) from JOSS (see here for the review thread).

Tue, Feb 5, 2019 5 min read R

Benchmarking an Rstats workstation - using benchmarkme

Why? I recently built out a new workstation and have done some benchmarking using xgboost via h2o. In this post I am using the benchmarkme package to get another perspective on performance. Note: The benchmarkme package appears to have some issues when it comes to plotting benchmarks. I ended up having to drop them entirely from this post. Update (2019-02-11): Just checked this issue using rocker/tidyverse:latest and found all benchmarkme functionality is working well.

Sun, Feb 3, 2019 18 min read R

Benchmarking an Rstats workstation on realistic workloads - using xgboost via h2o

Why? I recently built out a new workstation to give me some local compute for data science workloads. Now that I have local access to both a CPU with a large number of cores (Threadripper 1950X with 16 cores) and a moderately powerful GPU (Nvidia RTX 2070), I’m interested in knowing when it is best to use CPU vs. GPU for some of the tasks that I commonly do.

Tue, Jan 22, 2019 5 min read R

getTBinR 0.5.7 now on CRAN - Tuberculosis reports and summary plots

getTBinR 0.5.7 is now on CRAN and should be available on a mirror near you shortly! This update mainly focussed on building out new country-level Tuberculosis (TB) report functionality, but along the way this led to a new summary plotting function that quickly and easily shows TB trends across regions and globally. I also had some fun developing a hexsticker (Tweet at me with something you made using the package to get a physical version - whilst my postage money lasts…), reducing the dependencies with itdepends and pkgnet, and dealing with some breaking changes from an upcoming dplyr update (my own fault for missing a function import).

Sun, Dec 23, 2018 11 min read R

Building an Rstats Workstation

Why? I regularly use cloud resources (AWS and GCP) both in my day job and for personal projects but recently I have been finding that having to spin up a cloud instance for quick analysis can be tedious, even when making use of tools for reproducibility like docker. This is particularly the case for self-learning when spending money on cloud resources feels wasteful, especially when I have half an eye on something else (i.

Fri, Sep 28, 2018 6 min read R

getTBinR 0.5.5 now on CRAN - 2017 data.

getTBinR 0.5.5 is now on CRAN and should be available on a mirror near you shortly! This update is mainly about highlighting the availability of TB data for 2017, although some small behind the scenes changes were required to get the code set up going forward for yearly updates. A few more plotting options have been added, along with the corresponding tests (definitely the most exciting news). The full changelog is below along with a short example highlighting some of the changes in the 2017 data.

Wed, May 16, 2018 4 min read R

getTBinR 0.5.4 now on CRAN - new data, map updates and a new summary function.

getTBinR 0.5.4 is now on CRAN and should be available on a mirror near you shortly! This update includes an additional data set for 2016 containing variables related to drug-resistant Tuberculosis, some aesthetic updates to mapping functionality and a new summarise_tb_burden function for summarising TB metrics. Behind the scenes there has been an extensive test overhaul, with vdiffr being used to test images, and several bug fixes. See below for a full list of changes and some example code exploring the new functionality.

Wed, Apr 11, 2018 266 min read R

Exploring Tuberculosis Monitoring Indicators in England; Using Dimension Reduction and Clustering

Introduction I recently attended the Public Health Research and Science Conference, run by Public Health England (PHE), at the University of Warwick. I was mainly there to present some work that I have been doing (along with my co-authors) estimating the direct effects of the 2005 change in BCG vaccination policy on Tuberculosis (TB) incidence rates (slides) but it was also a great opportunity to see what research is being done within, and partnered with, PHE.