Auto-deploying documentation: FASTER!

November 13, 2015
Steph
Data Science, DataOps, R

Over the past few years I’ve been delving deeper into automatically building and deploying documentation and reporting in R (with rmarkdown, LaTeX, etc.). This post covers another step forward on that journey towards awesomeness.

I wrote a while back about how Robert Flight’s post helped me make web pages out of my R package vignette automatically whenever I committed a change to GitHub. I then took it further by modifying it to loop through multiple vignettes.

I then promised to do one thing: make a much more complex version. Well, that’s taken me a while, and you can check it out in its current guise for Rtraining, but it’s still not as clean as I’d like because of my pants coding on some of the Rmarkdown docs. Whilst I can’t blog about a neat solution for that (yet), I wanted to cover the neat and, most importantly, fast solution I was able to crib from Kieran Healy: using containerized R builds, and how they make things go whizzy fast!

A big part of the problem I’ve had with my Rtraining auto-doc process has been the time it takes to build my R environment from scratch before generating the documents. When your build takes more than 15 minutes, you find other things to do in the meantime and end up completely forgetting about it.

How do you solve that full-build issue?

Unfortunately, long build times are the typical state of affairs for R builds on Travis-CI at the moment, because they require elevated permissions and run on VMs. Travis have been using Docker containers to do much cooler stuff. Containers need less initial setup, so your build starts quicker, and you can then cache your dependencies so that all your installed stuff is kept lying around, ready for use the next time you commit. Unless a dependency has changed (e.g. there’s a new version, or it’s no longer required), the cache is still valid and can be reused. Overall, between the two, more time is spent testing your stuff and less is spent preparing to test it.

This can literally take 10-15 minutes off your build time – woohoo!
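To make the caching rule concrete: a cache of installed packages stays valid only while the dependency list it was built from hasn’t changed. Here’s a rough shell sketch of that rule. It is my own hypothetical illustration, not how Travis itself manages its cache; the function names `cache_is_valid` and `record_cache_stamp`, and the idea of keeping a checksum "stamp" file next to the cached library, are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of "reuse the cache unless dependencies changed".
# The stamp file holds a checksum of the dependency list (for an R package,
# that could be its DESCRIPTION file) taken when the cache was last built.

# Succeeds (exit 0) only if the stamp exists and still matches the deps file.
cache_is_valid() {
  local deps_file="$1" stamp_file="$2"
  [ -f "$stamp_file" ] || return 1
  [ "$(cksum < "$deps_file")" = "$(cat "$stamp_file")" ]
}

# Records the current checksum once dependencies have been (re)installed.
record_cache_stamp() {
  local deps_file="$1" stamp_file="$2"
  cksum < "$deps_file" > "$stamp_file"
}

# Example use in a build script (paths are hypothetical):
#   if cache_is_valid DESCRIPTION "$HOME/Rlib/.deps-stamp"; then
#     echo "Dependencies unchanged: reusing cached packages"
#   else
#     echo "Dependencies changed: reinstalling"
#     # ...install packages here...
#     record_cache_stamp DESCRIPTION "$HOME/Rlib/.deps-stamp"
#   fi
```

The point is that the expensive install step only runs when the dependency list itself changes, which is where the minutes get saved.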

Why isn’t everyone doing it?

Well, it’s pretty new, and R support has only just gone into Travis generally. There are also some permissions issues, because container builds don’t let you use sudo. At the moment, the example container-based R build that Jan Tilly put together is still pretty bare bones and isn’t as easy out of the box as the mainstream version.

Move on to the cool stuff already!

When I needed an rmarkdown repository for writing proposals to the R Consortium’s Infrastructure Steering Committee (ISC), I took the opportunity to build on Jan’s code so that the ISC proposal is always web-facing. Here’s how I did it:

Used @jtilly’s .travis.yml file as the backbone
Emulated @yihui’s knitr .travis.yml file to get recent versions of pandoc and texlive so that rmarkdown works correctly
Used a shell script to sort out the git commits and push to gh-pages, and my R file to generate the Rmarkdown docs
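The push script’s job is to get the rendered HTML onto the gh-pages branch. One common way to do that is to commit the output into a fresh throwaway repository and force-push it. Here’s a minimal sketch under my own assumptions, not a copy of .push_gh_pages.sh: the function name `deploy_gh_pages`, the commit identity, and the `GH_TOKEN` variable (a GitHub personal access token stored as an encrypted Travis environment variable) are all hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: publish the contents of a directory as the only commit on
# the gh-pages branch of a remote. Hypothetical helper, not .push_gh_pages.sh.
#   $1 - directory containing the rendered site (e.g. HTML from rmarkdown)
#   $2 - remote URL or path to push to
deploy_gh_pages() {
  local site_dir="$1" remote="$2" work status
  work=$(mktemp -d) || return 1
  # Copy everything, including dotfiles, into a scratch directory.
  cp -R "$site_dir"/. "$work"/ || { rm -rf "$work"; return 1; }
  (
    cd "$work" &&
    git init --quiet &&
    git config user.name "Travis CI" &&
    git config user.email "travis@example.invalid" &&
    git add --all &&
    git commit --quiet -m "Deploy rendered docs" &&
    # Force-push replaces whatever gh-pages held before with this one commit.
    git push --force --quiet "$remote" HEAD:gh-pages
  )
  status=$?
  rm -rf "$work"
  return "$status"
}

# Hypothetical use at the end of a Travis build, with GH_TOKEN set as an
# encrypted environment variable and the docs rendered into ./out:
#   deploy_gh_pages out "https://${GH_TOKEN}@github.com/${TRAVIS_REPO_SLUG}.git"
```

Force-pushing a single fresh commit means gh-pages never accumulates history, which keeps the branch small; the trade-off is that you can’t browse older versions of the rendered docs there.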

What next?

Take a look at the key components (below) or have a play – take the proposal boilerplate and follow the instructions on configuring it to work on your account. Try adding new docs or dependencies, or add package infrastructure – in other words, hack away! Most importantly though, smile at jobs that take just a minute 😀

The R code

Source: ghgenerate.R in the stephlocke/isc-proposal repository on GitHub.

The bash code

Source: .push_gh_pages.sh in the stephlocke/isc-proposal repository on GitHub.

The travis yaml

Source: .travis.yml in the stephlocke/isc-proposal repository on GitHub.