Some web API package development lessons from HIBPwned

April 19, 2018
maelle
Data Science, R

As announced yesterday, HIBPwned version 0.1.7 has been released to CRAN! Although the release was mainly a maintenance release building on Steph’s already great code, internal changes were made to start transforming HIBPwned into a real showcase of web API package development. Let’s summarize some interesting points:

Avoiding useless requests

HIBPwned uses the memoise package in order to cache the results inside an active R session. What does this mean? Imagine the case where a user wants to check the breaches using HIBPwned::account_breaches for a bunch of email addresses, and either does not want to, or forgets to, check for duplicates. Well the API is queried only the first time an email address is used as input. The second time, it uses the cached result! Read more about Memoization on Wikipedia.

Limiting the rate for the user

Like many web APIs, haveibeenpwned.com implements a rate limit: you can’t do more than one query every 1.5 second. If you try to send more requests, you’ll get a 429 error. Instead of letting the user pace their requests, HIBpwned takes advantage of the ratelimitr package to limit the rate at which requests are made, see the source code. These few lines of code simplify the life of the user who shouldn’t deal with 429 errors, well except if they open several R sessions at once using the same IP address.

Letting the user change the user agent

By default the user agent sent to the API is “HIBPwned R pkg” but all exported functions allow the user to change it via the agent argument so it could, for example, be “Jane Doe cool Shiny app” to help the website maintainer monitor sources of traffic, or even a random string to make tracking harder as mentioned by Bob Rudis.

Testing error handling

As with many packages interacting with web APIs, HIBPwned contains code handling errors, e.g. a few retries with exponential waiting time in case of errors. Very cool, but how to test for this behavior since one cannot force the API to return error messages? Web mocking is the answer!

To do that one could use httptest but since my (Maëlle’s) other job is at rOpenSci, I am quite loyal to Scott Chamberlain’s tools and replaced the httr dependency with crul in order to be able to use webmockr for testing, since webmockr doesn’t support httr yet. See the test file in the wild and an excerpt below:

test_that("no 404 http errors are handled as expected", {
  memoise::forget(GETcontent)
  skip_on_cran()
  webmockr::enable()
  stub <- webmockr::stub_request("get", "https://haveibeenpwned.com/api/breaches") # nolint
  webmockr::to_return(stub, status = 429)
  expect_message(HIBPwned::breached_sites(), "Try")
  expect_silent(HIBPwned::breached_sites(verbose = FALSE))
  output <- HIBPwned::breached_sites()
  expect_null(output)
  webmockr::disable()

})

As you see memoise makes it slightly more complicated, but it was not too involved and allows us to check errors are handled as we want, which makes the package more robust!

Future improvements will include being able to test the behaviour in case of say, 2 422 errors and then 1 200 success from the API, which isn’t possible yet. Furthermore, vcr could be used to allow saving and replaying API responses for testing without sending new requests. Other thoughts about testing web API packages can be found in this thread on rOpenSci discussion forum.

Let’s end by saying things aren’t perfect yet, obviously. If you notice something you think could be improved, feel free to open an issue as a way to notify us… and to discuss a future pull request of yours? We’re also happy to mentor newbies!

Some web API package development lessons from HIBPwned

Avoiding useless requests

Limiting the rate for the user

Letting the user change the user agent

Testing error handling

Search