Some web API package development lessons from HIBPwned
As announced yesterday, HIBPwned
version 0.1.7 has been released to CRAN! Although the release was mainly a maintenance release building on Steph’s already great code, internal changes were made to start transforming HIBPwned
into a real showcase of web API package development. Let’s summarize some interesting points:
Avoiding useless requests
HIBPwned
uses the memoise
package in order to cache the results inside an active R session. What does this mean? Imagine the case where a user wants to check the breaches using HIBPwned::account_breaches
for a bunch of email addresses, and either does not want to, or forgets to, check for duplicates. Well the API is queried only the first time an email address is used as input. The second time, it uses the cached result! Read more about Memoization on Wikipedia.
Limiting the rate for the user
Like many web APIs, haveibeenpwned.com implements a rate limit: you can’t do more than one query every 1.5 second. If you try to send more requests, you’ll get a 429 error. Instead of letting the user pace their requests, HIBpwned
takes advantage of the ratelimitr
package to limit the rate at which requests are made, see the source code. These few lines of code simplify the life of the user who shouldn’t deal with 429 errors, well except if they open several R sessions at once using the same IP address.
Letting the user change the user agent
By default the user agent sent to the API is “HIBPwned R pkg” but all exported functions allow the user to change it via the agent
argument so it could, for example, be “Jane Doe cool Shiny app” to help the website maintainer monitor sources of traffic, or even a random string to make tracking harder as mentioned by Bob Rudis.
Testing error handling
As with many packages interacting with web APIs, HIBPwned
contains code handling errors, e.g. a few retries with exponential waiting time in case of errors. Very cool, but how to test for this behavior since one cannot force the API to return error messages? Web mocking is the answer!
To do that one could use httptest
but since my (MaĆ«lle’s) other job is at rOpenSci, I am quite loyal to Scott Chamberlain’s tools and replaced the httr
dependency with crul
in order to be able to use webmockr
for testing, since webmockr
doesn’t support httr
yet. See the test file in the wild and an excerpt below:
test_that("no 404 http errors are handled as expected", {
memoise::forget(GETcontent)
skip_on_cran()
webmockr::enable()
stub <- webmockr::stub_request("get", "https://haveibeenpwned.com/api/breaches") # nolint
webmockr::to_return(stub, status = 429)
expect_message(HIBPwned::breached_sites(), "Try")
expect_silent(HIBPwned::breached_sites(verbose = FALSE))
output <- HIBPwned::breached_sites()
expect_null(output)
webmockr::disable()
})
As you see memoise
makes it slightly more complicated, but it was not too involved and allows us to check errors are handled as we want, which makes the package more robust!
Future improvements will include being able to test the behaviour in case of say, 2 422 errors and then 1 200 success from the API, which isn’t possible yet. Furthermore, vcr
could be used to allow saving and replaying API responses for testing without sending new requests. Other thoughts about testing web API packages can be found in this thread on rOpenSci discussion forum.
Let’s end by saying things aren’t perfect yet, obviously. If you notice something you think could be improved, feel free to open an issue as a way to notify us… and to discuss a future pull request of yours? We’re also happy to mentor newbies!