How many CRAN package maintainers have been pwned?
The alternative title of this blog post is HIBPwned
version 0.1.7 has
been released! W00t!. Steph’s HIBPwned
package utilises the
HaveIBeenPwned.com API to check
whether email addresses and/or user names have been present in any
publicly disclosed data breach. In other words, this package potentially
delivers bad news, but useful bad news!
This release is mainly a maintenance release, with some cool code
changes invisible to you, the user, but not only that: you can now get
account_breaches
for several accounts in a data.frame
instead of a
list, and you’ll be glad to know that results are cached inside an
active R session. You can read about more functionalities of the package
in the function reference.
Wouldn’t it be a pity, though, to echo the release notes without a nifty use case? Another blog post will give more details about the technical aspects of the release, but here, let’s make you curious! How many CRAN package maintainers have been pwned?
Get the email addresses of CRAN package maintainers
library("magrittr")
Data was gathered thanks to an adaptation of the code published in this blog post of David Smith’s about prolific package maintainers. We are after the most endangered package maintainers on CRAN!
The helper function below extracts the email address of a string such as
“Jane Doe jane.doe@fakedomain.io”. On top of using the as.person
conversion, this function also deals with a few particular cases.
get_maintainer_email <- function(maintainer_string){
if(inherits(maintainer_string, "data.frame")){
maintainer_string <- maintainer_string$Maintainer[1]
}
if(maintainer_string != "ORPHANED"){
maintainer_string <- stringr::str_replace_all(maintainer_string,
'"', '')
maintainer_string <- stringr::str_replace_all(maintainer_string,
',', '')
# particular case!
maintainer_string <- stringr::str_replace_all(maintainer_string,
'Berlin School of Economics and Law', '')
maintainer <- as.person(maintainer_string)
maintainer$email
}else{
""
}
}
Here it is in action.
get_maintainer_email("Jane Doe <jane.doe@fakedomain.io>")
#> [1] "jane.doe@fakedomain.io"
The following code then gathers the email addresses of all CRAN package maintainers.
tools::CRAN_package_db() %>%
.[, c("Package", "Maintainer")] %>%
tidyr::nest(Maintainer, .key = "Maintainer") %>%
# get the email out of the maintainer
dplyr::mutate(email = purrr::map_chr(Maintainer,
get_maintainer_email)) %>%
dplyr::select(- Maintainer) %>%
# only keep the ones with email
dplyr::filter(email != "") %>%
# save result
readr::write_csv(path = "data/all_packages.csv")
emails <- readr::read_csv("data/all_packages.csv")
We obtained 12444 packages with 7173 unique email addresses. We do not
have to care about their uniqueness: since HIBPwned
implements caching
inside an active R session via
memoise
duplicate emails do not
mean duplicate requests! :nail_care: Another aspect we users do not
need to care about is rate limiting: HIBPwned
uses the nice
ratelimitr
package in order
to automatically pause R when needed.
So, have CRAN package maintainers been pwned?
Thanks to setting the new as_list
option to FALSE we get a data.frame
as output. Note that choosing this means we only get back accounts with
breaches. Depending on the analysis, we could supplement the original
emails
data.frame with the information using dplyr::left_join
for
instance.
pwned <- HIBPwned::account_breaches(emails$email,
as_list = FALSE)
pwned <- unique(pwned)
There are 7173 unique CRAN maintainer emails, among which 3613 i.e. 50% have been pwned. Dear reader, why not compare this to the proportion of Python module maintainers who’ve been pwned? Ping us if you complement this analysis!
Looking at these pwned maintainers, here are the number of breaches they’ve been victims of:
pwned %>%
dplyr::count(account) %>%
dplyr::summarise(median = median(n),
min = min(n),
max = max(n)) %>%
knitr::kable()
median | min | max |
---|---|---|
2 | 1 | 18 |
There are 136 unique breaches. What were the most common ones?
pwned %>%
dplyr::group_by(Title, BreachDate) %>%
dplyr::tally() %>%
dplyr::arrange(desc(n)) %>%
head(10) %>%
knitr::kable()
Title | BreachDate | n |
---|---|---|
Dropbox | 2012-07-01 | 1534 |
2012-05-05 | 1140 | |
Onliner Spambot | 2017-08-28 | 943 |
GeekedIn | 2016-08-15 | 782 |
Adobe | 2013-10-04 | 694 |
MDPI | 2016-08-30 | 558 |
Last.fm | 2012-03-22 | 350 |
NetProspex | 2016-09-01 | 310 |
B2B USA Businesses | 2017-07-18 | 279 |
Disqus | 2012-07-01 | 259 |
Maybe or probably some you’ve heard of, which might make you wonder about your own security, being a CRAN maintainer or not…
What about you?
You could check if you’ve been victim of any known breach right now by
installing HIBPwned
from CRAN!
# install.packages("HIBPwned")
str(HIBPwned::account_breaches("steff.sullivan@gmail.com"))
#> List of 1
#> $ steff.sullivan@gmail.com:'data.frame': 4 obs. of 16 variables:
#> ..$ Title : chr [1:4] "Adobe" "Disqus" "LinkedIn" "Onliner Spambot"
#> ..$ Name : chr [1:4] "Adobe" "Disqus" "LinkedIn" "OnlinerSpambot"
#> ..$ Domain : chr [1:4] "adobe.com" "disqus.com" "linkedin.com" ""
#> ..$ BreachDate : chr [1:4] "2013-10-04" "2012-07-01" "2012-05-05" "2017-08-28"
#> ..$ AddedDate : chr [1:4] "2013-12-04T00:00:00Z" "2017-10-06T23:03:51Z" "2016-05-21T21:35:40Z" "2017-08-29T19:25:56Z"
#> ..$ ModifiedDate: chr [1:4] "2013-12-04T00:00:00Z" "2017-10-06T23:03:51Z" "2016-05-21T21:35:40Z" "2017-08-29T19:25:56Z"
#> ..$ PwnCount : int [1:4] 152445165 17551044 164611595 711477622
#> ..$ Description : chr [1:4] "In October 2013, 153 million Adobe accounts were breached with each containing an internal ID, username, email,"| __truncated__ "In October 2017, the blog commenting service <a href=\"https://blog.disqus.com/security-alert-user-info-breach\"| __truncated__ "In May 2016, <a href=\"https://www.troyhunt.com/observations-and-thoughts-on-the-linkedin-data-breach\" target="| __truncated__ "In August 2017, a spambot by the name of <a href=\"https://benkowlab.blogspot.com.au/2017/08/from-onliner-spamb"| __truncated__
#> ..$ DataClasses :List of 4
#> .. ..$ : chr [1:4] "Email addresses" "Password hints" "Passwords" "Usernames"
#> .. ..$ : chr [1:3] "Email addresses" "Passwords" "Usernames"
#> .. ..$ : chr [1:2] "Email addresses" "Passwords"
#> .. ..$ : chr [1:2] "Email addresses" "Passwords"
#> ..$ IsVerified : logi [1:4] TRUE TRUE TRUE TRUE
#> ..$ IsFabricated: logi [1:4] FALSE FALSE FALSE FALSE
#> ..$ IsSensitive : logi [1:4] FALSE FALSE FALSE FALSE
#> ..$ IsActive : logi [1:4] TRUE TRUE TRUE TRUE
#> ..$ IsRetired : logi [1:4] FALSE FALSE FALSE FALSE
#> ..$ IsSpamList : logi [1:4] FALSE FALSE FALSE TRUE
#> ..$ LogoType : chr [1:4] "svg" "svg" "svg" "png"
If one of your addresses has been pwned, Steph says you should change passwords in other locations is you re-used passwords. Even if your address hasn’t been pwned yet you should use a password manager that will allow you not to re-use passwords, and set up two factor authentication, e.g. read more about 2FA for GitHub.
And how could you now right away if a known data breach is of concern to
you? Well, don’t only let your .Rprofile tell you you’re a rrrrock
star, but
also add some code checking whether you’ve been pwned, as explained in
this blog
post!
Pro-tip, you can use the usethis::edit_r_profile
function to easily
open your .Rprofile. Steph also says you can register for HIBPwned.com
notifications and ask your
organisation to watch breaches at the domain level. Stay safe!