Conference sessions – basic web scraping in R

It’s a bit sad, but I enjoy dissecting which sessions are submitted to the conferences I’m involved in or speak at. Instead of doing it mainly by eye, I’ve started dabbling in web scraping in R. My first attempt used RCurl, and my latest snippet uses rvest.

The first snippet, for SQLBits, uses RCurl, but it’s cumbersome, and for SQLSaturday Exeter there is also SSL to contend with. Using rvest makes it really easy, and it was an excellent excuse to get around to using magrittr, the pipe package for R by Stefan Milton Bache and Hadley Wickham.
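To give a flavour of why rvest and the pipe work so nicely together, here is a minimal sketch (not the code from the Gist). The HTML fragment, the CSS classes (`session`, `title`, `level`) and the variable names are all made up for illustration; a real schedule page will have its own markup. It assumes rvest 1.0 or later for `html_elements()`.

```r
library(rvest)     # read_html(), html_elements(), html_text()
library(magrittr)  # provides the %>% pipe

# Hypothetical stand-in for a conference schedule page.
# For a live page you would pass the URL to read_html() instead.
page <- read_html('
  <html><body>
    <div class="session"><span class="title">Intro to R</span><span class="level">100</span></div>
    <div class="session"><span class="title">Query tuning</span><span class="level">300</span></div>
    <div class="session"><span class="title">R for DBAs</span><span class="level">100</span></div>
  </body></html>')

# Each step reads left to right: take the page, pick nodes, extract text.
titles <- page %>%
  html_elements(".session .title") %>%
  html_text()

levels <- page %>%
  html_elements(".session .level") %>%
  html_text()

# One row per session, ready for counting and plotting.
sessions <- data.frame(title = titles, level = levels, stringsAsFactors = FALSE)
print(sessions)
```

The pipe keeps each scraping step on its own line, which makes it easy to swap in a different CSS selector when the page layout changes.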

Blogger tip: I also wanted to see how Gists import into WordPress. You just copy and paste the Gist URL into the post (into the code view, with no URL markup) and WordPress automatically pulls in the Gist. For more info on this, see WordPress’ article on Gist.


This was my first stab at doing this and I’m sure I’ll upgrade it soon!

[SQLBits Session distributions][1]

SQLSaturday Exeter

This is the code with rvest in. I also took it a step further to do some preliminary analysis of speakers, and to have fun building a wordcloud!
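The kind of preliminary analysis mentioned above can be sketched in base R along these lines. This is an illustration only: the `sessions` data frame is invented sample data standing in for the scraped results, and the tiny stop-word list is an assumption for the example.

```r
# Hypothetical session data standing in for the scraped results.
sessions <- data.frame(
  title = c("Intro to R", "Query tuning basics", "R for DBAs", "Tuning SSIS"),
  level = c("100", "300", "100", "200"),
  stringsAsFactors = FALSE
)

# Session level distribution: count sessions per level.
level_counts <- table(sessions$level)

# Word frequencies: lower-case the titles, split on anything that is not
# a letter, then drop empty strings and a couple of stop words.
words <- unlist(strsplit(tolower(sessions$title), "[^a-z]+"))
words <- words[nchar(words) > 0 & !words %in% c("to", "for")]
word_freq <- sort(table(words), decreasing = TRUE)

print(level_counts)
print(word_freq)

# With the wordcloud package installed, the frequencies can be drawn as a cloud:
# wordcloud::wordcloud(names(word_freq), as.integer(word_freq), min.freq = 1)
```

Passing `level_counts` to `barplot()` would give a quick session level chart from the same table.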

[SQL Saturday Exeter 2015 word cloud][2]
[SQL Saturday Exeter 2015 session level distribution][3]