R for database and Excel people

September 15, 2013
Steph
Microsoft Data Platform, Misc Technology, R

What is R?

R is a statistical language for doing all sorts of analytics on many different types of data. It’s also an open source platform, so people can extend the base functionality. More details are available from the horse’s mouth.

How can I give it a go?

Download R and RStudio, an awesome development environment for R. There is also an excellent online R learning site. I do not recommend sticking with just R: we’re used to a lot more convenience and good development bits and bobs like IntelliSense, and RStudio really delivers.

Why should I care about it?

If you do reporting or analysis of any form, R can be *very* useful in addition to SQL, Excel, SSRS etc.

If you’re a pure DBA with very few analytical responsibilities, you may still be interested in it because you can encourage others to use it for activities that SQL is not suited to, thus reducing the problems they cause on the database.

For developers, it gives you the ability to produce interactive graphics and visualisations on the interweb, either using Shiny or by embedding them directly in ASP.

Why ‘R for database and Excel people’?

R was designed and built by academic statisticians who have been using S (the commercial R predecessor), MATLAB, SAS and other odd programs, and who form an insular user group speaking a really weird language full of jackknifes and lassos. If you come from a technical, computing or general analytics background, you are unlikely to understand this easily without a Google Translate button. The design choices, the language, and the help pages can be almost wholly impenetrable and a constant source of frustration! After six months grappling with R entirely on my own, with just the internet for succour and help, I’d like to pass on my translations of hard-won knowledge to you, the reader.

I’ll cover topics in the order that someone with SQL and Excel knowledge would likely look to tackle them. This means I won’t necessarily cover R topics in the way a stats person might approach them, because the challenges we face tend to be different from a statistician’s. I also intend to bypass ‘base’ methods for solving typical challenges and skip straight to the way I’ve found most intuitive after experimenting with different packages.

What will be covered?

Setting up an R server
Connecting to SQL Server
Effective data storage and the data.table package
Pivoting data
Aggregations
Dealing with time
Producing charts
Outputting to files
Producing reports
Building your own packages
Source control
Unit testing
Best practices

When will these be covered?

I plan on writing and publishing one of these each week for your edification.