Bride of Frankenstein: TFS + R

Trying to use TFS as the central repository for my R code over the past year has been a tough, unholy abomination of a job, and you may or may not be looking at the screen as if I’m a crazy fool for even trying. Now I have good news, though, because I’ve broken the back of the main issue I had with TFS. The crucial link was being able to programmatically create Git repositories within a single project for small projects.

Using the API, I’ve been able to write an R package with functions that now save me at least 15 minutes of time and effort each time I want a new project. So I can happily holler “IT’S ALIVE!!”

What’re you source controlling? Well, we have internal developments like R packages, and we have a lot of analysis we’re doing in R. I also have databases, some web apps and a few other bits and bobs. I’m a firm believer that many of the best practices of developers are also best practices for DBAs and BI professionals, whether we know it yet or not. This means stuff like source control, unit testing, continuous integration, and eventually continuous deployment are big goals in my mind. Therefore I’m trying to source control pretty much everything that is a script or documentation, i.e. all databases, reports, manuals etc.

Great goal but still, why TFS? Team Foundation Server is/was (depending on who you ask) the de facto source control system of the Microsoft IT shop. It has a bunch of current or emerging plus points in its favour:

  • IT are already using it
  • Theoretically, it has nifty build & testing tools/integration
  • The project management layer is quite useful
  • You can use the Git engine instead of the traditional centralised system, so it’ll work with R etc
  • The recent Visual Studio Online means that you can start doing your dev in the cloud
  • Visual Studio Community Edition will see more support for open source languages in the future

But, why not GitHub? Well, GitHub is super neat and we do release some of our open source stuff on there, but we need commercially sensitive or internal-only things to be private, and it costs (relatively) a fair bit to use GitHub for BI when you could easily spawn 50 private repos per quarter.

So, why not BitBucket? BitBucket gives you unlimited private repos and has lots of bells and whistles. This was going to be my next port of call, if I couldn’t get TFS working.

OK, I take your point, so what’s the problem? Project creation! When I want to create a project to put some analysis in, I have to open Visual Studio, create a blank solution, go to the Team Explorer pane, tell it to create a new project, go through more than 5 panes of a wizard, wait ages for the SharePoint and SSRS guff to be configured, then switch over to RStudio Server (on my Linux box, dontchaknow), pull the blank repository, create a project in it (or drop in all my files), commit, then push my project skeleton. That was incredibly painful, and it was compounded by a couple of other facts:

  1. Visual Studio is a horrible program when it comes to updates. Most recently, I was unable to use database projects due to a botched upgrade (it needed UAC, fine, I got an admin to put in the details, but it also needed VS to be running as an admin, which it didn’t say until it broke), and of course VS is almost impossible to completely uninstall or repair, which meant a complete rebuild. So I hate having to rely on it!

  2. I have MSDN, but do I want to give my head of finance and myriad other people Visual Studio just so they can work safely? It’s a big cost and effort sink. Plus there’s the cognitive dissonance involved for them when I say “so, to use R, you need to open this completely unrelated program”. This mattered more when we weren’t looking at VS Online.

Arg, stop talking! I’m frustrated just listening to you! Totally understandable, my dear chum. After a long while of investigating git-tfs and the Cross-Platform Command Line, and finding that neither could create projects, a delightful chap called Buck Hodges at Microsoft pointed me in the direction of the API and suggested creating repositories inside projects. For me, projects and repositories had been the same thing, but no: you can hold many repositories in a single project, and you can create these repositories with a call to an API. This made me literally jump for joy and start yammering to Jan, my colleague.

So now what? The icing on the cake is that, using Hadley’s awesome httr package, I’ve been able to put together an R package that allows you to create repositories in a TFS project from inside R, so that people never have to go outside RStudio (or their favourite R IDE) to do their source control work.

What’s the catch? If you want a standalone project, you still have to go through the old process, and I haven’t tested on older TFS versions.

Bye, you crazy so-and-so! Bye, and if you find yourself wanting to avoid using Visual Studio for managing TFS, do take a look at how I’ve interacted with the API and write your own tool in the language of your choice.
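
For anyone taking that route, here is a minimal sketch of the kind of request involved, in Python rather than R. It is not the code from the R package. The function name and every value in it are made up for illustration. The URL shape, the `api-version` value and the JSON body reflect my understanding of the Git repositories endpoint of the REST API, so check them against the documentation for your server version. Basic authentication is also an assumption, and an on-premises server may well want something else. The sketch only builds the request and does not send it.

```python
import base64
import json
import urllib.request


def build_create_repo_request(collection_url, project_id, repo_name,
                              username, password, api_version="1.0"):
    """Build (but do not send) a POST request asking the server to create
    a Git repository inside an existing team project.

    Hypothetical helper: endpoint path, api-version and body layout are
    assumptions to verify against your server's REST API documentation.
    """
    url = "{}/_apis/git/repositories?api-version={}".format(
        collection_url.rstrip("/"), api_version)

    # The new repository is attached to an existing project by its ID.
    payload = {"name": repo_name, "project": {"id": project_id}}

    # Assumed auth scheme: HTTP basic authentication.
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")

    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )


# Placeholder values for illustration only.
request = build_create_repo_request(
    "https://example.visualstudio.com/DefaultCollection",
    "00000000-0000-0000-0000-000000000000",
    "my-analysis",
    "user",
    "secret",
)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`. The project ID above is a placeholder, so you would need the real identifier of your existing team project.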

UPDATE: Project creation is nowhere near so bad in VSO: you can create the project on the site and then use the instructions to force a local repository into the online one, although you may have to move some of the arguments around in the git push command VSO gives you.