Locke Data Blog

Locke Data helps organisations get started with data science. Grow your skills with our blog posts.

Working with R

  • October 5, 2017
  • steph
I’ve been pretty quiet on the blog front recently. That’s because I overhauled my site, migrating it to Hugo (the foundation of blogdown). Not content with just one extra thing on top of my usual workload, I also wrote a book! I’m a big book fan, and I’m especially a Kindle Unlimited fan (all the books you can read for £8 per month, heck yeah!) so I wanted to make books that I could publish and see on Kindle Unlimited. Read More

All my talks in one place (plus a Hugo walkthrough!)

  • June 17, 2017
  • steph
I mentioned in an earlier post that I’m revamping my presentation slides process but hadn’t tackled the user experience of browsing my slides, which wasted lots of the effort I put in. To tackle this part of it, I’ve made lockedata.uk using Hugo as a way of finding and browsing presentations on R, SQL, and more. As Hugo is so easy, I thought I’d throw in a quick Hugo walkthrough too so that you could build your own blog/slides/company site if you wanted to. Read More

Why data people don’t do devops

  • June 13, 2017
  • steph
For T-SQL Tuesday #91 the topic is databases and devops. Grant Fritchey asks us: How do we approach DevOps as developers, DBAs, report writers, analysts and database developers? How do we deal with data persistence, process, source control and all the rest of the tools and mechanisms, and most importantly, culture, that would enable us to get better, higher functioning teams put together? Please, tell me your DevOps stories. Read More

Using purrr with APIs – revamping my code

  • June 13, 2017
  • Steph
I wrote a little while back about using Microsoft Cognitive Services APIs with R to first of all detect the language of pieces of text and then do sentiment analysis on them. I wasn’t too happy with some of the code as it was very inelegant. I knew I could code better than I had, especially as I’ve been doing a lot more work with purrr recently. However, it had sat in drafts for a while. Read More

R and Data Science activities in London, June 27th – 29th

  • June 7, 2017
  • Steph
Locke Data will be up to some shenanigans of various stripes in the big smoke. We hope to see you at some of them! June 26th — Monday: Introduction to R (Newcastle). I won’t be in London for this but I’ll be doing a day of Introduction to R in Newcastle. This is supporting the local user groups and costs up to £90 for the whole day. Intro to R in Newcastle, June 26th Read More

Versioning R model objects in SQL Server

  • May 26, 2017
  • Steph
High-level info If you build a model and never update it you’re missing a trick. Behaviours change so your model will tend to perform worse over time. You’ve got to regularly refresh it, whether that’s adjusting the existing model to fit the latest data (recalibration) or building a whole new model (retraining), but this means you’ve got new versions of your model that you have to handle. You need to think about your methodology for versioning R model objects, ideally before you lose any versions. Read More

How to change “No match found!” on your no-code Q&A bot

  • May 22, 2017
  • Steph
Last week, I blogged about building a no-code Q&A bot for your website. One little niggle I had with the bot was the response when it couldn’t match a user input to a Q&A. I wondered how to change “No match found!”. I looked around the qnamaker.ai site and couldn’t find a place I could change this. I submitted some feedback and the great people at the other end of the Q&A site responded super quickly. Read More

Improving automatic document production with R

  • May 19, 2017
  • Steph
In this post, I describe the latest iteration of my automatic document production with R. It improves upon the methods used in Rtraining, and previous work on this topic can be read by going to the auto deploying R documentation tag. I keep banging on about this area because reproducible research / analytical document pipelines is an area I’ve a keen interest in. I see it as a core part of DataOps as it’s vital for helping us ensure our models and analysis are correct in data science and boosting our productivity. Read More

Easy-peasy Q&A bot

  • May 15, 2017
  • Steph
Everyone seems to have a live chat option for their site but I’m frequently away, so I wanted something that people could talk to interactively. This is a perfect scenario for a Q&A bot. Microsoft takes a ton of the pain out of Q&A bots, and it was much easier than I thought to get it added to my WordPress blog. Here is how to do it for your site. Read More

How to go about interpreting regression coefficients

  • May 12, 2017
  • Steph
Following my post about logistic regressions, Ryan got in touch about one bit of building logistic regression models that I didn’t cover in much detail – interpreting regression coefficients. This post will hopefully help Ryan (and others) out. This was so helpful. Thank you! I'd love to see more about interpreting the glm coefficients. — Ryan (@RyanEs) April 21, 2017 What is a coefficient? Coefficients are what a line of best fit model produces. Read More
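
As a rough illustration of "what a line of best fit model produces", here is a minimal sketch in base R. The data is made up so that the true relationship is known in advance (y = 2 + 3x); it is not taken from the post itself.

```r
# Simulated data with a known relationship: y = 2 + 3 * x (no noise),
# so the fitted coefficients are easy to check by eye.
x <- 1:10
y <- 2 + 3 * x

fit <- lm(y ~ x)
coefs <- coef(fit)
print(round(coefs, 6))

# (Intercept): the predicted y when x is 0.
# x:           the change in predicted y for a one-unit increase in x.
```

With real, noisy data the coefficients are estimates rather than exact values, but they are read the same way.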

datasauRus now on CRAN

  • May 9, 2017
  • Steph
datasauRus is a package storing the datasets from the paper Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing. It’s a useful package for:

  • Having a dinosaur dataset
  • Showing a dinosaur-related variant of Anscombe’s Quartet

You can now get datasauRus on CRAN, though it might not be on all mirrors just yet.

install.packages("datasauRus")

Credit: This package wouldn’t exist without some nifty people: Read More

R Quick Tip: parameter re-use within rmarkdown YAML

  • May 8, 2017
  • Steph
Ever wondered how to make an rmarkdown title dynamic? Maybe you wanted to use a parameter in multiple locations, or to pass through a publication date? Advanced use of YAML headers can help! Normally, when we write rmarkdown, we might use something like the basic YAML header that the rmarkdown template gives us.

---
title: "My report"
date: "18th April, 2017"
output: pdf_document
---

You may already know the trick of making the date dynamic, so that it shows whatever date the report gets rendered on, by using the inline R execution mode of rmarkdown to insert a value. Read More

Minor update to HIBPwned

  • May 5, 2017
  • Steph
A new version of HIBPwned has been accepted onto CRAN. This occurred yesterday so it could still be filtering into some mirrors. HIBPwned is an R wrapper for the useful website HaveIBeenPwned and if you don’t already utilise the package or the site – you should. HaveIBeenPwned tells you when your details are included in data breaches. This is vital information to get quickly as it means you can sooner protect yourself from people trying to use the breach information to break into your accounts. Read More

Error installing latest R version (3.4.0) on Windows

  • May 3, 2017
  • Steph
UPDATE: R 3.4.1 does not have this problem so you can install that version instead.

If you’re getting the following error when you’ve installed R 3.4.0 on Windows, you’re not alone.

Error in if (file.exists(dest) && file.mtime(dest) > file.mtime(lib) && : missing value where TRUE/FALSE needed

The R team have released a patched version but right now it’s a little difficult to find out about. If you need/want the patched version, it’s available at: Read More

The making of datasauRus

  • May 2, 2017
  • Steph
Around 8:30pm I saw this tweet and duly retweeted https://t.co/WuyU9D6npK — Richie Cotton (@richierocks) May 1, 2017 It turns out awesome folks, George and Justin, had made a process whereby they can generate different distributions of points that retain the same summary statistics. They used this process for making some friends for Dino the Datasaurus who was created by Alberto Cairo. They made the data for Dino and the rest of the Datasaurus Dozen available for download. Read More

Getting started with data science – recommended resources

  • May 2, 2017
  • Steph
An oft asked question is what resources can I recommend for getting started with data science? Here are my recommendations, and if you have others, please put them in the comments! NB Links in this post may be affiliate links – it doesn’t change the prices you get but might earn me a little money. Books: Data Science for Business is a great overview book. Read More

R Quick Tip: Upload multiple files in shiny and consolidate into a dataset

  • April 28, 2017
  • Steph
In shiny, you can use the fileInput with the parameter multiple = TRUE to enable you to upload multiple files at once. But how do you process those multiple files in shiny and consolidate into a single dataset? The bit we need from shiny is the input$param$fileinputpath value. We can use lapply() with data.table’s fread() to read multiple CSVs from the fileInput(). Then to consolidate the data, we can use data. Read More
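
The read-then-consolidate step can be sketched outside shiny. This is a minimal sketch under some assumptions: the teaser is cut off before it names the consolidation function, so `rbindlist()` is my choice, not necessarily the post's. `consolidate_uploads` is a made-up helper name, and the two temporary CSVs stand in for real uploads. The `datapath` column is where shiny's `fileInput()` stores the temporary file location for each upload.

```r
library(data.table)

# `files` stands in for the data frame shiny gives you for a
# fileInput(..., multiple = TRUE): one row per upload, with the
# temporary file location in the `datapath` column.
consolidate_uploads <- function(files) {
  tables <- lapply(files$datapath, fread)
  # Stack the tables, matching columns by name (assumed choice).
  rbindlist(tables, use.names = TRUE, fill = TRUE)
}

# Stand-alone demo with two temporary CSVs instead of real uploads.
paths <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
fwrite(data.table(id = 1:2, value = c("a", "b")), paths[1])
fwrite(data.table(id = 3:4, value = c("c", "d")), paths[2])

uploads <- data.frame(name = basename(paths), datapath = paths,
                      stringsAsFactors = FALSE)
combined <- consolidate_uploads(uploads)
print(combined)
```

Inside a shiny server function the call would look something like `consolidate_uploads(input$files)`, where `files` is whatever id was given to the `fileInput()`.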

Building an R training environment

  • April 24, 2017
  • Steph
I recently delivered a day of training at SQLBits and I really upped my game in terms of infrastructure for it. The resultant solution was super smooth and mitigated all the install issues and preparation for attendees. This meant we got to spend the whole day doing R, instead of troubleshooting. I’m so happy with the solution for an online R training environment that I want to share the solution, so you can take it and use it for when you need to do training. Read More

Logistic regressions (in R)

  • April 21, 2017
  • Steph
Logistic regressions are a great tool for predicting outcomes that are categorical. They use a transformation function based on probability to perform a linear regression. This makes them easy to interpret and implement in other systems. Logistic regressions can be used to perform a classification for things like determining whether someone needs to go for a biopsy. They can also be used for a more nuanced view by using the probabilities of an outcome for things like prioritising interventions based on likelihood to default on a loan. Read More
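
To make the two uses (classification and ranking by probability) concrete, here is a minimal sketch using base R's `glm()` on the built-in `mtcars` data. The dataset, the 0.5 cut-off, and the variable names are illustrative choices of mine, not the post's.

```r
# Built-in mtcars data: am is 0 (automatic) or 1 (manual), wt is weight.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# The model is linear on the log-odds (logit) scale ...
log_odds <- predict(fit, type = "link")
# ... and the inverse logit turns log-odds back into probabilities.
prob <- plogis(log_odds)

# Classification: apply a cut-off to the probabilities.
predicted_class <- as.integer(prob > 0.5)

# Prioritisation: rank cases by probability instead of cutting off.
ranked <- names(sort(prob, decreasing = TRUE))

# exp() of a coefficient is the odds ratio for a one-unit increase.
odds_ratio <- exp(coef(fit))
print(odds_ratio)
```

The probabilities computed by hand here are the same values that `predict(fit, type = "response")` returns.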

R Quick Tip: Table parameters for rmarkdown reports

  • April 19, 2017
  • Steph
The recent(ish) advent of parameters in rmarkdown reports is pretty nifty but there’s a little bit of behaviour that can come in handy but doesn’t come across in the documentation. You can use table parameters for rmarkdown reports. Previously, if you wanted to produce multiple reports based off a dataset, you would make the dataset available and then perform filtering in the report. Now we can pass the filtered data directly to the report, which keeps all the filtering logic in one place. Read More
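
A rough sketch of passing pre-filtered data to a report might look like the following. It assumes a template called `report.Rmd` whose YAML header declares a `data` parameter. The file name, the parameter name, and the `render_subset` helper are all hypothetical.

```r
# Do the filtering once, outside the report.
by_cyl <- split(mtcars, mtcars$cyl)

# Hypothetical helper: renders one report per subset. It assumes a
# template "report.Rmd" whose YAML header declares a `data` parameter.
render_subset <- function(df, label, template = "report.Rmd") {
  rmarkdown::render(
    input = template,
    params = list(data = df),
    output_file = paste0("report-", label, ".html")
  )
}

# Only attempt rendering when the template is actually present.
if (file.exists("report.Rmd")) {
  Map(render_subset, by_cyl, names(by_cyl))
}
```

Inside the report, the subset would then be available as `params$data`, so the report itself needs no filtering code.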

Building your booth presence (SCE p4)

  • April 18, 2017
  • Steph

Building your booth presence is the fourth instalment of the Sponsoring Community Events series aimed at helping companies get to grips with sponsoring community events and getting the most out of them. This post covers some of the things that you should be thinking about when you are planning on having a booth at an event.

Read More

Community workshops

  • April 14, 2017
  • Steph
Following on from when we announced the availability of our community workshops, we’ve got three in the next three months that folks can attend. May 19th – Data science project in a day We’ll be in Kiev, Ukraine, doing a whole data science project in a day. This is intended to give people a little bit of code, process, and critical thinking along the whole data science workflow. This will enable folks to see how it hangs together and decide where and how much they want to invest their learning in future. Read More

Battle of the Beards: access it online

  • March 20, 2017
  • Steph
A fortnight ago I wrote Dear South Wales & Bristol readers: I need your help as Battle of the Beards was failing. Well, we didn’t manage to make it to the minimum number of attendees to go forward with it and on Thursday it looked like I was going to have to cancel it. Then inspiration struck … the major reasons why maybe people didn’t want to attend are:

  • Too far to travel
  • Hard to get a day off work
  • Not interested in all the talks

What would make it so people didn’t have to travel, could watch from work, and see only the sessions they’re interested in? Read More

Dear South Wales & Bristol readers: I need your help

  • March 6, 2017
  • Steph

I need your help.

Battle of the Beards is an annual tech event in Cardiff that’s previously been an evening affair but is now a day-long conference. We’re hosting half-hour talks on security, infrastructure, software craftsmanship, front-end development, and data visualisation. We’re starting out the day with bacon baps and it just gets better from there. Tickets cost £15 and there’s the option to add a donation to the charity we’re supporting with the event, the Campaign Against Living Miserably.

Right now ticket sales are really low and without your help, we’ll have to cancel the event.

I’m hoping you can help make this event succeed by doing one or both of two things:

  • register if it’s of interest to you
  • recommend it to others

Register: https://battleofthebeards.eventbrite.co.uk

Read More

R Quick tip: Microsoft Cognitive Services’ Text Analytics API

  • March 1, 2017
  • Steph

Today in class, I taught some fundamentals of API consumption in R. As it was aligned to some Microsoft content, we first used HaveIBeenPwned.com’s API and then played with Microsoft Cognitive Services’ Text Analytics API. This brief post overviews what you need to get started, and how you can chain consecutive calls to these APIs in order to perform multi-lingual sentiment analysis.

UPDATE: See improved code in Using purrr with APIs – revamping my code

Read More

Announcing community R workshops

  • February 27, 2017
  • Steph

A big part of why I’ve launched Locke Data is so that I can give back more to my communities. I want to give more time and more support to others. One of the first steps is doing some activities that give financial support to community groups without damaging my startup cashflow! Community R workshops that fund local user groups is the first activity I’ll be trialling.

Here’s what’s involved, and what you might want to consider if you’d like to be a part of this endeavour:

Read More

Quick tip: knitr Python Windows setup checklist

  • February 22, 2017
  • Steph
One of the nifty things about using R is that you can use it for many different purposes and even other languages! If you want to use Python in your knitr docs or the newish RStudio R notebook functionality, you might encounter some fiddliness getting all the moving parts running on Windows. This is a quick knitr Python Windows setup checklist to make sure you don’t miss any important steps. Read More

Is my time series additive or multiplicative?

  • February 20, 2017
  • Steph

Time series data is an important area of analysis, especially if you do a lot of web analytics. To be able to analyse time series effectively, it helps to understand the interaction between general seasonality in activity and the underlying trend.

The interactions between trend and seasonality are typically classified as either additive or multiplicative. This post looks at how we can classify a given time series as one or the other to facilitate further processing.
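
One common rule of thumb, which may or may not be the approach the full post takes, is that in an additive series the seasonal swing stays about the same size as the level changes, while in a multiplicative series the swing grows with the level. A minimal base R sketch with simulated data (all numbers and names are made up for illustration):

```r
# Simulated monthly series over 4 years: the same trend and seasonal
# shape, combined in two different ways.
n <- 48
trend <- seq(100, 200, length.out = n)
season <- rep(c(0.8, 0.9, 1.0, 1.1, 1.2, 1.3,
                1.2, 1.1, 1.0, 0.9, 0.8, 0.7), times = 4)

additive <- ts(trend + 20 * (season - 1), frequency = 12)
multiplicative <- ts(trend * season, frequency = 12)

# Peak-to-trough swing within each year.
yearly_swing <- function(x) {
  year <- rep(seq_len(length(x) / 12), each = 12)
  tapply(as.numeric(x), year, function(v) diff(range(v)))
}

swing_add <- yearly_swing(additive)        # stays constant
swing_mult <- yearly_swing(multiplicative) # grows as the level rises
print(rbind(additive = swing_add, multiplicative = swing_mult))

# Once classified, the matching decomposition can be requested.
parts <- decompose(multiplicative, type = "multiplicative")
```

Real data is noisier than this, so the comparison is a guide rather than a test.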

Read More

Talking Data and Docker

  • February 8, 2017
  • Steph
If you need to know about persisting data in the world of containers then I recently did a talk and a spot on a podcast that should help you out. My NDC London talk Data + Docker = Disconbobulating? covers the basics and architectural decisions. In my podcast spot Data and Docker on .Net Rocks we go into more depth about the architectural decisions facing you when working with data and Docker. Read More

CRISP-DM and why you should know about it

  • January 13, 2017
  • Steph

The Cross Industry Standard Process for Data Mining (CRISP-DM) was a concept developed 20 years ago now. I’ve read about it in various data mining and related books and it’s come in very handy over the years. In this post, I’ll outline what the model is and why you should know about it, even if it has that terribly out of vogue phrase data mining in it! 😉

Read More

Going solo!

  • January 4, 2017
  • Steph
The year has started out on a high for me. I’ve handed in my notice at Censornet and I was re-awarded the Data Platform MVP award by Microsoft. I handed in my notice not to go to a new job but to fly solo! I’m starting Locke Data in February to help people embed data science skills in their organisations. Business intelligence has been a thing long enough that there’s a whole department of people dedicated to it and it generally isn’t disruptive to other areas of the business and IT. Read More

I Love Azure Functions!

  • December 1, 2016
  • Steph
A while ago, I started my Stumbling Into series. I started but only got one in – I was gonna talk about how I failed with Azure Functions next. I was failing because the docs outside C# (and node.js) were so limited that I found it difficult to get things done. However, I persevered and overcame a little bit of C#-ophobia and I can honestly say it has been so worth it. Read More

An experiment in self-promotion – Revive Old Posts

  • November 24, 2016
  • Steph

I’ve been writing (not enough) blog posts for a while now and have built up some neat stuff in the backlog if I may say so myself. Alas, a lot of this doesn’t get seen because it’s not on the front page or in the top 5 blog posts. Sad that posts like my one on sixth normal form databases don’t get enough love, I’ve installed the WordPress plugin Revive Old Posts (ROP) to try countering this!

Read More

My first MVP Summit – a fantastic reminder

  • November 14, 2016
  • Steph
I had the tremendous pleasure of going to the Microsoft MVP Summit this week and it was a fantastic experience. It also taught me a valuable lesson – I need to be an attendee more. Microsoft award ~4,000 people their Most Valuable Professional (MVP) award each year. MVPs are influential, helpful people who work with Microsoft services. I’m not sure what I did to get in when so many awesome folks I know haven’t but I’m very proud to be in receipt of the Award. Read More

A note to (potential) new speakers: It’s ok not to be perfect!

  • November 8, 2016
  • Steph
This is a T-SQL Tuesday Post in response to Andy Bek’s kick off about growing new speakers. You can write your own advice for new speakers, or blog your journey to speaking. I’m always trying to encourage new speakers and the biggest fear I hear is “I won’t be any good at it”. Well, you won’t be perfect at it, that’s for sure. You may start off really bad at it. Read More

Quick tip: Passing values to a bash script

  • October 30, 2016
  • Steph
This is a very quick post on how you can make a bash script that accepts values via the command line. Values passed to a bash script are available inside it as 1-based positional parameters, referenced by prefixing the position with $. This means that if I run ./dummyscript.sh value1 value2, inside dummyscript.sh I can retrieve these by referencing their positions: echo $1 + $2 For improved clarity, you could assign them to new variables Read More

GirlswithDeepPockets.com

  • October 28, 2016
  • Steph
Ok, this post is about one of my latest crazy/harebrained/whacky ideas. I’m fed up of having to carry my Galaxy Note 3 in my hand. I can’t stand handbags and most women’s clothing items don’t have pockets or the pockets are insufficient. Given how easy it is to build a website these days, I thought I’d become a sofa warrior for the campaign for pockets. I’ve made a site and aim to make it an open & technical backend. Read More

5 useful CSS sites

  • October 27, 2016
  • Steph
I’ve been doing a lot of web development recently, primarily via the magical Hugo platform. Between it and the great themes for it, it’s making website building fairly painless. Of course, each theme often needs customising to the relevant brand a given site is for. That customising is usually just done by changing some fonts and by tweaking the CSS.* I’ve been relying on some old, and some very new, funky tools to help with CSS hacking and I thought I would share them, in case they should prove useful to you in the future. Read More

Slack all the things!

  • October 21, 2016
  • Steph
Slack all the things! OK, if you haven’t heard of it before Slack is kinda like IRC, kinda like Dropbox, kinda like a lot of things – it’s a neat place to bring together communications between your team or community, and the integrations allow you to pipe in external feeds like twitter activity or RSS. It’s a great way of collaborating online and I’ve found it especially useful not just within a company but within a global community. Read More

Stepping down from SQL Relay

  • October 13, 2016
  • Steph
Some folks may already know, but I handed in my resignation from SQL Relay as sponsorship lead and Cardiff organiser. Over my time in SQL Relay, I’ve helped deliver 30 conferences. I’ve attended about 15 of those! Being able to deliver so much learning to people all around the country has been an incredible experience and I’m tremendously proud of everything SQL Relay has achieved. However, SQL Relay has become increasingly difficult for me to dedicate the time to. Read More

Unit testing in SSDT – a quick intro

  • October 10, 2016
  • Steph

This post will give you a quick run-through of adding tSQLt to an existing database project destined for Azure SQL DB. This basically covers unit testing in SSDT and there is a lot of excellent info out there, so this focuses on getting you through the initial setup as quickly as possible. This post most especially relies on the information Ed Elliot and Ken Ross have published, so do check them out for more info on this topic!

Read More

ItsALocke.com now with added locks

  • October 7, 2016
  • Steph
This blog now has some extra locks, these are in the URL bar! It took my fantastic hosters WPEngine a little longer than I would have preferred to get a modern SSL policy. Now that they have, they did it in their typically awesome fashion. You can request a free Let’s Encrypt SSL certificate from the admin dashboard, and configure how http etc should work in less than 15 minutes, and you don’t have to be a web wizard to do it. Read More

2016 PASS Board of Directors Candidate Town Halls

  • October 6, 2016
  • Steph

It’s PASS Board of Directors elections again! After a number of twitter discussions last week about the applicability of PASS outside of the US and what I think PASS is good and bad at, I thought I would engage with the process instead of just being a complainy-pants. I attended all 6 town hall webinars and asked questions to all the candidates. I recommend you watch them before voting.

Find out more about the candidates, the PASS Board of Director elections, and how to vote on the PASS website.

Read More

Reflections: SQL Relay is coming

  • September 27, 2016
  • Steph
I’ve been pretty quiet recently, I haven’t presented much, I haven’t blogged much, I haven’t worked on my open source projects much. All my energy left over from my major work project has been going into SQL Relay. SQL Relay is an ambitious project every year. We organise a conference that goes on tour. In previous years, I’ve gone to 8 cities over two weeks. Over the past 4 years, I’ve been part of organising 30 conferences. Read More

Finished my first GameMaker Game

  • September 17, 2016
  • Oz
Getting started with GameMaker, making Asteroids! I mean OK, it’s just a clone of a classic, but isn’t that how fledgling artists practice? Initially created following a tutorial, I then went through and added a lot of extra features, including music, splash screens, a pause function and much cleaner code. Embedding by i-frame is pretty ugly, so please follow the link below to try the game.

Asteroids Game

  • A/D or Left/Right to turn
  • W or Up to move
  • Space or Right Control to shoot
  • Escape to pause

You can download the gmz file, and import into your copy of Game Maker if you’d like, using this link: https://dl. Read More

HIBPwned updated on CRAN

  • September 15, 2016
  • Steph
Haveibeenpwned.com is a fantastic service that helps people find out if they’ve been involved in a data breach. HIBPwned is an R wrapper for that service. Recently, due to abuse of the system, Troy Hunt had to add a limit of one request per 1.5s. The new version published on CRAN last night adds a delay into each call so that we can continue to use it in R. Check out the package on CRAN for vignettes and more information on the package. Read More
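
The idea of adding a delay to each call can be sketched generically. This is not the package's actual implementation. `with_delay` and `fake_lookup` are made-up names, and the stand-in lookup avoids any real network call.

```r
# Wrap any function so each call waits `delay` seconds first.
# 1.5 matches the limit mentioned above; `with_delay` is a made-up name.
with_delay <- function(f, delay = 1.5) {
  force(f)
  force(delay)
  function(...) {
    Sys.sleep(delay)
    f(...)
  }
}

# Stand-in for a real API call, so the sketch runs offline.
fake_lookup <- function(account) paste("checked", account)

# A short delay keeps the demo quick; use the default for a real API.
polite_lookup <- with_delay(fake_lookup, delay = 0.1)

started <- Sys.time()
results <- vapply(c("a@example.com", "b@example.com"),
                  polite_lookup, character(1))
elapsed <- as.numeric(difftime(Sys.time(), started, units = "secs"))
print(results)
```

Putting the wait inside the wrapper means callers cannot forget it when looping over many accounts.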

Being an Organised Sponsor (SCE p3)

  • August 30, 2016
  • Steph
Being an Organised Sponsor is the third instalment of the Sponsoring Community Events series aimed at helping companies get to grips with sponsoring community events, and getting the most out of them. This post covers how to organise yourself and the common activities needed to get the most out of your sponsorship of a community event. Project management: Rarely is sponsorship a simple transaction; there are often deliverables from both parties at different times over a period of anything up to a year. Read More

Assessing Sponsorship Opportunities (SCE p2)

  • August 4, 2016
  • Steph
Assessing Sponsorship Opportunities is the second instalment of the Sponsoring Community Events series aimed at helping companies get to grips with sponsoring community events and getting the most out of them. This post covers some of the things that you should be thinking about when you are considering sponsoring an event. What’s the point? Before entering into a sponsorship agreement you need to have a firm idea of what you’re hoping to achieve. Read More

Sponsorship Basics (SCE p1)

  • August 2, 2016
  • Steph
Sponsorship Basics is the first instalment of the Sponsoring Community Events series aimed at helping companies get to grips with sponsoring community events, and getting the most out of them. What is a community event? A community event is one organised by members of the community, as opposed to one run by one or more companies with a financial interest in the community. These events are fundamentally different because they are not being run for profit, instead, they’re run to assist other members of the community to increase their skills. Read More

Sponsoring community events (SCE)

  • August 1, 2016
  • Steph
Sponsoring community events – is it right for you? This series of posts will take you through the things you need to know to help you decide. Over the coming weeks, this new series will go through the ins and outs of sponsoring community events. Community events are fantastic from an attendee perspective, but when you’re handing over cash you need to know what you’re letting yourself in for and how you get return on investment (ROI). Read More

Giving back with code

  • July 20, 2016
  • Steph

From code in answers on Stack Overflow to R packages or full programs, there’s a lot of code being written and given away. This post examines some of the reasons why the people writing all that code do it, why you should consider giving back with code, and how you can get started. Finally, I cap it all off with perspectives from some of my favourite coders!

Because reasons

There are many reasons why you should consider writing code and making it available for public consumption.

Altruistic

  • If you’re writing something to achieve a task, odds are someone else would have to write the same code – why not help them out?
  • You’re using a lot of open source software, whether you realise it or not. By open sourcing your code, you get to pay it forward
  • To give others something to contribute to

Career

  • Unknown quantities are risky hires, put your code out there for the world to see and employers get to see what you can do
  • Develop your skills for the next job, the one that requires you to be more skilled in something than you are now
  • You get to interact with a lot of different people who you build credibility with, and hopefully friendships!

For oneself

  • Generally speaking, the more code you write, the better your coding skills so if you want to improve your skills this is an ideal way to do it
  • For the sheer fun of doing cool stuff, especially if you don’t get to do cool stuff in the day job
  • To do it “the way it should be done”

Read More

Ideas for (lightning) talks

  • July 12, 2016
  • Steph
I’m trying to encourage more lightning talks at my user groups, and I started by writing a plea to folks at my local R user group, caRdiff. In it I included some ideas for lightning talks, and of course, these can be used as the basis for long talks too. We had some fun batting this list around and expanding it in the Cardiff dev group. I thought it was worth sharing, and getting some more ideas from you! Read More

Stumbling into … Azure Automation

  • July 11, 2016
  • Steph

I’ve recently been trying to solve the challenge of extracting files from AWS and getting them into Azure in my desired format. I wanted a solution that kept everything on the cloud and completely avoided local tin. I wanted it to have built-in auditing and error handling. I wanted something whizzy and new, to be honest! One way in which I attempted to tackle the task was with Azure Automation. In this post, I’ll overview Automation and explore how it stacked up for what I was attempting to use it for.

Overall Task: Get compressed (.tar.gz) files from AWS S3 to Azure, decompress the files, concatenate the contents and put in a different container for analytics magic

Like with most things I dropped myself into the deep-end on it so had fairly minimal knowledge of PowerShell and the Azure modules, therefore I fully expect more knowledgeable folks to wince at my stuff. General advice, “you should do it like this, then this…”‘s, and resource recommendations are all very welcome – leave a comment with them in!

Azure Automation

Azure Automation is essentially a hosted PowerShell script execution service. It seems to be aimed primarily at managing Azure resources, particularly via Desired State Configurations.

It is, however, a general PowerShell powerhouse, with scheduling capabilities and a bunch of useful features for the safe storage of credentials etc. This makes it an excellent tool if you’re looking to do something with PowerShell on a regular basis and need to interact with Azure.

Read More

Bad ways to run a user group

  • July 7, 2016
  • Steph
I love user groups and I always want there to be more. I’m not a perfect organiser but I run reasonable groups. When I see organisers doing it badly, it makes me sad. There’s lots of great ways to run a user group, but I thought I’d cover some of the bad ways to run a user group. The anti-patterns if you will 😀 Don’t advertise Your group isn’t on Twitter. Read More

Not an expert

  • June 29, 2016
  • Steph
I don’t think of myself as an expert because an expert is someone with very deep knowledge of a comparatively narrow field. For better or worse, a lot of my sense of satisfaction with life derives from throwing myself into some enterprise that I don’t have the people skills, the knowledge, and/or the resources for succeeding. I welcome the failures, the dead ends, the crises of faith, because if it wasn’t hard it wouldn’t be worth doing. Read More

My PASS #Summit2016 submissions feedback

  • June 23, 2016
  • Steph

I really liked the way Brent showed us his feedback received and since mimicry is the best form of flattery, I thought I’d go ahead and do it too!

I didn’t get any accepted abstracts, and I’m actually grateful. The recent stresses to do with the PASS dramas aside, I would have had to use 5 days holiday time, pay for flights and hotel, and then flown out a week later for MVP Summit. Now I can attend some other conferences and/or have a Christmas break! Woo hoo 😀

Read More

Use your .Rprofile to give you important notifications

  • June 23, 2016
  • Steph
In R, we can use a file called .Rprofile to do things in R based on a number of triggers. One thing I’ve done is give myself a DIY notification of how many data breaches I’ve been involved in! First of all, you need a file called .Rprofile that’s stored in your working directory. Some useful resources about .Rprofiles can be found on .Rprofile CRAN docs and an .Rprofile intro. Read More

Azure Storage Accounts – Resource Groups matter to PowerShell!

  • June 10, 2016
  • Steph
I’m sure that all my PoSh friends out there, who use Azure and PowerShell all the time probably know this already but I thought I’d share a little snippet of hard-won knowledge. When you put an Azure Storage Account into a Resource Group, you can no longer use the default Azure.Storage module. Instead, you’ve got to use the AzureRM.Storage module. All the scripts I encountered whilst googling how to connect to blob storage via PowerShell, including the ones in the script gallery within Azure Automation seemed to all assume the azure storage account you wanted to connect to was standalone. Read More

HIBPwned on CRAN

  • June 9, 2016
  • Steph
Part of my (slowly) working pipeline of coding projects has been an R package that wraps the fantastic HaveIBeenPwned.com API. If you’re not already familiar with HaveIBeenPwned, rectify the situation, NOW! Don’t worry about continuing to read the rest of the post; getting yourself signed up for account breach notifications is way more important!Go now! /giphy run With that stern admonishment out of the way… HIBPwned is a feature complete R package that allows you to use every (currently) available endpoint of the API. Read More

Recent presentations

  • June 1, 2016
  • Steph
The last month or so has been a whirlwind of awesomeness with a veritable bevy of user group and conference talks on my part! I thought I would share the materials with you and provide some brief thoughts on how each presentation went. Sessions SQL Saturday Exeter : Stats 101 London Business Analytics (LBAG) : Skilling up to code with data SQLBits & TUGA : Cut the R Learning Curve SQLBits & TUGA : R in the Microsoft Data Platform (full day of training) IT Pro Portugal : Being lazy with infrastructure SQL Saturday Exeter My presentation, in my opinion, was exceedingly brave. Read More

#satRdays voting closes May 31st

  • May 27, 2016
  • Steph
Voting for 2 of the 3 locations for satRday conferences will be closing at the end of May 31st (GMT). It’s been a phenomenal turnout with more than 1,500 votes so far. You can still vote if you haven’t already! EU status Budapest, Hungary, is where Gergely will be throwing the EU event and it’s tentatively set for September. US status Chicago started out with a close runner-up in Washington DC, but that all changed as folks realised they could visit Puerto Rico and get fantastic learning, or Puerto Rico has far more R people than my tenuous grasp of geography led me to expect. Read More

satRday location voting now open

  • May 11, 2016
  • Steph
satRdays, free R conferences, are a project being supported by the R Consortium. When Gergely and I submitted our proposal, we said we’d be supporting three conferences: Budapest, Hungary (Gergely’s home turf) Somewhere in the US Somewhere else in the world We’ve had an overwhelming response with 40 submitted conferences but for the fully-funded ones, there can only be three. We are looking at how the runners up can do the next wave of events but we want to get the ball rolling on the first three. Read More

Installing SQL Server ODBC drivers on Ubuntu 15.04

  • April 20, 2016
  • Steph

UPDATE 2016-10-21 : You can now get the ODBC 13 driver for Linux with a much smoother install process than below. Get all the relevant information on the announcement from the Microsoft SQLNCli team blog.

Did you know you can now get SQL Server ODBC drivers for Ubuntu? Yes, no, maybe? It’s ok even if you haven’t since it’s pretty new! Anyway, this presents me with an ideal opportunity to standardise my SQL Server ODBC connections across the operating systems I use R on i.e. Windows and Ubuntu. My first trial was to get it working on Travis-CI since that’s where all my training magic happens and if it can’t work on a clean build like Travis, then where can it work? Alas, the ODBC 13 driver doesn’t work on Ubuntu 14.04 so this set of instructions has been modified to provide code for Ubuntu 15.04 only.

TL;DR

It works, but it’s really hacky right now. Definitely looking forward to the next iterations of this driver.

Disclaimer

  • This will work for Ubuntu 15.04 but 14.04 has a different set of C compilers
  • This is currently hacky, and Microsoft are on the case for improving it so this post could quickly become out of date.
  • Be very careful installing the driver on an existing machine, as it overwrites unixODBC if already installed and may have compatibility issues with other driver managers you have installed.

Read More

Shiny module design patterns: Pass module inputs to other modules

  • April 19, 2016
  • Steph

Continuing in the series of shiny module design patterns, this post covers how to pass all the inputs from one module to another.

TL;DR

Return input from within the server call. Store the callModule() result in a variable. Pass the variable into arguments for other modules. Access the variable like you would input. Steal the code and, as always, if you can improve it do so!

Read More

Shiny module design patterns: Pass module input to other modules

  • April 14, 2016
  • Steph

Following on from looking at the shiny modules design pattern of passing an input value to many modules, I’m now going to look at a more complex shiny module design pattern: passing an input from one module to another.

TL;DR

Return the input in a reactive expression from within the server call. Store the callModule() result in a variable. Pass the variable into arguments for other modules. Steal the code and, as always, if you can improve it do so!

Read More

Using Travis? Make sure you use a Github PAT

  • April 12, 2016
  • Steph

We’re in the fantastic situation where lots of people are using Travis-CI to test their R packages, or to test and deploy their analytics / documentation / anything really. Its popularity has been having a negative side-effect recently though! GitHub rate limits API access to 5000 requests per hour, so sometimes there are more R-related jobs running on Travis per hour than this limit allows, causing builds to error, typically with a message that includes

403 forbidden

This error will cause your build to fail, even if you didn’t do anything wrong. To solve it short-term you can wait a little while and restart your build.

How to restart a build in Travis-CI

That is a very short-termist solution and does not solve the problem for future you or other users of the service. The real solution to resolving this issue is to get off the default API access credentials and use your own.

The R integration in Travis makes good use of the devtools. The devtools package looks for an environment variable called GITHUB_PAT that holds a personal access token (PAT) for using the GitHub API and if it doesn’t find one it uses a default token. When we get our own PAT and store it in Travis, devtools will pick up our token and use it, meaning you’ll only ever get rate limited if you do more than 5000 builds in an hour, which is an achievement I’d love to hear about.
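As a small illustration of that lookup, here is a hedged bash sketch that a build script could run to confirm the variable is actually present before devtools goes looking for it. The function name `has_github_pat` is made up for this example; only the `GITHUB_PAT` variable name comes from the post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports whether GITHUB_PAT is available to the build.
# Returns 0 when the variable is set and non-empty, 1 otherwise.
# The token value itself is never printed, so it cannot leak into build logs.
has_github_pat() {
  if [ -n "${GITHUB_PAT:-}" ]; then
    echo "GITHUB_PAT is set: builds will use your own token"
    return 0
  fi
  echo "GITHUB_PAT is not set: the shared default token will be used" >&2
  return 1
}
```

Calling `has_github_pat || true` early in a build gives a visible hint in the log without failing the build when the token is missing.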

Read More

Shiny module design patterns: Pass a single input to multiple modules

  • April 8, 2016
  • Steph

For the awesome Shiny Developers Conference back in January, I endeavoured to learn about shiny modules and overhaul an application using them in the space of two days. I succeeded and almost immediately switched onto other projects, thereby losing most of the hard-won knowledge! As I rediscover shiny modules and start putting them into more active use, I’ll be blogging about design patterns. This post takes you through the case of multiple modules receiving the same input value.

TL;DR

Stick overall config input objects at the app level and pass them in a reactive expression to callModule(). Pass the results in as an extra argument into subsequent modules. These are reactive so don’t forget the brackets. Steal the code and, as always, if you can improve it do so!

Read More

R Quick Tip: Collapse a list of data.frames with data.table

  • April 5, 2016
  • Steph

With my HIBPwned package, I consume the HaveIBeenPwned API and return back a list object with an element for each email address. Each element holds a data.frame of breach data or a stub response with a single column data.frame containing NA. Elements are named with the email addresses they relate to. I had a list of data.frames and I wanted a consolidated data.frame (well, I always want a data.table).

Enter data.table …

data.table has a very cool, and very fast function named rbindlist(). This takes a list of data.frames and consolidates them into one data.table, which can, of course, be handled as a data.frame if you didn’t want to use data.table for anything else.

Read More

Auto-deploying documentation: better change tracking of artefacts

  • April 4, 2016
  • Steph

As part of my never-ending quest to deploy documentation better, I’ve made yet another tweak to my scripts that deploy R vignettes or Rmarkdown documents to the gh-pages branch of my github repositories via Travis-CI.

The script from Robert Flight that’s provided the basis for most of this work does something specific to update the web facing branch of the repository. It would:

  1. Create a blank repository

  2. Add the requisite files to the repository

  3. Add and commit them to the repo

  4. Force the repo to overwrite the gh-pages branch

This had the unfortunate consequence of losing the history of what was previously hosted on the branch and could not tell me what commit to my development branches was responsible for a version of the docs. It took a little bit of playing but the revised script now:

  1. Clones the gh-pages branch

  2. Adds the requisite files into the repo

  3. Adds and commits them to the repo

  4. Pushes the changes

Using an environment variable ($TRAVIS_COMMIT) the commit message is the commit ID for the latest commit in the build that occurs on Travis, making it very easy to see what changes triggered a documentation update.
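The revised flow could look roughly like the bash sketch below. This is not the actual script from the post: the function name `deploy_docs`, its arguments, and the placeholder commit identity are all assumptions made for illustration. Only the four steps and the use of `$TRAVIS_COMMIT` as the commit message come from the text above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a history-preserving docs deploy.
# Usage: deploy_docs REPO_URL DOCS_DIR COMMIT_ID
deploy_docs() {
  local repo_url="$1" docs_dir="$2" commit_id="$3"
  local work_dir
  work_dir="$(mktemp -d)" || return 1

  # 1. Clone only the gh-pages branch, keeping its existing history
  git clone --quiet --branch gh-pages --single-branch "$repo_url" "$work_dir" || return 1

  # 2. Copy the freshly built files over the previous versions
  cp -R "$docs_dir"/. "$work_dir"/ || return 1

  (
    cd "$work_dir" || exit 1
    # Identity for the CI commit (placeholder values)
    git config user.name "Travis CI"
    git config user.email "travis@example.com"

    # 3. Stage and commit, using the triggering commit ID as the message
    git add --all .
    if git diff --cached --quiet; then
      echo "No documentation changes to deploy"
      exit 0
    fi
    git commit --quiet -m "$commit_id" || exit 1

    # 4. Ordinary (non-forced) push, so history is appended, not replaced
    git push --quiet origin gh-pages
  )
}
```

On Travis this would be called along the lines of `deploy_docs "$REPO_URL" ./docs "$TRAVIS_COMMIT"`, where `REPO_URL` and `./docs` are placeholders for your own remote and output folder. Because the push is not forced, `git log gh-pages` afterwards shows one commit per documentation update, each named after the commit that triggered it.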

Read More

R package news: tfsR, HIBPwned, mockaRoo

  • March 24, 2016
  • Steph
This is a brief update on my packages not currently on CRAN: tfsR, HIBPwned, and mockaRoo. tfsR tfsR is designed to help you work with git repositories in Microsoft Team Foundation Server (TFS) and Visual Studio Team Services (VSTS). I wrote the package a while ago and it has/had just two functions; one for getting a list of git repositories, and one for making a new git repository. The release of httr 1. Read More

satRdays are go!

  • March 23, 2016
  • Steph
I’m very pleased to say that the R Consortium agreed to support the satRday project! The idea kicked off in November and I was over the moon with the response from the community, then we garnered support before submitting to the Consortium and I must have looped the moon a few times as we had more than 500 responses. Now the R Consortium are supporting us and we can turn all that enthusiasm into action. Read More

HIBPwned, an R package for HaveIBeenPwned.com

  • March 21, 2016
  • Steph

The answer in life to the inevitable question of “How can I do that in R?” should be “There’s a package for that”. So when I wanted to query HaveIBeenPwned.com (HIBP) to check whether a bunch of emails had been involved in data breaches and there wasn’t an R package for HIBP, it meant that the responsibility for making one landed on my shoulders. Now, you can see if your accounts are at risk with the R package for HaveIBeenPwned.com, HIBPwned.

Have I Been Pwned | HaveIBeenPwned.com

Current status

The package is currently available on github @ stephlocke/HIBPwned, but I intend to submit to CRAN after getting some feedback from y’all.

Read More

@SQLCardiff: Return of the Beard (2016-03-30)

  • March 15, 2016
  • Steph
Just a quick heads up to peeps in and around Cardiff, Wales. Later this month we’re holding Battle of the Beards: Return of the Beard. Fantastic speakers with resplendent beards are joining us for our 6 sq. ft of pizza to present on advanced SQL Tricks, using PowerBI as a DBA, and stepping up to the challenge of the last minute audit. This is a great chance to meet folks interested in the Microsoft Data Platform and learn from some incredibly knowledgeable speakers. Read More

SSH tunnels on Windows for R

  • March 14, 2016
  • Steph

Recently I’ve had to get to grips with SSH tunnels. SSH tunnels are really useful for maintaining remote network integrity and work in a secure fashion. It is, however, a pain to open PuTTY and log in all the time, mainly because I couldn’t script it in R! It’s been a trial, but like most things it turned out to be pretty simple in the end so I thought I’d share it with you.

What’s required?

Beware the Microsoft Edge as a PDF reader

  • March 9, 2016
  • Steph
Just a heads-up for people like me who’ve gotten a Windows 10 machine and have used Edge as a PDF reader. I was too lazy to install Adobe Reader and was instead using Edge as my default reader. This gave me a mini-heart attack when I received a proof for my super cool NFC-tag laptop stickers and the colour was wrong. WTF right, I mean we did send it as CMYK and all that jazz so it should be right, a printer wouldn’t screw that up, right? Read More

mockaRoo – making realistic test data in R

  • March 8, 2016
  • Steph

When I’m building stuff in R like packages, models, etc. I find myself wishing for realistic looking test data without having to resort to getting data off my production server. To that end I’ve been on the hunt for a way of generating decent test data. A few months back I stumbled upon the neat system Mockaroo which provides a GUI to build some data that suits your needs.

Mockaroo is a really impressive service with a wide spread of different data types. They also have simple ways of adding things like within group differences to data so that you can mock realistic class differences. They use the freemium model so you can get a thousand rows per download, which is pretty sweet. The big BUT you can feel coming on is this – it’s a GUI! I don’t want to have to spend time hand cranking a data extract.

Thankfully, they have an API for getting data too and it’s pretty simple to use so I’ve started making a package for it.

I’ve started the package on github and will be developing it over the next month or two. It’s up and working, but only in the most primitive way as I’d like to get some feedback from folks who might find this useful around how the interface for generating your desired data schema should work.

Read More

Upcoming R for Microsoft training days

  • March 7, 2016
  • Steph
In May, I will be delivering two R for Microsoft training days. These two days will focus on some R fundamentals and applying these fundamentals within the Microsoft Data Platform. These training days are ideal if you know one half of the components – whether that’s the R bit or the Microsoft BI bit. Either way, you’ll learn about the other half. SQLBits SQLBits XV is being held in Liverpool this year and my training day is on May 4th. Read More

Declutter a shiny report’s code v2.0

  • March 3, 2016
  • Steph
I wrote a year ago on a way to declutter shiny report code which involved putting objects into a sourced file, however, at that point in time the solution was a bit brittle and clunky. Now there’s a better way to develop shiny applications – shiny modules. In October, RStudio introduced the concept of modules which involves abstracting code out into self-contained blocks. Modules are ways of batching your code into discrete chunks – you keep all the code related to the inputs, manipulation, and presentation for doing something in one module. Read More

My life, my universe, my everything

  • March 1, 2016
  • Steph
Last year involved moving jobs to Mango as Principal Consultant, moving home, getting a dog, and Oz becoming the Purple Tadpole. That was on top of SQL Relay, copious presentations, and much travelling. By the end of the year I was pretty ill and exceedingly grumpy. Oz wasn’t having a huge amount of fun either between me never being home and him having to hold the fort in a massive way. Read More

Fixing the Tiny Icons, Big Text issue on my XPS13

  • January 13, 2016
  • Oz

The Issue

I love my Dell XPS13. It’s fast, sleek and gorgeous. It does however have one little problem: the icon and text size. The text was always too big for the buttons and boxes and the icons were so small you could hardly see them. This made it hard to use my machine without an external screen (which doesn’t have that issue and should have been my first clue!)

An example of the issue I was dealing with

Read More

satRdays: final push

  • January 6, 2016
  • Steph
I talked back in November about the idea of RSaturdays: free community-driven conferences on R. Since then, we created a GitHub repository and started hammering out the details for satRdays. The current proposal consists of: A name: satRday A proposition: Free/cheap (<£30) conferences organised by user groups around the world. Attendees get more access to training in R, with a much lower cost-barrier. We develop more speakers on R. Read More

optiRum 0.37.3 now out

  • January 4, 2016
  • Steph
Just a quick heads up to announce the availability of optiRum 0.37.3 – this takes into account the new version of ggplot2 and is backwards compatible. Read More

Anchor Modelling: Sixth Normal Form databases

  • December 31, 2015
  • Steph
About Anchor Modelling Anchor Modelling moves you beyond third normal form and into sixth normal form. What does this mean? Not sure about the normal forms? See the normalization process in actions with this normalisation example Essentially it means that an attribute is stored independently against the key, not in a big table with other attributes. This means you can easily store metadata about that attribute and do full change tracking with ease. Read More

Auto-deploying documentation: Rtraining

  • December 23, 2015
  • Steph
In my last post on using GitHub, Travis-CI, and rmarkdown/knitr for automatically building and deploying documentation, I covered how I was able to do it with a containerised approach so things were faster. I also said my Rtraining repository was still too brittle to blog about. This has changed – WOO HOO! The main thanks for that goes out to the new package ezknitr from Dean Attali. ezknitr takes the pain out of working directories, making my hierarchies much more resilient. Read More

Should presenters have to pay to attend?

  • December 7, 2015
  • Steph
I recently did something for the first time: I declined to speak somewhere. It was never stated on the submission page, and was raised only after my session was accepted – they wanted me to buy a ticket to attend and I refused to do that. As a speaker I love donating my time and I really don’t mind paying my own Travel and Expenses (T&E) but to have to pay to get in the door of the place I’m speaking at feels wrong. Read More

Auto-deploying documentation: FASTER!

  • November 13, 2015
  • Steph

Over the past few years I’ve been delving deeper into automatically building and deploying documentation and reporting in R (with rmarkdown, LaTeX etc). This post covers another step forward on that journey towards awesomeness.

Read More

Boris’ presentation agency – 356labs

  • November 11, 2015
  • Steph

Boris Hristov, Data Platform MVP, design whizz, and all-round great guy, recently launched 356labs. Boris wrote a great Presentation Design course for PluralSight; you can sign up for a trial of PluralSight and watch the course if you’d like to find out more.

Being an avid reader of design stuff I did find I knew some of the things on the course, but the context and application were very helpful. Off the back of his course, I went on to produce my most visually impressive presentation slide deck to date – Agile BI.

I took a look over his site and asked a few questions since I was really curious. Here are the responses!

Read More

optiRum 0.37.1 now out

  • November 7, 2015
  • Steph

A while back, I wrote about how I was waiting to be able to release optiRum to CRAN. Well, data.table 1.9.6 (a key dependency for new functionality) has been released and I’ve finally had some quiet time. So optiRum 0.37.1 is now accepted and trickling through the CRAN publishing process.

Read More

SQLSaturdays but for R?

  • November 4, 2015
  • Steph

UPDATE: Proposal now being developed after fantastic community support. Check out satRdays on GitHub and contribute your opinions!

I had a contact from a very nice chap in Dallas a month ago about whether in the R world we do anything like SQLSaturdays.

The great thing about the SQLSaturdays he said was not that they’re free (well it helps!) but that they’re on his time. Developing his skills was something he couldn’t get signed off by his boss so he wanted to be able to do it by himself.

In answer to the question of whether there are local(ish) weekend conferences happening regularly for R, my answer was “not really” and it’s a shame because the R community is fantastic. I started thinking about why we don’t have them and what would be needed to change that.

Free / cheap regional small-medium conferences are a must for growing user knowledge and speakers in R.

Read More

Some recent posts over on the Mango blog

  • October 28, 2015
  • Steph
Since August, I’ve had the pleasure to work at Mango Solutions, a data science consultancy, as a Principal Consultant. In that time, I’ve been to EARL London, SQL Relay, and SQL in the City, so conference season has been in full swing with more to come in the form of EARL Boston next week. Surprisingly, I’ve also found some time to help some customers out and write some blog posts over on the Mango site. Read More

Women doing technology

  • October 24, 2015
  • Steph

Yesterday, another Women in Technology conference got forwarded around and looking at the agenda, I snapped. I asked to not see any more goddamn WiT conferences.

I’m really fed up with women talking about being in tech. I don’t perceive any value in attending a conference dedicated to that. I want to see more women talking about doing tech.

Read More

DataOps – it’s a thing (honest)

  • October 16, 2015
  • Steph

Today, I presented a lightning talk on DataOps at SQL in the City. It was a fantastic day and a great opportunity to catch up with how the database side of things is evolving to embrace DevOps.

My lightning talk was titled DataOps – it’s a thing (honest) and focused on what is essentially DevOps ported out of the developer sphere and into the data professional sphere.

Read More

SQL Relay is here!

  • July 16, 2015
  • Steph
SQL Relay is back on tour! Our 6th event sees us yet again touring the country bringing awesome speakers to eight cities over 2 weeks. As usual we’re improving things: A new platform called Attendee.Events which allows easier registration & speaker submission A whole bunch of webinars planned to get in the swing of things More breadth – this year each event includes a dedicated track to R, machine learning etc, on top of SQL Server and business intelligence More content – instead of swapping out existing slots for breadth, we added more tracks A green initiative … see it on the day Improved speaker experience – we’re building on last year’s fun bus with the help of SQL Sentry to make it easier and more fun for speakers to do multiple events, as well as offering first timers some mentoring as most of the organisers are also speakers You’ll see a lot of announcements about Relay over the next few months, but I hope this little post inspires you to check out our events 😀 Read More

R training day mk2 – @SQLSatMcr

  • June 24, 2015
  • Steph

Back in April, for SQL Saturday Exeter I ran my first ever full day of training. Next month sees me taking my second tilt at it.

To sign up for my R training day, July 24th, in Manchester you can go to the pre-con homepage.

If I may say so myself, it’s a steal at £99 but then they all are! For instance, Andrew Fryer’s training day covers the Machine Learning use of R via Azure, so if you’re already wrangling numbers like a pro in R, understanding how you can apply it to snazzy webservices is a great way to go.

Read More

Custom LaTeX Beamer style templates for rmarkdown

  • June 17, 2015
  • Steph

I’ve been producing presentations via R using rmarkdown and outputting to either ioslides or slidify. That was excellent, because I could provide a CSS that customised the look and feel (relatively) easily*.

However, when I wanted to produce a PDF version, I couldn’t make ones that look as good as the pure LaTeX versions I could make on overleaf.com. So I started RTFMing when I wanted to replicate the look and feel from my presentation, The LaTeX Show.

I didn’t want to spend a huge amount of time on it, so this little story of hack and slash may feel a bit dirty to you!

Read More

Auto-deploying documentation: multiple R vignettes

  • June 5, 2015
  • Steph

Following on from my post about the principles behind using travis-ci to commit to a gh-pages I wanted to follow-up with how I tackled my “intermediate” use case.


Posts in this series

  1. Automated documentation hosting on github via Travis-CI
  2. Auto-deploying documentation: multiple R vignettes
  3. Auto-deploying documentation: FASTER!
  4. Auto-deploying documentation: Rtraining
  5. Auto-deploying documentation: better change tracking of artefacts

Multiple vignettes

In my original post I show how I pushed the tfsR vignette to gh-pages, which involved copying it and renaming it to index.html.

Unfortunately, this wouldn’t work if I had multiple vignettes that I wanted to be accessible online.

Requirements

  • An index.html file
  • A way of extracting any number of html files from the vignette folder
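One possible way to meet both requirements is sketched below in bash. It is only an illustration under assumed names: `build_index`, the folder arguments, and the bare-bones HTML are hypothetical and not taken from the post's actual script.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: copy every HTML vignette into an output folder
# and generate an index.html that links to each of them.
# Usage: build_index VIGNETTE_DIR OUT_DIR
build_index() {
  local vignette_dir="$1" out_dir="$2" file name
  mkdir -p "$out_dir" || return 1
  {
    echo "<html><body><ul>"
    for file in "$vignette_dir"/*.html; do
      [ -e "$file" ] || continue               # glob matched nothing
      name="$(basename "$file")"
      [ "$name" = "index.html" ] && continue   # never overwrite the index being written
      cp "$file" "$out_dir/$name" || return 1
      echo "  <li><a href=\"$name\">${name%.html}</a></li>"
    done
    echo "</ul></body></html>"
  } > "$out_dir/index.html"
}
```

Because the loop works off a glob, adding another vignette needs no change to the script; the new file is copied and linked on the next build.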

    Read More

My (first) #SQLHangout

  • June 4, 2015
  • Steph
Yesterday I had the pleasure of hanging out “on air” with Boris Hristov. We talked open sourcing! You can download and/or contribute to the following projects I have going: MeDriAnchor: the Metadata Driven Anchor model system. You get to have a lot of fun with 6NF! optiRum: a useful package for R, especially for the UK tfsR: for my own special brand of crazy, working with TFS git repositories in R James Skipwith (the other major developer on MeDriAnchor) presented on automation and covers MeDriAnchor – you can check it out at SQLBits. Read More

optiRum – presentation

  • June 3, 2015
  • Steph

optiRum, the R package I built and support for Optimum on CRAN has gained some extra functions recently. Some of it uses currently experimental data.table functionality so I’m eagerly awaiting the release to CRAN to deliver optiRum.

In the interim, I thought I’d give some brief overviews of existing functionality contained in the package.

The next part of the coverage of optiRum functionality is to talk about the stuff that makes generating outputs easier!

Read More

Automated documentation hosting on github via Travis-CI

  • June 1, 2015
  • Steph

In this post, I’m going to cover how you can use continuous integration and source control to build and host documentation (or any other static HTML) for free, and in a way that updates every time your code changes. I’ll cover the generic capability, and then how I apply this to my simplest package, tfsR. In a later post (once I’ve cracked the best method to do it) I’ll cover my more complex use case of multiple documents and a dynamically constructed index page.

NB: This was kicked off by a post from Robert Flight about applying the technique to R package vignettes. It’s a very useful post but it was quite specific to his situation and I wanted to understand the principles behind it before I started extending it to my more complex cases.


Posts in this series

  1. Automated documentation hosting on github via Travis-CI
  2. Auto-deploying documentation: multiple R vignettes
  3. Auto-deploying documentation: FASTER!
  4. Auto-deploying documentation: Rtraining
  5. Auto-deploying documentation: better change tracking of artefacts

Requirements

  • Must haves:
    • Travis-CI
    • GitHub
  • Optional:
    • A linux machine (so you can test your bash script that Travis-CI will run)
    • R (for following the specific instructions)

High-level process

  • Get an OAUTH token from github
  • Add OAUTH token to travis
  • Add a *.sh file that gets your HTML (depending on circumstance, you may also need to generate it) and pushes to gh-pages branch
  • Include your .sh file in the after_success part of your travis file
  • Commit & push!
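To make the role of the token a little more concrete: one common way for a script to push to GitHub over HTTPS is to embed the token in the remote URL. The bash sketch below assumes that approach. The function name `token_remote_url` and the `GH_TOKEN` variable mentioned afterwards are hypothetical, not names from the post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds an HTTPS remote URL that authenticates
# with a token, for use by the deploy script's git push.
# Usage: token_remote_url TOKEN OWNER/REPO
token_remote_url() {
  printf 'https://%s@github.com/%s.git\n' "$1" "$2"
}
```

A deploy script could then run something like `git push --quiet "$(token_remote_url "$GH_TOKEN" "<owner>/<repo>")" gh-pages > /dev/null 2>&1`. Silencing the output matters here, since git may otherwise echo the URL, token included, into the public build log.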

    Read More

How many is too many conferences?

  • May 25, 2015
  • Steph
The SQL Server community has a lot of events. In the UK alone this year we will have/had in 2015: more than 75 user group meetings 5 SQL Saturdays 8 days of Relay 1 SQL in the City 3 to 5 days of SQLBits possibly a SQL Santa and probably more that I’ve forgotten or don’t know about at the time of writing this You could attend 2 days of conference per month (cpm from now on) on average in the UK alone, just at dedicated SQL events. Read More

SQLSaturday Portugal

  • May 18, 2015
  • Steph

SQLSaturday Portugal 2015 has been a huge amount of fun but I’ve also learnt a lot. A big thank you to the organisers!

Below are my slides and notes from sessions I attended.

Agile BI

My session slides:

Read More

SQL Saturday Exeter: Steph & Oz’s slides

  • April 26, 2015
  • Steph
Well SQL Saturday Exeter flew by. T’was great catching up with people, seeing folks learn how to use (old & new) tools better, and just generally watching everyone having a great time at one of the best organised conferences I have the pleasure of going to. Here are links to all the slide decks etc that we presented this weekend: My R: analysis to integration training day notes & source code My agile BI slide deck My Shiny: dashboards in R slide deck & source code Oz’s SSRS: Beyond the basics slide deck & source code If you attended any of our sessions, give us some constructive criticism! Read More

Easy Continuous Integration for R

  • April 20, 2015
  • Steph

With excellent guidance and tooling on making R packages, it’s becoming really easy to make a package to hold your R functionality. This has a host of benefits, not least source control (via GitHub) and unit testing (via the testthat package). Once you have a package and unit tests, a great way of making sure that as you change things you don’t break them is to perform Continuous integration.

What this means is that every time you make a change, your package is built and thoroughly checked for any issues. If issues are found the “build’s broke” and you have to fix it ASAP.

The easiest, cheapest, and fastest way of setting up continuous integration for R stuff is to use Travis-CI, which is free if you use GitHub as a remote server for your code.

NB – it doesn’t have to be your only remote server

Read More

Unified.Diff: source control smells & LaTeX

  • April 17, 2015
  • Steph
I had the pleasure of presenting at unified.diff, a general programming user group in Cardiff, last night and was able to debut my LaTeX show! If you’d like to talk at the group about anything tech related tweet them on @unifiedDiff. They’re very flexible on time and topic so if you’re based in Cardiff or are coming down to see a client, it’s an easy way of delivering a talk and meeting some nice people. Read More

optiRum – gini like a wizard

  • April 16, 2015
  • Steph

optiRum, the R package I built and maintain for Optimum on CRAN, has gained some extra functions recently. Some of them use currently experimental data.table functionality, so I’m eagerly awaiting its release to CRAN so that I can deliver optiRum.

In the interim, I thought I’d give some brief overviews of existing functionality contained in the package.

I do a lot of regression models and one of the common tools for assessing a regression’s ability to accurately model an event is to produce a Gini chart and a Gini coefficient. The higher the Gini coefficient, the more your model is able to discriminate probability accurately.

I simplify the process of producing gini charts (giniChart) and coefficients (giniCoef) so that I get a chart in one simple step.

Under the hood this uses the AUC package to get the coefficient, scales to format it and ggplot2 to produce the chart. Using ggplot leads to a better looking chart that can also be tweaked to suit your needs since a ggplot object is returned by the function.
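giniChart and giniCoef are R functions, and their internals aren’t shown in this excerpt. As a rough, language-neutral sketch in Python (not optiRum’s code), the snippet below illustrates the standard relationship Gini = 2 × AUC − 1, with AUC computed by comparing every event against every non-event. The function names and the sample data are made up for illustration.

```python
def auc(predicted, actual):
    """Share of (event, non-event) pairs where the event scored higher; ties count half."""
    events = [p for p, a in zip(predicted, actual) if a == 1]
    non_events = [p for p, a in zip(predicted, actual) if a == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(events) * len(non_events))


def gini_coefficient(predicted, actual):
    """Gini coefficient derived from AUC: 0 means no discrimination, 1 means perfect."""
    return 2.0 * auc(predicted, actual) - 1.0


# Hypothetical model scores and observed outcomes (1 = event happened)
predicted = [0.9, 0.8, 0.4, 0.3, 0.2]
actual = [1, 1, 0, 1, 0]
print(round(gini_coefficient(predicted, actual), 3))
```

The pairwise loop is quadratic, which is fine for a handful of rows; a real implementation would sort the scores once instead.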

Read More

Stuff I read this week

  • April 15, 2015
  • Steph
  • Maker’s Schedule, Manager’s Schedule
  • Reality Check: Counseling for Developer Hero Worshippers
  • Comments on Joining Microsoft
  • Giving back for the future of open source
  • SQLBits videos – watched not read though
  • What they don’t tell you about public speaking
  • Toxicity in Reddit Communities: a Journey to the Darkest Depths of the Interwebs
  • Owner of a Credit Card Processor Is Setting a New Minimum Wage: $70,000 a Year
  • Use Datazen for free if you have SQL Server Enterprise

Read More

Organised speaking – improving font sizes

  • April 13, 2015
  • Steph
A recurring problem with my presentations is font size.

  • The inclusion of code in my Rmarkdown slides was by default too small. Upping the fontsize via CSS worked ok, but when I switched to a shiny app version for my intro to shiny, it reverted and I’m afraid to say I didn’t notice beforehand.
  • I use PuTTY for showing how to do some stuff in the linux command line but the font’s quite small by default and Gail Shaw’s tip of Magnifier in my session was tough to use.
  • I’ve upped my font size on my RStudio IDE, but hadn’t yet implemented this across other IDEs.
  • I tend to use my mouse cursor to draw attention to things.

Read More

An R data.table cookbook

  • April 8, 2015
  • Steph
For my precon on R at the end of the month I’m working on the takeaway – the handout. This’ll be the thing that makes the training day able to be put into practice immediately, and refills all those drink and sleep depleted neurons back up with R knowledge. One of the things is a simple data.table cookbook. If you’re a data.table user, what other tasks do you think should be on there? Read More

Stuff I read this (bit more than a) week

  • April 7, 2015
  • Steph
It’s been a wee while due to SQLBits disruptions and a crazy work schedule but here’s some of what I’ve been reading recently:

  • Introverts, Extroverts, and the Complexities of Team Dynamics
  • Azure Blob Storage introduction
  • Kevin Kline: Advice to new bloggers
  • Editing for people who love to write too much
  • ProBlogger generally after KK’s recommendation
  • Why oil prices came down. and won’t any more
  • Test Driven Analysis
  • Standardising function names in R
  • Microservices at Netflix
  • Microsoft closes acquisition of Revolution Analytics

Read More

Working with Azure Blob Storage, some notes

  • April 6, 2015
  • Steph

I’m working on building a snazzy shiny app that a) drops the inputs/parameter values into blob storage and b) uses Stream Analytics to query the values and present back what people are saying at the moment. This’ll be a fab tool for my pre-con next month if I can get it working in time!

Getting it working does, however, mean utilising the Azure Blob Storage API in R, which I confess is much harder than expected, especially after the ease of using the Visual Studio Online API for tfsR. To that end, I thought I’d write up some of my findings before I do a bigger write-up that illustrates how to do everything (in R).

I’m working my way through an intro to azure storage on the (hopefully reasonable) expectation that more knowledge will make it easier to work with. There’s additionally the online reference, although I found the VSO REST API documentation easier to understand and get started with.

Read More

DeployR – why Microsoft bought Revolution?

  • March 23, 2015
  • Steph
I’ve been asked by a few people recently about why I don’t use Azure Machine Learning (ML). I answer that I don’t use it yet, the reason being that at the moment the robust development life-cycle isn’t in place around it. I think that will change – one of the great reasons for the acquisition of Revolution Analytics (in my opinion) is their DeployR system. DeployR is essentially an R web service platform. Read More

Bride of Frankenstein: TFS + R

  • March 20, 2015
  • Steph

The unholy abomination of trying to use TFS as my central repository for my R code over the past year has been tough and you may or may not be looking at the screen as if I’m a crazy fool for even trying. Of course, now I have good news, because I’ve broken the back of the main issue I had with TFS. The crucial link was being able to programmatically create Git repositories within a single project for small projects.

Using the API, I’ve been able to write an R package with functions that now save me at least 15 minutes of time and effort each time I want a new project. So I can happily holler “IT’S ALIVE!!”

Read More

Overcoming social anxiety to attend user groups

  • March 14, 2015
  • Steph

For some people it might sound silly, but a frequent reason why people don’t sign up or don’t make it to their local user group is to do with social anxiety. I totally understand this – a room full of people you don’t know can be a daunting experience. I still get nervous when attending a new user group for the first time and I run three user groups, and speak at user groups and conferences all round the country!

This post takes you through the worries, and explains how I’ve approached some of the issues. Hopefully, this’ll help you get more people in to your local user group and learning, whether it’s because you have the tools to help yourself, or understand and can help others.

Read More

Stuff I read this week

  • February 27, 2015
  • Steph
  • The Unbearable Lightness of Tweeting
  • Hadley Wickham: Impact the world by being useful
  • Academics should be made accountable for exaggerations in press releases about their own work
  • Why Complex Decisions Inevitably Take Weeks
  • Six sentence emails that get fast responses
  • Average house prices: how expensive is your area?
  • FCA Consumer Spotlight – segmenting retail financial customers
  • Making R Files Executable (under Windows)
  • Shipping Culture Is Hurting Us

Read More

Where do I fit in the Microsoft future?

  • February 24, 2015
  • Steph

Entering into the world of SQL Server around the same time as the 2008 release has meant that until the past couple of years, change in the Microsoft BI world only happened in dribs and drabs for me. SQL Server and its BI components were stable server products and the focus was on getting data and optimising “central reporting”. Recently though things have started to massively change due to Azure and Office 365.

No longer part of Server & Tools where products were considered in silos, SQL Server and BI are now part of the Cloud Platform. It’s now a means of delivering the Cloud-first vision that Microsoft have aligned themselves to.

Read More

Stuff I read this week

  • February 22, 2015
  • Steph
  • ONS Style guide for writing about statistics
  • Your coding style can give you away
  • Automated Tinder and the Eigenface
  • R-help mailing list to use cage-fighting to resolve conflicts
  • Application Containers For Cloud Computing
  • Managing Test Data as a Database CI Component – Part 1
  • Let the Hackers In: Experts Say Traps Better than Walls
  • How to Write a Blog Post

Read More

Declutter a shiny report’s code

  • February 18, 2015
  • Steph
Shiny reports are awesome, but they sure do end up with many lines of code when adding lots of inputs and outputs. A ui.R file can rapidly exceed 50 lines of code and I prefer to keep things more compact. The best way I’ve found of doing that in other languages and in R is to modularise my code – break it down into independent chunks. Shiny already does this by having a server() and ui() section and allowing you to source other files. Read More

Stuff I read this week

  • February 13, 2015
  • Steph
Here’s a selection of articles etc. that I found really interesting this week:

  • Gendered Language in Teacher Reviews
  • Knowledge units – the atoms of statistical education
  • SQLBits in The Register
  • Paul Randal: Want to be mentored by me?
  • Replacing Middle Management with APIs
  • R in Business Intelligence
  • The RHS assignment operator in R
  • Ooh R
  • Can Microsoft make R easy?

Read More

A busy month or so

  • February 11, 2015
  • Steph
I’m really looking forward to a few months of user group and conference awesomeness:

  • Feb 24: CaRdiff presenting Shiny: Dashboards in R
  • Feb 26: Oxford UG presenting Learning the ropes via the community
  • Mar 4-6: Helping out at SQLBits
  • Mar 7: SQLBits presenting Shiny: Dashboards in R
  • Mar 9: SQL Cardiff with Jen & Sean McCown presenting
  • Mar 17: Diff.Net with Scott Hanselman presenting
  • Mar 31: SQL Cardiff with the Battle of the Beards

Then even more fun kicks in with a SQLSaturday Exeter precon, a visit to unified. Read More

magrittr: cleaner program flow

  • February 9, 2015
  • Steph

Last year I built a pretty sweet web service in R as part of the day job. However, not being well-versed in stuff like object-oriented programming, I did not do the best job of making the flow of my program particularly clear or robust. It wouldn’t take multiple inputs properly and I found it to be tough to test. In spare moments, I took to cogitating how to improve things.

I tried simply refactoring some of the functions but found my structure too cumbersome to allow much change. I tried starting afresh with an S4 system but was soon in a death spiral of class proliferation and no experience in how to stop it. After dabbling with different methods, I was getting pretty frustrated – I want my code to be better and more maintainable!

Now I’m looking at magrittr.

About magrittr

magrittr was designed to better facilitate functional programming based on piping inputs from one function to another. It’s the same paradigm as the PowerShell operator |.

This means you can more succinctly pass an input through various transformation steps (in contrast to my initial method) with a lot less code. The ability to add conditional functions or even new functions on the fly (aka lambda functions) with a similarly low code burden gives the added benefit of helping with branching logic.
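magrittr itself is an R package, so the following is not magrittr code. It is a rough Python analogue of the same piping idea, using a made-up helper called pipe: a value is handed to the first step, that result to the next, and so on, with small lambda functions dropped in wherever a one-off step is needed.

```python
from functools import reduce


def pipe(value, *steps):
    """Pass value through each step in order, left to right (hypothetical helper)."""
    return reduce(lambda acc, step: step(acc), steps, value)


raw = "  3, 1, 2 "

result = pipe(
    raw,
    str.strip,                              # tidy the input
    lambda s: s.split(","),                 # one-off step written on the fly
    lambda parts: [int(p) for p in parts],  # int() ignores surrounding spaces
    sorted,
)
print(result)
```

Reading the steps top to bottom matches the order they run in, which is the main readability gain over nesting the calls inside each other.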

Read More

R on Windows – weird user name gotcha

  • February 7, 2015
  • Steph

Oz and I, being the lazy so-and-sos that we are, share a profile and use it across all our devices. Our username is “Steph & Oz” which means the user folder that Windows has for us is C:\Users\Steph & Oz. Having spaces and special characters is generally not recommended, and gives interesting issues when using R, primarily at initialization and when trying to do package installations.

By default, R will try to make the user’s personal folder the directory which it works under, i.e. limiting its impact on the computer overall, but its Unix/Linux roots mean that it doesn’t like you doing wacky things like ampersands in folder names.

The result with ours is to cause this error on load:

Error installing package: Error: ERROR: no packages specified

‘Oz’ is not recognized as an internal or external command,

operable program or batch file.

Read More

Paul Randal offers mentoring

  • February 6, 2015
  • Steph

Hot off the back of his win in the Tribal Awards, Paul is offering to mentor 3 men & 3 women for two months. To be in with a chance of getting mentored by Paul, you simply need to apply by writing a blog post about why you should be considered for mentoring and posting the link by the 15th Feb 2015.

I think it’s an awesome offer that you should take up if possible (i.e. you’re reading before the deadline) and whilst I’m busy trying to convince you I’m going to insert my application too. Hopefully, seeing my application will help you form your own.

What is the value of being mentored?

Mentoring gives you the opportunity to have someone who can assist you in the way a senior techy can when you face a technical challenge. They can give valuable advice about hidden perils, shortcuts, and point out code smells.

That advice is valuable, but to get it you need to properly formulate your issue or challenge faced. Like posting on Stack Overflow, putting thought and preparation into the question gives you a deeper understanding before you even talk to your mentor.

It’s worth noting that you can’t be vague. “I want to be the best” or “I want to know everything” is never going to happen. Mentoring is not a panacea for your entire career – especially with short duration mentoring like Paul’s. To get the value, you need to settle on a specific issue or challenge that you want to tackle.

Read More

My R Pre-Con: SQLSat Exeter

  • January 24, 2015
  • Steph

As I covered in my post on SQLSaturday Exeter, I’m going to be doing a full day of R training on April 24th that takes you from cabin boy to first mate in a day. You can’t be captain because I’m Captain… until you go back to your own ship… then you can be captain.

TL;DR

Attend my day of training about R if you’d like to learn R, best practices, and how to manage it.

It’s £150 (early bird) and can be booked at SQLSaturday Exeter’s website

Read More

Organised speaking – Intro to R case study

  • January 21, 2015
  • Steph

In my iterative presentation design post I promised a case study. I thought I’d cover my most presented session Intro to R, in future called Knowing your Rs from your elbow courtesy of @FatherJack.

A brief history

Where I’ve been using R for the past couple of years and spent the first months struggling with it, I wanted to give a presentation that I would have wanted to see at the beginning. Not one about random bagging and a bunch of other stats but what are the best ways to do the fundamentals:

  • connecting to my database
  • performing data manipulations, summaries and updates
  • charting my data
  • producing reports

A few packages cover these awesomely and are much better than base R so whilst I was tackling a massive stats project, the things which took the time and stress were things I could have avoided with ease!

So my intro to R takes people through the things I wish I’d been taken through, thus making those first few months of R pleasant, happy times!

Read More

Spell check your spreadsheets

  • January 20, 2015
  • Steph
Just a quick tip for spreadsheet users about spellchecking in Excel. Firstly, yes you can spell check a spreadsheet. Secondly, you do it either by going to Review > Check Spelling, or more easily by hitting F7 on your keyboard. Please, please spellcheck your work – it makes your work much more professional and saves you having to do it manually! Read More

SQLSaturday Exeter 2015

  • January 19, 2015
  • Steph

Woohoo! The kind and crazy folks at SQLSaturday Exeter accepted my submitted training day for their roster. Before I wax lyrical on the virtues of being locked in a room with me all day, I thought I’d better cover the fundamentals of the event itself!

First, the awesome video…

Read More

Organised speaking – presentation design

  • January 18, 2015
  • Steph

I wanted to outline my approach to presentation design, or development as I prefer to call it.

Why do I consider it development? Well, it’s a product that can be manually done & delivered but with the potential to scale to thousands of users, I’d rather the product be easy to maintain & deploy, deliver real value to the users, and keep up with cutting edge developments in the subject. Also, I call it development because now with the use of rmarkdown, I do actually code my presentations.

General presentation design

I’ve read and studied a lot about presentations, some of the biggest influences being:

  • Dr. Andrew Abela and the Extreme Presentation Method
  • Buck Woody and his fantastic presentation style
  • Brent Ozar and his excellent materials for presentations
  • Solid fundamentals in presentation training courses (things like INTRO: Intro, Need, Title, Range, Objective)

When I first come up with the idea for a presentation, I write the abstract for it. In the abstract I set out the tone, material covered, and outline who should attend. This abstract is my requirements doc for later me – it tells me whether I’m selling, educating, or entertaining and what I’m doing it about.

In my opinion, you should always write the abstract first as not only can you write more abstracts than you can presentations but it distills the idea down and helps you think of your audience first.

Read More

Review: The first CaRdiff R User Group meeting

  • January 14, 2015
  • Steph

Last night was the first Cardiff R User Group event. There were 6 people registered out of 24 CaRdiffians. In the end we had 8 people show up – so a whopping third of our current membership base.

As we sat around the booth eating chips and drinking beer, we covered our experiences learning R to date, the trials and tribulations of our jobs and why you shouldn’t drop a barbell on your nose. We had great discussions and most of us came away with new R functionality to look at!

We decided to initially go with the three session formats I’d proposed and see how things go:

  • TalkRs: evening events with talks and socialising
  • LearnRs: after work sessions focused on learning some new bit of R
  • LunchRs: quick lunchtime sessions to talk through a problem with someone else and hopefully solve it!

Read More

Photoshop image macro (or something even better!)

  • January 13, 2015
  • Steph

I spend a lot of time in Photoshop for someone in BI. Between cleaning up images, building logos for my latest project, or producing material for user groups, I probably use it at least once a week. Through it all, I usually need to produce variants, in different file formats and sizes. So it can quickly become a dozen uses of the Save As… or Save for Web functions.

I hate manual work, so you can see why it was frustrating in the extreme. Then I realised how silly I was being by not having already googled for it!

It took a while because my keyword searches weren’t the terms Photoshop use but I found the Secret Sauce. And if you’re the sort of person who’d type “photoshop image macro” – here’s how you do it!

Read More

Organised speaking – prioritising events to speak at

  • January 12, 2015
  • Steph

As part of my ongoing series about presenting at community events and conferences, I wanted to cover my personal thought process when it comes to prioritising what events I’d like to speak at for my goal Throw 1, Speak 1.

There are a massive amount of awesome SQL Server and other technology events happening out there. I even throw a SQL Server lunchtime session once a week for the user group! Then of course there’s all those conferences in the UK and abroad that are worth attending. So how do I even start picking out where I’d like to talk, and how do I go about getting selected for them?

Read More

Conference sessions – basic web scraping in R

  • January 11, 2015
  • Steph

It’s a bit sad but I enjoy dissecting what sessions are submitted to conferences I’m involved in or speak at. Instead of doing it primarily by eye, I’ve started dabbling in web scraping in R to do it. Initially, I used RCurl and my latest snippet uses rvest.

The first snippet, for SQLBits, uses RCurl but it’s cumbersome, plus for SQLSaturday Exeter there is SSL to contend with. Using rvest makes it really easy and it was an excellent excuse to get around to using magrittr, Hadley Wickham’s pipe code paradigm for R.

Blogger tip: I also wanted the opportunity to see how Gists imported into WordPress – you just c&p the url in (into the code, no URL markup) and WordPress automatically pulls in the Gist. For more info on this see WordPress’ article on Gist.

Read More

Organised speaking – Throw 1

  • January 10, 2015
  • Steph
Not quite part of being organised at speaking, but bundled up in part of my scheduling constraints for speaking is when I’m throwing user group events. Here are the details of the user groups I’m planning on throwing to meet my goal of 1 user group event a month (not including lunch time sessions!)

SQL Server

I run the SQL Server user group in Cardiff and have done for a few years – I’m not giving it up any time soon. Read More

Starting the Cardiff R User Group

  • January 9, 2015
  • Steph

These days any hobby of mine ends up with a user group if there isn’t one already.

The amount of value I derive from being able to hear experts in their fields talk, whether they’re on stage or in the audience, is phenomenal. Also, it’s a really great way to meet like-minded people.

So with the benefits in mind, 2 years of R under my belt, and a new starter in work, the time seemed ripe for an R user group.

Read More

Organised speaking – Throw 1, Speak 1

  • January 8, 2015
  • Steph

Following up from my last post on maintaining my session abstracts, I wanted to cover how I’m doing my scheduling this year for speaking at events. Perhaps more important than the tech is the intention and the planning process, so I’ll be covering these factors in more detail than the tech.

Technology

I make use of Google services quite a bit, and their calendar system is a great help. So this year I’ve added a calendar that has all my (and hopefully Oz’s) speaking engagements.

I’m then utilising a WordPress plugin called GCal events to connect to the calendar and pull the info into a page.

Throw 1, Speak 1

The goal this year is to throw one user group event and speak at one event each month.

Read More

Updated site

  • January 3, 2015
  • Steph

As I’ve been using this blog more recently, the page speed has been becoming much more frustrating. So today, I’ve done some stuff to improve it and it’s now twice as fast as it used to be. Please let me know what you think of the new style!

Read More

Organised speaking – session abstracts

  • January 2, 2015
  • Steph

Last year I spoke at 10 different events (I think) and was very lucky to be nominated in the Tribal Awards for my Intro to R session. I did just a couple of different session titles and I don’t think I managed the whole process very well.

To be an easier speaker to deal with, I’m trying to be more organised so that the selection process of myself & topics is easier whilst also ensuring I don’t develop too many presentations at the last minute.

Having dealt with awesome serial speakers Tobiasz Koprowski and Denny Cherry from the organiser end, I noticed they did a few things which made it much easier to deal with them, particularly given the breadth of topics they can cover!

Read More

SQL Cardiff dates announced

  • December 29, 2014
  • Steph

This year we’ll be continuing to maintain evening events on Tuesday nights and lunch time events on Thursdays.

Evening events

So far we have the following events and speakers scheduled for the evening events:

  • Jan 27th – 2 hour intro to replication by David Williams
  • Mar 31st – Battle of the Beards! Tobiasz Koprowski vs Terry McCann vs Rob Sewell
  • May 26th – Index Fragmentation: Internals, Analysis, and Solutions by Paul Randal, and Steve Powell
  • Jul 28th – Alex Whittles on winning Fantasy F1 using PowerPivot

I’ve got slots in there for full hour sessions as well as lightning talks for up to half an hour long so whether you’re an existing speaker or want to improve your knowledge, please get in touch and book yourself in.

Read More

Aggregate on a Lookup in SSRS

  • November 23, 2014
  • Oz

The What

If you need to join multiple datasets inside SSRS, perhaps because of different sources, grains of detail etc, then you often need to aggregate over both datasets.

In SSRS, you can easily perform aggregations over another dataset but it can be tough to do this based on a grouping factor in your main dataset.

A key example of this might be Sales and Purchases – you want to show both of these by month but they come from two different data sources.

You could build two tables that appear to be just one table but this can be really clunky. Instead, you want just one table with the month, the total sales, and the total purchases in.

Although there’s no tidy way of doing this built in, you have the power to add your own functions to SSRS using the Code window of the report’s properties. Provided here is a block of VB script that can be added to your SSRS report to allow you to do those tricky aggregations as if they were just another built in function.

I call it AggLookup.

Read More

The basics of Common Table Expressions (CTEs)

  • November 10, 2014
  • Steph
Another quick post off the back of a SQL Lunch I did a while ago. Explore it via SQLFiddle: http://sqlfiddle.com/#!6/ad7f5/7/0

What is a CTE?

A Common Table Expression (CTE) is essentially a function defining a relation instead of a table. This function outputs a table (like all queries) that is only present within the session, but data isn’t stored in tempdb like with a temporary table.

Why CTEs?

CTEs are designed primarily to allow recursion within SQL – like a loop but ideal for working with hierarchies. Read More
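As a small illustration of the recursion over a hierarchy described in the CTE post above, here is a sketch that runs a recursive CTE against an in-memory SQLite database from Python. SQLite is swapped in purely so the example is self-contained; it is not the SQL Server setup from the SQLFiddle link, and the staff table and its rows are invented. T-SQL writes the same query without the RECURSIVE keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical hierarchy: each person points at their manager
conn.executescript("""
    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO staff VALUES
        (1, 'Ann', NULL),
        (2, 'Bob', 1),
        (3, 'Cat', 2),
        (4, 'Dev', 1);
""")

rows = conn.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        -- anchor: start at the top of the hierarchy
        SELECT id, name, 0 FROM staff WHERE manager_id IS NULL
        UNION ALL
        -- recursive step: add everyone who reports to a row already found
        SELECT s.id, s.name, c.depth + 1
        FROM staff AS s
        JOIN chain AS c ON s.manager_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth, name
""").fetchall()

# Print an indented view of the hierarchy
for name, depth in rows:
    print("  " * depth + name)

conn.close()
```

The anchor query seeds the result and the recursive step keeps joining back onto it until no new rows appear, which is the loop-like behaviour mentioned above.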

Database / BI related unit testing options

  • November 6, 2014
  • Steph
A quick list of frameworks available for doing unit testing, based on what I covered in today’s SQL Lunch

MSFT Database projects

  • Purpose: unit testing database objects
  • Method: SQL / GUI
  • Site: http://msdn.microsoft.com/en-us/library/jj851200(v=vs.103).aspx
  • Cost: Free
  • Pros: Built-in, quite well documented
  • Cons: Requires Visual Studio 2010 Pro or above

Codeplex ssisUnit

  • Purpose: SSIS unit testing
  • Method: XML / GUI
  • Site: https://ssisunit.codeplex.com
  • Cost: Free
  • Pros: Unique
  • Cons: As of writing, only stable version was released in 2008

Read More

Where’ve we been?

  • June 28, 2014
  • Steph
Almost into July and I haven’t posted a single thing in this blog! Shameful of me to be sure – I’ve been learning but not sharing. So what’s been happening? Well I moved into a new job at a brand new startup where I’ve been primarily doing R, modelling, and finally getting my hands back into SQL Server! That’s been keeping my days and parts of my nights busy. I’m also working on a startup at home with Oz called Clocksmith Games. Read More

Merry Christmas

  • December 24, 2013
  • Steph
Read More

Fiddly SSRS

  • November 28, 2013
  • Steph
Read More

SQLRelay Cardiff – Nov 13th

  • September 25, 2013
  • Steph
After dipping Cardiff’s collective toes into the world of local SQL Server conferences, we’re doing it again in November. We’re taking registrations, and volunteers are always welcome. If an all day conference is a bit too much, why not try out a lunchtime or evening event? There are 9 other events around the country which you can attend as well as / instead of Cardiff, including Bristol where I’ve the privilege of speaking along with some excellent folk like Klaus Aschenbrenner. Read More

R for database and Excel people

  • September 15, 2013
  • Steph

What is R?

R is a statistical language for doing all sorts of analytics based on many different types of data and it’s also an open source platform that allows people to extend the base functionality.  More details are available from the horse’s mouth.

How can I give it a go?

Download R and RStudio, an awesome development environment for R. There is also an excellent online R learning site. I do not recommend sticking with just R – we’re used to a lot more convenience and good development bits and bobs like IntelliSense, and RStudio really delivers.

Read More

Marketing for SQLRelay – two weeks in

  • September 13, 2013
  • Steph

Further to the last post introducing my trials and tribulations, and a hectic week or two, we’ve made excellent progress on the Relay. I’ve enlisted Mark (@tsqltidy), the chair for the Relay, and others to assist with the twittering and other activities, which has really helped me reduce my workload substantially.

All ten venues are going ahead:

Location Date
Reading Monday 11th Nov 2013
Southampton Tuesday 12th Nov 2013
Cardiff Wednesday 13th Nov 2013
Birmingham Thursday 14th Nov 2013
Hertfordshire Friday 15th Nov 2013
Newcastle Monday 25th Nov 2013
Manchester Tuesday 26th Nov 2013
Norwich Wednesday 27th Nov 2013
Bristol Thursday 28th Nov 2013
London Friday 29th Nov 2013

So what’s been done so far?

Facebook

What have I been doing to try to make this a successful marketing channel:

Read More

Tired of right-clicking on folders and going to properties to get the folder size?

  • September 7, 2013
  • Steph

It’s a nightmare when I’m trying to find out what’s clogging up my hard drive, particularly now that I have an SSD and can no longer be quite so lazy and sprawling with myriad files and downloads. This is the case even after moving most contents to Dropbox and putting this on my slow 1TB hard drive. It can get really tiresome to be running out of space and having to trawl through, right-clicking on different folders. It was boring but it was how you did it. Well, now I am enlightened, and now I don’t have to pour my time down the drain.

Read More

Marketing for SQLRelay – In the beginning

  • September 1, 2013
  • Steph

After organising SQLRelay for June 24th in Cardiff, as part of the national series of 8 events, we’re gearing up for November with the aim of being able to capitalise on the growing knowledge of SQL Server 2014 CTP and pushing the Relay into a less busy part of the UK community schedule. The difficulty is that where we had more than 6 months to prep for the previous Relay, this time round we had less than 5. What this means for me is that not only do I want to run a bigger and better Cardiff event, but I also (being a glutton for punishment) took on spearheading the marketing efforts for the whole shebang.

Details will be released next week on the launch, but given my lack of knowledge about anything social media this has already been a major undertaking for me, and I thought it might be of value for me, future me, and my dear readers to compile information and learnings as I go along so that it’s easier to implement in future for other marketing endeavours.  It also provides an area for discussion.

Read More

Dynamic named range generator

  • August 27, 2013
  • Steph

Why do I use dynamic named ranges?

Where I work, most reports are exposed via a web front-end and Excel can create an external connection and retrieve the information. This is much safer than using direct database connections in workbooks. A problem with web queries though is that they cannot be converted to Tables, which would make referencing columns and the dataset as a whole easier. As a result, dynamic named ranges are a necessity for producing easy to develop and manage spreadsheets since the volumes in the raw data can change over time.

How I save myself time

A raw data table with 20 columns will take a long time to create the named ranges for, given that I want:

  1. A dynamic range covering the headers too for pivot tables
  2. A dynamic range without headers for vlookups
  3. A dynamic range for each column without headers

I use a macro, assigned to a nice button on my ribbon, to generate all the relevant ranges.
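To make the three kinds of range concrete, here is a minimal Python sketch (not the VBA macro itself) that builds the formula text for each one. It assumes the common `OFFSET`/`COUNTA` pattern for dynamic ranges, a table starting in cell A1, and headers that are already valid range names. The function names, the `DataWithHeaders` and `DataNoHeaders` range names and the whole-column `COUNTA` are illustrative assumptions, not taken from the macro.

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column index to an Excel column letter (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def dynamic_range_formulas(sheet: str, headers: list[str]) -> dict[str, str]:
    """Return {range name: formula} for the three kinds of range listed above."""
    width = len(headers)
    # Count the filled cells in column A (assumed to be the always-populated key column).
    count = f"COUNTA({sheet}!$A:$A)"
    formulas = {
        # 1. Whole table including the header row, e.g. for pivot tables.
        "DataWithHeaders": f"=OFFSET({sheet}!$A$1,0,0,{count},{width})",
        # 2. Whole table without the header row, e.g. for VLOOKUPs.
        "DataNoHeaders": f"=OFFSET({sheet}!$A$2,0,0,{count}-1,{width})",
    }
    # 3. One single-column range per header, again skipping the header row.
    for position, header in enumerate(headers, start=1):
        col = column_letter(position)
        formulas[header] = f"=OFFSET({sheet}!${col}$2,0,0,{count}-1,1)"
    return formulas


for name, formula in dynamic_range_formulas("Raw", ["ID", "Grade"]).items():
    print(name, formula)
```

With 20 columns that is 22 named ranges, which is why generating them by macro beats typing them in by hand.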

What are the special considerations?

Structure – raw data tables should ALWAYS be set up in a specific way – with the Primary Key on the left hand side and always filled in, with no empty rows or columns

Special characters – range names can’t contain special characters.  The VBA uses the RegEx functionality to strip these out.

Numbers – range names can’t have numbers either.  We can’t just strip out the numbers like we would special characters because they might be important like Grade1, Grade2 and Grade3 and collapsing them all to the name Grade would be a problem.  Instead, the macro converts all numbers to the corresponding letter in the alphabet.

How much will the data grow?  By default I set the macro to use 10 times the number of records present when I run the macro – if it’s already bigger than 25k rows, the number will need to be reduced, and if I don’t think 10 times the number will be adequate, I’ll increase the number.
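The naming and sizing rules above can be sketched in a few lines of Python. This is an illustration of the logic only, not the VBA macro: the function names are made up for the example, and the mapping of 0 to J is an assumption, since only "the corresponding letter in the alphabet" is specified.

```python
import re

# Hypothetical digit-to-letter mapping: 1 -> A ... 9 -> I, and 0 -> J (an assumption).
DIGIT_TO_LETTER = {str(d): chr(ord("A") + d - 1) for d in range(1, 10)}
DIGIT_TO_LETTER["0"] = "J"


def clean_range_name(header: str) -> str:
    """Turn a column header into a range name following the rules described above."""
    # Swap each digit for a letter first, so Grade1 and Grade2 stay distinct.
    with_letters = "".join(DIGIT_TO_LETTER.get(ch, ch) for ch in header)
    # Then strip anything that is not a letter or an underscore.
    return re.sub(r"[^A-Za-z_]", "", with_letters)


def rows_to_cover(current_rows: int, growth_factor: int = 10) -> int:
    """Number of rows a range should allow for, given the expected growth."""
    return current_rows * growth_factor


print(clean_range_name("Grade1"), clean_range_name("Grade2"))
print(rows_to_cover(2000))
```

Converting the digits before stripping special characters matters: doing it the other way round would delete the digits and collapse Grade1, Grade2 and Grade3 into the same name.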

Read More

RegEx functions in VBA

  • August 20, 2013
  • Steph
Regular Expressions (RegEx) is a common string processing technique for handling strings that conform to patterns, as opposed to fixed strings. It is an excellent set of functionality that is available in most programming languages, and even in SQL. It is however not readily available in Excel or VBA. This has its downsides if you’re trying to do complex string matching and extraction, so in my personal workbook, I include the RegEx functions available at http://www. Read More

SSIS basics and gotchas – presentation and resources

  • August 16, 2013
  • Steph
Follow-up resources / places to go for way more detail:

  • Stairway to SSIS
  • MSFT SSIS tutorial package 1
  • MSFT SSIS tutorial package 2
  • MSFT SSIS tutorial package 3
  • SQLCat SSIS best practices
  • Bob Duffy SSIS best practices
  • Connection Strings
  • The BOL for SSIS
  • Design Patterns book
  • Design patterns 24HOP vid

Read More

Dynamic named ranges – the basics

  • July 27, 2013
  • Steph
Whoa nelly, what’s a named range first of all, let alone a dynamic one? A named range is a shorthand or alias for a set of cells in Excel. These can be created easily by simply selecting one or more cells and using the name box to give it whatever name you feel relevant. This alias can then be used in a formula to make something much more insightful like =A1*VAT as opposed to =A1*0. Read More

My First Platformer

  • June 28, 2013
  • Oz

My Platformer

Here’s my first attempt at a platformer, built in Construct 2 and using sprites from Game Maker.

Read More

My Day/Night Cycle Demo

  • June 27, 2013
  • Oz

My Day/Night Cycle

Here’s my Day/Night cycle function demo, built in Construct 2. (Time moving at 4 minutes per second) It took me a day and a half of hair pulling, but I’ve only been using Construct 2 for a week so I suppose it’s not too shabby. It uses only 6 events, 2 global variables and 9 objects. Read More

User Group presentation

  • May 31, 2013
  • Steph
Read More

Synchronising schema between MSSQL & MySQL with SSIS

  • May 29, 2013
  • Steph

The problem:

We need to report on a system that is form based.  Whenever there is a new form, there is a new table, and whenever there is a new or amended* field on the form, there is a new column in the table.  Maintaining the imports of this data into a staging environment would require a lot of code and time to build manually from scratch.

What is required is something that goes through the two schemas for all relevant objects and updates our staging area’s schema accordingly.

Points for consideration:

  • Due to the level of change in the source system, all loads are dynamically generated SQL
  • Loads run from a data dictionary table, which needs to be updated when we update the schema
  • Loads occur daily
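The core comparison can be sketched in Python as a rough illustration of the idea rather than the SSIS solution itself. It assumes both schemas have already been read into dictionaries of `{table: {column: type}}` (for instance from each server's `INFORMATION_SCHEMA.COLUMNS` view) and that column types have already been translated between the two platforms. The function name is hypothetical, and updating the data dictionary table is left out.

```python
def schema_sync_statements(source: dict[str, dict[str, str]],
                           staging: dict[str, dict[str, str]]) -> list[str]:
    """Return DDL statements that add missing tables and columns to staging."""
    statements = []
    for table, columns in source.items():
        if table not in staging:
            # New form -> new table: create it with every source column.
            column_sql = ", ".join(
                f"[{name}] {sql_type}" for name, sql_type in columns.items()
            )
            statements.append(f"CREATE TABLE [{table}] ({column_sql});")
            continue
        # Existing table: add any columns that staging does not have yet.
        for name, sql_type in columns.items():
            if name not in staging[table]:
                statements.append(f"ALTER TABLE [{table}] ADD [{name}] {sql_type};")
    return statements


source = {
    "form_a": {"id": "INT", "name": "NVARCHAR(255)"},
    "form_b": {"id": "INT"},
}
staging = {"form_a": {"id": "INT"}}
for statement in schema_sync_statements(source, staging):
    print(statement)
```

The sketch only ever adds tables and columns, which matches the problem as described: new forms create tables and new or amended fields create columns, so nothing in staging needs to be dropped.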

Read More

Making charts with conditionally coloured series

  • May 25, 2013
  • Steph
The example I’m running through is available at http://sdrv.ms/11lH3KR   The scenario we’re looking at is where we want to be able to convey quality within a chart by having differently coloured columns, based on different conditions that we want to specify. Unfortunately, the ability to natively apply conditional formatting isn’t yet present, but we can mimic it by overlaying series of the same size that are coloured differently.   Read More

Center across selection

  • May 24, 2013
  • Steph
Merging cells is easily done and can help make a spreadsheet look neat, but what you really, really should be doing instead is centering across the selection so it looks merged but isn’t. Center across selection though is hidden away and therefore time-consuming to use – no wonder people have bad habits! I wanted to do things the better way, but was lazy, so in the end I made a macro to go in my personal workbook and assigned it to my ribbon (I do this a lot). Read More

Time to go home…

  • May 19, 2013
  • Steph
I do a lot of work in spreadsheets and some cannot be left open on my PC as that’d make them locked for the morning report refresh. After a bunch of times having to buy cakes for delaying the reports, I put something in place to stop it. It also had the very nice side effect of telling me to go home. The first thing to do is make sure you have a personal workbook. Read More

Objectless Check Boxes using VBA

  • May 12, 2013
  • Oz

For my first ever blog post (be gentle with me!) I wanted to talk about an issue I have with Excel’s check box object, and my way of resolving it. It’s not perfect, and I’d love to hear of any other versions or ideas you may have. So here’s how I create check boxes in Excel without using Excel check boxes.

 

The Problem with Check Box Objects

They look good and they work well, there’s no denying they do what they’re supposed to, but they also annoy the heck out of me!

  • As far as I’ve been able to find they can’t be properly bound to a cell; this means if you want to get rid of them you need to select them and delete them, which can be a big job
  • If you want to refer to their state in simple terms you need to add a linked cell and refer to that, which to me is just plain messy
  • They’re awkward to format and style and if you want a big tick you need a big box and as such you need a big cell… again, messy

Read More

Setting up WordPress on Azure

  • May 11, 2013
  • Steph

This blog was configured super rapidly with GoDaddy and Azure, instead of my previous implementation on EC2.  I’ve forgone the multi-site installation, with attendant subdomains, and gone for a straight WordPress Website (one of the Azure features).

I already had an Azure account I’d gone through the billing setup for – but that was really simple anyway, so getting the blog up and running consisted of:

Read More