Locke Data Blog

Locke Data helps organisations get started with data science. Grow your skills with our blog posts.

Gift ideas for the R lovers

December 14, 2018
maelle

Are you looking for gift ideas for the R addicts in your life or the next guest speaker at your user group? Here are a few ideas, in four categories! Not all of them feature our company, promised. ;-) Learning and using R Is your dear one just starting to learn R? You might help them by gifting them a book, or even a couple of them! We suggest the R fundamentals series by Locke Data’s Steph Locke, Volume 1, Working with R and Volume 2, Data Manipulation in R, both of them available as paperback and Kindle versions. Read More

covrpage, more information on unit testing

December 10, 2018
maelle

In this post, we shall explore the first R package that received Locke Data’s new support, covrpage by Jonathan Sidi! With this nifty package you can better communicate the unit testing completeness and goodness of your package! What’s covrpage? Trust is earned not inherited importedFrom. Now that you’ve built a cool package, you want potential users to trust it so that they might adopt it. So how can you build trust in your software? Read More

Project planning with plotly

November 26, 2018
Ellen Talbot

Something a little different today for a quick chat about my latest project and why I’m finding the plotly package so helpful! Are you like me and physically can’t function unless you’ve got a to-do list in front of you? Well even if you’re not, imagine my pain while I’m wearing my non-Locke Data hat and trying to plan out the final year of my PhD thesis! Read More

namer, Automatic Labelling of R Markdown Chunks

October 31, 2018
maelle

We’ve just released a sweet package to save you stress from the hassle of unnamed chunks in R Markdown! namer will name all your chunks, so you can quickly debug in future. More details in this post! Why name your R Markdown chunks? When writing R Markdown documents, be it a single report or a whole book based on dozens of documents, it’s crucial to name your R Markdown chunks. Read More

Packages for Testing your R Package

October 22, 2018
maelle

Testing your R package is crucial, and thankfully it only gets easier with time, thanks to experience… and awesome packages helping you set up and improve tests! In this post, we shall offer a roundup of packages for testing R packages, first in a section about general testing setup, and then in a section about testing “peculiar” stuff. General package testing infrastructure Create tests If you’re brand-new to unit testing your R package, I’d recommend reading this chapter from Hadley Wickham’s book about R packages. Read More

Package support offer

October 15, 2018
steph

The R community and the package ecosystem are awesome but it can be difficult to sustain your R packages when you only have so much free time. To make a stellar package you’ve got to keep on top of the issues, make great documentation, have that all important hex sticker, and generally have good quality code. This all takes time, time you don’t always have. We would like to help. Read More

cransays - Follow your R Package Journey to CRANterbury with our Dashboard!

October 11, 2018
maelle

We at Locke Data maintain a few R packages that we’ve submitted to CRAN to help increase their userbase. After running devtools::release() and clicking the link in a confirmation email… what remains is waiting. Inspired by our experience, we’ve created a dashboard to help other package maintainers follow their package’s journey to CRANterbury. Read more about its making in this post! Why create the cransays dashboard? Sometimes, depending on the workload of CRAN volunteers, it can take a while before a package makes its way onto CRAN. Read More

Processing complicated package outputs

October 9, 2018
Steph

Sometimes packages have functions that don’t do things the way you want them to, and you have to either re-build the function, or work with it as-is and add code around it to solve your issue. I’ve had to do this recently with the googleway package and its google_distance() function, so I wanted to take you through, step by step, how I wrote code to go from a single value function to a function that handles many inputs and returns 4 rows per input. Read More

Tidyverse 'Starts_with' in M/Power Query

October 8, 2018
David Parr

As a heavy R and Tidyverse user, I’ve been playing with Microsoft’s M/Power Query language included in Excel and PowerBI from that perspective, looking for the functions to make my life easier, developing small code pipelines for my processing and trying to get a smooth, clear and maintainable data manipulation process in place. The Problem In PowerBI I have data generated from an API call to HubSpot, which delivers a JSON which is flattened as the first step of the process into a table with hundreds of columns. Read More

Speed Up With Microsoft

October 4, 2018
David Parr

People use R for lots of reasons: “It’s great for the models I need”, “I like the functional approach”, “It’s the tool I’m most comfortable with”. People don’t use R for these reasons: “I have a favourite processor core, I don’t want to use the others”, “I love how my memory needs to fit all my data”. What if I told you that you didn’t need to worry about that any more? Read More

Up your open source game with Hacktoberfest at Locke Data!

October 1, 2018
maelle

How awesome is open source software? Quite awesome in our opinion! Locke Data maintains several open source repos on GitHub, in particular of R packages, and we’d like you to join in the fun! This month, we’re taking part in Hacktoberfest and will do our best to mentor you through your first open source contributions if you wish! Hacktoberfest is a month-long operation celebrating open source software. As an open source newbie, it’s the occasion to start participating in open source development! Read More

Functions and Packages

September 29, 2018
Ellen Talbot

We’re done with the basics of handling data in R. Now we want to know how to make sense of it. We know what kind of data it is, we know how to look at column names, dimensions and the like. If you’re trying to add value to this data however, that very often isn’t enough, so here’s a look at using the tools available to you to start figuring out how to do what you want. Read More

Cosmos DB for Data Science

September 7, 2018
David Parr

Cosmos DB is a snazzy new(ish) Microsoft Azure product. I was able to go to the Microsoft office in London for three days of training on the database service, which was really well structured and well run, with a lot of knowledgeable Microsoft bods around to pass on their considerable knowledge. This post will extract out some key features and benefits of the service, and then discuss how this fits into a data scientist’s role. Read More

R Objects

August 24, 2018
Ellen Talbot

R objects To quickly recap, so far we’ve just worked with some single values to get to grips with how some of the various operations work. Of course, we rarely work with a single value! If we did, we could just use a calculator. This instalment you’ll get to grips with some different ways of storing data and how to manipulate your datasets in the “traditional” way. This will help you understand a lot of code written in the past, and will equip you to understand data manipulation of tabular data. Read More

A glass shattering book draw with gganimate

August 1, 2018
maelle

It’s time for a Twitter book draw again: every month, a random Locke Data Twitter follower wins an excellent data science book! This month’s book was Weapons of Math Destruction : How Big Data Increases Inequality and Threatens Democracy. The animation I chose to create was inspired by the idea of destruction and by my wanting to try out the fantastic new API of the gganimate package, and a very fast new gif encoder, gifski. Read More

Learn to R blog series - Operators and Objects

July 19, 2018
Ellen

Basic operations Now that we have some datatypes, we can start learning what we can do with them. This week’s video whisks over the basic operators - you know what plus and minus do, right? Then we look at some other less common operators and recap it all below. Pay special attention to all.equal(), there’s a reason I bang on about it! Maths In R, we have our common operators that you’re probably used to if you’ve performed calculations on computers before. Read More
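The teaser does not say why all.equal() deserves the attention, so the following is only a guess at the usual reason: floating-point arithmetic makes exact comparison with == unreliable. A minimal sketch in base R, with variable names invented for illustration:

```r
# Floating-point sums rarely land on exactly the value you expect.
x <- 0.1 + 0.2

# Exact comparison checks every bit of the two doubles.
exact_match <- x == 0.3

# all.equal() compares within a small tolerance; wrap it in isTRUE()
# because it returns a description of the difference rather than FALSE.
tolerant_match <- isTRUE(all.equal(x, 0.3))

print(exact_match)    # FALSE
print(tolerant_match) # TRUE
```

The isTRUE() wrapper matters inside if() statements, where a character description of the difference would otherwise cause an error.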

Harmonizing and emojifying our GitHub issue trackers

July 12, 2018
maelle

A part of Locke Data’s mission is sharing R knowledge and tooling with the world for free. If you have a look at our GitHub account, you’ll see we’ve pinned six of our package repos. Furthermore, to make it easier to find all the R stuff we’ve packaged up, we’ve added an “r-package” repo topic to all packages: find them all via this URL. Adding such repo topics isn’t the only harmonization effort we’ve done to make it easier to maintain and promote our packages suite. Read More

SatRdays Cardiff

July 4, 2018
Ellen

Hey again lovely readers! This blog is a very special one indeed, you get to hear about our great day out at SatRdays in Cardiff recently not once, not twice, but five times, from each of our team members’ perspectives! I think it’s fair to say that it was a very different experience for each of us - from seasoned conference attendees like Steph and Maëlle, Amy who had never presented before, sponsorship newbie Oz and then Ellen somewhere in between, we all had very different (but great) takeaways from the day! Read More

Python and Tidyverse

June 1, 2018
Leo

Introduction One of the great things about the R world has been a collection of R packages called tidyverse that are easy for beginners to learn and provide a consistent data manipulation and visualisation space. The value of these tools has been so great that many of them have been ported to Python. That’s why we thought we should provide an introduction to tidyverse for Python blog post. What is tidyverse? Read More

A crystal clear book draw

June 1, 2018
maelle

As you might know, every month, a random Locke Data Twitter follower wins an excellent data science book! This month’s gift was “An Introduction to Statistical Learning: with Applications in R”, a classic and useful textbook. In this post I’ll give you some magick-al tips from behind-the-scenes of this month’s winner announcement. It’ll feature learning from my mistakes, and reading from a crystal ball… or more seriously, image manipulation in R! Read More

How to use an R interface with Airtable API

May 23, 2018
Amy McDougall

How to use an R interface with an Airtable API Hi folks, so this blog is about how to use an R interface with an Airtable API. We are going to be using this interface and API to pick our winner for our T-shirt draw. We will also be using the dplyr function sample_n(). Airtable is a cloud collaboration service. It is a spreadsheet-database hybrid, containing the features of a database but applied to a spreadsheet. Read More

Data types

May 8, 2018
Ellen Talbot

In this installment of the Learn R series, we’re going to start to have a look at data types, dataframes and what on earth you do with them! Have a watch of this short video and then consolidate everything we cover by reading through the post and playing along. You can take the code blocks included in the post and try them out in your own script - you should be all set up with R and RStudio, so there’s no time like the present! Read More

A particles-arly fun book draw

May 2, 2018
maelle

Did you know that every month, a random Locke Data Twitter follower wins a nifty data science book? If you don’t, and you don’t follow Locke Data on Twitter yet, do it! This month’s book was “Tidy text mining” by Julia Silge and David Robinson, a fantastic introduction to Natural Language Processing in R. If you haven’t been lucky enough to score a paperback version, you can read it online for free! Read More

Some web API package development lessons from HIBPwned

April 19, 2018
maelle

As announced yesterday, HIBPwned version 0.1.7 has been released to CRAN! Although the release was mainly a maintenance release building on Steph’s already great code, internal changes were made to start transforming HIBPwned into a real showcase of web API package development. Let’s summarize some interesting points: Avoiding useless requests HIBPwned uses the memoise package in order to cache the results inside an active R session. What does this mean? Read More
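To give a feel for what caching inside an active R session means, here is a small sketch using memoise::memoise(). It is not HIBPwned's actual code: slow_lookup() is a hypothetical stand-in for a web request, and the counter only exists to show how often the underlying function really runs.

```r
library(memoise)

# Counts how many times the "expensive" function body actually runs.
calls <- 0

# Hypothetical stand-in for a slow web API request.
slow_lookup <- function(account) {
  calls <<- calls + 1
  Sys.sleep(0.1)
  paste("result for", account)
}

# memoise() returns a wrapper that remembers results per set of arguments.
cached_lookup <- memoise(slow_lookup)

first  <- cached_lookup("someone@example.com") # runs slow_lookup()
second <- cached_lookup("someone@example.com") # answered from the cache

print(calls) # 1
```

The cache lives in memory by default, so it disappears when the R session ends.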

How many CRAN package maintainers have been pwned?

April 18, 2018
maelle

The alternative title of this blog post is HIBPwned version 0.1.7 has been released! W00t!. Steph’s HIBPwned package utilises the HaveIBeenPwned.com API to check whether email addresses and/or user names have been present in any publicly disclosed data breach. In other words, this package potentially delivers bad news, but useful bad news! This release is mainly a maintenance release, with some cool code changes invisible to you, the user, but not only that: you can now get account_breaches for several accounts in a data. Read More

R Spatial Resources

April 6, 2018
steph

I recently met up with someone who does geospatial stuff but uses the more traditional GIS software to do it. I showed him a few things in R but not being a person who does a lot of geospatial analysis I thought I’d ask the lovely #rspatial crowd what they’d recommend. Here are the compiled recommendations. Happy learning spatial R! Feel free to comment or tweet your recommendations to get them added to this list. Read More

Learn to R blog series - R and RStudio

March 29, 2018
Ellen

Hello everyone, welcome back! This post marks the beginning, hopefully, of your foray into the wonderful world of R and RStudio…and my delve into the odd vlog to go with the blog! I’ve brushed my hair for you, and I don’t do that for just anyone, so you’d better watch it at least once! So without further ado. R R is an open source language, with version 1.0.0 released in 2000, that’s ideal for data wrangling and data science. Read More

Introducing Python for data scientists - Pt2

March 23, 2018
Leo

This second part of the ‘Python for Data Scientists’ post talks about the specifics of Python for data scientists. Part 1 of Python for Data Scientists talks about Python generally and can be found here. Python for data scientists Where you can use Python Python is a general purpose programming language, meaning that it has many use cases outside of data science. These include game development, graphics, web development, GIS, and control systems. Read More

Introducing Python for data scientists - Pt1

March 15, 2018
Leo

If you have decided you want to learn Python but you’re not sure where to start, then this post will point you in the right direction. Part 1 of Python for Data Scientists talks about Python generally, before we dive into the specifics for data scientists in part 2. Python What is Python? Python.org states ‘Python is a programming language that lets you work quickly and integrate systems more effectively. Read More

Understanding rolling calculations in R

March 7, 2018
Steph

In R, we often need to get values or perform calculations from information not on the same row. We need to either retrieve specific values or we need to produce some sort of aggregation. This post explores some of the options and explains the weird (to me at least!) behaviours around rolling calculations and alignments. We can retrieve earlier values by using the lag() function from dplyr[1]. This by default looks one value earlier in the sequence. Read More
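As a quick illustration of the default described above, here is a minimal sketch of dplyr's lag() on a toy vector. The sales values are made up, and the function is called as dplyr::lag() so it is not confused with stats::lag().

```r
# Toy data, invented for illustration.
sales <- c(10, 20, 30, 40)

# By default lag() looks one value earlier; the first element has no
# predecessor, so it becomes NA.
previous <- dplyr::lag(sales)
print(previous)     # NA 10 20 30

# n controls how far back to look.
two_back <- dplyr::lag(sales, n = 2)
print(two_back)     # NA NA 10 20

# A simple use: change compared with the previous value.
change <- sales - previous
print(change)       # NA 10 10 10
```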

Connect to Google Sheets in Power BI using R

March 6, 2018
Ellen

Hello again everyone! Here are the step-by-step instructions for using the googlesheets package in R to enable you to get your data from Google Sheets. This latest blog post comes from this video we published a little while ago. Step 1 - Preparation Create an authentication token for re-use Run the following: library(googlesheets) token <- gs_auth(cache = FALSE) gd_token() saveRDS(token, file="googlesheets_token.rds") This springs open a window in your browser, and asks you to choose your preferred google account. Read More

Image Recognition and Object Detection

February 28, 2018
Ellen

In this latest blog, I’m responding to a cry for help. Someone got in touch with us recently asking for some advice on image detection algorithms, so let’s see what we can do! They already know what algorithms they want to use, so let’s start with those. Hang on no, for the uninitiated, let’s start with what even is an image detection algorithm? “An image detection algorithm takes an image, or piece of an image as an input, and outputs what it thinks the image contains. Read More

Markdown based web analytics? Rectangle your blog

February 21, 2018
maelle

Locke Data’s great blog is Markdown-based. What this means is that all blog posts exist as Markdown files: you can see all of them here. They then get rendered to html by some sort of magic cough blogdown cough we don’t need to fully understand here. For marketing efforts, I needed a census of existing blog posts along with some precious information. Here is how I got it, in other words here is how I rectangled the website GitHub repo and live version to serve our needs. Read More

How to maraaverickfy a blog post without even reading it

February 12, 2018
maelle

Steph is currently out of the office, teaching people cool Data Science stuff on a cruise at Tech Outbound. She counts on her team to keep the company’s Twitter account afloat in the meantime, so I had to think of a way to contribute. What about advertising existing content from her blog in the style of her Twitter role model Mara Averick, i.e. an informative tweet accompanied by appealing screenshots? Read More

Connecting to SQL Server on shinyapps.io

January 31, 2018
steph

If you use SQL Server (or Azure SQL DB) as your data store and you need to connect to the database from shinyapps.io, you’re presently stuck with FreeTDS. If you have any control over infrastructure I cannot recommend highly enough the actual ODBC Driver on Linux for ease. Alas, shinyapps.io does not let you control the infrastructure. We have to make do with FreeTDS and it can be pretty painful to get right. Read More

Year 2 of Locke Data

January 29, 2018
Steph

Hey folks, I wanted to give y’all an update about Locke Data one year on from when I started it up. In the past year, I’ve delivered more than 32 days of training, written and published 2 books, worked with 3 clients, and generally whimpered at my schedule. It has been amazing how much support the community has given me, and I’ve tried to give back where possible by giving away books each month, doing the usual presenting, holding free office hours, and offering community workshops. Read More

Working with PDFs - scraping the PASS budget

December 29, 2017
Steph

Using tabulizer we’re able to extract information from PDFs so it comes in really handy when people publish data as a PDF! This post takes you through using tabulizer and tidyverse packages to scrape and clean up some budget data from PASS, an association for the Microsoft Data Platform community. The goal is to mainly show some of the tricks of the data wrangling trade that you may need to utilise when you scrape data from PDFs. Read More

Using blogdown with an existing Hugo site

December 20, 2017
steph

If you decide you want to use R in your existing Hugo blog, it’s really easy to convert over. There’s a single command you need to know from blogdown and the rest is working out your deployment process. To create content, use the blogdown RStudio add-in to quickly get started. This niftily reads all tags and categories from past posts to help you get going. You can then write R Markdown as usual. Read More

Data Manipulation in R

December 18, 2017
steph

Data Manipulation in R is now generally available on Amazon. All book links will attempt geo-targeting so you end up at the right Amazon. Prices are in USD as most readers are American and the price will be the equivalent in local currency. Data Manipulation in R is the second book in my R Fundamentals series that takes folks from no programming knowledge through to an experienced R user. Read More

Working with R

October 5, 2017
steph

I’ve been pretty quiet on the blog front recently. That’s because I overhauled my site, migrating it to Hugo (the foundation of blogdown). Just doing one extra thing on top of my usual workload, I also did another thing. I wrote a book too! I’m a big book fan, and I’m especially a Kindle Unlimited fan (all the books you can read for £8 per month, heck yeah!) so I wanted to make books that I could publish and see on Kindle Unlimited. Read More

All my talks in one place (plus a Hugo walkthrough!)

June 17, 2017
steph

I mentioned in an earlier post about how I’m revamping my presentation slides process but that I hadn’t tackled the user experience of browsing my slides, which wasted lots of the effort I put in. To tackle this part of it, I’ve made lockedata.uk using Hugo to be a way of finding and browsing presentations on R, SQL, and more. As Hugo is so easy, I thought I’d throw in a quick Hugo walkthrough too so that you could build your own blog/slides/company site if you wanted to. Read More

Why data people don’t do devops

June 13, 2017
steph

For T-SQL Tuesday #91 the topic is databases and devops. Grant Fritchey asks us: How do we approach DevOps as developers, DBAs, report writers, analysts and database developers? How do we deal with data persistence, process, source control and all the rest of the tools and mechanisms, and most importantly, culture, that would enable us to get better, higher functioning teams put together? Please, tell me your DevOps stories. Read More

Using purrr with APIs – revamping my code

June 13, 2017
Steph

I wrote a little while back about using Microsoft Cognitive Services APIs with R to first of all detect the language of pieces of text and then do sentiment analysis on them. I wasn’t too happy with some of the code as it was very inelegant. I knew I could code better than I had, especially as I’ve been doing a lot more work with purrr recently. However, it had sat in drafts for a while. Read More

R and Data Science activities in London, June 27th – 29th

June 7, 2017
Steph

Locke Data will be up to some shenanigans of various stripes in the big smoke. We hope to see you at some of them! June 26th – Monday Introduction to R (Newcastle) I won’t be in London for this but I’ll be doing a day of Introduction to R in Newcastle. This is supporting the local user groups and costs up to £90 for the whole day. Intro to R in Newcastle, June 26th Read More

Versioning R model objects in SQL Server

May 26, 2017
Steph

High-level info If you build a model and never update it you’re missing a trick. Behaviours change so your model will tend to perform worse over time. You’ve got to regularly refresh it, whether that’s adjusting the existing model to fit the latest data (recalibration) or building a whole new model (retraining), but this means you’ve got new versions of your model that you have to handle. You need to think about your methodology for versioning R model objects, ideally before you lose any versions. Read More

How to change "No match found!" on your no-code Q&A bot

May 22, 2017
Steph

Last week, I blogged about building a no-code Q&A bot for your website. One little niggle I had with the bot was the response when it couldn’t match a user input to a Q&A. I wondered how to change “No match found!”. I looked around the qnamaker.ai site and couldn’t find a place I could change this. I submitted some feedback and the great people at the other end of the Q&A site responded super quickly. Read More

Improving automatic document production with R

May 19, 2017
Steph

In this post, I describe the latest iteration of my automatic document production with R. It improves upon the methods used in Rtraining, and previous work on this topic can be read by going to the auto deploying R documentation tag. I keep banging on about this area because reproducible research / analytical document pipelines is an area I’ve a keen interest in. I see it as a core part of DataOps as it’s vital for helping us ensure our models and analysis are correct in data science and boosting our productivity. Read More

Easy-peasy Q&A bot

May 15, 2017
Steph

Everyone seems to have a live chat option for their site but I’m frequently away, so I wanted something that people could talk to interactively. This is a perfect scenario for a Q&A bot. Microsoft takes a ton of the pain out of Q&A bots, and it was much easier than I thought to get it added to my WordPress blog. Here is how to do it for your site. Read More

How to go about interpreting regression coefficients

May 12, 2017
Steph

Following my post about logistic regressions, Ryan got in touch about one bit of building logistic regressions models that I didn’t cover in much detail – interpreting regression coefficients. This post will hopefully help Ryan (and others) out. This was so helpful. Thank you! I'd love to see more about interpreting the glm coefficients. — Ryan (@RyanEs) April 21, 2017 What is a coefficient? Coefficients are what a line of best fit model produces. Read More

datasauRus now on CRAN

May 9, 2017
Steph

datasauRus is a package storing the datasets from the paper Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing. It’s a useful package for: Having a dinosaur dataset Showing a dinosaur related variant of Anscombe’s Quartet You can now get datasauRus on CRAN, though it might not be on all mirrors just yet. install.packages("datasauRus") Credit This package wouldn’t exist without some nifty people: Read More

R Quick Tip: parameter re-use within rmarkdown YAML

May 8, 2017
Steph

Ever wondered how to make an rmarkdown title dynamic? Maybe, wanted to use a parameter in multiple locations? Maybe wanted to pass through a publication date? Advanced use of YAML headers can help! Normally, when we write rmarkdown, we might use something like the basic YAML header that the rmarkdown template gives us. --- title: "My report" date: "18th April, 2017" output: pdf_document --- You may already know the trick about making the date dynamic to whatever date the report gets rendered on by using the inline R execution mode of rmarkdown to insert a value. Read More

Minor update to HIBPwned

May 5, 2017
Steph

A new version of HIBPwned has been accepted onto CRAN. This occurred yesterday so it could still be filtering into some mirrors. HIBPwned is an R wrapper for the useful website HaveIBeenPwned and if you don’t already utilise the package or the site – you should. HaveIBeenPwned tells you when your details are included in data breaches. This is vital information to get quickly as it means you can sooner protect yourself from people trying to use the breach information to break into your accounts. Read More

Error installing latest R version (3.4.0) on Windows

May 3, 2017
Steph

UPDATE: R 3.4.1 does not have this problem so you can install that version instead. If you’re getting the following error when you’ve installed R 3.4.0 on Windows, you’re not alone. Error in if (file.exists(dest) && file.mtime(dest) > file.mtime(lib) && : missing value where TRUE/FALSE needed The R team have released a patched version but right now it’s a little difficult to find out about. If you need/want the patched version, it’s available at: Read More

The making of datasauRus

May 2, 2017
Steph

Around 8:30pm I saw this tweet and duly retweeted https://t.co/WuyU9D6npK — Richie Cotton (@richierocks) May 1, 2017 It turns out awesome folks, George and Justin, had made a process whereby they can generate different distributions of points that retain the same summary statistics. They used this process for making some friends for Dino the Datasaurus who was created by Alberto Cairo. They made the data for Dino and the rest of the Datasaurus Dozen available for download. Read More

Getting started with data science – recommended resources

May 2, 2017
Steph

An oft asked question is what resources can I recommend for getting started with data science? Here are my recommendations, and if you have others, please put them in the comments! NB Links in this post may be affiliate links – it doesn’t change the prices you get but might earn me a little money. Books Data Science for Business is a great overview book. Read More

R Quick Tip: Upload multiple files in shiny and consolidate into a dataset

April 28, 2017
Steph

In shiny, you can use the fileInput with the parameter multiple = TRUE to enable you to upload multiple files at once. But how do you process those multiple files in shiny and consolidate into a single dataset? The bit we need from shiny is the input$param$fileinputpath value. We can use lapply() with data.table’s fread() to read multiple CSVs from the fileInput(). Then to consolidate the data, we can use data. Read More
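The read-then-consolidate idea can be tried outside shiny. The sketch below is only an approximation under a few assumptions: the uploads data frame mimics what fileInput(multiple = TRUE) passes to the server, with the temporary paths in a datapath column (the column name used by current shiny releases, which differs from the wording in the teaser), and because the teaser is cut off before naming its consolidation function, base R's rbind() is used as a stand-in.

```r
library(data.table)

# Create two small CSVs to play the role of uploaded files.
tmp_dir <- tempfile("uploads")
dir.create(tmp_dir)
paths <- file.path(tmp_dir, c("a.csv", "b.csv"))
write.csv(data.frame(id = 1:2, value = c(1.5, 2.5)), paths[1], row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(3.5, 4.5)), paths[2], row.names = FALSE)

# Assumed shape of the fileInput value: one row per uploaded file.
uploads <- data.frame(name = basename(paths), datapath = paths,
                      stringsAsFactors = FALSE)

# Read every file with fread(); data.table = FALSE returns plain data frames.
pieces <- lapply(uploads$datapath, fread, data.table = FALSE)

# Stack the pieces into one dataset (stand-in for the truncated step).
combined <- do.call(rbind, pieces)
print(combined)
```

Inside a shiny server function the same two lines at the end would sit in a reactive expression, with uploads replaced by the relevant input value.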

Building an R training environment

April 24, 2017
Steph

I recently delivered a day of training at SQLBits and I really upped my game in terms of infrastructure for it. The resultant solution was super smooth and mitigated all the install issues and preparation for attendees. This meant we got to spend the whole day doing R, instead of troubleshooting. I’m so happy with the solution for an online R training environment that I want to share the solution, so you can take it and use it for when you need to do training. Read More

Logistic regressions (in R)

April 21, 2017
Steph

Logistic regressions are a great tool for predicting outcomes that are categorical. They use a transformation function based on probability to perform a linear regression. This makes them easy to interpret and implement in other systems. Logistic regressions can be used to perform a classification for things like determining whether someone needs to go for a biopsy. They can also be used for a more nuanced view by using the probabilities of an outcome for things like prioritising interventions based on likelihood to default on a loan. Read More

R Quick Tip: Table parameters for rmarkdown reports

April 19, 2017
Steph

The recent(ish) advent of parameters in rmarkdown reports is pretty nifty but there’s a little bit of behaviour that can come in handy but doesn’t come across in the documentation. You can use table parameters for rmarkdown reports. Previously, if you wanted to produce multiple reports based off a dataset, you would make the dataset available and then perform filtering in the report. Now we can pass the filtered data directly to the report, which keeps all the filtering logic in one place. Read More

Building your booth presence (SCE p4)

April 18, 2017
Steph

Building your booth presence is the fourth instalment of the Sponsoring Community Events series aimed at helping companies get to grips with sponsoring community events and getting the most out of them. This post covers some of the things that you should be thinking about when you are planning on having a booth at an event.

April 14, 2017
Steph

Following on from when we announced the availability of our community workshops, we’ve got three in the next three months that folks can attend. May 19th – Data science project in a day We’ll be in Kiev, Ukraine, doing a whole data science project in a day. This is intended to give people a little bit of code, process, and critical thinking along the whole data science workflow. This will enable folks to see how it hangs together and decide where and how much they want to invest their learning in future. Read More

Battle of the Beards: access it online

March 20, 2017
Steph

A fortnight ago I wrote Dear South Wales & Bristol readers: I need your help as Battle of the Beards was failing. Well, we didn’t manage to make it to the minimum number of attendees to go forward with it and on Thursday it looked like I was going to have to cancel it. Then inspiration struck … the major reasons why people maybe didn’t want to attend are: too far to travel; hard to get a day off work; not interested in all the talks. What would make it so people didn’t have to travel, could watch from work, and see only the sessions they’re interested in? Read More

Dear South Wales & Bristol readers: I need your help

March 6, 2017
Steph

I need your help.

Battle of the Beards is an annual tech event in Cardiff that’s previously been an evening affair but is now a day-long conference. We’re hosting half-hour talks on security, infrastructure, software craftsmanship, front-end development, and data visualisation. We’re starting out the day with bacon baps and it just gets better from there. Tickets cost £15 and there’s the option to add a donation to the charity we’re supporting with the event, the Campaign Against Living Miserably.

Right now ticket sales are really low and without your help, we’ll have to cancel the event.

I’m hoping you can help make this event succeed by doing one or both of two things:

register if it’s of interest to you
recommend it to others

Register: https://battleofthebeards.eventbrite.co.uk

March 1, 2017
Steph

Today in class, I taught some fundamentals of API consumption in R. As it was aligned to some Microsoft content, we first used HaveIBeenPwned.com‘s API and then played with Microsoft Cognitive Services‘ Text Analytics API. This brief post overviews what you need to get started, and how you can chain consecutive calls to these APIs in order to perform multi-lingual sentiment analysis.

UPDATE: See improved code in Using purrr with APIs – revamping my code

February 27, 2017
Steph

A big part of why I’ve launched Locke Data is so that I can give back more to my communities. I want to give more time and more support to others. One of the first steps is doing some activities that give financial support to community groups without damaging my startup cashflow! Community R workshops that fund local user groups is the first activity I’ll be trialling.

Here’s what’s involved, and what you might want to consider if you’d like to be a part of this endeavour:

February 22, 2017
Steph

One of the nifty things about using R is that you can use it for many different purposes and even other languages! If you want to use Python in your knitr docs or the newish RStudio R notebook functionality, you might encounter some fiddliness getting all the moving parts running on Windows. This is a quick knitr Python Windows setup checklist to make sure you don’t miss any important steps. Read More

Is my time series additive or multiplicative?

February 20, 2017
Steph

Time series data is an important area of analysis, especially if you do a lot of web analytics. To be able to analyse time series effectively, it helps to understand the interaction between general seasonality in activity and the underlying trend.

The interactions between trend and seasonality are typically classified as either additive or multiplicative. This post looks at how we can classify a given time series as one or the other to facilitate further processing.
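One way to make the distinction concrete is to decompose a series both ways and see which form leaves less structure behind. The sketch below uses base R's decompose() on the built-in AirPassengers data. The scoring rule (sum of squared autocorrelations of the remainder) is just one possible heuristic chosen for illustration, and leftover_score() is a made-up helper name, so don't read it as the definitive test.

```r
# AirPassengers ships with base R (datasets package)
ap <- AirPassengers

# Decompose under both assumptions about how trend and seasonality combine
fit_add  <- decompose(ap, type = "additive")        # y = trend + seasonal + random
fit_mult <- decompose(ap, type = "multiplicative")  # y = trend * seasonal * random

# Heuristic score: sum of squared autocorrelations of the leftover component.
# Less structure left in the remainder suggests a better-fitting form.
leftover_score <- function(x) {
  x <- as.numeric(x[!is.na(x)])  # the moving-average trend leaves NAs at both ends
  sum(acf(x, plot = FALSE)$acf^2)
}

scores <- c(additive       = leftover_score(fit_add$random),
            multiplicative = leftover_score(fit_mult$random))

# Name of the form with the lower score
names(which.min(scores))
```

Plotting the series first is still worthwhile: seasonal swings that grow with the level of the trend are the usual visual hint of a multiplicative relationship.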

February 8, 2017
Steph

If you need to know about persisting data in the world of containers then I recently did a talk and a spot on a podcast that should help you out. My NDC London talk Data + Docker = Disconbobulating? covers the basics and architectural decisions. In my podcast spot Data and Docker on .Net Rocks we go into more depth about the architectural decisions facing you when working with data and Docker. Read More

CRISP-DM and why you should know about it

January 13, 2017
Steph

The Cross Industry Standard Process for Data Mining (CRISP-DM) was a concept developed 20 years ago now. I’ve read about it in various data mining and related books and it’s come in very handy over the years. In this post, I’ll outline what the model is and why you should know about it, even if it has that terribly out of vogue phrase data mining in it! 😉 Data / R people. Read More

Going solo!

January 4, 2017
Steph

The year has started out on a high for me. I’ve handed in my notice at Censornet and I was re-awarded the Data Platform MVP award by Microsoft. I handed in my notice not to go to a new job but to fly solo! I’m starting Locke Data in February to help people embed data science skills in their organisations. Business intelligence has been a thing long enough that there’s a whole department of people dedicated to it and it generally isn’t disruptive to other areas of the business and IT. Read More

I Love Azure Functions!

December 1, 2016
Steph

A while ago, I started my Stumbling Into series. I started but only got one in – I was gonna talk about how I failed with Azure Functions next. I was failing because the docs outside C# (and node.js) were so limited that I found it difficult to get things done. However, I persevered and overcame a little bit of C#-ophobia and I can honestly say it has been so worth it. Read More

An experiment in self-promotion – Revive Old Posts

November 24, 2016
Steph

I’ve been writing (not enough) blog posts for a while now and have built up some neat stuff in the backlog if I may say so myself. Alas, a lot of this doesn’t get seen because it’s not on the front page or in the top 5 blog posts. Sad that posts like my one on sixth normal form databases don’t get enough love, I’ve installed the WordPress plugin Revive Old Posts (ROP) to try countering this!

November 14, 2016
Steph

I had the tremendous pleasure of going to the Microsoft MVP Summit this week and it was a fantastic experience. It also taught me a valuable lesson – I need to be an attendee more. Microsoft award ~4,000 people their Most Valued Professional (MVP) award each year. MVPs are influential, helpful people who work with Microsoft services. I’m not sure what I did to get in when so many awesome folks I know haven’t but I’m very proud to be in receipt of the Award. Read More

A note to (potential) new speakers: It’s ok not to be perfect!

November 8, 2016
Steph

This is a T-SQL Tuesday Post in response to Andy Bek’s kick off about growing new speakers. You can write your own advice for new speakers, or blog your journey to speaking. I’m always trying to encourage new speakers and the biggest fear I hear is “I won’t be any good at it”. Well, you won’t be perfect at it, that’s for sure. You may start off really bad at it. Read More

Quick tip: Passing values to a bash script

October 30, 2016
Steph

This is a very quick post on how you can make a bash script that allows you to provide it values via the command line. Values passed to a bash script are available inside the script as 1-based positional parameters, referenced by prefixing the position with $. This means that if I run ./dummyscript.sh value1 value2, inside dummyscript.sh I can retrieve these by referencing their positions: echo $1 + $2 For improved clarity, you could assign them to new variables Read More

GirlswithDeepPockets.com

October 28, 2016
Steph

Ok, this post is about one of my latest crazy/harebrained/whacky ideas. I’m fed up of having to carry my Galaxy Note 3 in my hand. I can’t stand handbags and most women’s clothing items don’t have pockets or the pockets are insufficient. Given how easy it is to build a website these days, I thought I’d become a sofa warrior for the campaign for pockets. I’ve made a site and aim to make it an open & technical backend. Read More

5 useful CSS sites

October 27, 2016
Steph

I’ve been doing a lot of web development recently, primarily via the magical Hugo platform. Between it and the great themes for it, it’s making website building fairly painless. Of course, each theme often needs customising to the relevant brand a given site is for. That customising is usually just done by changing some fonts and by tweaking the CSS.* I’ve been relying on some old, and some very new, funky tools to help with CSS hacking and I thought I would share them, in case they should prove useful to you in the future. Read More

Slack all the things!

October 21, 2016
Steph

Slack all the things! OK, if you haven’t heard of it before Slack is kinda like IRC, kinda like Dropbox, kinda like a lot of things – it’s a neat place to bring together communications between your team or community, and the integrations allow you to pipe in external feeds like twitter activity or RSS. It’s a great way of collaborating online and I’ve found it especially useful not just within a company but within a global community. Read More

Stepping down from SQL Relay

October 13, 2016
Steph

Some folks may already know, but I handed in my resignation from SQL Relay as sponsorship lead and Cardiff organiser. Over my time in SQL Relay, I’ve helped deliver 30 conferences. I’ve attended about 15 of those! Being able to deliver so much learning to people all around the country has been an incredible experience and I’m tremendously proud of everything SQL Relay has achieved. However, SQL Relay has become increasingly difficult for me to dedicate the time to. Read More

Unit testing in SSDT – a quick intro

October 10, 2016
Steph

This post will give you a quick run-through of adding tSQLt to an existing database project destined for Azure SQL DB. This basically covers unit testing in SSDT and there is a lot of excellent info out there, so this focuses on getting you through the initial setup as quickly as possible. This post most especially relies on the information Ed Elliot and Ken Ross have published, so do check them out for more info on this topic!

October 7, 2016
Steph

This blog now has some extra locks, these are in the URL bar! It took my fantastic hosters WPEngine a little longer than I would have preferred to get a modern SSL policy. Now that they have, they did it in their typically awesome fashion. You can request a free Let’s Encrypt SSL certificate from the admin dashboard, and configure how http etc should work in less than 15 minutes, and you don’t have to be a web wizard to do it. Read More

2016 PASS Board of Directors Candidate Town Halls

October 6, 2016
Steph

It’s PASS Board of Directors elections again! After a number of twitter discussions last week about the applicability of PASS outside of the US and what I think PASS is good and bad at, I thought I would engage the process instead of just being a complainy-pants. I attended all 6 town hall webinars and asked questions to all the candidates. I recommend you watch them before voting.

Find out more about the candidates, the PASS Board of Director elections, and how to vote on the PASS website.

September 27, 2016
Steph

I’ve been pretty quiet recently, I haven’t presented much, I haven’t blogged much, I haven’t worked on my open source projects much. All my energy left over from my major work project has been going into SQL Relay. SQL Relay is an ambitious project every year. We organise a conference that goes on tour. In previous years, I’ve gone to 8 cities over two weeks. Over the past 4 years, I’ve been part of organising 30 conferences. Read More

Finished my first GameMaker Game

September 17, 2016
Oz

Getting started with GameMaker, making Asteroids! I mean OK, it’s just a clone of a classic, but isn’t that how fledgling artists practice? Initially created following a tutorial, I then went through and added a lot of extra features, including music, splash screens, a pause function and much cleaner code. Embedding by i-frame is pretty ugly, so please follow the link below to try the game. Asteroids Game: A/D or Left/Right to turn; W or Up to move; Space or Right Control to shoot; Escape to pause. You can download the gmz file, and import into your copy of Game Maker if you’d like, using this link: https://dl. Read More

HIBPwned updated on CRAN

September 15, 2016
Steph

Haveibeenpwned.com is a fantastic service that helps people find out if they’ve been involved in a data breach. HIBPwned is an R wrapper for that service. Recently, due to abuse of the system, Troy Hunt had to add a limit of one request per 1.5s. The new version published on CRAN last night adds a delay into each call so that we can continue to use it in R. Check out the package on CRAN for vignettes and more information on the package. Read More
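The general idea of spacing calls out to respect a rate limit can be sketched in a few lines of base R. This is not HIBPwned's actual implementation; rate_limited() is a hypothetical helper written for illustration, and the demo uses a short 0.2 second gap and toupper() as a stand-in for a real API call so that it runs quickly.

```r
# Wrap a function so consecutive calls are at least `min_gap` seconds apart
rate_limited <- function(f, min_gap = 1.5) {
  last_call <- NULL
  function(...) {
    if (!is.null(last_call)) {
      elapsed <- as.numeric(difftime(Sys.time(), last_call, units = "secs"))
      if (elapsed < min_gap) Sys.sleep(min_gap - elapsed)
    }
    # Record when this call finished, whether or not f() errors
    on.exit(last_call <<- Sys.time())
    f(...)
  }
}

# Demo with a stand-in function and a short gap (assumption: not a real API call)
polite_toupper <- rate_limited(toupper, min_gap = 0.2)
results <- vapply(c("a", "b"), polite_toupper, character(1))
```

Keeping the delay in a wrapper means the calling code does not need to know about the limit at all.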

Being an Organised Sponsor (SCE p3)

August 30, 2016
Steph

Being an Organised Sponsor is the third instalment of the Sponsoring Community Events series aimed at helping companies get to grips with sponsoring community events, and getting the most out of them. This post covers how to organise yourself and the common activities needed to get the most out of your sponsorship of a community event. Project management Rarely is sponsorship a simple transaction; there are often deliverables from both parties at different times over a period of anything up to a year. Read More

Assessing Sponsorship Opportunities (SCE p2)

August 4, 2016
Steph

Assessing Sponsorship Opportunities is the second instalment of the Sponsoring Community Events series aimed at helping companies get to grips with sponsoring community events and getting the most out of them. This post covers some of the things that you should be thinking about when you are considering sponsoring an event. What’s the point? Before entering into a sponsorship agreement you need to have a firm idea of what you’re hoping to achieve. Read More

Sponsorship Basics (SCE p1)

August 2, 2016
Steph

Sponsorship Basics is the first instalment of the Sponsoring Community Events series aimed at helping companies get to grips with sponsoring community events, and getting the most out of them. What is a community event? A community event is one organised by members of the community, as opposed to one run by one or more companies with a financial interest in the community. These events are fundamentally different because they are not being run for profit, instead, they’re run to assist other members of the community to increase their skills. Read More

Sponsoring community events (SCE)

August 1, 2016
Steph

Sponsoring community events – is it right for you? This series of posts will take you through the things you need to know to help you decide. Over the coming weeks, this new series will go through the ins and outs of sponsoring community events. Community events are fantastic from an attendee perspective, but when you’re handing over cash you need to know what you’re letting yourself in for and how you get return on investment (ROI). Read More

Giving back with code

July 20, 2016
Steph

From code in answers on Stack Overflow to R packages or full programs, there’s a lot of code being written and given away. This post examines some of the reasons why the people writing all that code do it, why you should consider giving back with code, and how you can get started. Finally, I cap it all off with perspectives from some of my favourite coders!

Because reasons

There are many reasons why you should consider writing code and making it available for public consumption.

Altruistic

If you’re writing something to achieve a task, odds are someone else would have to write the same code – why not help them out?
You’re using a lot of open source software, whether you realise it or not. By open sourcing your code, you get to pay it forward
To give others something to contribute to

Career

Unknown quantities are risky hires, put your code out there for the world to see and employers get to see what you can do
Develop your skills for the next job, the one that requires you to be more skilled in something than you are now
You get to interact with a lot of different people who you build credibility with, and hopefully friendships!

For oneself

Generally speaking, the more code you write, the better your coding skills so if you want to improve your skills this is an ideal way to do it
For the sheer fun of doing cool stuff, especially if you don’t get to do cool stuff in the day job
To do it “the way it should be done”

July 12, 2016
Steph

I’m trying to encourage more lightning talks at my user groups, and I started by writing a plea to folks at my local R user group, caRdiff. In it I included some ideas for lightning talks, and of course, these can be used as the basis for long talks too. We had some fun batting this list around and expanding it in the Cardiff dev group. I thought it was worth sharing, and getting some more ideas from you! Read More

Stumbling into … Azure Automation

July 11, 2016
Steph

I’ve recently been trying to solve the challenge of extracting files from AWS and getting them into Azure in my desired format. I wanted a solution that kept everything on the cloud and completely avoided local tin. I wanted it to have built-in auditing and error handling. I wanted something whizzy and new, to be honest! One way in which I attempted to tackle the task was with Azure Automation. In this post, I’ll overview Automation and explore how it stacked up for what I was attempting to use it for.

Overall Task: Get compressed (.tar.gz) files from AWS S3 to Azure, decompress the files, concatenate the contents and put in a different container for analytics magic

Like with most things I dropped myself into the deep-end on it so had fairly minimal knowledge of PowerShell and the Azure modules, therefore I fully expect more knowledgeable folks to wince at my stuff. General advice, “you should do it like this, then this…”‘s, and resource recommendations are all very welcome – leave a comment with them in!

Azure Automation

Azure Automation is essentially a hosted PowerShell script execution service. It seems to be aimed primarily at managing Azure resources, particularly via Desired State Configurations.

It is, however, a general PowerShell powerhouse, with scheduling capabilities and a bunch of useful features for the safe storage of credentials etc. This makes it an excellent tool if you’re looking to do something with PowerShell on a regular basis and need to interact with Azure.

July 7, 2016
Steph

I love user groups and I always want there to be more. I’m not a perfect organiser but I run reasonable groups. When I see organisers doing it badly, it makes me sad. There are lots of great ways to run a user group, but I thought I’d cover some of the bad ways to run a user group. The anti-patterns if you will 😀 Don’t advertise Your group isn’t on Twitter. Read More

Not an expert

June 29, 2016
Steph

I don’t think of myself as an expert because an expert is someone with very deep knowledge of a comparatively narrow field. For better or worse, a lot of my sense of satisfaction with life derives from throwing myself into some enterprise that I don’t have the people skills, the knowledge, and/or the resources for succeeding. I welcome the failures, the dead ends, the crises of faith, because if it wasn’t hard it wouldn’t be worth doing. Read More

My PASS Summit2016 submissions feedback

June 23, 2016
Steph

I really liked the way Brent showed us his feedback received and since mimicry is the best form of flattery, I thought I’d go ahead and do it too!

I didn’t get any accepted abstracts, and I’m actually grateful. The recent stresses to do with the PASS dramas aside, I would have had to use 5 days holiday time, pay for flights and hotel, and then flown out a week later for MVP Summit. Now I can attend some other conferences and/or have a Christmas break! Woo hoo 😀

June 23, 2016
Steph

In R, we can use a file called .Rprofile to do things in R based on a number of triggers. One thing I’ve done is give myself a DIY notification of how many data breaches I’ve been involved in! First of all, you need a file called .Rprofile that’s stored in your working directory. Some useful resources about .Rprofiles can be found on .Rprofile CRAN docs and an .Rprofile intro. Read More

Azure Storage Accounts – Resource Groups matter to PowerShell!

June 10, 2016
Steph

I’m sure that all my PoSh friends out there, who use Azure and PowerShell all the time probably know this already but I thought I’d share a little snippet of hard-won knowledge. When you put an Azure Storage Account into a Resource Group, you can no longer use the default Azure.Storage module. Instead, you’ve got to use the AzureRM.Storage module. All the scripts I encountered whilst googling how to connect to blob storage via PowerShell, including the ones in the script gallery within Azure Automation seemed to all assume the azure storage account you wanted to connect to was standalone. Read More

HIBPwned on CRAN

June 9, 2016
Steph

Part of my (slowly) working pipeline of coding projects has been an R package that wraps the fantastic HaveIBeenPwned.com API. If you’re not already familiar with HaveIBeenPwned, rectify the situation, NOW! Don’t worry about continuing to read the rest of the post; getting yourself signed up for account breach notifications is way more important! With that stern admonishment out of the way… HIBPwned is a feature complete R package that allows you to use every (currently) available endpoint of the API. Read More

Recent presentations

June 1, 2016
Steph

The last month or so has been a whirlwind of awesomeness with a veritable bevvy of user group and conference talks on my part! I thought I would share the materials with you and provide some brief thoughts on how each presentation went. Sessions SQL Saturday Exeter : Stats 101 London Business Analytics (LBAG) : Skilling up to code with data SQLBits & TUGA : Cut the R Learning Curve SQLBits & TUGA : R in the Microsoft Data Platform (full day of training) IT Pro Portugal : Being lazy with infrastructure SQL Saturday Exeter My presentation, in my opinion, was exceedingly brave. Read More

satRdays voting closes May 31st

May 27, 2016
Steph

Voting for 2 of the 3 locations for satRday conferences will be closing at the end of May 31st (GMT). It’s been a phenomenal turnout with more than 1,500 votes so far. You can still vote if you haven’t already! EU status Budapest, Hungary, is where Gergely will be throwing the EU event and it’s tentatively set for September. US status Chicago started out with a close runner of Washington DC, but that all changed as folks realised they could visit Puerto Rico and get fantastic learning, or Puerto Rico has far more R people than my tenuous grasp of geography led me to expect. Read More

satRday location voting now open

May 11, 2016
Steph

satRdays, free R conferences, are a project being supported by the R Consortium. When Gergely and I submitted our proposal, we said we’d be supporting three conferences: Budapest, Hungary (Gergely’s home turf) Somewhere in the US Somewhere else in the world We’ve had an overwhelming response with 40 submitted conferences but for the fully-funded ones, there can only be three. We are looking at how the runners up can do the next wave of events but we want to get the ball rolling on the first three. Read More

Installing SQL Server ODBC drivers on Ubuntu 15.04

April 20, 2016
Steph

UPDATE 2016-10-21 : You can now get the ODBC 13 driver for Linux with a much smoother install process than below. Get all the relevant information on the announcement from the Microsoft SQLNCli team blog.

Did you know you can now get SQL Server ODBC drivers for Ubuntu? Yes, no, maybe? It’s ok even if you haven’t since it’s pretty new! Anyway, this presents me with an ideal opportunity to standardise my SQL Server ODBC connections across the operating systems I use R on i.e. Windows and Ubuntu. My first trial was to get it working on Travis-CI since that’s where all my training magic happens and if it can’t work on a clean build like Travis, then where can it work? Alas, the ODBC 13 driver doesn’t work on Ubuntu 14.04 so this set of instructions has been modified to provide code for Ubuntu 15.04 only.

TL;DR

It works, but it’s really hacky right now. Definitely looking forward to the next iterations of this driver.

Disclaimer

This will work for Ubuntu 15.04 but 14.04 has a different set of C compilers
This is currently hacky, and Microsoft are on the case for improving it so this post could quickly become out of date.
Be very careful installing the driver on an existing machine: it overwrites unixODBC if already installed, and there are potential compatibility issues with other driver managers you may have installed.

April 19, 2016
Steph

Continuing in the series of shiny module design patterns, this post covers how to pass all the inputs from one module to another.

TL;DR

Return input from within the server call. Store the callModule() result in a variable. Pass the variable into arguments for other modules. Access the variable like you would input. Steal the code and, as always, if you can improve it do so!

April 14, 2016
Steph

Following on from looking at the shiny modules design pattern of passing an input value to many modules, I’m now going to look at a more complex shiny module design pattern: passing an input from one module to another.

TL;DR

Return the input in a reactive expression from within the server call. Store the callModule() result in a variable. Pass the variable into arguments for other modules. Steal the code and, as always, if you can improve it do so!
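The steps in the TL;DR can be sketched as a tiny app. This is a minimal illustration, not the code from the post: the module names (picker, echo), the ids and the input name are all made up, and it uses the callModule() style the post refers to.

```r
library(shiny)

# Module 1: a picker whose server function returns its input as a reactive
pickerUI <- function(id) {
  ns <- NS(id)
  selectInput(ns("choice"), "Pick a letter", choices = c("a", "b", "c"))
}
picker <- function(input, output, session) {
  reactive(input$choice)
}

# Module 2: displays whatever reactive value it is handed
echoUI <- function(id) {
  ns <- NS(id)
  textOutput(ns("text"))
}
echo <- function(input, output, session, value) {
  # value is a reactive expression, so call it with brackets
  output$text <- renderText(paste("You picked", value()))
}

ui <- fluidPage(pickerUI("picker1"), echoUI("echo1"))

server <- function(input, output, session) {
  picked <- callModule(picker, "picker1")    # store the module's result
  callModule(echo, "echo1", value = picked)  # pass it on to another module
}

# Build the app object; launch it interactively with runApp(picker_app)
picker_app <- shinyApp(ui, server)
```

Passing the reactive itself (picked), rather than its current value (picked()), is what keeps the second module updating when the first one changes.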

April 12, 2016
Steph

We’re in the fantastic situation where lots of people are using Travis-CI to test their R packages or use it to test and deploy their analytics / documentation / anything really. Its popularity has been having a negative side-effect recently though! GitHub rate limits API access to 5000 requests per hour so sometimes there are more R related jobs running on Travis per hour than this limit, causing builds to error typically with a message that includes

403 forbidden

This error will cause your build to fail, even if you didn’t do anything wrong. To solve it short-term you can wait a little while and restart your build.

That is a very short-termist solution and does not solve the problem for future you or other users of the service. The real solution to resolving this issue is to get off the default API access credentials and use your own.

The R integration in Travis makes good use of devtools. The devtools package looks for an environment variable called GITHUB_PAT that holds a personal access token (PAT) for using the GitHub API and if it doesn’t find one it uses a default token. When we get our own PAT and store it in Travis, devtools will pick up our token and use it, meaning you’ll only ever get rate limited if you do more than 5000 builds in an hour, which is an achievement I’d love to hear about.
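A quick way to confirm that a token is actually visible to R (locally or at the start of a build script) is to check the environment variable directly. The helper name has_github_pat() is made up for this sketch, and it deliberately never prints the token itself.

```r
# Report whether a personal access token is visible to the current R session
has_github_pat <- function() {
  nzchar(Sys.getenv("GITHUB_PAT"))
}

if (has_github_pat()) {
  message("GITHUB_PAT is set, so GitHub API calls can use your own rate limit.")
} else {
  message("GITHUB_PAT is not set, so requests will fall back to shared credentials.")
}
```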

April 8, 2016
Steph

For the awesome Shiny Developers Conference back in January, I endeavoured to learn about shiny modules and overhaul an application using them in the space of two days. I succeeded and almost immediately switched onto other projects, thereby losing most of the hard-won knowledge! As I rediscover shiny modules and start putting them into more active use, I’ll be blogging about design patterns. This post takes you through the case of multiple modules receiving the same input value.

TL;DR

Stick overall config input objects at the app level and pass them in a reactive expression to callModule(). Pass the results in as an extra argument into subsequent modules. These are reactive so don’t forget the brackets. Steal the code and, as always, if you can improve it do so!
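As a rough sketch of this pattern, the app below keeps one slider at the app level and hands it to two instances of the same module. The module name (summaryModule), the ids and the labels are invented for illustration, and the code assumes the callModule() style mentioned above.

```r
library(shiny)

# A module that receives a shared, app-level value as an extra argument
summaryUI <- function(id) {
  ns <- NS(id)
  textOutput(ns("summary"))
}
summaryModule <- function(input, output, session, n, label) {
  # n is a reactive expression, so don't forget the brackets: n()
  output$summary <- renderText(paste(label, "sees n =", n()))
}

shared_ui <- fluidPage(
  sliderInput("n", "Shared value", min = 1, max = 10, value = 5),  # app-level input
  summaryUI("first"),
  summaryUI("second")
)

shared_server <- function(input, output, session) {
  n <- reactive(input$n)  # wrap the app-level input once
  callModule(summaryModule, "first",  n = n, label = "Module one")
  callModule(summaryModule, "second", n = n, label = "Module two")
}

# Build the app object; launch it interactively with runApp(shared_app)
shared_app <- shinyApp(shared_ui, shared_server)
```

Because both modules receive the same reactive, moving the slider updates them together without either module knowing where the value came from.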

April 5, 2016
Steph

With my HIBPwned package, I consume the HaveIBeenPwned API and return back a list object with an element for each email address. Each element holds a data.frame of breach data or a stub response with a single column data.frame containing NA. Elements are named with the email addresses they relate to. I had a list of data.frames and I wanted a consolidated data.frame (well, I always want a data.table).

Enter data.table …

data.table has a very cool, and very fast function named rbindlist(). This takes a list of data.frames and consolidates them into one data.table, which can, of course, be handled as a data.frame if you didn’t want to use data.table for anything else.
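Here is a small sketch of that consolidation. The list below is a made-up stand-in for the per-email results (not real breach data, and not the exact shape HIBPwned returns), with one stub element holding a single NA.

```r
library(data.table)

# Made-up stand-in for a named list of per-email results
results <- list(
  "a@example.com" = data.frame(Name = c("SiteOne", "SiteTwo"),
                               stringsAsFactors = FALSE),
  "b@example.com" = data.frame(Name = NA)  # stub response: a single NA
)

# idcol keeps the list names as a column; fill = TRUE tolerates missing columns
breaches <- rbindlist(results, idcol = "email", fill = TRUE)
```

Using idcol means the email addresses carried in the list names survive as a regular column instead of being lost in the bind.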

April 4, 2016
Steph

As part of my never-ending quest to deploy documentation better, I’ve made yet another tweak to my scripts that deploy R vignettes or Rmarkdown documents to the gh-pages branch of my github repositories via Travis-CI.

The script from Robert Flight that’s provided the basis for most of this work does something specific to update the web facing branch of the repository. It would:

Create a blank repository
Add the requisite files to the repository
Add and commit them to the repo
Force the repo to overwrite the gh-pages branch

This had the unfortunate consequence of losing the history of what was previously hosted on the branch and could not tell me what commit to my development branches was responsible for a version of the docs. It took a little bit of playing but the revised script now:

Clones the gh-pages branch
Adds the requisite files into the repo
Adds and commits them to the repo
Pushes the changes

Using an environment variable ($TRAVIS_COMMIT) the commit message is the commit ID for the latest commit in the build that occurs on Travis, making it very easy to see what changes triggered a documentation update.

March 24, 2016
Steph

This is a brief update on my packages not currently on CRAN: tfsR, HIBPwned, and mockaRoo. tfsR tfsR is designed to help you work with git repositories in Microsoft Team Foundation Server (TFS) and Visual Studio Team Services (VSTS). I wrote the package a while ago and it has/had just two functions; one for getting a list of git repositories, and one for making a new git repository. The release of httr 1. Read More

satRdays are go!

March 23, 2016
Steph

I’m very pleased to say that the R Consortium agreed to support the satRday project! The idea kicked off in November and I was over the moon with the response from the community, then we garnered support before submitting to the Consortium and I must have looped the moon a few times as we had more than 500 responses. Now the R Consortium are supporting us and we can turn all that enthusiasm into action. Read More

HIBPwned, an R package for HaveIBeenPwned.com

March 21, 2016
Steph

The answer in life to the inevitable question of “How can I do that in R?” should be “There’s a package for that”. So when I wanted to query HaveIBeenPwned.com (HIBP) to check whether a bunch of emails had been involved in data breaches and there wasn’t an R package for HIBP, it meant that the responsibility for making one landed on my shoulders. Now, you can see if your accounts are at risk with the R package for HaveIBeenPwned.com, HIBPwned.

Current status

The package is currently available on github @ stephlocke/HIBPwned, but I intend to submit to CRAN after getting some feedback from y’all.

March 15, 2016
Steph

Just a quick heads up to peeps in and around Cardiff, Wales. Later this month we’re holding Battle of the Beards: Return of the Beard. Fantastic speakers with resplendent beards are joining us for our 6 sq. ft of pizza to present on advanced SQL Tricks, using PowerBI as a DBA, and stepping up to the challenge of the last minute audit. This is a great chance to meet folks interested in the Microsoft Data Platform and learn from some incredibly knowledgeable speakers. Read More

SSH tunnels on Windows for R

March 14, 2016
Steph

Recently I’ve had to get to grips with SSH tunnels. SSH tunnels are really useful for maintaining remote network integrity and work in a secure fashion. It is, however, a pain to open PuTTY and log in all the time, mainly because I couldn’t script it in R! It’s been a trial, but like most things it turned out to be pretty simple in the end so I thought I’d share it with you.

What’s required?

PuTTY
WinSCP (optional tool, generally helpful)

Read More

Beware the Microsoft Edge as a PDF reader

March 9, 2016
Steph

Just a heads-up for people like me who’ve gotten a Windows 10 machine and have used Edge as a PDF reader. I was too lazy to install Adobe Reader and was instead using Edge as my default reader. This gave me a mini-heart attack when I received a proof for my super cool NFC-tag laptop stickers and the colour was wrong. WTF right, I mean we did send it as CMYK and all that jazz so it should be right, a printer wouldn’t screw that up, right? Read More

mockaRoo – making realistic test data in R

March 8, 2016
Steph

When I’m building stuff in R like packages, models, etc. I find myself wishing for realistic looking test data without having to resort to getting data off my production server. To that end I’ve been on the hunt for a way of generating decent test data. A few months back I stumbled upon the neat system Mockaroo which provides a GUI to build some data that suits your needs.

Mockaroo is a really impressive service with a wide spread of different data types. They also have simple ways of adding things like within-group differences to data so that you can mock realistic class differences. They use the freemium model so you can get a thousand rows per download, which is pretty sweet. The big BUT you can feel coming on is this – it’s a GUI! I don’t want to have to spend time hand cranking a data extract.

Thankfully, they have an API for getting data too and it’s pretty simple to use so I’ve started making a package for it.

I’ve started the package on github and will be developing it over the next month or two. It’s up and working, but only in the most primitive way as I’d like to get some feedback from folks who might find this useful around how the interface for generating your desired data schema should work.

March 7, 2016
Steph

In May, I will be delivering two R for Microsoft training days. These two days will focus on some R fundamentals and applying these fundamentals within the Microsoft Data Platform. These training days are ideal if you know one half of the components – whether that’s the R bit or the Microsoft BI bit. Either way, you’ll learn about the other half.

SQLBits

SQLBits XV is being held in Liverpool this year and my training day is on May 4th. Read More

Declutter a shiny report’s code v2.0

March 3, 2016
Steph

A year ago I wrote about a way to declutter shiny report code which involved putting objects into a sourced file. However, at that point in time the solution was a bit brittle and clunky. Now there’s a better way to develop shiny applications – shiny modules. In October, RStudio introduced the concept of modules, which involves abstracting code out into self-contained blocks. Modules are ways of batching your code into discrete chunks – you keep all the code related to the inputs, manipulation, and presentation for doing something in one module. Read More

My life, my universe, my everything

March 1, 2016
Steph

Last year involved moving jobs to Mango as Principal Consultant, moving home, getting a dog, and Oz becoming the Purple Tadpole. That was on top of SQL Relay, copious presentations, and much travelling. By the end of the year I was pretty ill and exceedingly grumpy. Oz wasn’t having a huge amount of fun either between me never being home and him having to hold the fort in a massive way. Read More

Fixing the Tiny Icons, Big Text issue on my XPS13

January 13, 2016
Oz

The Issue

I love my Dell XPS13. It’s fast, sleek and gorgeous. It does however have one little problem: the icon and text size. The text was always too big for the buttons and boxes and the icons were so small you could hardly see them. This made it hard to use my machine without an external screen (which doesn’t have that issue and should have been my first clue!)

An example of the issue I was dealing with

January 6, 2016
Steph

I talked back in November about the idea of RSaturdays: free community-driven conferences on R. Since then, we created a GitHub repository and started hammering out the details for satRdays. The current proposal consists of:

- A name: satRday
- A proposition: Free/cheap (<£30) conferences organised by user groups around the world. Attendees get more access to training in R, with a much lower cost-barrier. We develop more speakers on R.

Read More

optiRum 0.37.3 now out

January 4, 2016
Steph

Just a quick heads up to announce the availability of optiRum 0.37.3 – this takes into account the new version of ggplot2 and is backwards compatible. Read More

Anchor Modelling: Sixth Normal Form databases

December 31, 2015
Steph

About Anchor Modelling

Anchor Modelling moves you beyond third normal form and into sixth normal form. What does this mean? Not sure about the normal forms? See the normalisation process in action with this normalisation example. Essentially it means that an attribute is stored independently against the key, not in a big table with other attributes. This means you can easily store metadata about that attribute and do full change tracking with ease. Read More

Auto-deploying documentation: Rtraining

December 23, 2015
Steph

In my last post on using GitHub, Travis-CI, and rmarkdown/knitr for automatically building and deploying documentation, I covered how I was able to do it with a containerised approach so things were faster. I also said my Rtraining repository was still too brittle to blog about. This has changed – WOO HOO! The main thanks for that goes out to the new package ezknitr from Dean Attali. ezknitr takes the pain out of working directories, making my hierarchies much more resilient. Read More

Should presenters have to pay to attend?

December 7, 2015
Steph

I recently did something for the first time: I declined to speak somewhere. It was never stated on the submission page, and was raised only after my session was accepted – they wanted me to buy a ticket to attend and I refused to do that. As a speaker I love donating my time and I really don’t mind paying my own Travel and Expenses (T&E) but to have to pay to get in the door of the place I’m speaking at feels wrong. Read More

Auto-deploying documentation: FASTER!

November 13, 2015
Steph

Over the past few years I’ve been delving deeper into automatically building and deploying documentation and reporting in R (with rmarkdown, LaTeX etc). This post covers another step forward on that journey towards awesomeness.

November 11, 2015
Steph

Boris Hristov, Data Platform MVP, design whizz, and all-round great guy, recently launched 356labs. Boris wrote a great Presentation Design course for PluralSight. You can sign up for a trial of PluralSight and watch the course if you’d like to find out more.

Being an avid reader of design stuff I did find I knew some of the things on the course, but the context and application were very helpful. Off the back of his course, I went on to produce my most visually impressive presentation slide deck to date – Agile BI.

I took a look over his site and asked a few questions since I was really curious. Here are the responses!

November 7, 2015
Steph

A while back, I wrote about how I was waiting to be able to release optiRum to CRAN. Well, data.table 1.9.6 (a key dependency for new functionality) has been released and I’ve finally had some quiet time. So optiRum 0.37.1 is now accepted and trickling through the CRAN publish processes.

November 4, 2015
Steph

UPDATE: Proposal now being developed after fantastic community support. Check out satRdays on GitHub and contribute your opinions!

A month ago I was contacted by a very nice chap in Dallas asking whether we do anything like SQLSaturdays in the R world.

The great thing about the SQLSaturdays, he said, was not that they’re free (well it helps!) but that they’re on his time. Developing his skills was something he couldn’t get signed off by his boss so he wanted to be able to do it by himself.

In answer to the question of whether there are local(ish) weekend conferences happening regularly for R, my answer was “not really” and it’s a shame because the R community is fantastic. I started thinking about why we don’t have them and what would be needed to change that.

Free / cheap regional small-medium conferences are a must for growing user knowledge and speakers in R.

October 28, 2015
Steph

Since August, I’ve had the pleasure of working at Mango Solutions, a data science consultancy, as a Principal Consultant. In that time, I’ve been to EARL London, SQL Relay, and SQL in the City, so conference season has been in full swing with more to come in the form of EARL Boston next week. Surprisingly, I’ve also found some time to help some customers out and write some blog posts over on the Mango site. Read More

Women doing technology

October 24, 2015
Steph

Yesterday, another Women in Technology conference got forwarded around and looking at the agenda, I snapped. I asked to not see any more goddamn WiT conferences.

I’m really fed up with women talking about being in tech. I don’t perceive any value in attending a conference dedicated to that. I want to see more women talking about doing tech.

October 16, 2015
Steph

Today, I presented a lightning talk on DataOps at SQL in the City. It was a fantastic day and a great opportunity to catch up with how the database side of things is evolving to embrace DevOps.

My lightning talk was titled DataOps – it’s a thing (honest) and focused on what is essentially DevOps ported out of the developer sphere and into the data professional sphere.

July 16, 2015
Steph

SQL Relay is back on tour! Our 6th event sees us yet again touring the country bringing awesome speakers to eight cities over 2 weeks. As usual we’re improving things:

- A new platform called Attendee.Events which allows easier registration & speaker submission
- A whole bunch of webinars planned to get in the swing of things
- More breadth – this year each event includes a dedicated track to R, machine learning etc, on top of SQL Server and business intelligence
- More content – instead of swapping out existing slots for breadth, we added more tracks
- A green initiative … see it on the day
- Improved speaker experience – we’re building on last year’s fun bus with the help of SQL Sentry to make it easier and more fun for speakers to do multiple events, as well as offering first timers some mentoring as most of the organisers are also speakers

You’ll see a lot of announcements about Relay over the next few months, but I hope this little post inspires you to check out our events 😀 Read More

R training day mk2 – @SQLSatMcr

June 24, 2015
Steph

Back in April, for SQL Saturday Exeter I ran my first ever full day of training. Next month sees me taking my second tilt at it.

To sign up for my R training day, July 24th, in Manchester you can go to the pre-con homepage.

If I may say so myself, it’s a steal at £99 but then they all are! For instance, Andrew Fryer’s training day covers the Machine Learning use of R via Azure, so if you’re already wrangling numbers like a pro in R, understanding how you can apply it to snazzy webservices is a great way to go.

June 17, 2015
Steph

I’ve been producing presentations via R using rmarkdown and outputting to either ioslides or slidify. That was excellent, because I could provide a CSS that customised the look and feel (relatively) easily*.

However, when I wanted to produce a PDF version, I couldn’t make ones that look as good as the pure LaTeX versions I could make on overleaf.com. So I started RTFMing when I wanted to replicate the look and feel from my presentation, The LaTeX Show.

I didn’t want to spend a huge amount of time on it, so this little story of hack and slash may feel a bit dirty to you!

June 5, 2015
Steph

Following on from my post about the principles behind using travis-ci to commit to a gh-pages branch, I wanted to follow up with how I tackled my “intermediate” use case.

Multiple vignettes

In my original post I showed how I pushed the tfsR vignette to gh-pages, which involved copying it and renaming it to index.html.

Unfortunately, this wouldn’t work if I had multiple vignettes that I wanted to be accessible online.

Requirements

An index.html file
A way of extracting any number of html files from the vignette folder
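
To make the two requirements concrete, here is a minimal sketch of the idea, written in Python purely for illustration (the post itself works with R and a bash script on Travis-CI). The function name `build_index` and the folder layout are hypothetical, not taken from the original post: it copies every HTML file found in a vignette folder into an output folder and writes an `index.html` that links to each of them.

```python
from html import escape
from pathlib import Path


def build_index(vignette_dir: Path, out_dir: Path) -> Path:
    """Copy all .html files from vignette_dir into out_dir and
    write an index.html in out_dir that links to each copied page."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # Any number of HTML files; skip a pre-existing index to avoid self-links.
    pages = sorted(
        p for p in vignette_dir.glob("*.html") if p.name != "index.html"
    )

    links = []
    for page in pages:
        # Copy the rendered vignette next to the index page.
        (out_dir / page.name).write_bytes(page.read_bytes())
        links.append(
            f'<li><a href="{escape(page.name)}">{escape(page.stem)}</a></li>'
        )

    index = out_dir / "index.html"
    index.write_text(
        "<html><body><ul>\n" + "\n".join(links) + "\n</ul></body></html>\n",
        encoding="utf-8",
    )
    return index
```

Calling `build_index(Path("vignettes"), Path("site"))` would leave a `site` folder that could then be pushed to the gh-pages branch; those folder names are assumptions as well.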

Read More

My (first) SQLHangout

June 4, 2015
Steph

Yesterday I had the pleasure of hanging out “on air” with Boris Hristov. We talked open sourcing! You can download and/or contribute to the following projects I have going:

- MeDriAnchor: the Metadata Driven Anchor model system. You get to have a lot of fun with 6NF!
- optiRum: a useful package for R, especially for the UK
- tfsR: for my own special brand of crazy, working with TFS git repositories in R

James Skipwith (the other major developer on MeDriAnchor) presented on automation and covers MeDriAnchor – you can check it out at SQLBits. Read More

optiRum – presentation

June 3, 2015
Steph

optiRum, the R package I built and support for Optimum on CRAN, has gained some extra functions recently. Some of them use currently experimental data.table functionality so I’m eagerly awaiting the release to CRAN to deliver optiRum.

In the interim, I thought I’d give some brief overviews of existing functionality contained in the package.

The next part of the coverage of optiRum functionality is to talk about the stuff that makes generating outputs easier!

June 1, 2015
Steph

In this post, I’m going to cover how you can use continuous integration and source control to build and host documentation (or any other static HTML) for free, and in a way that updates every time your code changes. I’ll cover the generic capability, and then how I apply this to my simplest package, tfsR. In a later post (once I’ve cracked the best method to do it) I’ll cover my more complex use case of multiple documents and a dynamically constructed index page.

NB: This was kicked off by a post from Robert Flight about applying the technique to R package vignettes. It’s a very useful post but it was quite specific to his situation and I wanted to understand the principles behind it before I started extending it to my more complex cases.

Requirements

Must haves:
- Travis-CI
- GitHub
Optional:
- A linux machine (so you can test your bash script that Travis-CI will run)
- R (for following the specific instructions)

High-level process

Get an OAuth token from GitHub
Add the OAuth token to Travis
Add a *.sh file that gets your HTML (depending on circumstance, you may also need to generate it) and pushes it to the gh-pages branch
Include your .sh file in the after_success part of your Travis file
Commit & push!

Read More

How many is too many conferences?

May 25, 2015
Steph

The SQL Server community has a lot of events. In the UK alone, in 2015 we will have (or have had):

- more than 75 user group meetings
- 5 SQL Saturdays
- 8 days of Relay
- 1 SQL in the City
- 3 to 5 days of SQLBits
- possibly a SQL Santa
- and probably more that I’ve forgotten or don’t know about at the time of writing this

You could attend 2 days of conference per month (cpm from now on) on average in the UK alone, just at dedicated SQL events. Read More

SQLSaturday Portugal

May 18, 2015
Steph

SQLSaturday Portugal 2015 has been a huge amount of fun but I’ve also learnt a lot. A big thank you to the organisers!

Below are my slides and notes from sessions I attended.

Agile BI

My session slides:

April 26, 2015
Steph

Well, SQL Saturday Exeter flew by. ’Twas great catching up with people, seeing folks learn how to use (old & new) tools better, and just generally watching everyone having a great time at one of the best organised conferences I have the pleasure of going to. Here are links to all the slide decks etc that we presented this weekend:

- My R: analysis to integration training day notes & source code
- My agile BI slide deck
- My Shiny: dashboards in R slide deck & source code
- Oz’s SSRS: Beyond the basics slide deck & source code

If you attended any of our sessions, give us some constructive criticism! Read More

Easy Continuous Integration for R

April 20, 2015
Steph

With excellent guidance and tooling on making R packages, it’s becoming really easy to make a package to hold your R functionality. This has a host of benefits, not least source control (via GitHub) and unit testing (via the testthat package). Once you have a package and unit tests, a great way of making sure that as you change things you don’t break them is to perform continuous integration.

What this means is that every time you make a change, your package is built and thoroughly checked for any issues. If issues are found the “build’s broke” and you have to fix it ASAP.

The easiest, cheapest, and fastest way of setting up continuous integration for R stuff is to use Travis-CI, which is free if you use GitHub as a remote server for your code.

NB – it doesn’t have to be your only remote server

April 17, 2015
Steph

I had the pleasure of presenting at unified.diff, a general programming user group in Cardiff, last night and was able to debut my LaTeX show! If you’d like to talk at the group about anything tech related tweet them on @unifiedDiff. They’re very flexible on time and topic so if you’re based in Cardiff or are coming down to see a client, it’s an easy way of delivering a talk and meeting some nice people. Read More

optiRum – gini like a wizard

April 16, 2015
Steph

optiRum, the R package I built and maintain for Optimum on CRAN, has gained some extra functions recently. Some of them use currently experimental data.table functionality so I’m eagerly awaiting the release to CRAN to deliver optiRum.

In the interim, I thought I’d give some brief overviews of existing functionality contained in the package.

I do a lot of regression models and one of the common tools for assessing a regression’s ability to accurately model an event is to produce a Gini chart and a Gini coefficient. The higher the Gini coefficient, the better your model discriminates between cases where the event happens and cases where it doesn’t.

I simplify the process of producing gini charts (giniChart) and coefficients (giniCoef) so that I get a chart in one simple step.

Under the hood this uses the AUC package to get the coefficient, scales to format it and ggplot2 to produce the chart. Using ggplot leads to a better looking chart that can also be tweaked to suit your needs since a ggplot object is returned by the function.
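
For readers who want to see the arithmetic behind the coefficient, here is a minimal sketch in Python (not the package’s R code). It assumes the standard relationship Gini = 2 × AUC − 1, and the function names `auc` and `gini_coefficient` are hypothetical stand-ins rather than optiRum’s or the AUC package’s API.

```python
def auc(scores, outcomes):
    """Area under the ROC curve by pairwise comparison.

    For every (event, non-event) pair, count 1 if the event was scored
    higher, 0.5 for a tie, and 0 otherwise; AUC is the average.
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one event and one non-event")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def gini_coefficient(scores, outcomes):
    """Gini coefficient under the assumed identity Gini = 2 * AUC - 1."""
    return 2.0 * auc(scores, outcomes) - 1.0
```

A model that ranks every event above every non-event gets a Gini of 1, while one that scores everything identically gets 0, which is why a higher value indicates better discrimination.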

April 15, 2015
Steph

- Maker’s Schedule, Manager’s Schedule
- Reality Check: Counseling for Developer Hero Worshippers
- Comments on Joining Microsoft
- Giving back for the future of open source
- SQLBits videos – watched not read though
- What they don’t tell you about public speaking
- Toxicity in Reddit Communities: a Journey to the Darkest Depths of the Interwebs
- Owner of a Credit Card Processor Is Setting a New Minimum Wage: $70,000 a Year
- Use Datazen for free if you have SQL Server Enterprise

Read More

Organised speaking – improving font sizes

April 13, 2015
Steph

A recurring problem with my presentations is font size.

- The code included in my Rmarkdown slides was by default too small. Upping the font size via CSS worked ok, but when I switched to a shiny app version for my intro to shiny, it reverted and I’m afraid to say I didn’t notice beforehand.
- I use PuTTY for showing how to do some stuff in the linux command line but the font’s quite small by default and Gail Shaw’s tip of Magnifier in my session was tough to use
- I’ve upped my font size on my RStudio IDE, but hadn’t yet implemented this across other IDEs
- I tend to use my mouse cursor to draw attention to things.

Read More

An R data.table cookbook

April 8, 2015
Steph

For my precon on R at the end of the month I’m working on the takeaway – the handout. This’ll be the thing that makes the training day able to be put into practice immediately, and refills all those drink and sleep depleted neurons back up with R knowledge. One of the things is a simple data.table cookbook. If you’re a data.table user, what other tasks do you think should be on there? Read More

Stuff I read this (bit more than a) week

April 7, 2015
Steph

It’s been a wee while due to SQLBits disruptions and a crazy work schedule but here’s some of what I’ve been reading recently:

- Introverts, Extroverts, and the Complexities of Team Dynamics
- Azure Blob Storage introduction
- Kevin Kline: Advice to new bloggers
- Editing for people who love to write too much
- ProBlogger generally after KK’s recommendation
- Why oil prices came down. and won’t any more
- Test Driven Analysis
- Standardising function names in R
- Microservices at Netflix
- Microsoft closes acquisition of Revolution Analytics

Read More

Working with Azure Blob Storage, some notes

April 6, 2015
Steph

I’m working on building a snazzy shiny app that a) drops the inputs/parameter values into blob storage and b) uses Stream Analytics to query the values and present back what people are saying at the moment. This’ll be a fab tool for my pre-con next month if I can get it working in time!

Getting it working does, however, mean using the Azure Blob Storage API in R, which I confess is much harder than expected, especially after the ease of using the Visual Studio Online API for tfsR. To that end, I thought I’d write up some of my findings before I do a bigger write-up that illustrates how to do everything (in R).

I’m working my way through an intro to Azure storage on the (hopefully reasonable) expectation that more knowledge will make it easier to work with. There’s additionally the online reference, although I found the VSO REST API documentation easier to understand and get started with.

March 23, 2015
Steph

I’ve been asked by a few people recently about why I don’t use Azure Machine Learning (ML). I answer that I don’t use it yet, the reason being that at the moment the robust development life-cycle isn’t in place around it. I think that will change – one of the great reasons for the acquisition of Revolution Analytics (in my opinion) is their DeployR system. DeployR is essentially an R web service platform. Read More

Bride of Frankenstein: TFS + R

March 20, 2015
Steph

The unholy abomination of trying to use TFS as my central repository for my R code over the past year has been tough and you may or may not be looking at the screen as if I’m a crazy fool for even trying. Of course, now I have good news, because I’ve broken the back of the main issue I had with TFS. The crucial link was being able to programmatically create Git repositories within a single project for small projects.

Using the API, I’ve been able to write an R package with functions that now save me at least 15 minutes of time and effort each time I want a new project. So I can happily holler “IT’S ALIVE!!”

March 14, 2015
Steph

For some people it might sound silly, but a frequent reason why people don’t sign up or don’t make it to their local user group is to do with social anxiety. I totally understand this – a room full of people you don’t know can be a daunting experience. I still get nervous when attending a new user group for the first time and I run three user groups, and speak at user groups and conferences all round the country!

This post takes you through the worries, and explains how I’ve approached some of the issues. Hopefully, this’ll help you get more people into your local user group and learning, whether it’s because you have the tools to help yourself, or understand and can help others.

February 27, 2015
Steph

- The Unbearable Lightness of Tweeting
- Hadley Wickham: Impact the world by being useful
- Academics should be made accountable for exaggerations in press releases about their own work
- Why Complex Decisions Inevitably Take Weeks
- Six sentence emails that get fast responses
- Average house prices: how expensive is your area?
- FCA Consumer Spotlight – segmenting retail financial customers
- Making R Files Executable (under Windows)
- Shipping Culture Is Hurting Us

Read More

Where do I fit in the Microsoft future?

February 24, 2015
Steph

Entering into the world of SQL Server around the same time as the 2008 release has meant that until the past couple of years, change in the Microsoft BI world only happened in dribs and drabs for me. SQL Server and its BI components were stable server products and the focus was on getting data and optimising “central reporting”. Recently though things have started to massively change due to Azure and Office 365.

No longer part of Server & Tools where products were considered in silos, SQL Server and BI are now part of the Cloud Platform. It’s now a means of delivering the Cloud-first vision that Microsoft have aligned themselves to.

February 22, 2015
Steph

- ONS Style guide for writing about statistics
- Your coding style can give you away
- Automated Tinder and the Eigenface
- R-help mailing list to use cage-fighting to resolve conflicts
- Application Containers For Cloud Computing
- Managing Test Data as a Database CI Component – Part 1
- Let the Hackers In: Experts Say Traps Better than Walls
- How to Write a Blog Post

Read More

Declutter a shiny report’s code

February 18, 2015
Steph

Shiny reports are awesome, but they sure do end up with many lines of code when adding lots of inputs and outputs. A ui.R file can rapidly exceed 50 lines of code and I prefer to keep things more compact. The best way I’ve found of doing that in other languages and in R is to modularise my code – break it down into independent chunks. Shiny already does this by having a server() and ui() section and allowing you to source other files. Read More

Stuff I read this week

February 13, 2015
Steph

Here’s a selection of articles etc. that I found really interesting this week: Gendered Language in Teacher Reviews Knowledge units – the atoms of statistical education SQLBits in The Register Paul Randal: Want to be mentored by me? Replacing Middle Management with APIs R in Business Intelligence The RHS assignment operator in R Ooh R Can Microsoft make R easy? Read More

A busy month or so

February 11, 2015
Steph

I’m really looking forward to a few months of user group and conference awesomeness:

- Feb 24: CaRdiff presenting Shiny: Dashboards in R
- Feb 26: Oxford UG presenting Learning the ropes via the community
- Mar 4-6: Helping out at SQLBits
- Mar 7: SQLBits presenting Shiny: Dashboards in R
- Mar 9: SQL Cardiff with Jen & Sean McCown presenting
- Mar 17: Diff.Net with Scott Hanselman presenting
- Mar 31: SQL Cardiff with the Battle of the Beards

Then even more fun kicks in with a SQLSaturday Exeter precon, a visit to unified. Read More

magrittr: cleaner program flow

February 9, 2015
Steph

Last year I built a pretty sweet web service in R as part of the day job. However, not being well-versed in stuff like object-oriented programming, I did not do the best job of making the flow of my program particularly clear or robust. It wouldn’t take multiple inputs properly and I found it to be tough to test. In spare moments, I took to cogitating how to improve things.

I tried simply refactoring some of the functions but found my structure too cumbersome to allow much change. I tried starting afresh with an S4 system but was soon in a death spiral of class proliferation and no experience in how to stop it. After dabbling with different methods, I was getting pretty frustrated – I want my code to be better and more maintainable!

Now I’m looking at magrittr.

About magrittr

magrittr was designed to better facilitate functional programming based on piping inputs from one function to another. It’s the same paradigm as the PowerShell operator |.

This means you can more succinctly pass an input through various transformation steps (in contrast to my initial method) with a lot less code. The ability to add conditional functions or even new functions on the fly (aka lambda functions) with a similarly low code burden gives the added benefit of helping with branching logic.
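
As a rough illustration of the piping idea, here is a small sketch in Python rather than R. The `pipe` helper and the `clean` and `tokens` functions are hypothetical names invented for this example; magrittr itself provides the `%>%` operator in R, and nothing below is its API.

```python
from functools import reduce


def pipe(value, *steps):
    """Pass value through each step in order, feeding each result
    into the next function (the same idea as x %>% f %>% g)."""
    return reduce(lambda acc, step: step(acc), steps, value)


def clean(text):
    """Trim surrounding whitespace and lower-case the text."""
    return text.strip().lower()


def tokens(text):
    """Split text into whitespace-separated words."""
    return text.split()


# Nested calls read inside-out...
nested = len(tokens(clean("  Hello Pipe World ")))

# ...whereas the piped version reads top to bottom, in execution order.
piped = pipe("  Hello Pipe World ", clean, tokens, len)

# Functions can also be added on the fly as lambdas.
doubled = pipe(3, lambda x: x + 1, lambda x: x * 2)
```

Both `nested` and `piped` compute the same word count; the piped form simply lists the transformation steps in the order they run, which is the readability gain the post describes.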

February 7, 2015
Steph

Oz and I, being the lazy so and so’s that we are, share a profile and use it across all our devices. Our username is “Steph & Oz” which means the user folder that Windows has for us is C:\Users\Steph & Oz. Having spaces and special characters is generally not recommended, and gives interesting issues when using R, primarily at initialization and when trying to do package installations.

By default, R will try to make the user’s personal folder the directory which it works under, i.e. limiting its impact on the computer overall, but its Unix/Linux roots mean that it doesn’t like you doing whacky things like ampersands in folder names.

The result with ours is to cause this error on load:

Error installing package: Error: ERROR: no packages specified

‘Oz’ is not recognized as an internal or external command,

operable program or batch file.
Read More

Paul Randal offers mentoring

February 6, 2015
Steph

Hot off the back of his win in the Tribal Awards, Paul is offering to mentor 3 men & 3 women for two months. To be in with a chance of getting mentored by Paul, you simply need to apply by writing a blog post about why you should be considered for mentoring and posting the link by the 15th Feb 2015.

I think it’s an awesome offer that you should take up if possible (i.e. you’re reading before the deadline) and whilst I’m busy trying to convince you I’m going to insert my application too. Hopefully, seeing my application will help you form your own.

What is the value of being mentored?

Mentoring gives you the opportunity to have someone who can assist you in the way a senior techy can when you face a technical challenge. They can give valuable advice about hidden perils, shortcuts, and point out code smells.

That advice is valuable, but to get it you need to properly formulate your issue or challenge faced. Like posting on Stack Overflow, putting thought and preparation into the question gives you a deeper understanding before you even talk to your mentor.

It’s worth noting that you can’t be vague. “I want to be the best” or “I want to know everything” is never going to happen. Mentoring is not a panacea for your entire career – especially with short duration mentoring like Paul’s. To get the value, you need to settle on a specific issue or challenge that you want to tackle.

January 24, 2015
Steph

As I covered in my post on SQLSaturday Exeter, I’m going to be doing a full day of R training on April 24th that takes you from cabin boy to first mate in a day. You can’t be captain because I’m Captain… until you go back to your own ship… then you can be captain.

TL;DR

Attend my day of training about R if you’d like to learn R, best practices, and how to manage it.

It’s £150 (early bird) and can be booked at SQLSaturday Exeter’s website

January 21, 2015
Steph

In my iterative presentation design post I promised a case study. I thought I’d cover my most presented session Intro to R, in future called Knowing your Rs from your elbow courtesy of @FatherJack.

A brief history

Since I’ve been using R for the past couple of years and spent the first months struggling with it, I wanted to give the presentation that I would have wanted to see at the beginning. Not one about random bagging and a bunch of other stats, but one about the best ways to do the fundamentals:

connecting to my database
performing data manipulations, summaries and updates
charting my data
producing reports

A few packages cover these awesomely and are much better than base R so whilst I was tackling a massive stats project, the things which took the time and stress were things I could have avoided with ease!

So my intro to R takes people through the things I wish I’d been taken through, thus making those first few months of R pleasant, happy times!

January 20, 2015
Steph

Just a quick tip for spreadsheet users about spellchecking in Excel. Firstly, yes you can spell check a spreadsheet. Secondly, you do it either by going to Review > Check Spelling, or more easily by hitting F7 on your keyboard. Please, please spellcheck your work – it makes your work much more professional and saves you having to do it manually! Read More

SQLSaturday Exeter 2015

January 19, 2015
Steph

Woohoo! The kind and crazy folks at SQLSaturday Exeter accepted my submitted training day for their roster. Before I wax lyrical on the virtues of being locked in a room with me all day, I thought I’d better cover the fundamentals of the event itself!

First, the awesome video…

January 18, 2015
Steph

I wanted to outline my approach to presentation design, or development as I prefer to call it.

Why do I consider it development? Well, it’s a product that can be manually done & delivered but with the potential to scale to thousands of users, I’d rather the product be easy to maintain & deploy, deliver real value to the users, and keep up with cutting edge developments in the subject. Also, I call it development because now with the use of rmarkdown, I do actually code my presentations.

General presentation design

I’ve read and studied a lot about presentations, some of the biggest influences being:

– Dr. Andrew Abela and the Extreme Presentation Method

– Buck Woody and his fantastic presentation style

– Brent Ozar and his excellent materials for presentations

– Solid fundamentals in presentation training courses (things like INTRO: Intro, Need, Title, Range, Objective)

When I first come up with the idea for a presentation, I write the abstract for it. In the abstract I set out the tone, material covered, and outline who should attend. This abstract is my requirements doc for later me – it tells me whether I’m selling, educating, or entertaining and what I’m doing it about.

In my opinion, you should always write the abstract first as not only can you write more abstracts than you can presentations but it distills the idea down and helps you think of your audience first.

January 14, 2015
Steph

Last night was the first Cardiff R User Group event. There were 6 people registered out of 24 CaRdiffians. In the end we had 8 people show up – so a whopping third of our current membership base.

As we sat around the booth eating chips and drinking beer, we covered our experiences learning R to date, the trials and tribulations of our jobs and why you shouldn’t drop a barbell on your nose. We had great discussions and most of us came away with new R functionality to look at!

We decided to initially go with the three session formats I’d proposed and see how things go:

TalkRs: evening events with talks and socialising
LearnRs: after work sessions focused on learning some new bit of R
LunchRs: quick lunchtime sessions to talk through a problem with someone else and hopefully solve it!

Read More

Photoshop image macro (or something even better!)

January 13, 2015
Steph

I spend a lot of time in Photoshop for someone in BI. Between cleaning up images, building logos for my latest project, or producing material for user groups, I probably use it at least once a week. Through it all, I usually need to produce variants, in different file formats and sizes. So it can quickly become a dozen uses of the Save As… or Save for Web functions.

I hate manual work, so you can see why it was frustrating in the extreme. Then I realised how silly I was being by not having already googled for it!

It took a while because my keyword searches weren’t the terms Photoshop uses, but I found the Secret Sauce. And if you’re the sort of person who’d type “photoshop image macro” – here’s how you do it!

January 12, 2015
Steph

As part of my ongoing series about presenting at community events and conferences, I wanted to cover my personal thought process when it comes to prioritising which events I’d like to speak at for my goal Throw 1, Speak 1.

There is a massive number of awesome SQL Server and other technology events happening out there. I even throw a SQL Server lunchtime session once a week for the user group! Then of course there are all those conferences in the UK and abroad that are worth attending. So how do I even start picking out where I’d like to talk, and how do I go about getting selected for them?

January 11, 2015
Steph

It’s a bit sad but I enjoy dissecting what sessions are submitted to conferences I’m involved in or speak at. Instead of doing it primarily by eye, I’ve started dabbling in web scraping in R to do it. Initially, I used RCurl and my latest snippet uses rvest.

The first snippet, a bit of R code for SQLBits, uses RCurl but it’s cumbersome, plus for SQLSaturday Exeter there is SSL to contend with. Using rvest makes it really easy and it was an excellent excuse to get around to using magrittr, Hadley Wickham’s pipe code paradigm for R.

Blogger tip: I also wanted the opportunity to see how Gists imported into WordPress – you just c&p the url in (into the code, no URL markup) and WordPress automatically pulls in the Gist. For more info on this see WordPress’ article on Gist.

January 10, 2015
Steph

Not quite part of being organised at speaking, but bundled up in my scheduling constraints for speaking, is when I’m throwing user group events. Here are the details of the user groups I’m planning on throwing to meet my goal of 1 user group event a month (not including lunch time sessions!) SQL Server I run the SQL Server user group in Cardiff and have done for a few years – I’m not giving it up any time soon. Read More

Starting the Cardiff R User Group

January 9, 2015
Steph

These days any hobby of mine ends up with a user group if there isn’t one already.

The amount of value I derive from being able to hear experts in their fields talk, whether they’re on stage or in the audience, is phenomenal. Also, it’s a really great way to meet like-minded people.

So with the benefits in mind, 2 years of R under my belt, and a new starter in work, the time seemed ripe for an R user group.

January 8, 2015
Steph

Following up from my last post on maintaining my session abstracts, I wanted to cover how I’m doing my scheduling this year for speaking at events. Perhaps more important than the tech is the intention and the planning process, so I’ll be covering these factors in more detail than the tech.

Technology

I make use of Google services quite a bit, and their calendar system is a great help. So this year I’ve added a calendar that has all my (and hopefully Oz’s) speaking engagements.

I’m then utilising a WordPress plugin called GCal events to connect to the calendar and pull the info into a page.

Throw 1, Speak 1

The goal this year is to throw one user group event and speak at one event each month.

January 3, 2015
Steph

As I’ve been using this blog more recently, the page speed has been becoming much more frustrating. So today, I’ve done some stuff to improve it and it’s now twice as fast as it used to be. Please let me know what you think of the new style!

January 2, 2015
Steph

Last year I spoke at 10 different events (I think) and was very lucky to be nominated in the Tribal Awards for my Intro to R session. I did just a couple of different session titles and I don’t think I managed the whole process very well.

To be an easier speaker to deal with, I’m trying to be more organised so that the selection process of myself & topics is easier whilst also ensuring I don’t develop too many presentations at the last minute.

Having dealt with awesome serial speakers Tobiasz Koprowski and Denny Cherry from the organiser end, I noticed they did a few things which made them much easier to deal with, particularly given the breadth of topics they can cover!

December 29, 2014
Steph

This year we’ll be continuing to maintain evening events on Tuesday nights and lunch time events on Thursdays.

Evening events

So far we have the following events and speakers scheduled for the evening events:

Jan 27th – 2 hour intro to replication by David Williams
Mar 31st – Battle of the Beards! Tobiasz Koprowski vs Terry McCann vs Rob Sewell
May 26th – Index Fragmentation: Internals, Analysis, and Solutions by Paul Randal, and Steve Powell
Jul 28th – Alex Whittles on winning Fantasy F1 using PowerPivot

I’ve got slots in there for full hour sessions as well as lightning talks of up to half an hour, so whether you’re an existing speaker or want to improve your knowledge, please get in touch and book yourself in.

November 23, 2014
Oz

The What

If you need to join multiple datasets inside SSRS, perhaps because of different sources, grains of detail etc, then you often need to aggregate over both datasets.

In SSRS, you can easily perform aggregations over another dataset but it can be tough to do this based on a grouping factor in your main dataset.

A key example of this might be Sales and Purchases – you want to show both of these by month but they come from two different data sources.

You could build two tables that appear to be just one table but this can be really clunky. Instead, you want just one table with the month, the total sales, and the total purchases in.

Although there’s no tidy way of doing this built in, you have the power to add your own functions to SSRS using the Code window of the report’s properties. Provided here is a block of VB script that can be added to your SSRS report to allow you to do those tricky aggregations as if they were just another built in function.

I call it AggLookup.
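
To illustrate the idea rather than the VB itself, here is a minimal Python sketch of aggregating a second dataset by the grouping key of the first. The function name agg_lookup, the field names, and the sample sales and purchases rows are all assumptions made up for this example; the actual AggLookup code in the report may work quite differently.

```python
from collections import defaultdict

def agg_lookup(rows, key_field, value_field, agg=sum):
    """Group rows from a secondary dataset by key and aggregate one field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_field]].append(row[value_field])
    return {key: agg(values) for key, values in groups.items()}

# Hypothetical rows coming from two different data sources
sales = [
    {"month": "Jan", "amount": 100},
    {"month": "Jan", "amount": 50},
    {"month": "Feb", "amount": 70},
]
purchases = [
    {"month": "Jan", "amount": 30},
    {"month": "Feb", "amount": 20},
    {"month": "Feb", "amount": 25},
]

sales_by_month = agg_lookup(sales, "month", "amount")
purchases_by_month = agg_lookup(purchases, "month", "amount")

# One combined table: month, total sales, total purchases
combined = [
    (month, sales_by_month.get(month, 0), purchases_by_month.get(month, 0))
    for month in ["Jan", "Feb"]
]
print(combined)
```

The point of the design is that the table is driven by one dataset's groups (the months) while the totals for the other dataset are looked up per group, which is what makes a single table possible.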

November 10, 2014
Steph

Another quick post off the back of a SQL Lunch I did a while ago. Explore it via SQLFiddle: http://sqlfiddle.com/#!6/ad7f5/7/0 What is a CTE? A Common Table Expression (CTE) is essentially a function defining a relation instead of a table. This function outputs a table (like all queries) that is only present within the session, but data isn’t stored in tempdb like with a temporary table. Why CTEs? CTEs are designed primarily to allow recursion within SQL – like a loop, but ideal for working with hierarchies. Read More
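
As a rough illustration of recursion over a hierarchy, here is a small sketch that runs a recursive CTE against an in-memory SQLite database from Python. This is not the SQLFiddle example from the post: the staff table, its columns, and its rows are invented for the purpose, and SQLite is used only because it ships with Python. SQLite writes the query as WITH RECURSIVE, whereas SQL Server uses a plain WITH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO staff VALUES
        (1, 'Captain', NULL),
        (2, 'First mate', 1),
        (3, 'Cabin boy', 2);
""")

query = """
WITH RECURSIVE chain(id, name, depth) AS (
    -- Anchor: start at the top of the hierarchy
    SELECT id, name, 0 FROM staff WHERE manager_id IS NULL
    UNION ALL
    -- Recursive step: add everyone who reports to a row already found
    SELECT s.id, s.name, c.depth + 1
    FROM staff AS s
    JOIN chain AS c ON s.manager_id = c.id
)
SELECT name, depth FROM chain ORDER BY depth;
"""

rows = conn.execute(query).fetchall()
print(rows)
```

The anchor query finds the root, and the recursive part keeps joining back to the CTE until no new rows appear, which is the loop-like behaviour described above.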

Database / BI related unit testing options

November 6, 2014
Steph

A quick list of frameworks available for doing unit testing, based on what I covered in today’s SQL Lunch MSFT Database projects Purpose: unit testing database objects Method: SQL / GUI Site: http://msdn.microsoft.com/en-us/library/jj851200(v=vs.103).aspx Cost: Free Pros: Built-in, quite well documented Cons: Requires Visual Studio 2010 Pro or above Codeplex ssisUnit Purpose: SSIS unit testing Method: XML / GUI Site: https://ssisunit.codeplex.com Cost: Free Pros: Unique Cons: As of writing, only stable version was released in 2008 Read More

Where’ve we been?

June 28, 2014
Steph

Almost into July and I haven’t posted a single thing in this blog! Shameful of me to be sure – I’ve been learning but not sharing. So what’s been happening? Well I moved into a new job at a brand new startup where I’ve been primarily doing R, modelling, and finally getting my hands back into SQL Server! That’s been keeping my days and parts of my nights busy. I’m also working on a startup at home with Oz called Clocksmith Games. Read More

Merry Christmas

December 24, 2013
Steph

November 28, 2013
Steph

September 25, 2013
Steph

After dipping Cardiff’s collective toes into the world of local SQL Server conferences, we’re doing it again in November. We’re taking registrations, and volunteers are always welcome. If an all day conference is a bit too much, why not try out a lunchtime or evening event? There are 9 other events around the country which you can attend as well as / instead of Cardiff, including Bristol where I’ve the privilege of speaking along with some excellent folk like Klaus Aschenbrenner. Read More

R for database and Excel people

September 15, 2013
Steph

What is R?

R is a statistical language for doing all sorts of analytics based on many different types of data and it’s also an open source platform that allows people to extend the base functionality. More details are available from the horse’s mouth.

How can I give it a go?

Download R and RStudio, an awesome development environment for R. There is also an excellent online R learning site. I do not recommend sticking with just R – we’re used to a lot more convenience and good development bits and bobs like IntelliSense, and RStudio really delivers.

September 13, 2013
Steph

Further to the last post introducing my trials and tribulations, and a hectic week or two, we’ve made excellent progress on the Relay. I’ve enlisted Mark (@tsqltidy), the chair for the Relay, and others to assist with the twittering and other activities, which has really helped me reduce my workload substantially.

All ten venues are going ahead:

Location	Date
Reading	Monday 11th Nov 2013
Southampton	Tuesday 12th Nov 2013
Cardiff	Wednesday 13th Nov 2013
Birmingham	Thursday 14th Nov 2013
Hertfordshire	Friday 15th Nov 2013
Newcastle	Monday 25th Nov 2013
Manchester	Tuesday 26th Nov 2013
Norwich	Wednesday 27th Nov 2013
Bristol	Thursday 28th Nov 2013
London	Friday 29th Nov 2013

So what’s been done so far?

Facebook

What have I been doing to try to make this a successful marketing channel:

September 7, 2013
Steph

It’s a nightmare when I’m trying to find out what’s clogging up my hard drive, particularly now that I have an SSD and can no longer be quite so lazy and sprawling with myriad files and downloads. This is the case even after moving most contents to Dropbox and putting this on my slow 1TB hard drive. It can get really tiresome to be running out of space and having to trawl through, right-clicking on different folders. It was boring, but it was how you did it. Well, now I am enlightened, and I don’t have to pour my time down the drain.

September 1, 2013
Steph

After organising SQLRelay for June 24th in Cardiff, as part of the national series of 8 events, we’re gearing up for November with the aim of being able to capitalise on the growing knowledge of SQL Server 2014 CTP and pushing the Relay into a less busy part of the UK community schedule. The difficulty is that where we had more than 6 months to prep for the previous Relay, this time round we had less than 5. What this means for me is that not only do I want to run a bigger and better Cardiff event, but I also (being a glutton for punishment) took on spearheading the marketing efforts for the whole shebang.

Details will be released next week on the launch, but given my lack of knowledge about anything social media this has already been a major undertaking for me, and I thought it might be of value for me, future me, and my dear readers to compile information and learnings as I go along so that it’s easier to implement in future for other marketing endeavours. It also provides an area for discussion.

August 27, 2013
Steph

Why do I use dynamic named ranges?

Where I work, most reports are exposed via a web front-end and Excel can create an external connection and retrieve the information. This is much safer than using direct database connections in workbooks. A problem with web queries though is that they cannot be converted to Tables, which would make referencing columns and the dataset as a whole easier. As a result, dynamic named ranges are a necessity for producing easy to develop and manage spreadsheets, since the volumes in the raw data can change over time.

How I save myself time

A raw data table with 20 columns will take a long time to create the named ranges for, given that I want:

A dynamic range covering the headers too for pivot tables
A dynamic range without headers for vlookups
A dynamic range for each column without headers

I use a macro, assigned to a nice button on my ribbon, to generate all the relevant ranges.

What are the special considerations?

Structure – raw data tables should ALWAYS be set up in a specific way – with the Primary Key on the left hand side and always filled in, with no empty rows or columns

Special characters – range names can’t contain special characters. The VBA uses the RegEx functionality to strip these out.

Numbers – range names can’t have numbers either. We can’t just strip out the numbers like we would special characters because they might be important like Grade1, Grade2 and Grade3 and collapsing them all to the name Grade would be a problem. Instead, the macro converts all numbers to the corresponding letter in the alphabet.

How much the data will grow? By default I set the macro to use 10 times the number of records present when I run the macro – if it’s already bigger than 25k rows, the number will need to be reduced, and if I don’t think 10 times the number will be adequate, I’ll increase the number.
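
The naming rules above can be sketched in a few lines of Python. This is only an illustration of the logic described, not the VBA macro itself: the function names range_name and rows_to_cover are invented here, and the exact digit mapping (1 to A through 9 to I, with 0 mapped to J) is an assumption, since the post only says numbers become the corresponding letter.

```python
import re

# Assumption: each digit maps to a letter, 1 -> A ... 9 -> I, and 0 -> J.
DIGIT_TO_LETTER = {str(d): chr(ord("A") + d - 1) for d in range(1, 10)}
DIGIT_TO_LETTER["0"] = "J"

def range_name(header):
    """Turn a column header into a candidate range name."""
    # Strip anything that is not a letter, digit or underscore
    cleaned = re.sub(r"[^A-Za-z0-9_]", "", header)
    # Replace digits with letters so Grade1 and Grade2 stay distinct
    return "".join(DIGIT_TO_LETTER.get(ch, ch) for ch in cleaned)

def rows_to_cover(current_rows, growth_factor=10):
    """Number of rows the dynamic range should allow for."""
    return current_rows * growth_factor

print(range_name("Grade1"))
print(range_name("Grade2"))
print(range_name("Net Sales (£)"))
print(rows_to_cover(2000))
```

Converting digits rather than dropping them keeps Grade1, Grade2 and Grade3 as three different names, which is the reason given above for not simply stripping them.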

August 20, 2013
Steph

Regular Expressions (RegEx) is a common string processing technique for handling strings that conform to patterns, as opposed to fixed strings. (XKCD: Perl Problems) It is an excellent set of functionality that is available in most programming languages, and even in SQL. It is however not readily available in Excel or VBA. This has its downsides if you’re trying to do complex string matching and extraction, so in my personal workbook, I include the RegEx functions available at http://www. Read More
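
To show the difference between matching a pattern and matching a fixed string, here is a short sketch in Python rather than VBA. The invoice-style codes and the pattern are invented for the example and are not taken from the post.

```python
import re

text = "Invoices INV-2013-001 and INV-2013-042 were paid; inv-draft was not."

# A fixed-string search only answers "is this exact text present?"
has_first_invoice = "INV-2013-001" in text

# A pattern describes the shape instead: INV, a 4-digit year, a 3-digit number
pattern = re.compile(r"INV-(\d{4})-(\d{3})")
matches = pattern.findall(text)

print(has_first_invoice)
print(matches)
```

One pattern finds every code of that shape and pulls out the year and number, where a fixed-string approach would need a separate search per code.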

SSIS basics and gotchas – presentation and resources

August 16, 2013
Steph

Follow up resources / places to go for way more detail: Stairway to SSIS MSFT SSIS tutorial package 1 MSFT SSIS tutorial package 2 MSFT SSIS tutorial package 3 SQLCat SSIS best practices Bob Duffy SSIS best practices Connection Strings The BOL for SSIS Design Patterns book Design patterns 24HOP vid Read More

Dynamic named ranges – the basics

July 27, 2013
Steph

Whoah nelly, what’s a named range first of all let alone a dynamic one? A named range is a shorthand or alias for a set of cells in Excel. These can be created easily by simply selecting one or more cells and using the name box to give it whatever name you feel relevant. This alias can then be used in formula to make something much more insightful like =A1_VAT as opposed to =A1_0. Read More

My First Platformer

June 28, 2013
Oz

My Platformer

Here’s my first attempt at a platformer, built in Construct 2 and using sprites from Game Maker.

June 27, 2013
Oz

My Day/Night Cycle Here’s my Day/Night cycle function demo, built in Construct 2. (Time moving at 4 minutes per second) It took me a day and a half of hair pulling, but I’ve only been using Construct 2 for a week so I suppose it’s not too shabby. It uses only 6 events, 2 global variables and 9 objects. Read More

User Group presentation

May 31, 2013
Steph

May 29, 2013
Steph

The problem:

We need to report on a system that is form based. Whenever there is a new form, there is a new table, and whenever there is a new or amended* field on the form, there is a new column in the table. Maintaining the imports of this data into a staging environment would require a lot of code and time to build manually from scratch.

What is required is something that goes through the two schemas for all relevant objects and updates our staging area’s schema accordingly.

Points for consideration:

Due to the level of change in the source system, all loads are dynamically generated SQL
Loads run from a data dictionary table, which needs to be updated when we update the schema
Loads occur daily
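
The comparison step can be sketched as follows, assuming each schema has already been read into a plain dictionary of table name to columns (in practice this might come from metadata such as INFORMATION_SCHEMA.COLUMNS, which is an assumption here). The function name schema_changes, the table names, and the data types are all hypothetical, and the sketch only handles new tables and new columns, not the data dictionary update.

```python
def schema_changes(source, staging):
    """Return DDL statements that bring staging in line with source.

    Both arguments map table name -> {column name: data type}.
    """
    statements = []
    for table, columns in sorted(source.items()):
        if table not in staging:
            # A new form means a whole new table
            cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols});")
            continue
        for name, dtype in columns.items():
            # A new or amended field means a new column
            if name not in staging[table]:
                statements.append(f"ALTER TABLE {table} ADD {name} {dtype};")
    return statements

# Hypothetical schemas for the source system and the staging area
source = {
    "form_a": {"id": "INT", "name": "VARCHAR(50)", "email": "VARCHAR(100)"},
    "form_b": {"id": "INT", "score": "INT"},
}
staging = {
    "form_a": {"id": "INT", "name": "VARCHAR(50)"},
}

changes = schema_changes(source, staging)
for statement in changes:
    print(statement)
```

Because the output is just a list of generated statements, it fits the dynamically generated SQL approach noted above, and the same list could drive updates to the data dictionary table.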

May 25, 2013
Steph

The example I’m running through is available at http://sdrv.ms/11lH3KR The scenario we’re looking at is where we want to be able to convey quality within a chart by having differently coloured columns, based on different conditions that we want to specify. Unfortunately, the ability to natively apply conditional formatting isn’t yet present, but we can mimic it by overlaying series of the same size that are coloured differently. Read More

Center across selection

May 24, 2013
Steph

Merging cells is easily done and can help make a spreadsheet look neat, but what you really, really should be doing instead is centering across the selection so it looks merged but isn’t. Center across selection though is hidden away and therefore time-consuming to use – no wonder people have bad habits! I wanted to do things the better way, but was lazy, so in the end I made a macro to go in my personal workbook and assigned it to my ribbon (I do this a lot). Read More

Time to go home…

May 19, 2013
Steph

I do a lot of work in spreadsheets and some cannot be left open on my PC as that’d make them locked for the morning report refresh. After a bunch of times having to buy cakes for delaying the reports, I put something in place to stop it. It also had the very nice side effect of telling me to go home. The first thing to do is make sure you have a personal workbook. Read More

Objectless Check Boxes using VBA

May 12, 2013
Oz

For my first ever blog post (be gentle with me!) I wanted to talk about an issue I have with Excel’s check box object, and my way of resolving it. It’s not perfect, and I’d love to hear of any other versions or ideas you may have. So here’s how I create check boxes in Excel without using Excel check boxes.

The Problem with Check Box Objects

They look good and they work well, there’s no denying they do what they’re supposed to, but they also annoy the heck out of me!

As far as I’ve been able to find they can’t be properly bound to a cell; this means if you want to get rid of them you need to select them and delete them, which can be a big job

If you want to refer to their state in simple terms you need to add a linked cell and refer to that, which to me is just plain messy

They’re awkward to format and style and if you want a big tick you need a big box and as such you need a big cell… again, messy

May 11, 2013
Steph

This blog was configured super rapidly with GoDaddy and Azure, instead of my previous implementation on EC2. I’ve forgone the multi-site installation, with attendant subdomains, and gone for a straight WordPress Website (one of the Azure features).

I already had an Azure account I’d gone through the billing setup for – but that was really simple anyway, so getting the blog up and running consisted of: