To quickly recap, so far we’ve just worked with some single values to get to grips with how some of the various operations work. Of course, we rarely work with a single value! If we did, we could just use a calculator.
In this instalment you’ll get to grips with some different ways of storing data and how to manipulate your datasets in the “traditional” way. This will help you understand a lot of code written in the past, and will equip you to understand how tabular data is manipulated.
Get a cuppa and settle in to this session’s video, then have a play with the code yourself and read along with the blog! As always, get in touch with any questions on Twitter using @LockeData!
When we were performing operations, we got some values output to the console. One of the key principles in writing code is Don’t Repeat Yourself (DRY) so we need to know how we can avoid repeating ourselves in R. One of the ways you can do that is to store a value for use later.
In R, we can store values by assigning them a name. This makes a variable or object. We can do this with a few different operators, but the traditional operator is <-. The format for assigning a value is nameofthing <- value.
my_variable <- 5 + 3
my_variable * 2
## [1] 16
Valid names for a variable include upper-case letters, lower-case letters, numbers anywhere but the beginning, periods (.), and underscores (_). Hyphens are not allowed, because R reads a hyphen as the minus operator.
There are a number of different competing conventions for how you name variables. The most common conventions are shown below. I have no strong feelings for any system and only ask that you pick one and stick with it within a single script. Whatever you do, don’t forget names are case sensitive!
myfirstvariable <- 1
myFirstVariable <- 1
MyFirstVariable <- 1
my_first_variable <- 1
my.first.variable <- 1
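To see case sensitivity in action, here’s a small sketch. The names my_value and My_Value are made up for illustration and aren’t from the video.

```r
# Two names that differ only by case are two separate variables.
my_value <- 1
My_Value <- 2

# The values are stored independently, so this comparison is FALSE.
print(my_value == My_Value)

# A third spelling has never been assigned, so R does not know about it.
print(exists("MY_VALUE"))
```

This is why a typo in capitalisation usually shows up as an "object not found" error rather than a wrong answer.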
You can create names that break the rules governing valid names by placing the rule-breaking name between two back-ticks (`). I don’t recommend you do this with variables you’ll create, but you’ll often end up with names that break conventions when importing data, especially when you import from spreadsheets.
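As a minimal sketch of the back-tick syntax, the name with a space in it below is a made-up example of the kind of column heading you might inherit from a spreadsheet.

```r
# A name containing a space is not valid on its own,
# so it has to be wrapped in back-ticks every time it is used.
`my variable` <- 5 + 3

# The back-ticks are needed when reading the value back too.
doubled <- `my variable` * 2
print(doubled)
```

Having to type the back-ticks every time is a good reason to rename such variables as soon as they’re imported.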
A vector is a collection of values that hold the same datatype. It is one-dimensional in that none of the elements in the collection correspond to other values like they might in a table of values.
A single value is actually a vector of length 1.
When I introduced the colon (:) as a means of generating a sequence, we were in fact generating a vector where each element was a number in the sequence. The vector’s length is the number of values generated by the sequence.
## [1] -1 0 1
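Output like the line above is what a sequence such as -1:1 prints. Here’s a short sketch, assuming that sequence, which also checks that a lone value really is a vector.

```r
# The colon operator builds a vector of consecutive whole numbers.
small_sequence <- -1:1
print(small_sequence)

# The vector is as long as the number of values in the sequence.
print(length(small_sequence))

# Even a single value is a vector.
print(is.vector(5))
```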
Another way of producing a vector is to use the combine function (c()). This is great for combining a number of disparate character strings into a vector.
## [1] "red" "yellow" "blue"
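A call along these lines would give the output above. The variable name primary_colours is my own choice for this sketch.

```r
# c() combines separate character strings into one character vector.
primary_colours <- c("red", "yellow", "blue")
print(primary_colours)

# Each string becomes one element of the vector.
print(length(primary_colours))
```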
A single value is still a vector. What we see when we use the c() function is that we’re combining vectors. As a result we can use it on longer vectors too.
c(1:3, 2:1, 5:8)
## [1] 1 2 3 2 1 5 6 7 8
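When we combine values into a single vector, R will convert everything to the same datatype, picking the most flexible type present: logical values become integers, integers become numerics, and anything combined with text becomes character.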
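Here’s a quick sketch of those conversions in practice. The variable names are just for illustration.

```r
# An integer combined with a decimal number becomes numeric.
mixed_numbers <- c(1L, 2.5)
print(class(mixed_numbers))

# A logical combined with an integer becomes integer (TRUE turns into 1).
mixed_logical <- c(TRUE, 2L)
print(class(mixed_logical))

# Anything combined with text becomes character.
mixed_text <- c(1, "a", TRUE)
print(class(mixed_text))
print(mixed_text)
```

Notice that R doesn’t warn you when it does this, so it’s worth checking the class of a vector if a calculation on it misbehaves.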
Getting information about vectors
The class() function will still work with a vector with a length greater than 1 to get you its datatype.
Let’s look at a sequence of numbers and one of the built-in vectors that contains the alphabet.
## [1] "integer"
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
## [1] "character"
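Calls like the ones below would produce that output. I’m assuming a sequence of 1:10 here, and LETTERS is R’s built-in vector of the upper-case alphabet.

```r
# A sequence made with the colon operator is stored as integers.
one_to_ten <- 1:10
print(class(one_to_ten))

# LETTERS is built in to R and holds the upper-case alphabet.
print(LETTERS)
print(class(LETTERS))
```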
We can use the length() function to find out the number of elements in a vector.
## [1] 1
## [1] 26
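For example, lengths of 1 and 26 are what you’d get from a single value and from the alphabet. The value 5 is an arbitrary choice for this sketch.

```r
# A single value is a vector with one element.
single_length <- length(5)
print(single_length)

# The built-in alphabet vector has one element per letter.
alphabet_length <- length(LETTERS)
print(alphabet_length)
```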
To extract the names of values in a vector, we can use the names() function.
steph <- c(Steph = "forename", Locke = "surname")
names(steph)
## [1] "Steph" "Locke"
Calculations on multiple vectors
When we perform calculations on two vectors, R will try to perform the operation for each set of elements. This is an element-wise or pair-wise calculation methodology.
In SQL, it’s equivalent to where you might write colA*colB and you’ll get the answer calculated for every row in the table. In Excel, it’s equivalent to a Fill Down of multiplying two values on the same row.
Let’s look at how this works in practice in R.
We have two vectors, each containing two elements.
vecA <- 1:2
vecB <- 2:3
## [1] 1 2
## [1] 2 3
If we want to multiply the two vectors by each other, R will match each element in the first vector with its counterpart in the second and multiply the two values together to make a new element.
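Using the two vectors defined earlier, the pair-wise multiplication looks like this. The name product is just for this sketch.

```r
vecA <- 1:2
vecB <- 2:3

# Element 1 is 1 * 2 and element 2 is 2 * 3.
product <- vecA * vecB
print(product)
```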
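R can also stretch a shorter vector to the same length as a longer one by repeating its values. This is known as recycling, and it works for other mismatched vector sizes too. The only rule for doing it cleanly is that the longer vector’s length must be a whole multiple of the shorter one’s; if it isn’t, R still recycles but gives you a warning.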
- Two vectors of the same length divide by the other’s length exactly one time and won’t need to recycle
- A vector of length one always cleanly divides any other vector’s length and so will be recycled
- A vector of length 2 will divide any vector with an even length and so will be recycled in those cases, but it cannot recycle cleanly for odd length vectors
1:10 * 2
1:10 * 2:3
1:10 * 2:4
## [1] 2 4 6 8 10 12 14 16 18 20
## [1] 2 6 6 12 10 18 14 24 18 30
## [1] "longer object length is not a multiple of shorter object length"
Vector recycling is useful and dangerous – it can help you make elegant code or give you unexpected results. Especially when starting out, I recommend you make your vectors either the same length or length 1.
Proceed with caution
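If you’d like R to hold you to the “same length or length 1” advice, one option is a small guard function. This is only a sketch: multiply_strict is a hypothetical helper I’ve made up here, not something from base R or the video.

```r
# Hypothetical helper: multiply two vectors, but only when they are
# the same length or one of them has length 1.
multiply_strict <- function(x, y) {
  same_length <- length(x) == length(y)
  has_single <- length(x) == 1 || length(y) == 1
  if (!(same_length || has_single)) {
    stop("vectors must be the same length, or one must have length 1")
  }
  x * y
}

# A length-1 vector is recycled as usual.
print(multiply_strict(1:10, 2))

# A length-2 vector against a length-10 vector is refused,
# even though base R would recycle it without complaint.
refused <- tryCatch(
  multiply_strict(1:10, 2:3),
  error = function(e) conditionMessage(e)
)
print(refused)
```

The trade-off is that you lose the convenient cases of recycling too, so this suits code where an unexpected length is more likely a bug than a shortcut.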
The logical operators that we covered earlier work in a pairwise fashion. They’ll return a vector of the same length as the longest one used in your logical statement.
## [1] FALSE TRUE
## [1] TRUE TRUE
Making logical statements returns vectors with a logical datatype.
## [1] FALSE TRUE
## [1] TRUE TRUE
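Here’s a sketch of pairwise logical statements using the two vectors from earlier. The comparisons against 1 are my own choice of example.

```r
vecA <- 1:2
vecB <- 2:3

# Each comparison is made element by element.
a_over_one <- vecA > 1
b_over_one <- vecB > 1
print(a_over_one)
print(b_over_one)

# & and | then combine the results pair by pair.
both <- a_over_one & b_over_one
either <- a_over_one | b_over_one
print(both)
print(either)

# The result of a logical statement is a logical vector.
print(class(both))
```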
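Occasionally, you expect to only be operating on a single pair of values and want to enforce that R should only do the calculation on the first pair. In R, this is called a bitwise AND (&&) or OR (||). Strictly speaking, && and || are short-circuit operators meant for single values (R’s true bitwise functions are bitwAnd() and bitwOr()), but the name is used here for the double-character forms.
A bitwise logical statement will only do the check for the first elements in the vectors and ignore all the others. That holds for older versions of R; from R 4.3.0 onwards, giving && or || a vector longer than 1 is an error, so pick out the elements you want to compare first.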
## [1] FALSE
## [1] TRUE
Use bitwise operators with extreme care!
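One cautious way to compare just the first pair is to select it yourself with [1] before using && or ||. This sketch assumes the same vectors and comparisons against 1 as before, and it runs the same way whichever version of R you have.

```r
vecA <- 1:2
vecB <- 2:3

# Pick out the first element of each vector explicitly,
# so && and || only ever see single values.
first_and <- vecA[1] > 1 && vecB[1] > 1
first_or <- vecA[1] > 1 || vecB[1] > 1

print(first_and)
print(first_or)
```

Being explicit about which elements you mean also makes it obvious to the next reader that the rest of the vector is being ignored on purpose.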
And there we have it! It’s fair to say that you’ve pretty much covered the basics of data handling in R now, so next time we’ll have a look at some packages and functions. These are the bits where somebody else has done the hard work for you!
As always, here’s the video code to take away and play with.
my_first_var <- 5+3
my_first_var * 2
my.other.variable = 7*4
9-5 -> YetAnotherVariable
dont.delete.me <- "please don't!"