So far we’ve just worked with some single values to get to grips with how some of the various operations work. Of course, we rarely work with a single value! If we did, we could just use a calculator.
This week you’ll get to grips with some different ways of storing data and how to manipulate your datasets in the “traditional” way. This will help you understand a lot of code written in the past, and will equip you to understand data manipulation of tabular data.
When we were performing operations, we got some values output to the console. One of the key principles in writing code is Don’t Repeat Yourself (DRY) so we need to know how we can avoid repeating ourselves in R. One of the ways you can do that is to store a value for use later.
In R, we can store values by assigning them a name. This makes a variable or object. We can do this with a few different operators, but the traditional operator is <-. The format for assigning a value is nameofthing <- value.
Add this operation into your script for use later on.
my_variable <- 5 + 3
my_variable * 2
##  16
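As a small sketch of the "few different operators" mentioned above, here are the three assignment forms you are most likely to meet in older code. The names radius, area and stored_area are made up for illustration.

```r
# The traditional assignment operator
radius <- 3

# The equals sign also assigns when used at the top level of a script
area = pi * radius^2

# Rightward assignment exists too, although you will rarely see it
area -> stored_area

# The stored value can be reused without retyping the calculation
stored_area / 2
```

Sticking with <- throughout keeps your scripts consistent with most R code you'll read.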
Valid names for a variable include upper-case letters, lower-case letters, numbers anywhere but the beginning, periods (.), and underscores (_). Hyphens are not allowed, because R reads a hyphen as a minus sign.
There are a number of different competing conventions for how you name variables. The most common conventions are shown below. I have no strong feelings for any system and only ask that you pick one and stick with it within a single script. Whatever you do, don’t forget names are case sensitive!
myfirstvariable <- 1
myFirstVariable <- 1
MyFirstVariable <- 1
my_first_variable <- 1
my.first.variable <- 1
You can create names breaking the rules governing valid names by placing the rule breaking name between two back-ticks (`). I don’t recommend you do this with variables you’ll create, but you’ll often end up with names that break conventions when importing data, especially when you import from spreadsheets.
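Here's a minimal sketch of back-ticked names in action. The names are invented for illustration. It also uses the base function make.names(), which isn't covered in the text above, to show how R would repair a name that breaks the rules.

```r
# Names that break the rules need back-ticks every time you use them
`my variable` <- 10
`2nd_value` <- 20
total <- `my variable` + `2nd_value`
total

# make.names() converts invalid names into valid ones
fixed_names <- make.names(c("my-variable", "2nd_value", "my_variable"))
fixed_names
```

The last name comes back unchanged because it was already valid. The first two are altered, which is roughly what you can expect some import functions to do to awkward spreadsheet headers.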
Variables you store get stored in-memory. This means they’ll hang around whilst R is open and will be gone after that. You can see variables you’ve created in the Environment tab.
RStudio will by default save your variables for you so that next time you open it up, your variables are stored.
*This is both a blessing and a curse. RStudio saving your variables is a blessing because you don’t have to worry about keeping RStudio open all the time.
BUT you’ll inevitably create something at some point through the console or an Untitled R file and then lose that bit of code. Now when your script runs in a fresh session it’ll fail. You’ll risk tearing your hair out and worse as you go through the pain of debugging this.
I recommend you get in the habit early of not working with your session being saved. Turn it off in Tools > Global Options by unticking “Restore .RData into workspace at startup”.*
If you need to manage what’s been stored, you can remove variables with the rm() function.
Try removing the my_variable you created earlier.
You can also achieve the same results by using the broom symbol in the Environment tab in RStudio.
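A minimal sketch of removing a variable, assuming the function meant here is base R's rm(). The exists() function is only used to check the result, and the names existed_before and exists_after are made up for illustration.

```r
# Create the variable, then confirm it is in the environment
my_variable <- 5 + 3
existed_before <- exists("my_variable")

# Remove it by name
rm(my_variable)

# It is no longer available
exists_after <- exists("my_variable")

existed_before
exists_after
```

To clear everything in one go from code, rm(list = ls()) removes every variable you've created.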
A vector is a collection of values that hold the same datatype. It is one-dimensional in that none of the elements in the collection correspond to other values like they might in a table of values.
A single value is actually a vector of length 1.
When I introduced the colon (:) as a means of generating a sequence, we were in fact generating a vector where each element was a number in the sequence. The vector has a length which is as long as the number of values generated by the sequence.
##  -1 0 1
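As a quick sketch of these two ideas, the snippet below builds a sequence with the colon and checks that a single value really is a vector of length 1. The variable names are made up for illustration.

```r
# A sequence made with the colon is a vector with one element per number
around_zero <- -1:1
around_zero
length(around_zero)

# A single value is a vector too, just a short one
single_value <- 5
is.vector(single_value)
length(single_value)
```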
Another way of producing a vector is to use the combine function (c()). This is great for combining a number of disparate character strings into a vector.
##  "red" "yellow" "blue"
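A minimal sketch of combining strings with c(). The variable names are assumptions chosen for illustration.

```r
# Combine three separate strings into one character vector
primary_colours <- c("red", "yellow", "blue")
primary_colours

# c() also accepts existing vectors, so you can extend one
more_colours <- c(primary_colours, "green")
more_colours
```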
A single value is still a vector. What we see when we use the c() function is that we’re combining vectors. As a result, we can also use it on longer vectors too.
c(1:3, 2:1, 5:8)
##  1 2 3 2 1 5 6 7 8
When we combine values into a single vector, R will change everything to the same datatype using some conversions.
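To make the conversions concrete, here's a small sketch with made-up variable names. R moves everything to the most flexible datatype present, roughly in the order logical, integer, numeric, character.

```r
# Integer and decimal values together become numeric
int_and_dbl <- c(1L, 2.5)
class(int_and_dbl)

# A logical combined with an integer becomes an integer (TRUE turns into 1)
lgl_and_int <- c(TRUE, 2L)
lgl_and_int
class(lgl_and_int)

# Add a string and everything becomes character
anything_and_chr <- c(1, TRUE, "a")
anything_and_chr
class(anything_and_chr)
```

The last result is the one that tends to catch people out: a single stray string turns a column of numbers into text.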
The class() function still works on a vector with a length greater than 1 and tells you its datatype.
Let’s look at a sequence of numbers and one of the built-in vectors that contains the alphabet.
##  "integer"
##  "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
##  "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
##  "character"
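A sketch of the kind of calls that give these results. LETTERS is the built-in upper-case alphabet vector. The particular sequence 1:10 is my assumption, as any whole-number sequence made with the colon behaves the same way.

```r
# A sequence made with the colon holds integers
number_sequence <- 1:10
class(number_sequence)

# LETTERS is a built-in character vector of the upper-case alphabet
LETTERS
class(LETTERS)
```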
We can use the length() function to find out the number of elements in a vector.
##  1
##  26
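A short sketch of length() on vectors of different sizes. The single value 5 and the sequence 1:10 are assumptions picked for illustration.

```r
# A single value is a vector of length 1
length(5)

# The built-in alphabet has one element per letter
length(LETTERS)

# A sequence has one element per number generated
length(1:10)
```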
To extract the names of values in a vector, we can use the names() function.
steph <- c(Steph = "forename", Locke = "surname")
names(steph)
##  "Steph" "Locke"
When we perform calculations on two vectors, R will try to perform the operation for each set of elements. This is an element-wise or pair-wise calculation methodology.
In SQL, it’s equivalent to where you might write colA*colB and you’ll get the answer calculated for every row in the table. In Excel, it’s equivalent to a Fill Down of multiplying two values on the same row.
Let’s look at how this works in practice in R.
We have two vectors, each containing two elements.
vecA <- 1:2
vecB <- 2:3
##  1 2
##  2 3
If we want to multiply the two vectors by each other, R will match each element in the first vector with its counterpart in the second and multiply the two values together to make a new element.
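The element-wise multiplication described above can be sketched like this, reusing the vecA and vecB values from the text. The name product is made up for illustration.

```r
vecA <- 1:2
vecB <- 2:3

# First elements are multiplied together, then second elements
product <- vecA * vecB
product

# The same result written out pair by pair
c(1 * 2, 2 * 3)
```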
R can also stretch a shorter vector so that it matches the length of a longer one. This is known as recycling, and it lets the same calculation work for mismatched vector sizes. The rule is that the longer length should be a whole multiple of the shorter one; if it isn’t, R still recycles but raises a warning.
1:10 * 2
1:10 * 2:3
1:10 * 2:4
##  2 4 6 8 10 12 14 16 18 20
##  2 6 6 12 10 18 14 24 18 30
##  "longer object length is not a multiple of shorter object length"
Vector recycling is useful and dangerous – it can help you make elegant code or give you unexpected results. Especially when starting out, I recommend you make your vectors either the same length or length 1.
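If you want R to enforce that recommendation for you, one option is a small guard function. This is only a sketch: safe_multiply is a hypothetical helper of my own, not something built into R, and it simply refuses to run unless the lengths match or one of them is 1.

```r
# Hypothetical helper: allow equal lengths or a length-1 vector only
safe_multiply <- function(x, y) {
  same_length <- length(x) == length(y)
  one_is_single <- length(x) == 1 || length(y) == 1
  stopifnot(same_length || one_is_single)
  x * y
}

# Fine: one vector has length 1
doubled <- safe_multiply(1:10, 2)

# Fine: both vectors have length 2
paired <- safe_multiply(1:2, 2:3)

# Refused: lengths 10 and 2 would otherwise be recycled silently
blocked <- tryCatch(
  safe_multiply(1:10, 2:3),
  error = function(e) "blocked"
)
blocked
```

The trade-off is that you give up the convenient cases of recycling in exchange for never being surprised by it.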
The logical operators we covered earlier work in a pairwise fashion. They’ll return a vector of the same length as the longest one used in your logical statement.
##  FALSE TRUE
##  TRUE TRUE
Making logical statements returns vectors with a logical datatype.
##  FALSE TRUE
##  TRUE TRUE
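Here's a sketch of pairwise logical statements using the vecA and vecB vectors from earlier. The specific comparisons are my assumption, chosen because they produce results like those shown above. The names both and either are made up for illustration.

```r
vecA <- 1:2
vecB <- 2:3

# Each comparison is made element by element
vecA >= 2   # FALSE TRUE
vecB > 1    # TRUE TRUE

# Single-character & and | combine the results pair by pair
both <- (vecA >= 2) & (vecB > 1)
either <- (vecA >= 2) | (vecB > 1)
both
either

# The result is a logical vector
class(both)
```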
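Occasionally, you expect to only be operating on a single pair of values and want to enforce that R only does the check once. In R, this is the job of the double-character AND (&&) and OR (||) operators. They are sometimes loosely called bitwise operators, but they are really scalar logical operators: they expect single values and stop evaluating as soon as the answer is known.
In versions of R before 4.3.0, these operators only did the check for the first elements in the vectors and ignored all the others, which is the behaviour shown here.
##  FALSE
##  TRUE
From R 4.3.0 onwards, giving && or || a vector longer than 1 is an error, so reduce your vectors to single values first.
Use these operators with care!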
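Because R 4.3.0 and later raise an error if && or || is given a vector longer than 1, this sketch picks out the first elements explicitly with [1] before comparing. The square-bracket indexing and the all() function aren't covered in the text above, and the variable names are made up for illustration.

```r
vecA <- 1:2
vecB <- 2:3

# Compare only the first pair of values
first_and <- (vecA[1] >= 2) && (vecB[1] > 1)
first_or <- (vecA[1] >= 2) || (vecB[1] > 1)
first_and
first_or

# If you want one answer for the whole vector, say so explicitly
every_pair_passes <- all((vecA >= 2) & (vecB > 1))
every_pair_passes
```

Being explicit about which elements you mean works the same way in old and new versions of R.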
Well done everyone - You’re half way through the course! Make yourself a brew and find a few R Blogs to read that interest you and are relevant to your intended eventual use of R. Maybe even reward yourself with a couple of biscuits.