Data types
In this installment of the Learn R series, we’re going to start to have a look at data types, dataframes and what on earth you do with them!
Have a watch of this short video and then consolidate everything we cover by reading through the post and playing along. You can take the code blocks included in the post and try them out in your own script - you should be all set up with R and RStudio, so there’s no time like the present!
R data types
When we think of different bits of data, some of it might be numbers, text, dates, and more. R has its own set of these data types. Before we get into the data types, let’s see how we can get R to tell us what something is.
R uses functions (basically, a built-in bit of code that you can call on to do things, optionally passing it some data to work with) to take some inputs and produce an output. The function that takes a value and returns its data type as the output is the class() function.
Have a look at these three examples - what classes do you get? Can you see why they are different?
class(1)
class(1.1)
class("1")
You can use this class() function if you’re ever unsure what data type something is. This is great for when you’re getting unexpected results!
Numbers
Numbers are split into a few different types:
- integers are whole numbers like 1 or 42[1]
- numerics are numbers that have a decimal portion associated with them like 1.0 or 3.133[2]
- complex numbers are numbers that make use of the imaginary number i like 4i[3]
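To see the three number types side by side, here is a small sketch. The variable names are made up for illustration, and the `L` suffix is how you ask R to store a whole number as an integer rather than a numeric.

```r
# The L suffix asks R to store a whole number as an integer.
whole <- 42L
decimal <- 3.133
imaginary <- 4i

class(whole)      # "integer"
class(decimal)    # "numeric"
class(imaginary)  # "complex"
```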
Converting to numbers
The functions as.numeric() and as.integer() allow you to convert something stored as text into a number.
These functions will give you some red text as a warning if you attempt to convert something that can’t be safely converted to a number. They will still attempt to perform the conversion, but return missings (NA) instead of actual values.
- Which of the three datatypes above would you apply this to? Add the code as a new line to your script and see if you are right.
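As a rough sketch of what the conversion looks like in practice (the variable names and the "abc" value are illustrative only):

```r
# Text that looks like a number converts cleanly.
from_text <- as.numeric("1")
from_text                 # 1
as.integer("42")          # 42

# "abc" cannot be read as a number, so R gives a warning
# and returns NA instead of a value.
not_a_number <- as.numeric("abc")
is.na(not_a_number)       # TRUE
```

The warning does not stop your script, so it is worth checking for NA values after converting.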
Checking numbers
Now here’s something we didn’t cover in the video and is especially helpful if something just WILL NOT work and you’ve spent all morning panic eating biscuits.
You can write checks to see if something is numeric, or an integer, with is.numeric() or is.integer().
The general is.XXXXX() family of functions covers many of the data types we look at here and more, and can be a real time/life saver.
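Here is a short sketch of those checks in action. Each call returns TRUE or FALSE, which makes them handy for tracking down a value that is not the type you expected.

```r
is.numeric(1)      # TRUE
is.integer(1)      # FALSE: 1 is stored as a numeric by default
is.integer(1L)     # TRUE
is.numeric(1L)     # TRUE: integers count as numeric too
is.numeric("1")    # FALSE: this is text
```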
We could also use class() here and inspect the result. (You might recall that class(1) had the result of “numeric”: by default, R does not treat 1 as an integer for the purposes of the class() function.)
Special numbers
As well as i to denote imaginary numbers, there are some additional symbols you might encounter or want to use.
- pi = 3.1415927
- Inf represents positive infinity. You’ll often see this if you divide a positive number by zero
- -Inf represents negative infinity. You’ll often see this if you divide a negative number by zero
- NaN is what happens when you really screw up a calculation and do something like 0/0. It means the result is not a number!
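A quick sketch of how these special values turn up, with illustrative variable names. is.infinite() and is.nan() are base R checks in the same spirit as the other is.* functions.

```r
positive <- 1 / 0     # Inf
negative <- -1 / 0    # -Inf
undefined <- 0 / 0    # NaN

is.infinite(positive)   # TRUE
is.infinite(negative)   # TRUE
is.nan(undefined)       # TRUE
```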
Text
Text, also known as strings, is split up into two core types:
- characters are text as we typically think of it like “red”
- factors (and the subtype ordered factors) are a text type where the number of unique values is constrained e.g. the values are selected from a dropdown. Factors work by storing numbers that correspond to our values and then printing these values. This is much more space efficient when the number of unique values is low.[4]
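To make the "numbers behind the scenes" idea concrete, here is a minimal sketch using a made-up vector of colours. By default R sorts the unique values alphabetically to create the levels, then stores each value as its position in that list.

```r
colours <- factor(c("red", "blue", "red", "green"))

class(colours)        # "factor"
levels(colours)       # "blue" "green" "red"
as.integer(colours)   # 3 1 3 2
```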
We are going to focus on characters. This translates much better in written form than in video - so this is another new thing.
In R, you can’t just type some text as it will be construed as an object or function name. To delimit a string you can use speech marks (") or apostrophes (') at the beginning and end of it to show where it starts and ends. These are the text delimiters in R.
Note you can’t open a string with one delimiter and close it with the other e.g. "red', but you can use them together to let you have speech marks or apostrophes inside a string e.g. 'They said "Read this"' or "It's mine now".
If you need to have both inside a string you can escape the ones on the inside of a string to say they don’t count as text delimiters. To escape a delimiter you can use a backslash (\) e.g. "They said \"Read this\"".
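Here is a small sketch of the quoting options, with illustrative variable names. cat() is used because it prints the text itself rather than R's escaped representation of it.

```r
single_outside <- 'They said "Read this"'
double_outside <- "It's mine now"
escaped <- "They said \"Read this\""

# cat() prints the text itself, without the escaping backslashes.
cat(escaped, "\n")

# Both ways of writing it produce exactly the same string.
identical(single_outside, escaped)   # TRUE
```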
Beware the copy and paste here - sometimes this can really mess your code up, and if it’s not working for no obvious reason, try tidying up all your speech marks and apostrophes.
Converting to strings
Converting to characters and factors works the same way as converting to numbers: swap “numeric” for “character” or “factor” in the function name and you’re done!
Add another line to your script and try it out.
Similarly to checking numbers, the same rule applies to checking strings - I’m not going to give it away - have a go!
Logical values
Whilst we’ve been testing our datatypes, we’ve created a lot of logical or boolean values. Boolean values are TRUE and FALSE. R is case-sensitive so these have to be typed in upper-case, otherwise R treats them as the names of objects instead.
I bet you can totally guess how to convert and check logicals. Add it to your script!
We quickly touched on dates in the video, but refer to this below for a deeper explanation of date handling in R.
Dates
Dates are one of the hardest parts of programming! This is a very brief introduction to dates so if you want more (and there’s lots) - get searching!
Dates in R split into:
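- dates, which do not have any time component
- POSIX date-times, which come in two flavours:
  - POSIXct is a number based storage method (the seconds since the start of 1970)
  - POSIXlt is a component based storage method (year, month, day, hour and so on are stored separately)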
You might be looking at the two POSIX times and thinking to yourself “ZOMG how am I meant to choose?”. Most people use the POSIXct format[5], which is the default for many of R’s functions.
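The difference between the two storage methods is easier to see than to describe. This sketch uses an arbitrary example date-time and sets the time zone explicitly so the result does not depend on your device.

```r
moment_ct <- as.POSIXct("2017-12-31 10:30", tz = "GMT")
moment_lt <- as.POSIXlt("2017-12-31 10:30", tz = "GMT")

# POSIXct: one number, the seconds since 1970-01-01 00:00 UTC.
unclass(moment_ct)

# POSIXlt: named components you can pull out individually.
moment_lt$hour          # 10
moment_lt$min           # 30
moment_lt$year + 1900   # 2017 (years are counted from 1900)
```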
Converting to dates
You can convert to dates and date-times with as.Date(), as.POSIXct(), and as.POSIXlt(). Ideally, you’ll provide a string with the date(time) in an ISO 8601 style format e.g. “YYYY-MM-DD hh:mm”.
- Use your birthday and turn it into a date in your script. You guessed it: use that as. function again. (Hint - Date needs a capital D!) If you don’t know the specific minute of your arrival onto earth, you can just leave hh:mm blank and R will still understand!
Note that it’s assuming a time zone based on my device as I’ve not provided a default. It’s prudent to set the time zone in order to avoid the results of your code changing based on where the code is run or when[6].
- Add a timezone to your birthday by including tz = "GMT" inside the brackets. The time zone name is text, so it needs its speech marks.
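As a sketch of both exercises, using an example date and an invented time of day (swap in your own):

```r
# A date on its own, with no time component.
birthday <- as.Date("1991-12-12")
class(birthday)   # "Date"

# A date-time with an explicit time zone; 08:15 is just an example time.
birthday_time <- as.POSIXct("1991-12-12 08:15", tz = "GMT")
format(birthday_time, "%Y-%m-%d %H:%M")   # "1991-12-12 08:15"
attr(birthday_time, "tzone")              # "GMT"
```

Setting tz explicitly means the stored moment is the same wherever the script is run.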
Checking dates
Unfortunately, R does not provide is.* functions for checking whether something is a date or date-time type without extending its functionality. We have to use class() as a consequence.
class(as.Date("2017-12-31"))
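If you want a TRUE/FALSE answer rather than reading the output of class(), one option is base R's inherits(), which tests whether an object has a given class. The variable names below are illustrative.

```r
a_date <- as.Date("2017-12-31")
a_moment <- as.POSIXct("2017-12-31 10:30", tz = "GMT")

inherits(a_date, "Date")        # TRUE
inherits(a_moment, "POSIXct")   # TRUE
inherits(a_moment, "Date")      # FALSE

# Date-times carry two classes, which is why class() returns two values.
class(a_moment)                 # "POSIXct" "POSIXt"
```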
Missings
Every data type has an NA, an identifier for a missing value.
If you use an NA in an object it will take on the data type used in the object. You can, however, make NAs directly.
NA
NA_integer_
NA_character_
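To see an NA taking on the type of the object it sits in, here is a minimal sketch with made-up vectors. c() combines values into a single object.

```r
numbers_with_gap <- c(1.5, NA)
text_with_gap <- c("a", NA)
logicals_with_gap <- c(TRUE, NA)

class(numbers_with_gap)    # "numeric"
class(text_with_gap)       # "character"
class(logicals_with_gap)   # "logical"
```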
Checking NAs
You can check what data type an NA is, using the class() function. Add this to your script now.
Want to check if something is NA? You know how to check things by now! Add another line to your script, just to be sure.
Summary
There are a few more datatypes out in the wild but numbers, strings, booleans, and dates are the core types you’ll encounter.
There are normally as.* and is.* functions for converting to a datatype or checking if something is a given datatype. You can use class() to uncover the datatype too.
Just in case you want to play around with the code used in the video, I’ve included the code below.
Happy testing!
Ellen :)
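# Copy the built-in PlantGrowth dataset into your workspace
PlantGrowth <- PlantGrowth
View(PlantGrowth)      # opens the data viewer in RStudio
summary(PlantGrowth)
# Convert the weight column to text and check its class
PlantGrowth$weight <- as.character(PlantGrowth$weight)
class(PlantGrowth$weight)
# Take a copy and convert the weight column back to numbers
PlantGrowth1 <- PlantGrowth
summary(PlantGrowth1)
PlantGrowth1$weight <- as.numeric(PlantGrowth1$weight)
class(PlantGrowth1$weight)
# A date stored as text, then converted to a real date
my.birthday <- '1991-12-12' # Note the quote marks
class(my.birthday)
birthday.date <- as.Date(my.birthday)
class(birthday.date)
# Inspect the structure and summary of the data frame
str(PlantGrowth)
summary(PlantGrowth)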