Getting started with data science – recommended resources
An oft-asked question is which resources I recommend for getting started with data science. Here are my recommendations, and if you have others, please put them in the comments!
NB: Links in this post may be affiliate links. This doesn’t change the price you pay, but it might earn me a little money.
Data Science for Business
Data Science for Business is a great overview book.
It’s not very technical, and it makes the concepts very accessible. I found the sections describing how loss functions work particularly enlightening compared to other explanations I’ve seen.
Most importantly, it dwells on the process and the business implications of data science. This is a book I’m happy to recommend to managers, technology people, and recent graduates alike.
I read the book in the Kindle app on my phone, usually when I was travelling. It remained readable and intelligible in the early hours and late at night – a compelling endorsement in an area where the books can be dense and make one’s brain bleed out of one’s ears.
R for Data Science
R for Data Science: Import, Tidy, Transform, Visualize, and Model Data is written by Hadley Wickham and Garrett Grolemund. You can buy it, and you can also access it online.
If you’re interested in learning to actually start doing data science as a practitioner, this book is a very accessible introduction to programming.
The book starts gently and doesn’t teach you much about R from a general programming perspective. Instead, it takes a very task-oriented approach and teaches you R as you go along.
This book doesn’t cover the breadth and depth of data science in R, but it gives you a strong foundation in the coding skills you need and gives you a sense of the process you’ll go through.
I really like this book but it’s important to note you may have some gaps in your knowledge if this is your main introduction to R programming.
Introductions to R
There are many useful introductions to R. The difficulty is that few of them focus on modern R.
I struggle with the ongoing debate about whether to give people a strong understanding of base R (or vanilla R) or to teach them modern R (e.g. data.table and the tidyverse), which I perceive to be easier.
- Base R is quirky and quite difficult to learn. Modern R is much easier.
- Base R knowledge ensures you understand the core objects and programming constructs. Modern R focusses on tabular data, which is what most people need.
- Base R is stable. Modern R is bleeding edge.
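To make the contrast concrete, here is a minimal sketch of the same small task written both ways: average fuel economy by cylinder count for the heavier cars. The dataset (R’s built-in mtcars), the weight threshold, and the variable names are illustrative choices of mine, and the modern version assumes the dplyr package from the tidyverse is installed.

```r
# Base R: subset rows with bracket indexing, then aggregate with a formula.
heavy_cars <- mtcars[mtcars$wt > 3, ]
base_result <- aggregate(mpg ~ cyl, data = heavy_cars, FUN = mean)

# Modern R (dplyr): the same steps as a pipeline of verbs.
library(dplyr)
tidy_result <- mtcars %>%
  filter(wt > 3) %>%           # keep cars heavier than 3 (thousand lbs)
  group_by(cyl) %>%            # one group per cylinder count
  summarise(mpg = mean(mpg))   # average mpg within each group

print(base_result)
print(tidy_result)
```

Both versions give the same numbers. The base R version shows you how data frames and indexing really work, while the dplyr version reads closer to a description of the task.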
There are very good base R introductory books but we’re still relatively lacking in ones that incorporate a lot of modern R.
- An Introduction to R
- R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics
- Teach Yourself R in 24 Hours (modern R-ish)
User groups are a great way to meet people in the field and see some talks in the area. For the most part, I recommend you go to meetup.com and look for data science, Python, or R meetups near you.
I love user groups and I think they’re a great way to build a local network of people you can chat to about your start in data science.
There are a reasonable number of people out there offering introductions to R and data science. I myself offer training, from bespoke to community workshops to training for BI people. You can often search Eventbrite or ask on Twitter to find an event happening near you. Your local user group is another great place to ask.
You can also check out a lot of conferences. KDNuggets keep a good list of conferences that you can attend.
As data science becomes more popular, a lot of data science content appears at conferences that aren’t dedicated to it, especially data platform and analysis conferences. The conferences you already know about might therefore be a good place to get started with data science.
I love reading blogs as a way of learning in an area – not only do you get technical knowledge, but people are kind enough to share theory and current trends too. I read a variety that might not suit everyone, but if you’re just starting out, I recommend Becoming a Data Scientist and R-bloggers.
Getting hands-on is an important aspect of learning for me.
For a gentler way of learning, you can use Microsoft’s free notebook and machine learning platforms to start coding.
By far my most recommended site for learning R and data science is DataCamp. DataCamp blends videos and online exercises, making it a great way to learn practically but still get the theory.
There are a number of free introductory courses, but beyond those DataCamp works on a subscription model costing $29 per month. For the monthly fee, you can take any and all of their courses.
DataCamp is great value for money, especially if you want to do an intensive month of learning and then end your subscription.
Coursera is the original online course provider.
They’ve got some fantastic courses for getting started with data science.
With online courses like these, it’s easy to drop off and not finish a course. I recommend you start with just a course or two before you go for something like the $20k Masters in Data Science that you could work towards on there!