If you know me, you know I like graphs. (I’m pretty one-dimensional, I know.)
I have graphed in dozens of ways, and my preferred method for going on two years now has been R’s standout graphics package, ggplot2. I could go on for hours about why ggplot2 is a great way to make data visuals, but I’ll spare readers familiar with ggplot the repetition and readers unfamiliar with ggplot the disgrace of describing graphics in words instead of pictures.
The only background you need to know is that ggplot2 graphics are incredibly customizable. You can display the same data in many ways, you can color your plots however you like, et cetera. Let’s dig into what goes into a custom graphic.
R users who load the ggplot2 package will also load some datasets we can use for graphing. We’ll focus on the diamonds dataset, which contains information on 53,940 individual diamonds.
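For readers who want to check the dataset before plotting it, here is a minimal sketch using only base R inspection functions. It assumes ggplot2 is already installed; nothing else is required.

```r
library(ggplot2)  # loading the package also makes the diamonds data available

# Confirm the size of the dataset: one row per diamond
nrow(diamonds)

# List the columns and their types; carat, cut, and price are the ones used in this post
str(diamonds)

# Peek at the first few rows
head(diamonds)
```

The three columns that matter for the plots in this post are carat, price, and cut.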
Here’s what a ggplot graphic can look like without much customization.
library(ggplot2)  # also provides the diamonds dataset
ggplot(data = diamonds) +
  geom_point(aes(x = carat, y = price))
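We wrote two slim lines of code, but we generated a full graphic in return. In particular, ggplot2 drew a point for every diamond, labeled both axes with the variable names, and chose the axis ranges, tick marks, and gridlines for us.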
This is my first argument for why ggplot is so cool. We get a lot for very little.
What if we wanted to spruce this up a bit?
The labs function gives us a very easy interface for labeling our graphics.
ggplot(data = diamonds) +
  geom_point(aes(x = carat, y = price)) +
  labs(title = 'Title',
       subtitle = 'This is my subtitle. I want to fit some lines here.',
       caption = 'caption')
Here, we’ve added three more bits of text outside of our plot: a title, a subtitle, and a caption.
Being able to spruce up graphics relatively painlessly is a huge win for ggplot2.
In my mind, ggplot2 really shines when we want to do some more complicated visualizations. Say, for example, we wanted to start with our plots from before, but color the diamonds in the diamonds dataset by the cut of the diamond. ggplot2 makes this much easier than, say, Microsoft Excel:
ggplot(data = diamonds) +
  geom_point(aes(x = carat, y = price, color = cut)) +
  labs(title = 'Title',
       subtitle = 'This is my subtitle. I want to fit some lines here.',
       caption = 'caption')
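What happened here? Well, mapping color to cut inside aes told ggplot2 to color each point by its cut, and ggplot2 picked the colors and added a matching legend on its own.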
Let’s use a few more tricks to dress this plot up just a bit more.
ggplot(data = diamonds) +
  geom_point(aes(x = carat, y = price, color = cut), alpha = .4) +
  labs(title = 'The Price of Diamonds by Carat and Cut',
       subtitle = 'In general, worse-cut diamonds are cheaper',
       caption = 'Data from `diamonds` dataset in ggplot2',
       color = 'Cut', x = 'Carat', y = 'Price')
The only new argument to geom_point here is alpha, which lets us make our points more transparent so that we can see through the clusters better. We also passed color, x, and y to labs to relabel the legend and the axes.
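Notice that alpha sits outside aes while color sits inside it. A minimal sketch of that distinction follows: arguments inside aes are mapped to a column of the data, while arguments outside it are set to one fixed value for every point. The object names mapped and fixed, and the color 'steelblue', are illustrative choices of mine and not part of the plots above.

```r
library(ggplot2)

# Mapped: color depends on the data, so it goes inside aes() and gets a legend
mapped <- ggplot(data = diamonds) +
  geom_point(aes(x = carat, y = price, color = cut), alpha = .4)

# Set: every point gets the same fixed color and transparency, so no legend is drawn
fixed <- ggplot(data = diamonds) +
  geom_point(aes(x = carat, y = price), color = 'steelblue', alpha = .4)

# Printing a ggplot object draws it
mapped
fixed
```

A common slip is writing color = 'steelblue' inside aes, which treats the string as a one-level data variable rather than as a color.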
I wrote this post because I wanted to spell out in writing why I liked the philosophy of ggplot2. While that is still true, I do not like its defaults. I think the default theme (grey background, white gridlines), the default color palette, and the default labels (small Arial font on Windows; I believe Helvetica on Mac) are stylistically wrong. Wonderfully, these are all easy to fix using ggplot2’s expansive theme function. I’ll describe that more in a later post.
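As a rough preview of the kind of change involved, here is a minimal sketch that touches the background, the palette, and the text size. The specific choices (theme_minimal, the 'Dark2' Brewer palette, a base size of 13, a bottom legend) and the object name styled are illustrative assumptions, not a recommended style.

```r
library(ggplot2)

styled <- ggplot(data = diamonds) +
  geom_point(aes(x = carat, y = price, color = cut), alpha = .4) +
  labs(title = 'The Price of Diamonds by Carat and Cut',
       color = 'Cut', x = 'Carat', y = 'Price') +
  # Replace the default color palette with a ColorBrewer one
  scale_color_brewer(palette = 'Dark2') +
  # Drop the grey panel background and enlarge all text
  theme_minimal(base_size = 13) +
  # Fine-tune individual elements with theme()
  theme(plot.title = element_text(face = 'bold'),
        legend.position = 'bottom')

# Printing the object draws the plot
styled
```

The call to theme comes after theme_minimal on purpose: a complete theme resets every element, so individual tweaks need to be added after it to survive.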