Now that you know how to create graphics and visualizations in R, you are armed with powerful tools for scientific computing and analysis. With this power also comes great responsibility. Effective visualization is an incredibly important aspect of scientific research and communication, and several books (see references) have been written about its principles. In class today we will go through several case studies to develop some expertise in making effective visualizations.
The worksheet questions for today are embedded into the class notes.
You can download this Rmd file here
Note: there will be very little coding in class today, but I've given you plenty of exercises in a supplemental worksheet (linked at the bottom of this page) to practice with after class.
Fundamentals of Data Visualization by Claus Wilke.
Visualization Analysis and Design by Tamara Munzner.
STAT545.com - Effective Graphics by Jenny Bryan.
ggplot2 book by Hadley Wickham.
Callingbull.org by Carl T. Bergstrom and Jevin West.
Write some notes here about what “effective visualizations” means to you. Think of elements of good graphics and plots that you have seen - what makes them good or bad? Write 3-5 points.
Case Study #1 Before
Question: Evaluate the strength of the claim based on the data: “German workers are more motivated and work more hours than workers in other EU nations.”
Very strong, strong, weak, very weak, do not know
Very weak: working more hours does not necessarily mean being more motivated to work. Additionally, the plot does not group the EU nations together (it only shows the EU-28 average). In 2014 (the year of the plot's source), the UK was still part of the EU. German workers also do not, in fact, work many more hours than workers elsewhere. There are no error bars (standard deviation or standard error), and the number of people represented in the data is not shown.
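The credit for this case study points to misleading axes, and the answer above also notes the missing error bars. As a minimal sketch of both ideas, the code below uses the built-in `mtcars` data purely as a stand-in (it is not the working-hours data from the chart): the same bar chart is drawn once with a y-axis starting at zero and once zoomed so the bars no longer start at zero.

```r
library(ggplot2)
library(dplyr)

# Stand-in data (NOT the working-hours data): mean mpg and its standard
# error for each number of cylinders in the built-in mtcars data set
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg),
            se = sd(mpg) / sqrt(n()))

# Bars start at zero, so bar heights are proportional to the values;
# error bars show +/- 1 standard error
full_axis <- ggplot(mpg_by_cyl, aes(factor(cyl), mean_mpg)) +
  geom_col(fill = "grey60") +
  geom_errorbar(aes(ymin = mean_mpg - se, ymax = mean_mpg + se),
                width = 0.2) +
  labs(x = "Number of cylinders", y = "Mean miles per gallon")

# Same chart zoomed to 14-30 mpg: the bars no longer start at zero,
# so the differences between groups look far larger than they are
truncated_axis <- full_axis + coord_cartesian(ylim = c(14, 30))

full_axis
truncated_axis
```

The zoom uses `coord_cartesian()` rather than scale limits on purpose: scale limits would drop any bar whose base (zero) falls outside the range, while `coord_cartesian()` only changes the visible window. That is exactly how a truncated bar chart can look plausible while exaggerating differences.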
Effective Visualization:
Case Study #1 After
Credit: https://callingbull.org/tools/tools_misleading_axes.html
Case Study #2 Before
Question: For the years this temperature data is displayed, is there an appreciable increase in temperature?
Yes, No, Do not know
No, there is no appreciable increase in temperature. The line barely fluctuates around 57 degrees Fahrenheit. There is a small increase at the end of the graph, but without proper labels it cannot be judged appreciable.
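The issue in Case Study #2 is the y-axis range: when the axis spans far more than the data, real changes look flat. A minimal sketch, using R's built-in `nhtemp` series (average yearly temperatures in New Haven, 1912 to 1971, in °F) as stand-in data rather than the data from the chart:

```r
library(ggplot2)

# Built-in stand-in series: average yearly temperature in New Haven (°F)
nh <- data.frame(year = as.numeric(time(nhtemp)),
                 temp = as.numeric(nhtemp))

temp_plot <- ggplot(nh, aes(year, temp)) +
  geom_line() +
  labs(x = "Year", y = "Average yearly temperature (°F)")

# Axis stretched from 0 to 110 °F: the series looks almost perfectly flat
stretched <- temp_plot + scale_y_continuous(limits = c(0, 110))

# Default axis fitted to the data, plus a linear trend line:
# the year-to-year variation and the trend become visible
zoomed <- temp_plot + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

stretched
zoomed
```

Unlike a bar chart, a line chart of temperature does not need to include zero: 0 °F is not a meaningful baseline, so an axis fitted to the data is usually the more honest choice.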
Effective Visualization:
Credit: https://callingbull.org/tools/tools_misleading_axes.html
Case Study #3
Question: Evaluate the strength of the claim based on the data: “Soon after this legislation was passed, gun deaths sharply declined.”
Very strong, strong, weak, very weak, do not know
Weak: the claim relies on vague qualitative words (e.g. “soon”, “sharply declined”). The red background gives the plot a negative connotation. The y-axis is inverted: 0 is at the top of the graph instead of the bottom, so the graph actually shows an increase in gun deaths.
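An inverted axis like the one in Case Study #3 takes a single extra layer in ggplot2, `scale_y_reverse()`. The sketch below uses the built-in `Nile` series (annual flow at Aswan, 1871 to 1970) as stand-in data, and fills the area from zero so that, as in the chart above, the filled region hangs down from the top.

```r
library(ggplot2)

# Built-in stand-in series: annual flow of the Nile at Aswan, 1871-1970
nile <- data.frame(year = as.numeric(time(Nile)),
                   flow = as.numeric(Nile))

# Conventional orientation: larger values are drawn higher
upright <- ggplot(nile, aes(year, flow)) +
  geom_area(fill = "darkred") +
  labs(x = "Year", y = "Annual flow (10^8 m^3)")

# Reversed axis: 0 is at the top and the area hangs down from it,
# so a drop in flow looks like a rise and vice versa
inverted <- upright + scale_y_reverse()

upright
inverted
```

When reading someone else's chart, check which end of the axis is zero before interpreting the direction of a trend.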
Main takeaway: plots can mislead, even unintentionally.
Credit: https://callingbull.org/tools/tools_misleading_axes.html
Great resource for selecting the right plot: https://www.data-to-viz.com/ ; I encourage you all to consult it when choosing how to visualize your data.
Case Study #1
Case Study #2
We will fill in these principles together as a class.
Instructions: Here is a code chunk that shows an effective visualization. First, copy this code chunk into a new cell. Then, modify it to purposely make this chart “bad” by breaking the principles of effective visualization above. Your final chart still needs to run/compile and it should still produce a plot.
Effective Visualization:
library("plotly")
library("tidyverse")
# One box per month, with the raw daily values jittered on top
ggplot(airquality, aes(`Month`, `Temp`, group = `Month`)) +
  geom_boxplot(outlier.shape = NA) +   # hide outliers; jitter shows every point
  geom_jitter(alpha = 0.3) +
  labs(x = "",
       y = "",
       title = "Maximum temperature by month") +
  theme_bw() +
  # label month numbers 5-9 with month names
  scale_x_continuous(breaks = c(5, 6, 7, 8, 9),
                     labels = c("May", "June", "July", "August", "September")) +
  # unit label placed just left of the panel instead of a y-axis title
  annotate("text", x = 4.08, y = 95, label = "°F", size = 8) +
  # clip = 'off' lets the annotation draw outside the plotting area
  coord_cartesian(xlim = c(4.5, 9.5),
                  clip = 'off') +
  theme(panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.line = element_line(colour = "gray"),
        panel.border = element_blank(),
        text = element_text(size = 18))
Ineffective Visualization:
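library("plotly")
library("tidyverse")
ggplot(airquality, aes(`Month`, `Temp`, group = `Month`)) +
  geom_line(colour = "yellow") +
  geom_point(colour = "white", fill = "white", alpha = 0.1) +
  labs(x = "",
       y = "",
       title = "") +
  # log scale with the limits set inside it; adding ylim() separately
  # would replace scale_y_log10() and silently drop the log transform
  scale_y_log10(limits = c(70, 200))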
How many of the principles did you manage to break?
fill and colour into white
alpha to 0.1
log10
geom_boxplot() to geom_line()
geom_jitter()
fill and colour into white for geom_point()
colour to yellow for geom_line()
alpha to 0.1 for both geom_point() and geom_line()
Did you know that you can make interactive graphs and plots in R using the plotly library? We will show you a demo of what plotly is and why it's useful, and then you can try converting a static ggplot graph into an interactive plotly graph. (See Chart Studio)
This is a preview of what we’ll be doing in STAT 547 - making dynamic and interactive dashboards using R!
install.packages("plotly")
library(tidyverse)
library(gapminder)
library(plotly)
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point()
# make interactive
p %>%
  ggplotly()
# plot_ly syntax
p <- gapminder %>%
  plot_ly(x = ~gdpPercap,
          y = ~lifeExp,
          color = ~continent,
          type = "scatter",
          mode = "markers")
# Sys.setenv("plotly_username" = "diana.lin")
# Sys.setenv("plotly_api_key" = "API_KEY_REDACTED")
# api_create(p, filename = "cm013-plotly-example")
# URL: https://plot.ly/~diana.lin/1/
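The commented-out lines above upload the plot to Chart Studio with `api_create()`, which needs an account and an API key. If you only want a file you can share or open in a browser, one alternative (my suggestion, not part of the original demo) is `htmlwidgets::saveWidget()`, since plotly objects are htmlwidgets. A minimal sketch, assuming the `htmlwidgets` package is installed; the output file name is illustrative:

```r
library(plotly)
library(gapminder)
library(htmlwidgets)

# Rebuild the interactive scatter plot from the plot_ly() example
p <- plot_ly(gapminder,
             x = ~gdpPercap, y = ~lifeExp, color = ~continent,
             type = "scatter", mode = "markers")

# Save it as a local HTML page instead of uploading it; with
# selfcontained = FALSE the JavaScript dependencies go in a folder
# next to the HTML file
out_file <- file.path(tempdir(), "cm013-plotly-example.html")
saveWidget(p, out_file, selfcontained = FALSE)
```

Setting `selfcontained = TRUE` bundles everything into a single HTML file; depending on your htmlwidgets version, that option may require pandoc to be installed.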
You are highly encouraged to complete the cm013 supplemental exercises worksheet. It is a great guide that takes you through Scales, Colours, and Themes in ggplot. There is also a short guided activity showing you how to make a ggplot interactive using plotly.
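As a small extension of converting a ggplot with `ggplotly()`, you can control what appears when hovering over a point. A minimal sketch, assuming you want each point in the gapminder scatter plot to show its country name: map `country` to the `text` aesthetic (ggplot2 does not draw it) and pass `tooltip = "text"` to `ggplotly()`.

```r
library(ggplot2)
library(gapminder)
library(plotly)

# Map country names to the `text` aesthetic; ggplot2 does not draw it,
# but ggplotly() can use it for the hover tooltip
p <- ggplot(gapminder,
            aes(x = gdpPercap, y = lifeExp, colour = continent,
                text = country)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  labs(x = "GDP per capita (log scale)", y = "Life expectancy (years)")

# Show only the country name in the hover tooltips
p_interactive <- ggplotly(p, tooltip = "text")
p_interactive
```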