Effective Visualizations

Now that you know how to create graphics and visualizations in R, you are armed with powerful tools for scientific computing and analysis. With this power also comes great responsibility. Effective visualization is an incredibly important aspect of scientific research and communication. Several books (see Resources) have been written about these principles. In class today we will go through several case studies to develop some expertise in making effective visualizations.

Worksheet

The worksheet questions for today are embedded into the class notes.

You can download this Rmd file here

Note, there will be very little coding in-class today, but I’ve given you plenty of exercises in the form of a supplemental worksheet (linked at the bottom of this page) to practice with after class is over.

Resources

Fundamentals of Data Visualization by Claus Wilke.
Visualization Analysis and Design by Tamara Munzner.
STAT545.com - Effective Graphics by Jenny Bryan.
ggplot2 book by Hadley Wickham.
Callingbull.org by Carl T. Bergstrom and Jevin West.

Part 1: Warm-up and pre-test [20 mins]

Warmup:

Write some notes here about what “effective visualizations” means to you. Think of elements of good graphics and plots that you have seen - what makes them good or bad? Write 3-5 points.

Informative labels (title, axes, data point labels)
Informative captions
Appropriate type of plot used
Not an overcrowded/overwhelming plot (or opposite: too sparse data)
Organized legend if applicable

CQ01: Weekly hours for full-time employees

Case Study #1 Before

Question: Evaluate the strength of the claim based on the data: “German workers are more motivated and work more hours than workers in other EU nations.”

Very strong, strong, weak, very weak, do not know

Very weak – working more hours does not necessarily mean being more motivated to work. Additionally, the plot does not group the EU nations together (it only shows the EU-28 average). In 2014 (the source year of the plot), the UK was most likely still part of the EU. German workers also do not in fact work many more hours than the others. There are no standard deviations, standard errors, or error bars, and the number of people represented in the data is not shown.
Main takeaway: The data is presented specifically to support the claim rather than objectively.
- consider where you start your axes
- if data point labels are available, grid lines are not necessary
Effective Visualization:

Case Study #1 After
Credit: https://callingbull.org/tools/tools_misleading_axes.html
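
The effect of the axis starting point can be reproduced with a minimal sketch. It does not use the data behind the case study figure; as a stand-in it assumes the built-in airquality dataset and plots mean temperature per month. The object names (monthly, p_truncated, p_zero) are illustrative only.

```r
library(ggplot2)

# Stand-in data (an assumption, not the case study data):
# mean temperature per month from the built-in airquality dataset.
monthly <- aggregate(Temp ~ Month, data = airquality, FUN = mean)
monthly$Month <- factor(monthly$Month, levels = 5:9, labels = month.abb[5:9])

base_plot <- ggplot(monthly, aes(x = Month, y = Temp)) +
    geom_col() +
    labs(x = "Month", y = "Mean temperature (degrees F)")

# Misleading: only the part of each bar above 60 is visible, so the
# visible bar height is no longer proportional to the value.
p_truncated <- base_plot +
    coord_cartesian(ylim = c(60, 90)) +
    labs(title = "Truncated axis exaggerates differences")

# Bars start at zero, so relative heights match relative values.
p_zero <- base_plot +
    coord_cartesian(ylim = c(0, 90)) +
    labs(title = "Zero baseline keeps bar heights proportional")

p_truncated
p_zero
```

Comparing the two plots side by side shows how the same numbers can suggest a large or a modest difference depending only on where the y-axis starts.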

CQ02: Average Global Temperature by year

Case Study #2 Before

Question: For the years this temperature data is displayed, is there an appreciable increase in temperature?

Yes, No, Do not know

No, there is no appreciable increase in temperature. The line barely fluctuates at around 57 degrees Fahrenheit. There is a small increase at the end of the graph, but without proper labels, it cannot be deemed appreciable.
Main takeaway: The y-axis runs from 0 to 110, which is too ‘zoomed out’ (use an appropriate y-scale)
- 2 degrees is appreciable in the plot below
Effective Visualization:

Case Study #2 After
Credit: https://callingbull.org/tools/tools_misleading_axes.html
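
A rough sketch of the same problem, assuming the built-in nhtemp series (mean annual temperature in New Haven, 1912 to 1971) as a stand-in for the global temperature data in the figure. The object names are illustrative only.

```r
library(ggplot2)

# Stand-in data (an assumption, not the case study data):
# nhtemp is a built-in time series of mean annual temperature in degrees F.
temps <- data.frame(year = as.numeric(time(nhtemp)),
                    temp = as.numeric(nhtemp))

base_plot <- ggplot(temps, aes(x = year, y = temp)) +
    geom_line() +
    labs(x = "Year", y = "Mean annual temperature (degrees F)")

# Zoomed out: a 0 to 110 axis flattens the series into a near-straight line.
p_zoomed_out <- base_plot +
    coord_cartesian(ylim = c(0, 110)) +
    labs(title = "Y-axis from 0 to 110")

# Default limits follow the range of the data, so year-to-year changes
# of a degree or two become visible.
p_data_range <- base_plot +
    labs(title = "Y-axis fitted to the data")

p_zoomed_out
p_data_range
```

Unlike a bar chart, a line chart encodes values by position rather than by bar length, so fitting the axis to the data range here does not distort the values.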

CQ03: Gun deaths in Florida

Case Study #3

Question: Evaluate the strength of the claim based on the data: “Soon after this legislation was passed, gun deaths sharply declined.”

Very strong, strong, weak, very weak, do not know

Weak: the claim uses too many vague qualitative terms (e.g. “soon”, “sharply declined”). The red background gives the plot a negative connotation. The y-axis is inverted, with 0 at the top of the graph instead of the bottom, so the graph actually shows an increase in gun deaths.
Main takeaway: Plots can mislead even when that is not the intent.
Credit: https://callingbull.org/tools/tools_misleading_axes.html

Part 2: Extracting insight from visualizations [20 mins]

Great resource for selecting the right plot: https://www.data-to-viz.com/ ; I encourage you all to consult it when choosing how to visualize data.

Case Study 1: Context matters

Case Study #1

Averaged AD/ASD prevalence and MMR coverage in UK and Scandinavian countries
Deisher et al. 2015 Issues in Law and Medicine
Correlation does not equal causation
Different axes for Autism Prevalence and MMR Coverage even though both are percentages (%)

Case Study 2: Gender Gap in the 100m Dash

Case Study #2

Linear regression to extrapolate
Credit to: https://callingbull.org/case_studies/case_study_gender_gap_running.html
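
The danger of extrapolating a linear fit can be sketched with a small example. It does not use the sprint times from the case study; it assumes the built-in cars dataset (speed and stopping distance) as a stand-in, and the names fit, new_speeds, and predictions are illustrative.

```r
# Stand-in data (an assumption, not the case study data):
# cars holds speed (mph) and stopping distance (ft) for 50 cars.
fit <- lm(dist ~ speed, data = cars)

# Observed speeds range from 4 to 25 mph. Predict once inside that
# range (15 mph) and twice far outside it (0 and 100 mph).
new_speeds <- data.frame(speed = c(15, 0, 100))
predictions <- cbind(new_speeds,
                     dist = predict(fit, newdata = new_speeds))
predictions
```

Inside the observed range the prediction is plausible, but at 0 mph the fitted line returns a negative stopping distance, which is physically impossible. A straight line that describes the observed data well says nothing reliable about values far outside it.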

Case Study 3: Case for pie charts

Do not make pie charts (universally hated)
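
One way to see the issue is to draw the same counts both ways. This is a minimal sketch assuming the mpg dataset that ships with ggplot2; class_counts, p_pie, and p_bar are illustrative names.

```r
library(ggplot2)

# Number of cars per class in ggplot2's built-in mpg data.
class_counts <- as.data.frame(table(class = mpg$class), responseName = "n")

# Pie chart: values are encoded as angles, which are hard to compare.
p_pie <- ggplot(class_counts, aes(x = "", y = n, fill = class)) +
    geom_col(width = 1) +
    coord_polar(theta = "y") +
    labs(x = NULL, y = NULL, fill = "Class", title = "Cars per class (pie)")

# Bar chart: values are encoded as lengths on a common baseline,
# and sorting the bars makes the ranking easy to read.
p_bar <- ggplot(class_counts, aes(x = reorder(class, n), y = n)) +
    geom_col() +
    coord_flip() +
    labs(x = "Class", y = "Number of cars", title = "Cars per class (bar)")

p_pie
p_bar
```

With seven categories of similar size, the ordering of the classes is much easier to read from the sorted bars than from the slices.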

Part 3: Principles of effective visualizations [20 mins]

We will be filling these principles in together as a class

Apply Principle of proportional ink
- Definition: “The amount of ink used to indicate a value should be proportional to the value itself.”
- Example: Truncating the y-axis on a bar chart to exaggerate the difference between bars violates the principle of proportional ink
Maintain a high data-to-ink ratio: less is more
- Definition: remove distracting visual elements to focus attention on the data
- Examples: Lighten line weights, remove backgrounds, never use 3D or special effects, remove unnecessary/redundant labels, etc…
Always update axes labels and titles on your plots
- In STAT545/547 we take principles of effective visualizations very seriously and you will lose marks if this isn’t followed
Choose your scale-type carefully
- Whether you choose a linear, logarithmic, or square-root scale depends on your data, context, and purpose
Choose your graph-type carefully
- Examples: here is a great directory of plots
Choose colours with accessibility and readability in mind
- Examples: here is a great set of colour schemes that are colour-blind friendly and perceptually uniform
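
The scale-type and colour principles above can be sketched together. This example assumes the msleep dataset that ships with ggplot2 (body and brain weight of mammals) purely as an illustration; mammals, p_linear, and p_log are hypothetical names, and the viridis palette is one colour-blind friendly option among several.

```r
library(ggplot2)

# Keep only rows where all three plotted variables are present.
mammals <- msleep[complete.cases(msleep[, c("bodywt", "brainwt", "vore")]), ]

# Linear scales: a few very large animals squash everything else
# into the bottom-left corner.
p_linear <- ggplot(mammals, aes(x = bodywt, y = brainwt, colour = vore)) +
    geom_point(size = 2) +
    labs(x = "Body weight (kg)", y = "Brain weight (kg)", colour = "Diet",
         title = "Brain weight vs body weight")

# Log scales spread the data out, and the discrete viridis palette is
# designed to be perceptually uniform and readable with colour blindness.
p_log <- p_linear +
    scale_x_log10() +
    scale_y_log10() +
    scale_colour_viridis_d(end = 0.9)

p_linear
p_log
```

Whether the log version is the right choice still depends on the question being asked: it makes multiplicative relationships easy to see but hides absolute differences.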

Make a great plot worse

Instructions: Here is a code chunk that shows an effective visualization. First, copy this code chunk into a new cell. Then, modify it to purposely make this chart “bad” by breaking the principles of effective visualization above. Your final chart still needs to run/compile and it should still produce a plot.

Effective Visualization:

library("plotly")
library("tidyverse")
ggplot(airquality, aes(`Month`, `Temp`, group = `Month`)) +
    geom_boxplot(outlier.shape = NA) +
    geom_jitter(alpha = 0.3) +
    labs(x = "",
         y = "",
         title="Maximum temperature by month")+
    theme_bw() + 
    scale_x_continuous(breaks=c(5,6,7,8,9),labels=c("May","June","July","August","September")) +
    annotate("text", x = 4.08, y = 95,label="°F",size=8) +
    coord_cartesian(xlim = c(4.5, 9.5),
                      clip = 'off')+
    theme(panel.grid.minor = element_blank(),
          panel.background = element_blank(), 
          axis.line = element_line(colour = "gray"),
          panel.border = element_blank(),
          text = element_text(size=18)
          )

Ineffective Visualization:

library("plotly")
library("tidyverse")
ggplot(airquality, aes(`Month`, `Temp`, group = `Month`)) +
    geom_line(colour = "yellow") +
    geom_point(colour = "white", fill = "white", alpha = 0.1) +
    labs(x = "",
         y = "",
         title="") +
    # Limits are set inside scale_y_log10(); a separate ylim() call would replace the log scale
    scale_y_log10(limits = c(70, 200))

How many of the principles did you manage to break?

Principle of Proportional Ink
- Changed y-axis minimum to 70 and maximum to 200
- Changed fill and colour to white
- Changed alpha to 0.1
Always update axes labels and titles on your plots
- Removed x-axis title and all specified labels
- Removed y-axis title
- Removed main title
Maintain a high data-to-ink ratio
- Restored the default grey background and grid lines
Choose your scale-type carefully
- Changed y-axis scale to log10
Choose your graph-type carefully
- Changed geom_boxplot() to geom_line()
Choose colours with accessibility and readability in mind
- Removed geom_jitter()
- Changed fill and colour to white for geom_point()
- Changed colour to yellow for geom_line()
- Changed alpha to 0.1 for geom_point()

Plotly demo [10 mins]

Did you know that you can make interactive graphs and plots in R using the plotly library? We will show you a demo of what plotly is and why it’s useful, and then you can try converting a static ggplot graph into an interactive plotly graph. (See Chart Studio)

This is a preview of what we’ll be doing in STAT 547 - making dynamic and interactive dashboards using R!

install.packages("plotly")

library(tidyverse)
library(gapminder)
library(plotly)

p <- ggplot(gapminder, aes(x = gdpPercap, y=lifeExp, colour = continent)) +
    geom_point()

# make interactive
p %>%
    ggplotly()

# plot_ly syntax
p <- gapminder %>%
    plot_ly(x = ~gdpPercap,
            y = ~lifeExp,
            color = ~continent,
            type = "scatter",
            mode = "markers")

# Sys.setenv("plotly_username" = "diana.lin")
# Sys.setenv("plotly_api_key" = "API_KEY_REDACTED")

# api_create(p, filename = "cm013-plotly-example")

# URL: https://plot.ly/~diana.lin/1/

Supplemental worksheet (Optional)

You are highly encouraged to complete the cm013 supplemental exercises worksheet. It is a great guide that will take you through Scales, Colours, and Themes in ggplot. There is also a short guided activity showing you how to make a ggplot interactive using plotly.

Supplemental Rmd file here

cm013 Exercises: Effective Visualizations