Effective Visualizations

Now that you know how to create graphics and visualizations in R, you are armed with powerful tools for scientific computing and analysis. With this power also comes great responsibility. Effective visualizations is an incredibly important aspect of scientific research and communication. There have been several books (see references) written about these principles. In class today we will be going through several case-studies trying to develop some expertise into making effective visualizations.

Worksheet

The worksheet questions for today are embedded into the class notes.

You can download this Rmd file here

Note, there will be very little coding in-class today, but I’ve given you plenty of exercises in the form of a supplemental worksheet (linked at the bottom of this page) to practice with after class is over.

Resources

  1. Fundamentals of Data Visualization by Claus Wilke.

  2. Visualization Analysis and Design by Tamara Munzner.

  3. STAT545.com - Effective Graphics by Jenny Bryan.

  4. ggplot2 book by Hadley Wickam.

  5. Callingbull.org by Carl T. Bergstrom and Jevin West.

Part 1: Warm-up and pre-test [20 mins]

Warmup:

Write some notes here about what “effective visualizations” means to you. Think of elements of good graphics and plots that you have seen - what makes them good or bad? Write 3-5 points.

  1. Informative labels (title, axes, data point labels)
  2. Informative captions
  3. Appropriate type of plot used
  4. Not an overcrowded/overwhelming plot (or opposite: too sparse data)
  5. Organized legend if applicable

CQ01: Weekly hours for full-time employees

Case Study #1 Before

Case Study #1 Before

Question: Evaluate the strength of the claim based on the data: “German workers are more motivated and work more hours than workers in other EU nations.”

Very strong, strong, weak, very weak, do not know

  • Very weak – working more hours does not necessarily mean more motivated to work. Additionally the way the plot is structured does not group EU nations together (only shows the EU-28 average). In 2014 (source of plot), the UK was most likely still part of the EU. Germany is also in fact not worked more hours by a lot. There is also no standard deviation / standard error / error bars. The number of people represented the data is also not shown.

  • Main takeaway: The data is presented in a way specifically that supports their claim instead of objectively presenting the data.
    • consider where you start your axes
    • if data point labels are available, grid lines are not necessary
  • Effective Visualization:

    Case Study #1 After

    Case Study #1 After

  • Credit: https://callingbull.org/tools/tools_misleading_axes.html

CQ02: Average Global Temperature by year

Case Study #2 Before

Case Study #2 Before

Question: For the years this temperature data is displayed, is there an appreciable increase in temperature?

Yes, No, Do not know

  • No, there is no appreciable increase in temperature. The line barely fluctuates at around 57 degrees Fahrenheit. There is a small increase at the end of the graph, but without proper labels, it cannot be deemed appreciable

  • Main takeaway: Y-scale starts at 0 to 110, and its too ‘zoomed out’ (use an approrpiate Y-scale)
    • 2 degrees is appreciable in the plot below
  • Effective Visualization: Case Study #2 After

  • Credit: https://callingbull.org/tools/tools_misleading_axes.html

CQ03: Gun deaths in Florida

Case Study #3

Case Study #3

Question: Evaluate the strength of the claim based on the data: “Soon after this legislation was passed, gun deaths sharply declined.”

Very strong, strong, weak, very weak, do not know

  • Weak: claim contains too many qualitative words used in the claim (i.e. “soon”, “sharply declined”). The red background gives the plot a negative connotation. The y-axis 0 actually starts at the top of the graph instead of the bottom. The graph actually shows an increase in gun deaths.

  • Main takeaway: Plots can lead to misleading visualization unintentionally.

  • Credit: https://callingbull.org/tools/tools_misleading_axes.html

Part 2: Extracting insight from visualizations [20 mins]

Great resource for selecting the right plot: https://www.data-to-viz.com/ ; encourage you all to consult it when choosing to visualize data.

Case Study 1: Context matters

Case Study #1

Case Study #1

  • Averaged AD/ASD prevalence and MMR coverage in UK and Scandinavian countries
  • Diesher et al. 2015 Issues in Law and Medicine
  • Correlation does not equal causation
  • Different axes for Autism Prevalence and MMR Coverage even though both are percentages (%)

Case Study 2: Gender Gap in the 100m Dash

Case Study #2

Case Study #2

Case Study 3: Case for pie charts

  • Do not make pie charts (universally hated)

Part 3: Principles of effective visualizations [20 mins]

We will be filling these principles in together as a class

  1. Apply Principle of proportional ink
    • Definition: “The amount of ink used to indicate a value should be proportional to the value itself.”
    • Example: Truncating the y-axis on a bar chart to exaggerate the difference between bars violates the principle of proportional ink
  2. Maintain a high data-to-ink ratio: less is more
    • Definition: remove distracting visual elements to focus attention on the data
    • Examples: Lighten line weights, remove backgrounds, never use 3D or special effects, remove unnecessary/redundant labels, etc…
  3. Always update axes labels and titles on your plots
    • In STAT545/547 we take principles of effective visualizations very seriously and you will lose marks if this isn’t followed
  4. Choose your scale-type carefully
    • Whether you choose a linear, logarithm, sqrt scale depends on your data, context, and purpose
  5. Choose your graph-type carefully
    • Examples: here is a great directory of plots
  6. Choose colours with accessibility and readability in mind
    • Examples: here is a great set of colour schemes that are colour-blind friendly and perceptually uniform

Make a great plot worse

Instructions: Here is a code chunk that shows an effective visualization. First, copy this code chunk into a new cell. Then, modify it to purposely make this chart “bad” by breaking the principles of effective visualization above. Your final chart still needs to run/compile and it should still produce a plot.

Effective Visualization:

library("plotly")
library("tidyverse")
ggplot(airquality, aes(`Month`, `Temp`, group = `Month`)) +
    geom_boxplot(outlier.shape = NA) +
    geom_jitter(alpha = 0.3) +
    labs(x = "",
         y = "",
         title="Maximum temperature by month")+
    theme_bw() + 
    scale_x_continuous(breaks=c(5,6,7,8,9),labels=c("May","June","July","August","September")) +
    annotate("text", x = 4.08, y = 95,label="°F",size=8) +
    coord_cartesian(xlim = c(4.5, 9.5),
                      clip = 'off')+
    theme(panel.grid.minor = element_blank(),
          panel.background = element_blank(), 
          axis.line = element_line(colour = "gray"),
          panel.border = element_blank(),
          text = element_text(size=18)
          )

Ineffective Visualization:

library("plotly")
library("tidyverse")
ggplot(airquality, aes(`Month`, `Temp`, group = `Month`)) +
    geom_line(colour = "yellow") +
    geom_point(colour = "white",fill = "white", alpha = "0.1") + 
    labs(x = "",
         y = "",
         title="") +
    scale_y_log10() +
    ylim(70,200)

How many of the principles did you manage to break?

  • Principal of Proportional Ink
    • Chnaged y-axis minimum to 70 and maximum to 200
    • Changed fill and colour into white
    • Changed alpha to 0.1
  • Always update axes labels and titles on your plots
    • Removed x-axis title and all specified labels
    • Removed y-axis title
    • Removed main title
  • Maintain a high data-to-ink ratio
    • Added back grey background and grid back
  • Choose your scale-type carefully
    • Changed y-axis scale to log10
  • Choose your graph-type carefully
    • Changed geom_boxplot() to geom_line()
  • Choose colours with accessibility and readability in mind
    • Removed geom_jitter()
    • Changed fill and colour into white for geom_point()
    • Changed colour to yellow for geom_line()
    • Changed alpha to 0.1 for both geom_point() and geom_line()

Plotly demo [10 mins]

Did you know that you can make interactive graphs and plots in R using the plotly library? We will show you a demo of what plotly is and why it’s useful, and then you can try converting a static ggplot graph into an interactive plotly graph. (See Chart Studio)

This is a preview of what we’ll be doing in STAT 547 - making dynamic and interactive dashboards using R!

install.packages("plotly")
library(tidyverse)
library(gapminder)
library(plotly)
p <- ggplot(gapminder, aes(x = gdpPercap, y=lifeExp, colour = continent)) +
    geom_point()

# make interactive
p %>%
    ggplotly()
# plot_ly syntax
p <- gapminder %>%
    plot_ly(x = ~gdpPercap,
            y = ~lifeExp,
            color = ~continent,
            type = "scatter",
            mode = "markers")

# Sys.setenv("plotly_username" = "diana.lin")
# Sys.setenv("plotly_api_key" = "API_KEY_REDACTED")

# api_create(p, filename = "cm013-plotly-example")

# URL: https://plot.ly/~diana.lin/1/

Supplemental worksheet (Optional)

You are highly encouraged to the cm013 supplemental exercises worksheet. It is a great guide that will take you through Scales, Colours, and Themes in ggplot. There is also a short guided activity showing you how to make a ggplot interactive using plotly.