hackseq: Trends in Bioinformatics

November 18, 2019

`hackseq`

hackseq is a Vancouver-based hackathon focused on genomics. It aims to bring individuals with diverse backgrounds together to collaborate on scientific questions and problems in genomics.

Their philosophy is open-source, open-notebook, open science.

Source: https://www.hackseq.com

Project Goals

Bioinformatics: the development and use of computational methods in genetics and genomics

Graphic Visualization of trends in Bioinformatics

Usage of bioinformatic tools and techniques (e.g. sequence alignment, genome assembly, metagenomics, etc.)
Relationship of tool development within analytical pipelines
Geographic hotspots for development of bioinformatic tools and techniques

End product: A visualization tool of trends in Bioinformatics, which can help prospective graduate students choose an institution or area of research.

Source: http://tiny.cc/hs19-readme

Trends in Bioinformatics

Workflow

Divide the field of Bioinformatics into topics
- Sequencing, Assembly, …
Determine search terms for each topic
- Sequencing: Sanger, next-generation sequencing, …
Analyze trends by webscraping/textmining
- using fulltext, pubchunks

Analogy

Field	Topic	Search Terms
Bioinformatics	Sequencing	Sanger, next-generation sequencing, …
Bioinformatics	Assembly	Short read assembly, long read assembly, …
Fruits	Apples	Ambrosia, Gala, McIntosh, Granny Smith, …
Fruits	Oranges	Navel, Mandarin, Tangerine, Clementine, …
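
One way to hold this field/topic/search-term hierarchy in R is a named list that is flattened into a long table before any searching happens. This is only a sketch: `search_terms` and `term_table` are hypothetical names, and the terms are the ones from the table above.

```r
# Hypothetical container: one list element per topic, holding its search terms.
search_terms <- list(
  Sequencing = c("Sanger", "next-generation sequencing"),
  Assembly   = c("Short read assembly", "long read assembly")
)

# Flatten to one row per (topic, search term) pair.
term_table <- data.frame(
  field      = "Bioinformatics",
  topic      = rep(names(search_terms), lengths(search_terms)),
  searchTerm = unlist(search_terms, use.names = FALSE),
  stringsAsFactors = FALSE
)

print(term_table)
```

A long table like this makes it easy to loop over search terms while keeping track of the topic each one belongs to.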

Webscraping Results

Dataframe:

## # A tibble: 12 x 4
##    db        found searchTerm           topic     
##    <chr>     <dbl> <chr>                <chr>     
##  1 plos       9554 sanger sequencing    Sequencing
##  2 bmc       48995 sanger sequencing    Sequencing
##  3 crossref  37023 sanger sequencing    Sequencing
##  4 entrez    96509 sanger sequencing    Sequencing
##  5 arxiv     54228 sanger sequencing    Sequencing
##  6 scopus    89260 sanger sequencing    Sequencing
##  7 plos      34911 long read sequencing Sequencing
##  8 bmc      324406 long read sequencing Sequencing
##  9 crossref 434845 long read sequencing Sequencing
## 10 entrez   339648 long read sequencing Sequencing
## 11 arxiv    145774 long read sequencing Sequencing
## 12 scopus    39175 long read sequencing Sequencing

Webscraping Results

Stacked Bar Graph:

Source: https://dy-lin.github.io/hs19-trends/R/overview.html
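
A stacked bar graph of this kind can be drawn directly from the `found` counts. The sketch below retypes the twelve rows printed earlier; `hits`, `totals` and `stacked` are hypothetical names, and this is not necessarily how the project's own figure was produced.

```r
library(ggplot2)

# Counts retyped from the tibble shown earlier (database x search term).
dbs <- c("plos", "bmc", "crossref", "entrez", "arxiv", "scopus")
hits <- data.frame(
  db         = rep(dbs, times = 2),
  found      = c(9554, 48995, 37023, 96509, 54228, 89260,
                 34911, 324406, 434845, 339648, 145774, 39175),
  searchTerm = rep(c("sanger sequencing", "long read sequencing"), each = 6)
)

# Total hits per search term, summed over databases.
totals <- aggregate(found ~ searchTerm, data = hits, FUN = sum)
print(totals)

# One bar per search term, stacked by database (geom_col stacks by default).
stacked <- ggplot(hits, aes(x = searchTerm, y = found, fill = db)) +
  geom_col() +
  scale_fill_viridis_d() +
  labs(x = "search term", y = "papers found", fill = "database")
```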

Our Dataset

Let’s look at the search terms of one specific topic in one specific journal:

Sequencing in the Public Library of Science (PLoS)

## # A tibble: 10 x 9
##    doi    journal  publisher  author1  affiliation  title  abstract  topic  Year
##    <chr>  <chr>    <chr>      <chr>    <chr>        <chr>  <chr>     <chr> <dbl>
##  1 10.13… PLoS ONE Public Li… Zaragoz… Genetics & … Mitoc… Mutation… sang…  2010
##  2 10.13… PLoS ONE Public Li… Ji, Hez… National HI… HIV D… Backgrou… sang…  2010
##  3 10.13… PLOS ONE Public Li… Landria… Department … Inher… Spinocer… sang…  2017
##  4 10.13… PLOS ONE Public Li… Guinois… INSERM U966… Deep … Hepatiti… sang…  2017
##  5 10.13… PLOS ONE Public Li… Iyer, S… Department … Compa… Massivel… sang…  2015
##  6 10.13… PLOS ONE Public Li… Rebolle… Department … Compa… The refe… sang…  2015
##  7 10.13… PLOS ONE Public Li… Vivien,… Swiss Centr… Next-… Aquatic … sang…  2016
##  8 10.13… PLOS ONE Public Li… Pandey,… AIT Austria… MutAi… Traditio… sang…  2016
##  9 10.13… PLoS ONE Public Li… Lopez-R… Laboratorio… Compa… Backgrou… sang…  2012
## 10 10.13… PLoS ONE Public Li… Shokral… Biodiversit… Pyros… DNA barc… sang…  2011

To follow along with some live-coding, download these worksheets: http://tiny.cc/rladies-ws1
http://tiny.cc/rladies-ws2

Racing Bar Graph

What is a Racing Bar Graph?

Source: https://emilykuehler.github.io/bar-chart-race/

Regular Bar Graph

First, let’s load our dataset and make a regular bar graph:

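library(tidyverse)  # provides read_csv(), %>%, ggplot2, dplyr and tidyr
library(gganimate)  # needed later for the racing bar graph

base <- "https://raw.githubusercontent.com/dy-lin/hs19-trends/master/workshop/"
seq_spec <- "data/sequencing-specific-processed.csv"
url <- paste0(base, seq_spec)
seq_data <- read_csv(url)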

A glimpse into the dataset (top 5):

topic	Year	total	cum_total
illumina	2013	171	404
capillary	2013	153	495
sanger	2013	152	406
illumina	2014	142	546
10x genomics	2015	141	525
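
The `cum_total` column appears to be a running sum of `total` within each topic across years. That is an inference from the values shown in this deck, not something the source states. A minimal sketch of that calculation, using toy numbers and a hypothetical `toy` data frame:

```r
# Toy yearly counts for two topics (illustrative values only).
toy <- data.frame(
  topic = c("capillary", "capillary", "sanger", "sanger"),
  Year  = c(2004, 2005, 2005, 2006),
  total = c(2, 2, 1, 5)
)

# Sort by topic and year, then take the running sum of `total` within each topic.
toy <- toy[order(toy$topic, toy$Year), ]
toy$cum_total <- ave(toy$total, toy$topic, FUN = cumsum)

print(toy)
```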

Regular Bar Graph

reg <- seq_data %>%
  drop_na() %>%
  ggplot(aes(x = topic, y = cum_total, fill = topic)) +
  geom_col() + 
  facet_wrap(~ Year, ncol = 8) + 
  coord_flip() +
  scale_fill_viridis_d() +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        legend.position = "bottom") +
  geom_text(aes(y = cum_total, 
                label = topic), 
            hjust = "left", 
            fontface = "bold", 
            nudge_y = 50)

Racing Bar Graph

In order to make a racing bar chart, where the bars overtake one another, we need to rank the topics for each year:

ordered_df <- NULL

for (yr in 2003:2019) {
  order <- seq_data %>% 
    filter(Year == yr) %>% 
    arrange(cum_total) %>% 
    mutate(ordering = row_number())
  
  ordered_df <- ordered_df %>% rbind(order)
}
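
The same ranking can probably be written without a loop by grouping on `Year`. The sketch below uses a toy stand-in for `seq_data` (`toy` and `ordered_toy` are hypothetical names) and is meant as an equivalent alternative, not as the workshop's own code.

```r
library(dplyr)

# Toy stand-in for seq_data (illustrative values only).
toy <- tibble(
  topic     = c("capillary", "sanger", "next-generation", "sanger", "capillary"),
  Year      = c(2005, 2005, 2005, 2006, 2006),
  cum_total = c(4, 1, 2, 6, 5)
)

ordered_toy <- toy %>%
  filter(Year %in% 2003:2019) %>%        # same year range as the loop
  arrange(Year, cum_total) %>%           # sort within each year by cum_total
  group_by(Year) %>%
  mutate(ordering = row_number()) %>%    # 1 = smallest cum_total in that year
  ungroup()

print(ordered_toy)
```

Grouping avoids growing a data frame with `rbind()` inside a loop, which matters little at this size but scales better.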

Racing Bar Graph

Here’s what the dataset looks like now:

topic	Year	total	cum_total	ordering
10x genomics	2004	1	1	1
capillary	2004	2	2	2
next-generation	2005	1	1	1
sanger	2005	1	1	2
capillary	2005	2	4	3
oxford nanopore	2006	1	1	1
10x genomics	2006	1	2	2
next-generation	2006	4	5	3
sanger	2006	5	6	4

Racing Bar Graph

Here is an overview of the code:

# plot
p <- ordered_df %>% 
  ggplot(aes(ordering, group = topic)) +
  geom_col(aes(y = cum_total, width = 0.9, fill = topic)) +
  geom_text(aes(y = cum_total, label = topic), 
            hjust = "left", fontface = "bold", nudge_y = 50) +
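  # Note: a plot keeps only one coord, so coord_flip() below replaces this call;
  # pass clip/expand to coord_flip() if they should take effect.
  coord_cartesian(clip = "off", expand = FALSE) +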
  scale_fill_viridis_d() +
  coord_flip() +
# animate
  transition_states(Year, transition_length = 8, state_length = 4, wrap = FALSE) +
  ease_aes("cubic-in-out") +
# aesthetics
  labs(subtitle = "Trends in sequencing methods", title = "Year {closest_state}", 
       y = "cumulative total papers") +
  theme(plot.background = element_blank(), legend.position = "none",
        axis.ticks.y = element_blank(), axis.text.y = element_blank(),
        text = element_text(size=14), plot.title = element_text(size = 35)) +
  ylim(0,1300) +
  xlab("")

Racing Bar Graph

Let’s start plotting:

coord_cartesian(clip = "off", expand = FALSE)

Usage:

coord_cartesian(xlim = NULL, ylim = NULL, expand = TRUE, default = FALSE, clip = "on")

Description:

The Cartesian coordinate system is the most familiar, and common, type of coordinate system. Setting limits on the coordinate system will zoom the plot (like you’re looking at it with a magnifying glass), and will not change the underlying data like setting limits on a scale will.

Arguments

expand: If TRUE, the default, adds a small expansion factor to the limits to ensure that data and axes don’t overlap. If FALSE, limits are taken exactly from the data or xlim/ylim.
clip: Should drawing be clipped to the extent of the plot panel? A setting of “on” (the default) means yes, and a setting of “off” means no. In most cases, the default of “on” should not be changed, as setting clip = “off” can cause unexpected results. It allows drawing of data points anywhere on the plot, including in the plot margins.
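
The difference between zooming with a coord and setting limits on a scale can be checked on a toy plot. In this sketch `df`, `p_scale`, `p_zoom` and the helper `surviving()` are hypothetical names introduced for illustration.

```r
library(ggplot2)

# Toy data: two of the five points exceed y = 10.
df <- data.frame(x = 1:5, y = c(2, 4, 8, 16, 32))

# Limits on the scale: out-of-range values are censored before drawing.
p_scale <- ggplot(df, aes(x, y)) + geom_point() + ylim(0, 10)

# Limits on the coord: the view is zoomed, the data are left alone.
p_zoom <- ggplot(df, aes(x, y)) + geom_point() + coord_cartesian(ylim = c(0, 10))

# Count how many values above 10 are still present in the built layer data.
surviving <- function(p) sum(layer_data(p)$y > 10, na.rm = TRUE)

surviving(p_scale)
surviving(p_zoom)
```

The scale version loses the two large values, while the coord version keeps them and simply does not show them in the panel.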

Racing Bar Graph

Next, to set up some animation parameters:

transition_states(Year, transition_length = 8, state_length = 4, wrap = FALSE)

Usage:

transition_states(states, transition_length = 1, state_length = 1, wrap = TRUE)

Description:

This transition splits your data into multiple states based on the levels in a given column, much like ggplot2::facet_wrap() splits up the data in multiple panels. It then tweens between the defined states and pauses at each state.

Arguments:

states: The unquoted name of the column holding the state levels in the data.
transition_length: The relative length of the transition. Will be recycled to match the number of states in the data
state_length: The relative length of the pause at the states. Will be recycled to match the number of states in the data
wrap: Should the animation wrap-around? If TRUE the last state will be transitioned into the first.

Racing Bar Graph

ease_aes("cubic-in-out")

Usage:

ease_aes(default = "linear", ...)

Description:

Easing defines how a value change to another during tweening. Will it progress linearly, or maybe start slowly and then build up momentum. In gganimate, each aesthetic or computed variable can be tweened with individual easing functions using the ease_aes() function. All easing functions implemented in tweenr are available, see tweenr::display_ease for an overview. Setting an ease for x and/or y will also affect the other related positional aesthetics (e.g. xmin, yend, etc).

Functions

cubic: Models a power-of-3 function
-in-out: The first half of the transition it is applied as-is, while in the last half it is reversed
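
To get a feel for the shape of this easing, the textbook cubic ease-in-out curve can be written by hand. The function below is a sketch from the standard formula and is only assumed to match the curve tweenr uses for "cubic-in-out"; `cubic_in_out` is a hypothetical name.

```r
# Standard cubic ease-in-out on t in [0, 1] (textbook formula, not tweenr's code).
cubic_in_out <- function(t) {
  ifelse(t < 0.5,
         4 * t^3,                    # first half: cubic ease-in
         1 - (-2 * t + 2)^3 / 2)     # second half: mirrored ease-out
}

t <- seq(0, 1, by = 0.25)
round(cubic_in_out(t), 4)
```

Progress is slow near 0 and 1 and fastest around the midpoint, which is why the bars appear to accelerate and then settle.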

Racing Bar Graph

Lastly, set up some aesthetics:

# aesthetics
  labs(subtitle = "Trends in sequencing methods", 
       title = "Year {closest_state}", y = "cumulative total papers") +
  theme(plot.background = element_blank(), legend.position = "none",
        axis.ticks.y = element_blank(), axis.text.y = element_blank(),
        text = element_text(size=14), plot.title = element_text(size = 35)) +
  ylim(0,1300) +
  xlab("")

Racing Bar Graph

Let’s animate!

The figure shown on the next slide was generated using these parameters:

# render the animation
animate(p, nframes = 750, fps = 20, end_pause = 10)

However, due to the lengthy time it takes to generate, we should reduce these numbers:

# rendering the animation
animate(p, nframes = 100, fps = 5, end_pause = 10)

Racing Bar Graph

animate(p, nframes = 100, fps = 5, end_pause = 10)

Usage:

animate(plot, ...)

Description:

This function takes a gganim object and renders it into an animation. The nature of the animation is dependent on the renderer, but defaults to using gifski to render it to a gif. The length and framerate is decided on render time and can be any two combination of nframes, fps, and duration. Rendering is happening in discrete time units.

Arguments

plot: A gganim object
nframes: The number of frames to render (default 100)
fps: The framerate of the animation in frames/sec (default 10)
duration: The length of the animation in seconds (unset by default)
start_pause,end_pause: Number of times to repeat the first and last frame in the animation (default is 0 for both)
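
Because any two of `nframes`, `fps` and `duration` determine the third, the length of a clip is simply frames divided by frame rate. A small sketch for the two settings used in this deck (`clip_seconds` is a hypothetical helper, not part of gganimate):

```r
# Hypothetical helper: clip length in seconds for a given frame count and rate.
clip_seconds <- function(nframes, fps) nframes / fps

clip_seconds(nframes = 750, fps = 20)  # settings used for the published figure
clip_seconds(nframes = 100, fps = 5)   # reduced settings for live coding
```

The reduced settings render far fewer frames, which is where the time saving comes from, at the cost of a choppier animation.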

Racing Bar Graph

Source: https://dy-lin.github.io/hs19-trends/R/general_vis.html

Sankey Diagrams

What is a Sankey Diagram?

Sankey Diagram: a type of flow diagram where the width of the arrows is proportional to the flow rate

Source: https://cran.r-project.org/web/packages/googleVis/vignettes/googleVis_examples.html

Sankey Diagrams

Here are the packages we need to load for a Sankey diagram:

library(tidyverse)
library(ggrepel)
library(grid)
library(ggalluvial)
library(egg)

Sankey Diagram

Let’s load in our pre-processed dataset:

base <- "https://raw.githubusercontent.com/dy-lin/hs19-trends/master/workshop/"
SK <- "data/sankey-processed.csv"
url <- paste0(base, SK)
datSK <- read_csv(url)

From	To	Weight	count
EMBL	Databases	14	26
Harvard U	Phylogenetics	12	19
U Washington	Variant Calling	10	25
Baylor College of Medicine	Variant Calling	10	26
UC San Diego	Genome Annotation	8	17
U Michigan	Phylogenetics	8	16
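
The deck does not show how this table was derived. One plausible way to build a From/To/Weight table is to count papers per institution and topic. The sketch below assumes one row per paper and uses toy values; `papers` and `flows` are hypothetical names.

```r
library(dplyr)

# Hypothetical input: one row per paper (institutions and topics are toy values).
papers <- tibble(
  From = c("EMBL", "EMBL", "EMBL", "Harvard U", "Harvard U"),
  To   = c("Databases", "Databases", "Phylogenetics", "Phylogenetics", "Phylogenetics")
)

# Count papers for each institution-topic pair; the count becomes the flow width.
flows <- papers %>%
  count(From, To, name = "Weight") %>%
  arrange(desc(Weight))

print(flows)
```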

Sankey Diagram

Here’s an overview of plotting the Sankey diagram using ggplot:

sankey <- ggplot(datSK, aes(y = Weight, axis1 = From, axis2 = To)) +
  geom_alluvium(aes(fill = From), width = 1 / 12) +
  geom_stratum(alpha = 0, width = 1 / 12, color = "black") +
  scale_x_discrete(limits = c("From", "To"), expand = c(0.3, 0.1)) +
  scale_fill_viridis_d() +
  theme_void() +
  theme(axis.title.y = element_blank(), axis.title.x = element_blank(),
    axis.ticks.x = element_blank(), axis.ticks.y = element_blank(),
    axis.text.x = element_blank(),axis.text.y = element_blank(),
    legend.position = "none", plot.title = element_text(hjust = 0.5)) +
  ggrepel::geom_label_repel(
    aes(label = From), stat = "stratum", size = 3, direction = "x", hjust = 10) +
  ggrepel::geom_label_repel(
    aes(label = To), stat = "stratum", size = 3, direction = "y", nudge_x = 0.5) +
  geom_label(aes(label = Weight), stat = "stratum", alpha = 0.8) +
  ggtitle("Top 10 Institutions Publications By Topic")

sankey <- set_panel_size(sankey, width  = unit(18, "cm"), height = unit(10, "cm"))
grid.newpage()
grid.draw(sankey)

Sankey Diagram

Let’s take a closer look at the geom functions used:

geom_alluvium(aes(fill = From), width = 1 / 12)

Usage:

geom_alluvium(mapping = NULL, data = NULL, stat = "alluvium", position = "identity", width = 1/3, knot.pos = 1/6, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ...)

Description:

geom_alluvium receives a dataset of the horizontal (x) and vertical (y, ymin, ymax) positions of the lodes of an alluvial diagram, the intersections of the alluvia with the strata. It plots both the lodes themselves, using geom_lode(), and the flows between them, using geom_flow().

Arguments:

mapping: Set of aesthetic mappings created by aes()
width: Numeric; the width of each stratum, as a proportion of the distance between axes. Defaults to 1/3.

Sankey Diagram

geom_stratum(alpha = 0, width = 1 / 12, color = "black")

Usage:

geom_stratum(mapping = NULL, data = NULL, stat = "stratum", position = "identity", show.legend = NA, inherit.aes = TRUE, width = 1/3, na.rm = FALSE, ...)

Description:

geom_stratum receives a dataset of the horizontal (x) and vertical (y, ymin, ymax) positions of the strata of an alluvial diagram. It plots rectangles for these strata of a provided width.

Arguments:

width: Numeric; the width of each stratum, as a proportion of the distance between axes. Defaults to 1/3.

Sankey Diagram

Source: https://dy-lin.github.io/hs19-trends/R/visualization_sankey.html

Mapping

We want to look at several trends

Most Common topic per country
Number of papers per country

If you still haven’t downloaded the live-coding worksheet for this section: http://tiny.cc/rladies-ws2

Mapping

Source: https://dy-lin.github.io/hs19-trends/R/visualization_map.html

Mapping

Here are all the packages we need to format the data and do the plotting

library(ggplot2)
library(dplyr)
library(plotly)
library(here)
library(ggmap)
library(viridis)
library(rgeos)
library(maptools)
library(maps)
library(sf)
library(readr)
library(tidyverse)
library(knitr)
library(broom)

Mapping

Packages needed for plotting

ggplot2: package for data visualizations

plotly: makes interactive visualizations

viridis: good colour palette for dealing with colourblindness and greyscale issues

Mapping

There is a lot of code in the worksheet that describes how we got the data into the format we needed

We will do one global plot and one of Europe

Mapping

Here is an example of what the data looks like:

	long	lat	group	order	region	subregion	Count	topicMax
14572	-61.10518	45.94472	252	14823	Canada	Cape Breton Island	616	transcript quantification
14573	-61.07134	45.93711	252	14824	Canada	Cape Breton Island	616	transcript quantification
14574	-60.93657	45.98555	252	14825	Canada	Cape Breton Island	616	transcript quantification
14575	-60.86524	45.98350	252	14826	Canada	Cape Breton Island	616	transcript quantification
14576	-60.86840	45.94863	252	14827	Canada	Cape Breton Island	616	transcript quantification
14577	-60.98428	45.91069	252	14828	Canada	Cape Breton Island	616	transcript quantification
14578	-61.03755	45.88222	252	14829	Canada	Cape Breton Island	616	transcript quantification
14579	-60.97060	45.85581	252	14830	Canada	Cape Breton Island	616	transcript quantification
14580	-60.97153	45.83799	252	14831	Canada	Cape Breton Island	616	transcript quantification
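
Rows like these combine polygon vertices (the `long`, `lat`, `group`, `order`, `region` and `subregion` columns match what `ggplot2::map_data("world")` returns) with per-country summaries. The worksheet has the real preparation code. The sketch below only illustrates the idea on toy data, and `world`, `counts` and `world_data` are hypothetical names.

```r
# Toy stand-in for ggplot2::map_data("world"): one row per polygon vertex.
world <- data.frame(
  long   = c(-61.1, -61.0, -60.9, 19.9, 19.8, 19.4),
  lat    = c(45.9, 45.9, 46.0, 39.9, 40.0, 40.2),
  group  = c(252, 252, 252, 6, 6, 6),
  order  = c(14823, 14824, 14825, 836, 837, 838),
  region = c("Canada", "Canada", "Canada", "Albania", "Albania", "Albania")
)

# Hypothetical per-country summary (Canada values taken from the rows above).
counts <- data.frame(
  region   = "Canada",
  Count    = 616,
  topicMax = "transcript quantification"
)

# Look up each vertex's country; match() keeps the vertex order intact,
# which matters because polygons are drawn in row order.
idx <- match(world$region, counts$region)
world_data <- cbind(world, counts[idx, c("Count", "topicMax")])
rownames(world_data) <- NULL

print(world_data)
```

Countries without a summary row end up with `NA` in `Count` and `topicMax`.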

Mapping

pl <- ggplot() +
  geom_polygon(data = world_data2, aes(x = long, y = lat,
  group = group,fill = log(Count),text=Count),colour="lightgrey") +
  coord_fixed(1.3)+
  scale_fill_viridis()+
  theme_void()

world_plotly=ggplotly(pl,tooltip = "text")
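
The fill is mapped to `log(Count)` rather than `Count`, presumably so that a few prolific countries do not wash out the colour scale. A quick sketch of the effect on toy counts (illustrative values only):

```r
# Toy paper counts spanning several orders of magnitude.
Count <- c(3, 40, 616, 12000)

range(Count)        # raw range
range(log(Count))   # natural log: a much narrower range

# A country with zero papers would map to -Inf, so zeros need handling first.
log(0)
```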

Mapping

geom_polygon(data = world_data2, aes(x = long, y = lat,group = group,fill = log(Count),text=Count))

Usage

geom_polygon(mapping = NULL, data = NULL, stat = "identity", position = "identity", rule = "evenodd", ..., na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)

Description

Polygons are very similar to paths (as drawn by geom_path()) except that the start and end points are connected and the inside is coloured by fill. The group aesthetic determines which cases are connected together into a polygon.

Arguments

group: By default, the group is set to the interaction of all discrete variables in the plot. This often partitions the data correctly, but when it does not, or when no discrete variable is used in the plot, you will need to explicitly define the grouping structure, by mapping group to a variable that has a different value for each group.

Mapping

Source: https://dy-lin.github.io/hs19-trends/R/visualization_map.html

Mapping

If we want to look at certain countries, we can do that too:

Depending on how much data falls in one region, this may be a good idea.
You will have to figure out the coordinates you want.
The plot is set up similarly to the global one, but we add new elements.

Mapping

	X	region	long	lat	group	order	subregion
67	67	Albania	19.96484	39.87227	6	836	NA
68	68	Albania	19.85186	40.04356	6	837	NA
69	69	Albania	19.48457	40.20996	6	838	NA
70	70	Albania	19.39814	40.28486	6	839	NA
71	71	Albania	19.36016	40.34771	6	840	NA
72	72	Albania	19.32227	40.40708	6	841	NA
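
A regional table like this one can be cut from the world table either by country name or by a bounding box. Both options are sketched below on toy data. The region list and the box coordinates are hypothetical, and the worksheet may do this differently.

```r
# Toy stand-in for the world polygon table (one row per vertex).
world <- data.frame(
  region = c("Albania", "Albania", "Canada", "Canada", "France"),
  long   = c(19.96, 19.85, -61.11, -61.07, 2.35),
  lat    = c(39.87, 40.04, 45.94, 45.94, 48.86)
)

# Option 1: keep whole countries by name (hypothetical, incomplete list).
europe_regions <- c("Albania", "France")
europe_by_name <- world[world$region %in% europe_regions, ]

# Option 2: keep vertices inside a bounding box (hypothetical coordinates).
in_box <- world$long > -25 & world$long < 45 & world$lat > 34 & world$lat < 72
europe_by_box <- world[in_box, ]

print(europe_by_name)
print(europe_by_box)
```

Filtering by name keeps each country's polygon whole, whereas a bounding box can cut through polygons that straddle its edge.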

Mapping

pl <- ggplot() +
  geom_polygon(data = europe_data,
               aes(x = long, y = lat, group = group, fill = log(Count),
                   text = paste(region, Count, sep = ";"))) +
  geom_point(data = points_europe,
             aes(x = X, y = Y, text = str_wrap(affiliation, 50)),
             alpha = 0.5, size = 0.5, colour = "grey") +
  coord_fixed(1.3) +
  scale_fill_viridis() +
  theme_void()

Mapping

Source: https://dy-lin.github.io/hs19-trends/R/visualization_map.html

Bigrams

Bigrams are two consecutive words within a given text input

e.g. We love coding in R
(We love), (love coding), (coding in), (in R)
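
Extracting bigrams by hand takes only a few lines of base R. This is a minimal sketch with a hypothetical helper, `extract_bigrams()`, that splits on whitespace and pairs each word with the next one.

```r
# Split a sentence on whitespace and pair each word with its successor.
extract_bigrams <- function(text) {
  words <- strsplit(text, "\\s+")[[1]]
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

extract_bigrams("We love coding in R")
```

A sentence of five words yields four bigrams, matching the example above.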

Bigrams

library(tidyverse)
library(tidytext)
library(tm)
library(widyr)
library(igraph)
library(ggplot2)
library(ggraph)
library(readr)
library(tidygraph)

Bigrams

ggraph: visualizations for network structures

ggplot2: package for data visualizations

Bigrams

visualize_bigrams: extracts bigrams from a text field, calculates frequency of bigrams, and creates a bigram plot to visualize relationships between words

df_name: name of dataframe that contains the text field of interest
textfield: name of text field (i.e. column name)
topic_title: title to display on the plot

visualize_bigrams <- function(df_name, textfield, topic_title)
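
The body of `visualize_bigrams()` is in the worksheet rather than on these slides. As a rough idea of its first step, bigram frequencies can be computed with tidytext. The sketch below uses toy text, and `abstracts` and `bigram_counts` are hypothetical names.

```r
library(dplyr)
library(tidytext)

# Toy abstracts (illustrative text only).
abstracts <- tibble(
  id       = 1:2,
  abstract = c("We love coding in R", "We love R")
)

bigram_counts <- abstracts %>%
  # one row per bigram; unnest_tokens() lowercases the text by default
  unnest_tokens(bigram, abstract, token = "ngrams", n = 2) %>%
  count(bigram, sort = TRUE)

print(bigram_counts)
```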

Bigrams

In the worksheet, you will see a lot of code that only formats the data. Feel free to look into it if you are interested, but here we cover just the plotting.

Bigrams

pl <- graph_hold %>%
      ggraph(layout = "fr") +
      geom_edge_link(aes(edge_alpha = Edge_Frequency),
      show.legend = TRUE) +
      geom_node_point(aes(color = Term_Frequency, 
      size = Term_Frequency), alpha = 0.7) +
      scale_fill_viridis_c() +
      geom_node_text(aes(label = name), repel = TRUE) +
      scale_color_viridis_c(direction = -1) +
      theme_void() +
      guides(size = "none") +
      labs(title = quo_name(topic_title)) +
      theme(plot.title = element_text(size = 26, face = "bold"))

Bigrams

geom_edge_link(aes(edge_alpha = Edge_Frequency),show.legend = TRUE)

Usage

geom_edge_link(mapping = NULL, data = get_edges("short"),position = "identity", arrow = NULL, n = 100, lineend = "butt",linejoin = "round", linemitre = 1, label_colour = "black",label_alpha = 1, label_parse =FALSE, check_overlap = FALSE,angle_calc = "rot", force_flip = TRUE, label_dodge = NULL,label_push = NULL, show.legend = NA, ...)

Description

This geom draws edges in the simplest way - as straight lines between the start and end nodes. Not much more to say about that…

Bigrams

ggraph(layout = "fr")

Usage

ggraph(graph, layout = "auto", ...)

Description

This function is the equivalent of ggplot2::ggplot() in ggplot2. It takes care of setting up the plot object along with creating the layout for the plot based on the graph and the specification passed in. Alternatively a layout can be prepared in advance using create_layout and passed as the data argument.

Arguments

layout: The type of layout to create. Either a valid string, a function, a matrix, or a data.frame

fr is one of the igraph layout options for constructing node diagrams

Bigrams

geom_node_point(aes(color = Term_Frequency, size = Term_Frequency), alpha = 0.7)

Usage

geom_node_point(mapping = NULL, data = NULL, position = "identity", show.legend = NA, ...)

Description

This geom is equivalent in functionality to ggplot2::geom_point() and allows for simple plotting of nodes in different shapes, colours and sizes.

Arguments

We don’t use any arguments in this case, just change some of the aes
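
To try these three ggraph pieces together without the full dataset, a tiny network can be built by hand. This is a sketch on toy data: `nodes`, `edges`, `graph_toy` and `pl_toy` are hypothetical names, with column names chosen to mirror `Term_Frequency` and `Edge_Frequency` from the plotting code.

```r
library(tidygraph)
library(ggraph)

# Toy bigram network: nodes are words, edges are bigrams (illustrative values).
nodes <- data.frame(name = c("we", "love", "coding", "r"),
                    Term_Frequency = c(2, 2, 1, 2))
edges <- data.frame(from = c("we", "love", "love"),
                    to   = c("love", "coding", "r"),
                    Edge_Frequency = c(2, 1, 1))

graph_toy <- tbl_graph(nodes = nodes, edges = edges, directed = FALSE)

set.seed(1)  # the "fr" (Fruchterman-Reingold) layout starts from random positions
pl_toy <- ggraph(graph_toy, layout = "fr") +
  geom_edge_link(aes(edge_alpha = Edge_Frequency)) +
  geom_node_point(aes(size = Term_Frequency)) +
  geom_node_text(aes(label = name), repel = TRUE)
```

Printing `pl_toy` draws the network. Setting the seed keeps the layout reproducible between runs.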

Bigrams

## |Search Term           |
## |:---------------------|
## |Assembly              |
## |Databases             |
## |Epigenetics           |
## |Gene Expression       |
## |Genome Annotation     |
## |Phylogenetics         |
## |Sequence Alignment    |
## |Sequencing            |
## |Structural Prediction |
## |Variant Calling       |

Bigrams

df_assembly <- df %>% 
  filter(topic == "Assembly")
visualize_bigrams(df_assembly, abstract, "")

Bigrams

Source: https://dy-lin.github.io/hs19-trends/R/bigram_relationships.html

GitHub

For more information regarding the Hackseq project: https://github.com/dy-lin/hs19-trends

Acknowledgements

Jasmine Lai
Raissa Philibert
Lucia Darrow
Shannon Lo
Morgana Xu
Elliot YKF
Swapna Menon

Thank you RLadies and Dialpad for hosting us!