WOMBAT 2025 Tutorial

Visualising Uncertainty

Harriet Mason, Dianne Cook

Department of Econometrics and Business Statistics

Welcome 👋🏼

Thanks for joining to learn about making data plots today.

🦘 Di is a Professor of Statistics. She has more than 30 years of experience in research, teaching and open source software development for data visualisation.
🐨 Harriet is a final-year PhD student, working on better representation of uncertainty in data visualisations, with a particular focus on spatial data.

We are both in Econometrics and Business Statistics, at Monash University.

🧩 Feel free to ask questions any time. 🤔

🎯 The objectives for today are:

understand what uncertainty means, in particular in relation to data visualisation.
be able to represent uncertainty on different types of plots.
assess whether one representation may be better than another, based on cognitive perception principles and visual testing.
apply these new approaches for showing uncertainty in spatial data visualisation.

Caution: Best practices for incorporating uncertainty into plots are far from settled. This is our best attempt to summarise the current literature, the approaches we find valuable, and the available tools.

Session 1
Foundations of uncertainty in data visualisations

Introduction

What is uncertainty?

You don’t know what you don’t know

Statistical (aleatory) uncertainty
- notion of randomness
- variability in the outcome/measurements
Systemic (epistemic) uncertainty
- due to bias, misunderstanding, assumptions
- measurement error
- handling of missing information, and pre-processing choices
- model choices
- incorrect comparisons
- can you think of any others?

Mostly we are concerned about representing statistical uncertainty.
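
A small simulation can make the distinction concrete. This is a sketch that is not part of the tutorial materials: the true value, noise level and bias are all assumed for illustration. Collecting more data shrinks the statistical uncertainty, but leaves a systemic offset untouched.

```r
# Hypothetical setup: the true value, noise level and bias are assumptions.
set.seed(2025)
true_value <- 20
noise_sd   <- 2     # statistical (aleatory): random variation between measurements
bias       <- 1.5   # systemic (epistemic): a miscalibrated instrument shifts every reading

n_small <- 20
n_large <- 20000
small <- true_value + bias + rnorm(n_small, sd = noise_sd)
large <- true_value + bias + rnorm(n_large, sd = noise_sd)

# The standard error of the mean shrinks as the sample grows ...
se_small <- sd(small) / sqrt(n_small)
se_large <- sd(large) / sqrt(n_large)

# ... but the offset from the true value stays close to the bias.
offset_large <- mean(large) - true_value

c(se_small = se_small, se_large = se_large, offset_large = offset_large)
```

An error bar built from `se_large` would look reassuringly tight while still missing the true value, which is one reason systemic uncertainty is harder to display.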

Showing uncertainty

Show the data (1/4)

The most valuable way to show uncertainty is to show all the data.

Plot shows first preference % for the Greens in the 2019 Australian Federal Election, for the 151 electorates.

Plot of choice is the jittered dotplot, where points are spread vertically according to density.

Code

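library(tidyverse)   # read_csv, dplyr verbs, forcats, ggplot2
library(ggbeeswarm)  # geom_quasirandom
election <- read_csv(here::here("data/election2019.csv"),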
  skip = 1,
  col_types = cols(
    .default = col_character(),
    OrdinaryVotes = col_double(),
    AbsentVotes = col_double(),
    ProvisionalVotes = col_double(),
    PrePollVotes = col_double(),
    PostalVotes = col_double(),
    TotalVotes = col_double(),
    Swing = col_double()
  )
)
e_grn <- election |>
  group_by(DivisionID) |>
  summarise(
    DivisionNm = unique(DivisionNm),
    State = unique(StateAb),
    votes_GRN = TotalVotes[which(PartyAb == "GRN")],
    votes_total = sum(TotalVotes)
  ) |>
  mutate(perc_GRN = votes_GRN / votes_total * 100)

e_grn |>
  mutate(State = fct_reorder(State, perc_GRN)) |>
  ggplot(aes(x=perc_GRN, y=State)) +
    geom_quasirandom(groupOnX = FALSE, varwidth = TRUE) +
    labs(
      x = "First preference votes %",
      y = ""
    ) +
  xlim(c(0,50))

Show the data (2/4)

What do we learn?

Different number of observations in each state
One outlier in Vic
As a group, ACT has higher %’s
Vic has a small cluster of points with higher %’s
%’s are mostly very low

This plot ONLY shows uncertainty!

Show the data (3/4)

What would be other common ways to display this data?

Side-by-side boxplots
Side-by-side violin
On a map of electorates

For each plot think about

what is the uncertainty, and what is the estimate
what the plot shows or hides

Panels: Dotplot, Boxplot, Violin, Map

Code

e_grn |>
  mutate(State = fct_reorder(State, perc_GRN)) |>
  ggplot(aes(x=perc_GRN, y=State)) +
    geom_boxplot(varwidth = TRUE) +
    labs(
      x = "First preference votes %",
      y = ""
    ) +
  xlim(c(0,50))

Code

e_grn |>
  mutate(State = fct_reorder(State, perc_GRN)) |>
  ggplot(aes(x=perc_GRN, y=State)) +
    geom_violin(draw_quantiles = c(0.25, 0.5, 0.75),
      fill="#006dae", alpha=0.5) +
    labs(
      x = "First preference votes %",
      y = ""
    ) +
  xlim(c(0,50))

Code

oz_states <- ozmaps::ozmap_states %>% filter(NAME != "Other Territories")
oz_votes <- rmapshaper::ms_simplify(ozmaps::abs_ced)
oz_votes_grn <- full_join(oz_votes, e_grn, by=c("NAME"="DivisionNm"))

ggplot(oz_votes_grn, aes(fill=perc_GRN)) +
  geom_sf(colour="white") +
  scale_fill_viridis_c(direction=-1, trans = "log", 
    guide = "colourbar", 
    labels = scales::label_number(accuracy = 0.1)) +
  theme_map() +
  theme(legend.position = "right", 
    legend.title = element_blank())

Show the data (4/4)

Even when you think you are showing the data, it is often an estimate and some representation of uncertainty.

The election data are actually estimates. The electorates are strata, so what was shown was the % computed within each stratum.

What is the full data? What other strata are possible?

Generally, we trust the values provided by the AEC, and we explore the distribution of votes by different strata in the electorate structure. The goal is to understand the variability in the way people have voted, identify electorates where the winner might flip next time, …

It’s really difficult to concretely define uncertainty!

Terminology

Names for main thing:

estimate
statistic
signal

Names for uncertainty, needed to understand main thing:

variation
variability
variance/standard deviation
error/standard error
IQR/MAD
noise
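
The measures listed above can all be computed with base R. The sketch below uses a simulated sample, assumed purely for illustration, rather than any of the tutorial data.

```r
# Hypothetical sample: 50 simulated temperature readings, not tutorial data.
set.seed(1)
temp <- rnorm(50, mean = 15, sd = 4)

estimate <- mean(temp)                        # the "main thing"

spread <- c(
  variance = var(temp),
  sd       = sd(temp),
  se_mean  = sd(temp) / sqrt(length(temp)),   # standard error of the mean
  iqr      = IQR(temp),
  mad      = mad(temp)                        # scaled by 1.4826 by default
)

estimate
round(spread, 2)
```

The standard deviation, IQR and MAD describe the spread of the observations, while the standard error describes how precisely the mean is estimated, so they answer different questions and lead to different displays.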

Displaying uncertainty is described as signal suppression.

Example: distributions

Code

load("data/melbtemp.rda")
melbtemp_2019 <- melbtemp |>
  filter(year == 2019)
  
d1 <- ggplot(melbtemp_2019, aes(x=month, y=temp)) +
  geom_quasirandom() + 
  stat_summary(geom="point", fun="median", 
    colour="red", size=3) +
  xlab("") + ylab("Temp (C)") +
  ggtitle("A. ggbeeswarm::geom_quasirandom")
  
library(ggforce)
d2 <- ggplot(melbtemp_2019, aes(x=month, y=temp)) +
  geom_violin(fill = "#6F7C4D", colour=NA, alpha=0.7) +
  geom_sina() +
  xlab("") + ylab("Temp (C)") +
  ggtitle("B. geom_violin + ggforce::geom_sina")

library(ggridges)
d3 <- ggplot(melbtemp_2019, aes(x=temp, y=month)) +
  geom_density_ridges(scale = 1.5, 
                      quantile_lines = TRUE,
                      quantiles = 2,
                      fill = "#6F7C4D") +
  xlab("Temp (C)") + ylab("") + 
  theme_ridges() +
  ggtitle("C. ggridges::geom_density_ridges")

library(ggdist)
d4 <- ggplot(melbtemp_2019, aes(x=temp, y=month)) +
  stat_halfeye(fill="#6F7C4D", alpha=0.7) +
  geom_point(pch = "|", size = 2,
    position = position_nudge(y = -.15)) +
  xlab("Temp (C)") + ylab("") +
  ggtitle("D. ggdist::stat_halfeye")

library(patchwork)
# Layout design: A and B stacked on the left at double width, C and D at full height.
lout <- "
AACD
BBCD
"
d1 + d2 + d3 + d4 + plot_layout(design=lout)

What is the main element?
How is the uncertainty displayed?
What is the uncertainty pattern? And then, what would be an appropriate representation?
What are the key features of the data that we need to preserve in a plot?
Why the different aspect ratios?

Exercise 1

Continue working with the Melbourne temperature data:

Decide the appropriate information about the uncertainty to include in the display.
Play with different options on your choice of display to make various displays. Aim to have three different designs.
Is there a winner, or several roughly equally good displays?

Time: 10 minutes

How this affects perception

Why is it “signal suppression”?

Panels: model, and data, and SE, individual fits, which is honest?

Plotting the fitted model alone

Code

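library(brolgar)     # wages data, add_n_obs(), sample_n_keys()
library(lme4)        # lmer()
library(colorspace)  # scale_colour_discrete_divergingx()
data("wages")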
wages_fct <- wages |>
  select(id, ln_wages, xp, high_grade) |>
  mutate(high_grade = factor(high_grade))
wages_fit <- lmer(ln_wages~xp + high_grade + (xp|id), data=wages_fct)
wages_fe <- summary(wages_fit)$coefficients
wages_fe_d <- tibble(xp = rep(seq(0, 13, 1), 7),
     high_grade = rep(c(6, 7, 8, 9, 10, 11, 12), rep(14, 7))) |>
  mutate(ln_wages = case_when(
    high_grade == 6 ~ wages_fe[1,1] + wages_fe[2,1]*xp,
    high_grade == 7 ~ wages_fe[1,1] + wages_fe[3,1] + wages_fe[2,1]*xp,
    high_grade == 8 ~ wages_fe[1,1] + wages_fe[4,1]  + wages_fe[2,1]*xp,
    high_grade == 9 ~ wages_fe[1,1] + wages_fe[5,1]  + wages_fe[2,1]*xp,
    high_grade == 10 ~ wages_fe[1,1] + wages_fe[6,1]  + wages_fe[2,1]*xp,
    high_grade == 11 ~ wages_fe[1,1] + wages_fe[7,1]  + wages_fe[2,1]*xp,
    high_grade == 12 ~ wages_fe[1,1] + wages_fe[8,1]  + wages_fe[2,1]*xp)
  ) |>
  mutate(high_grade = factor(high_grade))
ggplot(wages_fe_d) + 
  geom_line(aes(x=xp, 
                y=ln_wages, 
                colour=high_grade, 
                group=high_grade)) +
  scale_colour_discrete_divergingx(palette = "Zissou 1") +
  labs(x="Experience (years)", y="Wages (ln)", colour="Grade")

Adding the data makes the model look less impressive

Code

ggplot() + 
  geom_line(data=wages_fct, aes(x=xp, y=ln_wages, group=id), alpha=0.1) +
  geom_line(data=wages_fe_d, aes(x=xp, 
                y=ln_wages, 
                colour=high_grade, 
                group=high_grade)) +
  scale_colour_discrete_divergingx(palette = "Zissou 1") +
  labs(x="Experience (years)", y="Wages (ln)", colour="Grade")

Standard errors as produced by the model fit.

Code

wages_fe_d <- wages_fe_d |>
  mutate(ln_wages_l = case_when(
    high_grade == 6 ~ wages_fe[1,1] - wages_fe[1,2] +
                      (wages_fe[2,1]-wages_fe[2,2])*xp ,
    high_grade == 7 ~ wages_fe[1,1] - wages_fe[1,2] + 
                      wages_fe[3,1] - wages_fe[3,2] + 
                      (wages_fe[2,1]-wages_fe[2,2])*xp,
    high_grade == 8 ~ wages_fe[1,1] - wages_fe[1,2] + 
                      wages_fe[4,1] - wages_fe[4,2] + 
                      (wages_fe[2,1]-wages_fe[2,2])*xp,
    high_grade == 9 ~ wages_fe[1,1] - wages_fe[1,2] + 
                      wages_fe[5,1] - wages_fe[5,2] + 
                      (wages_fe[2,1]-wages_fe[2,2])*xp,
    high_grade == 10 ~ wages_fe[1,1] - wages_fe[1,2] + 
                      wages_fe[6,1] - wages_fe[6,2] + 
                      (wages_fe[2,1]-wages_fe[2,2])*xp,
    high_grade == 11 ~ wages_fe[1,1] - wages_fe[1,2] + 
                      wages_fe[7,1] - wages_fe[7,2] + 
                      (wages_fe[2,1]-wages_fe[2,2])*xp,
    high_grade == 12 ~ wages_fe[1,1] - wages_fe[1,2] + 
                      wages_fe[8,1] - wages_fe[8,2] + 
                      (wages_fe[2,1]-wages_fe[2,2])*xp)
  ) |>
  mutate(ln_wages_u = case_when(
    high_grade == 6 ~ wages_fe[1,1] + wages_fe[1,2] +
                      (wages_fe[2,1]+wages_fe[2,2])*xp ,
    high_grade == 7 ~ wages_fe[1,1] + wages_fe[1,2] + 
                      wages_fe[3,1] + wages_fe[3,2] + 
                      (wages_fe[2,1]+wages_fe[2,2])*xp,
    high_grade == 8 ~ wages_fe[1,1] + wages_fe[1,2] + 
                      wages_fe[4,1] + wages_fe[4,2] + 
                      (wages_fe[2,1]+wages_fe[2,2])*xp,
    high_grade == 9 ~ wages_fe[1,1] + wages_fe[1,2] + 
                      wages_fe[5,1] + wages_fe[5,2] + 
                      (wages_fe[2,1]+wages_fe[2,2])*xp,
    high_grade == 10 ~ wages_fe[1,1] + wages_fe[1,2] + 
                      wages_fe[6,1] + wages_fe[6,2] + 
                      (wages_fe[2,1]+wages_fe[2,2])*xp,
    high_grade == 11 ~ wages_fe[1,1] + wages_fe[1,2] + 
                      wages_fe[7,1] + wages_fe[7,2] + 
                      (wages_fe[2,1]+wages_fe[2,2])*xp,
    high_grade == 12 ~ wages_fe[1,1] + wages_fe[1,2] + 
                      wages_fe[8,1] + wages_fe[8,2] + 
                      (wages_fe[2,1]+wages_fe[2,2])*xp)
  ) 

ggplot() + 
  geom_ribbon(data=wages_fe_d, 
            aes(x=xp, 
                ymin=ln_wages_l,
                ymax=ln_wages_u,
                fill=high_grade), colour=NA, alpha=0.1) +
  geom_line(data=wages_fe_d, 
            aes(x=xp, 
                y=ln_wages, 
                colour=high_grade, 
                group=high_grade)) +
  scale_fill_discrete_divergingx(palette = "Zissou 1") +
  scale_colour_discrete_divergingx(palette = "Zissou 1") +
  labs(x="Experience (years)", y="Wages (ln)", colour="Grade", fill="Grade")

Examine individual fits. Too many to show all - sample multiple times to get through them all.

Code

library(modelr)  # add_predictions(), add_residuals()
wages_full <- wages_fct |>
  add_predictions(wages_fit, 
                  var = "pred") |>
  add_residuals(wages_fit, 
                var = "res")
set.seed(1222)
wages_full |> add_n_obs() |> filter(n_obs > 4) |>
  sample_n_keys(size = 12) |>
  ggplot() + 
  geom_line(aes(x = xp, y = pred, group = id, 
             colour = factor(id))) + 
  geom_point(aes(x = xp, y = ln_wages, 
                 colour = factor(id))) + 
  facet_wrap(~id, ncol=4)  +
  scale_x_continuous("Experience (years)", 
    breaks=seq(0, 12, 2)) +
  ylab("Wages (ln)") +
  theme(aspect.ratio = 0.6, legend.position = "none")

Plotting the model alone suggests higher education increases wages (by XXX), particularly if the student has 12 years (full high school).
Adding the individual-level observations shows the large individual-to-individual variability, which swamps the education differences. It also suggests that individual wages might not increase linearly.
Adding a representation of the standard error shows the “overlap” between education levels: the estimate for the wage increase with 12 years of education is not substantially better than with 10 years, but it is better than with 6 years.
Displaying the observed and fitted values separately by individual examines individual-level uncertainty.

Note that this particular model fit assumes that wages increase linearly with years of experience.
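
The band code above adds and subtracts each coefficient's standard error separately. A commonly used alternative computes the standard error of the fitted value itself, which also accounts for the covariances between the coefficient estimates: for a design row x, this is sqrt(x' V x), where V is the estimated covariance matrix of the coefficients. The sketch below illustrates the calculation on a simulated `lm()` fit, not the `lmer()` model, so the data and variable names are hypothetical.

```r
# Simulated data (hypothetical), standing in for the wages example.
set.seed(42)
n <- 200
d <- data.frame(
  xp    = runif(n, 0, 13),
  grade = factor(sample(c(6, 9, 12), n, replace = TRUE))
)
d$ln_wages <- 1.5 + 0.05 * d$xp + 0.1 * (d$grade == "12") + rnorm(n, sd = 0.3)

fit <- lm(ln_wages ~ xp + grade, data = d)

# Values at which to draw the fitted lines.
new_d <- expand.grid(
  xp    = 0:13,
  grade = factor(c(6, 9, 12), levels = levels(d$grade))
)
X <- model.matrix(~ xp + grade, data = new_d)   # design rows for new_d
V <- vcov(fit)                                  # covariance of coefficient estimates

new_d$fit   <- as.vector(X %*% coef(fit))
new_d$se    <- as.vector(sqrt(rowSums((X %*% V) * X)))   # sqrt(x' V x) per row
new_d$lower <- new_d$fit - new_d$se
new_d$upper <- new_d$fit + new_d$se

# Cross-check against the standard errors reported by predict().
p <- predict(fit, newdata = new_d, se.fit = TRUE)
all.equal(new_d$se, as.vector(p$se.fit))
```

For a mixed model, `vcov()` applied to the fitted object returns the covariance matrix of the fixed effects, so the same matrix calculation could be adapted, although that step is not shown here.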

Perception and uncertainty

There can be multiple levels of uncertainty:
- The wages example had two levels, fixed effects/demographic strata and random effects/individuals, and there are multiple ways to represent these.
- A classical example is simple regression, where we have confidence intervals for the model estimates and also prediction intervals for new observations.
Uncertainty is not just another variable, but some measures might be handled this way, and encoded in the data as an extra column, e.g. standard error, IQR. This can be used in the plot to incorporate uncertainty.
You need to decide what is the appropriate uncertainty measure for the problem.
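
The difference between the two kinds of interval in simple regression can be seen with `predict()`. This is a minimal sketch on simulated data; the linear relationship and noise level are assumptions made for illustration.

```r
# Hypothetical data: a linear relationship with normal noise.
set.seed(7)
x <- runif(60, 0, 10)
y <- 2 + 0.8 * x + rnorm(60, sd = 1.5)
fit <- lm(y ~ x)

new_x <- data.frame(x = c(2, 5, 8))
conf_int <- predict(fit, newdata = new_x, interval = "confidence", level = 0.95)
pred_int <- predict(fit, newdata = new_x, interval = "prediction", level = 0.95)

# The prediction interval is the wider of the two, because it adds the
# residual variation to the uncertainty in the fitted line.
widths <- data.frame(
  x          = new_x$x,
  conf_width = conf_int[, "upr"] - conf_int[, "lwr"],
  pred_width = pred_int[, "upr"] - pred_int[, "lwr"]
)
widths
```

Which interval to draw depends on the question: the confidence interval describes uncertainty about the fitted line, while the prediction interval describes where a new observation is likely to fall.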

Including uncertainty representation adds to the complexity, and multiple elements on a plot can interfere with the perception of either one.

Perceptual principles

Hierarchy of mappings:
1. Position - common scale (BEST): scatterplot, barchart
2. Position - nonaligned scale: side-by-side boxplot, stacked barchart
3. Length, direction, angle: piechart, rose plot, gauge plot, donut, wind direction map, starplot
4. Area: treemap, bubble chart, mosaicplot
5. Volume, curvature: chernoff face
6. Shading, color (WORST): choropleth map
Pre-attentive: noticed before you even realise it.
Color palettes: qualitative, sequential, diverging.
Proximity: Place elements for primary comparison close together.
Change blindness: When focus is interrupted differences may not be noticed.

Applying these to making plots

Make primary information prominent: use pre-attentive elements like colour, make larger.
Place items to compare first near each other, main thing + uncertainty.
When using colour to map a two-ended continuous variable, use a diverging palette; conversely, for a low-to-high continuous variable such as confidence, use a sequential palette.
Axes and labels should be faint in the background, to be examined only when needing to interpret or quantify patterns.
Order items, categories by a numerical rank.
For separate plots, use common axes to make comparison easier.
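
The palette advice above can be tried with base R's `hcl.colors()`. The palette names "Greens" and "Blue-Red" are simply the ones chosen for this sketch; `hcl.pals()` lists the alternatives.

```r
# Sequential palette: for a low-to-high variable such as a confidence level.
seq_pal <- hcl.colors(5, palette = "Greens")

# Diverging palette: for a two-ended variable with a meaningful midpoint.
div_pal <- hcl.colors(7, palette = "Blue-Red")

seq_pal
div_pal

# Names of other built-in palettes of each type.
head(hcl.pals("sequential"))
head(hcl.pals("diverging"))
```

The resulting colour vectors can be supplied to a manual colour or fill scale in a plot.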

Accessibility is harder to achieve when adding uncertainty to plots, because more information is provided.

What is the focus: the line, or the confidence bands?
Does the uncertainty representation obscure the signal?
Reading colour is already hard. Spatial plots where a 2D palette is used for signal and uncertainty are hard to read. Mapping a variable to saturation hurts accessibility. Stay tuned for session 2!

Application to uncertainty visualisation

Make the main thing pre-attentive. It will draw attention to the primary information, such as trend, median, mean, estimate, first.
Map the uncertainty to a lower level of attentiveness, so it can be considered secondarily.
Uncertainty needs to be placed with the main thing in order to make comparison. The purpose of including uncertainty representation is to compare pattern in the main thing relative to variation remaining.

Source: http://dx.doi.org/10.4172/2153-0602.1000139

At the core of statistical reasoning is asking “compared to what?”

Example

What principle(s) is this using?

What principle(s) is this using?

Exercise 2

Take a look at the plots made in Exercise 1. Ask yourself whether the main thing is pre-attentive, and whether the uncertainty representation sits a little in the background.
Tinker with the design of one plot to make it better fit this principle.

Inspirations
ggbeeswarm
violin + sina

Make the median more prominent, relative to the observations.

Code

ggplot(melbtemp_2019, aes(x=month, y=temp)) +
  geom_quasirandom() + 
  stat_summary(geom="point", fun="median", 
    colour="red", size=5) +
  xlab("") + ylab("Temp (C)") +
  ggtitle("A. ggbeeswarm::geom_quasirandom")

Add a representation of the median, fade the density by making it light grey.

Code

ggplot(melbtemp_2019, aes(x=month, y=temp)) +
  geom_violin(fill = "grey80", colour=NA, alpha=0.7) +
  geom_sina() +
  stat_summary(geom="point", fun="median", 
    colour="red", size=5) +
  xlab("") + ylab("Temp (C)") +
  ggtitle("B. geom_violin + ggforce::geom_sina")

Time: 10 minutes

Common measures and representations

Barcharts

Melbourne pedestrian counts at Southern Cross Station, Sunday Aug 31, 2025.

Panels: Bars, bar+CI, CI, Gradient, Ribbon, LineRibbon, Loess

Code

load("data/ped_Aug2025.rda")
ped_sc <- ped |>
  filter(Sensor == "Southern Cross Station") |>
  filter(Date == ymd("2025-08-31")) |>
  group_by(Time) |>
  summarise(Count = sum(Count), .groups = "drop") |>
  mutate(se = sqrt(Count))
b1 <- ggplot(ped_sc, aes(x=Time, y=Count)) +
  geom_col(fill = "#20794D") +
  xlab("Hour")
b2 <- ggplot(ped_sc, aes(x=Time, y=Count)) +
  geom_col(fill = "#b9ca4a") +
  geom_errorbar(aes(ymin = Count - se, ymax = Count + se),
    width=0.5, colour="#20794D") +
  xlab("Hour")
b3 <- ggplot(ped_sc, aes(x=Time,
    ydist=distributional::dist_normal(Count, se))) +
  stat_pointinterval(colour = "#20794D") +
  xlab("Hour") + ylab("Count")
b4 <- ggplot(ped_sc, aes(x=Time,
    ydist=distributional::dist_normal(Count, se))) +
  stat_gradientinterval(colour = NA, fill="#20794D", 
    .width=1) +
  geom_line(aes(x=Time, y=Count), colour="#20794D") +
  xlab("Hour") + ylab("Count")
b5 <- ggplot(ped_sc, aes(x=Time, y=Count)) +
  geom_ribbon(aes(ymin = Count - qnorm(0.975)*se, 
                  ymax = Count + qnorm(0.975)*se),
    fill = "#b9ca4a") +
  geom_line(colour="#20794D") +
  xlab("Hour")
ped_sc_ci <- ped_sc |>
  mutate(l50 = Count - qnorm(0.75)*se,
         u50 = Count + qnorm(0.75)*se,
         l80 = Count - qnorm(0.9)*se,
         u80 = Count + qnorm(0.9)*se,
         l99 = Count - qnorm(0.995)*se,
         u99 = Count + qnorm(0.995)*se
  ) |>
  pivot_longer(cols=l50:u99, names_to = "intprob", 
    values_to="value") |>
  mutate(bound = str_sub(intprob, 1, 1),
       prob = str_sub(intprob, 2, 3)) |>
  select(Time, Count, se, prob, bound, value) |>
  pivot_wider(names_from = bound, values_from = value)
b6 <- ggplot(ped_sc_ci, aes(x=Time, y=Count)) +
  geom_lineribbon(aes(ymin = l, ymax = u, fill = prob)) +
  labs(x="Hour", fill="Confidence") +
  scale_fill_discrete_sequential(palette = "Greens", 
    rev=FALSE, n=5)  
b7 <- ggplot(ped_sc, aes(x=Time, y=Count)) +
  geom_smooth(colour = "#20794D", fill = "#b9ca4a") +
  geom_point(colour = "#20794D") +
  xlab("Hour")
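
The code above takes the standard error of each count to be its square root, which corresponds to treating the counts as Poisson (variance equal to the mean), and builds intervals from normal quantiles. The sketch below applies the same arithmetic to a few made-up counts, which are assumptions and not values from the pedestrian data.

```r
# Hypothetical hourly counts, chosen only to span a wide range.
count <- c(12, 85, 430, 1250)
se    <- sqrt(count)            # Poisson assumption: variance equals the mean

z95 <- qnorm(0.975)             # about 1.96
intervals <- data.frame(
  count,
  se,
  lower     = count - z95 * se,
  upper     = count + z95 * se,
  rel_width = 2 * z95 * se / count   # interval width relative to the count
)
intervals
```

The absolute interval width grows with the count, but the relative width shrinks, so quiet hours carry proportionally more uncertainty than busy ones.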

Broader applicability

The approaches used for the barcharts here apply equally to many other types of display.

Error bars
Error bands
Gradients
Multiple samples, such as bootstrap or simulation (stay tuned!)
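
As a taste of the last item, a bootstrap can be written in a few lines of base R. This is a minimal sketch on a simulated skewed sample, assumed for illustration, using the median as the statistic and a simple percentile interval.

```r
# Hypothetical skewed sample, not tutorial data.
set.seed(31)
obs <- rgamma(40, shape = 2, rate = 0.2)

# Resample the data with replacement and recompute the statistic each time.
n_boot <- 2000
boot_medians <- replicate(n_boot, median(sample(obs, replace = TRUE)))

estimate <- median(obs)
interval <- quantile(boot_medians, c(0.025, 0.975))   # percentile interval

estimate
interval
```

The vector `boot_medians` is itself a set of multiple samples of the statistic, so it can be drawn directly, for example as faint points or lines behind the estimate, instead of being reduced to an interval.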

Exercise 3

This is a forecast of business trips to Melbourne for 2018-2022, based on data from 1998-2017. How is the uncertainty represented?

Code

library(fpp3)  # loads tsibble (tourism data), fable (ETS) and fabletools
tourism_melb <- tourism %>%
  filter(Region == "Melbourne", Purpose == "Business")
fit <- tourism_melb %>%
  model(
    ets = ETS(Trips ~ trend("A"))
  )
fc <- fit |>
  forecast(h = "5 years")
fc |>
  autoplot(tourism_melb) +
    theme(aspect.ratio = 0.6)

In what ways do you think this representation is better (or worse) than the previous representation?

Code

fc_b <- fit |>
  forecast(h = "5 years", bootstrap = TRUE)
fc_b_samples <- fit |>
  generate(h = 20, times = 50, bootstrap = TRUE)
fc_b_samples |>
  ggplot() +
    geom_line(aes(x = Quarter, y = .sim, group = .rep),
      colour = "#027EB6", alpha=0.1) +
    geom_line(data=fc_b, aes(x = Quarter, y = .mean), 
      colour = "#027EB6") +
    autolayer(tourism_melb, Trips) +
    ylab("Trips") +
    theme(aspect.ratio = 0.6)

Deciding which is the best design

Evaluation criteria

Uncertainty visualisation should:

Reinforce signals that are important
Hide signals that are primarily noise

to enable better decisions and conclusions or, dare we say, inference.

Objective testing procedure:

Simulate data to control pattern being examined.
Use the lineup protocol to determine whether a reader can distinguish the structured plot from the unstructured null plots.

We’ll illustrate this with a simplified pedestrian count plot example.

Try this

Time: 1 minute

Panels: start, 1a, 1b, 2a, 2b

Process

group a or b
pick the plot that is different
record choice, and why that plot looks different

Lineup experiment

Panels: Simulated data setup, Testing protocol, Code

Time: 8 instead of 24
Count: pattern is large wave
Variance model: uniform ranging from small to large. Poisson used to generate count.

Code

nlev <- 8
x <- 1:nlev
y_large <- 2 + (x-nlev/2)^2
y_large <- max(y_large) - y_large
y_small <- y_large
#y_small[5] <- y_small[5] + 5
#y_small[6] <- y_small[6] - 5
#y_small[7] <- y_small[7] - 6
struct_d <- tibble(x, y=y_small)

Plot the structured data among a field of null plots, in two ways. (Can be more than two.)
Ask two sets of independent observers to pick the plot that is most different.
Compute ratio of number of detections to observers, for each design.
The design with the highest ratio is the more powerful design.

Source: https://doi.org/10.1109/TVCG.2012.230
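
The last two steps of the protocol can be sketched as follows. The observer counts are invented for illustration, and the chance calculation is a simple approximation that assumes observers choose independently and that, without any real structure, each of the m panels is equally likely to be picked.

```r
m <- 12   # panels per lineup, matching the simulation code

# Hypothetical results: observers per design and how many picked the data plot.
results <- data.frame(
  design     = c("ribbon", "pointrange"),
  observers  = c(20, 20),
  detections = c(11, 5)
)

# Detection ratio: the design with the higher ratio is the more powerful one.
results$ratio <- results$detections / results$observers

# Probability of at least this many detections if every observer guessed,
# picking each of the m panels with probability 1/m.
results$p_chance <- pbinom(results$detections - 1, results$observers,
                           prob = 1 / m, lower.tail = FALSE)
results
```

A small `p_chance` suggests the structured plot really was visible in that design, while comparing `ratio` across designs indicates which representation made it easier to see.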

set.seed(130)
noise_param <- 1
noise_scale <- 30
m <- 12 # multiple of 3
noise1 <- tibble(x=rep(x, m), 
                 y=rpois(nlev*m, noise_param)+1, 
                 .sample=rep(1:m, rep(nlev, m))
)
set.seed(258)
pos <- sample(1:m, 1)
noise1_lup <- noise1 |>
  mutate(yn = y*noise_scale) |>
  mutate(y = if_else(.sample == pos, struct_d$y+y, y)) |>
  group_by(.sample) |>
  mutate(y = round((y-min(y))/(max(y)-min(y))*90+10, 0))
l1 <- ggplot(noise1_lup, aes(x=x, y=y,
    ymin = y-yn, ymax = y+yn)) + 
  geom_ribbon(colour=NA, fill="#b9ca4a") +
  geom_line(colour="#20794D") +
  facet_wrap(~.sample, ncol=m/3, scales="free_y") +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        panel.grid.major = element_blank(),
        aspect.ratio = 0.6)
l2 <- ggplot(noise1_lup, aes(x=x, y=y,
         ymin = y-yn, ymax = y+yn)) +
  geom_pointrange(colour = "#20794D") +
  geom_point(colour="#b9ca4a", size=3) +
  facet_wrap(~.sample, ncol=m/3, scales="free_y") +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        panel.grid.major = element_blank(),
        aspect.ratio = 0.6)
noise_scale <- 10
set.seed(432)
pos <- sample(1:m, 1)
noise1_lup <- noise1 |>
  mutate(yn = y*noise_scale) |>
  mutate(y = if_else(.sample == pos, struct_d$y+y, y)) |>
  group_by(.sample) |>
  mutate(y = round((y-min(y))/(max(y)-min(y))*90+10, 0))
l3 <- ggplot(noise1_lup, aes(x=x, y=y,
    ymin = y-yn, ymax = y+yn)) + 
  geom_ribbon(colour=NA, fill="#b9ca4a") +
  geom_line(colour="#20794D") +
  facet_wrap(~.sample, ncol=m/3, scales="free_y") +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        panel.grid.major = element_blank(),
        aspect.ratio = 0.6)
l4 <- ggplot(noise1_lup, aes(x=x, y=y,
         ymin = y-yn, ymax = y+yn)) +
  geom_pointrange(colour = "#20794D") +
  geom_point(colour="#b9ca4a", size=3) +
  facet_wrap(~.sample, ncol=m/3, scales="free_y") +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        panel.grid.major = element_blank(),
        aspect.ratio = 0.6)

End of session 1

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
