WOMBAT 2025 Tutorial

Visualising Uncertainty

Harriet Mason, Dianne Cook

Department of Econometrics and Business Statistics

Session 2
Diving deeper into uncertainty visualisation using examples in spatial data

Introduction to Spatial Visualisation

Why focus on spatial visualisations?

  • The spatial case is a good example to work through because the aesthetics available to express estimates are limited
  • Maps take up most of the usual aesthetics by being a representation of space
    • position, size, shape, etc. all have an implicit meaning in the mapping context
    • colour/fill is usually the only aesthetic we have left
    • can also get creative and do glyph maps (we will ignore this variation here)


Spatial data takes up the two dimensions of the display, leaving colour and fill to map uncertainty.

Example: Citizen Scientist Data

  • There have been reports of a strange spatial pattern in the temperatures of Iowa
  • We get some citizen scientists to measure data at their home and report back
  • To maintain anonymity, we are only provided with the county of each scientist
scientistID  county_name       recorded_temp
#74991       Lyon County       21.1
#22780       Dubuque County    28.9
#55325       Crawford County   26.4
#46379       Allamakee County  27.1
#84259       Jones County      34.2

990 citizen scientists participated

We could just plot the data…

  • We can often plot the longitude and latitude directly using geom_point.
  • While this approach has a low barrier to entry, it lacks the contextual information that gives our plots meaning.
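
As a minimal sketch of the direct approach, assuming point-level coordinates were available: the citizen scientist data only records the county, so the `lon` and `lat` values below are invented purely for illustration, and `pts` is a hypothetical data frame.

```r
library(ggplot2)

# Hypothetical point-level data: the coordinates are invented for illustration
# and are not part of toy_temp, which only records the county.
pts <- data.frame(
  lon  = c(-96.2, -90.9, -95.4, -91.4, -91.1),
  lat  = c(43.4, 42.5, 42.0, 43.3, 42.1),
  temp = c(21.1, 28.9, 26.4, 27.1, 34.2)
)

# Longitude and latitude are treated as ordinary x and y positions
p <- ggplot(pts, aes(x = lon, y = lat, colour = temp)) +
  geom_point(size = 3)
p
```

The points float on a blank panel with no borders or landmarks, which is the missing context described above.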

Spatial features objects

  • sf objects are differentiated from our usual tibble by the additional metadata in the coordinate reference system (CRS)
    • Assumptions about the shape of the planet (geodetic datum)
    • Distortions we will/won’t accept when drawing the map (map projection)
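
A small sketch of how the CRS travels with an sf object, assuming the sf package is installed. The point location is an arbitrary spot in central Iowa chosen for illustration, not something taken from the tutorial data.

```r
library(sf)

# A single point in longitude/latitude (WGS 84, EPSG:4326)
pt_wgs84 <- st_sfc(st_point(c(-93.6, 42.0)), crs = 4326)

# The CRS metadata is stored with the geometry
st_crs(pt_wgs84)$epsg

# Re-project to Web Mercator (EPSG:3857); the coordinates are now in metres
pt_merc <- st_transform(pt_wgs84, 3857)
st_coordinates(pt_merc)
```

The same location has very different coordinate values under the two systems, which is why the CRS has to be known before anything is drawn.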

Can you see the spatial trend?

Estimate the county mean

  • Visualising an estimate, such as a mean, can make trends easier to see
    • This estimate has a standard error, but we rarely integrate it into our plots
Code
# Calculate County Mean
toy_temp |> 
  group_by(county_name) |>
  summarise(temp_mean = mean(recorded_temp),
            temp_se = sd(recorded_temp)/sqrt(n()),
            n = n()) 
county_name       temp_mean  temp_se  n
Adair County      29.7       0.907    6
Adams County      29.6       1.003    9
Allamakee County  26.3       0.550    8
Appanoose County  22.8       0.831    14
Audubon County    27.6       0.893    11
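
To make the standard error calculation concrete, here is a base R sketch for a single county. The `readings` vector is hypothetical (invented values, not rows of toy_temp); the formula is the same one used in the summarise() call.

```r
# Hypothetical readings from one county (values invented for illustration)
readings <- c(27.5, 30.1, 31.4, 28.8, 29.9, 30.5)

n         <- length(readings)
temp_mean <- mean(readings)
temp_se   <- sd(readings) / sqrt(n)   # standard error of the mean

c(mean = temp_mean, se = temp_se, n = n)
```

Because the standard error divides by the square root of n, counties with fewer scientists will tend to have noisier means.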

Can you see the trend now?

Common Map Visualisations

  • Usually spatial data is shown using a choropleth map
    • Choropleth maps shade an area according to our statistic of interest
  • We can also weight the area by another variable, such as sample size
    • e.g. cartograms and bubble plots
  • Does this plot follow the principles of signal suppression?
  • Is there a noticeable difference in the way these plots convey signal?

But what if the error is worse?

  • It turns out the citizen scientists are using some pretty old tools.
  • The standard error could be up to three times what we would estimate with our usual assumptions.
  • We want to see both versions of the data so we can see the impact of this change.
county_name       temp_mean  low_se  high_se
Adair County      29.7       0.907   2.72
Adams County      29.6       1.003   3.01
Allamakee County  26.3       0.550   1.65
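
The high_se values in the table are consistent with multiplying low_se by three. Assuming that is how the worst case is built, a base R sketch using the three rows shown (`county_se` is a name introduced here for illustration):

```r
# The three rows shown in the table
county_se <- data.frame(
  county_name = c("Adair County", "Adams County", "Allamakee County"),
  temp_mean   = c(29.7, 29.6, 26.3),
  low_se      = c(0.907, 1.003, 0.550)
)

# Worst case considered here: three times the usual standard error
county_se$high_se <- 3 * county_se$low_se
county_se
```

Keeping both columns side by side lets the same map code be run once per standard error for comparison.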

Spot the difference

One of these maps was made using the estimate with the high standard error; the other was made using the estimate with the low standard error. Can you tell which is which?

Exercise 1

Get comfortable working with spatial features yourself

  • The citizen scientist data is available in the ggdibbler package as toy_temp.
  • Go through the steps we worked through thus far in the tutorial, and get your estimate and standard error variables.
  • Experiment with changing the standard error to see whether it has an impact on the plot.
  • You can also try adding county names, changing the colour scale, and making other aesthetic changes to the map.

Approaches to Spatial Uncertainty

Looking at current approaches

We are going to go through some uncertainty visualisation methods and assess them on the signal suppression criteria.

Remember, uncertainty visualisation should

  1. Reinforce justified signals
    We want to trust the results

  2. Hide signals that are primarily noise
    We don’t want to see something that isn’t there

Solution 1: add an axis for uncertainty (Bivar)

Questions to think about…

  • Is there a visible difference between the high and low uncertainty cases?
  • Is the trend still visible in the high uncertainty case?
  • Is this approach accessible?
    • Is colour a simple 3D space?
    • Can everyone see changes in saturation?
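
One way to picture a bivariate colour scale is to let the estimate pick a colour and let the uncertainty pull that colour towards a neutral grey. The sketch below uses only base R; `bivariate_colour` and the choice of grey are assumptions made for illustration and are not taken from any particular mapping package.

```r
# Value scale: light yellow to dark red (endpoints chosen for illustration)
value_ramp <- colorRamp(c("#FFFFB2", "#BD0026"))

# Hypothetical helper: value and uncertainty are both expected in [0, 1]
bivariate_colour <- function(value, uncertainty, neutral = c(190, 190, 190)) {
  base      <- value_ramp(value)   # one row of RGB (0-255) per value
  neutral_m <- matrix(neutral, nrow = nrow(base), ncol = 3, byrow = TRUE)
  # Linear mix: uncertainty 0 keeps the base colour, 1 gives the neutral grey
  mixed <- (1 - uncertainty) * base + uncertainty * neutral_m
  rgb(round(mixed), maxColorValue = 255)
}

# Same estimate, increasing uncertainty
bivariate_colour(value = 0.8, uncertainty = c(0, 0.5, 1))
```

The mix is done in RGB for simplicity; equal steps in RGB are not equal perceptual steps, which is part of why the accessibility questions above matter.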

Solution 2: blend the colours together (VSUP)

Questions to think about…

  • Is there a visible difference between the high and low uncertainty cases?
  • Is the trend still visible in the high uncertainty case?
  • Is this approach accessible?
  • At what level of uncertainty should you blend two colours together?
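
The blending question can be made concrete with a sketch loosely in the spirit of a value-suppressing palette: the more uncertain an estimate is, the fewer distinct value bins it is allowed to fall into. `vsup_bin` is a hypothetical helper written for this illustration, and the thresholds it uses are arbitrary choices, not those of any package.

```r
# Hypothetical helper: value and uncertainty are both expected in [0, 1]
vsup_bin <- function(value, uncertainty, max_levels = 4) {
  # Level 0 is the most certain row; max_levels - 1 is the least certain
  level  <- pmin(floor(uncertainty * max_levels), max_levels - 1)
  # Number of value bins halves with each level: 8, 4, 2, 1
  n_bins <- 2^(max_levels - 1 - level)
  bin    <- pmin(floor(value * n_bins), n_bins - 1)
  # Return the midpoint of the bin on the value scale
  (bin + 0.5) / n_bins
}

# Two different estimates stay distinct when certain...
vsup_bin(c(0.1, 0.9), uncertainty = 0)
# ...and collapse to the same bin when very uncertain
vsup_bin(c(0.1, 0.9), uncertainty = 0.99)
```

The cut points between levels are exactly the decision the last question asks about: someone has to choose them by hand.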

Solution 3: simulate a sample (pixel)

Questions to think about…

  • Is there a visible difference between the high and low uncertainty cases?
  • Is the trend still visible in the high uncertainty case?
  • Is this approach accessible?
  • What has replaced the manual colour blending in this approach?
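
A base R sketch of the idea behind a pixel map, assuming each pixel inside a county is filled with an independent draw from that county's sampling distribution. This illustrates the concept only and is not a description of how ggdibbler is implemented; the numbers are the Adair County values from the tables earlier in the session.

```r
set.seed(2025)  # the draws are random; fix the seed so the sketch is repeatable

temp_mean <- 29.7
low_se    <- 0.907
high_se   <- 2.72

# Each "pixel" gets its own draw from the sampling distribution
n_pixels    <- 1000
low_pixels  <- rnorm(n_pixels, mean = temp_mean, sd = low_se)
high_pixels <- rnorm(n_pixels, mean = temp_mean, sd = high_se)

# The spread of the fill values grows with the standard error
c(low = sd(low_pixels), high = sd(high_pixels))
```

With a larger standard error the pixels within a county vary more, so neighbouring counties become harder to tell apart without any colours being blended by hand.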

Making a Pixel Map with ggdibbler

A ggdibbler example

distributional and ggdibbler ecosystem

Using distributional

  • Expressing an estimate as a random variable using distributional makes the uncertainty more explicit
    • We will make a new variable, temp_dist that contains the sampling distribution of temp_mean
    • Note: distributional uses standard deviation, but prints the variance.
Code
# Calculate County Distribution
toy_temp_mean <- toy_temp_mean |>
  mutate(temp_dist = dist_normal(temp_mean, temp_se))
county_name       temp_mean  temp_se  temp_dist    n
Adair County      29.7       0.907    N(30, 0.82)  6
Adams County      29.6       1.003    N(30, 1)     9
Allamakee County  26.3       0.550    N(26, 0.3)   8
Appanoose County  22.8       0.831    N(23, 0.69)  14
Audubon County    27.6       0.893    N(28, 0.8)   11
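
The note about standard deviation versus variance can be checked directly. A short sketch, assuming the distributional package is installed, using the Adair County values from the table:

```r
library(distributional)

# The second argument is the standard deviation (here, the standard error)
adair <- dist_normal(29.7, 0.907)

format(adair)      # the printed label reports the variance
variance(adair)    # equals 0.907^2
parameters(adair)  # the stored parameters are mu and sigma
```

This is why the temp_dist column shows a second number smaller than temp_se whenever the standard error is below 1.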

Comparing ggplot to ggdibbler

ggplot code

toy_temp_mean |> 
  ggplot() + 
  geom_sf(aes(geometry = county_geometry,
              fill = temp_mean))

ggdibbler code

toy_temp_mean |> 
  ggplot() + 
  geom_sf_sample(aes(geometry = county_geometry,
                     fill = temp_dist))

High and low variance comparison with ggdibbler

Remember, the plot is random

Exercise 2

Here is the code that was used to make the cartogram from earlier in the session. Using distributional and the standard error provided in toy_temp_mean, can you make a ggdibbler version of this plot?

Code
# Packages used below
library(sf)
library(cartogram)
library(ggplot2)

# Transform to the CRS needed for the cartogram transformation (EPSG:3857)
toy_merc <- st_transform(toy_temp_mean, 3857)
# Cartogram transformation, weighting each county by its sample size n
toy_cartogram <- cartogram_cont(toy_merc, weight = "n", itermax = 5)
# Transform back to the original CRS
toy_cartogram <- st_transform(toy_cartogram, st_crs(toy_temp_mean))

# Plot cartogram using ggplot2
ggplot(toy_cartogram) +
  geom_sf(aes(fill = temp_mean), linewidth = 0, alpha = 0.9) +
  theme_minimal() +
  scale_fill_distiller(palette = "YlOrRd", direction= 1) +
  xlab("Longitude") +
  ylab("Latitude") +
  labs(fill = "Temperature") +
  theme(aspect.ratio=0.7)

Code
# The only change to the data is adding the distribution.
# dist_normal() takes the standard deviation, so the standard error is passed as-is.
toy_cartogram |>
  mutate(temp_dist = dist_normal(temp_mean, temp_se)) |>
  ggplot() +
  geom_sf_sample(aes(geometry = county_geometry,
                     fill = temp_dist), linewidth = 0) +
  geom_sf(aes(geometry = county_geometry), fill = NA, colour = "white") +
  theme_minimal() +
  scale_fill_distiller(palette = "YlOrRd", direction= 1) +
  xlab("Longitude") +
  ylab("Latitude") +
  labs(fill = "Temperature") +
  theme(aspect.ratio=0.7)

Where to learn more

End of session 2

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.