WOMBAT 2025 Tutorial

Visualising Uncertainty

Harriet Mason, Dianne Cook

Department of Econometrics and Business Statistics

Session 2
Diving deeper into uncertainty visualisation using examples in spatial data

Introduction to Spatial Visualisation

Why focus on spatial visualisations?

The spatial case is a good example to work through because the aesthetics left for expressing estimates are limited
Maps use up most of the usual aesthetics simply by representing space
- position, size, shape, etc. all have an implicit meaning in the mapping context
- colour/fill is usually the only aesthetic we have left
- we can also get creative and use glyph maps (we will ignore this variation here)

Spatial data takes up the two dimensions of the display, leaving colour and fill to map uncertainty.

Example: Citizen Scientist Data

There have been reports of a strange spatial pattern in the temperatures of Iowa
We get some citizen scientists to measure data at their home and report back
To maintain anonymity, we are only provided with the county of each scientist

scientistID	county_name	recorded_temp
#74991	Lyon County	21.1
#22780	Dubuque County	28.9
#55325	Crawford County	26.4
#46379	Allamakee County	27.1
#84259	Jones County	34.2

990 citizen scientists participated

We could just plot the data…

We can often plot longitude and latitude directly using geom_point.
While this approach has a low barrier to entry, it lacks the contextual information that gives our plots meaning.
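
A minimal sketch of this direct approach, assuming each record carried a longitude and latitude. The `obs` data frame and its coordinates are hypothetical stand-ins made up for illustration (the tutorial data only records the county):

```r
library(ggplot2)

# Hypothetical point records: the coordinates are invented for illustration
obs <- data.frame(
  longitude     = c(-96.2, -90.9, -95.4, -91.4, -91.1),
  latitude      = c(43.4, 42.5, 42.0, 43.3, 42.1),
  recorded_temp = c(21.1, 28.9, 26.4, 27.1, 34.2)
)

# Plot the raw coordinates; nothing on the plot tells the reader where these points are
p_points <- ggplot(obs, aes(x = longitude, y = latitude, colour = recorded_temp)) +
  geom_point(size = 3)
p_points
```

Without county boundaries the points float in an empty panel, which is the missing context the slide refers to.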

Spatial features objects

sf objects differ from our usual tibble through the additional metadata in the coordinate reference system (CRS)
- Assumptions about the shape of the planet (geodetic datum)
- Distortions we will/won’t accept when drawing the map (map projection)
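
A small sketch of how the CRS travels with an object, using the `sf` package. The point coordinates are approximate and chosen only for illustration:

```r
library(sf)

# A single point in central Iowa (approximate coordinates, for illustration only)
pt_wgs84 <- st_sfc(st_point(c(-93.6, 41.6)), crs = 4326)  # EPSG:4326, longitude/latitude on WGS 84

# The CRS is stored with the object as metadata
st_crs(pt_wgs84)$epsg

# Re-project to Web Mercator (EPSG:3857); the coordinates are now in metres
pt_merc <- st_transform(pt_wgs84, 3857)
st_coordinates(pt_merc)
```

The same location gets very different numbers under the two systems, which is why the CRS has to be carried alongside the geometry.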

Can you see the spatial trend?

Estimate the county mean

Visualising an estimate, such as a mean, can make trends easier to see
- This estimate has a standard error, but we rarely integrate it into our plots

Code

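# Calculate County Mean (stored as toy_temp_mean for the later steps)
library(dplyr)
library(ggdibbler) # provides toy_temp

toy_temp_mean <- toy_temp |> 
  group_by(county_name) |>
  summarise(temp_mean = mean(recorded_temp),
            temp_se = sd(recorded_temp) / sqrt(n()),
            n = n())
toy_temp_mean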

county_name	temp_mean	temp_se	n
Adair County	29.7	0.907	6
Adams County	29.6	1.003	9
Allamakee County	26.3	0.550	8
Appanoose County	22.8	0.831	14
Audubon County	27.6	0.893	11

Can you see the trend now?

Common Map Visualisations

Spatial data is usually shown using a choropleth map
- Choropleth maps shade an area according to our statistic of interest
We can also weight the area by another variable, such as sample size
- e.g. cartograms and bubble plots
Does this plot follow the principles of signal suppression?
Is there a noticeable difference in the way these plots convey signal?

[Figure panels: Choropleth Map | Cartogram | Bubble Map]

But what if the error is worse?

It turns out the citizen scientists are using some pretty old tools.
The standard error could be up to three times what we would estimate with our usual assumptions.
We want to see both versions of the data so we can see the impact of this change.

county_name	temp_mean	low_se	high_se
Adair County	29.7	0.907	2.72
Adams County	29.6	1.003	3.01
Allamakee County	26.3	0.550	1.65

Spot the difference

One of these maps was made using the estimate with the high standard error, the other was made with the estimate from the low standard error. Can you tell which is which?

Exercise 1

Get comfortable working with spatial features yourself

The citizen scientist data is available in the ggdibbler package as toy_temp.
Go through the steps we worked through thus far in the tutorial, and get your estimate and standard error variables.
Experiment with changing the standard error to see if there is an impact on the plot.
You can also try adding county names, changing the colour scale, and making other aesthetic changes to the map.

Approaches to Spatial Uncertainty

Looking at current approaches

We are going to go through some uncertainty visualisation methods and assess them on the signal suppression criteria.

Remember, uncertainty visualisation should

Reinforce justified signals
We want to trust the results
Hide signals that are primarily noise
We don’t want to see something that isn’t there

Solution 1: add an axis for uncertainty (Bivar)

Questions to think about…

Is there a visible difference between the high and low uncertainty cases?
Is the trend still visible in the high uncertainty case?
Is this approach accessible?
- Is colour a simple 3D space?
- Can everyone see changes in saturation?
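
To make the "extra axis" idea concrete, here is a rough base R sketch of the binning behind a bivariate map: the estimate and its standard error are each cut into three classes, and every combination gets its own cell in a 3 x 3 colour legend. The equal-width classes and the `bi_class` labels are our assumptions, not the scheme of any particular package:

```r
# County estimates copied from the summary table earlier in the session
county <- data.frame(
  county_name = c("Adair County", "Adams County", "Allamakee County",
                  "Appanoose County", "Audubon County"),
  temp_mean   = c(29.7, 29.6, 26.3, 22.8, 27.6),
  temp_se     = c(0.907, 1.003, 0.550, 0.831, 0.893)
)

# Cut each variable into three equal-width classes (1 = low, 3 = high)
county$mean_class <- cut(county$temp_mean, breaks = 3, labels = FALSE)
county$se_class   <- cut(county$temp_se,   breaks = 3, labels = FALSE)

# One label per cell of the 3 x 3 legend, written as "mean class-error class"
county$bi_class <- paste(county$mean_class, county$se_class, sep = "-")
county
```

Each `bi_class` value would then be mapped to one colour, so the reader has to decode two variables from a single fill.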

Solution 2: blend the colours together (VSUP)

Questions to think about…

Is there a visible difference between the high and low uncertainty cases?
Is the trend still visible in the high uncertainty case?
Is this approach accessible?
At what level of uncertainty should you blend two colours together?
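
The blending question can be explored with a simplified sketch of the value-suppressing idea described by Correll, Moritz, Heer (2018): the number of distinguishable value bins halves at each uncertainty level (8, 4, 2, 1). The function name `vsup_bin`, the value range, and the `max_se` threshold are all hypothetical choices for illustration, not the behaviour of Vizumap or any other package:

```r
# Assign each estimate to a (uncertainty level, value bin) cell of a VSUP-style legend
vsup_bin <- function(estimate, se, value_range = c(20, 35), max_se = 3) {
  # Uncertainty level: 0 = most certain, 3 = least certain
  u_level <- pmin(3, floor(4 * se / max_se))
  # The number of value bins halves at each level: 8, 4, 2, 1
  n_bins <- 2^(3 - u_level)
  # Position of the estimate within the value range, clamped to 0-1
  v <- (estimate - value_range[1]) / diff(value_range)
  v <- pmin(pmax(v, 0), 1)
  data.frame(u_level = u_level, value_bin = pmin(n_bins, floor(v * n_bins) + 1))
}

temp_mean <- c(29.7, 29.6, 26.3)
bins_low  <- vsup_bin(temp_mean, se = c(0.907, 1.003, 0.550))
bins_high <- vsup_bin(temp_mean, se = c(2.72, 3.01, 1.65))
bins_low
bins_high
```

With the low standard errors the counties fall into different value bins; with the high standard errors they all collapse into the first bin of their row, so the differences in the estimate are suppressed. Where that collapse happens depends entirely on the chosen `max_se`.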

Solution 3: simulate a sample (pixel)

Questions to think about…

Is there a visible difference between the high and low uncertainty cases?
Is the trend still visible in the high uncertainty case?
Is this approach accessible?
What has replaced the manual colour blending in this approach?
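
A rough base R sketch of the sampling idea: instead of one colour per county, each pixel inside a county gets its own draw from the county's sampling distribution. The number of pixels and the helper `draw_pixels` are hypothetical, and this is a sketch of the concept rather than of how any package implements it:

```r
set.seed(2025)  # the draws are random, so fix the seed for a repeatable example

temp_mean <- c(Adair = 29.7, Adams = 29.6, Allamakee = 26.3)
low_se    <- c(0.907, 1.003, 0.550)
high_se   <- 3 * low_se

n_pixels <- 50  # hypothetical number of pixels drawn inside each county

# One row per county, one column per pixel: each pixel gets its own draw
draw_pixels <- function(mu, se) {
  t(mapply(function(m, s) rnorm(n_pixels, mean = m, sd = s), mu, se))
}
pixels_low  <- draw_pixels(temp_mean, low_se)
pixels_high <- draw_pixels(temp_mean, high_se)

# Spread of the pixel values within each county
apply(pixels_low, 1, sd)
apply(pixels_high, 1, sd)
```

When the standard error is large, the pixel values within a county vary widely and neighbouring counties become hard to tell apart by eye.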

Popular R packages

ggdibbler
- Data: distribution from distributional
- Maps: pixel map, other non-spatial maps
Vizumap
- Data: Depends on the plot. Can take a distribution as a quantile (q) function (pixel), or an estimate and standard error as two variables (bivar/VSUP and glyph)
- Maps: Bivar/VSUP, pixel, glyph
biscale
- Data: Estimate and standard error as two variables
- Maps: Bivar/VSUP map

Making a Pixel Map with `ggdibbler`

A `ggdibbler` example

`distributional` and `ggdibbler` ecosystem

Using distributional

Expressing an estimate as a random variable using distributional makes the uncertainty more explicit
- We will make a new variable, temp_dist that contains the sampling distribution of temp_mean
- Note: distributional uses standard deviation, but prints the variance.
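
A quick check of the note about standard deviation versus variance, using the Adair County values from the earlier table:

```r
library(distributional)

# Sampling distribution of the Adair County mean: mean 29.7, standard error 0.907
adair <- dist_normal(mu = 29.7, sigma = 0.907)

adair                  # printed with the variance as its second parameter
variance(adair)        # 0.907^2
sqrt(variance(adair))  # back to the standard error we supplied
```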

Code

# Calculate County Distribution
library(distributional) # dist_normal()

toy_temp_mean <- toy_temp_mean |>
  mutate(temp_dist = dist_normal(temp_mean, temp_se))

county_name	temp_mean	temp_se	temp_dist	n
Adair County	29.7	0.907	N(30, 0.82)	6
Adams County	29.6	1.003	N(30, 1)	9
Allamakee County	26.3	0.550	N(26, 0.3)	8
Appanoose County	22.8	0.831	N(23, 0.69)	14
Audubon County	27.6	0.893	N(28, 0.8)	11

Comparing `ggplot` to `ggdibbler`

`ggplot` code

toy_temp_mean |> 
  ggplot() + 
  geom_sf(aes(geometry = county_geometry,
                     fill=temp_mean))

`ggdibbler` code

toy_temp_mean |> 
  ggplot() + 
  geom_sf_sample(aes(geometry = county_geometry,
                     fill = temp_dist))

High and low variance comparison with `ggdibbler`

Remember, the plot is random

Exercise 2

Here is the code that was used to make the cartogram from earlier in the session. Using distributional and the standard error provided in toy_temp_mean, can you make a ggdibbler version of this plot?

Cartogram with no uncertainty

Code

library(sf)        # st_transform(), st_crs()
library(cartogram) # cartogram_cont()
library(ggplot2)

# Transform to the CRS needed for the cartogram transformation (EPSG:3857)
toy_merc <- st_transform(toy_temp_mean, 3857)
# Cartogram transformation, weighting each county by its sample size n
toy_cartogram <- cartogram_cont(toy_merc, weight = "n", itermax = 5)
# Transform back to the original CRS
toy_cartogram <- st_transform(toy_cartogram, st_crs(toy_temp_mean))

# Plot cartogram using ggplot2
ggplot(toy_cartogram) +
  geom_sf(aes(fill = temp_mean), linewidth = 0, alpha = 0.9) +
  theme_minimal() +
  scale_fill_distiller(palette = "YlOrRd", direction= 1) +
  xlab("Longitude") +
  ylab("Latitude") +
  labs(fill = "Temperature") +
  theme(aspect.ratio=0.7)

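Check: a `ggdibbler` version of the cartogram

Code

# only change to the data is the distribution
# dist_normal() takes the standard deviation, so pass temp_se directly
toy_cartogram |>
  mutate(temp_dist = dist_normal(temp_mean, temp_se)) |>
  ggplot() +
  geom_sf_sample(aes(geometry = county_geometry, 
                     fill = temp_dist), linewidth = 0) +
  geom_sf(aes(geometry = county_geometry), fill = NA, colour = "white") +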
  theme_minimal() +
  scale_fill_distiller(palette = "YlOrRd", direction= 1) +
  xlab("Longitude") +
  ylab("Latitude") +
  labs(fill = "Temperature") +
  theme(aspect.ratio=0.7)

Where to learn more

Wilke, C. Fundamentals of Data Visualization
Healy, K. Data visualization
Kay, M. ggdist: Visualizations of distributions and uncertainty
Wilke, C. O. ggridges
Pedersen, T. ggforce
Hofmann, Follett, Majumder, Cook (2012) Graphical Tests for Power Comparison of Competing Designs
nullabor package
Spiegelhalter, D. (2017), Risk and uncertainty communication
Correll, Moritz, Heer (2018) Value-Suppressing Uncertainty Palettes
Hullman et al (2018) Imagining Replications
Lucchesi, Kuhnert, Wikle Vizumap
Distributional package
ggdibbler package
Kinkeldey, MacEachren, Riveiro, Schiewe (2017) Evaluating the effect of visually represented geodata uncertainty on decision-making
Mason, Cook, Goodwin, Tanaka, VanderPlas (2024) The Noisy Work of Uncertainty Visualisation Research

End of session 2

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
