Estimate model uncertainty

Usage

bootstrap_error(
  x,
  n_samples = 50,
  sd_x_ppm = NULL,
  n_replicates = NULL,
  sample_from = "gasdata",
  rep_cols = NULL
)

# S3 method for class 'cfp_altres'
bootstrap_error(
  x,
  n_samples = 50,
  sd_x_ppm = NULL,
  n_replicates = NULL,
  sample_from = "gasdata",
  rep_cols = NULL
)

# S3 method for class 'cfp_dat'
bootstrap_error(
  x,
  n_samples = 50,
  sd_x_ppm = NULL,
  n_replicates = NULL,
  sample_from = "gasdata",
  rep_cols = NULL
)

# S3 method for class 'cfp_fgmod'
bootstrap_error(
  x,
  n_samples = 50,
  sd_x_ppm = NULL,
  n_replicates = NULL,
  sample_from = "gasdata",
  rep_cols = NULL
)

# S3 method for class 'cfp_pfmod'
bootstrap_error(
  x,
  n_samples = 50,
  sd_x_ppm = NULL,
  n_replicates = NULL,
  sample_from = "gasdata",
  rep_cols = NULL
)

make_bootstrap_model(
  x,
  n_samples = 50,
  sd_x_ppm = NULL,
  n_replicates = NULL,
  sample_from = "gasdata",
  rep_cols = NULL
)

# S3 method for class 'cfp_pfmod'
make_bootstrap_model(
  x,
  n_samples = 50,
  sd_x_ppm = NULL,
  n_replicates = NULL,
  sample_from = "gasdata",
  rep_cols = NULL
)

calculate_bootstrap_error(x, y)

# S3 method for class 'cfp_pfmod'
calculate_bootstrap_error(x, y)

Arguments

x

A cfp_pfres model result from a call to pro_flux().

n_samples

The number of samples to take in the bootstrapping.

sd_x_ppm

An optional estimate of the standard deviation of x_ppm. Can be one of

  • a single value applied equally to all observations

  • a data.frame with a column of the same name that maps a value to every observation depth. See depth_structure() for an easy way to create it.

  • a column of the same name already present in x$gasdata.

n_replicates

The number of replicates to be generated if sd_x_ppm is set.

sample_from

The dataset from which to draw the bootstrap samples. One of 'gasdata', 'soilphys' or 'both'.

rep_cols

The id_cols that represent repetitions. If removed, the repetitions in soilphys of each profile must match in their structure exactly.

y

The result of the bootstrap model.

Value

x with added columns DELTA_flux and DELTA_prod as an estimate of the error of the corresponding columns, in the same units.

General procedure

bootstrap_error() is mostly a wrapper around two functions that can also be run separately.

In make_bootstrap_model(), for sample_from = "gasdata", the concentration data in gasdata is resampled n_samples times for every depth and profile. This is done by randomly sampling the observations at each depth with replacement, keeping the number of observations unchanged. If rep_cols are given, these columns are removed from the id_cols and the resulting profiles are combined into one.
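The resampling step for gasdata can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy data, the helper resample_once() and the column layout are assumptions made for illustration only.

```r
# Toy gas data (made-up values): one profile, two depths, three observations each
gasdata <- data.frame(
  prof_id = 1,
  depth   = rep(c(0, -10), each = 3),
  x_ppm   = c(410, 420, 415, 900, 950, 925)
)

set.seed(42)
n_samples <- 4

# Hypothetical helper: resample x_ppm within each depth with replacement,
# keeping the number of observations per depth unchanged.
resample_once <- function(df, id) {
  parts <- lapply(split(df, df$depth), function(d) {
    d$x_ppm <- d$x_ppm[sample.int(nrow(d), replace = TRUE)]
    d
  })
  out <- do.call(rbind, parts)
  out$bootstrap_id <- id  # marks which bootstrap sample a row belongs to
  out
}

# Stack n_samples resampled copies of the profile
boot <- do.call(rbind, lapply(seq_len(n_samples), resample_once, df = gasdata))
rownames(boot) <- NULL

# Every bootstrap sample still has three observations per depth
table(boot$bootstrap_id, boot$depth)
```

Each value of bootstrap_id then identifies one resampled profile, in the same spirit as the bootstrap_id column described for the package.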

For sample_from = "soilphys", the soilphys data is combined using the rep_cols as repetitions. For every remaining profile and depth, one observation is chosen across all repetitions for each of the n_samples. sample_from = "both" applies both methods above. Each newly sampled profile is identifiable by the added bootstrap_id column, which is also added to id_cols.

After this new model is run, the bootstrap error is calculated in calculate_bootstrap_error(). This is the standard deviation of the production and flux parameters across all bootstrapped model runs. It is calculated for each profile and layer of the original model, or for each distinct profile in the new model without rep_cols. These are returned together with the mean values of prod, flux and F0 across all runs in the PROFLUX data.frame and can thereby be extracted by efflux() and production().
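As a rough illustration of this error calculation, the following base R sketch computes the standard deviation and mean of a flux column across bootstrap runs. The data frame runs, its values and its layer labels are made up for the example; only the column name DELTA_flux is taken from the Value section, and the package's internal code may differ.

```r
# Made-up flux results from three bootstrap runs, two layers of one profile
runs <- data.frame(
  bootstrap_id = rep(1:3, each = 2),
  prof_id      = 1,
  layer        = rep(c("top", "bottom"), times = 3),
  flux         = c(1.0, 2.0, 1.2, 2.4, 0.8, 1.6)
)

# Standard deviation across runs, per profile and layer
delta <- aggregate(flux ~ prof_id + layer, data = runs, FUN = sd)
names(delta)[names(delta) == "flux"] <- "DELTA_flux"

# Mean across runs, per profile and layer
means <- aggregate(flux ~ prof_id + layer, data = runs, FUN = mean)

# One row per profile and layer: mean flux and its bootstrap error
result <- merge(means, delta, by = c("prof_id", "layer"))
result
```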

Artificial observations in gasdata

If there are not enough observations per depth, e.g. because there is only one measurement per depth, it is possible to create artificial observations by providing n_replicates and sd_x_ppm. Here, every depth of every profile is first averaged to its mean (redundant if there is only one observation). Then, a random dataset of n_replicates observations is generated that is normally distributed around the mean with a standard deviation (in ppm) of sd_x_ppm. These observations are then resampled as described above. Note that this error should be representative of the sampling error in the field and not the measurement error of the measurement device, which is much lower.
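The generation of artificial observations can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the toy gasdata and the object names are hypothetical, and the values of n_replicates and sd_x_ppm are borrowed from the example at the end of this page.

```r
# Toy gas data (made-up values): one observation per depth
gasdata <- data.frame(
  prof_id = 1,
  depth   = c(0, -10),
  x_ppm   = c(415, 925)
)

set.seed(1)
n_replicates <- 5
sd_x_ppm     <- 25

# Step 1: average every depth of every profile to its mean
depth_means <- aggregate(x_ppm ~ prof_id + depth, data = gasdata, FUN = mean)

# Step 2: repeat each depth n_replicates times and draw normally
# distributed values around the mean with standard deviation sd_x_ppm
artificial <- depth_means[rep(seq_len(nrow(depth_means)), each = n_replicates), ]
artificial$x_ppm <- rnorm(nrow(artificial), mean = artificial$x_ppm, sd = sd_x_ppm)
rownames(artificial) <- NULL

# Five artificial observations per depth, ready to be resampled
table(artificial$depth)
```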

Examples

if (FALSE) { # interactive()
PROFLUX <- pro_flux(ConFluxPro::base_dat)
PROFLUX_BSE <- bootstrap_error(PROFLUX)
efflux(PROFLUX_BSE)

PROFLUX_BSE <- bootstrap_error(PROFLUX, n_replicates = 5, sd_x_ppm = 25)
efflux(PROFLUX_BSE)
}