This vignette introduces the central data classes in
ConFluxPro and shows how to combine all necessary data into a
single cfp_dat()
object. We start by explaining the concept
of a ‘profile’ and how soil data is structured in the package. We then
introduce three essential classes cfp_gasdata()
,
cfp_soilphys()
and cfp_layers_map()
that frame
different data types within ConFluxPro. Finally, we combine
these objects into a single cfp_dat()
object that holds all
information needed to start modeling soil gas fluxes.
Soil data and profiles
What is a soil profile?
A profile in ConFluxPro is a distinct observation of the
vertical profile of a soil. For example, this can mean the concentration
profile of CO2 at a specific time point, or the vertical
distribution of soil moisture. Profiles are identified by their
id_cols
, column names that together uniquely identify each
profile. This can be whatever combination of columns happened to be in
your data, but will typically include a datetime column to identify the
time point or a column that identifies the site if you measured at
multiple locations. Consider the soilphys
dataset included
in the package.
library(dplyr)
data("soilphys", package = "ConFluxPro")
soilphys %>%
relocate(site, Date, gas) %>%
head()
#> site Date gas TPS a b depth upper lower SWC t
#> 1 site_a 2021-01-01 CO2 0.38 1.2 1.5 -80 -60 -100 0.16135277 -0.789903514
#> 2 site_a 2021-02-01 CO2 0.38 1.2 1.5 -80 -60 -100 0.12392743 0.005814771
#> 3 site_a 2021-03-01 CO2 0.38 1.2 1.5 -80 -60 -100 0.12227893 2.059937865
#> 4 site_a 2021-04-01 CO2 0.38 1.2 1.5 -80 -60 -100 0.14464345 4.725217894
#> 5 site_a 2021-05-01 CO2 0.38 1.2 1.5 -80 -60 -100 0.11446588 8.825682240
#> 6 site_a 2021-06-01 CO2 0.38 1.2 1.5 -80 -60 -100 0.09537391 10.377836383
#> p AFPS DSD0 D0 DS c_air
#> 1 1013 0.2186472 0.1226866 1.373780e-05 1.685444e-06 44.73588
#> 2 1013 0.2560726 0.1554984 1.381053e-05 2.147515e-06 44.60556
#> 3 1013 0.2577211 0.1570023 1.399908e-05 2.197889e-06 44.27263
#> 4 1013 0.2353565 0.1370158 1.424543e-05 1.951850e-06 43.84798
#> 5 1013 0.2655341 0.1641957 1.462819e-05 2.401886e-06 43.21035
#> 6 1013 0.2846261 0.1822189 1.477426e-05 2.692149e-06 42.97380
This dataset contains information on different soil physical
parameters. We will come back to that later. For now, notice that there
are the site
and Date
column. All combinations
identify a unique profile, namely twelve dates for two sites
"site_a"
and "site_b"
for a total of 24
profiles. We can create a cfp_profile()
object from this
dataset. The only thing this does is to add the information which
columns identify the profiles to the dataset with the
id_cols
argument. Internally, the number of distinct
profiles is calculated as the unique combinations of the
id_cols
and can be accessed by
n_profiles()
.
soilphys <-
cfp_profile(soilphys, id_cols = c("site", "Date"))
n_profiles(soilphys)
#> [1] 24
The concept of profiles and id_cols
applies to almost
all objects created in ConFluxPro. You can always see what
id_cols
an object has by calling the function
cfp_id_cols()
.
cfp_id_cols(soilphys)
#> [1] "site" "Date"
Soil data structure
Within a soil profile, parameters change with depth. In ConFluxPro we
differentiate between two types of data. Point data have
distinct values at a precise depth
in the soil, such as the
soil gas concentration. For layered data , the soil is
subdivided into layers that are delimited by their upper
and lower
boundary. Layered data must be without gaps or
overlaps, so that the entire profile can be described by distinct
values. depth
, upper
and lower
are in centimeters and a higher value indicates the upward direction.
Both positive and negative values are allowed. This means you could set
the mineral/organic soil boundary to have depth = 0
,
describe the mineral soil with increasingly negative values of
depth
and have the organic layer on top have positive
values.
We will now present the three main data types that are needed to model gas fluxes.
cfp_gasdata()
: Gas concentration data
Soil gas concentration data is stored in a data.frame
.
The data needs to have at least the columns gas
,
depth
and x_ppm
, as well as any
id_cols
that identify the soil profiles. The
gas
is a character()
that identifies which gas
was measured, so for example "CO2"
. depth
is
as explained above and x_ppm
is the mixing ratio in ppm.
Consider the example data that ships with the package:
data("gasdata", package = "ConFluxPro")
head(gasdata)
#> # A tibble: 6 × 5
#> site Date depth x_ppm gas
#> <chr> <date> <dbl> <dbl> <chr>
#> 1 site_a 2021-01-01 5 423. CO2
#> 2 site_a 2021-01-01 0 552. CO2
#> 3 site_a 2021-01-01 0 556. CO2
#> 4 site_a 2021-01-01 0 557. CO2
#> 5 site_a 2021-01-01 -10 716. CO2
#> 6 site_a 2021-01-01 -10 713. CO2
Because this data.frame
already follows the requirements
defined above, we can create a cfp_gasdata()
object right
away by providing the necessary id_cols
. The function
checks, whether the data.frame
meets the criteria and
returns an object of class cfp_gasdata
.
# create working cfp_gasdata object
my_gasdata <- cfp_gasdata(gasdata, id_cols = c("site", "Date"))
#>
#> added 'gas' to id_cols
head(my_gasdata)
#>
#> A cfp_gasdata object
#> id_cols: site Date gas
#> 1 unique profiles
#>
#> site Date depth x_ppm gas
#> 1 site_a 2021-01-01 5 422.7419 CO2
#> 2 site_a 2021-01-01 0 552.1412 CO2
#> 3 site_a 2021-01-01 0 555.8525 CO2
#> 4 site_a 2021-01-01 0 556.9315 CO2
#> 5 site_a 2021-01-01 -10 716.4256 CO2
#> 6 site_a 2021-01-01 -10 713.3633 CO2
# won't work, because depth is missing
gasdata_broken <- gasdata %>% select(!depth)
cfp_gasdata(gasdata_broken, id_cols = c("site", "Date"))
#>
#> added 'gas' to id_cols
#> Error in validate_cfp_gasdata(x): data.frame lacks obligatory columns
The data is in a long format, following the tidy data
principles, where each observation is in a new row. If your data is
not in this format, you can use the tidyr::pivot_longer()
and other functions in the tidyr
package. Notice that
cfp_gasdata()
automatically added the gas
column to the id_cols
. gas
is always a
required column in ConFluxPro, so that many functions will automatically
add it to the id_cols
if its present and will fail if its
missing.
cfp_soilphys()
: Soil physical data
Create a cfp_soilphys()
object
Soil physical data includes everything else needed to describe the
gas transport characteristics of the soil. This data is split into
homogeneous layers with constant values for each layer and profile. At
the minimum, cfp_soilphys()
requires two parameters:
DS
is the gas-specific apparent diffusion coefficient of
the soil in m2 s-1 and c_air
is the
number density of the soil air in mol m-3. If these
parameters are provided, a valid cfp_soilphys
object can be
created. This function ensures that each profile is correctly layered
without overlaps or gaps and that the required columns are present.
data("soilphys", package = "ConFluxPro")
my_soilphys <- cfp_soilphys(soilphys, id_cols = c("site", "Date"))
#>
#> added 'gas' to id_cols
Derive parameters with complete_soilphys()
Often DS
and c_air
are derived from other
parameters that are easier to measure. This then requires a bit more
work to get a working cfp_soilphys()
object. Typically, you
will need the following information:
Measured parameters:
-
TPS
: total porosity as a volume fraction. -
SWC
: soil water content as a volume fraction. -
t
: air temperature in °C. -
p
: air pressure in hPa.
Derived parameters:
c_air
: air number density, derived fromt
andp
with ideal gas law.-
AFPS
: air-filled pore space, defined asAFPS = TPS - SWC
. -
DSD0
: relative diffusivity of the soil, unit-less. Derived fromAFPS
with transfer function. -
D0
: gas-specific free-air diffusion coefficient in m2 s-1. Derived fromt
andp
with empiric function. DS
: gas-specific apparent diffusion coefficient in m2 s-1, derived asDS = D0 * DSD0
.
These relationships are invoked within the
complete_soilphys()
function. Consider the provided example
data: It already has every parameter we need. Now let’s remove the
derived parameters and run complete_soilphys()
again. We
have to provide a formula for the calculation of the relative
diffusivity DSD0
. In our case, we have fitting parameters
a
and b
from a Currie-style model that
calculates as DSD0 = a * AFPS ^ b
. We provide this formula
as a character()
to the function:
soilphys_measured_only <-
soilphys %>%
select(!all_of(c("AFPS", "DSD0", "D0", "DS", "c_air")))
head(soilphys_measured_only)
#> site TPS a b depth upper lower Date SWC t p
#> 1 site_a 0.38 1.2 1.5 -80 -60 -100 2021-01-01 0.16135277 -0.789903514 1013
#> 2 site_a 0.38 1.2 1.5 -80 -60 -100 2021-02-01 0.12392743 0.005814771 1013
#> 3 site_a 0.38 1.2 1.5 -80 -60 -100 2021-03-01 0.12227893 2.059937865 1013
#> 4 site_a 0.38 1.2 1.5 -80 -60 -100 2021-04-01 0.14464345 4.725217894 1013
#> 5 site_a 0.38 1.2 1.5 -80 -60 -100 2021-05-01 0.11446588 8.825682240 1013
#> 6 site_a 0.38 1.2 1.5 -80 -60 -100 2021-06-01 0.09537391 10.377836383 1013
#> gas
#> 1 CO2
#> 2 CO2
#> 3 CO2
#> 4 CO2
#> 5 CO2
#> 6 CO2
my_soilphys_completed <-
complete_soilphys(soilphys_measured_only, DSD0_formula = "a*AFPS^b")
#> The following columns were added: AFPS DSD0 D0 DS c_air
#> The following columns were overwritten:
head(my_soilphys_completed)
#> site TPS a b depth upper lower Date SWC t p
#> 1 site_a 0.38 1.2 1.5 -80 -60 -100 2021-01-01 0.16135277 -0.789903514 1013
#> 2 site_a 0.38 1.2 1.5 -80 -60 -100 2021-02-01 0.12392743 0.005814771 1013
#> 3 site_a 0.38 1.2 1.5 -80 -60 -100 2021-03-01 0.12227893 2.059937865 1013
#> 4 site_a 0.38 1.2 1.5 -80 -60 -100 2021-04-01 0.14464345 4.725217894 1013
#> 5 site_a 0.38 1.2 1.5 -80 -60 -100 2021-05-01 0.11446588 8.825682240 1013
#> 6 site_a 0.38 1.2 1.5 -80 -60 -100 2021-06-01 0.09537391 10.377836383 1013
#> gas AFPS DSD0 D0 DS c_air
#> 1 CO2 0.2186472 0.1226866 1.373780e-05 1.685444e-06 44.73588
#> 2 CO2 0.2560726 0.1554984 1.381053e-05 2.147515e-06 44.60556
#> 3 CO2 0.2577211 0.1570023 1.399908e-05 2.197889e-06 44.27263
#> 4 CO2 0.2353565 0.1370158 1.424543e-05 1.951850e-06 43.84798
#> 5 CO2 0.2655341 0.1641957 1.462819e-05 2.401886e-06 43.21035
#> 6 CO2 0.2846261 0.1822189 1.477426e-05 2.692149e-06 42.97380
Notice the gas
column. This is essential because
different gases have specific diffusion coefficients. We can also remove
the gas
column and provide a list of gases to
complete_soilphys()
:
soilphys_measured_only %>%
select(!gas) %>%
complete_soilphys(DSD0_formula = "a*AFPS^b", gases = c("CO2", "CH4")) %>%
head()
#> The following columns were added: AFPS DSD0 D0 DS c_air
#> The following columns were overwritten:
#> site TPS a b depth upper lower Date SWC t p
#> 1 site_a 0.38 1.2 1.5 -80 -60 -100 2021-01-01 0.1613528 -0.789903514 1013
#> 2 site_a 0.38 1.2 1.5 -80 -60 -100 2021-01-01 0.1613528 -0.789903514 1013
#> 3 site_a 0.38 1.2 1.5 -80 -60 -100 2021-02-01 0.1239274 0.005814771 1013
#> 4 site_a 0.38 1.2 1.5 -80 -60 -100 2021-02-01 0.1239274 0.005814771 1013
#> 5 site_a 0.38 1.2 1.5 -80 -60 -100 2021-03-01 0.1222789 2.059937865 1013
#> 6 site_a 0.38 1.2 1.5 -80 -60 -100 2021-03-01 0.1222789 2.059937865 1013
#> AFPS DSD0 gas D0 DS c_air
#> 1 0.2186472 0.1226866 CO2 1.373780e-05 1.685444e-06 44.73588
#> 2 0.2186472 0.1226866 CH4 1.941795e-05 2.382322e-06 44.73588
#> 3 0.2560726 0.1554984 CO2 1.381053e-05 2.147515e-06 44.60556
#> 4 0.2560726 0.1554984 CH4 1.952075e-05 3.035445e-06 44.60556
#> 5 0.2577211 0.1570023 CO2 1.399908e-05 2.197889e-06 44.27263
#> 6 0.2577211 0.1570023 CH4 1.978726e-05 3.106646e-06 44.27263
This replicates each row of the data.frame
and
calculates distinct values for each gas.
From measurements to layered data:
discretize_depth()
As seen above, a cfp_soilphys
object often requires the
combination of many different parameters. These parameters are typically
measured in very different ways and may be present in different data
formats. This presents a hurdle, as cfp_soilphys()
requires
these datasets to all be present in the same layered format. The
function discretize_depth()
helps with this by providing a
unified way to interpolate different observations to the same layered
structure. The basic idea is that you provide a vector of depths that
represent the layer boundaries. For
layers, you would need
depths. Let’s say we measured a temperature profile at three depths: 0,
-30 and -100 cm:
temperature_profile <-
data.frame(depth = c(0, -30, -100),
t = c(25, 20, 16))
If we want to map this to 10 cm-thick layers, we can simply define
the layer boundaries and use discretize_depth()
to linearly
interpolate the observations:
boundaries <- seq(0, -100, by = -10)
discretize_depth(temperature_profile,
param = "t",
method = "linear",
depth_target = boundaries)
#>
#> A cfp_layered_profile object
#> id_cols:
#> 1 unique profiles
#>
#> t depth upper lower
#> 1 16.28571 -95 -90 -100
#> 2 16.85714 -85 -80 -90
#> 3 17.42857 -75 -70 -80
#> 4 18.00000 -65 -60 -70
#> 5 18.57143 -55 -50 -60
#> 6 19.14286 -45 -40 -50
#> 7 19.71429 -35 -30 -40
#> 8 20.83333 -25 -20 -30
#> 9 22.50000 -15 -10 -20
#> 10 24.16667 -5 0 -10
The function can handle profile data and treats every profile
distinctly. So if we adapt our example data to two profiles identified
by a date
column, the interpolation works in the same way,
if we provide the necessary id_cols
.
temperature_profile_dates <-
data.frame(date_id = rep(c("date_1", "date_2"), each = 3),
depth = rep(c(0, -30, -100), times = 2),
t = c(25, 20, 16, 15, 10, 8)) %>%
cfp_profile(id_cols = c("date_id"))
discretize_depth(temperature_profile_dates,
param = "t",
method = "linear",
depth_target = boundaries)
#>
#> A cfp_layered_profile object
#> id_cols: date_id
#> 2 unique profiles
#>
#> date_id t depth upper lower
#> 1 date_1 16.285714 -95 -90 -100
#> 2 date_1 16.857143 -85 -80 -90
#> 3 date_1 17.428571 -75 -70 -80
#> 4 date_1 18.000000 -65 -60 -70
#> 5 date_1 18.571429 -55 -50 -60
#> 6 date_1 19.142857 -45 -40 -50
#> 7 date_1 19.714286 -35 -30 -40
#> 8 date_1 20.833333 -25 -20 -30
#> 9 date_1 22.500000 -15 -10 -20
#> 10 date_1 24.166667 -5 0 -10
#> 11 date_2 8.142857 -95 -90 -100
#> 12 date_2 8.428571 -85 -80 -90
#> 13 date_2 8.714286 -75 -70 -80
#> 14 date_2 9.000000 -65 -60 -70
#> 15 date_2 9.285714 -55 -50 -60
#> 16 date_2 9.571429 -45 -40 -50
#> 17 date_2 9.857143 -35 -30 -40
#> 18 date_2 10.833333 -25 -20 -30
#> 19 date_2 12.500000 -15 -10 -20
#> 20 date_2 14.166667 -5 0 -10
We can use different interpolation methods that may be more
applicable to other parameters. See the function documentation
?discretize_depth
for details.
cfp_layers_map()
: Create the model layers
The final aspect of a ConFluxPro model is the “layers_map”. This is a
layered data.frame
that defines for which layers we want to
calculate flux rates. We provide a dataset that matches the other
example datasets:
data("layers_map", package = "ConFluxPro")
layers_map
#> site upper lower
#> 1 site_a 5 0
#> 2 site_b 7 0
#> 3 site_a 0 -100
#> 4 site_b 0 -100
As you can see, we divide each of the site into two layers, one for
the organic layer with depth > 0
and one for the mineral
soil below. Notice, that the upper boundary is different between the
sites. This is because they have a different organic layer, that is also
reflected in the other datasets:
my_gasdata %>%
group_by(site) %>%
summarise(max_depth = max(depth))
#> # A tibble: 2 × 2
#> site max_depth
#> <chr> <dbl>
#> 1 site_a 5
#> 2 site_b 7
my_soilphys %>%
group_by(site) %>%
summarise(max_upper = max(upper))
#> # A tibble: 2 × 2
#> site max_upper
#> <chr> <dbl>
#> 1 site_a 5
#> 2 site_b 7
Notice, that the extends of the layers_map should match your measured data if you want to include it in the models. You cannot extend the layers_map beyond your measurements.
How many layers to define in the cfp_layers_map()
is up
to the user and depends on the research question and the resolution of
the recorded data. While it may be tempting to define a layer between
every observation depth in cfp_gasdata()
, this may result
in artefacts in the modeled fluxes. Combining these layers may therefore
yield more robust results.
To create a valid cfp_layers_map
, we still need to add
some information. These are only needed for a pro_flux()
inverse model and are set to NA
if missing.
-
gas
: The gas as acharacter()
. -
lowlim
: The lower limit of production allowed in that layer in µmol m-3 s-1 -
highlim
: The upper limit of production allowed in that layer in µmol m-3 s-1 -
layer_couple
: Penalty factor, defaults to 0. See?cfp_layers_map()
We can either do so in the function call or provide distinct values
ourselves. The highlim
and lowlim
values can
help to explicitly exclude unrealisitic processes in the optimization in
pro_flux()
. For example, we can set lowlim
to
0, if we want to prevent consumption of CO2. Again, these
values are only relevant in the pro_flux()
model. In our
case, we want to add the limits for the gas "CO2"
:
my_layers_map <-
cfp_layers_map(
layers_map,
id_cols = "site",
gas = "CO2",
lowlim = 0,
highlim = 1000
)
#>
#> added 'gas' to id_cols
my_layers_map
#>
#> A cfp_layers_map object
#> id_cols: site gas
#> 2 unique profiles
#>
#> site upper lower gas lowlim highlim layer_couple layer pmap
#> 1 site_a 0 -100 CO2 0 1000 0 1 1
#> 2 site_b 0 -100 CO2 0 1000 0 1 1
#> 3 site_a 5 0 CO2 0 1000 0 2 2
#> 4 site_b 7 0 CO2 0 1000 0 2 2
Note that we only added "site"
as id_cols
,
because we don’t need the model frame to change over time and therefore
use the same mapping for every Date
. You can always use a
subset of the id_cols
in cfp_soilphys()
in
your cfp_layers_map()
. At the minimum, you need to provide
gas
within your id_cols
.
With this complete, we now have everything to create a unified dataset for a ConFluxPro model.
cfp_dat()
: Combining the datasets
As a last step, we combine the three datasets into one unified frame
with cfp_dat()
.
my_dat <- cfp_dat(my_gasdata, my_soilphys, my_layers_map)
#>
#> validating datasets
#> id_cols: site, Date, gas
#> 24 unique profiles
my_dat
#>
#> A cfp_dat object to be used as input in ConFluxPro models.
#> id_cols: site Date gas
#> number of profiles: 24
#> number of groups: 2
This function does multiple things. (I) It ensures that the upper and
lower boundaries between the datasets match. The highest
upper
of soilphys
and gasdata
should match the highest depth
in gasdata
,
with the lower
boundary correspondingly. (II) It matches
each profile to another and makes a list of all profiles and the
corresponding datasets. ConFluxPro does not require the
id_cols
between the different datasets to match exactly, as
long as the different profiles can be mapped to another. This makes it
possible, for example, to map the same soilphys
profiles to
multiple distinct gasdata
profiles, without having to
replicate the soilphys
dataset manually. We already make
use of this by providing only one layers_map
per
site
in our example that is not replicated for the
different Date
of the the other datasets. (III) For each
profile, it separates the layers of soilphys
, at each depth
where there is an observation in gasdata
. This does not
change the calculation but is needed internally for the
pro_flux()
model. (IV) It adds the column pmap
to the soilphys
dataset that indicates which layer, as
defined in cfp_layers_map()
corresponds to that layer in
soilphys
.
At its core, a cfp_dat
object is simply a list. We can
access the different elements, including the generated
profiles
data.frame
like any element in a list
with the $
operator.
my_dat$profiles %>% head()
#>
#> A cfp_profile object
#> id_cols: prof_id
#> 6 unique profiles
#>
#> site Date gas gd_id sp_id group_id prof_id
#> 1 site_a 2021-01-01 CO2 1 1 1 1
#> 2 site_b 2021-01-01 CO2 13 13 2 2
#> 3 site_a 2021-02-01 CO2 2 2 1 3
#> 4 site_b 2021-02-01 CO2 14 14 2 4
#> 5 site_a 2021-03-01 CO2 3 3 1 5
#> 6 site_b 2021-03-01 CO2 15 15 2 6
Within this data.frame
we have identifiers for each
profile of the different datasets (gd_id
,
sp_id
, group_id
) and an identifier of each
profile of the combined dataset prof_id
.
We now are ready model.