Title: | Isotonic Distributional Regression (IDR) |
---|---|
Description: | Distributional regression under stochastic order restrictions for numeric and binary response variables and partially ordered covariates. See Henzi, Ziegel, Gneiting (2021) <doi:10.1111/rssb.12450>. |
Authors: | Alexander Henzi [aut, cre] |
Maintainer: | Alexander Henzi <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.2.0 |
Built: | 2025-02-17 02:43:49 UTC |
Source: | https://github.com/alexanderhenzi/isodistrreg |
Isotonic distributional Regression (IDR) is a nonparametric method to estimate conditional distributions under monotonicity constraints.
Read the arXiv preprint ‘Isotonic Distributional Regression’ on
https://arxiv.org/abs/1909.03725 (published article:
https://doi.org/10.1111/rssb.12450) or by calling
browseVignettes(package = "isodistrreg")
.
To make probabilistic forecasts with IDR,
call idr(y = y, X = X, ...)
, where y
is the
response variable (e.g. weather variable observations) and X
is a
data.frame
of covariates (e.g. ensemble forecasts).
use predict(fit, data)
, where fit
is the model fit computed with idr
and data
is the data based
on which you want to make predictions.
Try idrbag
for IDR with (su)bagging.
The following pre-defined functions are available to evaluate IDR predictions:
cdf
and qpred
to compute the cumulative
distribution function (CDF) and quantile function of IDR predictions.
bscore
and qscore
to calculate Brier scores
for probability forecasts for threshold exceedance (e.g. probability of
precipitation) and quantile scores (e.g. mean absolute error of median
forecast.)
crps
to compute the continuous ranked probability score
(CRPS).
pit
to compute the probability integral transform (PIT).
plot
to plot IDR predictive CDFs.
Use the dataset rain
to test IDR.
Henzi, A., Ziegel, J.F. and Gneiting, T. (2021), Isotonic distributional regression. J R Stat Soc Series B, 83: 963-993. https://doi.org/10.1111/rssb.12450
## A usage example: # Prepare dataset: Half of the data as training dataset, other half for validation. # Consult the R documentation (?rain) for details about the dataset. data(rain) trainingData <- subset(rain, date <= "2012-01-09") validationData <- subset(rain, date > "2012-01-09") # Variable selection: use HRES and the perturbed forecasts P1, ..., P50 varNames <- c("HRES", paste0("P", 1:50)) # Partial orders on variable groups: Usual order of numbers on HRES (group '1') and # increasing convex order on the remaining variables (group '2'). groups <- setNames(c(1, rep(2, 50)), varNames) orders <- c("comp" = 1, "icx" = 2) # Fit IDR to training dataset. fit <- idr( y = trainingData[["obs"]], X = trainingData[, varNames], groups = groups, orders = orders ) # Make prediction for the first day in the validation data: firstPrediction <- predict(fit, data = validationData[1, varNames]) plot(firstPrediction) # Use cdf() and qpred() to make probability and quantile forecasts: ## What is the probability of precipitation? 1 - cdf(firstPrediction, thresholds = 0) ## What are the predicted 10%, 50% and 90% quantiles for precipitation? qpred(firstPrediction, quantiles = c(0.1, 0.5, 0.9)) # Make predictions for the complete verification dataset and compare IDR calibrated # forecasts to the raw ensemble (ENS): predictions <- predict(fit, data = validationData[, varNames]) y <- validationData[["obs"]] ## Continuous ranked probability score (CRPS): CRPS <- cbind( "ens" = crps(validationData[, varNames], y), "IDR" = crps(predictions, y) ) apply(CRPS, 2, mean) ## Brier score for probability of precipitation: BS <- cbind( "ens" = bscore(validationData[, varNames], thresholds = 0, y), "IDR" = bscore(predictions, thresholds = 0, y) ) apply(BS, 2, mean) ## Quantile score of forecast for 90% quantile: QS90 <- cbind( "ens" = qscore(validationData[, varNames], quantiles = 0.9, y), "IDR" = qscore(predictions, quantiles = 0.9, y) ) apply(QS90, 2, mean) ## Check calibration using (randomized) PIT histograms: pitEns <- pit(validationData[, varNames], y) pitIdr <- pit(predictions, y) hist(pitEns, main = "PIT of raw ensemble forecasts", freq = FALSE) hist(pitIdr, main = "PIT of IDR calibrated forecasts", freq = FALSE)
## A usage example: # Prepare dataset: Half of the data as training dataset, other half for validation. # Consult the R documentation (?rain) for details about the dataset. data(rain) trainingData <- subset(rain, date <= "2012-01-09") validationData <- subset(rain, date > "2012-01-09") # Variable selection: use HRES and the perturbed forecasts P1, ..., P50 varNames <- c("HRES", paste0("P", 1:50)) # Partial orders on variable groups: Usual order of numbers on HRES (group '1') and # increasing convex order on the remaining variables (group '2'). groups <- setNames(c(1, rep(2, 50)), varNames) orders <- c("comp" = 1, "icx" = 2) # Fit IDR to training dataset. fit <- idr( y = trainingData[["obs"]], X = trainingData[, varNames], groups = groups, orders = orders ) # Make prediction for the first day in the validation data: firstPrediction <- predict(fit, data = validationData[1, varNames]) plot(firstPrediction) # Use cdf() and qpred() to make probability and quantile forecasts: ## What is the probability of precipitation? 1 - cdf(firstPrediction, thresholds = 0) ## What are the predicted 10%, 50% and 90% quantiles for precipitation? qpred(firstPrediction, quantiles = c(0.1, 0.5, 0.9)) # Make predictions for the complete verification dataset and compare IDR calibrated # forecasts to the raw ensemble (ENS): predictions <- predict(fit, data = validationData[, varNames]) y <- validationData[["obs"]] ## Continuous ranked probability score (CRPS): CRPS <- cbind( "ens" = crps(validationData[, varNames], y), "IDR" = crps(predictions, y) ) apply(CRPS, 2, mean) ## Brier score for probability of precipitation: BS <- cbind( "ens" = bscore(validationData[, varNames], thresholds = 0, y), "IDR" = bscore(predictions, thresholds = 0, y) ) apply(BS, 2, mean) ## Quantile score of forecast for 90% quantile: QS90 <- cbind( "ens" = qscore(validationData[, varNames], quantiles = 0.9, y), "IDR" = qscore(predictions, quantiles = 0.9, y) ) apply(QS90, 2, mean) ## Check calibration using (randomized) PIT histograms: pitEns <- pit(validationData[, varNames], y) pitIdr <- pit(predictions, y) hist(pitEns, main = "PIT of raw ensemble forecasts", freq = FALSE) hist(pitIdr, main = "PIT of IDR calibrated forecasts", freq = FALSE)
Computes the Brier score of forecast probabilities for exceeding given thresholds.
bscore(predictions, thresholds, y)
bscore(predictions, thresholds, y)
predictions |
either an object of class |
thresholds |
numeric vector of thresholds at which the CDF will be evaluated. |
y |
a numeric vector of obervations of the same length as the number of
predictions, or of length 1. In the latter case, |
The Brier score for the event of exceeding a given threshold z is defined as
where y is the
observation and P(y > z) the forecast probability for exceeding the
threshold z
.
A matrix of the Brier scores for the desired thresholds, one column per threshold.
Gneiting, T. and Raftery, A. E. (2007), 'Strictly proper scoring rules, prediction, and estimation', Journal of the American Statistical Association 102(477), 359-378
data("rain") ## Postprocess HRES forecast using data of 3 years X <- rain[1:(3 * 365), "HRES", drop = FALSE] y <- rain[1:(3 * 365), "obs"] fit <- idr(y = y, X = X) ## Compute Brier score for postprocessed probability of precipitation ## forecast using data of the next 2 years (out-of-sample predictions) data <- rain[(3 * 365 + 1):(5 * 365), "HRES", drop = FALSE] obs <- rain[(3 * 365 + 1):(5 * 365), "obs"] predictions <- predict(fit, data = data) score <- bscore(predictions, thresholds = 0, y = obs) mean(score)
data("rain") ## Postprocess HRES forecast using data of 3 years X <- rain[1:(3 * 365), "HRES", drop = FALSE] y <- rain[1:(3 * 365), "obs"] fit <- idr(y = y, X = X) ## Compute Brier score for postprocessed probability of precipitation ## forecast using data of the next 2 years (out-of-sample predictions) data <- rain[(3 * 365 + 1):(5 * 365), "HRES", drop = FALSE] obs <- rain[(3 * 365 + 1):(5 * 365), "obs"] predictions <- predict(fit, data = data) score <- bscore(predictions, thresholds = 0, y = obs) mean(score)
Evaluate the the cumulative distribution function (CDF) of IDR predictions or
of unprocessed forecasts in a data.frame
.
cdf(predictions, thresholds) ## S3 method for class 'idr' cdf(predictions, thresholds) ## S3 method for class 'data.frame' cdf(predictions, thresholds)
cdf(predictions, thresholds) ## S3 method for class 'idr' cdf(predictions, thresholds) ## S3 method for class 'data.frame' cdf(predictions, thresholds)
predictions |
either an object of class |
thresholds |
numeric vector of thresholds at which the CDF will be evaluated. |
The CDFs are considered as piecewise constant stepfunctions: If x
are
the points where the IDR fitted CDF (or the empirical distribution of the
forecasts) has jumps and p
the corresponding CDF values, then for
x[i] <= x < x[i + 1]
, the CDF at x
is p[i]
.
A matrix of probabilities giving the evaluated CDFs at the given thresholds, one column for each threshold.
data("rain") ## Postprocess HRES forecast using data of 3 years X <- rain[1:(3 * 365), "HRES", drop = FALSE] y <- rain[1:(3 * 365), "obs"] fit <- idr(y = y, X = X) ## Compute probability of precipitation given that the HRES forecast is ## 0 mm, 0.5 mm or 1 mm predictions <- predict(fit, data = data.frame(HRES = c(0, 0.5, 1))) 1 - cdf(predictions, thresholds = 0)
data("rain") ## Postprocess HRES forecast using data of 3 years X <- rain[1:(3 * 365), "HRES", drop = FALSE] y <- rain[1:(3 * 365), "obs"] fit <- idr(y = y, X = X) ## Compute probability of precipitation given that the HRES forecast is ## 0 mm, 0.5 mm or 1 mm predictions <- predict(fit, data = data.frame(HRES = c(0, 0.5, 1))) 1 - cdf(predictions, thresholds = 0)
Computes the CRPS of IDR or raw forecasts.
crps(predictions, y) ## S3 method for class 'idr' crps(predictions, y) ## S3 method for class 'data.frame' crps(predictions, y)
crps(predictions, y) ## S3 method for class 'idr' crps(predictions, y) ## S3 method for class 'data.frame' crps(predictions, y)
predictions |
either an object of class |
y |
a numeric vector of obervations of the same length as the number of
predictions, or of length 1. In the latter case, |
This function uses adapted code taken from the function crps_edf
of
the scoringRules package.
A vector of CRPS values.
Jordan A., Krueger F., Lerch S. (2018). "Evaluating Probabilistic Forecasts with scoringRules." Journal of Statistical Software. Forthcoming.
Gneiting, T. and Raftery, A. E. (2007), 'Strictly proper scoring rules, prediction, and estimation', Journal of the American Statistical Association 102(477), 359-378
data("rain") ## Postprocess HRES forecast using data of 3 years X <- rain[1:(3 * 365), "HRES", drop = FALSE] y <- rain[1:(3 * 365), "obs"] fit <- idr(y = y, X = X) ## Compute CRPS of postprocessed HRES forecast using data of the next 2 years ## (out-of-sample predictions) data <- rain[(3 * 365 + 1):(5 * 365), "HRES", drop = FALSE] obs <- rain[(3 * 365 + 1):(5 * 365), "obs"] predictions <- predict(fit, data = data) idrCrps <- crps(predictions, y = obs) ## Compare this to CRPS of the raw ensemble of all forecasts (high resolution, ## control and 50 perturbed ensemble forecasts) rawData <- rain[(3 * 365 + 1):(5 * 365), c("HRES", "CTR", paste0("P", 1:50))] rawCrps <- crps(rawData, y = obs) c("idr_HRES" = mean(idrCrps), "raw_all" = mean(rawCrps))
data("rain") ## Postprocess HRES forecast using data of 3 years X <- rain[1:(3 * 365), "HRES", drop = FALSE] y <- rain[1:(3 * 365), "obs"] fit <- idr(y = y, X = X) ## Compute CRPS of postprocessed HRES forecast using data of the next 2 years ## (out-of-sample predictions) data <- rain[(3 * 365 + 1):(5 * 365), "HRES", drop = FALSE] obs <- rain[(3 * 365 + 1):(5 * 365), "obs"] predictions <- predict(fit, data = data) idrCrps <- crps(predictions, y = obs) ## Compare this to CRPS of the raw ensemble of all forecasts (high resolution, ## control and 50 perturbed ensemble forecasts) rawData <- rain[(3 * 365 + 1):(5 * 365), c("HRES", "CTR", paste0("P", 1:50))] rawCrps <- crps(rawData, y = obs) c("idr_HRES" = mean(idrCrps), "raw_all" = mean(rawCrps))
Fits distributional index model with user-specified index function to training dataset. See the examples at the bottom to learn how to specify a distributional single index model.
dindexm( formula, indexfit, data, response, pars = osqpSettings(verbose = FALSE, eps_abs = 1e-05, eps_rel = 1e-05, max_iter = 10000L), progress = TRUE, ... )
dindexm( formula, indexfit, data, response, pars = osqpSettings(verbose = FALSE, eps_abs = 1e-05, eps_rel = 1e-05, max_iter = 10000L), progress = TRUE, ... )
formula |
object of class |
indexfit |
function that fits the index model to training data. Should
accept arguments |
data |
|
response |
name of the response variable in |
pars |
parameters for quadratic programming optimization (only relevant
for multivariate index functions), set using
|
progress |
display progressbar for fitting idr? |
... |
further arguments passed to |
This function fits a distributional index model (DIM) to training data. The
DIM assumes that the response is more likely to attain higher values when the
values of the index function increases. The index function can be
estimated by parametric methods like lm
or
glm
or also nonparametrically.
The formal mathematical assumption of the DIM is that the conditional CDFs
at each fixed threshold z decreases, as
g(x) increases. Here
y
denotes the response, x
, X
are the covariates in data
and g
is the index function
estimated by indexfit
.
Estimation is performed in two steps: indexfit
is applied to
data
to estimate the function g
. With this estimate,
idr
is applied with the pseudo-covariates g(x)
and
response y
.
Object of class dindexm
: A list containing the index model (first
component) and the IDR fit on the pseudo-data with the index as covariate
(second component).
Alexander Henzi, Gian-Reto Kleger & Johanna F. Ziegel (2021) Distributional (Single) Index Models, Journal of the American Statistical Association, DOI: <10.1080/01621459.2021.1938582>
idr
for more information on IDR,
predict.dindexfit
for (out-of-sample) predictions based on a
model with with dindexm
.
n <- 1000 X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n)) y <- rnorm(n, 1 - X[, 1] + X[, 2]^2 / 3 - (1 - X[, 3]) * (1 + X[, 3]) / 2) data <- cbind(y = y, as.data.frame(X)) ## data for out-of-sample prediction newX <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10)) ## linear regression model for index model <- dindexm( formula = y ~ poly(x1, degree = 2) + poly(x2, degree = 2) + poly(x3, degree = 2), indexfit = lm, response = "y", data = data ) pred <- predict(model, data = newX) ## plot plot(pred, 1, main = "LM based DIM") grd <- pred[[1]]$points trueCdf <- pnorm( grd, 1 - newX[1, 1] + newX[1, 2]^2 / 3 - (1 - newX[1, 3]) * (1 + newX[1, 3]) / 2 ) points(grd, trueCdf, type = "l", col = 2)
n <- 1000 X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n)) y <- rnorm(n, 1 - X[, 1] + X[, 2]^2 / 3 - (1 - X[, 3]) * (1 + X[, 3]) / 2) data <- cbind(y = y, as.data.frame(X)) ## data for out-of-sample prediction newX <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10)) ## linear regression model for index model <- dindexm( formula = y ~ poly(x1, degree = 2) + poly(x2, degree = 2) + poly(x3, degree = 2), indexfit = lm, response = "y", data = data ) pred <- predict(model, data = newX) ## plot plot(pred, 1, main = "LM based DIM") grd <- pred[[1]]$points trueCdf <- pnorm( grd, 1 - newX[1, 1] + newX[1, 2]^2 / 3 - (1 - newX[1, 3]) * (1 + newX[1, 3]) / 2 ) points(grd, trueCdf, type = "l", col = 2)
Fits isotonic distributional regression (IDR) to a training dataset.
idr(y, X, weights = NULL, groups = setNames(rep(1, ncol(X)), colnames(X)), orders = c("comp" = 1), stoch = "sd", pars = osqpSettings(verbose = FALSE, eps_abs = 1e-5, eps_rel = 1e-5, max_iter = 10000L), progress = TRUE)
idr(y, X, weights = NULL, groups = setNames(rep(1, ncol(X)), colnames(X)), orders = c("comp" = 1), stoch = "sd", pars = osqpSettings(verbose = FALSE, eps_abs = 1e-5, eps_rel = 1e-5, max_iter = 10000L), progress = TRUE)
y |
numeric vector (the response variable). |
X |
data frame of numeric or ordered factor variables (the regression covariates). |
weights |
vector of positive weights (same length as y). Default is all weights equal to one. |
groups |
named vector of length |
orders |
named vector giving for each group in |
stoch |
stochastic order constraint used for estimation. Default is
|
pars |
parameters for quadratic programming optimization (only relevant
if |
progress |
display progressbar ( |
This function computes the isotonic distributional regression (IDR)
of a response y on on one or more covariates X. IDR estimates
the cumulative distribution function (CDF) of y conditional on
X by monotone regression, assuming that y is more likely to
take higher values, as X increases. Formally, IDR assumes that the
conditional CDF at each fixed threshold z
decreases, as x increases, or equivalently, that the exceedance
probabilities for any threshold
z
increase
with x.
The conditional CDFs are estimated at each threshold in unique(y)
.
This is the set where the CDFs may have jumps. If X
contains more
than one variable, the CDFs are estimated by calling
solve_osqp
from the package osqp
length(unique(y))
times. This might take a while if the training
dataset is large.
Use the argument groups
to group exchangeable covariates.
Exchangeable covariates are indistinguishable except from the order in
which they are labelled (e.g. ensemble weather forecasts, repeated
measurements under the same measurement conditions).
The following orders are available to perform the monotone regression in IDR:
Componentwise order ("comp"
): A covariate
vector x1
is greater than x2
if x1[i] >= x2[i]
holds
for all components i
. This is the standard order used in
multivariate monotone regression and should not be used for
exchangeable variables (e.g. perturbed ensemble forecasts).
Stochastic dominance ("sd"
): x1
is greater than x2
in
the stochastic order, if the (empirical) distribution of the elements of
x1
is greater than the distribution of the elements of x2
(in
first order) stochastic dominance. The "sd"
order is invariant under
permutations of the grouped variables and therefore suitable for
exchangeable covariables.
Increasing convex order ("icx"
):
The "icx"
order can be used for groups of exchangeable variables. It
should be used if the variables have increasing variability, when their
mean increases (e.g. precipitation forecasts or other variables with
right-skewed distributions). More precisely, "icx"
uses the
increasing convex stochastic order on the empirical distributions of the
grouped variables.
An object of class "idrfit"
containing the following
components:
X |
data frame of all distinct covariate combinations used for the fit. |
y |
list of all observed responses in the training data for
given covariate combinations in |
cdf |
matrix containing the estimated CDFs, one CDF per row,
evaluated at |
weights |
aggregated weights of the observations. |
thresholds |
the thresholds at which the CDFs in |
groups , orders
|
the groups and orders used for estimation. |
diagnostic |
list giving a bound on the precision of the CDF
estimation (the maximal downwards-step in the CDF that has been detected)
and the fraction of CDF estimations that were stopped at the iteration
limit |
indices |
the row indices of the covariates in |
constraints |
(in multivariate IDR, |
The function idr
is only intended for fitting IDR model for a
training dataset and storing the results for further processing, but not
for prediction or evaluation, which is done using the output of
predict.idrfit
.
Code for the Pool-Adjacent Violators Algorithm (PAVA) is adapted from R code by Lutz Duembgen (available on https://www.imsv.unibe.ch/about_us/files/lutz_duembgen/software/index_eng.html).
Henzi, A., Moesching, A. & Duembgen, L. Accelerating the Pool-Adjacent-Violators Algorithm for Isotonic Distributional Regression. Methodol Comput Appl Probab (2022). https://doi.org/10.1007/s11009-022-09937-2
Stellato, B., Banjac, G., Goulart, P., Bemporad, A., & Boyd, S. (2020). OSQP: An operator splitting solver for quadratic programs. Mathematical Programming Computation, 1-36.
Bartolomeo Stellato, Goran Banjac, Paul Goulart and Stephen Boyd (2019). osqp: Quadratic Programming Solver using the 'OSQP' Library. R package version 0.6.0.3. https://CRAN.R-project.org/package=osqp
The S3 method predict.idrfit
for predictions based on
an IDR fit.
data("rain") ## Fit IDR to data of 185 days using componentwise order on HRES and CTR and ## increasing convex order on perturbed ensemble forecasts (P1, P2, ..., P50) varNames <- c("HRES", "CTR", paste0("P", 1:50)) X <- rain[1:185, varNames] y <- rain[1:185, "obs"] ## HRES and CTR are group '1', with componentwise order "comp", perturbed ## forecasts P1, ..., P50 are group '2', with "icx" order groups <- setNames(c(1, 1, rep(2, 50)), varNames) orders <- c("comp" = 1, "icx" = 2) fit <- idr(y = y, X = X, orders = orders, groups = groups) fit
data("rain") ## Fit IDR to data of 185 days using componentwise order on HRES and CTR and ## increasing convex order on perturbed ensemble forecasts (P1, P2, ..., P50) varNames <- c("HRES", "CTR", paste0("P", 1:50)) X <- rain[1:185, varNames] y <- rain[1:185, "obs"] ## HRES and CTR are group '1', with componentwise order "comp", perturbed ## forecasts P1, ..., P50 are group '2', with "icx" order groups <- setNames(c(1, 1, rep(2, 50)), varNames) orders <- c("comp" = 1, "icx" = 2) fit <- idr(y = y, X = X, orders = orders, groups = groups) fit
Computes IDR predictions with bootstrap aggregating (bagging) or subsample aggregation (subagging).
idrbag(y, X, groups = setNames(rep(1, ncol(X)), colnames(X)), orders = c("comp" = 1), stoch = "sd", pars = osqpSettings(verbose = FALSE, eps_abs = 1e-5, eps_rel = 1e-5, max_iter = 10000L), progress = TRUE, newdata, digits = 3, interpolation = "linear", b, p, replace = FALSE, grid = NULL)
idrbag(y, X, groups = setNames(rep(1, ncol(X)), colnames(X)), orders = c("comp" = 1), stoch = "sd", pars = osqpSettings(verbose = FALSE, eps_abs = 1e-5, eps_rel = 1e-5, max_iter = 10000L), progress = TRUE, newdata, digits = 3, interpolation = "linear", b, p, replace = FALSE, grid = NULL)
y |
numeric vector (the response variable). |
X |
data frame of numeric or ordered factor variables (the regression covariates). |
groups |
named vector of length |
orders |
named vector giving for each group in |
stoch |
stochastic order constraint used for estimation. Default is
|
pars |
parameters for quadratic programming optimization (only relevant
if |
progress |
display progressbar ( |
newdata |
|
digits |
number of decimal places for the predictive CDF. |
interpolation |
interpolation method for univariate data. Default is
|
b |
number of (su)bagging samples. |
p |
size of (su)bagging samples relative to training data. |
replace |
draw samples with ( |
grid |
grid on which the predictive CDFs are evaluated. Default are
the unique values of |
This function draws b
times a random subsample of size
ceiling(nrow(X)*p)
) from the training data, fits IDR to each
subsample, computes predictions for the new data supplied in newdata
,
and averages the predictions derived from the b
subsamples. There are
no default values for b
and p
.
A list of predictions, see predict.idrfit
.
Computes the probability integral transform (PIT) of IDR or raw forecasts.
pit(predictions, y, randomize = TRUE, seed = NULL) ## S3 method for class 'idr' pit(predictions, y, randomize = TRUE, seed = NULL) ## S3 method for class 'data.frame' pit(predictions, y, randomize = TRUE, seed = NULL)
pit(predictions, y, randomize = TRUE, seed = NULL) ## S3 method for class 'idr' pit(predictions, y, randomize = TRUE, seed = NULL) ## S3 method for class 'data.frame' pit(predictions, y, randomize = TRUE, seed = NULL)
predictions |
either an object of class |
y |
a numeric vector of obervations of the same length as the number of predictions. |
randomize |
PIT values should be randomized at discontinuity points of the
predictive CDF (e.g. at zero for precipitation forecasts). Set |
seed |
argument to |
Vector of PIT values.
Gneiting, T., Balabdaoui, F. and Raftery, A. E. (2007), 'Probabilistic forecasts, calibration and sharpness', Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69(2), 243-268.
data("rain") require("graphics") ## Postprocess HRES forecast using data of 4 years X <- rain[1:(4 * 365), "HRES", drop = FALSE] y <- rain[1:(4 * 365), "obs"] fit <- idr(y = y, X = X) ## Assess calibration of the postprocessed HRES forecast using data of next 4 ## years and compare to calibration of the raw ensemble data <- rain[(4 * 365 + 1):(8 * 365), "HRES", drop = FALSE] obs <- rain[(4 * 365 + 1):(8 * 365), "obs"] predictions <- predict(fit, data = data) idrPit <- pit(predictions, obs, seed = 123) rawData <- rain[(4 * 365 + 1):(8 * 365), c("HRES", "CTR", paste0("P", 1:50))] rawPit <- pit(rawData, obs, seed = 123) hist(idrPit, xlab = "Probability Integral Transform", ylab = "Density", freq = FALSE, main = "Postprocessed HRES") hist(rawPit, xlab = "Probability Integral Transform", ylab = "Density", freq = FALSE, main = "Raw ensemble")
data("rain") require("graphics") ## Postprocess HRES forecast using data of 4 years X <- rain[1:(4 * 365), "HRES", drop = FALSE] y <- rain[1:(4 * 365), "obs"] fit <- idr(y = y, X = X) ## Assess calibration of the postprocessed HRES forecast using data of next 4 ## years and compare to calibration of the raw ensemble data <- rain[(4 * 365 + 1):(8 * 365), "HRES", drop = FALSE] obs <- rain[(4 * 365 + 1):(8 * 365), "obs"] predictions <- predict(fit, data = data) idrPit <- pit(predictions, obs, seed = 123) rawData <- rain[(4 * 365 + 1):(8 * 365), c("HRES", "CTR", paste0("P", 1:50))] rawPit <- pit(rawData, obs, seed = 123) hist(idrPit, xlab = "Probability Integral Transform", ylab = "Density", freq = FALSE, main = "Postprocessed HRES") hist(rawPit, xlab = "Probability Integral Transform", ylab = "Density", freq = FALSE, main = "Raw ensemble")
Plot an IDR predictive CDF.
## S3 method for class 'idr' plot( x, index = 1, bounds = TRUE, col.cdf = "black", col.bounds = "blue", lty.cdf = 1, lty.bounds = 3, xlab = "Threshold", ylab = "CDF", main = "IDR predictive CDF", ... )
## S3 method for class 'idr' plot( x, index = 1, bounds = TRUE, col.cdf = "black", col.bounds = "blue", lty.cdf = 1, lty.bounds = 3, xlab = "Threshold", ylab = "CDF", main = "IDR predictive CDF", ... )
x |
object of class |
index |
index of the prediction in |
bounds |
whether the bounds should be plotted or not (see
|
col.cdf |
color of the predictive CDF. |
col.bounds |
color of the bounds. |
lty.cdf |
linetype of the predictive CDF. |
lty.bounds |
linetype of the CDF bounds. |
xlab |
label for x axis. |
ylab |
label for y axis. |
main |
main title. |
... |
further arguments to |
The data based on which the plot is drawn (returned invisible).
data("rain") require("graphics") ## Postprocess HRES and CTR forecast using data of 2 years X <- rain[1:(2 * 365), c("HRES", "CTR"), drop = FALSE] y <- rain[1:(2 * 365), "obs"] ## Fit IDR and plot the predictive CDF when the HRES forecast is 1 mm and ## CTR is 0 mm fit <- idr(y = y, X = X) pred <- predict(fit, data = data.frame(HRES = 1, CTR = 0)) plot(pred)
data("rain") require("graphics") ## Postprocess HRES and CTR forecast using data of 2 years X <- rain[1:(2 * 365), c("HRES", "CTR"), drop = FALSE] y <- rain[1:(2 * 365), "obs"] ## Fit IDR and plot the predictive CDF when the HRES forecast is 1 mm and ## CTR is 0 mm fit <- idr(y = y, X = X) pred <- predict(fit, data = data.frame(HRES = 1, CTR = 0)) plot(pred)
Prediction based on distributional index model fit.
## S3 method for class 'dindexfit' predict( object, data = NULL, digits = 3, interpolation = "linear", asplitAvail = TRUE, ... )
## S3 method for class 'dindexfit' predict( object, data = NULL, digits = 3, interpolation = "linear", asplitAvail = TRUE, ... )
object |
DIM fit (object of class |
data |
optional |
digits |
number of decimal places for the predictive CDF. |
interpolation |
interpolation method for univariate index Default is
|
asplitAvail |
use |
... |
further arguments passed to the index prediction function. |
A list of predictions, as for predict.idrfit
.
Examples in dindexm
.
Prediction based on IDR model fit.
## S3 method for class 'idrfit' predict(object, data = NULL, digits = 3, interpolation = "linear", ...)
## S3 method for class 'idrfit' predict(object, data = NULL, digits = 3, interpolation = "linear", ...)
object |
IDR fit (object of class |
data |
optional |
digits |
number of decimal places for the predictive CDF. |
interpolation |
interpolation method for univariate data. Default is
|
... |
included for generic function consistency. |
If the variables x = data[j,]
for which predictions are
desired are already contained in the training dataset X
for the fit,
predict.idrfit
returns the corresponding in-sample prediction.
Otherwise monotonicity is used to derive upper and lower bounds for the
predictive CDF, and the predictive CDF is a pointwise average of these
bounds. For univariate IDR with a numeric covariate, the predictive CDF is
computed by linear interpolation. Otherwise, or if
interpolation != "linear"
, midpoint interpolation is used, i.e.
default weights of 0.5
for both the lower and the upper bound.
If the lower and the upper bound on the predictive cdf are far apart (or
trivial, i.e. constant 0 or constant 1), this indicates that the prediction
based on x
is uncertain because either the training dataset is too
small or only few similar variable combinations as in x
have been
observed in the training data. However, the bounds on the predictive
CDF are not prediction intervals and should not be interpreted as such. They
only indicate the uncertainty of out-of-sample predictions for which the
variables are not contained in the training data.
If the new variables x
are greater than all X[i, ]
in the
selected order(s), the lower bound on the cdf is trivial (constant 0) and the
upper bound is taken as predictive cdf. The upper bound on the cdf is trivial
(constant 1) if x
is smaller than all X[i, ]
. If x
is
not comparable to any row of X
in the given order, a prediction based
on the training data is not possible. In that case, the default forecast is
the empirical distribution of y
in the training data.
A list of predictions. Each prediction is a data.frame
containing the following variables:
points |
the points where the predictive CDF has jumps. |
cdf |
the estimated CDF evaluated at the |
lower , upper
|
(only for out-of-sample predictions) bounds for the estimated CDF, see 'Details' above. |
The output has the attribute incomparables
, which gives the indices
of all predictions for which the climatological forecast is returned because
the forecast variables are not comparable to the training data.
idr
to fit IDR to training data.
cdf
, qpred
to evaluate the CDF or quantile
function of IDR predictions.
bscore
, qscore
, crps
,
pit
to compute Brier scores, quantile scores, the CRPS and the
PIT of IDR predictions.
plot
to plot IDR predictive CDFs.
data("rain") ## Fit IDR to data of 185 days using componentwise order on HRES and CTR and ## increasing convex order on perturbed ensemble forecasts (P1, P2, ..., P50) varNames <- c("HRES", "CTR", paste0("P", 1:50)) X <- rain[1:185, varNames] y <- rain[1:185, "obs"] ## HRES and CTR are group '1', with componentwise order "comp", perturbed ## forecasts P1, ..., P50 are group '2', with "icx" order groups <- setNames(c(1, 1, rep(2, 50)), varNames) orders <- c("comp" = 1, "icx" = 2) fit <- idr(y = y, X = X, orders = orders, groups = groups) ## Predict for day 186 predict(fit, data = rain[186, varNames])
data("rain") ## Fit IDR to data of 185 days using componentwise order on HRES and CTR and ## increasing convex order on perturbed ensemble forecasts (P1, P2, ..., P50) varNames <- c("HRES", "CTR", paste0("P", 1:50)) X <- rain[1:185, varNames] y <- rain[1:185, "obs"] ## HRES and CTR are group '1', with componentwise order "comp", perturbed ## forecasts P1, ..., P50 are group '2', with "icx" order groups <- setNames(c(1, 1, rep(2, 50)), varNames) orders <- c("comp" = 1, "icx" = 2) fit <- idr(y = y, X = X, orders = orders, groups = groups) ## Predict for day 186 predict(fit, data = rain[186, varNames])
Evaluate the the quantile function of IDR predictions or of unprocessed
forecasts in a data.frame
.
qpred(predictions, quantiles) ## S3 method for class 'idr' qpred(predictions, quantiles) ## S3 method for class 'data.frame' qpred(predictions, quantiles)
qpred(predictions, quantiles) ## S3 method for class 'idr' qpred(predictions, quantiles) ## S3 method for class 'data.frame' qpred(predictions, quantiles)
predictions |
either an object of class |
quantiles |
numeric vector of desired quantiles. |
The quantiles are defined as lower quantiles, that is,
A matrix of forecasts for the desired quantiles, one column per quantile.
data("rain") ## Postprocess HRES forecast using data of 3 years X <- rain[1:(3 * 365), "HRES", drop = FALSE] y <- rain[1:(3 * 365), "obs"] fit <- idr(y = y, X = X) ## Compute 95%-quantile forecast given that the HRES forecast is ## 2.5 mm, 5 mm or 10 mm predictions <- predict(fit, data = data.frame(HRES = c(2.5, 5, 10))) qpred(predictions, quantiles = 0.95)
data("rain") ## Postprocess HRES forecast using data of 3 years X <- rain[1:(3 * 365), "HRES", drop = FALSE] y <- rain[1:(3 * 365), "obs"] fit <- idr(y = y, X = X) ## Compute 95%-quantile forecast given that the HRES forecast is ## 2.5 mm, 5 mm or 10 mm predictions <- predict(fit, data = data.frame(HRES = c(2.5, 5, 10))) qpred(predictions, quantiles = 0.95)
Computes quantile scores of IDR quantile predictions or of quantile
predictions from raw forecasts in a data.frame
.
qscore(predictions, quantiles, y)
qscore(predictions, quantiles, y)
predictions |
either an object of class |
quantiles |
numeric vector of desired quantiles. |
y |
a numeric vector of obervations of the same length as the number of
predictions, or of length 1. In the latter case, |
The quantile score of a forecast x for the u-quantile is defined as
where y is the observation. For u = 1/2, this equals the mean absolute error of the median forecast.
A matrix of the quantile scores for the desired quantiles, one column per quantile.
Gneiting, T. and Raftery, A. E. (2007), 'Strictly proper scoring rules, prediction, and estimation', Journal of the American Statistical Association 102(477), 359-378
data("rain") ## Postprocess HRES forecast using data of 3 years X <- rain[1:(3 * 365), "HRES", drop = FALSE] y <- rain[1:(3 * 365), "obs"] fit <- idr(y = y, X = X) ## Compute mean absolute error of the median postprocessed forecast using ## data of the next 2 years (out-of-sample predictions) and compare to raw ## HRES forecast data <- rain[(3 * 365 + 1):(5 * 365), "HRES", drop = FALSE] obs <- rain[(3 * 365 + 1):(5 * 365), "obs"] predictions <- predict(fit, data = data) idrMAE <- mean(qscore(predictions, 0.5, obs)) rawMAE <- mean(qscore(data, 0.5, obs)) c("idr" = idrMAE, "raw" = rawMAE)
data("rain") ## Postprocess HRES forecast using data of 3 years X <- rain[1:(3 * 365), "HRES", drop = FALSE] y <- rain[1:(3 * 365), "obs"] fit <- idr(y = y, X = X) ## Compute mean absolute error of the median postprocessed forecast using ## data of the next 2 years (out-of-sample predictions) and compare to raw ## HRES forecast data <- rain[(3 * 365 + 1):(5 * 365), "HRES", drop = FALSE] obs <- rain[(3 * 365 + 1):(5 * 365), "obs"] predictions <- predict(fit, data = data) idrMAE <- mean(qscore(predictions, 0.5, obs)) rawMAE <- mean(qscore(data, 0.5, obs)) c("idr" = idrMAE, "raw" = rawMAE)
Accumulated 06-30 hour precipitation observations and operational ECMWF ensemble forecasts for Frankfurt airport, Germany. The observations are airport station observations (WMO station index 10637), the forecasts are gridded forecasts for the 0.25 degrees latitude/longitude box containing the station. Dates range from 2007-01-01 to 2017-01-01, days with missing values have been removed.
data("rain")
data("rain")
A data frame with 3617 rows. The first column gives the dates, the second column are the observations. The remaining columns are the ensemble forecasts (high resolution HRES, 50 perturbed forecasts P1 to P50 and the control forecast CTR for the perturbed forecasts). The units of the forecasts and observations are mm/m^2.
Observations: http://www.ogimet.com/synops.phtml.en
Forecasts: available on TIGGE https://confluence.ecmwf.int/display/TIGGE/TIGGE+archive
Bougeault et al. (2010) The THORPEX Interactive Grand Global Ensemble. Bull. Amer. Meteor. Soc., 91, 1059-1072.
Swinbank et al. (2016) The TIGGE project and its achievements. Bull. Amer. Meteor. Soc., 97, 49-67.