Various metrics for measuring model performance.


score_log_loss(y, y_hat, na_rm = FALSE)

score_quadratic(y, y_hat, na_rm = FALSE)

check_score_fun(score_fun)


metrics_unc(score_fun, y, na_rm = FALSE)

metrics_R2(score_fun, y, y_hat, na_rm = FALSE)

metrics_fit_calib(y, y_hat, rev_fct = FALSE)

metrics_DI(score_fun, y, y_hat_calib, na_rm = FALSE)

metrics_MI(score_fun, y, y_hat, y_hat_calib, na_rm = FALSE)

metrics_r2(y, y_hat, y_hat_calib, na_rm = FALSE)



y: Vector of observations.


y_hat: Vector of predictions.


na_rm: Logical, defaults to FALSE. Should NAs be removed?


score_fun: A scoring function: score_quadratic, score_log_loss, or a user-defined scoring rule. See the "Scoring function" section below for more details.


rev_fct: Logical, defaults to FALSE. Should the factor levels of the data be switched before performing calibration? Only relevant for a binary response.


y_hat_calib: Vector of calibrated predictions. See below for more details.


metrics_fit_calib returns an mgcv::gam() model fit; the scores and metrics return a number.


  • score_log_loss(): Binary log loss score

  • score_quadratic(): Quadratic score

  • check_score_fun(): Utility function for checking the properties of a user-defined score_fun.

  • metrics_unc(): Uncertainty

  • metrics_R2(): R^2 metric

  • metrics_fit_calib(): Fit calibration curve using mgcv::gam(). Note that NAs are always dropped.

  • metrics_DI(): Discrimination index

  • metrics_MI(): Miscalibration index

  • metrics_r2(): r^2 metric based on slope of lm
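As a rough illustration of the two predefined scores listed above, the following base-R sketch uses the textbook definitions of the quadratic score (mean squared difference) and the binary log loss. The helper names sketch_quadratic and sketch_log_loss are hypothetical, and the formulas are an assumption: they reproduce the outputs shown in the examples on this page, but they are not taken from the package source.

```r
# Hypothetical stand-ins for score_quadratic() and score_log_loss().
# Assumption: both scores are plain means over the observations.
sketch_quadratic <- function(y, y_hat, na_rm = FALSE) {
  mean((y - y_hat)^2, na.rm = na_rm)
}

sketch_log_loss <- function(y, y_hat, na_rm = FALSE) {
  # y is numeric 0/1, y_hat is a probability of the event "1"
  -mean(y * log(y_hat) + (1 - y) * log(1 - y_hat), na.rm = na_rm)
}

sketch_quadratic(y = 0.5, y_hat = 0)
#> [1] 0.25
sketch_log_loss(y = c(0, 1), y_hat = c(0.01, 0.9))
#> [1] 0.05770543
sketch_log_loss(y = 0, y_hat = 0) # 0 * log(0) is undefined
#> [1] NaN
```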

Scoring function

One can use the predefined scores score_quadratic or score_log_loss. If these do not fit your needs, a user-defined scoring function can be supplied instead. It must take exactly three arguments: y (true values), y_hat (estimated values), and na_rm (should NAs be removed?), where:

  • both y and y_hat are numeric (not factors!)

  • na_rm is a scalar logical

It must return a number. The utility function check_score_fun checks whether a user-defined function is programmed correctly. It checks the input and the output, but not whether the returned value makes sense.
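A minimal sketch of a user-defined scoring function that follows this interface is shown here. The name score_abs_error and the choice of mean absolute error are hypothetical and serve only to illustrate the required signature; whether such a score is appropriate for a given analysis is up to the user.

```r
# Hypothetical user-defined score: mean absolute error.
# It takes exactly three arguments (y, y_hat, na_rm) and returns one number.
score_abs_error <- function(y, y_hat, na_rm = FALSE) {
  stopifnot(
    is.numeric(y), is.numeric(y_hat),        # numeric inputs, not factors
    is.logical(na_rm), length(na_rm) == 1    # scalar logical
  )
  mean(abs(y - y_hat), na.rm = na_rm)
}

score_abs_error(y = c(1, 2, 3), y_hat = c(1, 2, 5))
#> [1] 0.6666667
score_abs_error(y = c(1, NA), y_hat = c(2, 2), na_rm = TRUE)
#> [1] 1
```

With the package loaded, check_score_fun(score_abs_error) could then be used to check the interface of this function.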


Calibration

To obtain calibrated predictions, fit a calibration model and predict from that model. Users can supply their own calibration model or use metrics_fit_calib, which fits an mgcv::gam() model with the smoother mgcv::s(., k = -1) (automatic knot selection). If the input y is a factor, a binomial family is used; otherwise a gaussian family is used. NAs are always dropped.

Continuous response example:

calibration_model <- metrics_fit_calib(
  y = truth,
  y_hat = prediction
)
calib_pred <- predict(calibration_model)

Binary response example:

calibration_model <- metrics_fit_calib(
  y = factor(truth, levels = c("0", "1")),
  y_hat = prediction
)
calib_pred <- predict(calibration_model, type = "response")

In the binary case, make sure that:

  • y is a factor with the correct level order. Usually "0" is the reference (first) level and "1" is the event (second) level. This may clash with the yardstick convention, where the first level is by default the "event" level.

  • y_hat contains probabilities (not log-odds).

  • the returned calibrated predictions calib_pred are also probabilities, which is achieved by setting type = "response".
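Since users may supply their own calibration model, the following sketch shows one possible alternative for a binary response using only base R: a logistic recalibration fitted with stats::glm() on the logit of the predictions. This is an illustrative assumption, not the method used by metrics_fit_calib; the names calib_data and own_calib_mod are hypothetical.

```r
# Logistic regression on mtcars, predicting the binary variable vs
mod <- glm(vs ~ hp + mpg, data = mtcars, family = binomial())
truth_num <- mtcars$vs
pred <- predict(mod, type = "response") # probabilities, not log-odds

# Hypothetical user-supplied calibration model: regress the observed
# outcome on the logit of the predicted probabilities
calib_data <- data.frame(truth = truth_num, logit_pred = qlogis(pred))
own_calib_mod <- glm(truth ~ logit_pred, data = calib_data, family = binomial())

# type = "response" returns calibrated probabilities
calib_pred <- predict(own_calib_mod, type = "response")
range(calib_pred)
```

The resulting calib_pred is a vector of probabilities with one value per observation, which is the form expected for y_hat_calib.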


# Scores
score_quadratic(y = c(1.34, 2.8), y_hat = c(1.34, 2.8)) # must be 0
#> [1] 0
score_quadratic(y = 0.5, y_hat = 0) # must be 0.5^2 = 0.25
#> [1] 0.25

score_log_loss(y = c(0, 1), y_hat = c(0.01, 0.9)) # must be close to 0
#> [1] 0.05770543
score_log_loss(y = 0, y_hat = 0) # undefined
#> [1] NaN

check_score_fun(score_quadratic) # passes without errors

# Metrics based on `lm` model
mod <- lm(hp ~ ., data = mtcars)
truth <- mtcars$hp
pred <- predict(mod)

# calibration fit and calibrated predictions
calib_mod <- metrics_fit_calib(y = truth, y_hat = pred)
calib_pred <- predict(calib_mod)

metrics_unc(score_fun = "score_quadratic", y = truth)
#> [1] 4553.965
metrics_R2(score_fun = "score_quadratic", y = truth, y_hat = pred)
#> [1] 0.9027993
metrics_DI(score_fun = "score_quadratic", y = truth, y_hat_calib = calib_pred)
#> [1] 0.9222323
metrics_MI(score_fun = "score_quadratic", y = truth, y_hat = pred, y_hat_calib = calib_pred)
#> [1] 0.01943302
# Note that R^2 = DI - MI
metrics_r2(y = truth, y_hat = pred, y_hat_calib = calib_pred)
#> [1] 0.9027993

# Metrics based on `glm` model (logistic regression)
# Note the correct setting of levels
mod <- glm(factor(vs, levels = c("0", "1")) ~ hp + mpg, data = mtcars, family = "binomial")
truth_fct <- factor(mtcars$vs, levels = c("0", "1"))
truth_num <- mtcars$vs
pred <- predict(mod, type = "response") # type = "response" returns probabilities

# calibration fit and calibrated predictions
calib_mod <- metrics_fit_calib(y = truth_fct, y_hat = pred)
calib_pred <- predict(calib_mod, type = "response") # type = "response" returns probabilities

metrics_unc(score_fun = "score_quadratic", y = truth_num)
#> [1] 0.2460938
metrics_R2(score_fun = "score_quadratic", y = truth_num, y_hat = pred)
#> [1] 0.6498564
metrics_DI(score_fun = "score_quadratic", y = truth_num, y_hat_calib = calib_pred)
#> [1] 0.7358166
metrics_MI(score_fun = "score_quadratic", y = truth_num, y_hat = pred, y_hat_calib = calib_pred)
#> [1] 0.08596014
# Note that R^2 = DI - MI
metrics_r2(y = truth_num, y_hat = pred, y_hat_calib = calib_pred)
#> [1] 0.6499537
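The relation R^2 = DI - MI noted in the examples can be checked by hand. The sketch below uses base R only and rests on assumed definitions that are consistent with the outputs above but are not confirmed by this page: uncertainty is the score of the mean prediction, and R^2, DI, and MI are score differences scaled by the uncertainty. The helper quad and the polynomial calibration model are hypothetical stand-ins for score_quadratic and metrics_fit_calib.

```r
# Hypothetical stand-in for score_quadratic()
quad <- function(y, y_hat) mean((y - y_hat)^2)

mod <- lm(hp ~ ., data = mtcars)
truth <- mtcars$hp
pred <- unname(predict(mod))

# Stand-in calibration model (assumption): quadratic polynomial in pred
calib_pred <- unname(predict(lm(truth ~ poly(pred, 2))))

# Assumed definitions, all relative to the uncertainty
unc <- quad(truth, mean(truth))
R2 <- (unc - quad(truth, pred)) / unc
DI <- (unc - quad(truth, calib_pred)) / unc
MI <- (quad(truth, pred) - quad(truth, calib_pred)) / unc

all.equal(R2, DI - MI)
#> [1] TRUE
```

Under these assumed definitions the identity holds for any calibration model, because the score of the calibrated predictions cancels in DI - MI.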