Package 'predtools'

Title: Prediction Model Tools
Description: Provides additional functions for evaluating predictive models, including plotting calibration curves and model-based Receiver Operating Characteristic (mROC) based on Sadatsafavi et al (2021) <arXiv:2003.00316>.
Authors: Mohsen Sadatsafavi [aut, cph], Amin Adibi [cre], Abdollah Safari [aut], Tae Yoon Lee [aut]
Maintainer: Amin Adibi <[email protected]>
License: GPL
Version: 0.0.3
Built: 2024-10-27 04:55:28 UTC
Source: https://github.com/resplab/predtools

Help Index


Calculates the absolute surface between the empirical and expected ROCs

Description

Calculates the absolute surface between the empirical and expected ROCs

Usage

calc_mROC_stats(y, p, ordered = FALSE, fast = TRUE)

Arguments

y

Vector of binary responses

p

Vector of predicted probabilities (same length as y)

ordered

Defaults to FALSE

fast

Whether to call the fast (C++) code rather than the slower R code. Defaults to TRUE

Value

Returns a list with the A (mean calibration) statistic and the B (mROC/ROC equality) statistic, as well as the direction of potential miscalibration (the sign of the difference between the actual and predicted mean risk).
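
The direction component can be illustrated in base R. The following is a minimal sketch on simulated data, not the package's implementation: `mean_diff` and `direction` are illustrative names, and the A and B statistics themselves are not reproduced here.

```r
# Simulated data in which the model systematically overestimates risk.
set.seed(1)
p <- runif(1000, 0.05, 0.60)     # predicted probabilities
y <- rbinom(1000, 1, p * 0.7)    # true risk is 70% of the predicted risk

# Difference between the actual and predicted mean risk, and its sign.
mean_diff <- mean(y) - mean(p)
direction <- sign(mean_diff)     # negative when the model overpredicts on average
```

A positive sign would instead suggest that the model underpredicts risk on average.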


Calculates the first two moments of the bivariate distribution of NB_model and NB_all

Description

Calculates the first two moments of the bivariate distribution of NB_model and NB_all

Usage

calc_NB_moments(Y, pi, z, weights = NULL)

Arguments

Y

Vector of the binary response variable

pi

Vector of predicted risks

z

Decision threshold at which the NBs are calculated

weights

Optional. Observation weights

Value

Two means, two SDs, and one correlation coefficient. In each pair, the first element is for the model and the second is for treating all.
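
For orientation, the two net benefits (NBs) can be written with the standard decision-curve formula. The sketch below covers point estimates only and assumes that "treat" means a predicted risk at or above z; `nb_point_estimates` is a hypothetical helper, not a function of this package, and it ignores the weights argument.

```r
# Point estimates of the net benefit of the model and of treating all,
# at decision threshold z (standard decision-curve formula).
nb_point_estimates <- function(Y, pi, z) {
  w <- z / (1 - z)                      # weight given to false positives
  nb_all_i   <- Y - (1 - Y) * w         # per-observation NB when treating all
  nb_model_i <- (pi >= z) * nb_all_i    # treat only if predicted risk >= z (assumed)
  c(NB_model = mean(nb_model_i), NB_all = mean(nb_all_i))
}

res <- nb_point_estimates(Y = c(1, 0, 1, 0), pi = c(0.8, 0.1, 0.6, 0.4), z = 0.5)
```

In this toy input the model treats exactly the two true cases, so its net benefit is 0.5, while treating all has a net benefit of 0.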


Create calibration plot based on observed and predicted outcomes.

Description

Create calibration plot based on observed and predicted outcomes.

Usage

calibration_plot(
  data,
  obs,
  follow_up = NULL,
  pred,
  group = NULL,
  nTiles = 10,
  legendPosition = "right",
  title = NULL,
  x_lim = NULL,
  y_lim = NULL,
  xlab = "Prediction",
  ylab = "Observation",
  points_col_list = NULL,
  data_summary = FALSE
)

Arguments

data

Data that include the observed and predicted outcomes.

obs

Name of observed outcome in the input data.

follow_up

Name of follow-up time (if applicable) in the input data.

pred

Name of first predicted outcome in the input data.

group

Name of grouping column (if applicable) in the input data.

nTiles

Number of tiles (e.g., 10 for deciles) in the calibration plot.

legendPosition

Legend position on the calibration plot.

title

Title on the calibration plot.

x_lim

Limits of x-axis on the calibration plot.

y_lim

Limits of y-axis on the calibration plot.

xlab

Label of x-axis on the calibration plot.

ylab

Label of y-axis on the calibration plot.

points_col_list

Points' color on the calibration plot.

data_summary

Logical indicating whether a summary of the predicted and observed outcomes should be included in the output.

Value

Returns the calibration plot (a ggplot object) and, if data_summary is set to TRUE, a dataset of summary statistics of the predicted and observed outcomes.

Examples

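library(predtools)
library(dplyr)
set.seed(1)  # make the simulated data reproducible
# Simulate an observed outcome (x) and a noisy prediction of it (y)
x <- rnorm(100, 10, 2)
y <- x + rnorm(100, 0, 1)
data <- data.frame(x, y)
# Draw the calibration plot with the default 10 tiles
calibration_plot(data, obs = "x", pred = "y")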
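
The idea of tiles can be sketched in base R: predictions are grouped into equal-sized bins, and the mean prediction is compared with the mean observation in each bin. This is an illustration under assumptions (rank-based binning, simulated binary outcomes, illustrative names such as `calib`), not necessarily how calibration_plot builds its summary.

```r
set.seed(1)
pred <- runif(500)                # predicted risks (simulated)
obs  <- rbinom(500, 1, pred)      # outcomes drawn from those risks

n_tiles <- 10
# Assign each prediction to one of n_tiles equal-sized bins by rank.
tile <- ceiling(rank(pred, ties.method = "first") * n_tiles / length(pred))

# Mean predicted and mean observed outcome within each bin.
calib <- data.frame(
  tile      = sort(unique(tile)),
  mean_pred = as.numeric(tapply(pred, tile, mean)),
  mean_obs  = as.numeric(tapply(obs, tile, mean))
)
```

Plotting `mean_obs` against `mean_pred` gives one point per tile; points near the identity line indicate good calibration.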

model development data

Description

A dataset containing sample model development data

Format

A data frame with 500 rows and 5 variables:

  • age: age of the patient

  • severity: whether or not the disease was severe

  • sex: binary sex variable, 1 for female and 0 for male

  • comorbidity: whether or not comorbidities are present

  • y: response variable

Source

Simulated


EVPI (Expected Value of Perfect Information) for validation. Takes a binary response vector and a vector of predicted risks

Description

EVPI (Expected Value of Perfect Information) for validation. Takes a binary response vector and a vector of predicted risks

Usage

evpi_val(
  Y,
  pi,
  method = c("bootstrap", "bayesian_bootstrap", "asymptotic"),
  n_sim = 1000,
  zs = (0:99)/100,
  weights = NULL
)

Arguments

Y

Binary response variable

pi

Vector of predicted risks

method

EVPI calculation method

n_sim

Number of Monte Carlo simulations (for bootstrap-based methods)

zs

vector of risk thresholds at which EVPI is to be calculated

weights

(optional) observation weights

Value

Returns a data frame containing thresholds, EVPIs, and some auxiliary output.
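
The bootstrap idea can be sketched in base R for a single threshold. This assumes the usual definition of EVPI as the expected net benefit (NB) when the best strategy is picked in each sample, minus the NB of the strategy that is best on average. It is an illustration on simulated data with illustrative variable names, and it may differ in detail from what evpi_val computes.

```r
set.seed(1)
n  <- 500
pi <- runif(n, 0.05, 0.50)    # predicted risks (simulated)
Y  <- rbinom(n, 1, pi)        # binary outcomes (simulated)
z  <- 0.2                     # a single risk threshold
w  <- z / (1 - z)             # weight given to false positives

# NB of using the model and of treating all, in each bootstrap sample.
nb <- replicate(1000, {
  i <- sample.int(n, replace = TRUE)
  nb_all_i <- Y[i] - (1 - Y[i]) * w
  c(model = mean((pi[i] >= z) * nb_all_i), all = mean(nb_all_i))
})

# Treating no one has NB = 0, so 0 enters both maxima.
evpi <- mean(pmax(0, nb["model", ], nb["all", ])) -
  max(0, mean(nb["model", ]), mean(nb["all", ]))
```

Because the mean of a maximum is never smaller than the maximum of the means, this quantity is non-negative by construction.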


Anonymized data from the GUSTO trial

Description

A dataset containing anonymized data from the GUSTO trial

Format

A data frame with 40830 rows and 29 variables:

  • day30: whether death happened by day 30 after intervention

  • sho: whether cardiac shock was present

  • hig: whether the patient had high blood pressure

  • dia: whether the patient had diabetes

  • hrt: whether the patient was on hormone replacement therapies

Source

Internet


Takes in a mROC object and calculates the area under the curve

Description

Takes in a mROC object and calculates the area under the curve

Usage

mAUC(mROC_obj)

Arguments

mROC_obj

An object of class mROC

Value

Returns the area under the mROC curve
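
As a generic illustration, the area under a curve described by false positive and true positive rates can be approximated with the trapezoidal rule. `trapezoid_auc` below is a hypothetical helper that works on plain vectors. It is not the package's mAUC, which takes an mROC object and may compute the area differently.

```r
# Trapezoidal area under a curve given as FP/TP rate vectors.
trapezoid_auc <- function(fp, tp) {
  o  <- order(fp, tp)                 # sort points from left to right
  fp <- fp[o]
  tp <- tp[o]
  # Width of each segment times the average height of its two ends.
  sum(diff(fp) * (head(tp, -1) + tail(tp, -1)) / 2)
}

auc_diagonal <- trapezoid_auc(fp = c(0, 0.5, 1), tp = c(0, 0.5, 1))
auc_perfect  <- trapezoid_auc(fp = c(0, 0, 1), tp = c(0, 1, 1))
```

The diagonal (an uninformative model) gives 0.5 and the perfect curve gives 1.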


Calculates mROC from the vector of predicted risks. Takes in a vector of probabilities and returns mROC values (True Positives, False Positives in an object of class mROC)

Description

Calculates mROC from the vector of predicted risks. Takes in a vector of probabilities and returns mROC values (True Positives, False Positives in an object of class mROC)

Usage

mROC(p, ordered = FALSE)

Arguments

p

A numeric vector of probabilities.

ordered

Optional. Whether the vector p is already ordered from small to large. If FALSE, the function orders it; setting TRUE skips that step to speed up computation.

Value

This function returns an object of class mROC. It has three vectors: thresholds on predicted risks (which is the ordered vector of input probabilities), false positive rates (FPs), and true positive rates (TPs). You can call the plot function directly on this object to draw the mROC.
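
The three vectors can be illustrated with a short base R sketch. It assumes that a case counts as positive when its predicted risk is at or above the threshold, and that each predicted risk is treated as the probability of the outcome. `expected_roc` is a hypothetical helper that returns a plain list rather than an mROC object.

```r
# Expected ROC points if the predicted risks were perfectly calibrated.
expected_roc <- function(p) {
  p <- sort(p)                                   # thresholds, small to large
  list(
    thresholds = p,
    # Expected share of events among cases with risk >= each threshold.
    TPs = rev(cumsum(rev(p))) / sum(p),
    # Expected share of non-events among cases with risk >= each threshold.
    FPs = rev(cumsum(rev(1 - p))) / sum(1 - p)
  )
}

roc <- expected_roc(c(0.7, 0.1, 0.9, 0.4))
```

At the lowest threshold every case is classified as positive, so both rates equal 1; both then fall as the threshold rises.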


Main eROC analysis that plots ROC and eROC

Description

Main eROC analysis that plots ROC and eROC

Usage

mROC_analysis(y, p, inference = 0, n_sim, fast = TRUE)

Arguments

y

Vector of observed responses.

p

Vector of predicted probabilities (the same length as observed responses)

inference

0 for no inference, 1 for p-value only, and 2 for p-value and 95 percent CI.

n_sim

number of simulations

fast

Whether to call the fast (C++) code rather than the slower R code. Defaults to TRUE

Value

Returns a list containing the results of mROC analysis.


Statistical inference for comparing empirical and expected ROCs. If CI=TRUE then also returns pointwise CIs

Description

Statistical inference for comparing empirical and expected ROCs. If CI=TRUE then also returns pointwise CIs

Usage

mROC_inference(y, p, n_sim = 1e+05, CI = FALSE, aux = FALSE, fast = TRUE)

Arguments

y

vector of binary response values

p

vector of probabilities

n_sim

number of Monte Carlo simulations to calculate p-value

CI

optional. Whether confidence interval should be calculated for each point of mROC. Default is FALSE.

aux

Optional. Whether additional results (component-wise p-values, etc.) should be written to the package's aux variable. Default is FALSE.

fast

Optional. Whether the fast code (C++) or slow code (R) should be called. Default is TRUE (the R code will be slow unless the dataset is small).

Value

Returns an object of type mROC_inference containing the results of statistical inference for the mROC curve


Calculates the expected value of the maximum of two random variables with a zero-truncated bivariate normal distribution. Takes the means, SDs, and correlation coefficient of the two variables

Description

Calculates the expected value of the maximum of two random variables with a zero-truncated bivariate normal distribution. Takes the means, SDs, and correlation coefficient of the two variables

Usage

mu_max_trunc_bvn(
  mu1,
  mu2,
  sigma1,
  sigma2,
  rho,
  precision = .Machine$double.eps
)

Arguments

mu1

Mean of the first distribution

mu2

Mean of the second distribution

sigma1

SD of the first distribution

sigma2

SD of the second distribution

rho

Correlation coefficient of the two random variables

precision

Numerical precision value

Value

A scalar value for the expected value


Update a prediction model for a binary outcome by multiplying the predicted odds by a fixed odds ratio.

Description

Update a prediction model for a binary outcome by multiplying the predicted odds by a fixed odds ratio.

Usage

odds_adjust(p0, p1, v)

Arguments

p0

Mean of observed risk or predicted risk in development sample.

p1

Mean of observed risk in target population.

v

Variance of predicted risk in development sample.

Value

Returns a correction factor that can be applied to the predicted odds in order to update the predictions for a new target population.
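
Applying such a correction factor amounts to converting each predicted risk to odds, multiplying by the factor, and converting back. A minimal sketch follows; `apply_odds_ratio` is a hypothetical helper, and the odds ratio of 2 is an arbitrary illustrative value rather than an output of odds_adjust.

```r
# Multiply predicted odds by a fixed odds ratio and convert back to risks.
apply_odds_ratio <- function(p, or) {
  odds <- p / (1 - p) * or    # updated odds
  odds / (1 + odds)           # back to the probability scale
}

updated <- apply_odds_ratio(c(0.10, 0.50), or = 2)
```

With an odds ratio of 2, a predicted risk of 0.50 (odds of 1) becomes 2/3, and an odds ratio of 1 leaves the predictions unchanged.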


Estimate mean and variance of prediction based on model calibration output.

Description

Estimate mean and variance of prediction based on model calibration output.

Usage

pred_summary_stat(calibVector)

Arguments

calibVector

Vector of predicted probability of risk per decile or percentile (e.g., from a calibration plot).

Value

Returns mean and variance of predictions based on the predicted probabilities.


model validation data

Description

A dataset containing sample model validation data

Format

A data frame with 400 rows and 5 variables:

  • age: age of the patient

  • severity: whether or not the disease was severe

  • sex: binary sex variable, 1 for female and 0 for male

  • comorbidity: whether or not comorbidities are present

  • y: response variable

Source

Simulated