| Title: | Bayesian Sample Size and Precision Considerations for Risk Prediction Models |
|---|---|
| Description: | Performs Bayesian sample size, precision, and value-of-information analysis for external validation of existing multi-variable prediction models using the approach proposed by Sadatsafavi and colleagues (2026) <doi:10.1002/sim.70389>. |
| Authors: | Mohsen Sadatsafavi [aut, cre] (ORCID: <https://orcid.org/0000-0002-0419-7862>), Anna Luo [ctb] |
| Maintainer: | Mohsen Sadatsafavi <[email protected]> |
| License: | GPL-3 |
| Version: | 0.0.2 |
| Built: | 2026-06-05 11:55:42 UTC |
| Source: | https://github.com/resplab/bayespmtools |
Bayesian precision and value-of-information calculator for external validation studies of risk prediction models at fixed sample sizes.
bpm_valprec( N, evidence, targets, n_sim = NULL, method = "sample", threshold = NULL, dist_type = "logitnorm", impute_cor = TRUE, ex_args = NULL )
N |
Numeric vector of sample sizes to evaluate. |
evidence |
A named list containing prior evidence components for model performance
parameters (e.g., prevalence, discrimination, calibration).
Alternatively, |
targets |
A named list of targets to compute.
|
n_sim |
Number of Monte Carlo simulations used to generate the pre-posterior distribution. If evidence is a data frame from previous calls to relevant functions, n_sim will automatically be set to the number of rows of the data frame. |
method |
Method to compute CI widths. One of |
threshold |
Decision threshold for net benefit calculations.
Required if |
dist_type |
Distribution for calibrated risks. Default is
|
impute_cor |
Logical; whether to induce correlation between parameters. |
ex_args |
Optional list of extra arguments. May include
|
A list with elements:
Matrix of requested metrics by sample size.
Monte Carlo sample used for computations.
Processed evidence object.
Targets as supplied by the user.
Simulated CI widths for requested metrics.
    evidence <- list(
      prev ~ beta(116, 155),           # Outcome prevalence
      cstat ~ beta(3628, 1139),        # C-statistic
      cal_mean ~ norm(-0.009, 0.125),  # Mean calibration error
      cal_slp ~ norm(0.995, 0.024)     # Calibration slope
    )
    res <- bpm_valprec(
      N = c(1000, 1500),
      evidence = evidence,
      targets = list(eciw.cstat = TRUE, qciw.cal_slp = 0.9, voi.nb = 0.8),
      threshold = 0.2,
      n_sim = 100  # faster and safer on CRAN. Please increase this value for real-world use.
    )
    print(res$results)
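To make the idea of a pre-posterior distribution of CI widths concrete, the following base R sketch works through the simplest case, outcome prevalence. It does not call the package. It assumes that an "expected" target refers to the mean CI width over draws from the evidence distribution and that a "quantile" target refers to a quantile of those widths, and it uses a plug-in Wald interval instead of simulating a full validation dataset. The object names are illustrative only.

```r
set.seed(1)
n_sim <- 1000
N <- c(1000, 1500)

# Draw plausible prevalence values from the evidence distribution beta(116, 155)
prev_draws <- rbeta(n_sim, 116, 155)

# Wald 95% CI width for a proportion at each sample size
# (rows: draws, columns: sample sizes)
ciw <- sapply(N, function(n) {
  2 * qnorm(0.975) * sqrt(prev_draws * (1 - prev_draws) / n)
})

# Summaries of the simulated CI widths at each sample size
summary_tab <- data.frame(
  N = N,
  expected_ciw = colMeans(ciw),
  q90_ciw = apply(ciw, 2, quantile, probs = 0.9)
)
print(summary_tab)
```

Both summaries shrink as N grows, which is the pattern a fixed-sample-size precision analysis is meant to quantify.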
Bayesian sample size calculation for external validation studies of clinical risk prediction models. The function evaluates sample sizes required to meet precision-, assurance-, or decision-based targets using pre-posterior simulation.
bpm_valsamp( evidence, targets, n_sim = NULL, method = "sample", threshold = NULL, dist_type = "logitnorm", impute_cor = TRUE, ex_args = NULL )
evidence |
A named list containing prior evidence components for model performance
parameters (e.g., prevalence, discrimination, calibration).
Alternatively, |
targets |
A named list specifying sample size targets. Supported targets include:
For example, |
n_sim |
Number of Monte Carlo simulations used to generate the pre-posterior distribution. If evidence is a data frame from previous calls to relevant functions, n_sim will automatically be set to the number of rows of the data frame. |
method |
Method used to compute the pre-posterior distribution of 95% CI widths.
One of |
threshold |
Risk threshold used for decision-analytic quantities and net benefit
calculations. Required if |
dist_type |
Distribution assumed for calibrated risks. Default is
|
impute_cor |
Logical indicating whether correlation between performance measures
should be induced when simulating from marginal evidence distributions.
Default is |
ex_args |
Optional list of additional arguments passed to internal simulation or root-finding routines (experimental feature). |
A list with the following components:
results: Estimated sample sizes required to meet each target.
sample: Data frame of pre-posterior simulation draws.
evidence: Processed evidence object used in the analysis.
trace: Trace output from the stochastic root-finding algorithm.
targets: The targets argument supplied to the function.
    evidence <- list(
      prev ~ beta(116, 155),           # Outcome prevalence
      cstat ~ beta(3628, 1139),        # C-statistic
      cal_mean ~ norm(-0.009, 0.125),  # Mean calibration error
      cal_slp ~ norm(0.995, 0.024)     # Calibration slope
    )
    targets <- list(
      eciw.cstat = 0.1,
      qciw.cstat = c(0.9, 0.1),
      oa.nb = 0.8
    )
    samp <- bpm_valsamp(
      evidence = evidence,
      targets = targets,
      n_sim = 1000,
      threshold = 0.2
    )
    samp$results
Calculates the pre-posterior distribution of 95% CI widths using the two-step method.
calc_ciw_2s(N, parms)
N |
A vector of sample sizes |
parms |
Parameters for the distribution, containing:
cal_int: calibration intercept
cal_slp: calibration slope
prev: prevalence
cstat: c-statistic
dist_type: distribution type, one of ("logitnorm", "beta", "probitnorm")
dist_parm1: first parameter of the distribution
dist_parm2: second parameter of the distribution |
List of length N, of vectors containing the 95% confidence interval width for each of:
cstat: c-statistic
cal_oe: observed to expected ratio
cal_mean: mean calibration
cal_int: calibration intercept
cal_slp: calibration slope
Calculates the pre-posterior distribution of 95% CI widths using the given method.
calc_ciw_mc(N, parms_sample, method)
N |
A vector of sample sizes |
parms_sample |
Matrix of distribution parameters, one parameter set per row, with columns:
cstat: c-statistic
prev: prevalence
dist_type: distribution type
dist_parm1: first parameter of the distribution
dist_parm2: second parameter of the distribution
cal_int: calibration intercept
cal_slp: calibration slope |
method |
Method used to calculate the 95% confidence interval width; one of "sample" or "2s". |
List of matrices, each with dimension (number of rows in parms_sample x length of N), containing the 95% confidence interval width for each of:
cstat: c-statistic
cal_oe: observed to expected ratio
cal_mean: mean calibration
cal_int: calibration intercept
cal_slp: calibration slope
Calculates the pre-posterior distribution of 95% CI widths using sampling-based simulation.
calc_ciw_sample(N, parms)
N |
A vector of sample sizes |
parms |
Parameters for the distribution, containing:
prev: prevalence
dist_type: distribution type
dist_parm1: first parameter of the distribution
dist_parm2: second parameter of the distribution
cal_int: calibration intercept
cal_slp: calibration slope |
List of length N, of vectors containing the 95% confidence interval width for each of:
cstat: c-statistic
cal_oe: observed to expected ratio
cal_mean: mean calibration
cal_int: calibration intercept
cal_slp: calibration slope
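As an illustration of what a sampling-based calculation could look like, the sketch below simulates one validation dataset per sample size and reads CI widths off a logistic recalibration fit. It is not the package's implementation. It assumes calibrated risks follow a beta distribution and that predicted and calibrated risks are linked by logit(calibrated) = cal_int + cal_slp * logit(predicted). The function name sim_ciw is hypothetical.

```r
set.seed(42)

# Simulate one validation dataset of size n and return Wald 95% CI widths
sim_ciw <- function(n, shape1, shape2, cal_int, cal_slp) {
  p <- rbeta(n, shape1, shape2)            # calibrated (true) risks
  lp <- (qlogis(p) - cal_int) / cal_slp    # logit of the predicted risks
  y <- rbinom(n, 1, p)                     # observed outcomes
  fit <- glm(y ~ lp, family = binomial())  # logistic recalibration model
  se <- sqrt(diag(vcov(fit)))
  z <- qnorm(0.975)
  c(cal_int = 2 * z * se[[1]],
    cal_slp = 2 * z * se[[2]],
    prev = 2 * z * sqrt(mean(y) * (1 - mean(y)) / n))
}

N <- c(500, 5000)
widths <- sapply(N, sim_ciw, shape1 = 2, shape2 = 5, cal_int = 0, cal_slp = 1)
colnames(widths) <- paste0("N=", N)
print(round(widths, 3))
```

Repeating this over many parameter draws, rather than the single draw used here, would give a distribution of widths at each sample size.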
Calculates the c-statistic given the model type and parameters.
calc_cstat(type, parms, m = NULL)
type |
A character string; one of c("beta", "logitnorm", "probitnorm") indicating the model type. |
parms |
A numeric vector containing parameters relevant to the model. |
m |
Mean, default is NULL |
The C-statistic
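The link between a distribution of calibrated risks and the c-statistic can be checked by simulation. The base R sketch below is an illustration, not the package's method: it assumes calibrated risks follow a beta distribution, draws outcomes at those risks, and computes the area under the ROC curve in its Mann-Whitney form. The function name cstat_beta_mc is hypothetical.

```r
set.seed(123)

# Monte Carlo c-statistic for calibrated risks following Beta(shape1, shape2)
cstat_beta_mc <- function(shape1, shape2, n = 1e5) {
  p <- rbeta(n, shape1, shape2)  # calibrated risks
  y <- rbinom(n, 1, p)           # outcomes drawn at those risks
  r <- rank(p)                   # ranks of the risks
  n1 <- sum(y == 1)              # number of events
  n0 <- n - n1                   # number of non-events
  # Mann-Whitney form of the area under the ROC curve
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# For uniform risks, Beta(1, 1), the exact value is 5/6
cstat_beta_mc(1, 1)
```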
Calculates approximate variances of performance metrics and the covariance of the calibration intercept and slope using the Riley framework.
calc_riley_vars(N, parms)
N |
sample size of the validation dataset |
parms |
List containing model and distribution parameters:
prev: expected prevalence
cstat: c-statistic of the model
dist_type: one of ("logitnorm", "beta", "probitnorm")
dist_parm1: first parameter of the distribution
dist_parm2: second parameter of the distribution
cal_int: calibration intercept
cal_slp: calibration slope |
list of approximate variances and covariance of the performance metrics.
Calculates the sensitivity and specificity of the model at a given threshold.
calc_se_sp(dist_type, dist_parms, cal_int, cal_slp, threshold, prev)
dist_type |
The distribution type, one of c("logitnorm", "beta", "probitnorm"). |
dist_parms |
Vector of the two parameters of interest given the distribution. |
cal_int |
The calibration intercept. |
cal_slp |
The calibration slope. |
threshold |
The risk threshold |
prev |
The outcome prevalence, the expectation of the model |
A vector containing sensitivity and specificity
calc_se_sp("beta", c(1,1), 0.9, 0.75, 0.5, 0.5)
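For intuition about how sensitivity and specificity follow from a risk distribution, the sketch below covers a simplified special case in base R. It assumes calibrated risks follow Beta(shape1, shape2) and that the model is perfectly calibrated, so, unlike the call above, no calibration intercept or slope is applied. Under that assumption the risks among events follow Beta(shape1 + 1, shape2) and the risks among non-events follow Beta(shape1, shape2 + 1). The function name se_sp_beta is hypothetical.

```r
# Sensitivity and specificity at a risk threshold for perfectly calibrated
# risks distributed as Beta(shape1, shape2) (simplifying assumption)
se_sp_beta <- function(shape1, shape2, threshold) {
  c(se = 1 - pbeta(threshold, shape1 + 1, shape2),  # P(risk > threshold | event)
    sp = pbeta(threshold, shape1, shape2 + 1))      # P(risk <= threshold | non-event)
}

se_sp_beta(1, 1, 0.5)  # se = 0.75, sp = 0.75
```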
Calculates the sample size N at which the mean confidence interval width equals the given target. Assumes that the mean width is a decreasing and convex function of N.
find_n_mean(target, N, ciws, decreasing = T, convex = T)
target |
The target mean confidence interval width |
N |
Sample sizes corresponding to each row of ciws |
ciws |
Matrix of confidence interval widths, each row corresponding to N |
decreasing |
Logical. Whether to constrain the function to be decreasing |
convex |
Logical. Whether to constrain the function to be convex |
Integer. Estimated sample size needed to achieve the target
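A minimal sketch of the underlying idea, in base R: average the simulated widths at each sample size, then interpolate to the sample size at which that average meets the target. This is not the package's algorithm. The decreasing and convex constraints are not reproduced, and plain linear interpolation on the 1 / width^2 scale is used instead. The function name find_n_mean_sketch is hypothetical.

```r
find_n_mean_sketch <- function(target, N, ciws) {
  mean_ciw <- rowMeans(ciws)  # mean CI width at each sample size
  # The target must lie within the range covered by the grid of sample sizes
  if (target > max(mean_ciw) || target < min(mean_ciw)) return(NA_real_)
  # Interpolate N against 1 / width^2, which is linear when width ~ 1 / sqrt(N)
  approx(x = 1 / mean_ciw^2, y = N, xout = 1 / target^2)$y
}

N <- c(500, 1000, 2000, 4000)
ciws <- outer(2 / sqrt(N), c(0.9, 1.0, 1.1))  # synthetic widths, one row per N
find_n_mean_sketch(0.05, N, ciws)             # approximately 1600
```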
Finds the sample size N at which the specified quantile of the confidence interval widths equals the given target.
find_n_quantile(target, N, q, ciws)
target |
The desired quantile target value |
N |
Sample sizes corresponding to each row of ciws |
q |
Desired quantile level, between 0 and 1. |
ciws |
A matrix of confidence interval widths, each row corresponding to N |
Estimated sample size needed to achieve the target
Infer calibration intercept from mean calibration given a fixed calibration slope and a given distribution for calibrated risks
infer_cal_int_from_mean(dist_type, dist_parms, cal_mean, cal_slp, prev = NULL)
dist_type |
The distribution type, one of c("logitnorm", "probitnorm", "beta"). |
dist_parms |
The two parameters that index the type. |
cal_mean |
The mean calibration. |
cal_slp |
The calibration slope. |
prev |
Outcome prevalence. Optional; if not provided, it is estimated as the expected value of the distribution of calibrated risks. |
The estimated calibration intercept
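The inference can be framed as a one-dimensional root-finding problem. The base R sketch below is an illustration under stated assumptions, not the package's code: calibrated risks follow a beta distribution, the calibration model is logit(calibrated) = cal_int + cal_slp * logit(predicted), and mean calibration is taken to be mean observed risk minus mean predicted risk (the sign convention is an assumption). The function name infer_cal_int_sketch is hypothetical.

```r
infer_cal_int_sketch <- function(shape1, shape2, cal_mean, cal_slp) {
  prev <- shape1 / (shape1 + shape2)  # mean calibrated risk
  # Mean predicted risk implied by a candidate intercept
  mean_pred <- function(cal_int) {
    integrate(function(p) {
      plogis((qlogis(p) - cal_int) / cal_slp) * dbeta(p, shape1, shape2)
    }, lower = 0, upper = 1)$value
  }
  # Find the intercept at which observed minus predicted equals cal_mean
  uniroot(function(a) (prev - mean_pred(a)) - cal_mean,
          interval = c(-5, 5), tol = 1e-8)$root
}

infer_cal_int_sketch(2, 5, cal_mean = 0, cal_slp = 1)      # close to 0
infer_cal_int_sketch(2, 5, cal_mean = 0.05, cal_slp = 0.9)
```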
Infer calibration intercept from observed-to-expected outcome ratio given a fixed calibration slope and a given distribution for calibrated risks
infer_cal_int_from_oe(dist_type, dist_parms, cal_oe, cal_slp, prev = NULL)
dist_type |
The distribution type, one of c("logitnorm", "probitnorm", "beta"). |
dist_parms |
The two parameters that index the type. |
cal_oe |
The observed-to-expected outcome ratio. |
cal_slp |
The calibration slope. |
prev |
Outcome prevalence. Optional; if not provided, it is estimated as the expected value of the distribution of calibrated risks. |
The estimated calibration intercept
Calculates correlation based on simulated data
infer_correlation(dist_type, dist_parms, cal_int, cal_slp, n, n_sim)
dist_type |
The distribution type |
dist_parms |
The two parameters of interest for the given distribution type |
cal_int |
The calibration intercept. |
cal_slp |
The calibration slope. |
n |
number of observations for each simulation. |
n_sim |
number of simulations |
correlation among the simulated data
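One way such a correlation could be obtained is to simulate repeated validation datasets, estimate the performance metrics in each, and correlate the estimates. The source does not state which quantities the package correlates, so the choice of metrics below is an assumption, as are the beta risk distribution and the logistic calibration model. The function name sim_metrics is hypothetical.

```r
set.seed(7)

# Estimate four performance metrics from one simulated dataset of size n
sim_metrics <- function(n, shape1, shape2, cal_int, cal_slp) {
  p <- rbeta(n, shape1, shape2)            # calibrated risks
  lp <- (qlogis(p) - cal_int) / cal_slp    # logit of the predicted risks
  y <- rbinom(n, 1, p)                     # observed outcomes
  fit <- glm(y ~ lp, family = binomial())  # logistic recalibration model
  r <- rank(lp)
  n1 <- sum(y)
  n0 <- n - n1
  c(prev = mean(y),
    cstat = (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0),
    cal_int = unname(coef(fit)[1]),
    cal_slp = unname(coef(fit)[2]))
}

# 200 simulated datasets of 500 observations each (rows: simulations)
sims <- t(replicate(200, sim_metrics(500, 2, 5, 0, 1)))
round(cor(sims), 2)
```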
Calculate the model parameters given the distribution type, mean, quantile, and percentile.
inv_mean_quantile(type, m, q, p)
type |
The distribution type, one of c("norm", "beta", "logitnorm", "probitnorm"). |
m |
Mean of the distribution. |
q |
The quantile value. |
p |
The percentile at which the quantile occurs. |
The model parameters of the given type.
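The following base R sketch shows how a mean and a single quantile can pin down two parameters. It covers only the normal and beta cases and is not the package's implementation. For the beta case it writes shape1 = m * k and shape2 = (1 - m) * k and solves for k numerically. The function name inv_mean_quantile_sketch is hypothetical.

```r
inv_mean_quantile_sketch <- function(type, m, q, p) {
  if (type == "norm") {
    # q = m + qnorm(p) * sd, so sd follows directly
    return(c(mean = m, sd = (q - m) / qnorm(p)))
  }
  if (type == "beta") {
    # Fix the mean at m and solve for k = shape1 + shape2 on the log scale
    f <- function(log_k) pbeta(q, m * exp(log_k), (1 - m) * exp(log_k)) - p
    k <- exp(uniroot(f, interval = log(c(1e-2, 1e6)), tol = 1e-10)$root)
    return(c(shape1 = m * k, shape2 = (1 - m) * k))
  }
  stop("only 'norm' and 'beta' are covered in this sketch")
}

parms <- inv_mean_quantile_sketch("beta", m = 0.76, q = 0.80, p = 0.975)
parms
pbeta(0.80, parms[["shape1"]], parms[["shape2"]])  # close to 0.975
```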
Calculates the model parameters of interest given the first two moments.
inv_moments(type, moments)
type |
The distribution type, one of c("norm", "beta", "logitnorm"). |
moments |
A numeric vector containing the first two moments of the model |
Returns the two parameters for each model:
mean and sd for norm
mu and sigma for logitnorm
shape1 (alpha) and shape2 (beta) for beta
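For the normal and beta cases the method of moments has a closed form, sketched below in base R with the moments taken to be the mean and the variance. The logitnorm case has no closed form and is left out. This is an illustration, not the package's code, and the function name inv_moments_sketch is hypothetical.

```r
inv_moments_sketch <- function(type, moments) {
  m <- moments[[1]]  # mean
  v <- moments[[2]]  # variance
  switch(type,
    norm = c(mean = m, sd = sqrt(v)),
    beta = {
      k <- m * (1 - m) / v - 1  # equals shape1 + shape2
      if (k <= 0) stop("variance too large for a beta distribution")
      c(shape1 = m * k, shape2 = (1 - m) * k)
    },
    stop("type not covered in this sketch")
  )
}

# Round trip: moments of Beta(116, 155) back to its shape parameters
a <- 116
b <- 155
m <- a / (a + b)
v <- a * b / ((a + b)^2 * (a + b + 1))
inv_moments_sketch("beta", c(m, v))  # recovers shape1 = 116, shape2 = 155
```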
Data from the International Severe Acute Respiratory and Emerging Infection Consortium regarding Regions in the UK.
isaric
A data frame with 8 rows and 10 columns
Region where the sample was drawn
Raw number of total subjects available in the region's dataset
Number of subjects used in analysis after exclusions
Number of positive subjects
C-statistic
Lower bound for the confidence interval of the C-statistic
Calibration Mean
Lower bound for the confidence interval of the calibration mean
Calibration slope
Lower bound of the confidence interval of the calibration slope
Simulated Data
Calculates the first two moments (mean and variance) of the given model type and parameters.
moments(type, parms)
type |
The distribution type, one of c("norm", "beta", "logitnorm", "probitnorm"). |
parms |
A numeric vector containing parameters relevant to the model. |
A numeric vector representing the mean and variance.
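The logit-normal distribution has no closed-form mean or variance, so its moments have to be obtained numerically. The base R sketch below uses one-dimensional numerical integration. It is an illustration of the idea, not the package's implementation, and the function name moments_logitnorm_sketch is hypothetical.

```r
moments_logitnorm_sketch <- function(mu, sigma) {
  # E[g(X)] for X = plogis(Z), where Z ~ N(mu, sigma^2)
  expect <- function(g) {
    integrate(function(z) g(plogis(z)) * dnorm(z, mu, sigma),
              lower = mu - 10 * sigma, upper = mu + 10 * sigma)$value
  }
  m <- expect(function(x) x)           # first moment (mean)
  v <- expect(function(x) x^2) - m^2   # variance from the second moment
  c(m = m, v = v)
}

moments_logitnorm_sketch(0, 1)  # the mean is 0.5 by symmetry
```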
Simulates calibration curves based on the given method and plots the calibration distance (difference between predicted and observed).
plot_cal_distance(N, sample, method = "loess", X = (1:99)/100)
N |
Number of observations to simulate in each sample |
sample |
Data frame with columns:
dist_type: distribution type
dist_parm1: first distribution parameter (e.g. mean, alpha, shape1)
dist_parm2: second distribution parameter (e.g. sd, beta, shape2)
cal_int: calibration intercept
cal_slp: calibration slope |
method |
One of loess or line; the default is loess |
X |
Vector of predicted probabilities; the default is 0.01 to 0.99 |
Plot of simulated calibration curves
    sample <- data.frame(
      dist_type = rep("beta", 3),
      dist_parm1 = c(1, 2, 3),
      dist_parm2 = c(3, 4, 5),
      cal_int = c(0, 0.05, 0.1),
      cal_slp = c(1, 0.9, 0.8))
    plot_cal_distance(N = 200, sample = sample)
Simulates calibration curves based on given method, and uses plot to visualize calibration instability.
plot_cal_instability(N, sample, method = "loess", X = (1:99)/100)
N |
Number of observations to simulate in each sample |
sample |
Data frame with columns:
dist_type: distribution type
dist_parm1: first distribution parameter (e.g. mean, alpha, shape1)
dist_parm2: second distribution parameter (e.g. sd, beta, shape2)
cal_int: calibration intercept
cal_slp: calibration slope |
method |
One of loess or line; the default is loess |
X |
Vector of predicted probabilities; the default is 0.01 to 0.99 |
Plot of simulated calibration curves
    sample <- data.frame(
      dist_type = rep("beta", 3),
      dist_parm1 = c(1, 2, 3),
      dist_parm2 = c(3, 4, 5),
      cal_int = c(0, 0.05, 0.1),
      cal_slp = c(1, 0.9, 0.8))
    plot_cal_instability(N = 200, sample = sample)
Verifies that an evidence object has the required members and standardizes it
into a bpm_evidence object. Each element's distribution is recorded together
with both its native parameters ($parms) and its first two moments
($moments, named m and v).
process_evidence(evidence)
evidence |
A named list of evidence elements. The required members are:
Each element may be given either as a formula, |
The two parameters of each element may be characterized flexibly, as either
native distribution parameters or summary moments. The parameters must be
either all unnamed or all named (a mix such as beta(0.4, var = 0.04) is
ambiguous and raises an error).
Unnamed (positional) parameters are taken as the native parameters of the distribution:
norm(mean, sd),
beta(shape1, shape2),
logitnorm(mu, sigma),
probitnorm(mu, sigma).
Named parameters are matched against the following aliases (pick one pair per element):
moments with a variance: mean/var (or m/v),
moments with a standard deviation: mean/sd (or m/sd),
a mean and an upper 97.5% quantile (cih),
native beta parameters: alpha/beta,
native logitnorm/probitnorm parameters: mu/sigma.
When moments are supplied, the native parameters are obtained by the method of
moments (or, for cih, by matching the requested quantile).
A bpm_evidence object: the standardized, restructured evidence list.
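The formula notation can be taken apart with base R language tools. The sketch below shows one way to extract the element name, the distribution type, and the parameters from a single formula, and to reject a mix of named and unnamed parameters. It is an illustration of the parsing idea only, not the package's implementation, and the function name parse_evidence_term is hypothetical.

```r
parse_evidence_term <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 3)
  rhs <- f[[3]]             # the unevaluated call, e.g. beta(116, 155)
  args <- as.list(rhs)[-1]  # its arguments, without the function name
  nm <- names(args)
  if (is.null(nm)) nm <- rep("", length(args))
  # Parameters must be either all named or all unnamed
  if (any(nm == "") && any(nm != "")) {
    stop("parameters must be either all named or all unnamed")
  }
  list(
    name = as.character(f[[2]]),              # element name, e.g. "prev"
    type = as.character(rhs[[1]]),            # distribution, e.g. "beta"
    parms = vapply(args, eval, numeric(1)),   # numeric parameter values
    named = all(nm != "")
  )
}

str(parse_evidence_term(prev ~ beta(116, 155)))
str(parse_evidence_term(cstat ~ beta(mean = 0.76, sd = 0.006)))
```

Because a formula leaves its right-hand side unevaluated, names such as norm or logitnorm do not need to exist as functions for this to work.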
    # Formula form, mixing native parameters and moments:
    evidence <- list(
      prev ~ beta(116, 155),  # native beta parameters
      cstat ~ beta(mean = 0.76, sd = 0.006),
      cal_mean ~ norm(-0.009, 0.125),
      cal_slp ~ norm(0.995, 0.024))
    process_evidence(evidence = evidence)

    # Equivalent named-list form:
    evidence <- list(
      prev = list(type = "beta", mean = 0.38, sd = 0.2),
      cstat = list(mean = 0.7, sd = 0.05),
      cal_int = list(mean = 0.2, sd = 0.2),
      cal_slp = list(mean = 0.8, sd = 0.3))
    process_evidence(evidence = evidence)
Generates samples from a bivariate normal distribution using marginal means, variances, and covariance.
rbnorm(n, mu1, mu2, var1, var2, cov)
n |
Number of samples to be generated |
mu1 |
Mean of first variable |
mu2 |
Mean of second variable |
var1 |
Variance of first variable |
var2 |
Variance of second variable |
cov |
Covariance between the two variables |
An n x 2 matrix in which column 1 contains samples for the first variable and column 2 contains samples for the second variable, drawn conditionally on the first.
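Conditional sampling from a bivariate normal distribution can be sketched in a few lines of base R: draw the first variable from its marginal, then draw the second from its conditional distribution given the first. This follows the description above but is not taken from the package source. The function name rbnorm_sketch is hypothetical.

```r
rbnorm_sketch <- function(n, mu1, mu2, var1, var2, cov) {
  x1 <- rnorm(n, mu1, sqrt(var1))             # marginal draw of variable 1
  cond_mean <- mu2 + cov / var1 * (x1 - mu1)  # E[x2 | x1]
  cond_var <- var2 - cov^2 / var1             # Var[x2 | x1]
  stopifnot(cond_var >= 0)                    # requires a valid covariance
  x2 <- rnorm(n, cond_mean, sqrt(cond_var))   # conditional draw of variable 2
  cbind(x1, x2)
}

set.seed(99)
draws <- rbnorm_sketch(1e5, mu1 = 0, mu2 = 1, var1 = 0.04, var2 = 0.01, cov = 0.01)
cov(draws)  # sample covariance matrix, close to the inputs
```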
Calculates sample size that achieves target confidence interval widths using Riley's framework
riley_samp(target_ciws, parms)
target_ciws |
Named list containing the target confidence interval width for at least one of:
prev: prevalence
cstat: c-statistic
cal_mean: mean calibration
cal_oe: observed to expected outcome ratio
cal_int: calibration intercept
cal_slp: calibration slope |
parms |
List containing model parameters and distribution:
prev: expected prevalence
cstat: c-statistic of the model
dist_type: one of ("logitnorm", "beta", "probitnorm")
dist_parm1: first parameter of the distribution
dist_parm2: second parameter of the distribution
cal_int: calibration intercept
cal_slp: calibration slope |
A named list of estimated sample sizes that achieve the target confidence interval widths: fciw.prev, fciw.cstat, fciw.cal_mean, fciw.cal_oe, fciw.cal_int, fciw.cal_slp
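For two of these targets the calculation can be sketched in base R. The prevalence case inverts the Wald interval width directly. The c-statistic case assumes the Newcombe-type standard error approximation commonly cited alongside Riley's sample size framework and solves for N numerically. Whether the package uses exactly these expressions is an assumption, and all function names below are hypothetical.

```r
# Approximate SE of the c-statistic (Newcombe-type formula; an assumption here)
se_cstat <- function(N, cstat, prev) {
  num <- cstat * (1 - cstat) *
    (1 + (N / 2 - 1) * (1 - cstat) / (2 - cstat) +
       (N / 2 - 1) * cstat / (1 + cstat))
  sqrt(num / (N^2 * prev * (1 - prev)))
}

# Smallest N (rounded up) at which the 95% CI width for the c-statistic meets the target
n_for_cstat_ciw <- function(target_ciw, cstat, prev) {
  f <- function(N) 2 * qnorm(0.975) * se_cstat(N, cstat, prev) - target_ciw
  ceiling(uniroot(f, interval = c(10, 1e7))$root)
}

# Closed-form N for a Wald 95% CI of the prevalence with the target width
n_for_prev_ciw <- function(target_ciw, prev) {
  ceiling(prev * (1 - prev) * (2 * qnorm(0.975) / target_ciw)^2)
}

n_for_cstat_ciw(0.1, cstat = 0.76, prev = 0.43)
n_for_prev_ciw(0.1, prev = 0.43)
```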