Power calculations for prognostic accuracy measures with survival data

  Cohort study design

Note: the cutoff value applies to all measures except the AUC.



When designing a study to evaluate the clinical utility of a risk prediction biomarker for time-to-event data, what sample size should one use to ensure that definitive conclusions can be drawn? The purpose of this web application is to guide researchers in this decision.

Often a goal is to test whether a marker has clinical value with respect to a given accuracy measure \(A\) (e.g. \(AUC\), \(TPR\), \(FPR\), …). This test takes the form:

\[ H_0 : A \le A_0 \quad \text{vs.} \quad H_a : A > A_0 \]

where \(A_0\) is the minimal performance requirement for the marker.

Given a biomarker with expected accuracy \(A = A_{expected}\), this application uses Monte Carlo simulation to estimate the sample size needed to achieve a specified level of power (tab: 'Calculate Sample Size'). Because the estimated sample size may not achieve the target power exactly, the user is then encouraged to assess estimation performance for all accuracy measures in studies with a user-provided sample size (tab: 'Simulate Power').

Methods for Estimating Sample Size

We estimate the sample size needed to achieve power \(1-\beta\) using asymptotic theory and Monte Carlo simulation. An \(\alpha\) level test of the form shown above for accuracy measure \(A\) requires a sample size equal to:

\[ n = \frac{\{\Phi^{-1}(1-\alpha) + \Phi^{-1}(1-\beta)\}^2 \sigma^2}{(A_0 - A_{expected})^2} \]

where \(\Phi^{-1}\) is the quantile function of the standard normal distribution.

However, obtaining a value for \(\sigma^2\), the asymptotic variance of \(\hat{A}\), is difficult. This application therefore simulates 1,000 data sets for which \(A = A_{expected}\) and estimates \(\sigma\) from the resulting sampling distribution of \(\hat{A}\).
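As a rough illustration of the sample size formula above, the following Python sketch evaluates it for given inputs. It is not the application's source code: the function names are hypothetical, and it assumes that \(\sigma\) is scaled so that the variance of \(\hat{A}\) in a study of size \(n\) is approximately \(\sigma^2/n\), which is why the standard deviation of the simulated estimates is multiplied by \(\sqrt{n}\).

```python
import math
import statistics
from statistics import NormalDist


def required_sample_size(a_expected, a_null, sigma, alpha=0.05, power=0.80):
    """Evaluate the asymptotic sample size formula and round up.

    sigma is the asymptotic standard deviation of the estimator
    (assumption: Var(A_hat) is roughly sigma**2 / n).
    """
    if a_expected == a_null:
        raise ValueError("a_expected must differ from a_null")
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha)        # quantile for the test level
    z_beta = z.inv_cdf(power)             # power = 1 - beta
    n = (z_alpha + z_beta) ** 2 * sigma ** 2 / (a_null - a_expected) ** 2
    return math.ceil(n)


def sigma_from_simulations(estimates, n_per_study):
    """Estimate sigma from simulated values of A_hat, each based on
    n_per_study observations (hypothetical helper)."""
    return math.sqrt(n_per_study) * statistics.stdev(estimates)
```

For example, `required_sample_size(0.75, 0.70, sigma=0.5)` evaluates the formula for \(A_{expected} = 0.75\), \(A_0 = 0.70\), a one-sided 5% test, and 80% power.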


In order to generate time to event data such that the performance of the marker yields \(A_{expected}\) for a given measure, we make the following modelling choices:

  • Distribution of the marker \(Y\): This application assumes \(Y \sim N(0,1)\). More options for the distribution of \(Y\) may be added later.

  • The failure time \(T\): We model survival via the survival function \(S(t|Y) = P(T > t|Y)\) and assume a Cox proportional hazards model:

\[ S(t|Y) = S_0 (t)^{e^{\beta Y}} \]

where the baseline survival distribution \(S_0(t)\) is assumed to be exponential:

\[ S_0(t) = e^{-at} \]

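The value of \(a\) is determined by the user-supplied baseline survival \(S_0(t_0)\) at a given time \(t_0\), and the value of \(\beta\) is determined by the expected value of the accuracy measure of interest. Here \(\beta\) is the Cox regression coefficient, not the type II error rate that appears in the power \(1-\beta\).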

  • Censoring: We assume non-informative censoring. Censoring times follow an exponential distribution with the rate parameter chosen to fix the average percentage of censored observations.
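The modelling choices above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the application's code: all function names are hypothetical, the censoring percentage is taken over the whole follow-up, and the censoring rate is found by bisection on a Monte Carlo estimate, which is only one of several ways to calibrate it.

```python
import math
import random


def baseline_rate(s0_at_t0, t0):
    """Rate a of the exponential baseline, solved from S_0(t0) = exp(-a * t0)."""
    return -math.log(s0_at_t0) / t0


def simulate_cohort(n, beta, a, censor_rate, rng):
    """Return a list of (marker, observed_time, event_indicator) tuples."""
    data = []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)                       # Y ~ N(0, 1)
        # S(t|Y) = exp(-a * t * e^{beta*Y}): T given Y is exponential
        # with rate a * e^{beta*Y}.
        t = rng.expovariate(a * math.exp(beta * y))
        c = rng.expovariate(censor_rate)              # independent censoring
        data.append((y, min(t, c), 1 if t <= c else 0))
    return data


def censoring_rate_for(target_censored, beta, a, rng, n_mc=20000):
    """Find a censoring rate giving roughly the target fraction censored."""
    draws = []
    for _ in range(n_mc):
        y = rng.gauss(0.0, 1.0)
        t = rng.expovariate(a * math.exp(beta * y))
        e = rng.expovariate(1.0)      # unit exponential; censoring time = e / rate
        draws.append((t, e))

    def frac_censored(rate):
        return sum(e / rate < t for t, e in draws) / n_mc

    lo, hi = 1e-8, 1e8                # bracket on the rate, searched on a log scale
    for _ in range(60):
        mid = math.sqrt(lo * hi)
        if frac_censored(mid) < target_censored:
            lo = mid                  # too little censoring: increase the rate
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

Reusing the same random draws for every candidate rate makes the estimated censoring fraction monotone in the rate, so the bisection is well behaved.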

Time-dependent measures of accuracy

For a continuous marker measured at baseline, \(Y\), a cutpoint \(c\) must be specified such that \(Y > c\) determines the subgroup of observations deemed “high risk” and \(Y \le c\) the subgroup deemed “low risk”. Once this is specified, along with a future time \(t_{predict}\) at which we wish to evaluate performance, the following time-dependent measures of accuracy can be considered for sample size and power calculations:

  • True positive rate: \(TPR_{t_{predict}}(c) = P(Y > c | T \le t_{predict})\)
  • False positive rate: \(FPR_{t_{predict}}(c) = P(Y > c | T > t_{predict})\)
  • Area under the ROC curve: \(AUC_{t_{predict}} = \int_0^1 TPR_{t_{predict}}(FPR_{t_{predict}}^{-1}(u)) \, du\)
  • Positive predictive value: \(PPV_{t_{predict}}(c) = P(T \le t_{predict} | Y > c)\)
  • Negative predictive value: \(NPV_{t_{predict}}(c) = P(T > t_{predict} | Y \le c)\)
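To make these definitions concrete, the sketch below computes the true values of the measures under the model described earlier (\(Y \sim N(0,1)\), exponential baseline, Cox proportional hazards) by numerical integration over the marker. The function names, the integration grid, and the bisection bracket are assumptions made for illustration; the application may compute these quantities differently.

```python
import math
from statistics import NormalDist


def _cells(beta, a, t_predict, grid=4000, y_max=8.0):
    """Midpoint-rule cells: (y, P(Y in cell), P(T <= t_predict | Y = y))."""
    phi = NormalDist().pdf
    h = 2 * y_max / grid
    for i in range(grid):
        y = -y_max + (i + 0.5) * h
        f = 1.0 - math.exp(-a * t_predict * math.exp(beta * y))
        yield y, phi(y) * h, f


def true_measures(beta, a, cutoff, t_predict):
    """True TPR, FPR, PPV and NPV at t_predict for cutpoint `cutoff`."""
    p_high = p_low = ev_high = ev_low = 0.0
    for y, w, f in _cells(beta, a, t_predict):
        if y > cutoff:
            p_high += w
            ev_high += w * f          # mass of high-risk subjects with an event
        else:
            p_low += w
            ev_low += w * f           # mass of low-risk subjects with an event
    p_event = ev_high + ev_low
    p_no_event = p_high + p_low - p_event
    return {
        "TPR": ev_high / p_event,
        "FPR": (p_high - ev_high) / p_no_event,
        "PPV": ev_high / p_high,
        "NPV": (p_low - ev_low) / p_low,
    }


def true_auc(beta, a, t_predict):
    """Area under the ROC curve, sweeping the cutpoint from high to low."""
    cells = list(_cells(beta, a, t_predict))
    p_case = sum(w * f for _, w, f in cells)
    p_control = sum(w * (1.0 - f) for _, w, f in cells)
    tp = area = 0.0
    for _, w, f in reversed(cells):
        tp_new = tp + w * f
        area += w * (1.0 - f) * (tp + tp_new) / 2   # trapezoid along the FPR axis
        tp = tp_new
    return area / (p_case * p_control)


def solve_beta(target, measure, a, cutoff, t_predict, lo=0.0, hi=5.0):
    """Bisection for beta; assumes the measure increases with beta on [lo, hi]."""
    for _ in range(50):
        mid = (lo + hi) / 2
        if true_measures(mid, a, cutoff, t_predict)[measure] < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

With \(\beta = 0\) the marker carries no information, so the sketch returns a TPR equal to the FPR and an AUC of 0.5, which is a convenient sanity check. `solve_beta` shows one way the coefficient could be chosen so that a measure equals its expected value.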


Estimates for all measures can be calculated via non-parametric or semi-parametric methods. Non-parametric estimates are computed using double inverse probability weighting (DIPW), and semi-parametric estimates assume a Cox proportional hazards model. An R package that implements the estimation procedures used in this application is available here. Please see the referenced papers for detailed information regarding estimation.



  1. Liu D, Cai T, Zheng Y. Evaluating the predictive value of biomarkers with stratified case-cohort design. Biometrics 2012, 68(4): 1219-1227.

  2. Pepe MS, Zheng Y, Jin Y. Evaluating the ROC performance of markers for future events. Lifetime Data Analysis 2008, 14: 86-113.

  3. Zheng Y, Cai T, Pepe MS, Levy W. Time-dependent predictive values of prognostic biomarkers with failure time outcome. JASA 2008, 103: 362-368.


This app was built using R and Shiny. Please direct any questions to mdbrown@fhcrc.org.

Instructions for use

  1. Using the tab “Model Characteristics” to guide your decisions, choose the accuracy measure for which you wish to compute the sample size. Specify the study design, the marker cutpoint (if needed), the baseline hazard, and the other parameters, including the expected value of the measure and the hypothesis test for which you wish to calculate power.

  2. Under the “Calculate Sample Size” tab, set the power you hope to achieve and click “calculate sample size” to run a simulation to estimate the needed sample size.

  3. Once a sample size \(n\) is chosen, navigate to the “Simulate Power” tab to run simulations to learn how studies with \(n\) observations perform with respect to your measure of interest and other measures.

It may take a moment to display figures.

Distribution of Survival and Y

For survival time T, the figure on the left shows survival, S(t|Y) = Pr(T > t|Y), with respect to time for several different marker values. The vertical line highlights the prediction time. On the lower right, the marker distribution is shown along with the cutpoint. The upper right shows S(t|Y) by marker value when t equals the chosen prediction time.

Event Rates


Given the inputs received, the table below shows the true value of each measure, including β, the coefficient in the Cox proportional hazards model. Also shown are the values of all measures under \(H_0\). The figure below shows the true receiver operating characteristic (ROC) curve, along with the curve under the null hypothesis. In addition, curves for PPV(c) and NPV(c) are shown with respect to marker quantile. The chosen cutpoint is highlighted on each curve.

This simulation takes 10 to 20 seconds to run for semi-parametric estimates and 70 to 80 seconds for non-parametric estimates.


Note: The results of this simulation are based on asymptotic theory, which may not hold exactly for small samples. Therefore, the estimate provided should serve as a starting point for further power simulations. See the tab “Simulate Power” to check the performance of studies with a fixed sample size.
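The idea of checking power at a fixed sample size can be sketched as below. This is a deliberately simplified illustration and not the procedure the application runs: it ignores censoring, uses the empirical proportion as the estimate of \(TPR\) with a binomial standard error, and applies a one-sided Wald test. The application instead relies on the non-parametric (DIPW) or semi-parametric (Cox) estimators. The function name and arguments are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_power(n, beta, a, cutoff, t_predict, tpr_null,
                   alpha=0.05, n_sim=500, seed=1):
    """Fraction of simulated studies of size n that reject H0: TPR <= tpr_null.

    Simplifying assumptions: no censoring, Y ~ N(0, 1), exponential
    baseline with rate a, Cox coefficient beta.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(n_sim):
        cases = high_cases = 0
        for _ in range(n):
            y = rng.gauss(0.0, 1.0)
            t = rng.expovariate(a * math.exp(beta * y))
            if t <= t_predict:            # subject has the event by t_predict
                cases += 1
                high_cases += y > cutoff  # event in the high-risk group
        if cases == 0:
            continue                      # TPR not estimable: count as no rejection
        tpr_hat = high_cases / cases
        se = math.sqrt(tpr_hat * (1 - tpr_hat) / cases)
        if se > 0 and (tpr_hat - tpr_null) / se > z_crit:
            rejections += 1
    return rejections / n_sim
```

When the true \(TPR\) equals the null value, the returned fraction should be close to \(\alpha\); when the marker is informative, it estimates the power of a study with \(n\) observations.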

This simulation can take from a few seconds to several minutes to run. Running at the largest settings of 1,000 simulations with a sample size of 2,500 takes more than 20 minutes to complete, while running 100 simulations at sample size 500 takes approximately 30 seconds. For larger simulations, it is highly recommended to run this app on your local machine. If you would like to do this, please email mdbrown@fhcrc.org for access to the source code.
