Analysing population-based cancer survival – settling the controversies

Our examples illustrate how the described measures provide fundamentally different, but complementary, information.

This is best illustrated by the 10-year age-related patterns for prostate cancer (Fig. 3d). The U-shaped curve of net mortality reflects the worse prognosis of prostate cancer among young and old patients. However, among young patients, net mortality and the probability of dying from the cancer are very similar because the deaths among these patients were mostly due to their cancer, as shown by their very low probability of dying from other causes. By contrast, despite high net mortality denoting a poor prognosis of prostate cancer, fewer than half of old patients died from their cancer. However, because of the rapid secular increase in life expectancy, i.e. the decrease in the probability of dying from other causes, the gap between net and crude cancer mortalities implies that prostate cancer may become an even bigger public health problem in the near future. Indeed, the number of deaths due to prostate cancer may increase dramatically among elderly patients if net survival of prostate cancer, i.e. its prognosis, does not improve significantly.

The choice of which measure to report is delicate. Crude mortality is more relevant for health policy-makers [11, 14], since it quantifies the actual contribution of the disease to overall mortality. Net survival is the survival probability derived solely from the cancer-specific hazard of dying. Because it is unaffected by differences in mortality from other causes, it is the only measure allowing a proper comparison of different populations according to time, geography or other characteristics.
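The distinction between net and crude measures can be made concrete with a minimal numerical sketch. It assumes constant hazards, which is a simplification of ours and not part of the analysis above; the hazard values and the names `net_survival` and `crude_probabilities` are hypothetical. Under that assumption, net survival depends only on the cancer-specific hazard, whereas the crude probability of dying from the cancer also depends on the hazard of dying from other causes.

```python
import math

def net_survival(lam_cancer, t):
    """Survival derived solely from the cancer-specific hazard (assumed constant)."""
    return math.exp(-lam_cancer * t)

def crude_probabilities(lam_cancer, lam_other, t):
    """Crude probabilities of dying from cancer and from other causes by time t.

    With constant hazards, each cause takes a share of all deaths
    proportional to its hazard.
    """
    total = lam_cancer + lam_other
    dead_by_t = 1.0 - math.exp(-total * t)
    return lam_cancer / total * dead_by_t, lam_other / total * dead_by_t

T = 10.0           # years since diagnosis
LAM_CANCER = 0.08  # hypothetical cancer-specific hazard per year

print(f"net mortality at {T:.0f} years: {1 - net_survival(LAM_CANCER, T):.3f}")
for lam_other in (0.20, 0.10, 0.02):  # hypothetical other-cause hazards
    crude_cancer, crude_other = crude_probabilities(LAM_CANCER, lam_other, T)
    print(f"other-cause hazard {lam_other:.2f}: "
          f"crude cancer death {crude_cancer:.3f}, other-cause death {crude_other:.3f}")
```

In this sketch net mortality is the same in all three scenarios, while the crude probability of dying from the cancer rises towards it as the other-cause hazard falls, which is the mechanism described above for elderly prostate cancer patients.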

However, this does not mean that an observed difference in net survival between two groups cannot come from their different demographic structure. For example, if age affects the cancer-specific hazard, then the net survival of groups with different age structures is expected to differ, and this has to be taken into account, for example by age-standardization.
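As an illustration of this point, the following sketch applies direct age-standardization to two hypothetical populations that share the same age-specific net survival but differ in age structure. All numbers, the age bands and the standard weights are invented for the example and do not correspond to any published standard.

```python
# Hypothetical 10-year net survival by age group, identical in both populations.
AGE_SPECIFIC_NET_SURVIVAL = {"15-54": 0.80, "55-74": 0.65, "75+": 0.40}

# Hypothetical age distributions at diagnosis.
AGE_MIX_A = {"15-54": 0.5, "55-74": 0.4, "75+": 0.1}
AGE_MIX_B = {"15-54": 0.2, "55-74": 0.4, "75+": 0.4}

# Hypothetical standard weights (not taken from any published standard).
STANDARD_WEIGHTS = {"15-54": 0.3, "55-74": 0.5, "75+": 0.2}

def weighted_net_survival(age_specific, weights):
    """Weighted average of age-specific net survival; weights are normalised."""
    total = sum(weights.values())
    return sum(age_specific[group] * w for group, w in weights.items()) / total

unstandardised_a = weighted_net_survival(AGE_SPECIFIC_NET_SURVIVAL, AGE_MIX_A)
unstandardised_b = weighted_net_survival(AGE_SPECIFIC_NET_SURVIVAL, AGE_MIX_B)
standardised = weighted_net_survival(AGE_SPECIFIC_NET_SURVIVAL, STANDARD_WEIGHTS)

print(f"unstandardised net survival, population A: {unstandardised_a:.3f}")
print(f"unstandardised net survival, population B: {unstandardised_b:.3f}")
print(f"age-standardised net survival (both):      {standardised:.3f}")
```

The unstandardised values differ only because of the age mix; once both populations are weighted with the same standard, the difference disappears.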

The relative survival ratio used to be the main reported measure, as it was thought to equal net survival. We believe it may still be appealing as a direct comparison between the overall survival of the patients and the expected survival of the general population.

Two further terms, “relative survival” and “cause-specific survival”, are still often used in the literature as if they described measures. The former is confusing as it could apply to any measure within the “relative survival setting”. The latter is unfortunate since, while cause-specific mortality aims to estimate the proportion of patients dying from each specific cause, one has to survive all causes to still be alive. We propose to avoid both terms.

Once the measure of interest is determined, one should choose among the methods that estimate it. We describe here the most common alternatives that appear in the field.

One option is to use model-based predictions, i.e. a parametric estimator of the measures. Here, one should keep in mind that the first step in the analysis of survival data is the non-parametric estimation of the survival curve, which gives us a description of the data. This analysis may be followed by modelling the effect of covariates (looking at trends, etc.), which requires modelling assumptions that can be rather complex and often include interactions. In any case, the model specification needs to be checked; a first evaluation can be done simply by drawing the survival curve predicted by the model and checking whether it fits the non-parametric curve well. Therefore, as for any type of analysis, using only the parametric approach to describe the data is not appropriate, but once the model has been shown in this way to be acceptable, it is a powerful tool to understand the data in depth and make predictions.
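The kind of first check described here can be sketched in a few lines. The example below is a deliberately simple stand-in: it uses overall survival, the Kaplan-Meier estimator as the non-parametric curve and a constant-hazard (exponential) model as the parametric one, on a small invented data set. None of these choices come from our analysis; they only illustrate the comparison of a model-based curve with a non-parametric one.

```python
import math

# Hypothetical follow-up data: (time in years, 1 = died, 0 = censored).
DATA = [(1, 1), (2, 1), (3, 0), (4, 1), (5, 1), (6, 0), (8, 1), (10, 0)]

def kaplan_meier(data):
    """Return [(event time, survival estimate)] for the Kaplan-Meier curve."""
    curve, surv = [], 1.0
    for t in sorted({time for time, died in data if died}):
        at_risk = sum(1 for time, _ in data if time >= t)
        deaths = sum(1 for time, died in data if time == t and died)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

def exponential_rate(data):
    """Maximum-likelihood hazard of a constant-hazard model: deaths / person-time."""
    return sum(died for _, died in data) / sum(time for time, _ in data)

KM_CURVE = kaplan_meier(DATA)
RATE = exponential_rate(DATA)

# Largest gap between the two curves, evaluated just after each drop of the
# Kaplan-Meier curve (a crude summary; a plot is usually more informative).
MAX_GAP = max(abs(surv - math.exp(-RATE * t)) for t, surv in KM_CURVE)

for t, surv in KM_CURVE:
    print(f"t={t:>2}: non-parametric {surv:.3f}, model {math.exp(-RATE * t):.3f}")
print(f"largest gap at event times: {MAX_GAP:.3f}")
```

A large or systematic gap would suggest that the model specification needs to be revised before its predictions are used.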

A lot of confusion exists about which non-parametric method to use for net survival estimation, whereas the estimation proposals for the other three measures above are quite clear. In the past, the Ederer I method or its correction, the Hakulinen method, was used for estimation. More recently, some authors have claimed that Ederer II should be used [12], but all three methods have been theoretically proven not to be consistent [4]. While the bias in small samples might be hard to discern because of large variation, it persists in large samples (i.e. the method is not consistent), where the variation of the estimates becomes negligible. An example is given in our simulated data set, where at 10 years Ederer II misses the true value by 5% (Fig. 4) and Ederer I misses it by 13% (Fig. 2). This means that analyses of these same data would reach significantly different conclusions depending on the method used. Furthermore, a researcher using the Ederer II method on our data would wrongly conclude that net survival worsened from 1990 to 2000. This example illustrates how the choice of approach can affect the results of comparisons between population groups, an issue already raised by Seppä et al. [6].
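Why the bias does not vanish in large samples can be seen in a small deterministic calculation of what each estimator converges to. The sketch below is not our simulated data set: it assumes two hypothetical patient groups with constant hazards and no censoring, and the function names are ours. Ederer I converges to the ratio of observed to expected survival, Ederer II to the exponential of minus the integrated excess hazard averaged over patients still at risk, and the PP estimator to the same quantity after weighting each patient by the inverse of their expected survival.

```python
import math

# Hypothetical groups: (share of patients, excess hazard, population hazard).
GROUPS = [(0.5, 0.05, 0.01),   # younger patients
          (0.5, 0.15, 0.15)]   # older patients

def net_survival(groups, t):
    """True marginal net survival: average of the individual net survivals."""
    return sum(p * math.exp(-le * t) for p, le, _ in groups)

def ederer1_limit(groups, t):
    """Observed survival divided by expected survival of the cohort at diagnosis."""
    observed = sum(p * math.exp(-(le + lp) * t) for p, le, lp in groups)
    expected = sum(p * math.exp(-lp * t) for p, _, lp in groups)
    return observed / expected

def _integrated_excess_hazard(groups, t, weight, steps=10000):
    """Midpoint-rule integral of the (weighted) mean excess hazard among patients at risk."""
    dt = t / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * dt
        num = den = 0.0
        for p, le, lp in groups:
            at_risk = p * math.exp(-(le + lp) * u) * weight(lp, u)
            num += at_risk * le
            den += at_risk
        total += num / den * dt
    return total

def ederer2_limit(groups, t):
    """Unweighted risk set: patients with high other-cause mortality drop out early."""
    return math.exp(-_integrated_excess_hazard(groups, t, lambda lp, u: 1.0))

def pp_limit(groups, t):
    """Risk set weighted by 1 / expected survival, as in the PP estimator."""
    return math.exp(-_integrated_excess_hazard(groups, t, lambda lp, u: math.exp(lp * u)))

T = 10.0
for name, value in [("true net survival", net_survival(GROUPS, T)),
                    ("PP limit", pp_limit(GROUPS, T)),
                    ("Ederer II limit", ederer2_limit(GROUPS, T)),
                    ("Ederer I limit", ederer1_limit(GROUPS, T))]:
    print(f"{name:>18}: {value:.3f}")
```

With these invented hazards, true net survival at 10 years is roughly 0.41, the Ederer II limit roughly 0.47 and the Ederer I limit roughly 0.53, while the PP limit coincides with the true value. The size of the gaps depends entirely on the assumed hazards; only the qualitative pattern is meant to carry over.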

Fig. 4 Ederer II and PP estimator. Comparison of the Ederer II (blue solid curve) and PP (black solid curve) estimators on the simulated data set A. The dashed curves show the confidence intervals for each method

Fortunately, most publications in the past reported “age-standardized results”, which implies an analysis stratified by age and sex, a situation in which the Ederer I, Hakulinen and Ederer II methods give comparable results, provided that the stratification was fine enough. In that case, all patients in each stratum have roughly the same hazard and therefore the relative survival ratio and net survival within each stratum become roughly equal. Recent publications have focused on age-stratified Ederer II [8, 9] and have used simulations to support this theoretical fact and to evaluate the size of the bias in practice, where the age intervals for stratification are rather wide. Further, the mean square errors of the age-stratified Ederer II were compared to those of the PP method, but the simulations were largely affected by the fact that they included very old patients, for whom very little or no information on long-term net survival is available in the data since their probability of dying of other causes is so high. In such cases, the variance of the PP method becomes very large, reflecting that no information is available to estimate the net survival of the old patients. On the other hand, the variance of the age-stratified Ederer II remains comparatively small, since the estimator relies on the assumption that the younger patients still in the risk set carry the required information. In practice, this assumption may or may not be true. Consequently, the reported variance and bias depended strongly on the simulation parameters, and the results presented in the two papers do not entirely agree. Further work may be needed to clarify these issues.
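The role of the stratification width can also be sketched deterministically. The example below assumes four hypothetical age groups with constant hazards and no censoring, computes the large-sample limit of Ederer II within each stratum, and combines the strata with weights equal to their share of patients (an assumption of ours; a standard population would be used for age-standardized results). The groups, hazards and function names are invented for the illustration.

```python
import math

# Hypothetical age groups: (share of patients, excess hazard, population hazard).
GROUPS = [(0.25, 0.04, 0.005), (0.25, 0.06, 0.02),
          (0.25, 0.10, 0.08), (0.25, 0.16, 0.20)]

def net_survival(groups, t):
    """True net survival of the pooled groups."""
    total = sum(p for p, _, _ in groups)
    return sum(p * math.exp(-le * t) for p, le, _ in groups) / total

def ederer2_limit(groups, t, steps=5000):
    """Large-sample limit of Ederer II within one stratum (midpoint rule)."""
    dt = t / steps
    cumulative = 0.0
    for k in range(steps):
        u = (k + 0.5) * dt
        num = den = 0.0
        for p, le, lp in groups:
            at_risk = p * math.exp(-(le + lp) * u)
            num += at_risk * le
            den += at_risk
        cumulative += num / den * dt
    return math.exp(-cumulative)

def stratified_ederer2(strata, t):
    """Combine stratum-specific limits, weighting each stratum by its share of patients."""
    total = sum(p for stratum in strata for p, _, _ in stratum)
    return sum(sum(p for p, _, _ in stratum) / total * ederer2_limit(stratum, t)
               for stratum in strata)

T = 10.0
NET = net_survival(GROUPS, T)
NO_STRATA = stratified_ederer2([GROUPS], T)
COARSE = stratified_ederer2([GROUPS[:2], GROUPS[2:]], T)
FINE = stratified_ederer2([[g] for g in GROUPS], T)

for name, value in [("no stratification", NO_STRATA), ("two wide strata", COARSE),
                    ("one stratum per group", FINE)]:
    print(f"{name:>22}: {value:.3f} (bias {value - NET:+.3f})")
```

In this sketch the bias shrinks as the strata become narrower and vanishes when each stratum is homogeneous, in line with the argument above. It says nothing about variance, which is the other half of the comparison discussed in [8, 9].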

In practice, a common criterion in choosing a method is also the availability of software. In the relative survival setting, the most recently introduced method is the PP estimator, and the absence of this method from some standard software (e.g. SAS) can be a clear reason for using the age-stratified Ederer II method instead. However, availability is not a problem for R or Stata users. All the methods mentioned in this paper are available in the R package relsurv [15]; in Stata, the commands stns [16], strs [17] and stnet [18] include the required options. In SEER*Stat the PP estimator is under development.