Validation of the CancerMath prognostic tool for breast cancer in Southeast Asia

Women diagnosed with pathological stage I to III breast cancer according to the sixth edition of the American Joint Committee on Cancer Staging Manual, who underwent surgery, were identified from the Singapore Malaysia Hospital-Based Breast Cancer Registry, which combined databases from three public tertiary hospitals. The breast cancer registry at National University Hospital (NUH) in Singapore collects information on breast cancer patients diagnosed since 1990. The Tan Tock Seng Hospital (TTSH) registry registers patients diagnosed from 2001 onwards. The University Malaya Medical Centre (UMMC), located in Kuala Lumpur, Malaysia, has prospectively collected data on breast cancer patients diagnosed since 1993 [24]. Ethics approval was obtained from the Domain Specific Review Board of the National Healthcare Group in Singapore and the Medical Ethics Committee of UMMC; patient consent was not required. The consolidated registry included information on ethnicity, age and date of diagnosis, histologically determined tumor size, number of positive lymph nodes, ER and progesterone receptor (PR) status (positive, defined as 1 % or more positively stained tumor cells at NUH or 10 % or more at TTSH and UMMC; negative; or unknown), HER2 status based on fluorescence in situ hybridization (FISH), or on immunohistochemistry (IHC) if FISH was not performed (positive, defined as FISH positive or an IHC score of 3+; negative, defined as FISH negative or an IHC score of 0 or 1+; equivocal, defined as an IHC score of 2+; or unknown), histological type (ductal, lobular, mucinous, others, or unknown), grade (1, 2, 3, or unknown), type of surgery (no surgery, mastectomy, breast conserving surgery, or unknown), chemotherapy (yes, no, or unknown), hormone therapy (yes, no, or unknown), and radiotherapy (yes, no, or unknown). Detailed chemotherapy regimens were available only for UMMC patients.
For chemotherapy, cyclophosphamide, methotrexate and fluorouracil (CMF) was categorized as a first-generation regimen; fluorouracil, epirubicin and cyclophosphamide (FEC), and doxorubicin and cyclophosphamide (AC) followed by paclitaxel, as second-generation regimens; and docetaxel, doxorubicin and cyclophosphamide (TAC), and FEC followed by docetaxel, as third-generation regimens. Hormone therapy was categorized into five groups: tamoxifen, aromatase inhibitors (AI), tamoxifen followed by AI, ovarian ablation, and ovarian ablation plus tamoxifen. Vital status was obtained from the hospitals’ medical records and ascertained by linkage to the death registries in both countries. Patients diagnosed up to 31st December 2011 were followed up from the date of diagnosis until the date of death or the date of last follow-up, whichever came first. The date of last follow-up was 1st March 2013 for UMMC, 31st July 2013 for NUH, and 1st October 2012 for TTSH. Male patients and patients with unknown age at diagnosis or unknown tumor size were excluded from this analysis, as age and tumor size are essential predictors in all four CancerMath calculators.
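To make the follow-up rule concrete, the Python sketch below computes each patient's follow-up time and death indicator from the dates given above. It is illustrative only and rests on assumptions not stated in the text: the names `LAST_FOLLOW_UP` and `follow_up` are hypothetical, follow-up is expressed in years of 365.25 days, and every patient without a recorded death is assumed to be alive on her center's date of last follow-up (patient-level dates of last contact are ignored).

```python
from datetime import date
from typing import Optional, Tuple

# Center-specific dates of last follow-up, as stated in the text.
LAST_FOLLOW_UP = {
    "UMMC": date(2013, 3, 1),
    "NUH": date(2013, 7, 31),
    "TTSH": date(2012, 10, 1),
}
# Only patients diagnosed up to this date were included.
LAST_DIAGNOSIS_DATE = date(2011, 12, 31)


def follow_up(center: str, diagnosis: date, death: Optional[date] = None) -> Tuple[float, int]:
    """Return (follow-up in years, event) where event is 1 for death and 0 for censored.

    Hypothetical simplification: a patient with no recorded death on or before the
    center's last follow-up date is censored at that date.
    """
    if diagnosis > LAST_DIAGNOSIS_DATE:
        raise ValueError("diagnosed after 31 December 2011; not part of the analysis")
    cutoff = LAST_FOLLOW_UP[center]
    if death is not None and death <= cutoff:
        return (death - diagnosis).days / 365.25, 1
    # Deaths after the cutoff are not counted: follow-up ends at the cutoff.
    return (cutoff - diagnosis).days / 365.25, 0
```

For example, a UMMC patient diagnosed on 1st January 2010 who died on 1st January 2011 contributes about 1.0 year of follow-up with a death, whereas a TTSH patient who died after 1st October 2012 is censored at that date.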

The JavaScript code of all four CancerMath calculators, which contained the predetermined parameters and mathematical equations, was exported from the CancerMath website on 9th November 2013 by selecting “view-source” in the browser menu. The script was then transcribed into an R script to allow calculation for a group of patients. For the nodal status calculator, the patient’s age, tumor size, ER and PR status, histological type, and grade were used to calculate the probability of positive nodes for each patient. The outcome calculator predicted the overall mortality risk at each year up to 15 years after diagnosis, based on age, tumor size, number of positive nodes, grade, histological type, and ER, PR, and HER2 status. The therapy calculator further adjusted overall mortality for the effect of hormone and chemotherapy regimens, and the conditional survival calculator additionally took into account the number of years since diagnosis. Results from the R script and the website were cross-checked in a random subset of 20 patients to verify the accuracy of the R script. Histological type recorded as others was re-categorized as unknown. If HER2 status was equivocal based on IHC and FISH was not performed, HER2 status was treated as unknown. Evidence of recurrence was set as unknown for the conditional survival calculation.
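The receptor definitions and the input recoding described above can be expressed as a short Python sketch. The function names and dictionary keys are hypothetical and do not correspond to CancerMath's own JavaScript variables; the sketch also assumes that percentages of stained cells and FISH/IHC results are available per patient, whereas the registry may have stored only the derived categories.

```python
from typing import Dict, Optional

# ER/PR positivity thresholds (% positively stained tumor cells) by center, from the text.
ER_PR_THRESHOLD = {"NUH": 1.0, "TTSH": 10.0, "UMMC": 10.0}


def receptor_status(center: str, percent_stained: Optional[float]) -> str:
    """Classify ER or PR as positive/negative/unknown using the center's threshold."""
    if percent_stained is None:
        return "unknown"
    return "positive" if percent_stained >= ER_PR_THRESHOLD[center] else "negative"


def her2_status(fish: Optional[str], ihc: Optional[str]) -> str:
    """FISH result ('positive'/'negative') takes precedence; IHC is used if FISH was not done."""
    if fish in ("positive", "negative"):
        return fish
    if ihc == "3+":
        return "positive"
    if ihc in ("0", "1+"):
        return "negative"
    if ihc == "2+":
        return "equivocal"
    return "unknown"


def cancermath_inputs(center: str, er_pct: Optional[float], pr_pct: Optional[float],
                      fish: Optional[str], ihc: Optional[str], histology: str) -> Dict[str, str]:
    """Recode registry values into categorical calculator inputs (hypothetical keys)."""
    her2 = her2_status(fish, ihc)
    return {
        "er": receptor_status(center, er_pct),
        "pr": receptor_status(center, pr_pct),
        # Equivocal IHC without FISH is passed to the calculators as unknown.
        "her2": "unknown" if her2 == "equivocal" else her2,
        # Histological type "others" is re-categorized as unknown.
        "histology": "unknown" if histology == "others" else histology,
        # Evidence of recurrence is always set to unknown.
        "recurrence": "unknown",
    }
```

Because the ER/PR threshold differs by center, the same 5 % staining is classified as positive at NUH but negative at TTSH and UMMC.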

In total, 7064 female breast cancer patients were included. Only cases with known nodal status (N = 6807) were included in the validation of the nodal status calculator, and their individual probability of positive lymph nodes was calculated. For the outcome calculator, two separate subsets were selected for comparison of observed and predicted survival: patients with a minimum of 5 years of follow-up (UMMC and NUH patients diagnosed in 2007 and earlier and TTSH patients diagnosed in 2006 and earlier, N = 4517) and patients with a minimum of 10 years of follow-up (UMMC and NUH patients diagnosed in 2002 and earlier, N = 1649). As NUH and TTSH did not collect detailed hormone therapy and chemotherapy regimen data before 2006, the therapy calculator was validated only for UMMC patients with a minimum of 5 years of follow-up (N = 1538).
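The selection rules for the survival validation subsets reduce to a latest eligible year of diagnosis per center. A minimal Python sketch, with hypothetical subset labels and function name, is:

```python
# Latest eligible year of diagnosis for each validation subset, by center (from the text).
# Centers absent from a subset's mapping are not part of that subset.
LAST_DIAGNOSIS_YEAR = {
    "outcome_5y": {"UMMC": 2007, "NUH": 2007, "TTSH": 2006},
    "outcome_10y": {"UMMC": 2002, "NUH": 2002},
    "therapy_5y": {"UMMC": 2007},
}


def in_subset(subset: str, center: str, diagnosis_year: int) -> bool:
    """True if a patient from `center` diagnosed in `diagnosis_year` belongs to `subset`."""
    limit = LAST_DIAGNOSIS_YEAR[subset].get(center)
    return limit is not None and diagnosis_year <= limit
```

The sketch encodes only the center and year-of-diagnosis rules; the requirement of known nodal status for the nodal status calculator would be a separate filter.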

Statistical analysis

Nodal status calculator

Observed and predicted probabilities of positive lymph nodes were compared. Calibration was assessed by dividing the data into deciles of the predicted probability of positive nodes and plotting, for each decile, the observed proportion of patients with positive nodes against the mean predicted probability. A 45-degree diagonal line was plotted to indicate perfect agreement. Discrimination of the nodal status calculator was evaluated by the area under the receiver operating characteristic curve (AUC); a value of 0.5 indicates no discrimination and a value of 1.0 indicates perfect discrimination.
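A minimal Python sketch of the decile calibration and the AUC follows; the original analysis was done in R, so this is not the authors' code. It assumes `pred` holds each patient's predicted probability of positive nodes and `obs` holds 1 for node-positive and 0 for node-negative (both names hypothetical). The AUC is computed through its rank interpretation: the probability that a randomly chosen node-positive patient has a higher predicted probability than a randomly chosen node-negative patient, with ties counted as one half, which equals the area under the ROC curve. The pairwise loop is quadratic and is written for clarity rather than speed.

```python
from typing import List, Sequence, Tuple


def decile_calibration(pred: Sequence[float], obs: Sequence[int],
                       groups: int = 10) -> List[Tuple[float, float]]:
    """Sort by predicted probability, split into `groups` near-equal bins, and return
    (mean predicted probability, observed proportion) for each non-empty bin."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    n = len(order)
    points = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        mean_pred = sum(pred[i] for i in idx) / len(idx)
        obs_rate = sum(obs[i] for i in idx) / len(idx)
        points.append((mean_pred, obs_rate))
    return points


def auc(pred: Sequence[float], obs: Sequence[int]) -> float:
    """Probability that a random positive case outranks a random negative case."""
    pos = [p for p, y in zip(pred, obs) if y == 1]
    neg = [p for p, y in zip(pred, obs) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Plotting the returned pairs (mean predicted on the x-axis, observed on the y-axis) alongside the 45-degree line gives the calibration plot described above.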

Outcome and therapy calculator

The ratio of observed to predicted numbers of deaths within 5 and 10 years of diagnosis was calculated as the mortality ratio (MR), with 95 % confidence intervals (CI) constructed by an exact procedure [25]. MRs were also calculated for subgroups defined by country, period of diagnosis, age, race, and other clinical characteristics. Observed 5-year and 10-year survival rates were compared with the median predicted survival from CancerMath. A difference of less than 3 % was considered reliable enough for clinical use, as a 10-year survival benefit of 3–5 % is an indication for adjuvant chemotherapy [26]. The relationship between average predicted and observed 5-year and 10-year survival was illustrated with calibration plots. Discrimination of the outcome and therapy calculators was evaluated by AUC, using the datasets with a minimum of 5 and 10 years of follow-up for 5-year and 10-year outcomes, respectively. The outcome calculator was further evaluated in the entire dataset, regardless of follow-up time, using the concordance index (c-index) proposed by Harrell et al. [27]. The c-index is the probability that, in a randomly selected pair of patients, the patient predicted to have the higher survival is the one who survives longer [27]. As with the AUC, a c-index of 0.5 indicates no discrimination and a c-index of 1.0 indicates perfect discrimination.
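The mortality ratio and Harrell's c-index can be sketched in Python as below. The confidence interval uses one common exact procedure, the Garwood interval, which treats the observed number of deaths as Poisson and divides the exact bounds for the Poisson mean by the expected number of deaths; whether this is the procedure of [25] is an assumption. The c-index follows Harrell's definition with predicted survival as the score and, as a simplification, does not count pairs with tied follow-up times. All function names are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), with terms computed on the log scale."""
    if k < 0:
        return 0.0
    if mu <= 0.0:
        return 1.0
    total = sum(math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1)) for i in range(k + 1))
    return min(1.0, total)


def solve_decreasing(h: Callable[[float], float], target: float, lo: float, hi: float) -> float:
    """Bisection for mu in [lo, hi] with h(mu) == target, assuming h decreases in mu."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if h(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mortality_ratio(observed: int, expected: float,
                    alpha: float = 0.05) -> Tuple[float, float, float]:
    """Observed/expected deaths with an exact (Garwood) Poisson confidence interval."""
    hi = observed + 10 * math.sqrt(observed) + 20  # generous upper search bound
    # Lower bound: P(X >= observed) = alpha/2, i.e. P(X <= observed - 1) = 1 - alpha/2.
    lower = 0.0 if observed == 0 else solve_decreasing(
        lambda mu: poisson_cdf(observed - 1, mu), 1 - alpha / 2, 0.0, hi)
    # Upper bound: P(X <= observed) = alpha/2.
    upper = solve_decreasing(lambda mu: poisson_cdf(observed, mu), alpha / 2, 0.0, hi)
    return observed / expected, lower / expected, upper / expected


def harrell_c(time: Sequence[float], event: Sequence[int],
              pred_survival: Sequence[float]) -> float:
    """A pair is usable when the patient with the shorter follow-up died; it is concordant
    when that patient also had the lower predicted survival (prediction ties count 1/2)."""
    concordant, usable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[i] < time[j]:
                usable += 1
                if pred_survival[i] < pred_survival[j]:
                    concordant += 1.0
                elif pred_survival[i] == pred_survival[j]:
                    concordant += 0.5
    return concordant / usable
```

For instance, 10 observed deaths against 10 expected give an MR of 1.00 with an exact 95 % CI of roughly 0.48 to 1.84.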

Conditional survival calculator

For patients who survived 2 years after diagnosis, predicted 5-year survival was compared with observed 5-year survival. Similarly, predicted 10-year survival was compared with observed 10-year survival for patients who survived 5 years and for those who survived 7 years after diagnosis. Discrimination was evaluated by AUC.
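The methods do not state how observed conditional survival was estimated. A simple Python sketch, assuming it is the proportion of patients alive at t years among those alive at s years, with patients censored between s and t excluded, is given below; a Kaplan-Meier estimate would be an alternative when censoring is common. The second function shows the standard identity S(t | s) = S(t) / S(s) relating conditional to unconditional survival; it is not a description of CancerMath's internal calculation. Function names are hypothetical.

```python
from typing import Sequence


def observed_conditional_survival(time: Sequence[float], event: Sequence[int],
                                  s: float, t: float) -> float:
    """Share of patients alive at t years among those alive beyond s years (s < t),
    counting only patients whose vital status at t is known."""
    alive = known = 0
    for ti, ei in zip(time, event):
        if ti <= s:
            continue  # died or was censored before surviving s years
        if ti > t or (ti == t and not ei):
            alive += 1
            known += 1
        elif ei:
            known += 1  # died between s and t
        # otherwise censored between s and t: status at t unknown, excluded
    return alive / known


def conditional_from_unconditional(surv_t: float, surv_s: float) -> float:
    """S(t | s) = S(t) / S(s): probability of surviving to t given survival to s."""
    return surv_t / surv_s
```

For example, with an unconditional 2-year survival of 0.90 and 5-year survival of 0.72, the identity gives a 5-year survival of 0.80 for patients who have already survived 2 years.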