Development and validation of new glomerular filtration rate predicting models for Chinese patients with type 2 diabetes

Subjects and study design

The study recruited consecutive patients with known type 2 diabetes and complete
clinical data at the Third Affiliated Hospital of Sun Yat-sen University,
China. Exclusion criteria were: (1) age less than 18 years; (2) acute deterioration
of kidney function (serum creatinine on the day of GFR measurement differing by more
than 15 % from that on the day of admission), skeletal
muscle atrophy, edema, pleural effusion or ascites, malnutrition, amputation, heart
failure, or ketoacidosis; (3) being treated with dialysis at the time of the study;
(4) taking cimetidine or trimethoprim, or receiving intravenous albumin or diuretics,
before the measurement of GFR. In total, 519 patients with type 2 diabetes were enrolled.
Patients treated from January 2005 through December 2012 were randomly divided into
the development data-set (n = 276) and the internal validation data-set (n = 138). Data obtained
from January 2013 to June 2013 were used as the external validation data-set (n = 105).
Written informed consent was obtained from all subjects. The study was approved by
the institutional review board at the Third Affiliated Hospital of Sun Yat-sen University.
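The exclusion rule for acute deterioration and the random split can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the fixed seed, and the integer patient IDs are assumptions, and the source does not say how the randomization was performed.

```python
import random


def acute_deterioration(scr_admission, scr_gfr_day, threshold=0.15):
    """True if creatinine on the GFR day differs from admission by more than 15 %."""
    return abs(scr_gfr_day - scr_admission) / scr_admission > threshold


def split_development_internal(patient_ids, n_development=276, seed=0):
    """Randomly split patients into development and internal validation sets."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    rng.shuffle(ids)
    return ids[:n_development], ids[n_development:]


# 414 hypothetical patient IDs standing in for the 2005-2012 cohort
development, internal = split_development_internal(range(414))
print(len(development), len(internal))
```

The external validation set (n = 105) is not part of this split because it was defined by a later recruitment period.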

Laboratory measurements

GFR was measured by technetium-99m diethylenetriaminepentaacetic acid (⁹⁹ᵐTc-DTPA)
renal dynamic imaging (modified Gates' method), using a Millennium MPR SPECT with
the General Electric Medical System (Discovery VH, GE Healthcare, Little Chalfont, UK).
The details have been described previously [20]. The measured GFR (mGFR) was calibrated
to equal the dual plasma sample ⁹⁹ᵐTc-DTPA GFR. Using OpenEpi software (version 2),
the minimum sample size was calculated as 36 [21] (95 % confidence level and 80 % power).
The calibrated GFR measurement was referred to as the standard GFR (sGFR) in our study.
Two and four hours after the injection of ⁹⁹ᵐTc-DTPA, venous blood samples were drawn
from the opposite forearm and collected
in heparinized tubes. Radioactivity in the separated plasma was recorded by a multi-function
well counter (ZD-6000 multi-function instrument from Zhida Technology Company, Xian,
China). Serum creatinine analysis was performed by a Hitachi 7180 autoanalyzer (Hitachi,
Tokyo, Japan; reagents from Roche Diagnostics, Mannheim, Germany) using the enzymatic
method; after 2010, serum creatinine measurements were traceable to isotope dilution
mass spectrometry. HbA1c was measured by high-performance liquid chromatography, and
UACR was determined by an immunoturbidimetric assay.
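The dual plasma sample measurement described above (samples at two and four hours) is commonly computed from a mono-exponential clearance model. The sketch below assumes that model; the source does not give the formula and refers to reference [20] for details. The function name, the parameter names, and the counting units are illustrative, and body surface area normalization and any further corrections are omitted.

```python
import math


def dual_plasma_clearance(dose, p1, p2, t1_min=120.0, t2_min=240.0):
    """Plasma clearance (mL/min) from two samples, assuming mono-exponential decay.

    dose   : injected activity (counts/min)
    p1, p2 : plasma activity (counts/min per mL) at t1_min and t2_min
    """
    k = math.log(p1 / p2) / (t2_min - t1_min)  # elimination rate constant (1/min)
    c0 = p1 * math.exp(k * t1_min)             # concentration back-extrapolated to t = 0
    return dose * k / c0                       # clearance = dose / area under the curve


# Made-up values: k = 0.005 /min and C0 = 100 counts/min/mL give 80 mL/min
p1 = 100.0 * math.exp(-0.005 * 120.0)
p2 = 100.0 * math.exp(-0.005 * 240.0)
clearance = dual_plasma_clearance(1.6e6, p1, p2)
```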

Metrics for the development of the new regression models

The development of the new regression models was based on both the development and
internal validation data sets. The predictor variables involved in the establishment
were age, sex, serum creatinine, BMI, HbA1c, and UACR. Age, sex, and serum creatinine
were included in all new models, with BMI, HbA1c, and UACR added separately or in
combination. In the development of the new equations, sGFR and serum creatinine were
transformed to a log scale, while BMI, HbA1c, and UACR were on the natural scale.
Least squares linear regression was adopted to relate sGFR to the predictor variables.
Following the approach used to establish the CKD-EPI equation, a nonparametric
smoothing spline was used to examine the shape of the relationship between log sGFR
and log creatinine, and the nonlinear relationship shown by the smoothing
spline was then characterized by piecewise linear splines.
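A least squares fit of log sGFR on a piecewise linear spline in log creatinine can be sketched as below. It is an illustration under stated assumptions rather than the authors' model: the knot at 0.9 mg/dL, the use of age on the natural scale, the synthetic data, and all coefficient values are made up, and BMI, HbA1c, and UACR are left out for brevity.

```python
import math
import random

KNOT = 0.9  # hypothetical knot (mg/dL); the source does not report knot locations


def design_row(age, female, scr):
    """Predictors for log(sGFR): intercept, age, sex, two-piece spline in log creatinine."""
    z = math.log(scr / KNOT)
    return [1.0, age, float(female), min(z, 0.0), max(z, 0.0)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Synthetic data generated from made-up coefficients, only to exercise the fit
rng = random.Random(1)
true_beta = [5.0, -0.007, -0.05, -0.35, -1.2]
rows, y = [], []
for _ in range(200):
    row = design_row(rng.uniform(30, 80), rng.random() < 0.5, rng.uniform(0.4, 3.0))
    rows.append(row)
    y.append(sum(b * v for b, v in zip(true_beta, row)) + rng.gauss(0, 0.01))

beta = fit_least_squares(rows, y)
# Back-transform from the log scale to obtain eGFR for one hypothetical patient
egfr = math.exp(sum(b * v for b, v in zip(beta, design_row(60, True, 1.0))))
```

The two spline terms give the fitted line a different slope below and above the knot, which is the same device the CKD-EPI equation expresses with its min and max terms.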

Metrics for the development of the new ANN models

The new ANN models were programmed by MATLAB 2011A (The MathWorks Inc, Natick, MA,
USA). A three-layer back-propagation (BP) network consisting of an input layer, a
hidden layer and an output layer was established. The predictor variables described
above were the input variables, with sGFR as the output
variable. Each neuron in the hidden layer used the sigmoid (S-shaped) function as its
activation function, and several networks were programmed with different numbers of
neurons in the hidden layer (1–11). After random initialization, all networks were
trained in the development data-set using the back-propagation learning rule. Their performance in
the internal validation data-set determined the optimal network. Performance was assessed
by mean square error in the internal validation data-set and the smallest mean square
error meant the best performance. With thresholds and weights specified after training,
the output of the network was calculated by the weighted summation of each neuron
to approximate sGFR. The introduction of the genetic algorithm into the BP network
(GABP network) enabled optimized initialization of the weights and thresholds, leading
to improved performance of the ANN models. In the GABP network, all weights and
thresholds of one network were encoded as a chromosome and evolved from one generation
to the next through mutation and crossover. The initial weights and
thresholds were carried into the next generation if the network demonstrated better
performance in the internal validation data-set. The best initial weights and thresholds
were eventually used to initialize the network. Details of the construction
of the new ANN model are presented in Additional file 1.
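The network structure and the genetic search over initial weights can be sketched as follows. This is a simplified illustration and not the authors' MATLAB implementation: the layer sizes, population size, single-point crossover, Gaussian mutation, and synthetic data are all assumptions, and the back-propagation training that follows initialization is omitted.

```python
import math
import random

N_IN, N_HID = 3, 4  # hypothetical sizes; the source tried 1 to 11 hidden neurons


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(chrom, x):
    """Three-layer network: sigmoid hidden layer, then a weighted sum as output."""
    pos, hidden = 0, []
    for _ in range(N_HID):
        weights = chrom[pos:pos + N_IN]
        threshold = chrom[pos + N_IN]
        hidden.append(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + threshold))
        pos += N_IN + 1
    out_weights, out_threshold = chrom[pos:pos + N_HID], chrom[pos + N_HID]
    return sum(w * h for w, h in zip(out_weights, hidden)) + out_threshold


def mse(chrom, data):
    """Mean square error of one chromosome (one set of weights and thresholds)."""
    return sum((forward(chrom, x) - y) ** 2 for x, y in data) / len(data)


def evolve(population, data, rng, mutation_sd=0.1):
    """One generation: keep the better half, refill by crossover plus mutation."""
    ranked = sorted(population, key=lambda c: mse(c, data))
    parents = ranked[:len(ranked) // 2]
    children = []
    while len(parents) + len(children) < len(population):
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, len(a))  # single-point crossover
        child = a[:cut] + b[cut:]
        children.append([g + rng.gauss(0, mutation_sd) for g in child])  # mutation
    return parents + children


# Synthetic validation data from a made-up target, only to exercise the search
rng = random.Random(0)
inputs = [[rng.random() for _ in range(N_IN)] for _ in range(50)]
data = [(x, 0.5 + 0.3 * x[0] - 0.4 * x[2]) for x in inputs]

n_genes = N_HID * (N_IN + 1) + N_HID + 1
population = [[rng.uniform(-1, 1) for _ in range(n_genes)] for _ in range(20)]
initial_best = min(mse(c, data) for c in population)
for _ in range(30):
    population = evolve(population, data, rng)
best = min(population, key=lambda c: mse(c, data))
```

Because the better half of each generation is kept unchanged, the best mean square error never gets worse from one generation to the next. In the described workflow, `best` would serve as the starting point for back-propagation training rather than as the final model.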

Determination of the optimal models

All the new models were compared with the new Japanese equations and the CKD-EPI equation
in the external validation data set. The performance was defined by bias, precision
and accuracy.

Expression of the equations

CKD-EPI equation [4]
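For reference, the 2009 CKD-EPI creatinine equation as commonly published is reproduced here in the symbols used in this section. The units (SC in mg/dL, age in years, eGFR in mL/min/1.73 m²) and the race term are taken from that publication rather than from this text; in that publication α is −0.329 for women and −0.411 for men.

```latex
\[
\mathrm{eGFR} = 141 \times \min\!\left(\frac{SC}{\kappa},\, 1\right)^{\alpha}
\times \max\!\left(\frac{SC}{\kappa},\, 1\right)^{-1.209}
\times 0.993^{\mathrm{Age}}
\times 1.018\ [\text{if female}]
\times 1.159\ [\text{if black}]
\]
```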

κ is 0.7 for female and 0.9 for male, α is −0.329 for female and −0.411 for male, min
indicates the minimum of SC/κ or 1, and max indicates the maximum of SC/κ or 1.

Japanese equation 1 [22]

Japanese equation 2 [9]

Statistical analyses

Results are expressed as mean ± SD or as median. Bias was assessed as the median of
the difference between sGFR and eGFR, and precision was defined as the inter-quartile
range (IQR) of the difference. Accuracy was measured as the percentage of eGFRs within
30 % of the sGFR. The 95 % confidence intervals were calculated by the bootstrap method
(2000 bootstraps). Quantitative variables were compared between two data-sets
using the independent samples t test or the Mann–Whitney test. Differences and accuracy within a data-set were
compared by the Wilcoxon signed rank test and the McNemar test, respectively. All analyses were conducted
using SPSS software (version 13.0), R (version 3.0.2, i386) and MATLAB software (version
2011b).
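The three performance measures and their bootstrap confidence intervals can be computed as in the sketch below. Several details are assumptions not stated in the source: the difference is taken as sGFR minus eGFR, quartiles use the inclusive method, and the interval is a percentile bootstrap. The eight value pairs are made up for illustration.

```python
import random
import statistics


def performance(sgfr, egfr):
    """Bias (median difference), precision (IQR of difference), accuracy (P30, %)."""
    diffs = [s - e for s, e in zip(sgfr, egfr)]
    q1, _, q3 = statistics.quantiles(diffs, n=4, method="inclusive")
    within30 = sum(abs(e - s) <= 0.30 * s for s, e in zip(sgfr, egfr))
    return {
        "bias": statistics.median(diffs),
        "precision": q3 - q1,
        "accuracy": 100.0 * within30 / len(sgfr),
    }


def bootstrap_ci(sgfr, egfr, metric, n_boot=2000, seed=0):
    """Percentile 95 % confidence interval from resampled patient pairs."""
    rng = random.Random(seed)
    n = len(sgfr)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        resampled = performance([sgfr[i] for i in idx], [egfr[i] for i in idx])
        estimates.append(resampled[metric])
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


# Made-up measurements for eight hypothetical patients
sgfr = [90, 60, 45, 30, 100, 75, 50, 20]
egfr = [85, 66, 40, 41, 95, 80, 48, 25]
result = performance(sgfr, egfr)
bias_low, bias_high = bootstrap_ci(sgfr, egfr, "bias")
```

Resampling patients as pairs keeps each eGFR matched to its own sGFR, so the intervals reflect variation between patients rather than between the two measurements.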