Making health insurance pro-poor: evidence from a household panel in rural China


The Puding panel dataset

This paper uses a census-type rural household panel with four waves, three of which
are used in most of the analysis, since the baseline predates the implementation of NCMS.
The data were collected in 2004 (pre-NCMS) and again in 2006, 2009 and 2011, and
cover three administrative villages in Puding County, Guizhou Province, Southwestern
China.
The three villages lie within 10 km of the county seat. The dataset covers every
household in the area. As a complete census, it is particularly well suited to studying
inequality: unlike a drawn sample, it is not subject to sampling error, so any
year-on-year changes in the income distribution are captured directly.
The sample comprises about 800 households and 3,500 individuals in each wave of the
panel, for a total of N = 14,469 observations after 48 individuals were removed because
of missing or corrupt data (no entire household was removed). All households are tracked
in all four waves, except for those that left the area, newly formed households (set up
by former members of another household), and households that migrated into the area.
Approval and funding for data collection were obtained from the Chinese National Science
Foundation; no additional ethical review was required for the collection of economic data.

The Puding panel surveys collected health status, medical expenditures, and medical
insurance reimbursements for every member of each household in the three villages.
We have yearly data at the individual level, but not at the “health event” level:
we know how much was spent on medical care for each person, but not how many times
they saw a doctor or what treatment they received each time. We do have variables
reflecting disease type and location of treatment for the most notable occurrence
in the past year, and whether an individual suffers from any chronic disease. In addition,
the dataset provides detailed individual demographics, employment information, and household
expenditures and income. Linking health and economic information in the survey
allows us to analyze the impacts of NCMS on poverty and inequality, as well as to
identify which socio-economic groups benefit from the insurance.

Table 1 reports summary statistics for our sample. This region is among the poorest and
most unequal in China; Puding is on the government’s official list of “impoverished” counties.
Villagers depend heavily on agriculture, even though land is scarce and soils are
poor. Poverty is high but decreasing: using the 2004 national poverty line of 668
RMB per person per year and income deflated to 2004 RMB, the poverty rate fell from
26 % to 15.6 % between 2004 and 2011. The international dollar-a-day poverty line
yields different poverty rates, but also points to a sharp decrease in poverty.
Inequality, however, rose in that same period: the Gini coefficient increased from
0.41 to 0.55.
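Both distributional measures above can be computed from per capita incomes as in the minimal Python sketch below. It assumes incomes are annual per capita amounts in nominal RMB and that a price index relative to 2004 is available; the function names, the price index and the incomes in the usage lines are hypothetical, not survey values. The Gini shown is the standard unweighted rank formula, which may differ in detail from the weighting behind the figures reported in the text.

```python
from typing import List


def deflate(incomes: List[float], price_index: float) -> List[float]:
    """Convert nominal RMB to 2004 RMB, given a price index with 2004 = 1.0."""
    return [y / price_index for y in incomes]


def poverty_headcount(incomes: List[float], line: float) -> float:
    """Share of units whose annual per capita income is below the poverty line."""
    return sum(1 for y in incomes if y < line) / len(incomes)


def gini(incomes: List[float]) -> float:
    """Gini coefficient of an unweighted sample of non-negative incomes.

    Rank formula: G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n,
    where x_(1) <= ... <= x_(n) are the sorted incomes and i runs from 1 to n.
    """
    xs = sorted(incomes)
    n = len(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * sum(xs)) - (n + 1) / n


# Hypothetical nominal per capita incomes for a later year, with an assumed
# price index of 1.25 relative to 2004 (illustrative numbers only).
real = deflate([600.0, 900.0, 1500.0, 4000.0], price_index=1.25)
print(poverty_headcount(real, line=668.0))
print(round(gini(real), 3))
```

The rank formula is algebraically equivalent to half the mean absolute difference between all pairs of incomes divided by the mean, but needs only one sort rather than a double loop over pairs.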

Table 1. Survey summary statistics

The number of households in the sample increased slightly over the period, reaching
900 in 2011. The composition of the population also changed, mainly because of labor
migration. The proportion of households with migrants increased from 36.7 to 41.9 %,
while the share of elderly people in the population rose from 7.0 to 9.6 %. Such
compositional shifts make it necessary to approach health-related questions within a
regression framework that can control for them.

The NCMS was not active in this county at the time of the first survey wave in 2004.
The second wave (2006) was the first year surveyed households could enroll in NCMS,
with 82 % of households participating. Participation rose to 98 % and 95 % in 2009
and 2011, respectively.
Premiums increased sharply, from 45 RMB to 230 RMB, but most of this amount is publicly
subsidized, so that the share paid by farmers remains small (10–30 RMB). High
participation rates suggest that self-selection into enrollment is not a major concern
in this context, which allows us to focus on how different enrollees benefit from the
scheme. The table also shows that the share of the population seeking treatment
fluctuates between 45 % and 60 %, while the percentage receiving reimbursement went from
14 % in 2006 to 60 % in 2009 and 39 % in 2011. These fluctuations suggest that disease
patterns and prevalence are not stable from one year to the next.
This is one of the reasons why we focus on the relationship between income and NCMS
benefits rather than looking at specific treatments received, which are likely to
be heavily influenced by fluctuating environmental factors.

Empirical framework

We use a regression framework to assess whether income is related to the benefits an
individual derives from NCMS. While benefits accrue to individuals, not all patients are
income earners, so it makes sense to pool income at the household level. The relation
we test can be expressed in the following generic equation:

$$B_{i,t} = f\left(\ln y_{h,t},\; Z_{i,t},\; X_{h,t},\; T_t\right) + \varepsilon_{i,t}$$

where $B_{i,t}$ is the NCMS benefit measure for individual $i$ in year $t$, $y_{h,t}$ is
the income of household $h$, expressed per capita, $Z_{i,t}$ is a vector of individual
characteristics, $X_{h,t}$ a vector of household characteristics, and $T_t$ a series of
year dummies. The year dummies capture year-specific mean shifts. We can divide the sample
into two periods, pre-reform (2006) and post-reform (2009, 2011), to highlight the impacts
of the reform. In some specifications, we also add a reform-income interaction term to the
right-hand side, allowing income to affect benefits differently before and after the reform.
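For concreteness, one linear-index version of the specification with the reform-income interaction could be written as below. This is an illustrative sketch, not the paper's reported equation: $B_{i,t}$ (benefit measure), $y_{h,t}$ (household per capita income), $\mathrm{Post}_t$ and the coefficient names are our own notation, and the linear form is an assumption. In a probit, the same index would enter inside the standard normal CDF.

```latex
B_{i,t} = \alpha + \beta \ln y_{h,t} + \theta \left( \ln y_{h,t} \times \mathrm{Post}_t \right)
        + Z_{i,t}'\gamma + X_{h,t}'\delta + T_t'\tau + \varepsilon_{i,t},
\qquad \mathrm{Post}_t = \mathbf{1}\{ t \in \{2009, 2011\} \}
```

Under this form, $\beta$ is the income gradient of benefits in 2006 and $\beta + \theta$ the gradient in 2009 and 2011, so a positive $\theta$ would mean benefits became more closely tied to income after the reform. No separate $\mathrm{Post}_t$ term appears because it equals the sum of the 2009 and 2011 year dummies in $T_t$ and would be perfectly collinear with them.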

Many NCMS-related outcomes could be used on the left-hand side, such as use of healthcare
services or out-of-pocket payments; indeed, such variables are often used in the literature
on health insurance. However, such outcomes inherently reflect healthcare needs that may
vary with income, in which case “inequality cannot be interpreted as inequity” [2]. Using
measures of reimbursement, conditional on having sought care, allows us to eliminate this
confounding influence of need.

We use three measures of benefits: a reimbursement dummy (binary variable for having
received reimbursement), the amount of reimbursement received, and the rate of reimbursement
(reimbursement/total cost). Depending on the explained variable, we restrict the sample
to those who sought medical care or to those who received reimbursement, in order
to address concerns about (self-)selection.
If, among those who sought medical care, individuals from higher income strata are
more likely to obtain reimbursement than poorer ones, it may indicate that the breadth
of coverage is not well targeted to the needs of the poor. If, among those who received
reimbursement, the wealthier tend to receive higher payments or higher reimbursement
rates, it suggests that the depth of coverage is not well targeted to the needs of
the poor.
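The breadth/depth distinction can be illustrated with a simple descriptive comparison, sketched in Python below. It is not the regression approach used here (there are no controls); it merely computes, for care-seekers below and at or above median income, the share reimbursed (breadth) and, among those reimbursed, the mean reimbursement rate (depth). The record keys, the median split, and the use of a positive medical cost as a proxy for seeking care are assumptions for illustration.

```python
from statistics import median
from typing import Dict, List


def breadth_and_depth(records: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Compare breadth and depth of coverage below vs. at-or-above median income.

    Each record is one person-year with hypothetical keys:
      'income'      household per capita income (RMB)
      'cost'        total medical cost in the year (RMB)
      'reimbursed'  NCMS reimbursement received in the year (RMB)
    Seeking care is proxied by a positive medical cost (an assumption).
    """
    # Breadth sample: everyone who sought care.
    sought = [r for r in records if r["cost"] > 0]
    cut = median([r["income"] for r in sought])
    groups = {
        "below_median": [r for r in sought if r["income"] < cut],
        "at_or_above_median": [r for r in sought if r["income"] >= cut],
    }
    result = {}
    for label, group in groups.items():
        # Depth sample: care-seekers who received any reimbursement.
        paid = [r for r in group if r["reimbursed"] > 0]
        breadth = len(paid) / len(group) if group else float("nan")
        depth = (sum(r["reimbursed"] / r["cost"] for r in paid) / len(paid)
                 if paid else float("nan"))
        result[label] = {"breadth": breadth, "depth": depth}
    return result


# Hypothetical person-years; the fourth person had no medical cost and is excluded.
people = [
    {"income": 100, "cost": 100, "reimbursed": 0},
    {"income": 200, "cost": 200, "reimbursed": 50},
    {"income": 300, "cost": 100, "reimbursed": 40},
    {"income": 400, "cost": 0, "reimbursed": 0},
    {"income": 500, "cost": 200, "reimbursed": 100},
]
print(breadth_and_depth(people))
```

In this toy data the poorer care-seekers are reimbursed less often and at a lower rate, the pattern that would point to poorly targeted breadth and depth; the regressions assess such gaps conditional on controls.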

Explained and explanatory variables

Table 2 summarizes all relevant variables, with means and standard deviations. The reimbursement
dummy equals 1 if an individual received any NCMS reimbursement during a given year. We
also compute the log of the amount of reimbursement received in a year and the fraction
of costs reimbursed in a year.
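A minimal sketch of how the three dependent variables could be built for one person-year, assuming annual totals of medical cost and NCMS reimbursement in RMB (the function and argument names are hypothetical):

```python
import math
from typing import Optional, Tuple


def benefit_measures(total_cost: float,
                     reimbursement: float) -> Tuple[int, Optional[float], Optional[float]]:
    """Return (reimbursement dummy, log reimbursement, reimbursement rate).

    Amounts are annual RMB for one person. The log amount is defined only when
    reimbursement is positive, and the rate (reimbursement / total cost) only
    when cost is positive; otherwise None is returned.
    """
    dummy = 1 if reimbursement > 0 else 0
    log_amount = math.log(reimbursement) if reimbursement > 0 else None
    rate = reimbursement / total_cost if total_cost > 0 else None
    return dummy, log_amount, rate


print(benefit_measures(200.0, 50.0))  # reimbursed person
print(benefit_measures(100.0, 0.0))   # sought care, not reimbursed
print(benefit_measures(0.0, 0.0))     # no medical cost this year
```

Returning None rather than zero for undefined values keeps the sample restrictions explicit: the log amount exists only in the reimbursed subsample, and the rate only among those with medical costs.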

Table 2. Summary Statistics for Dependent and Independent Variables, pooling years 2004, 2006,
2009 and 2011

The key independent variable is the natural log of non-transfer income (no household
reported zero income). Total non-transfer income was computed using exhaustive information
on all sources of income: agricultural production and/or sales, salaries, wage work
and odd jobs, gifts and remittances, income from rents, profits from trade, service
provision, and non-farm enterprises of all sorts.

Computing incomes from agriculture required calculating the value of output and netting
out all costs. Output outliers were identified and corrected on the basis of yield:
where recorded output-per-hectare was unrealistically high, the average yield in the
village was used to compute a new output estimate. Own consumption of agricultural
output was valued at farmgate prices, which were derived from sales data. The cost data
collected in the questionnaire were exhaustive and highly disaggregated, including
totals spent on seed, fertilizer and other chemicals, irrigation, labor, and mechanized
inputs and fuel. Any costs incurred in kind (for instance, seed borrowed from neighbors)
were also valued in the questionnaire and could therefore easily be incorporated into our calculations.
Costs of livestock breeding activities included feed, medicine, labor, veterinary
services, and costs of machinery and fuel. Enterprise profits are computed as revenues
net of costs, both of which are reported by the respondents for each activity they
engage in. Summing income from all these sources yields total non-transfer income.
Income from government transfers is included in the regressions separately, as it may
indicate poverty rather than wealth: government transfers are often targeted at those who
need them most. This is done as a precaution to avoid confusion in the interpretation of
results, but does not significantly alter them. Transfer income is also expressed in natural
log, with 1 RMB added so that households receiving no transfers do not produce missing values.
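The income construction described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure actually coded for the survey: the plausibility ceiling `max_yield` is hypothetical (the text does not give the threshold), the village average yield is taken over plausible plots only, a single crop and price are used, and all function names are our own.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple


def correct_output(plots: List[Tuple[float, float]], max_yield: float) -> List[float]:
    """Yield-based outlier correction for one crop in one village.

    plots: (area_ha, output_kg) pairs, with area_ha > 0.
    max_yield: hypothetical plausibility ceiling in kg/ha.
    Output above the ceiling is replaced by village average yield x plot area,
    where the average uses plausible plots only (an assumption).
    """
    plausible = [out / area for area, out in plots if out / area <= max_yield]
    avg_yield = mean(plausible)
    return [out if out / area <= max_yield else avg_yield * area
            for area, out in plots]


def net_farm_income(output_kg: float, farmgate_price: float,
                    costs: Dict[str, float]) -> float:
    """Value all output (sold or self-consumed) at the farmgate price, net of cash and in-kind costs."""
    return output_kg * farmgate_price - sum(costs.values())


def income_regressors(sources: Dict[str, float], hh_size: int,
                      transfers: float) -> Tuple[float, float]:
    """Return (ln per capita non-transfer income, ln(transfers + 1)).

    Requires positive total non-transfer income (no household reported zero income).
    Transfers are used as passed in; whether they are per capita is not specified here.
    """
    total = sum(sources.values())
    return math.log(total / hh_size), math.log(transfers + 1.0)


# Hypothetical household: the third plot reports 20,000 kg/ha and is corrected.
corrected = correct_output([(1.0, 3000.0), (0.5, 1500.0), (2.0, 40000.0)],
                           max_yield=10000.0)
farm = net_farm_income(sum(corrected), farmgate_price=2.0,
                       costs={"seed": 3000.0, "fertilizer": 5000.0, "labor": 2000.0})
print(corrected)
print(income_regressors({"agriculture": farm, "wages": 4000.0}, hh_size=4, transfers=0.0))
```

Adding 1 RMB before taking logs maps zero transfers to ln(1) = 0, so households without transfers stay in the sample with a well-defined regressor.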

Running specifications at the individual level allows us to control for self-reported
health status (coded on a 1-to-4 scale), sex, age, education and marital status. We
also include indicator variables for the usual place of diagnosis and type of treatment,
and, in some specifications, a dummy for NCMS membership. We include household-level
controls for farmer households, ethnic minority households, and household size. Some
specifications also include dummies for chronic disease and for whether the household
was in debt at the end of the previous year.

Model Specifications

Depending on which variable is on the left-hand side, we estimate probit models, ordinary
least squares (OLS) regressions, or tobit models. Probit regression is appropriate when
the explained variable is binary, such as our reimbursement indicator; its coefficients
give the direction of each variable's effect on the probability of receiving reimbursement,
and the corresponding marginal effects give its magnitude. For continuous variables such
as reimbursement amounts or reimbursement rates, we can use OLS regressions. To eliminate
confounding factors, we run OLS only on the subsample of individuals who received NCMS
reimbursement. This leads to a relatively small sample in 2006 (N = 246, compared with
N = 1,792 post-reform). An alternative way of controlling for confounding factors without
restricting the sample is to use tobit specifications [39]. The tobit model treats the
distribution of reimbursements as left-censored at zero, so that individuals who incurred
medical costs but received no reimbursement are retained in the estimation rather than
dropped; in its standard form, a single latent process determines both whether any
reimbursement is received and how much. We run tobits on the larger sample of all those
who had medical expenses (1,720 observations in 2006).
All data compilation and analysis were conducted in Stata, and the regressions used the
software's native reg, probit and tobit commands.
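Stata's probit and tobit commands estimate coefficients by maximizing a log-likelihood. The Python sketch below writes out those objective functions, plus the probit marginal effect, to make their interpretation concrete. It is an illustration only: no optimizer is included, the function names are our own, and each row of X is assumed to include a leading 1 for the constant.

```python
import math
from typing import List, Sequence


def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_logpdf(z: float) -> float:
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def linear_index(beta: Sequence[float], xi: Sequence[float]) -> float:
    """Linear index x'beta."""
    return sum(b * x for b, x in zip(beta, xi))


def probit_loglik(beta: Sequence[float], X: List[Sequence[float]], y: List[int]) -> float:
    """Probit log-likelihood with P(y = 1 | x) = Phi(x'beta)."""
    ll = 0.0
    for xi, yi in zip(X, y):
        p = norm_cdf(linear_index(beta, xi))
        ll += math.log(p) if yi == 1 else math.log(1.0 - p)
    return ll


def probit_marginal_effect(beta: Sequence[float], xi: Sequence[float], k: int) -> float:
    """dP(y = 1)/dx_k evaluated at covariates xi: phi(x'beta) * beta_k."""
    return math.exp(norm_logpdf(linear_index(beta, xi))) * beta[k]


def tobit_loglik(beta: Sequence[float], sigma: float,
                 X: List[Sequence[float]], y: List[float]) -> float:
    """Tobit log-likelihood, left-censored at zero.

    Latent y* = x'beta + e with e ~ N(0, sigma^2); observed y = max(0, y*).
    The same index x'beta drives both the censoring probability and the amount.
    """
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = linear_index(beta, xi)
        if yi <= 0:
            # Censored: log P(y* <= 0) = log Phi(-x'beta / sigma)
            ll += math.log(norm_cdf(-xb / sigma))
        else:
            # Uncensored: log of (1 / sigma) * phi((y - x'beta) / sigma)
            ll += norm_logpdf((yi - xb) / sigma) - math.log(sigma)
    return ll
```

The marginal effect scales each probit coefficient by the normal density at the index, which is why the raw coefficients convey only sign, not the size of the change in probability. The tobit function makes explicit that zero reimbursements contribute through the same index as positive amounts; a two-part or hurdle model would be needed to let the two processes differ.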