A functional data analysis approach for forecasting age-specific population size: A case study for the United Kingdom

Han Lin Shang writes about forecasting age-specific population size, using the United Kingdom as a case study.

In recent decades, there has been considerable development in the stochastic modelling and forecasting of population. Cohort component projection models are often used to model the evolution of age-specific population size, and are particularly useful for highlighting which demographic component contributes most to population change. Many methods have been proposed in the demographic forecasting literature to forecast the four components of demographic change, namely mortality, fertility, emigration and immigration. However, many existing methods provide only a single best estimate of future population size. A best estimate alone does not give decision makers a sense of the range of possible outcomes. In contrast, probabilistic methods take into account the uncertainty associated with forecasting population and provide a distribution of future values. The statistical method we propose in a recent article is a multilevel functional data analytic approach, in which the age-specific mortality and migration of females and males are modelled and forecasted jointly. The forecast uncertainty associated with each demographic component is incorporated through parametric bootstrapping.

We consider a functional data analytic approach for forecasting population. Our model generalises the popular Lee-Carter method and draws on functional data analysis, which combines ideas from nonparametric smoothing, functional principal component regression and differential equations, to name only a few. The approach has three important advantages:

1) Smoothing. Data are smoothed in order to reduce the effect of noisy and missing observations;

2) Decomposing smoothed rates into a number of functional principal components and their associated scores. Each functional principal component captures a particular pattern of variation over age, and its associated principal component scores can be used to describe how this pattern changes over time. Multiple functional principal components allow different patterns to be captured;

3) Forecasting the principal component scores. Conditional on the estimated mean and functional principal component functions, probabilistic forecasts of future realisations can be obtained through parametric bootstrapping.
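Steps 2 and 3 can be sketched for a single population. This is a minimal illustration, not the paper's implementation. It assumes three things: the rates have already been smoothed (step 1); they are stored as a years-by-ages NumPy array of log rates on a common age grid; and a plain singular value decomposition is an acceptable discrete stand-in for functional principal component analysis. The function name `fpca_decompose` and the synthetic data are hypothetical.

```python
import numpy as np


def fpca_decompose(log_rates, n_components=2):
    """Decompose a (years x ages) array of smoothed log rates.

    Returns the mean curve mu(x), the first `n_components` principal
    components phi_k(x) (one per row) and their scores beta_{t,k}
    (one column per component), so that log_rates ~= mu + scores @ phi.
    """
    mu = log_rates.mean(axis=0)            # mean curve over years, per age
    centred = log_rates - mu               # deviations from the mean curve
    # Rows of vt are orthonormal patterns of variation over age
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    phi = vt[:n_components]                # principal components over age
    scores = centred @ phi.T               # how each pattern evolves over time
    return mu, phi, scores


# Synthetic example: 35 years x 10 ages, rates falling faster at older ages
rng = np.random.default_rng(0)
ages = np.arange(10)
years = np.arange(35)
signal = -8 + 0.5 * ages[None, :] - 0.02 * years[:, None] * (1 + 0.1 * ages[None, :])
log_rates = signal + rng.normal(scale=0.01, size=signal.shape)

mu, phi, scores = fpca_decompose(log_rates, n_components=2)
fitted = mu + scores @ phi
print("max abs reconstruction error:", np.abs(fitted - log_rates).max())
```

The scores returned here are the time series that step 3 forecasts. Each column describes how one age pattern rises or falls from year to year.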

In each bootstrap replication, principal component scores are simulated by drawing random samples from multivariate normal distributions. The simulated scores are then forecasted using a univariate time-series forecasting technique.
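The simulation step can be sketched for scores stored as a years-by-components array. This is a minimal sketch under stated assumptions rather than the paper's procedure. Each score series is assumed to follow a random walk with drift. The multivariate normal draws serve as jointly correlated innovations, with a covariance estimated from the year-on-year changes in the historical scores. The function name `simulate_score_paths` is hypothetical.

```python
import numpy as np


def simulate_score_paths(scores, horizon, n_rep=500, seed=1):
    """Simulate future principal component scores.

    scores : (years x K) array of historical scores.
    Returns an array of shape (n_rep, horizon, K).
    Assumes each score series follows a random walk with drift.
    """
    rng = np.random.default_rng(seed)
    k = scores.shape[1]
    diffs = np.diff(scores, axis=0)                    # year-on-year changes
    drift = diffs.mean(axis=0)                         # drift of each random walk
    cov = np.atleast_2d(np.cov(diffs, rowvar=False))   # covariance of the changes
    paths = np.empty((n_rep, horizon, k))
    for r in range(n_rep):
        # One multivariate normal draw of innovations per forecast year
        shocks = rng.multivariate_normal(np.zeros(k), cov, size=horizon)
        # Accumulate drift plus innovations from the last observed scores
        paths[r] = scores[-1] + np.cumsum(drift + shocks, axis=0)
    return paths


# Synthetic historical scores: 35 years, 2 components
rng = np.random.default_rng(0)
hist = np.cumsum(rng.normal(-0.5, 0.2, size=(35, 2)), axis=0)
paths = simulate_score_paths(hist, horizon=21, n_rep=200)
print(paths.shape)  # (200, 21, 2)
```

Each simulated score path can be mapped back to a rate schedule as `mu + path @ phi`, using the mean curve and components from the decomposition step. This gives one simulated schedule per replication and forecast year.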

The historical UK data cover the years 1975 to 2009, from which we aim to forecast population by age and sex from 2010 to 2030. Age-specific fertility data were obtained from the Human Fertility Database, while age-specific mortality data and UK population data were obtained from the Human Mortality Database. Emigration and immigration counts were obtained directly from the Office for National Statistics.

Using multilevel functional principal component regression, we forecast each component of population stochastically. We find that age-specific mortality rates are likely to decline, especially for older people. The decline appears to be more rapid for males than for females, although females have higher life expectancy than males. As a result of this decline in mortality, life expectancy at birth is forecast to increase for both females (Figure 1a, left) and males (Figure 1b, right).
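Forecasted age-specific mortality rates are converted to life expectancy at birth through a period life table. A minimal sketch of the textbook calculation, which is not necessarily the exact variant used in the paper, follows. It assumes single-year central death rates, an open final age group, and deaths occurring on average half-way through each year of age. The function name and the Gompertz-type schedule are illustrative assumptions, not UK data.

```python
import math


def life_expectancy_at_birth(mx, a=0.5):
    """Period life expectancy at birth from single-year central death rates.

    mx[x] is the central death rate at age x; the final entry is treated as
    an open-ended age group. Deaths within each closed year of age are
    assumed to occur, on average, a fraction `a` of the way through it.
    """
    lx = 1.0            # survivors to exact age x (radix of 1)
    person_years = 0.0  # running total of L_x
    for x, m in enumerate(mx):
        if x == len(mx) - 1:
            person_years += lx / m             # open group: L = l / m
        else:
            qx = m / (1 + (1 - a) * m)         # probability of dying between x and x + 1
            dx = lx * qx                       # deaths between x and x + 1
            person_years += lx - (1 - a) * dx  # L_x = l_{x+1} + a * d_x
            lx -= dx
    return person_years  # e0 = sum of L_x divided by the radix (1)


# Illustrative Gompertz-type schedule for ages 0-100 (synthetic, not UK data)
mx = [0.0001 * math.exp(0.09 * x) for x in range(101)]
print(round(life_expectancy_at_birth(mx), 1))
```

Applied to every simulated mortality schedule, this function gives a distribution of life expectancy forecasts rather than a single value. Lower death rates at any age raise the result.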

For age-specific fertility rates, the greatest forecast change is a continuing decrease in early-age fertility (ages 17 to 30) and a steady increase in late fertility (ages 30 to 40). From the forecasted age-specific fertility rates, we obtain total fertility rates. These are likely to decrease until 2015 and increase thereafter, possibly as a consequence of the end of postponement (fertility decline at younger ages) and a compensatory increase in fertility at older reproductive ages. The median total fertility rate is likely to fluctuate at slightly more than two children per woman, as shown in Figure 2.
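Total fertility rates follow from forecasted age-specific rates by summing over age. A minimal sketch follows, assuming single-year rates for ages 15 to 49. A set of bootstrap replications is represented by hypothetical rescaled copies of one synthetic schedule, and the median is taken across replications, as for the median total fertility rate described above.

```python
import math
from statistics import median


def total_fertility_rate(asfr, age_width=1):
    """Total fertility rate: age-group width times the sum of age-specific rates.

    Interpreted as the average number of children a woman would have if she
    experienced these rates throughout her reproductive life.
    """
    return age_width * sum(asfr)


ages = range(15, 50)
# Hypothetical bell-shaped schedule peaking around age 31 (illustrative only)
base = [0.12 * math.exp(-(((a - 31) / 6) ** 2)) for a in ages]

# Stand-ins for bootstrap replications of one forecast year's schedule
replications = [[f * rate for rate in base] for f in (0.9, 1.0, 1.1)]
tfrs = [total_fertility_rate(rep) for rep in replications]
print(round(median(tfrs), 2))
```

For five-year age groups, `age_width=5` applies the usual multiplication by the width of each group.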


Figure 2. Forecasts of total fertility rates from 2010 to 2030, based on age-specific fertility rates from 1975 to 2009.

For age-specific emigration and immigration, the greatest forecast change is a continuing increase in both flows at ages between 20 and 45, for females (Figure 3, left column) and males (Figure 3, right column).


With the forecasted age-specific mortality, fertility, emigration and immigration, we obtain population forecasts via the cohort component projection model. We find that the age profile of the population in 2030 is driven mainly by future (im)migration and, to some extent, by fertility. The largest uncertainties for both males and females are associated with the number of newborns and with the population aged between 20 and 45. The number of older people is also expected to increase by 2030, as is the working-age group aged between 20 and 45. The total UK population is forecast to exceed 70 million by the middle of 2029, as shown in Figure 4.
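One year of a cohort component projection can be sketched for one sex. This is a simplified illustration under stated assumptions, not the paper's exact accounting. The assumptions are:

- single-year ages with an open final age group;
- survival probabilities derived from the mortality forecast;
- births computed from the female population at the start of the year;
- a hypothetical sex ratio at birth of 1.05 males per female;
- newborns entering age 0 without a separate infant-mortality adjustment;
- net migration (immigration minus emigration) added at the end of the year.

The function names are hypothetical.

```python
def births_by_sex(asfr, women, sex_ratio_at_birth=1.05):
    """Split total births into (female, male).

    asfr[i] and women[i] refer to the same reproductive age.
    """
    total = sum(f * w for f, w in zip(asfr, women))
    female = total / (1 + sex_ratio_at_birth)   # males per female = sex ratio
    return female, total - female


def project_one_year(pop, survival, newborns, net_migration):
    """Advance a single-sex population by one year.

    pop[x]           : population aged x at the start of the year
    survival[x]      : probability of surviving from age x to age x + 1
                       (the last entry applies within the open age group)
    newborns         : births of this sex, placed at age 0
    net_migration[x] : immigrants minus emigrants at age x
    """
    n = len(pop)
    new = [0.0] * n
    new[0] = newborns
    for x in range(n - 1):
        new[x + 1] = pop[x] * survival[x]       # each cohort ages one year
    new[-1] += pop[-1] * survival[-1]           # survivors stay in the open group
    return [p + m for p, m in zip(new, net_migration)]


# Toy three-age example with synthetic numbers
female_births, male_births = births_by_sex([0.1, 0.1], [100.0, 100.0], 1.0)
next_pop = project_one_year([100.0, 100.0, 100.0], [0.99, 0.98, 0.5],
                            female_births, [0.0, 5.0, -2.0])
print(next_pop)
```

Repeating this step year by year, once for each bootstrap replication of the component forecasts, would produce simulated population paths of the kind summarised in Figure 4.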


Figure 4. Forecasted age profiles of males and females (left) and forecasted population sizes of females, males and the total (right) for the year 2030. The grey regions show simulated paths of the forecast population, and the shaded green lines represent the 10%, 20%, …, 90% quantiles of the samples of population forecasts.
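The quantile bands in a fan chart such as Figure 4 can be computed directly from simulated paths. A minimal sketch follows, assuming each bootstrap replication yields one population trajectory with a value per forecast year. The paths here are synthetic random walks in millions, not the paper's forecasts, and the helper names are hypothetical. The same simulated paths also give the share of replications exceeding a threshold, such as 70 million, in a given year.

```python
import random
from statistics import quantiles


def decile_bands(paths):
    """For each forecast year, return the 10%, 20%, ..., 90% quantiles.

    paths: list of simulated trajectories, each a list with one value per year.
    """
    n_years = len(paths[0])
    return [quantiles([p[t] for p in paths], n=10) for t in range(n_years)]


def prob_exceeds(paths, t, threshold):
    """Share of simulated paths whose value in year index t exceeds threshold."""
    return sum(p[t] > threshold for p in paths) / len(paths)


# Synthetic random-walk paths in millions (illustrative, not the paper's forecasts)
random.seed(0)
paths = []
for _ in range(1000):
    level, path = 62.0, []
    for _ in range(21):                      # forecast years 2010-2030
        level += random.gauss(0.4, 0.3)
        path.append(level)
    paths.append(path)

bands = decile_bands(paths)
print([round(q, 1) for q in bands[-1]])      # deciles for the final year
print(prob_exceeds(paths, t=-1, threshold=70.0))
```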

In the paper, we present a functional data analytic approach for estimating and forecasting the age profiles of the four components of demographic change in the UK. We combine the forecasts of age-specific mortality, fertility, emigration and immigration into forecasts of population size through a cohort component projection model. The advantages of our functional models can be attributed to: (1) the use of a smoothing technique to smooth out noisy or missing observations; (2) the use of higher-order functional principal components to extract patterns in the data; and (3) accounting for the uncertainties in mortality, fertility and migration at each age and for each sex. In addition, the multilevel functional data model incorporates the correlation between the two sexes, and thus allows each component of population to be modelled jointly for females and males.
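The multilevel idea can be sketched by splitting each sex's deviations from its mean curve into two parts: a component common to both sexes and a sex-specific residual. Each part is decomposed into principal components. This is a simplified sketch under stated assumptions, not the paper's estimator. The common component is estimated from the average of the two sexes' deviations, singular value decompositions stand in for functional principal component analysis, and the name `multilevel_decompose` and the synthetic data are hypothetical. Both sexes share the common scores `beta`, so forecasting those scores induces correlation between the female and male forecasts.

```python
import numpy as np


def leading_components(matrix, k):
    """First k right singular vectors (rows) and the matching scores."""
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    basis = vt[:k]
    return basis, matrix @ basis.T


def multilevel_decompose(female, male, k_common=1, k_specific=1):
    """Split (years x ages) log rates for two sexes into mean curves,
    a common component shared by both sexes and sex-specific residuals."""
    means = {"female": female.mean(axis=0), "male": male.mean(axis=0)}
    devs = {"female": female - means["female"], "male": male - means["male"]}
    common = (devs["female"] + devs["male"]) / 2        # shared trend over time
    phi, beta = leading_components(common, k_common)    # common patterns and scores
    specific = {}
    for sex, dev in devs.items():
        resid = dev - beta @ phi                        # what the common part misses
        specific[sex] = leading_components(resid, k_specific)  # (psi, gamma)
    return means, (phi, beta), specific


# Synthetic data: a shared decline plus a smaller sex-specific pattern
rng = np.random.default_rng(2)
t = np.arange(35)[:, None] - 17.0
x = np.linspace(0.0, 1.0, 12)[None, :]
shared = -0.03 * t * (1 + x)
female = -6 + 4 * x + shared + 0.005 * t * np.sin(3 * x) + rng.normal(0, 0.01, (35, 12))
male = -5.5 + 4 * x + shared - 0.005 * t * np.sin(3 * x) + rng.normal(0, 0.01, (35, 12))

means, (phi, beta), specific = multilevel_decompose(female, male)
psi_f, gamma_f = specific["female"]
fitted_f = means["female"] + beta @ phi + gamma_f @ psi_f
print(np.abs(fitted_f - female).max())
```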


Han Lin Shang is an Associate Professor at the Research School of Finance, Actuarial Studies and Statistics, Australian National University. He is also an affiliate member of the ESRC Centre for Population Change at the University of Southampton. His research interests include Bayesian computation, demographic forecasting, functional time series analysis and nonparametric functional regression. He currently serves as an Associate Editor for the Journal of Computational and Graphical Statistics and the Australian and New Zealand Journal of Statistics. For more details on the featured blog post, refer to Working Paper 41 of the ESRC Centre for Population Change, University of Southampton. The article was published in 2016 in the International Journal of Forecasting, 32(3), 629-649.
