Correlation is not causation
by Andrew Teller
Many people still believe that energy conservation, the famous negawatts, will provide the answer to tomorrow’s energy needs. It is therefore worth devoting this column to another approach, one that highlights the consequences of relying too much on savings. In a paper written in 2000¹, Alan D. Pasternak shows that forecasts failing to take human development needs into account can only seriously underestimate the world’s future energy requirements.
His argument is based on the use of the Human Development Index (HDI), a measure of human well-being compiled annually by the United Nations for each country. This index is a combination of life expectancy at birth, level of education and Gross Domestic Product (GDP) per capita. It ranges from a theoretical minimum of zero (for a life expectancy of 25 years, complete illiteracy and a GDP per capita of $100 at purchasing power parity) to a theoretical maximum of one (for a life expectancy of 85 years, 100% literacy and a GDP per capita of $40,000 at purchasing power parity)². In practice, the observed range is 0.3 – 0.97. Charting the HDI against the yearly electricity consumption per capita, Pasternak obtains Figure 1, from which he observes that 4,000 kWh per capita constitutes a threshold.
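To make the construction of the index concrete, here is a minimal Python sketch of how the three components could be combined using the bounds quoted above. It assumes equal weights for the three components and a logarithmic scaling of GDP per capita, and it reduces the education component to the literacy rate alone, which is a simplification. The function name `hdi_sketch` and its parameters are illustrative and do not come from Pasternak’s paper.

```python
import math

# Bounds quoted in the text (theoretical minimum and maximum of each component).
LIFE_MIN, LIFE_MAX = 25.0, 85.0        # life expectancy at birth, in years
GDP_MIN, GDP_MAX = 100.0, 40_000.0     # GDP per capita, purchasing power parity


def hdi_sketch(life_expectancy, literacy_rate, gdp_per_capita):
    """Illustrative HDI: mean of three component indices, each scaled to [0, 1].

    Assumptions (not taken from the article): equal weights, log-scaled GDP,
    education represented by literacy_rate alone (a fraction between 0 and 1).
    """
    life_index = (life_expectancy - LIFE_MIN) / (LIFE_MAX - LIFE_MIN)
    education_index = literacy_rate
    gdp_index = (math.log(gdp_per_capita) - math.log(GDP_MIN)) / (
        math.log(GDP_MAX) - math.log(GDP_MIN)
    )
    return (life_index + education_index + gdp_index) / 3.0


# The two extreme cases described in the text.
lowest = hdi_sketch(25.0, 0.0, 100.0)
highest = hdi_sketch(85.0, 1.0, 40_000.0)
print(lowest, highest)
```

With the two sets of bounds given in the text, the sketch returns the theoretical minimum of zero and the theoretical maximum of one.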
The reasons for choosing 4,000 kWh are the following. No country with an electricity consumption below 4,000 kWh per capita has an HDI above 0.9 and, barring four cases (South Africa, Saudi Arabia, Russia and South Korea), all countries consuming more than this value per capita have an HDI greater than 0.9. This observation is then used to assess the consequences of growth scenarios assuming that, over the 2000 – 2020 time frame, an increasing number of countries reach or exceed the 4,000 kWh threshold. This assumption is combined with the universally accepted assumption that world population will keep growing over the same period. The resulting electricity requirements, and hence the primary energy requirements, are indeed noticeably higher than those obtained without taking the human development factor into account: comparisons with forecasts by the US Department of Energy show that Pasternak’s figures are 50 to 100% higher.
The author also demonstrates in his paper that HDI correlates better with electricity consumption than with primary energy consumption. Figure B-1 shows the logarithmic relationship between HDI and kWh/capita, which gives a coefficient of determination R² higher than 0.84, indicating a good fit³. Fitting a curve on HDI plotted against primary energy consumption would yield an R² of about 0.80.
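For readers who wish to see what such a fit involves, the following Python sketch fits a logarithmic law HDI = a + b ln(x) by least squares and computes R². The data are synthetic, generated from arbitrary coefficients plus random noise; they are not Pasternak’s figures, and all names in the code are illustrative.

```python
import math
import random

rng = random.Random(0)

# Synthetic sample (an assumption for illustration, NOT the data of the paper):
# 60 hypothetical countries spread log-uniformly over 50 to 20,000 kWh/capita.
TRUE_A, TRUE_B = 0.10, 0.09    # arbitrary coefficients of HDI = a + b * ln(x)
kwh = [math.exp(rng.uniform(math.log(50.0), math.log(20_000.0))) for _ in range(60)]
hdi = [TRUE_A + TRUE_B * math.log(x) + rng.gauss(0.0, 0.04) for x in kwh]


def fit_line(u, v):
    """Ordinary least squares for v = intercept + slope * u."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    slope = s_uv / s_uu
    return slope, mean_v - slope * mean_u


# A logarithmic law is a straight line once x is replaced by ln(x).
log_kwh = [math.log(x) for x in kwh]
b_fit, a_fit = fit_line(log_kwh, hdi)

# R^2 = 1 - (residual sum of squares) / (total sum of squares).
mean_hdi = sum(hdi) / len(hdi)
ss_res = sum((h - (a_fit + b_fit * lx)) ** 2 for h, lx in zip(hdi, log_kwh))
ss_tot = sum((h - mean_hdi) ** 2 for h in hdi)
r_squared = 1.0 - ss_res / ss_tot

print(f"a = {a_fit:.3f}, b = {b_fit:.3f}, R^2 = {r_squared:.3f}")
```

The closer the dots lie to the fitted curve, the smaller the residual sum of squares and the closer R² comes to 1.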
The paper concludes that its “estimates of electricity use associated with high levels of human development […] argue for substantially increased energy and electricity supplies in the developing countries and the formulation of supply scenarios that can deliver the needed energy within resource, capital and environmental constraints. Neither the Human Development Index nor the Gross Domestic Product of developing countries will increase without an increase in electricity use”.
This, I thought, was an argument for the development of nuclear energy that nobody would be able to deny. Well… not exactly: I found on the Internet another paper⁴ that uses the same line of reasoning but, to my surprise, reaches the conclusion that HDI increases can be achieved without raising the world’s energy consumption. Its author, Manuel Garcia, proceeds in four steps. First, he uses a different formula for trend fitting, one that displays a stiffer bend (see the red curve in Figure 2; the blue curve is based on the abovementioned logarithmic law).
This enables him to choose the lower threshold of 2,000 kWh/capita. Second, he considers that, while all low-consumption countries would increase their use of electricity, the richest ones would decrease theirs by 6,000 to 8,000 kWh/capita. This, he reckons (no details given), would require a doubling of the current (2006) world electricity needs. Third, he makes no allowance for population growth. Fourth, he submits that the aforementioned doubling could be taken care of mainly by energy conservation measures and by increasing energy efficiency. Granting for a moment that these measures would suffice if the requirements doubled, would they still be enough if the needs tripled, as would be the case once population growth is included in the picture?
While Pasternak’s paper is clearly much better research than the second one (more on this in the technical appendix), both papers seem to have missed one point: the nature of the correlation between electricity consumption and HDI remains unexplained in both cases. Is there really a causal relationship between electricity use and HDI? Alan Pasternak seems to believe so, since he sees an increase in the former as the path to raising the latter, as the quotation reproduced above indicates. Similarly, the second paper treats a trend curve fitted to the dots of the chart as a one-to-one relationship linking the two variables: moving the electricity consumption will set the HDI moving along the curve accordingly.
But correlation is not causation, and it is easy to find examples supporting this statement. The most convincing one I know is the following. There is a clear, positive correlation between the reading ability of children and the length of their feet. Does this mean that an improving reading ability is caused by growing feet (or, conversely, that improving one’s reading ability makes one’s feet grow)? Of course not; a third factor comes into play: age. The older children are, the longer they have practiced reading and the bigger their feet have become⁵.
I submit that something similar is at play in the case of HDI and kWh/capita. Apart from the fact that the widespread use of electricity undoubtedly contributes to a population’s well-being, it is also a sign of this population’s capacity to engage in a fairly high-tech activity. Generating and distributing electricity requires skilled personnel, hence a high literacy rate; it requires large investments, hence a sufficiently high GDP, which in turn is needed to enable the consumers to pay their bills; finally, it is best ensured in periods of peace, which in turn promote schooling and ensure a higher life expectancy.
Seen in this light, HDI and electricity consumption appear to be two facets of a more general phenomenon, i.e. the capacity of a society to generate sufficient income and to use it to the benefit of its members. I see a confirmation of this interpretation in Pasternak’s finding that electricity consumption provides a better predictor of HDI than primary energy consumption: it does indeed take more skills overall to generate and distribute electricity than to drive cars and burn coal. The main consequence of the postulated relationship is that dedicated measures aimed at pushing electricity consumption will not boost HDI to the extent predicted by the correlation curve, at least not in the short run. It is the society’s workings that must be improved in every respect to reach this goal, and it takes time to achieve this while maintaining a balance between all the intervening factors. It must be noted in this respect that Figure 1 does not make it possible to tell the dots corresponding to countries that have reached such a balanced state (e.g. Canada) from those that have not yet reached it (e.g. South Korea, judging from its evolution between 1997 and 2006). Similarly, reducing electricity consumption too abruptly in the richer countries might well have a detrimental impact on their HDI by disrupting the existing equilibria.
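The role of a hidden third factor, as in the example of children’s feet and reading ability, can be illustrated with a small simulation. The Python sketch below uses invented numbers throughout (the ages, foot lengths and reading scores are purely hypothetical, as are the function names): age drives both quantities, which are otherwise unrelated. The raw correlation is strong, but it vanishes once the effect of age is removed from both variables.

```python
import math
import random

rng = random.Random(42)
N = 2000

# Hypothetical children aged 5 to 12; age drives BOTH variables (all numbers invented).
age = [rng.uniform(5.0, 12.0) for _ in range(N)]
foot_cm = [12.0 + 1.0 * a + rng.gauss(0.0, 1.0) for a in age]   # foot length
reading = [10.0 * a + rng.gauss(0.0, 8.0) for a in age]         # reading score


def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_vv = sum((vi - mean_v) ** 2 for vi in v)
    return s_uv / math.sqrt(s_uu * s_vv)


def residuals(y, z):
    """What is left of y after removing its least-squares linear dependence on z."""
    n = len(y)
    mean_y, mean_z = sum(y) / n, sum(z) / n
    slope = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y)) / sum(
        (zi - mean_z) ** 2 for zi in z
    )
    return [yi - (mean_y + slope * (zi - mean_z)) for yi, zi in zip(y, z)]


# Correlation as observed, then correlation once age has been controlled for.
raw_corr = pearson(foot_cm, reading)
partial_corr = pearson(residuals(foot_cm, age), residuals(reading, age))

print(f"feet vs reading, raw:              {raw_corr:.2f}")
print(f"feet vs reading, age accounted for: {partial_corr:.2f}")
```

In the HDI case the common factor is not a single measurable variable like age, so this kind of check would be much harder to carry out; the sketch only shows the mechanism.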
The lessons that can be drawn from the foregoing are the following:
Moving countries in the HDI – kWh/capita plane must be done in an orderly way, and it will take time to reap the full benefits of such moves;
Banking on reductions in electricity consumption is a misguided strategy; aiming for a decrease in primary energy consumption while leaving electricity consumption on an upward trend seems much preferable;
In this context, all CO2-free ways of producing electricity will necessarily have a role to play.
Technical appendix
M. Garcia’s argument is impaired by an improper use of trend fitting. He rejects the logarithmic curve for two reasons, both of which appear to be unjustified.
First, he notes that the choice of the analytic function to be fitted is left entirely to the researcher and must be guided only by the quality of the fit. Not so in the present case. The HDI – kWh/capita relationship clearly suggests a decreasing effectiveness of kWh consumption in improving HDI. It therefore makes sense to assume that, at any point of the curve, an increment δHDI is inversely proportional to the number x of kWh/capita already achieved and proportional to the additional increment δx. In other words: δHDI = k δx / x, where k is a suitable proportionality factor. A differential equation of this type yields a logarithmic function by integration, which justifies the choice made by Pasternak.
Second, Garcia notes that the logarithmic function is not appropriate because it tends to −∞ when x tends to 0 and yields values of HDI > 1 for large values of x. But this problem is eliminated a) by postulating that the lower limit of the definition range of x is 1 kWh/capita (what happens between 0 and 1 kWh is strictly irrelevant to the problem at hand) and b) by noting that HDI can indeed exceed 1 if life expectancy and GDP/capita exceed the current limits used to compute the index. His own preferred correlation function, on the other hand, named H4 in Figure 2, is based on the hyperbolic tangent, which leads to a marginally better correlation coefficient but bears no relation to the actual mechanism at play in the problem.
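The integration step invoked above in support of the logarithmic law can be written out as follows. The reference point x_0 is a notation introduced here for convenience; taking x_0 = 1 kWh/capita, the lower limit proposed above, gives the simplest form.

```latex
\[
  \mathrm{d}\,\mathrm{HDI} = k\,\frac{\mathrm{d}x}{x}
  \quad\Longrightarrow\quad
  \mathrm{HDI}(x) - \mathrm{HDI}(x_0)
  = k \int_{x_0}^{x} \frac{\mathrm{d}u}{u}
  = k \ln\frac{x}{x_0}
\]
\[
  x_0 = 1\ \text{kWh/capita}
  \quad\Longrightarrow\quad
  \mathrm{HDI}(x) = \mathrm{HDI}(1) + k \ln x
\]
```

The constant HDI(1) and the factor k are the two parameters that a least-squares fit of the logarithmic law would estimate.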
Worse still, he makes the following observation:
The world averaged values of HDI and x are equal to 0.741 and 2,465 kWh/capita respectively. But for HDI = 0.741, the H4 correlation gives x = 1,150 kWh. Therefore, the world is consuming more than twice (2,465/1,150 = 2.14) the electricity needed to achieve this level of HDI.
The author attributes this result to the wastage induced by wars. While it is true that wars entail considerable wastage, and not only in human lives, this explanation is completely irrelevant here. The gap between the observed average value of x and the value of x that the correlation function associates with the average value of HDI is due to the nonlinearity of that function. Only in the case of a straight line do the two coincide. Since we have just seen that there are very good reasons for the applicable correlation function not to be linear, the abovementioned gap is unavoidable and entirely determined by the weighting factors (population sizes) affecting each country plotted in the figure: no need to feel guilty about that.
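The effect of the nonlinearity can be checked numerically. The Python sketch below uses a logarithmic law with arbitrary coefficients and five hypothetical groups of countries with invented consumptions and populations; it is not Garcia’s H4 function, nor real data. Every group lies exactly on the curve, so nothing is wasted, and yet the average consumption exceeds the consumption that the curve associates with the average HDI.

```python
import math

# Arbitrary logarithmic law HDI = A + B * ln(x); coefficients are illustrative only.
A, B = 0.10, 0.09

# Hypothetical groups of countries: consumption (kWh/capita) and population (millions).
kwh = [100.0, 500.0, 2000.0, 8000.0, 15000.0]
population = [300.0, 1200.0, 2500.0, 800.0, 400.0]
total_population = sum(population)


def weighted_mean(values, weights):
    """Population-weighted average."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Every group sits exactly ON the curve: no electricity is "wasted" by construction.
hdi = [A + B * math.log(x) for x in kwh]

mean_kwh = weighted_mean(kwh, population)
mean_hdi = weighted_mean(hdi, population)

# Consumption that the curve associates with the average HDI (inverse of the law).
kwh_at_mean_hdi = math.exp((mean_hdi - A) / B)
ratio = mean_kwh / kwh_at_mean_hdi

print(f"average consumption:        {mean_kwh:.0f} kWh/capita")
print(f"consumption at average HDI: {kwh_at_mean_hdi:.0f} kWh/capita")
print(f"ratio:                      {ratio:.2f}")
```

For a logarithmic law, the consumption read off the curve at the average HDI is the population-weighted geometric mean of the individual consumptions, which is always below their arithmetic mean unless all the values are equal. The size of the ratio therefore depends only on how the consumptions and populations are distributed.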
1 Global Energy Futures and Human Development: A Framework for Analysis, October 2000; can be downloaded from
2 Further details can be obtained from the Internet, e.g. from Wikipedia.
3 R² = 0 indicates no relationship at all and R² = 1 is obtained for a perfect fit (no scatter around the curve).
4 An Introduction Linking Energy Use and Human Development, April 2006; can be downloaded from www.idiom.com/~garcia/EFHD_01.htm
5 I owe this example to Russell Langley, Practical Statistics Simply Explained, revised edition, Dover Publications, New York, 1971.