Chapter33 - CM Box User Guide

(Difference between revisions)

Revision as of 10:31, 4 October 2006

5.2. Filling gaps in agricultural statistics.

René Gommes, Linda See, Peter Hoefsloot

The idea behind this chapter is to use pixel-based information that is generally available from non-national sources, like satellite imagery, or topographic information from digital terrain models, etc., to assist with the estimation of the missing statistical information by administrative unit (AU).
First level AUs are not necessarily the most appropriate choice for meaningful presentation of agricultural statistics, because the criteria adopted by the countries to define such units are very rarely based on agro-ecological zones. In addition, their size varies over orders of magnitude and some are so heterogeneous that the climate may vary from desert to tropical humid. AU’s are nevertheless the unit used by national statistical services to report most data.

Examples of “missing” statistical data

National agricultural statistics data are easily available, as they are published by the countries as statistical yearbooks and systematically assembled by FAO into the annual Production Yearbooks (FAOSTAT ). Sub-national data, on the other hand, are usually difficult to retrieve, and subject to a number of complications when they have to be consolidated into a homogeneous and consistent set at the continental level.
Several difficulties are associated with language, units and national typologies; they can be dealt with rather easily, and only rarely contribute to data being “missing” in a strict sense. Some examples are given below:

“language” (Guinea corn is actually sorghum, makopa is dried cassava chops...)
units (‘bags per acre”, where the weight of a “bag” is variable from crop to crop and from country to country);
typologies (walo and dieri in Mauritania, which refer to the type of water control...).

It occurs rather often that the data are unavailable altogether. Possible causes:

no sampling is carried out at the national level. Sometimes subjective estimates are produced, but they are of uncertain quality;
data are collected at the national level, but never documented or actually published in national statistical yearbooks. However, some of those data are available nationally from the concerned services;
different data are collected for different geographic units (for instance, not all AUs collect data for all crops, or sometimes the different AUs apply for agriculture and, say population;
data are aggregated (by areas or by crops) in a way which is not compatible with the reporting of other countries. A typical example would be “millet” which can be bulrush millet or finger millet, or both together, although the crops are rather different from an auto-ecological point of view. The worst example being “millet and sorghum” reported as either “millet” or “sorghum”;
data are not available for all the years from 1986 to 1990 in the reference period, or they are available only for years outside the reference period
the AU units were modified during the reference period. This creates a minor difficulty when AU are aggregated, but when they are split, it is not always possible to redistribute the statistics between the new AU;
the amounts cultivated and harvested are deemed to be negligible if they are below a cut-off that varies according to countries;
crops are not reported on separately, for instance white potatoes can sometimes be lumped with vegetables and appear nowhere in the statistics.

For a continental study, these practices lead to a high percentage of “missing” data. In order to fill the gaps, a number of items had thus to be interpolated or otherwise estimated.

External variables useable for spatial interpolation

As indicated, the spatial interpolation can either be purely geo-statistical, or take advantage of the additional knowledge obtained from external variables. In the first category, the method known as “inverse distance weighting”. In the second, the method was Satellite Enhanced Data Interpolation (SEDI).
In semi-arid areas, good correlations can be found between between environmental conditions and yield (K/H). Once average AU values of NDVI, elevation, etc. are available for surrounding areas, yields can be regressed against the external environmental variables . The method is not applicable if cultivars vary in the same agroclimatic area. Generally this information is not included in the statistics, and the database was thus considered cultivar- independent. It was also not feasible to distinguish between irrigated and rainfed crops, subsistence farming and large scale modern agricultural production.
The main external variable of interest was NDVI (Normalized Difference vegetation Index), created by NASA/GIMMS and regularly available every 10 days since 1981. It represents one of the most popular remotely sensed indicators for monitoring the response of vegetation to weather condition in several parts of Africa.
NDVI is an indicator of the density of living green mass; in theory, it varies from -1 to +1, but in practice only values between 0 to 0.7 are found on land areas.
1981-1991 average monthly NDVI data from ARTEMIS (FAO, 1993) were used to derive the NDVI variables included in the AGDAT database. The most relevant, in the current context, are NDVI monthly average, maximum and minimum together with the same values relative to the value of 0.12. This threshold corresponds to the occurrence of green vegetation on the ground. The interpretation of NDVI in humid tropical regions is difficult due to the absorption of infra-red light by water vapor and because the response of the index as a unction of biomass reaches a plateau (saturation) at high biomass values.
Figure 1 below illustrates a typical relation between yield and NDVI, showing, among others, NDVI values starting at about 0.07 and yields levelling off from values above 0.2. Note that the values indicated correspond to the spatial average over a whole AU, where no crops are actually grown below about 0.12. This explains why relatively high uields can be found at low NDVI when some crops are irrigated or grown in the wettest parts of the AU only.

Coarse grain yield and NDVI in Burkina Faso and the countries surrounding Burkina Faso. The figure was restricted to yields below 1200 Kg/Ha.

Inverse distance weighting

Geostatistical interpolation of missing data consists in the estimation of missing values at one point in space based on the known values of neighboring entities. The inverse-distance weighting is one of the most straightforward methods; it takes into account the distance between the “known” and “unknown” points and their relative importance in the estimation. For instance, close-by AU’s of the same country are assigned a higher weight than the AU’s of neighboring countries distance. The reason for this is that it is considered that production and feeding behavior are more homogeneous within the country. For the AGDAT database, after testing, conditions for interpolation are at least three to ten neighboring AU’s and maximum distance between the AU’s centers is 600 km.
For the geo-statistical interpolation, it was generally considered that the reported crop yield corresponds to the centre of gravity of the AU . The software used for the inverse-distance weighting was mostly FAOMET (Gommes and See, 1993). Inverse-distance weighting was applied mainly to Area cultivated per capita, per capita production and relative area.

Satellite Enhanced Data Interpolation, or SEDI

SEDI takes advantage of the correlation between and environmental variable, for instance the above mentioned NDVI/biomass and agricultural yields. One of the ways to approach this is co-kriging, a variant of kriging using one or more ausiliary variable and exploiting both the spatial features of the variable to be interpolated and the correlations between the variable and the ausiliary variables (Bogaert et al, 1995)
The SEDI interpolation method originated in a Harare based FAO Regional Remote Sensing Project. It was originally developed to interpolate rainfall data collected at station level using the additional information provided by METEOSAT cold cloud duration images. The methods proved powerful and versatile, and it is now regularly used by FAO to spatially interpolate other parameters as well (e.g. potential evapotranspiration, crop yields, actual crop evapotranspiration estimates etc.
The concepts of this interpolation method and software implementing the technique have been described by Hoefsloot, 1996. The SEDI functions were recently incorporated into the WINDISP_3 software (Pfirman and Hogue, 1998)
SEDI is a simple and straightforward method for 'assisted' interpolation. The method can be applied to any parameter of which the values are available for a number of geographical locations, as long as a 'background' field is available that has a negative or positive relation to the parameter that needs to be interpolated.
Three requirements are a prerequisite for the successful application of the SEDI method:

The availability of the parameter to interpolate as point data at different geographical locations (e.g. rainfall, potential evapotranspiration, crop yields). In the present case of statistical variables, they were assigned a co-ordinate corresponding to the centre of gravity of the AU;
The availability of a background parameter in the form of a regularly spaced grid (or field) for the same geographical area (e.g. the above-mentioned NDVI variables , altitude).
A relation between the two parameters (negative or positive; Yield/NDVI is positive, temperature/altitude would be negative). A Spearman rank correlation test can reveal whether a relation exists, and how strong this relation is.

The SEDI method yields the parameter mentioned under point 1 as a field (i.e. an image covering the whole area under consideration. The average of the field value over the AU provides the estimation of the spatialized statistic. The method is illustrated below using rainfall and “Cold Cloud Duration”.
A detailed description of the SEDI method can be found here

Retrieved from "http://hoefsloot.com/wiki/index.php?title=Chapter33"

CM Box User Guide	Main Page \| About \| Special pages \| Log in
	Printable version \| Disclaimer \| Privacy policy \| Current revision

 For a continental study, these practices lead to a high  percentage of  “missing” data. In order to fill the gaps, a number of items had thus to be interpolated or otherwise estimated.
-==Methods for spatial interpolation of missing agricultural statistics==
-As indicated, the spatial interpolation can either be purely geo-statistical, or take advantage of the additional knowledge obtained from external variables. In the first category, the method known as “inverse distance weighting”. In the second, the method was Satellite Enhanced Data Interpolation (SEDI).
 ===External variables useable for spatial interpolation===
+As indicated, the spatial interpolation can either be purely geo-statistical, or take advantage of the additional knowledge obtained from external variables. In the first category, the method known as “inverse distance weighting”. In the second, the method was Satellite Enhanced Data Interpolation (SEDI).
 In semi-arid areas, good correlations can be found between between environmental conditions and yield (K/H). Once average AU values of NDVI, elevation, etc. are available for surrounding areas, yields can be regressed against the external environmental variables . The method is not applicable if cultivars vary in the same agroclimatic area. Generally this information is not included in the statistics, and the database was thus considered cultivar- independent. It was also not feasible to distinguish between irrigated and rainfed crops, subsistence farming and large scale modern agricultural production.