CM Box User Guide

Revision as of 14:18, 22 September 2006, by Peter

7.1. Practical introduction to multiple regression techniques and the selection of variables through a principal components analysis


Besides the FAO water balance method, there are many other useful analysis methods used in crop forecasting. The following functions (supported by the CM Box) are discussed here.

  1. Calculating the length of the growing period
  2. Using Principal Component Analysis to find independent parameters that influence yield
  3. Using the Gamma distribution to calculate rainfall probabilities

1. Calculating the Length of the Growing Period (LGP)

The length of the growing period (LGP), as defined by the Agro-Ecological Zones (AEZ) project carried out at FAO during the last two decades [refs], is the period (in days) during a year when precipitation exceeds half the potential evapotranspiration (PET), plus a period required to evapotranspire an assumed 100 mm of water from excess precipitation stored in the soil profile.

The LGP is a useful concept for calculating agricultural potential, and can be used as a criterion for classifying areas and in roughly determining crop cycle lengths. The calculation of the growing period is based on a simple water balance model, comparing precipitation with PET, using monthly values. A "normal" growing period has the following characteristics:

  1. A Beginning. The beginning coincides with the start of the normal rainy season and is taken as the date when precipitation equals half PET, denoted as a in fig. 5.4a. A value of 1/2 PET has been chosen because the water requirements of germinating crops are much below the full rate of PET, as reflected clearly in the magnitude of the crop coefficients, and because this threshold eliminates false starts to the rainy season.
  2. A Humid Period. This is the period during which precipitation exceeds PET. The beginning and ending dates are the two points where the precipitation and PET curves cross. During this period, crops are able to meet their full water requirements and the soil moisture reserve is replenished. The ending date of the humid period coincides with the end of the rainy season, after which crops mature largely from water stored in the soil.
  3. An End to the Growing Period. This occurs at the point where the precipitation curve crosses the 1/2 PET curve (labeled as d in fig. 5.4a) and takes into consideration that most crops continue to grow beyond the end of the rainy season. The soil water-holding capacity (defined as....) is assumed to be 100 mm and the time taken to deplete the remaining soil reserves at the end of the season is added on to the LGP.
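The beginning and end points described above can be located numerically. The following is a minimal sketch, not the CM Box implementation: it assumes a single season per year, spreads the 12 monthly values over 365 days by linear interpolation between mid-month points, and deliberately leaves out the extra time needed to deplete the assumed 100 mm of soil water. The function names `daily_from_monthly` and `season_bounds` and the sample values are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

DAYS = 365


def daily_from_monthly(monthly: Sequence[float]) -> List[float]:
    """Linearly interpolate 12 monthly values, placed at mid-month, to 365 days.

    Assumption: all months have equal length (365/12 days) and the year wraps,
    so early January is interpolated between December and January.
    """
    month_len = DAYS / 12.0
    daily = []
    for day in range(DAYS):
        pos = (day + 0.5) / month_len - 0.5  # month units, 0 = mid-January
        i = math.floor(pos)
        frac = pos - i
        daily.append(monthly[i % 12] * (1 - frac) + monthly[(i + 1) % 12] * frac)
    return daily


def season_bounds(precip: Sequence[float], pet: Sequence[float],
                  threshold: float = 0.5) -> Optional[Tuple[int, int, int]]:
    """Return (start_day, end_day, length_in_days), or None if no crossing exists.

    A day counts as "in season" when precipitation >= threshold * PET.
    The soil-water extension at the end of the season is NOT included.
    """
    p = daily_from_monthly(precip)
    e = daily_from_monthly(pet)
    wet = [pi >= threshold * ei for pi, ei in zip(p, e)]
    if all(wet) or not any(wet):
        return None  # year-round season or year-round dry: no crossing points
    # wet[d - 1] wraps to 31 December when d == 0
    start = next(d for d in range(DAYS) if wet[d] and not wet[d - 1])
    end = next(d for d in range(DAYS) if not wet[d] and wet[d - 1])
    return start, end, (end - start) % DAYS


# Made-up monthly rainfall and PET (mm), for illustration only
example_precip = [0, 0, 20, 60, 120, 150, 120, 60, 20, 0, 0, 0]
example_pet = [100] * 12
print(season_bounds(example_precip, example_pet))
```

Days are counted from 0 (1 January). A fuller treatment would also locate the humid period (threshold 1.0) and add the soil-water depletion time to the end date.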

In addition to a normal period, three other types of growing periods can be defined:

  1. Intermediate Growing Period. Throughout the year, the average monthly precipitation does not exceed the full rate of the average monthly PET, but it does exceed half the PET. The beginning and the end of such an intermediate growing period are defined as the points where the precipitation curve crosses the 0.5 PET curve and there is no humid period.
  2. All Year Round Humid Growing Period. The average monthly precipitation, for every month of the year, exceeds the full rate of the average monthly PET. Thus, there is no true start to the growing period or to the humid period. Areas with all year round humid growing periods have been included and inventoried as areas with a normal growing period of 365 days.
  3. All Year Round Dry Period. The average monthly precipitation for every month of the year is lower than half the average monthly PET. Areas with all year round dry periods have been inventoried separately as areas with a growing period of 0 days.
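The four types of growing period can be told apart directly from the monthly values. The sketch below is an illustration under stated assumptions rather than the CM Box code: the function name `classify_growing_period` and the returned labels are hypothetical, and a month is counted as reaching the threshold when precipitation is at least half PET.

```python
from typing import Sequence


def classify_growing_period(precip: Sequence[float], pet: Sequence[float],
                            threshold: float = 0.5) -> str:
    """Classify a station from 12 average monthly precipitation and PET values."""
    if len(precip) != 12 or len(pet) != 12:
        raise ValueError("expected 12 monthly values for precipitation and PET")
    # Months in which precipitation reaches the threshold (1/2 PET by default)
    in_season = [p >= threshold * e for p, e in zip(precip, pet)]
    # Months in which precipitation exceeds the full rate of PET
    humid = [p > e for p, e in zip(precip, pet)]
    if all(humid):
        return "all-year-round humid"  # inventoried as 365 days
    if not any(in_season):
        return "all-year-round dry"    # inventoried as 0 days
    if any(humid):
        return "normal"                # has a humid period
    return "intermediate"              # reaches 1/2 PET but never full PET


# Made-up values (mm per month), for illustration only
print(classify_growing_period([150] * 3 + [60] * 3 + [10] * 6, [100] * 12))
```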

An example


Examine the sample data file, AL-LGP.DAT. The file contains station names in Africa, geographical coordinates between 2 North and 2 South, and 12 monthly values of rainfall and PET for each station. Notice that the input file requires the rainfall and PET to appear on the same line.
Activate the Tools - Length of growing period function. A settings window appears and among other settings, you will be prompted to accept or change the default value of 1/2 PET to define the beginning and the end of the growing season as given in the original AEZ definition. For this example, accept the value and click on the Compute button to perform the calculation.
A new record of data (36 dekadal values for a year) can be entered after pressing the + sign in the toolbar at the top of the screen. In the next window, the station name can be entered.

The first line of the output file lists the limit you chose for determining the beginning and end of the growing season, i.e., 1/2 PET. This is followed by columns with the station name, latitude, longitude, altitude and several growing season characteristics. The next table describes the parameters in the output file and the range of possible answers:

Column Heading   Meaning of the column variable               Range of Answers
Seastot          Seasonal total rainfall (mm)                 N.A.
Nr               Number of seasons                            1 - single; > 1 - multiple
Type             Type of season                               1 - dry; 2 - intermediate; 3 - normal; 6 - normal with no dry period
LGS              Length of the Growing Season (days)          0 to 365
BegSd.m          Beginning of the season (day.month)
BegSJul          Beginning of the season (Julian days)        0 to 365
EndSd.m          End of the season (day.month)
BegHd.m          Beginning of the humid period (day.month)    -999 if no humid period
EndHd.m          End of the humid period (day.month)          -999 if no humid period


Note that for a station with a year round dry period, indicated as 1 under the column "Type", the LGS will be listed as having 365 days instead of 0 days.

2. Finding parameters that influence crop yields using PCA

Principal components analysis (PCA) is a powerful tool for examining the structure of an input matrix containing the variables used in regression model building. We are usually not certain whether the variables are independent of one another, or whether they are important in explaining the variance in our yield. PCA indicates which variables behave similarly by redistributing the variance of the matrix among a new set of uncorrelated axes (the principal components). This permits the replacement of the initial, generally correlated, variables by non-correlated ones. In addition, the analysis often permits a reduction in the number of variables: with three variables, for example, one can often drop the third principal component, and sometimes even the second. The analysis also provides a complementary interpretation of the initial variables.

The correlations between the components and the original variables give some indications as to which variables are important, i.e., which variables are worth retaining in the multiple regression leading to the yield function.
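The quantities mentioned above (share of variance per component, and correlations between components and original variables) can be computed as in the following sketch. It assumes the analysis is done on the correlation matrix of the variables, which the CM Box may or may not do, and it uses NumPy. The function name `pca_on_correlation` is hypothetical and the data are synthetic, not the values from any CM Box sample file.

```python
import numpy as np


def pca_on_correlation(data: np.ndarray):
    """PCA of an (n_observations, n_variables) array via its correlation matrix.

    Returns the fraction of variance explained by each component, the
    correlations between variables (rows) and components (columns), and the
    component values (scores) for each observation.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    # For standardized variables, corr(variable i, component j) = v_ij * sqrt(lambda_j)
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    scores = z @ eigvecs
    return explained, loadings, scores


# Synthetic July data for 11 years: temperature, rainfall and radiation all
# driven by one hidden "sunny weather" factor plus noise (illustration only).
rng = np.random.default_rng(0)
sunny = rng.normal(size=11)
temperature = 20 + 2.0 * sunny + rng.normal(scale=0.5, size=11)
rainfall = 60 - 15.0 * sunny + rng.normal(scale=5.0, size=11)
radiation = 500 + 40.0 * sunny + rng.normal(scale=10.0, size=11)
data = np.column_stack([temperature, rainfall, radiation])

explained, loadings, scores = pca_on_correlation(data)
print("percent of variance per component:", np.round(100 * explained, 1))
print("cumulative percent:", np.round(100 * np.cumsum(explained), 1))
print("variable/component correlations:")
print(np.round(loadings, 2))
```

Variables that correlate strongly with the leading components are the candidates worth retaining in the regression. The sign of each component is arbitrary, so only the pattern of signs within a column is meaningful.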

An example


To illustrate how principal component analysis can be used, examine the input file SP-MET.DAT. The file contains annual yield from 1920 to 1930 and three climatological variables: the mean temperature, precipitation and radiation measured during the month of July. We imagine that the yield is a function of these three parameters, but we suspect that the three variables are not entirely independent of one another.
Activate the Tools - Principle Component Analysis function. A settings window appears.
Select the variables you want to work with. For this example, select the variables TemJul, RnJul and RdJul. Clicking on the Ok button will start the calculations.
The output from the analysis is presented in two ways: as an output file, containing the values of the principal components themselves, and as a log file.
The log file allows you to interpret the results and determine which variables can be used in the regression model. It lists the variables, their extreme values and the % of variance explained by each component. This is followed by the accumulated variance of each component and a correlation matrix. In this example, the first and second principal components together explain 94.4% of the total variance, so the third component can be discarded. Geometrically, this means that the observed points lie close to a plane: 94.4% of the variation is parallel to this plane while 5.6% is perpendicular to it. Moreover, 81.6% of the variation corresponds to the first principal component alone and only 18.5% to the other two, so for a coarser approximation the second and third components can both be discarded.


A correlation matrix is also provided in the output file, from which one can interpret the behaviour of the initial variables. Since the first principal component is positively correlated with temperature and radiation, and negatively correlated with precipitation, it is an indicator of alternating good and bad weather. High values of the component correspond to high temperatures, high levels of solar radiation and low precipitation. The second component is poorly correlated with radiation but has some correlation with temperature and precipitation, and can thus be considered as marking the distinction between hot and humid years and relatively cold and dry years.


3. Using the Gamma distribution to calculate rainfall probabilities

Rainfall is responsible for most of the year-to-year variability in crop yields in many developing countries. If the varieties used are locally-adapted, the output will be normal whenever the rains are normal, given no other limiting factors such as pests or diseases.

When working operationally with rainfall data, one is often interested in knowing whether the rainfall recorded was unusually high or low. This can only be assessed by comparing the current rainfall with historical rainfall records covering a period of many years. Take the rainfall record for Rome, which runs from 1782 to 1980, as an example. Suppose that the rainfall recorded in June 1981 was lower than any June value in that record. We can say that the rainfall received was exceptionally low and that rain-fed crops in the field were likely to have suffered a severe water shortage, i.e., drought conditions.

A very high (or very low) amount of rain that occurs, on average, in only a few years every century is said to have a very low probability of occurrence. This is usually expressed as the probability of exceedence (P) of a given amount of rainfall. If P is below 5%, the rainfall was exceptionally high, since that amount will be exceeded (on average) in fewer than 5 years out of every 100. Likewise, values above 95% correspond to exceptionally dry conditions. Probabilities from 5% to 20% and from 80% to 95% are termed unusual, and values from 21% to 79% are considered normal.
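The classes described above can be written as a small lookup. This is only a sketch: the function name `classify_exceedence` is hypothetical, the split of "unusual" into a high and a low side is an assumption made for readability, and fractional values between 20% and 21% or between 79% and 80%, which the text does not assign, are treated here as normal.

```python
def classify_exceedence(p_percent: float) -> str:
    """Label a rainfall amount by its probability of exceedence, in percent."""
    if not 0.0 <= p_percent <= 100.0:
        raise ValueError("probability of exceedence must lie between 0 and 100")
    if p_percent < 5.0:
        return "exceptionally high"   # rarely exceeded: very wet
    if p_percent <= 20.0:
        return "unusually high"
    if p_percent < 80.0:
        return "normal"
    if p_percent <= 95.0:
        return "unusually low"
    return "exceptionally low"        # almost always exceeded: very dry


for p in (2, 10, 50, 90, 99):
    print(p, classify_exceedence(p))
```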

This option calculates rainfall probabilities based on the incomplete gamma distribution, which best approximates the positively skewed rainfall distribution of tropical countries for short periods of a month or less. The rainfall threshold for a desired probability of exceedence is found by integrating the area under the distribution curve from 0 upwards: the threshold is the rainfall amount at which this area equals one minus the desired probability.
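As a rough illustration of the calculation, the sketch below fits a gamma distribution to the non-zero rainfall values, treats years with no rain separately, and evaluates the probability of exceedence with the regularized incomplete gamma function. Several choices here are assumptions, since the guide does not say how the CM Box fits the distribution: the parameters are estimated with Thom's approximation to the maximum-likelihood solution, the incomplete gamma function is evaluated by its power series (adequate for moderate arguments; other methods are usually preferred for very large ones), and all function names and sample values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def regularized_gamma_p(a: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = a
    for _ in range(10000):
        n += 1.0
        term *= x / n
        total += term
        if term < total * 1e-15:
            break
    return min(1.0, total * math.exp(-x + a * math.log(x) - math.lgamma(a)))


def fit_gamma_thom(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (shape, scale, fraction_of_zeros) using Thom's approximation.

    Assumes at least two distinct positive values are present.
    """
    positive = [v for v in values if v > 0]
    mean = sum(positive) / len(positive)
    a_stat = math.log(mean) - sum(math.log(v) for v in positive) / len(positive)
    shape = (1.0 + math.sqrt(1.0 + 4.0 * a_stat / 3.0)) / (4.0 * a_stat)
    scale = mean / shape
    return shape, scale, 1.0 - len(positive) / len(values)


def probability_of_exceedence(x: float, shape: float, scale: float,
                              zero_fraction: float = 0.0) -> float:
    """Probability that rainfall exceeds x (same units as the fitted data)."""
    return (1.0 - zero_fraction) * (1.0 - regularized_gamma_p(shape, x / scale))


def rainfall_threshold(p_exceed: float, shape: float, scale: float,
                       zero_fraction: float = 0.0) -> float:
    """Rainfall amount that is exceeded with probability p_exceed (bisection)."""
    if not 0.0 < p_exceed < 1.0 - zero_fraction:
        raise ValueError("p_exceed must lie between 0 and 1 - zero_fraction")
    lo, hi = 0.0, scale
    while probability_of_exceedence(hi, shape, scale, zero_fraction) > p_exceed:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if probability_of_exceedence(mid, shape, scale, zero_fraction) > p_exceed:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Made-up rainfall totals (mm) for one month over 12 years, illustration only
sample = [0, 12, 35, 48, 51, 60, 74, 88, 95, 110, 140, 180]
shape, scale, zero_fraction = fit_gamma_thom(sample)
for amount in range(0, 201, 20):
    p = probability_of_exceedence(amount, shape, scale, zero_fraction)
    print(f"{amount:4d} mm  {100 * p:5.1f} %")
```

Calling `rainfall_threshold(0.8, shape, scale, zero_fraction)` gives the amount that is exceeded in 8 years out of 10 under the fitted distribution.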

An example


Load the sample file SG-ROME.DAT. The file contains monthly rainfall time series data for Rome, from 1782 to 1980. Years are listed along the rows and the columns are the months. Note that each year must be enclosed in quotes; otherwise the program will treat the years as regular numbers and include them in the probability calculations. The order of the columns is not important, so the columns could cover July through June, and data can belong to different stations.
Activate the Tools - Gamma Distribution function. A settings window appears. You are prompted to enter values for the probability table. For this example, enter 0 for LOW, 200 for HIGH and 20 for STEP.
The log file contains the list of variables followed by the statistics of the gamma distribution (Gamma and Beta) together with the number of years, the number of years with no rain (number of zeros), and the long-term average. This is followed by a probability of exceedence table, listing the rainfall amounts ranging from 0 to 200 on the left-hand side followed by the probabilities.




