7.2. Considerations when computing crop yield maps and creating forecasts
René Gommes, Peter Hoefsloot
The present chapter summarizes some of the considerations which the crop forecaster should keep in mind when deriving multiple regression equations (so-called Yield Functions) which will eventually be used for forecasting crop yields. The process by which the coefficients of a yield function are derived is known as calibration. Over the years considerable experience has been gained, leading to a number of rules of thumb. The rules below are purely empirical or based on common sense.
Variables correlated to yield
Use only variables which are known to be meaningful for the crop under consideration. When there are good reasons to suspect that the response of crop production to a given variable is not linear, use a quadratic term in addition to the linear term.
Retain only those variables for which the coefficients are significantly different from 0. That is, the absolute value of each regression coefficient must be clearly larger than its standard error. This can be tested statistically (ratio of coefficient to its error), but common sense is usually enough.
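As a rough illustration of the ratio test mentioned above, the following Python sketch fits a regression on a single variable and returns the slope, its standard error and their ratio. The function name `slope_t_ratio` and the sample figures are assumptions made for illustration only; a real yield function would normally have several predictors, and the threshold applied to the ratio (often around 2 in absolute value) is a convention rather than a rule from this chapter.

```python
import math

def slope_t_ratio(x, y):
    """Least-squares fit of y = a + b*x; return (b, standard error of b, b / se)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return b, se_b, b / se_b

# Hypothetical data: a seasonal index (x) against de-trended yield (y).
index = [1, 2, 3, 4, 5]
yield_dev = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, se, ratio = slope_t_ratio(index, yield_dev)
keep_variable = abs(ratio) > 2  # assumed rule-of-thumb threshold
```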
The sign of the coefficients must correspond to what is known about the response of the crop to the variable considered. This applies also to the quadratic terms.
The coefficients must be spatially coherent, which is to say that they must vary smoothly over adjacent areas.
The quality of a regression equation is given, in addition to the statistics (R, R², coefficients significantly different from 0), by the average error of estimated yields.
Trends must be removed before carrying out the regression work proper. The trends need not be linear.
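A minimal sketch of trend removal, assuming the trend is fitted by least squares against time or a transformation of time. The name `detrend` and its `transform` argument are hypothetical; the choice of trend function remains a judgment made after looking at the data.

```python
import math

def detrend(years, yields, transform=lambda t: t):
    """Fit yields = a + b * transform(t) by least squares and return the residuals.

    t counts years from 1 at the start of the series, so that a
    logarithmic transform is defined for every observation.
    """
    t = [transform(year - years[0] + 1) for year in years]
    n = len(t)
    mean_t, mean_y = sum(t) / n, sum(yields) / n
    slope = (sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, yields))
             / sum((ti - mean_t) ** 2 for ti in t))
    intercept = mean_y - slope * mean_t
    return [yi - (intercept + slope * ti) for ti, yi in zip(t, yields)]

# Hypothetical series: linear trend, then a logarithmic (levelling-off) trend.
years = [2000, 2001, 2002, 2003, 2004]
yields = [1.0, 1.2, 1.1, 1.4, 1.5]
linear_residuals = detrend(years, yields)
log_residuals = detrend(years, yields, transform=math.log)
```

The residuals, not the raw yields, are then the quantity to be explained by the weather and other variables.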
Be aware of the fact that there are two types of variables: continuous quantitative ones (e.g. minimum temperature affecting crops through night-time respiration) and qualitative ones (e.g. male sterility induced by high temperatures).
Always use a variable which stands for the local yield potential.
Compute the correlation matrix between all variables to get a better feel for the redundancy of the information.
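The correlation matrix can be computed with a few lines of plain Python, sketched below. The variable names (`rain`, `ndvi`, `tmin`) and their values are invented for illustration, and `correlation_matrix` is a hypothetical helper, not part of any package referred to in this chapter.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_matrix(variables):
    """Return {(name1, name2): r} for every pair of series in `variables`."""
    return {(p, q): pearson(variables[p], variables[q])
            for p in variables for q in variables}

# Hypothetical calibration data, one value per year.
variables = {
    "rain": [300, 420, 510, 380, 450],
    "ndvi": [0.31, 0.40, 0.52, 0.37, 0.46],
    "tmin": [14.2, 13.1, 13.8, 14.9, 13.5],
}
matrix = correlation_matrix(variables)
# Pairs with |r| close to 1 carry largely the same information.
redundant = [pair for pair, r in matrix.items()
             if pair[0] < pair[1] and abs(r) > 0.9]
```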
The yield function
A yield function does not have to be linear. In some cases, a multiplicative function can be more appropriate.
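One common way to write a multiplicative yield function is given here only as an assumed illustrative form; the symbols $Y_0$, $x_i$ and $b_i$ are not taken from the original text:

```latex
Y = Y_0 \prod_{i=1}^{n} x_i^{\,b_i}
\quad\Longrightarrow\quad
\ln Y = \ln Y_0 + \sum_{i=1}^{n} b_i \ln x_i
```

Taking logarithms turns the product into a sum, so the exponents $b_i$ can be calibrated with the same multiple regression tools as a linear yield function, provided that $Y$ and all $x_i$ are strictly positive.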
Plot the yield against time, to get an idea of the shape of the trend and decide which function should be used for the time trend.
Run a Principal Components Analysis (PCA) on the calibration matrix to see how redundant your data set actually is, and to identify the most important factors. Run the PCA twice: (1) excluding the yield as a variable, to get a feel for the variable groupings and redundancy, and (2) including the yield, to identify the variables which are associated with the yield, as well as those that are irrelevant.
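To give a feel for what the PCA reveals, the sketch below extracts only the first principal component of the correlation matrix by power iteration, a simplification chosen to keep the example free of external libraries; a full PCA would normally rely on a statistical package. The function names and data are assumptions. A first component that carries a large share of the total variance indicates a highly redundant data set.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - mean_a) ** 2 for x in a)
                           * sum((y - mean_b) ** 2 for y in b))

def first_component(columns, iterations=500):
    """Return (share of total variance, loadings) of the first principal
    component of the correlation matrix of `columns` (a list of series)."""
    n = len(columns)
    corr = [[pearson(ci, cj) for cj in columns] for ci in columns]
    # Power iteration. The uneven start vector reduces the risk of starting
    # exactly orthogonal to the dominant component (not a guarantee).
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(n))
                     for i in range(n))
    # The eigenvalues of a correlation matrix sum to the number of variables.
    return eigenvalue / n, v

# Hypothetical predictors; the second is an exact multiple of the first.
rain = [1, 2, 3, 4, 5]
soil_water = [2, 4, 6, 8, 10]
tmin_anomaly = [1, 0, -2, 0, 1]
share, loadings = first_component([rain, soil_water, tmin_anomaly])
```

Running the same function once without and once with the (de-trended) yield as an extra column mirrors the two passes recommended above.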
Yield functions typically “expire” after a couple of years, after which they need recalibrating. A yield function older than 5 years is definitely worthless!
The validity of the method
Bear in mind that weather variables may play only a secondary role, in which case they may be left out altogether. For coffee in Mexico, it was shown that the most important variables influencing yields included altitude above sea level, number of weeding rounds, age of the plantation and type of smallholding (Becerril-Roman and Ortega-Obregon, 1979).
After removing the trend, plot de-trended yield against each individual variable to see the shape of the regression curve and the strength of the statistical correlation. If any relation is clearly non-linear, add a quadratic term to account for curvilinearity.
As far as possible, ignore redundant variables, or carry out the regression on principal components instead. Always prefer techniques with (manual or “automatic”) addition of variables to techniques with deletion of variables.
Check the stability of the coefficients, for example by randomly or systematically eliminating up to 50% of the observation points of the time series and recomputing the regression.
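The stability check can be sketched as follows, assuming a single predictor and random elimination of half of the observations. `slope_stability`, its parameters and the data are hypothetical; the spread between the smallest and largest slope over the trials gives an impression of how much the coefficient depends on individual years.

```python
import random

def fit_slope(x, y):
    """Least-squares slope of y on x (x must contain at least two distinct values)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
            / sum((xi - mean_x) ** 2 for xi in x))

def slope_stability(x, y, drop_fraction=0.5, trials=200, seed=0):
    """Refit the slope on random subsets and return (min slope, max slope).

    Each trial keeps n - int(n * drop_fraction) observations, but never
    fewer than 3; the series is assumed to have at least 3 points.
    """
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    n = len(x)
    keep = max(3, n - int(n * drop_fraction))
    slopes = []
    for _ in range(trials):
        idx = rng.sample(range(n), keep)
        slopes.append(fit_slope([x[i] for i in idx], [y[i] for i in idx]))
    return min(slopes), max(slopes)

# Hypothetical data: ten years of a water-balance index and de-trended yield.
index = [52, 61, 48, 70, 66, 58, 73, 45, 64, 69]
yield_dev = [-0.2, 0.1, -0.4, 0.5, 0.3, 0.0, 0.6, -0.5, 0.2, 0.4]
lowest, highest = slope_stability(index, yield_dev)
sign_is_stable = (lowest > 0) == (highest > 0)
```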
Use jack-knifing to determine the actual accuracy of the method.
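A minimal sketch of jack-knifing in its leave-one-out form, assuming a single predictor: each year is left out in turn, the regression is recalibrated on the remaining years, and the omitted year is then estimated. The names `fit_line` and `jackknife_rmse` and the figures are illustrative assumptions.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; return (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - b * mean_x, b

def jackknife_rmse(x, y):
    """Root mean square error of leave-one-out estimates (x and y are lists)."""
    errors = []
    for i in range(len(x)):
        # Calibrate without observation i, then estimate observation i.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errors.append(y[i] - (a + b * x[i]))
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical data: predictor values and observed yields for six years.
predictor = [1, 2, 3, 4, 5, 6]
observed = [3.2, 4.8, 7.1, 9.3, 10.6, 13.4]
accuracy = jackknife_rmse(predictor, observed)
```

Because every estimate is made for a year that played no part in the calibration, this error is a more honest measure of forecasting accuracy than the error of the fitted values.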