Browsing by Author "De Villiers, Margaret"
Predicting tomato crop yield from weather data using statistical learning techniques
(Stellenbosch : Stellenbosch University, 2017-03)
De Villiers, Margaret; Uys, Daniel W.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
ENGLISH SUMMARY: Predicting crop harvest quantities accurately is important in managing a farming enterprise effectively, facilitating decisions regarding crop management, allocation of resources, anticipated delivery times and quantities to customers and produce pricing, to name but a few. The aim of this project is to develop statistical models for predicting harvest quantity of field-grown crops on a commercial tomato farm in South Africa using weather data. Planting and harvest data for a seven-year period were provided by the tomato farm, while daily and 10-daily weather data for the same period were obtained from a nearby weather station and from satellite data. The data sets were cleaned, and the time series data were summarised in the form of a single summary statistic for each weather variable over the growing period of each crop. Median and total harvest density (t/ha) were each modelled using multiple linear regression, the lasso, regression trees, bagged regression trees, random forests and boosted regression trees. All of the crop and weather variables turned out to be informative in predicting tomato yield, with the average of the daily average wind speed and the average of the daily maximum relative humidity readings over the crops' growing periods emerging as the most important predictors of median and total harvest density, respectively. Random forests modelled median harvest density the most accurately with an estimated mean absolute prediction error of 0.37 t/ha, while bagged regression trees modelled total harvest density the most accurately with a mean absolute prediction error of 12.67 t/ha.
The model parameter estimators of all of the modelling techniques tended to have low variances, and the sizes of the prediction errors are most likely due to factors such as the absence of important predictors (soil fertility, irrigation regimes, etc.) from the models and the summary of the weather time series over the crops' growing periods into single values.
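
The summarisation step and the error measure described in the summary can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's own code: the function names, the dictionary-based data layout and all numeric values are assumptions invented for the example, and the summary statistic shown (a plain mean over the growing period) is just one of the possible choices.

```python
from datetime import date, timedelta
from statistics import mean


def summarise_over_growing_period(daily, planted, harvested):
    """Reduce a daily weather series to one value for a single crop.

    `daily` maps a date to a reading; the result is the mean of the
    readings from `planted` to `harvested`, both inclusive.
    (Hypothetical helper, not taken from the thesis.)
    """
    values = [v for d, v in daily.items() if planted <= d <= harvested]
    if not values:
        raise ValueError("no weather readings in the growing period")
    return mean(values)


def mean_absolute_error(actual, predicted):
    """Mean absolute prediction error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


# Invented daily average wind speed readings, for illustration only.
start = date(2015, 1, 1)
wind = {start + timedelta(days=i): 2.0 + 0.5 * (i % 4) for i in range(10)}

# One summary value for a crop grown from 1 to 4 January (invented dates).
wind_summary = summarise_over_growing_period(
    wind, date(2015, 1, 1), date(2015, 1, 4)
)

# Invented harvest densities (t/ha) and predictions, for illustration only.
error = mean_absolute_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
```

Collapsing each series to a single number keeps the predictor set small, but, as the summary itself notes, it discards the within-season pattern of the weather, which may account for part of the prediction error.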