Predicting tomato crop yield from weather data using statistical learning techniques

Date
2017-03
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH SUMMARY: Predicting crop harvest quantities accurately is important for managing a farming enterprise effectively. It informs decisions about crop management, resource allocation, anticipated delivery times and quantities to customers, and produce pricing, among others. The aim of this project is to develop statistical models for predicting the harvest quantity of field-grown crops on a commercial tomato farm in South Africa using weather data. Planting and harvest data for a seven-year period were provided by the tomato farm, while daily and 10-daily weather data for the same period were obtained from a nearby weather station and from satellite data. The data sets were cleaned, and the time series data were summarised as a single summary statistic for each weather variable over the growing period of each crop. Median and total harvest density (t/ha) were each modelled using multiple linear regression, the lasso, regression trees, bagged regression trees, random forests and boosted regression trees. All of the crop and weather variables turned out to be informative in predicting tomato yield. The average of the daily average wind speed and the average of the daily maximum relative humidity readings over the crops' growing periods emerged as the most important predictors of median and total harvest density, respectively. Random forests modelled median harvest density the most accurately, with an estimated mean absolute prediction error of 0.37 t/ha, while bagged regression trees modelled total harvest density the most accurately, with a mean absolute prediction error of 12.67 t/ha. The model parameter estimators of all of the modelling techniques tended to have low variances. The sizes of the prediction errors are most likely due to factors such as the absence of important predictors (soil fertility, irrigation regimes, etc.) from the models and the reduction of each weather time series over the crops' growing periods to a single value.
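The summary does not state how the features or error estimates were computed in code, so the following is only a hypothetical Python sketch, using the standard library alone, of two steps it describes. The first step reduces a daily weather series to one value per planting, here the mean of a daily field between the planting and harvest dates, inclusive. The second estimates a mean absolute prediction error by k-fold cross-validation. Assumptions: the record layout and the field name `wind_avg` are illustrative, the growing period is taken as planting date to harvest date, and one-predictor least squares stands in for the thesis's actual models (the lasso, tree ensembles, etc.).

```python
from datetime import date, timedelta
from statistics import mean


def growing_period_summary(daily, planted, harvested, field):
    """Mean of one daily weather field over [planted, harvested], inclusive.

    `daily` is assumed to be a list of dicts with a "date" key (datetime.date)
    plus one key per weather variable; this layout is hypothetical.
    """
    values = [rec[field] for rec in daily if planted <= rec["date"] <= harvested]
    if not values:
        raise ValueError("no weather records fall inside the growing period")
    return mean(values)


def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def cv_mean_absolute_error(xs, ys, k=5):
    """k-fold cross-validated mean absolute prediction error.

    Observation i is assigned to fold i % k; each fold is predicted by a
    model fitted on the remaining folds.
    """
    n = len(xs)
    errors = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        b0, b1 = fit_simple_ols([xs[i] for i in train_idx],
                                [ys[i] for i in train_idx])
        errors.extend(abs(ys[i] - (b0 + b1 * xs[i])) for i in test_idx)
    return mean(errors)


if __name__ == "__main__":
    # Ten days of synthetic wind readings (illustrative values only).
    daily = [{"date": date(2010, 1, 1) + timedelta(days=d), "wind_avg": float(d)}
             for d in range(10)]
    print(growing_period_summary(daily, date(2010, 1, 3), date(2010, 1, 5), "wind_avg"))
```

In practice each planting would contribute one row of such summaries (one per weather variable), and the cross-validated error of each candidate model would be compared on the same folds, which is how a statement such as "random forests modelled median harvest density the most accurately" can be read.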
AFRIKAANS SUMMARY (translated from Afrikaans): Accurate prediction of crop yield is important for the effective management of a farm. Better decisions regarding crop management, resource allocation, delivery times, quantities to be delivered to customers and product prices can be made if yield is predicted accurately. The aim of this project is to develop statistical models that can be used to predict tomato yield from climate variables. Open-field tomato plantings on a commercial tomato farm in South Africa were used for this purpose. Planting and yield data for a seven-year period were obtained from the tomato farm, while daily and 10-daily climate measurements for the same period were collected from a nearby weather station and from satellite data. The data sets were cleaned, and summary measures were calculated for each climate variable over the growing period of each planting. Median and total yield (t/ha) were modelled separately using multiple linear regression, the lasso, regression trees, bagged regression trees, random forests and boosted regression trees. All of the planting and climate variables are significant in predicting yield, with the average of the daily average wind speed and the average of the daily maximum relative humidity readings over the plantings' growing periods being the most important predictors of median and total yield, respectively. Random forests predicted median yield the most accurately, with an estimated mean absolute prediction error of 0.37 t/ha, while bagged regression trees predicted total yield the most accurately, with a mean absolute prediction error of 12.67 t/ha. The estimators of the model parameters of all the modelling techniques have small variances.
The sizes of the prediction errors are probably due to factors such as the absence of important predictors (e.g. soil fertility and irrigation techniques), as well as the summary measures calculated for the climate variables over the growing period of each planting.
Description
Thesis (MCom)--Stellenbosch University, 2017.
Keywords
Crop yields -- South Africa, Weather time series, Crops and climate -- South Africa, Tomatoes -- Yields -- South Africa