Forecasting with Science

Dairy and data science

Predicting the future of dairy markets with time-series forecasting

Markets just got more complex

Dairy companies and consultants all over the world have spent decades trying to predict and anticipate market trends. Recently, we have seen the market impact of Covid-19, the war in Ukraine and inflation. Short-lived shocks like these combine with long-term trends (e.g. the rise of plant-based products, the need for a sustainable dairy industry) to make accurate forecasting of the market extremely tough.

The appliance of science

Sometimes we need to challenge conventional wisdom. Experts often have a good ‘feel’ for the market and an understanding of how it reacts under certain circumstances. But this becomes trickier in uncertain and unprecedented situations such as those described above. How, then, can we hope to predict the future with any degree of accuracy? That’s where data science and machine learning (time-series forecasting) come in.

Let’s look at an example: monthly raw milk deliveries to dairies in Germany.

  • We begin by collating all the required data from the past (10, 20, 50 years if necessary).
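  • Then we divide each monthly total by the number of days in that month, so that months of different lengths can be compared without bias.
  • Next, we decompose the series into its components (such as trend and seasonality) to observe the underlying patterns.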

[Figure: Decomposition]
[Figure: Seasonality]
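
The preparation steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the study: the function names, the made-up figures and the choice of a classical additive decomposition (a centred moving average for the trend, monthly averages for the seasonal pattern) are all assumptions.

```python
import calendar

def per_day(monthly_totals):
    """Turn {(year, month): monthly total} into {(year, month): average per day}."""
    return {
        (year, month): total / calendar.monthrange(year, month)[1]  # leap-year aware
        for (year, month), total in monthly_totals.items()
    }

def centred_trend(values, period=12):
    """Centred moving average over one full period; None where the window does not fit."""
    half = period // 2
    trend = [None] * len(values)
    for t in range(half, len(values) - half):
        window = values[t - half : t + half + 1]
        # The two end points get half weight so the window spans exactly one period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend

def seasonal_pattern(values, trend, period=12):
    """Average detrended value for each position in the period, centred on zero."""
    buckets = [[] for _ in range(period)]
    for t, (value, level) in enumerate(zip(values, trend)):
        if level is not None:
            buckets[t % period].append(value - level)
    means = [sum(bucket) / len(bucket) for bucket in buckets]  # needs 2+ full periods
    centre = sum(means) / period
    return [m - centre for m in means]

# Made-up numbers for illustration only (not real delivery data).
totals = {(2020, 1): 3100.0, (2020, 2): 2900.0, (2021, 2): 2800.0}
daily = per_day(totals)

# Synthetic three-year series: a slow upward trend plus a repeating yearly pattern.
pattern = [3, 2, 1, 0, -1, -2, -3, -2, -1, 0, 1, 2]
series = [100 + 0.5 * t + pattern[t % 12] for t in range(36)]
trend = centred_trend(series)
seasonal = seasonal_pattern(series, trend)
```

In the synthetic example, January 2020, the leap-year February 2020 and February 2021 have different totals but the same daily rate, which is exactly the bias the per-day step removes.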

Focus on SARIMAX modelling

Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors – or SARIMAX – is the modelling technique we’ve chosen. Each part of the name describes one component of the model:

  • S: Seasonality. The data show a seasonal component, a pattern that repeats each year.
  • AR: Auto-Regressive. To predict future values, we use past values of the series itself.
  • I: Integrated. This parameter determines whether we model the changes between successive observations (differencing) or the raw data.
  • MA: Moving Average. This component uses past forecast errors to help predict the next value.
  • X: Exogenous variables. For each external variable, we explore which lags are useful. Examples could be:
    • Cereal prices.
    • Cow numbers classified by age group.
    • Raw milk price.
    • Other indices (e.g. Agriculture Energy Index, Fertiliser Index).
    • Precipitation and temperature.
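
Two of these components are easy to show by hand: the "I" step (differencing) and the lagging of an exogenous variable. The sketch below uses made-up numbers and hypothetical helper names. In practice a statistics library would handle both steps internally when fitting the model.

```python
def difference(values, lag=1):
    """Differencing: each value minus the value `lag` periods earlier.

    lag=1 gives month-to-month changes; lag=12 gives year-on-year changes.
    """
    return [values[t] - values[t - lag] for t in range(lag, len(values))]

def lagged(values, lag):
    """Shift a series so that position t holds the value from `lag` periods earlier."""
    return [None] * lag + list(values[: len(values) - lag])

# Made-up numbers for illustration only.
deliveries = [80.0, 82.0, 85.0, 83.0, 86.0]
energy_index = [100, 101, 103, 106, 110]

changes = difference(deliveries)        # what the model sees after one differencing
energy_lag2 = lagged(energy_index, 2)   # index as it stood two months earlier
yearly_changes = difference(list(range(24)), lag=12)
```

Exploring the lags of a variable then amounts to building `lagged(series, k)` for several values of `k` and checking which one helps the model most.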

We add the exogenous variables one by one. At each step, we check which of the remaining variables would most improve the result on the dataset, and add that variable to the model. We repeat this until we arrive at the best model possible.
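
This one-by-one procedure is a form of forward stepwise selection. The sketch below shows the loop under simplifying assumptions: an ordinary least-squares fit stands in for the SARIMAX fit, the stopping threshold `min_gain` is arbitrary, and the variable names and synthetic data are invented for the example.

```python
import random

def ols_sse(columns, y):
    """Sum of squared errors of a least-squares fit of y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    M = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    beta = [M[r][k] / M[r][r] for r in range(k)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in range(n))

def forward_select(candidates, y, min_gain=0.2):
    """Greedily add the variable that lowers the error most; stop when the gain is small."""
    chosen = []
    best = ols_sse([], y)  # baseline: intercept only
    remaining = dict(candidates)
    while remaining:
        scores = {
            name: ols_sse([candidates[c] for c in chosen] + [col], y)
            for name, col in remaining.items()
        }
        name = min(scores, key=scores.get)
        if best - scores[name] < min_gain * best:  # assumed threshold: 20% improvement
            break
        chosen.append(name)
        best = scores[name]
        del remaining[name]
    return chosen

# Synthetic data: the target depends on two of the three candidate variables.
rng = random.Random(0)
n = 60
candidates = {
    "energy_lag2": [rng.gauss(0, 1) for _ in range(n)],
    "fertiliser_lag3": [rng.gauss(0, 1) for _ in range(n)],
    "rainfall": [rng.gauss(0, 1) for _ in range(n)],
}
target = [
    2.0 * e + 1.0 * f + rng.gauss(0, 0.1)
    for e, f in zip(candidates["energy_lag2"], candidates["fertiliser_lag3"])
]
selected = forward_select(candidates, target)
```

Scoring on data held out from the fit, rather than on the fitting data as here, would give a stricter test of whether each added variable really helps.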

In order to forecast with this methodology, we have to predict each of the exogenous variables we use. For example, if the model uses the Agriculture Energy Index with a lag of two months, then predicting raw milk deliveries at m+24 requires the value of the index at m+22. To make the model work, we therefore first predict the exogenous variables, then use these predictions to forecast milk deliveries.
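
The arithmetic of this two-stage approach can be sketched as follows. The helper names are hypothetical, and the seasonal-naive forecaster (repeat the value from twelve months earlier) is only a stand-in for whatever model is actually used to predict each exogenous variable.

```python
def exog_horizon_needed(target_horizon, lag):
    """Months ahead an exogenous series must be forecast to cover the target horizon."""
    return max(0, target_horizon - lag)

def seasonal_naive(history, steps, period=12):
    """Stand-in forecaster: repeat the value observed one period earlier."""
    out = list(history)
    for _ in range(steps):
        out.append(out[-period])
    return out[len(history):]

def exog_for_targets(history, lag, target_horizon):
    """Exogenous values aligned with target months m+1 .. m+target_horizon."""
    steps = exog_horizon_needed(target_horizon, lag)
    extended = list(history) + seasonal_naive(history, steps)
    last = len(history) - 1  # index of the latest observed month m
    return [extended[last + h - lag] for h in range(1, target_horizon + 1)]

# Made-up index values for 24 observed months (1, 2, ..., 24).
energy_history = list(range(1, 25))
needed = exog_horizon_needed(24, 2)
aligned = exog_for_targets(energy_history, lag=2, target_horizon=24)
```

With a lag of two months, the first two target months still rely on observed index values, and only the remaining months rely on forecasts of the index. This is why errors in the exogenous forecasts feed into the longer-range delivery forecasts.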

Results - more than a ‘feeling’

We have now identified a set of key factors which help us predict raw milk deliveries in Germany. In descending order of importance, they are:

  • Past data: the most important factor for predicting future deliveries is the history of deliveries itself, including seasonality and the value for the month before the one we predict.
  • Agriculture Energy Index (lag of two months). Its influence is particularly visible in recent years.
  • A DatumLocus combination of Fertiliser Price Indices (lag of three months). Once again, the impact of this factor is highly visible in recent times.
  • Raw Milk Price (lag of two months). As prices rise, production rises with a two-month time lag.
  • A DatumLocus combination of Cereals Price Indices (zero lag).

An interesting omission from this list is cow numbers, a fundamental variable, which analysts have always used as a basis to get a ‘feeling’ for future milk production and deliveries. We have not explicitly used this variable, because all the information it carries is also implicit in the other variables (costs – cereal, fertilisers, energy, and farm income).

An accuracy level of 98%+

Using data to 2020, we were able to ‘train’ our model to ‘predict’ milk deliveries in Germany in 2021-22 with an accuracy level in excess of 98%. Although milk deliveries have shown stability over the years, this is still a very good result, given the uncertainties generated by recent events.
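
The article does not say how the accuracy figure is calculated. One common convention, assumed here purely for illustration, is 100% minus the mean absolute percentage error (MAPE) on the months held out from training. The numbers below are made up.

```python
def accuracy_from_mape(actual, predicted):
    """Accuracy in percent, defined as 100 * (1 - mean absolute percentage error)."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * (1.0 - sum(errors) / len(errors))

# Made-up hold-out values for illustration only (not real delivery data).
actual = [100.0, 200.0]
predicted = [98.0, 204.0]
accuracy = accuracy_from_mape(actual, predicted)  # each forecast is off by 2%
```

Under this definition, an accuracy above 98% means the forecasts miss the actual monthly figures by less than 2% on average.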