## Dairy and data science

### Markets just got more complex

### The appliance of science

Sometimes we need to challenge conventional wisdom. Experts often have a good ‘feel’ for the market and an understanding of how it reacts under certain circumstances. But this becomes trickier in uncertain and unprecedented situations such as those described above. How, then, can we hope to predict the future with any degree of accuracy? That’s where data science and machine learning (time-series forecasting) come in.

Let’s look at an example: monthly raw milk deliveries to dairies in Germany.

- We begin by collating all the required historical data (10, 20 or even 50 years if necessary).
- Then we divide each monthly total by the number of days in that month, so that differing month lengths do not bias the comparison.
- Next, we decompose the series to separate the underlying trend from the seasonal pattern.
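The preparation steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `per_day` and `decompose_additive` are hypothetical, the decomposition is a simple classical additive one (a centred moving average for the trend), and the figures are made up rather than real delivery data.

```python
import calendar

def per_day(year, month, monthly_total):
    """Average daily volume for one calendar month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return monthly_total / days_in_month

def decompose_additive(values, period=12):
    """Classical additive split: value = trend + seasonal + residual.

    Assumes an even period and at least two full cycles of data.
    """
    n, half = len(values), period // 2
    # Trend: centred moving average, end points half-weighted (even period).
    trend = [None] * n
    for i in range(half, n - half):
        window = values[i - half : i + half + 1]
        trend[i] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    # Seasonal effect: mean detrended value per position in the cycle,
    # shifted so the effects sum to zero over one cycle.
    buckets = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            buckets[i % period].append(values[i] - t)
    raw = [sum(b) / len(b) for b in buckets]
    offset = sum(raw) / period
    seasonal = [raw[i % period] - offset for i in range(n)]
    residual = [
        None if t is None else values[i] - t - seasonal[i]
        for i, t in enumerate(trend)
    ]
    return trend, seasonal, residual

# Made-up example: a perfectly constant 100 units per day over three years.
months = [(y, m) for y in (2021, 2022, 2023) for m in range(1, 13)]
monthly_totals = [100.0 * calendar.monthrange(y, m)[1] for y, m in months]
daily = [per_day(y, m, total) for (y, m), total in zip(months, monthly_totals)]

_, seasonal_raw, _ = decompose_additive(monthly_totals)
trend, seasonal, residual = decompose_additive(daily)
```

On the raw monthly totals, the decomposition reports a seasonal pattern that is purely an artefact of month length (February looks like a dip). On the per-day series, that spurious pattern disappears.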

### Focus on SARIMAX modelling

Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors, or SARIMAX, is the modelling technique we’ve chosen. Let’s look at what each part of the name means when applied to our data:

- S: Seasonality. The data contain a seasonal component: a pattern that repeats each year.
- AR: Auto Regressive. To predict future values, we use past values of the series itself.
- I: Integrated. This parameter determines whether we model the raw data or its period-to-period changes (differences).
- MA: Moving Average. This component uses past forecast errors to help predict the next value.
- X: Exogenous variables. For each variable we will explore the lags. Examples could be:
  - Cereals prices.
  - Cow numbers classified by age group.
  - Raw milk price.
  - Other indices (e.g. Agriculture Energy Index, Fertiliser Index).
  - Precipitation and temperature.

We add the exogenous variables one at a time. At each run of our code, we check which of the remaining variables would most improve the result on the dataset, then add that variable to the model. We repeat this until we arrive at the best model possible.
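This one-at-a-time procedure is a form of greedy forward selection, sketched below. The function `forward_select` and the scoring function are hypothetical. In practice `score` would fit the model with the given variables and return an error measure to minimise (for example an information criterion or a hold-out error, since the article does not say which measure was used). Here a toy score with invented numbers stands in, so that the loop itself is easy to follow.

```python
def forward_select(candidates, score):
    """Greedy forward selection.

    `score(features)` must return a number where lower is better.
    Stops as soon as no remaining candidate improves the score.
    """
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        # Try adding each remaining candidate to the current selection.
        trials = [(score(selected + [c]), c) for c in remaining]
        trial_score, winner = min(trials)
        if trial_score >= best:
            break  # nothing helps any more
        selected.append(winner)
        remaining.remove(winner)
        best = trial_score
    return selected, best

# Invented numbers, for illustration only (not results from this article):
# each variable reduces the error by a fixed amount and costs 0.5 to include.
ASSUMED_GAIN = {
    "energy_index": 5.0,
    "fertiliser_index": 3.0,
    "raw_milk_price": 2.0,
    "cereals_index": 1.0,
    "cow_numbers": 0.0,
}

def toy_score(features):
    return 20.0 - sum(ASSUMED_GAIN[f] for f in features) + 0.5 * len(features)

chosen, final_score = forward_select(ASSUMED_GAIN, toy_score)
print(chosen, final_score)
```

A greedy search like this does not guarantee the globally best subset, but it keeps the number of model fits manageable when there are many candidate variables and lags.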

To forecast using this univariate methodology, we first have to predict each of the exogenous variables we use, then feed those predictions into the forecast of milk deliveries. For example, if the model uses the agricultural energy cost index with a lag of two months, then predicting the volume of raw milk for month m+24 requires the value of the agricultural energy cost index for month m+22.

### Results - more than a ‘feeling’

We have now identified a set of key factors which help us predict raw milk deliveries in Germany. In descending order of importance, they are:

- Past data: the most important factor for predicting future deliveries is past data, including seasonality and the data of the month prior to the one we predict.
- Agriculture Energy Index (lag of two months). Its influence is particularly visible in recent years.
- A DatumLocus combination of Fertiliser Price Indices (lag of three months). Once again, the impact of this factor is highly visible in recent times.
- Raw Milk Price (lag of two months). As prices rise, production rises with a two-month time lag.
- A DatumLocus combination of Cereals Price Indices (zero lag).
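As a rough illustration of how the lags listed above translate into model inputs, the sketch below builds a table of lagged columns with `pandas`. The column names and the helper `build_lagged_exog` are hypothetical, the values are placeholders, and each combined index is represented by a single column for simplicity.

```python
import pandas as pd

# Lags in months, taken from the list above; names are illustrative.
LAGS = {
    "energy_index": 2,
    "fertiliser_index": 3,
    "raw_milk_price": 2,
    "cereals_index": 0,
}

def build_lagged_exog(df, lags):
    """Shift each column by its lag and drop rows with missing history."""
    lagged = {f"{name}_lag{k}": df[name].shift(k) for name, k in lags.items()}
    return pd.DataFrame(lagged, index=df.index).dropna()

# Placeholder values: column i simply counts up from 10 * i.
idx = pd.date_range("2023-01-01", periods=6, freq="MS")
raw = pd.DataFrame(
    {name: [float(10 * i + step) for step in range(6)]
     for i, name in enumerate(LAGS)},
    index=idx,
)

exog = build_lagged_exog(raw, LAGS)
print(exog)
```

The first usable row is April 2023, because the longest lag (three months) needs January's value to be available.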

An interesting omission from this list is cow numbers, a fundamental variable, which analysts have always used as a basis to get a ‘feeling’ for future milk production and deliveries. We have not explicitly used this variable, because all the information it carries is also implicit in the other variables (costs – cereal, fertilisers, energy, and farm income).