9. Understanding Sequence and Time Series Data – AI and Machine Learning for Coders

Chapter 9. Understanding Sequence and Time Series Data

Time series are everywhere. You’ve probably seen them in things like weather forecasts, stock prices, and historic trends like Moore’s law (Figure 9-1). If you’re not familiar with Moore’s law, it predicts that the number of transistors on a microchip will roughly double every two years. For almost 50 years it has proven to be an accurate predictor of the future of computing power and cost.

Figure 9-1. Moore’s law

Time series data is a set of values that are spaced over time. When plotted, the x-axis is usually temporal in nature. Often there are a number of values plotted on the time axis, such as in this example where the number of transistors is one plot and the predicted value from Moore’s law is the other. This is called a multivariate time series. If there’s just a single value—for example, volume of rainfall over time—it’s called a univariate time series.
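In NumPy terms, a univariate series is usually held in a 1-D array with one value per time step, and a multivariate series in a 2-D array with one column per variable. The sketch below uses placeholder values rather than real measurements. Putting time steps along axis 0 is a common convention, not a requirement.

```python
import numpy as np

# Placeholder values for 10 time steps (not real measurements)
rainfall = np.linspace(0.0, 9.0, 10)     # hypothetical single variable
measured = np.linspace(1.0, 10.0, 10)    # hypothetical first variable
predicted = np.linspace(1.5, 10.5, 10)   # hypothetical second variable

# Univariate: one value per time step -> shape (10,)
univariate = rainfall

# Multivariate: one row per time step, one column per variable -> shape (10, 2)
multivariate = np.stack([measured, predicted], axis=-1)

print(univariate.shape, multivariate.shape)  # (10,) (10, 2)
```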

With Moore’s law, predictions are simple because there’s a fixed and simple rule that allows us to roughly predict the future—a rule that has held for about 50 years.
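Written as a formula, the rule is N(t) = N0 × 2^((t − t0) / 2), where N0 is the transistor count in a starting year t0 and t is measured in years. A minimal sketch, using a hypothetical starting count of 1,000 transistors:

```python
def moores_law(n0, start_year, year, doubling_period=2.0):
    """Projected transistor count, assuming the count doubles every doubling_period years."""
    return n0 * 2.0 ** ((year - start_year) / doubling_period)

# Hypothetical starting point: 1,000 transistors in year 0
print(moores_law(1_000, 0, 10))  # 32000.0
```

Ten years is five doublings, so the projection grows by a factor of 2^5 = 32.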

But what about a time series like that in Figure 9-2?

Figure 9-2. A real-world time series

While this time series was artificially created (you’ll see how to do that later in this chapter), it has all the attributes of a complex real-world time series like a stock chart or seasonal rainfall. Despite the seeming randomness, time series have some common attributes that are helpful in designing ML models that can predict them, as described in the next section.

Common Attributes of Time Series

While time series might appear random and noisy, often there are common attributes that are predictable. In this section we’ll explore some of these.

Trend

Time series typically move in a specific direction. In the case of Moore’s law, it’s easy to see that over time the values on the y-axis increase, and there’s an upward trend. There’s also an upward trend in the time series in Figure 9-2. Of course, this won’t always be the case: some time series may be roughly level over time, despite seasonal changes, and others have a downward trend. For example, this is the case in the inverse version of Moore’s law that predicts the price per transistor.
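One quick way to quantify a series' direction, which this chapter doesn't itself use, is to fit a straight line by least squares and look at the sign of its slope. The two series below are synthetic (a slope of ±0.05 per step plus Gaussian noise), and estimate_trend_slope is a hypothetical helper built on np.polyfit.

```python
import numpy as np

def estimate_trend_slope(series):
    """Fit a straight line to the series and return its slope per time step."""
    time = np.arange(len(series))
    slope, _intercept = np.polyfit(time, series, deg=1)  # highest degree first
    return slope

rnd = np.random.RandomState(0)
time = np.arange(1000)
upward = 0.05 * time + rnd.randn(1000)     # upward trend plus noise
downward = -0.05 * time + rnd.randn(1000)  # downward trend plus noise

print(estimate_trend_slope(upward) > 0, estimate_trend_slope(downward) < 0)  # True True
```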

Seasonality

Many time series have a repeating pattern over time, with the repeats happening at regular intervals called seasons. Consider, for example, temperature. We typically have four seasons per year, with the temperature being highest in summer. So if you plotted temperature over several years, you'd see a peak every four seasons, that is, once a year, giving us the concept of seasonality. But this phenomenon isn't limited to weather: consider, for example, Figure 9-3, which is a plot of traffic to a website.

Figure 9-3. Website traffic

It’s plotted week by week, and you can see regular dips. Can you guess what they are? The site in this case is one that provides information for software developers, and as you would expect, it gets less traffic on weekends! Thus, the time series has a weekly seasonality of five high days and two low days. The data is plotted over several months, with the Christmas and New Year’s holidays roughly in the middle, so you can see an additional seasonality there. If I had plotted it over several years, you’d clearly see the additional end-of-year dip.

There are many ways that seasonality can manifest in a time series. Traffic to a retail website, for instance, might peak on the weekends.
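A weekly pattern like the one in Figure 9-3 is a seasonality with a period of seven days. The sketch below generates such a pattern with made-up traffic levels (100 on weekdays, 40 on weekends), assuming each week starts on day 0 with five working days; weekly_pattern is a hypothetical name.

```python
import numpy as np

def weekly_pattern(num_days, weekday_level=100.0, weekend_level=40.0):
    """Synthetic daily traffic: high on days 0-4 of each week, low on days 5-6."""
    day_of_week = np.arange(num_days) % 7
    return np.where(day_of_week < 5, weekday_level, weekend_level)

traffic = weekly_pattern(28)  # four weeks of daily values
print(traffic[:7].tolist())   # [100.0, 100.0, 100.0, 100.0, 100.0, 40.0, 40.0]
```

Because the pattern repeats every seven steps, traffic[t] equals traffic[t + 7] wherever both exist.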

Autocorrelation

Another feature you may see in a time series is predictable behavior after an event. You can see this in Figure 9-4, where there are clear spikes, and after each spike there’s a deterministic decay. This dependence of later values on earlier ones is called autocorrelation.

In this case, a particular pattern of behavior repeats. Autocorrelation may be hidden within a time series, but it is inherently predictable, so a time series that contains a lot of it may be predictable too.

Figure 9-4. Autocorrelation
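The chapter doesn't show how Figure 9-4 was generated, so the following is only an assumption about one way to produce a similar series: place a few random spikes, then let each value carry over a fixed fraction (0.9 here) of the previous value, which produces the deterministic decay. Statistically, autocorrelation is usually measured as the correlation between a series and a copy of itself shifted by some lag; the hypothetical lag_correlation helper computes that with np.corrcoef.

```python
import numpy as np

def spikes_with_decay(num_steps, num_spikes, decay=0.9, seed=None):
    """Random spikes, each followed by a deterministic exponential decay.

    x[t] = impulse[t] + decay * x[t - 1]
    """
    rnd = np.random.RandomState(seed)
    impulses = np.zeros(num_steps)
    spike_steps = rnd.randint(0, num_steps, size=num_spikes)
    impulses[spike_steps] = rnd.rand(num_spikes) * 10
    series = impulses.copy()
    for t in range(1, num_steps):
        series[t] += decay * series[t - 1]  # carry over part of the previous value
    return series

def lag_correlation(series, lag):
    """Pearson correlation between the series and itself shifted by `lag` steps."""
    return np.corrcoef(series[:-lag], series[lag:])[0, 1]

series = spikes_with_decay(1000, 10, seed=42)
print(round(lag_correlation(series, 1), 2))
```

With a decay factor of 0.9, the lag-1 correlation of such a series is typically high, and it shrinks at longer lags as each decay wears off.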

Noise

As its name suggests, noise is a set of seemingly random perturbations in a time series. These perturbations lead to a high level of unpredictability and can mask trends, seasonal behavior, and autocorrelation. For example, Figure 9-5 shows the same autocorrelation from Figure 9-4, but with a little noise added. Suddenly it’s much harder to see the autocorrelation and predict values.

Figure 9-5. Autocorrelated series with added noise
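To see this masking effect numerically, the sketch below builds a clean autocorrelated signal (our own assumption: a spike of height 10 every 50 steps, each decaying by a factor of 0.9 per step), adds Gaussian noise with a standard deviation of 2, and compares the correlation of each series with a copy of itself shifted by one step. The noise leaves the underlying pattern unchanged but lowers that correlation, which is what makes the pattern harder to see and predict.

```python
import numpy as np

def lag_correlation(series, lag):
    """Pearson correlation between the series and itself shifted by `lag` steps."""
    return np.corrcoef(series[:-lag], series[lag:])[0, 1]

# Clean signal: a spike every 50 steps, then x[t] = 0.9 * x[t - 1]
clean = np.zeros(1000)
for t in range(1000):
    spike = 10.0 if t % 50 == 0 else 0.0
    clean[t] = spike + (0.9 * clean[t - 1] if t > 0 else 0.0)

# Same signal with Gaussian noise (standard deviation 2) added
rnd = np.random.RandomState(42)
noisy = clean + rnd.randn(1000) * 2.0

print(round(lag_correlation(clean, 1), 2), round(lag_correlation(noisy, 1), 2))
```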

Given all of these factors, let’s explore how you can make predictions on time series that contain these attributes.

Techniques for Predicting Time Series

Before we get into ML-based prediction—the topic of the next few chapters—we’ll explore some more naive prediction methods. These will enable you to establish a baseline that you can use to measure the accuracy of your ML predictions.

Naive Prediction to Create a Baseline

The most basic method to predict a time series is to say that the predicted value at time t + 1 is the same as the value from time t, effectively shifting the time series by a single period.
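On a toy array with made-up values, the one-step shift looks like this: each prediction is just the previous observation, so the forecasts are the series without its last value, lined up against the series without its first value.

```python
import numpy as np

series = np.array([3.0, 5.0, 4.0, 6.0, 7.0])  # toy values for illustration

# Naive forecast: the prediction for step t + 1 is the value observed at step t
naive_forecast = series[:-1]  # predictions for steps 1..4
actual = series[1:]           # values actually observed at steps 1..4

print(naive_forecast.tolist())  # [3.0, 5.0, 4.0, 6.0]
print(actual.tolist())          # [5.0, 4.0, 6.0, 7.0]
```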

Let’s begin by creating a time series that has trend, seasonality, and noise:

import numpy as np
import matplotlib.pyplot as plt

# Plot a slice of the series against time
def plot_series(time, series, format="-", start=0, end=None):
    plt.plot(time[start:end], series[start:end], format)

# Straight-line trend with the given slope
def trend(time, slope=0):
    return slope * time

def seasonal_pattern(season_time):
    """Just an arbitrary pattern, you can change it if you wish"""
    return np.where(season_time < 0.4,
                    np.cos(season_time * 2 * np.pi),
                    1 / np.exp(3 * season_time))

def seasonality(time, period, amplitude=1, phase=0):
    """Repeats the same pattern at each period"""
    season_time = ((time + phase) % period) / period
    return amplitude * seasonal_pattern(season_time)

# Gaussian noise; pass a seed to make it reproducible
def noise(time, noise_level=1, seed=None):
    rnd = np.random.RandomState(seed)
    return rnd.randn(len(time)) * noise_level

# Four years of daily time steps
time = np.arange(4 * 365 + 1, dtype="float32")
baseline = 10
amplitude = 15
slope = 0.09
noise_level = 6

# Create the series
series = (baseline + trend(time, slope)
          + seasonality(time, period=365, amplitude=amplitude))
# Update with noise
series += noise(time, noise_level, seed=42)

plt.figure(figsize=(10, 6))
plot_series(time, series)
plt.show()

After plotting this you’ll see something like Figure 9-6.

Figure 9-6. A time series showing trend, seasonality, and noise

Now that you have the data, you can split it like any data source into a training set, a validation set, and a test set. When there’s some seasonality in the data, as you can see in this case, it’s a good idea when splitting the series to ensure that there are whole seasons in each split. So, for example, if you wanted to split the data in Figure 9-6 into training and validation sets, a good place to do this might be at time step 1,000, giving you training data up to step 1,000 and validation data after step 1,000.
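A minimal sketch of such a split at time step 1,000 follows. The helper name split_series is hypothetical, and stand-in arrays with the same length as the chapter's series (4 × 365 + 1 steps) are created so the snippet runs on its own; in practice you'd pass the time and series arrays created earlier. The names x_valid and time_valid match those used in the later snippets.

```python
import numpy as np

def split_series(time, series, split_time):
    """Hypothetical helper: split time steps and values at split_time."""
    time_train, x_train = time[:split_time], series[:split_time]
    time_valid, x_valid = time[split_time:], series[split_time:]
    return time_train, x_train, time_valid, x_valid

# Stand-in data with the same length as the chapter's series;
# the values are placeholders, not the chapter's synthetic series
time = np.arange(4 * 365 + 1, dtype="float32")
series = np.sin(2 * np.pi * time / 365)

split_time = 1000
time_train, x_train, time_valid, x_valid = split_series(time, series, split_time)
print(len(x_train), len(x_valid))  # 1000 461
```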

You don’t actually need to do the split here, because a naive forecast simply predicts the value at step t to be the value at step t – 1. For the purposes of illustration, however, the next few figures zoom in on the data from time step 1,000 onward.

To forecast the series from the split point onward, where the time step you want to split at is stored in the variable split_time, you can use code like this:

# The forecast for step t is the actual value at step t - 1
naive_forecast = series[split_time - 1:-1]

Figure 9-7 shows the validation set (from time step 1,000 onwards, which you get by setting split_time to 1000) with the naive prediction overlaid.

Figure 9-7. Naive forecast on time series

It looks pretty good—there is a relationship between the values—and, when charted over time, the predictions appear to closely match the original values. But how would you measure the accuracy?

Measuring Prediction Accuracy

There are a number of ways to measure prediction accuracy, but we’ll concentrate on two of them: the mean squared error (MSE) and mean absolute error (MAE).

With MSE, you simply take the difference between the predicted value and the actual value at time t, square it (to remove negatives), and then find the average over all of them.

With MAE, you calculate the difference between the predicted value and the actual value at time t, take its absolute value to remove negatives (instead of squaring), and find the average over all of them.
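In symbols, for actual values y_t and predictions ŷ_t over n time steps, MSE = (1/n) Σ (y_t − ŷ_t)² and MAE = (1/n) Σ |y_t − ŷ_t|. Both are easy to compute directly with NumPy; the sketch below uses small made-up arrays so the arithmetic can be checked by hand.

```python
import numpy as np

def mse(actual, predicted):
    """Mean squared error: the average of the squared differences."""
    errors = np.asarray(actual) - np.asarray(predicted)
    return np.mean(errors ** 2)

def mae(actual, predicted):
    """Mean absolute error: the average of the absolute differences."""
    errors = np.asarray(actual) - np.asarray(predicted)
    return np.mean(np.abs(errors))

actual = [2.0, 4.0, 6.0]
predicted = [3.0, 2.0, 6.0]
print(mse(actual, predicted), mae(actual, predicted))  # 1.6666666666666667 1.0
```

Here the errors are −1, 2, and 0, so MSE = (1 + 4 + 0) / 3 ≈ 1.67 and MAE = (1 + 2 + 0) / 3 = 1. Because errors are squared, MSE penalizes a single large error more heavily than MAE does.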

For the naive forecast you just created based on our synthetic time series, you can get the MSE and MAE like this:

from tensorflow import keras
# x_valid holds the actual values from split_time onward
print(keras.metrics.mean_squared_error(x_valid, naive_forecast).numpy())
print(keras.metrics.mean_absolute_error(x_valid, naive_forecast).numpy())

I got an MSE of 76.47 and an MAE of 6.89. As with any prediction, if you can reduce the error, you can increase the accuracy of your predictions. We’ll look at how to do that next.

Less Naive: Using Moving Average for Prediction

The previous naive prediction took the value at time t – 1 to be the forecasted value at time t. Using a moving average is similar, but instead of just taking the value from t – 1, it takes a group of values (say, 30), averages them out, and sets that to be the predicted value at time t. Here’s the code:

def moving_average_forecast(series, window_size):
    """Forecasts the mean of the last few values.
       If window_size=1, then this is equivalent to naive forecast"""
    forecast = []
    # Element i is the mean of series[i:i + window_size],
    # which is the forecast for step i + window_size
    for time in range(len(series) - window_size):
        forecast.append(series[time:time + window_size].mean())
    return np.array(forecast)

# Element i forecasts step i + 30, so start at split_time - 30 to line up with x_valid
moving_avg = moving_average_forecast(series, 30)[split_time - 30:]

plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid)
plot_series(time_valid, moving_avg)

Figure 9-8 shows the plot of the moving average against the data.

Figure 9-8. Plotting the moving average

When I evaluated this forecast, I got an MSE and MAE of 49 and 5.5, respectively, so it has improved on the naive forecast. But this approach doesn’t take into account the trend or the seasonality, so we may be able to improve it further with a little analysis.
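The loop in moving_average_forecast recomputes every window from scratch, which costs O(n × window_size). If that becomes slow on long series, a cumulative-sum formulation produces the same values in O(n). The function name moving_average_forecast_fast is our own, not part of the chapter's code, and on very long series the cumulative sum can accumulate small floating-point differences compared with the loop.

```python
import numpy as np

def moving_average_forecast_fast(series, window_size):
    """Vectorized equivalent of moving_average_forecast using cumulative sums.

    Element i is the mean of series[i:i + window_size], i.e. the forecast
    for step i + window_size.
    """
    # cumsum[k] is the sum of series[:k]
    cumsum = np.cumsum(np.insert(np.asarray(series, dtype="float64"), 0, 0.0))
    window_sums = cumsum[window_size:] - cumsum[:-window_size]
    # Drop the last window: it would forecast a step beyond the end of the series
    return window_sums[:-1] / window_size

series = np.arange(10, dtype="float64")
print(moving_average_forecast_fast(series, 3).tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```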

Improving the Moving Average Analysis

Given that the seasonality in this time series is 365 days, you can remove the seasonality and the trend (a linear trend becomes a constant offset) using a technique called differencing, which simply subtracts the value at t – 365 from the value at t. This flattens out the series. Here’s the code:

diff_series = (series[365:] - series[:-365])
diff_time = time[365:]
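To see why differencing at the seasonal period flattens the series, the sketch below applies it to a noise-free stand-in: a linear trend plus a sine wave that repeats every 365 steps (a simplification of the chapter's seasonal_pattern, chosen only for illustration). The seasonal part cancels exactly, and the trend turns into the constant slope × 365.

```python
import numpy as np

period = 365
slope = 0.09
time = np.arange(4 * period, dtype="float64")

# Linear trend plus a pattern that repeats exactly every `period` steps (no noise)
series = slope * time + 15 * np.sin(2 * np.pi * time / period)

# Differencing at lag `period`: value at t minus value at t - period
diff_series = series[period:] - series[:-period]

# The seasonal part cancels and the trend becomes the constant slope * period
print(np.allclose(diff_series, slope * period))  # True
```

On the chapter's noisy series, the differenced values are that constant plus the difference of two noise terms, so they're flat but still noisy.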

You can now calculate a moving average of these values and add back in the past values:

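# Moving average of the differenced series, aligned so the first
# value is the forecast for step split_time
diff_moving_avg = moving_average_forecast(
    diff_series, 50)[split_time - 365 - 50:]

# Add back the past values from one season earlier, smoothed with a
# 10-step moving average over steps t - 370 to t - 361
diff_moving_avg_plus_smooth_past = (
    moving_average_forecast(series[split_time - 370:-360], 10)
    + diff_moving_avg)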

When you plot this (see Figure 9-9), you can already see an improvement in the predicted values: the forecast tracks the actual values closely, albeit with the noise smoothed out, and both the seasonality and the trend appear to be captured.

Figure 9-9. Improved moving average

This impression is confirmed by calculating the MSE and MAE—in this case I got 40.9 and 5.13, respectively, showing a clear improvement in the predictions.

Summary

This chapter introduced time series data and some of the common attributes of time series. You created a synthetic time series and saw how you can start making naive predictions on it. From these predictions, you established baseline measurements using mean squared error and mean absolute error. It was a nice break from TensorFlow, but in the next chapter you’ll go back to using TensorFlow and ML to see if you can improve on your predictions!