Extrapolation of Neural Networks.

Christina Seventikidou
5 min readMar 4, 2022

Hi everyone! I'm writing to share a story! As part of my thesis, I am working on an approach inspired by classic Neural ODEs, which were first proposed by researchers at the University of Toronto and the Vector Institute and were awarded a best paper award at NeurIPS 2018. Neural ODEs mainly help with time-series problems, but they have also given a lot of inspiration to classic Machine Learning methods.

The topic is highly mathematical, so it is aimed at people who already have some background in the field, which will help in understanding the approach. I'll try my best to keep things simple!

1. Why do we care about Neural ODEs?

As we know, Neural Networks are universal approximators. A neural network is a differentiable, continuous model that can interpolate any function, no matter how complex. It is an example of a machine learning algorithm that, given a dataset, seeks the best parameters by minimizing a loss function with an optimization algorithm, in order to describe the given data as well as possible. The loss function is minimized on the dataset we have in hand, which is why we say we minimize the training error. Of course, we can pretend not to know some part of the dataset and measure the error there too; this is called the test error. The main goal is to minimize the test error, and through it the generalization error.
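The train/test split described above is easy to demonstrate. In this minimal sketch (my own illustration, not part of the thesis experiments), a flexible model — a high-degree polynomial standing in for a neural network — is fit to half of a noisy sine dataset, and the held-out half plays the role of the test set:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=t.size)

# pretend we don't know every other point: that half becomes the test set
t_train, y_train = t[::2], y[::2]
t_test, y_test = t[1::2], y[1::2]

# a flexible model driven to a low training error
coeffs = np.polyfit(t_train, y_train, deg=12)
mse_train = np.mean((np.polyval(coeffs, t_train) - y_train) ** 2)
mse_test = np.mean((np.polyval(coeffs, t_test) - y_test) ** 2)
```

Typically `mse_train < mse_test`: the model fits the data it saw better than the data it did not, and that gap is what the generalization error measures.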

Things to remember:

  1. Generalization error: for any data not contained in the dataset, it is the difference between the actual value and the value the model predicts.
  2. Interpolation: suppose we have some known data points (ti, yi) with ti in T=(a,b). Interpolation is a way of estimating the function that describes these points inside T.
  3. Extrapolation: a way of estimating the function outside of T.
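A quick way to feel the difference between the two: numpy's `np.interp` estimates a function between known points, but outside their range it simply clamps to the boundary value — the known points alone tell us nothing out there. (A toy illustration with a sine, not the thesis setup.)

```python
import numpy as np

t_known = np.linspace(-4, 5, 100)          # points with ti in T = (-4, 5)
y_known = np.sin(t_known)

# interpolation: a query inside T is estimated from its neighbours
y_in = np.interp(0.5, t_known, y_known)    # very close to sin(0.5)

# outside T, np.interp just repeats the boundary value sin(5) —
# it has no information about sin(8) at all
y_out = np.interp(8.0, t_known, y_known)
```

Inside T the estimate is accurate to about 1e-3 here; at t = 8 the clamped value is nowhere near sin(8).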

The main limitation is that the generalization error is good only in the domain where we trained the Neural Network, so the network is not sufficient for extrapolation.

In other words, if I train a network on some images of dogs and cats, then the domain is {cat image, dog image}. A neural network can easily predict whether a totally new image shows a dog or a cat, but it has no idea what to say if you give it a frog! In time series this is a huge limitation, because if we could predict with confidence for points outside the domain, then we could forecast the future!

Time series of real phenomena, due to their repeated change over time, are referred to as dynamical systems. Dynamical systems occur in a wide variety of fields, and learning dynamical systems from data is generally a difficult task. For continuous dynamical systems, the time evolution can in many cases be described by differential equations. Let’s suppose that we have a time series that can be described by differential equations and dive into the method ❤

Here is a simple y(t)=sin(t) function. The neural network is trained on T=(-4,5) with 100 points, and the green & purple line is the model evaluated at 1000 points in (-4,10). We see that it is perfect inside T, but outside T it says nothing, so it is not accurate for extrapolation beyond t=5.
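This failure is easy to reproduce. Below is a minimal numpy sketch of the same experiment; as a stand-in for the trained network I use a random-feature network (a fixed random tanh hidden layer with output weights fit by least squares), which behaves like a trained one-hidden-layer net for this purpose. The details (seed, layer width) are my own choices, not the thesis configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
t_train = np.linspace(-4, 5, 100)[:, None]     # 100 points in T = (-4, 5)
y_train = np.sin(t_train)

# fixed random tanh hidden layer; only the output weights are fit
H = 200
W1 = rng.normal(0.0, 1.0, (1, H))
b1 = rng.normal(0.0, 2.0, H)
features = np.tanh(t_train @ W1 + b1)
w, *_ = np.linalg.lstsq(features, y_train, rcond=None)

def net(t):
    return np.tanh(t @ W1 + b1) @ w

# evaluate inside T and in the extrapolation region (5, 10)
t_in = np.linspace(-4, 5, 500)[:, None]
t_out = np.linspace(5, 10, 500)[:, None]
mse_in = np.mean((net(t_in) - np.sin(t_in)) ** 2)
mse_out = np.mean((net(t_out) - np.sin(t_out)) ** 2)
```

Inside T the fit is excellent; past t = 5 the tanh units saturate, the model flattens out, and the error explodes relative to the interpolation error.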

2. Description of the approach

Problem definition: we want a model that fits data from a domain T, predicts y(ti) for new ti in T, and also forecasts for new ti outside of T (the future).

2.1 Let’s suppose that we have some points (ti, yi) with ti in the domain T=(a,b). We want to interpolate and find a model close enough to y(t). A first, usual approach is to train a Neural Network, for example a feedforward Neural Network with one hidden layer.

2.2 A second approach also takes into consideration the dynamical evolution of the system.

  • Suppose N(t) is the neural network interpolant from 2.1.
  • Suppose that y(t) is the solution of an unknown differential equation of first order. As we said, neural networks are differentiable, so we can compute the derivative of N(t).
  • We will train a second neural network N1 and find its parameters θ by minimizing a loss function that matches N1 to the derivative information of N(t) at the training points.
  • Now we have a differential equation, and if we solve it we obtain a solution y(t) that, we hope, is better than N(t) outside the domain T, since it takes the dynamical evolution of the system into account!
  • How do we solve the ODE? Numerically, of course. There are a lot of algorithms for solving a differential equation numerically: the Runge-Kutta methods, the simpler Euler’s method, or even deep learning itself. I will write another blog article explaining numerical methods for differential equations and how deep learning solves them. So, I trained a new neural network to solve my differential equation in (-4,5), and the result is this:
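To give a flavour of the numerical solvers mentioned above before the dedicated article, here is a minimal sketch of fixed-step Euler and classic 4th-order Runge-Kutta on a toy ODE with a known solution, y' = -y with y(0) = 1 (so the exact answer at time t is e^(-t)). The right-hand side here is a hand-picked toy, not the learned network N1.

```python
import numpy as np

def euler(f, y0, t0, t1, n):
    # forward Euler: y_{k+1} = y_k + h * f(t_k, y_k)
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y = y + h * f(t, y)
        t += h
    return y

def rk4(f, y0, t0, t1, n):
    # classic 4th-order Runge-Kutta: four slope evaluations per step
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

f = lambda t, y: -y                      # toy ODE y' = -y
exact = np.exp(-2.0)                     # exact solution at t = 2
err_euler = abs(euler(f, 1.0, 0.0, 2.0, 100) - exact)
err_rk4 = abs(rk4(f, 1.0, 0.0, 2.0, 100) - exact)
```

With the same 100 steps, Euler’s error is of order h while RK4’s is of order h^4, so RK4 is dramatically more accurate at the same cost per step count.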

3. Conclusion

To sum up, in continuous dynamical systems the time evolution can in many cases be described by differential equations, and in my thesis I am exploring how ODEs can help.

I hope you found this article interesting and useful; thanks for reading! Connect with me to share any idea, extension, or even just a discussion on the topic!


Christina Seventikidou

Mathematician and MSc data scientist / I love math, travelling, storytelling, and many other things.