Fit an Ornstein–Uhlenbeck process with discrete time series data

As we know, a Brownian motion is usually formulated as $$dx_t = \mu,dt+\sigma,dW_t$$ which is the continuous case of a random walk. In some cases, it is quite convenient to use this formulation to describe the characteristic of asset prices due to its highly unpredictable behavior.

However, there are financial indicators or variables that also exhibit, at least temporarily, stable behavior. For example, two companies with great homogeneity may be reflected in the synchronous co-movement of their stock prices. The value of a neutral portfolio composed of a long position of the one stock and a appropriately determined short position of the other may also exhibit such stationary mean-reverting behavior. Such behavior can be captured by Ornstein-Uhlenbeck process. It is often used to characterize stationary mean-reverting data-generating process like $$dx_t = \theta (\mu-x_t),dt + \sigma, dW_t$$ where $x_t$ denotes the the value of the portfolio, often called the spread, $\theta$ and $\mu$ are two parameters to capture the magnitude of the mean-reverting force, and $\sigma$ is a parameter to capture the diverting volatility.

In R, a package named {sde} provides functions to deal with a wide range of stochasic differential equations including the discrete version of Ornstein-Uhlenbeck process. Here, I will show you how to fit an OU-process with discrete time series data.

First, we simulate an OU-process to generate some discrete data. Although {sde} package does not provide a specific function that only simulates this stochastic process, it offers a much more general one. sde.sim is a function to simulate any stochastic differential equation in the form $$dx_t = \mu(x_t,t),dt + \sigma(x_t,t), dW_t$$ where $\mu(x_t,t)$ is the drift function with respect to $x_t$ and $t$ , and $\sigma(x_t,t)$ is the volatility function also with respect to $x_t$ and $t$ .

The following code shows how we may simulate an OU-process with sde.sim. It also demonstrate the power of R because it allows you to create symbolic expressions to represent any stochastic differential equation that can be expressed by the formula above.

library(sde)
spread <- sde.sim(0,1,0,1000,
  drift = expression(0-0.5*x),
  sigma = expression(0.8),sigma.x = expression(0))

Once we simulate a spread process generated by an OU-process formulated by $$dx_t = (0-0.5x),dt + 0.8, dW_t$$ we may then use MLE to estimate the coefficients. Here we build a function called ou.lik that returns a closure (a function returned by an enclosing function) to represent the maximum likelihood function generated by a specific set of data x.

ou.lik <- function(x) {
  function(theta1,theta2,theta3) {
    n <- length(x)
    dt <- deltat(x)
    -sum(dcOU(x=x[2:n], Dt=dt, x0=x[1:(n-1)],
      theta=c(theta1,theta2,theta3), log=TRUE))
  }
}

One thing to remind is that dcOU function calculates the joint density of a given set of data assuming that the data follows an OU-process. However, it parameterizes OU-process in a little bit different way as we have just mentioned. The equation for dcOU is $$dx_t = (\theta_1 - \theta_2 x_t),dt + \theta_3 , dW_t$$ where $(\theta_1,\theta_2,\theta_3)$ is used to fully characterize the OU-process.

Given the above parameterization, then we call mle to perform maximal likelihood estimation for the spread process given an initial trial value of $(\theta_1,\theta_2,\theta_3)$ . To boost the speed, we constrain the parameter space to a smaller one, say, from $(0,10^{-5},10^{-3})$ to $(1,1,1)$ .

ou.fit <- mle(ou.lik(spread),
  start=list(theta1=0,theta2=0.5,theta3=0.2),
  method="L-BFGS-B",lower=c(0,1e-5,1e-3), upper=c(1,1,1))
ou.coe <- coef(ou.fit)
ou.coe

The fitting result will be printed to the console then.