# Mathematical Software - Stochastic Processes - Tutorial

## Stochastic Processes - Introduction

• ### Definition of Dynamical System

Nowadays, the concept of a Dynamical System covers systems of any nature (physical, chemical, biological, economic, etc.), both deterministic and stochastic. Such a system can be described with differential equations, functions of the algebra of logic, graphs, Markov chains, etc. (Butenin et al., 1987, p. 8)

• ### Statistical Ensemble

Let us consider an ensemble of N different phase trajectories (starting from the same initial condition). This ensemble is determined by an ensemble of realizations of the random sources perturbing the system. The statistical ensemble of N → ∞ realizations defines a stochastic process {X(t), t ∈ T}. A single phase trajectory (a realization of the stochastic process) is denoted by x(t).

• ### Joint Probability Densities

Let us measure realizations of the process X(t): x1, x2, …, xn at times t1, t2, …, tn to find the joint probability density pn(x1, t1; …; xn, tn) of their occurrence. The complete information on the dynamics of the stochastic process X(t) is given by the infinite sequence of n-time (n-dimensional) joint probability densities:

pn(x1, t1; …; xn, tn) = ⟨δ(x1 - x(t1))…δ(xn - x(tn))⟩

where δ() is the Dirac delta function. Since an infinite number of distribution laws is needed to describe a random process fully, this is impossible in general. Thus one should extract as much knowledge as possible from the one-dimensional distribution and the low-order moments (expectation, variance and the autocorrelation function).

### Numerical Approximations

A numerical approximation of a trajectory x(t) is given by a finite set of random variables

{ xτ,i : i = 0, 1, …, I }

such that the variables xτ,i approximate the unknown random variables x(ti) at

ti = t0 + iτ

where

τ = (T - t0)/I.

A numerical approximation of a stochastic process X(t) is given by such a set of K random variables (a set of trajectories)

{ {xτ,i(k)}_{k=1}^{K} }_{i=0}^{I}

where k indexes the realization.
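The discretization above can be sketched in a few lines of numpy. The grid and index names (t0, T, I, K, τ) follow the text; the choice of a standard Wiener process as the underlying system is purely an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

t0, T = 0.0, 1.0
I, K = 100, 50
tau = (T - t0) / I                       # time step tau = (T - t0)/I
t = t0 + tau * np.arange(I + 1)          # grid t_i = t0 + i*tau

# K trajectories: Gaussian increments with variance tau, cumulated in time;
# every trajectory starts from the same initial condition x(t0) = 0
dW = rng.normal(0.0, np.sqrt(tau), size=(K, I))
x = np.concatenate([np.zeros((K, 1)), np.cumsum(dW, axis=1)], axis=1)

print(x.shape)  # (K, I + 1): K trajectories, each with I + 1 time points
```

Each row of `x` is one discrete trajectory {xτ,i}, and the whole array is the K-trajectory approximation of the process.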

## Stochastic Processes - Stationarity Analysis

Generally stationarity means invariance of some property under a time shift.

A stochastic process is called strictly stationary (stationary in the narrow sense) if its joint probability densities depend only on the differences between the observation times: the change ti → ti + τ, applied to all i, does not affect the probability density. No characteristic of such a process changes under a time shift.

A stochastic process is called weakly stationary (stationarity in a wide sense) if its expectation, variance and autocorrelation function do not change under a time shift.

If a property of a stochastic process (e.g. its variance) does not change in time then the process is stationary with regard to that property.

An asymptotically stationary stochastic process is one for which the change ti → ti + τ does not affect the probability density in the limit ti → ∞.

Note. For a stationary process (in both senses), the expectation and variance are constant,

E[X(t)] = const
σ²(t) = E[(X(t) - E[X(t)])²] = const

and the autocorrelation function depends only on the time difference:

ρ(t1, t2) = f(t2 - t1).

### Stationarity - Quantile

A p-quantile line ζp(t) is a deterministic line used to estimate the dynamic behavior of a time series. For p in the range (0, 1) and a given stochastic process X

X = {X(t): t ∈ [t0, T]}

a p-quantile line is defined as

P{X(t) ≤ ζp(t)} = p

and is approximated by the following sequence obtained from K realizations of the stochastic process

ζp(ti) ≈ xi,(ki(p))

where xi,(k) denotes the k-th value of the non-decreasing sequence built from the K realizations of the stochastic process at the i-th time, and

ki(p) = pK, when pK ∈ ℕ
ki(p) = ⌊pK⌋ + 1, otherwise.

Note. For a stationary process, the p-quantile lines are parallel to each other.
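The order-statistic rule above is straightforward to implement. A minimal sketch, assuming simulated Brownian trajectories as input (the index rule ki(p) is taken directly from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
K, I, tau = 200, 100, 0.01
# K realizations on I + 1 time points (Brownian paths, an assumption)
x = np.cumsum(rng.normal(0.0, np.sqrt(tau), (K, I + 1)), axis=1)

def quantile_line(x, p):
    """p-quantile line: k_i(p)-th order statistic at each time index."""
    K = x.shape[0]
    pK = p * K
    k = int(pK) if pK == int(pK) else int(np.floor(pK)) + 1  # k_i(p)
    return np.sort(x, axis=0)[k - 1]     # k-th smallest value per column

line_med = quantile_line(x, 0.5)
print(line_med.shape)  # one quantile value per time point
```

Plotting several lines (e.g. p = 0.1, 0.5, 0.9) over time and checking whether they stay parallel gives the stationarity diagnostic described in the note.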

## Stochastic Processes - Ergodicity

A process is called strictly ergodic if all of its characteristics can be determined from a single (infinitely long) realization: temporal averaging and ensemble averaging give the same result. When only certain characteristics of a process can be restored from a single realization, the process is called ergodic with respect to those characteristics. An ergodic process is stationary; the reverse statement is not true in general.

## Stochastic Processes - Spectral Analysis.

Spectral analysis means any representation of a signal as a superposition of some basis functions. The term spectrum refers to a set of those functions (components).

### Fourier Transform and Power Spectrum.

Fourier Transform means decomposition of a signal into harmonic components. A detected signal may represent a superposition of signals from several sources. When each of those sources demonstrates harmonic oscillations with its own frequency, its intensity in the observed signal is reflected by the value of the power spectrum at the corresponding frequency.

The Fourier Transform of a continuous signal is defined as

F(f) = ∫_{-∞}^{∞} x(t) e^{-i2πft} dt

In the case of discrete time, a random sequence {x(tn)}, n = 0, …, N-1, is converted into a frequency-domain sequence via the discrete Fourier transform

DFT(m) = ∑_{n=0}^{N-1} x(n) e^{-i2πnm/N}.

Here DFT(m) is the m-th component of the DFT, N is the number of time points of the input signal, and x(n) is the n-th point of the time series x. The DFT sequence is complex-valued and its absolute value reads

DFTabs(m) = |DFT(m)| = (DFTreal²(m) + DFTimag²(m))^{1/2}

The DFT is computed using the standard Fast Fourier Transform (FFT) algorithm with a smoothing window:

DFTw(m) = ∑_{n=0}^{N-1} w(n) x(n) e^{-i2πnm/N}

where the window w(n) can be of

• triangular type

w(n) = n/(N/2) for n = 0, …, N/2
w(n) = 2 - n/(N/2) for n = N/2+1, …, N-1
• Hanning type

w(n) = 0.5 - 0.5 cos(2πn/N) for n = 0,…, N-1
• Hamming type

w(n) = 0.54 - 0.46 cos(2πn/N) for n = 0,…, N-1
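The windowed DFT above maps directly onto numpy's FFT. A minimal sketch using the Hanning window w(n) = 0.5 - 0.5 cos(2πn/N) exactly as defined in the list; the test signal (a 5 Hz sine sampled at 64 Hz plus noise) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
N, fs = 256, 64.0
n = np.arange(N)
# noisy harmonic signal: 20 full cycles fit exactly into the N samples
x = np.sin(2 * np.pi * 5.0 * n / fs) + 0.3 * rng.standard_normal(N)

w = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)   # Hanning window from the text
dft = np.fft.fft(w * x)                     # DFT_w(m) = sum w(n) x(n) e^{-i2πnm/N}
dft_abs = np.abs(dft)                       # DFT_abs(m) = |DFT_w(m)|

m_peak = int(np.argmax(dft_abs[: N // 2]))
print(m_peak * fs / N)  # recovered peak frequency, ≈ 5 Hz
```

Swapping `w` for the triangular or Hamming formulas from the list changes only the single line where the window is built.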

#### Power Spectrum

The Power Spectrum (PS) of a single deterministic periodic signal is well-defined. However, for a realization of a random process X(t) the problem is more complicated. First of all, random quantities are almost always non-periodic. Moreover, the integrals of the continuous Fourier transform almost always do not exist. Then the spectral properties of a stochastic process are described by the finite-time Fourier transform, i.e. by integrals over an interval [-T/2, T/2]:

FTT(ω) = ∫_{-T/2}^{T/2} x(t) e^{-iωt} dt

To find an estimator of the power spectrum S(ω), the expectation of |FTT(ω)|² is computed; the power spectrum is then given by the expression

S(ω) = lim_{T→∞} 2⟨|FTT(ω)|²⟩ / T

In the program, the set of values {DFTabs(m)}, m = 1, …, M, is obtained numerically with the discrete Fourier transform (DFT) for each of N realizations. Then, to get an estimator of the power spectrum S(m), averaging over the realizations of the considered process is performed.

Note 1. The power spectrum allows one to detect different sources of oscillations and to estimate their relative intensity.
Note 2. Fourier transforms and power spectra of stochastic processes exist only as values averaged over different realizations.
Note 3. The power spectrum of a stationary process equals the doubled Fourier transform of its autocorrelation function (the Wiener-Khinchin theorem).
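The averaging-over-realizations estimator described above can be sketched as follows. White Gaussian noise is used as input (an assumption), so the estimated spectrum should come out approximately flat; the scaling by 2Δt/N mirrors the 2⟨|FT|²⟩/T normalization:

```python
import numpy as np

rng = np.random.default_rng(3)
K, N, dt = 500, 256, 0.01          # realizations, points per realization, time step
x = rng.standard_normal((K, N))    # K realizations of white noise (assumption)

# |DFT(m)|^2 per realization, then the ensemble average over realizations;
# 2*dt/N plays the role of 2/T with T = N*dt
psd = np.mean(np.abs(np.fft.fft(x, axis=1)) ** 2, axis=0) * 2 * dt / N
print(psd.shape)
```

For white noise the expected level is 2Δt times the noise variance, and the realization average suppresses the large bin-to-bin scatter that a single-realization periodogram would show.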

## Stochastic Processes - Expectation

The first-order moment is the expectation of X

E[X(t)] = ∫_{-∞}^{∞} x p(x,t) dx

where p(x,t) is a probability density function.

An unbiased estimator of the expectation from a sample of N independent values is the sample mean:

⟨x⟩ = (1/N) ∑_{i=1}^{N} xi

Thus, the expectation E[X(tm)] for fixed tm is computed as the arithmetic mean over N realizations.

## Stochastic Processes - Variance

The second-order central moment is called the variance

E[(X(t) - E[X(t)])²] = ∫_{-∞}^{∞} (x - E[X(t)])² p(x,t) dx

where p(x,t) is a probability density function.

An unbiased estimator of the variance from a sample of N independent values is the sample variance:

σ² = 1/(N-1) ∑_{i=1}^{N} (xi - ⟨x⟩)²

Hence, the variance E[(X(tm) - E[X(tm)])²] for fixed tm is estimated from the N realizations by the sample variance.
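Both estimators (sample mean and the 1/(N-1) sample variance) can be checked on synthetic data. The N(2, 9) distribution used for the N realizations of X(tm) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10_000
x_tm = rng.normal(2.0, 3.0, N)           # N realizations of X(t_m), assumed N(2, 9)

mean = x_tm.sum() / N                     # sample mean <x>
var = ((x_tm - mean) ** 2).sum() / (N - 1)  # unbiased sample variance
print(mean, var)  # close to the true values 2 and 9
```

The 1/(N-1) factor (rather than 1/N) makes the variance estimator unbiased when the mean itself is estimated from the same sample.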

## Stochastic Processes - Correlation Functions

### Autocorrelation Function

The correlation function of a stochastic process X(t) can be computed in two ways. In both cases, first, N realizations of the signal X(t) are detected or loaded from a file. Then

1. for fixed t1 and t2 the expectations are computed as arithmetic means over N realizations

E[X(t1,2)] = (1/N) ∑_{k=1}^{N} xk(t1,2)

After that, ensemble averaging is done to find the autocovariance function defined as the expectation

K(t1,t2) = E[(X(t1) - E[X(t1)])(X(t2) - E[X(t2)])].

Normalizing it by the root-mean-square deviations gives the autocorrelation function

ρ(t1,t2) = K(t1, t2)/(σ(t1)σ(t2))

where

σ²(t1,2) = E[(X(t1,2) - E[X(t1,2)])²] = 1/(N-1) ∑_{k=1}^{N} (xk(t1,2) - E[X(t1,2)])²
2. the time-averaged value of the x variable for a single realization of an ergodic process X(t) is computed as an arithmetic mean

⟨x⟩t = (1/I) ∑_{i=0}^{I-1} x(ti)

where i indexes the time points. After that, time averaging is done to find the mean product

ρ(τ) = ⟨x(t)x(t+τ)⟩t = 1/(I-m) ∑_{i=0}^{I-m-1} xi xi+m

where τ = mΔt, Δt is the time step, and the computation is repeated for each realization k = 1, …, N. The cut-off L is set to I/2 by default but can be changed by the user; it must be greater than 1. The result is then normalized to the maximum value at τ = 0,

ρN(τ) = ρ(τ)/ρ(0).
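Method 2 (time averaging over a single realization) can be sketched directly. An AR(1) sequence serves as the ergodic test signal (an assumption); its normalized autocorrelation at lag 1 should come out near the AR coefficient:

```python
import numpy as np

rng = np.random.default_rng(5)
I, phi = 20_000, 0.8
# single realization of an AR(1) process x_i = phi*x_{i-1} + noise (assumption)
x = np.empty(I)
x[0] = rng.standard_normal()
for i in range(1, I):
    x[i] = phi * x[i - 1] + rng.standard_normal()
x -= x.mean()                    # subtract the time average <x>_t

L = I // 2                       # cut-off, as in the text
# rho(m) = (1/(I-m)) * sum_i x_i x_{i+m} for each lag m
rho = np.array([(x[: I - m] * x[m:]).sum() / (I - m) for m in range(L)])
rho_n = rho / rho[0]             # normalize by the value at tau = 0
print(rho_n[0], rho_n[1])        # 1.0 and roughly phi
```

For longer signals the lag loop is usually replaced by an FFT-based convolution, but the direct sum above matches the formula in the text term by term.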

### Cross-Correlation Function

The cross-correlation function of two stochastic processes X(t) and Y(t) is computed to reveal the character of the coupling between the sources of the two signals.
To do this, first, N realizations of the signals X(t) and Y(t) are detected or loaded from a file. Next,

1. ensemble averaging is done to find the cross-covariance function defined as the expectation

Kxy(t1,t2) = E[(X(t1) - E[X(t1)])(Y(t2) - E[Y(t2)])].

Normalizing it by the root-mean-square deviations gives the cross-correlation function

ρxy(t1,t2) = Kxy(t1, t2)/(σx(t1)σy(t2))
2. the time-averaged value of x and y variables for a single realization of ergodic processes X(t) and Y(t) are computed

⟨s⟩t = (1/I) ∑_{i=0}^{I-1} s(ti)

where i denotes the temporal index and s ∈ {x, y}. After that, time averaging is done to find the mean product

ρxy(τ) = ⟨x(t)y(t+τ)⟩t ≈ 1/(I-m) ∑_{i=0}^{I-m-1} xi yi+m = ρxy,m

and normalized to the maximum value

ρxyN,m = ρxy,m / ρxy,0.

Note 1. The absolute value of the correlation function between two processes varies between 0 and 1. For a deterministic linear dependence between them the correlation function reaches 1, while for statistically independent processes it equals 0.
Note 2. For stationary processes, the absolute value of the correlation function obeys the Cauchy-Schwarz inequality

|ρxy(τ)|² ≤ ⟨x²⟩⟨y²⟩.

where

ρxy(τ) = ∫∫ dx dy x y pxy(x, t; y, t + τ).
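The time-averaged cross-correlation (item 2 above) can be sketched on two coupled signals. Here y is simply x delayed by 10 steps (an assumption), so the peak of the estimated cross-correlation should sit at that lag:

```python
import numpy as np

rng = np.random.default_rng(9)
I, d = 10_000, 10
s = rng.standard_normal(I + d)
x = s[d:]                        # x_i = s_{i+d}
y = s[:I]                        # y_i = s_i, i.e. x delayed by d steps

# rho_xy(m) ~ (1/(I-m)) * sum_i x_i y_{i+m} for lags m = 0, ..., L-1
L = 50
rho = np.array([(x[: I - m] * y[m:]).sum() / (I - m) for m in range(L)])

m_peak = int(np.argmax(np.abs(rho)))
print(m_peak)  # the built-in delay d = 10
```

The location of the peak identifies the delay between the coupled sources, which is the kind of coupling information the section describes.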

### Correlation Function - Correlation Time

The correlation time τC for a continuous normalized autocorrelation function ρN(τ) is defined as

τC = ∫_{0}^{∞} |ρN(τ)| dτ

whereas for a discrete sequence {ρN,m} one has

τC = ∑_{m=0}^{M} Δt |ρN,m|

where Δt is a time step.
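The discrete sum can be checked against a closed form. A geometrically decaying autocorrelation ρN,m = φ^m is assumed here, for which the sum is approximately Δt/(1 - φ):

```python
import numpy as np

dt, phi, M = 0.01, 0.8, 2_000
rho_n = phi ** np.arange(M + 1)      # assumed model: rho_{N,m} = phi^m

tau_c = np.sum(dt * np.abs(rho_n))   # tau_C = sum_m dt * |rho_{N,m}|
print(tau_c)  # ≈ dt / (1 - phi) = 0.05
```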

### Correlation Functions - Fourier Transform

Fourier transform of

• the autocorrelation function:

Gxx(ω) = ∫_{-∞}^{∞} ρxx(τ) e^{-iωτ} dτ
• and of the cross-correlation function

Gxy(ω) = ∫_{-∞}^{∞} ρxy(τ) e^{-iωτ} dτ

exists if the correlation functions are absolutely integrable.

## Stochastic Processes - Probability Density Function.

### Histogram.

The Probability Density Function (PDF) of a random sample of size K embedded in the range [a, b] can be estimated by the function HK(x) defined as

HK(x) = 1/(Kh) ∑_{k=1}^{K} 1_{(xl, xl+1]}(Xk)

for x ∈ (xl, xl+1], where l = 0, 1, …, L-1, 1_A denotes the indicator function of the set A, and

xl = a + lh, h = (b-a)/L.
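The histogram estimator can be written out literally, bin by bin. A uniform sample on [a, b] is assumed, so the estimated density should be close to 1/(b - a) in every bin:

```python
import numpy as np

rng = np.random.default_rng(6)
K, L, a, b = 100_000, 20, 0.0, 2.0
h = (b - a) / L                       # bin width h = (b - a)/L
sample = rng.uniform(a, b, K)         # K samples, assumed uniform on [a, b]

# count samples in each bin (x_l, x_{l+1}] and scale by 1/(K*h)
counts = np.array([np.sum((sample > a + l * h) & (sample <= a + (l + 1) * h))
                   for l in range(L)])
H = counts / (K * h)
print(H.mean())  # ≈ 1/(b - a) = 0.5
```

In practice `np.histogram(sample, bins=L, range=(a, b), density=True)` computes the same estimate in one call.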

### Rosenblatt-Parzen Method.

The Probability Density Function (PDF) can be estimated by the function fK defined as

fK(ti, xj) = (1/K) ∑_{k=1}^{K} J((xj - Xτ,i,k)/bK,i) / bK,i

where xj ∈ ℝ and ti ∈ [t0, T]. J denotes a Rosenblatt-Parzen kernel of

• rectangular type

J(u) = 1/2 for |u| ≤ 1
J(u) = 0 for |u| > 1
• triangular type

J(u) = 1/√6 - |u|/6 for |u| ≤ √6
J(u) = 0 for |u| > √6
• gaussian type

J(u) = (2π)^{-1/2} e^{-u²/2}
• optimal type

J(u) = 3/(4√5) (1 - u²/5) for |u| ≤ √5
J(u) = 0 for |u| > √5

and

{ {Xτ,i(k)}_{k=1}^{K} }_{i=0}^{I}

is a random sample of size K from an unknown distribution. τ is the time step and the time indices form the sequence i = 0, …, I.
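A minimal sketch of the Rosenblatt-Parzen estimator with the Gaussian kernel J(u) = (2π)^{-1/2} e^{-u²/2} from the list above. The bandwidth choice (Silverman's rule of thumb) and the N(0, 1) sample are assumptions; the text does not fix the bandwidth bK:

```python
import numpy as np

rng = np.random.default_rng(7)
K = 5_000
sample = rng.normal(0.0, 1.0, K)              # X_k, assumed N(0, 1)
b_K = 1.06 * sample.std() * K ** (-1 / 5)     # Silverman bandwidth (assumption)

def f_K(x):
    """Kernel density estimate (1/K) * sum_k J((x - X_k)/b_K) / b_K."""
    u = (x - sample) / b_K
    return np.mean(np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)) / b_K

print(f_K(0.0))  # ≈ 1/sqrt(2*pi) ≈ 0.399 for a standard normal sample
```

Replacing the Gaussian body of `f_K` with the rectangular, triangular or optimal kernel formulas above changes only that one line.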

## Entropy Function.

Entropy function H(t) is defined as

H(t) = - ∫_{-∞}^{∞} f(t,x) log f(t,x) dx

where f(t,x) is a probability density function. H(t) is approximated by the sum

Hi = H(ti) = - ∑_{j=0}^{J} fi,j log fi,j Δx
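The discrete sum can be verified on a density with a known entropy. A standard normal density on a grid is assumed here; its exact differential entropy is 0.5 log(2πe) ≈ 1.4189:

```python
import numpy as np

dx = 0.01
x = np.arange(-10.0, 10.0, dx)                  # grid covering the support
f = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)    # assumed density: N(0, 1)

# H ≈ -sum_j f_j * log(f_j) * dx
H = -np.sum(f * np.log(f)) * dx
print(H)  # ≈ 0.5 * log(2*pi*e) ≈ 1.4189
```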

## Stochastic Exponent.

Stochastic exponent X(t) of a semimartingale process Z(t) is defined as the solution of the equation

X(t) = 1 + ∫_{0}^{t} X(s-) dZ(s)

where Z(s) can be

• gaussian process with the random measure

M([ti-1, ti)) = τ^{1/2} ζi, where ζi ~ N(0,1)

where N(0,1) is the standard normal distribution

• α-stable Lévy motion with the random measure

M([ti-1, ti)) = τ^{1/α} ζi, where ζi ~ Sα,β

where Sα,β is a random variable from the α-stable Lévy distribution

• Poissonian process with the random measure

M([ti-1, ti)) = Nλ,i - Nλ,i-1

where Nλ,i is a compensated Poissonian process with intensity λ

• or their linear combination

Z(s) is often a càdlàg process (an RCLL process, i.e. Right Continuous with Left Limits):

X(t) → X(s) as t → s from the right, and the left limit X(s-) exists for every s.

The process X(t), where t ∈ [0, T], is approximated by a discrete process

{Xτ,i}_{i=0}^{I}

defined as

Xτ,i = Xτ,i-1 + Xτ,i-1 M([ti-1, ti))
ti = iτ, τ = T/I

where Xτ,0 = 1, i = 0, …, I, and M([ti-1, ti)) is the random measure defined above for each stochastic process.
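The recursion above can be sketched for the Gaussian case, where the random measure of each interval is M([ti-1, ti)) = τ^{1/2} ζi with ζi ~ N(0,1); this gives a discrete approximation of the stochastic exponent of Brownian motion:

```python
import numpy as np

rng = np.random.default_rng(8)
T, I = 1.0, 1_000
tau = T / I                               # tau = T/I as in the text

X = np.empty(I + 1)
X[0] = 1.0                                # X_{tau,0} = 1
for i in range(1, I + 1):
    M = np.sqrt(tau) * rng.standard_normal()   # Gaussian measure of [t_{i-1}, t_i)
    X[i] = X[i - 1] + X[i - 1] * M             # X_i = X_{i-1} + X_{i-1} * M

print(X[0], X.shape)
```

For the α-stable or Poissonian choices of Z only the line generating `M` changes, exactly as in the list of random measures above.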

## Cylindric Measure.

Let us define a set of points

til ∈ [t0, T]

and corresponding ranges

[al, bl], where l = 1, …, L;

then the expression

P{X(til) ∈ [al, bl]}, l = 1, …, L

defines the probability that the values of the process X belong to the given ranges.

## Machine Learning - OptFinderML

OptFinderML is a package for machine learning.