# Mathematical Software - Stochastic Processes - Tutorial

## Stochastic Processes - Introduction

• ### Definition of Dynamical System

Nowadays, the concept of a Dynamical System covers systems of any nature (physical, chemical, biological, economic, etc.), both deterministic and stochastic. Such a system can be described with differential equations, functions of the algebra of logic, graphs, Markov chains, etc. (Butenin et al., 1987, p. 8)

• ### Statistical Ensemble

Let us consider an ensemble of N different phase trajectories (starting from the same initial condition). This ensemble is determined by an ensemble of realizations of the random sources perturbing the system. The statistical ensemble of N → ∞ realizations defines a stochastic process {X(t), t ∈ T}. A single phase trajectory (a realization of the stochastic process) is denoted by x(t).

• ### Joint Probability Densities

Let us measure realizations of the process X(t): x1, x2, …, xn at times t1, t2, …, tn to find the joint probability density pn(x1, t1; …; xn, tn) of their occurrence. The complete information on the dynamics of the stochastic process X(t) is given by the infinite sequence of n-time (n-dimensional) joint probability densities:

pn(x1, t1; …; xn, tn) = ⟨δ(x1 - x(t1))…δ(xn - x(tn))⟩

where δ() is the Dirac delta function. Since an infinite number of distribution laws is needed to describe a random process fully, this is impossible in general. Thus one should extract as much knowledge as possible from the one-dimensional distribution and the low-order moments (expectation, variance and the autocorrelation function).

### Numerical Approximations

A numerical approximation of a trajectory x(t) is given by a finite set of random variables

{ xτ,i : i = 0, 1, …, I }

such that the variables xτ,i approximate the unknown random variables x(ti) at

ti = t0 + iτ

where

τ = (T - t0)/I.

A numerical approximation of a stochastic process X(t) is given by such a set of K random variables (a set of trajectories)

{ {xτ,i(k)}_{k=1}^{K} }_{i=0}^{I}

where k indexes the realization.
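The discretization above can be sketched in a few lines of numpy. The grid and index names (t0, T, I, K, τ) follow the text; the choice of a standard Wiener process as the underlying system is purely an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

t0, T = 0.0, 1.0
I, K = 100, 50
tau = (T - t0) / I                       # time step tau = (T - t0)/I
t = t0 + tau * np.arange(I + 1)          # grid t_i = t0 + i*tau

# K trajectories: Gaussian increments with variance tau, cumulated in time;
# every trajectory starts from the same initial condition x(t0) = 0
dW = rng.normal(0.0, np.sqrt(tau), size=(K, I))
x = np.concatenate([np.zeros((K, 1)), np.cumsum(dW, axis=1)], axis=1)

print(x.shape)  # (K, I + 1): K trajectories, each with I + 1 time points
```

Each row of `x` is one discrete trajectory {xτ,i}, and the whole array is the K-trajectory approximation of the process.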

## Stochastic Processes - Stationarity Analysis

Generally stationarity means invariance of some property under a time shift.

A stochastic process is called strictly stationary (stationary in the narrow sense) if its joint probability densities depend only on the differences between the observation times: the change ti → ti + τ, applied to all i, does not affect the probability density. No characteristic of such a process changes under a time shift.

A stochastic process is called weakly stationary (stationarity in a wide sense) if its expectation, variance and autocorrelation function do not change under a time shift.

If a property of a stochastic process (e.g. its variance) does not change in time then the process is stationary with regard to that property.

An asymptotically stationary stochastic process is one for which the change ti → ti + τ does not affect the probability density in the limit ti → ∞.

Note. For a stationary process (in both senses), the expectation and variance are constant,

E[X(t)] = const
σ²(t) = E[(X(t) - E[X(t)])²] = const

and the autocorrelation function depends only on the time difference:

ρ(t1, t2) = f(t2 - t1).

### Stationarity - Quantile

A p-quantile line ζp(t) is a deterministic line used to estimate the dynamic behavior of a time series. For p in the range (0, 1) and a given stochastic process X

X = {X(t): t ∈ [t0, T]}

a p-quantile line is defined as

P{X(t) ≤ ζp(t)} = p

and is approximated by the following sequence obtained from K realizations of the stochastic process

ζp(ti) ≈ xi,(ki(p))

where xi,(k) denotes the k-th value of the non-decreasing sequence built from the K realizations of the stochastic process at the i-th time, and

ki(p) = pK, when pK ∈ ℕ
ki(p) = ⌊pK⌋ + 1, otherwise.

Note. For a stationary process, the p-quantile lines are parallel to each other.
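The order-statistic rule above is straightforward to implement. A minimal sketch, assuming simulated Brownian trajectories as input (the index rule ki(p) is taken directly from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
K, I, tau = 200, 100, 0.01
# K realizations on I + 1 time points (Brownian paths, an assumption)
x = np.cumsum(rng.normal(0.0, np.sqrt(tau), (K, I + 1)), axis=1)

def quantile_line(x, p):
    """p-quantile line: k_i(p)-th order statistic at each time index."""
    K = x.shape[0]
    pK = p * K
    k = int(pK) if pK == int(pK) else int(np.floor(pK)) + 1  # k_i(p)
    return np.sort(x, axis=0)[k - 1]     # k-th smallest value per column

line_med = quantile_line(x, 0.5)
print(line_med.shape)  # one quantile value per time point
```

Plotting several lines (e.g. p = 0.1, 0.5, 0.9) over time and checking whether they stay parallel gives the stationarity diagnostic described in the note.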

## Stochastic Processes - Ergodicity

A process is called strictly ergodic if all of its characteristics can be determined from a single (infinitely long) realization: temporal averaging and ensemble averaging give the same result. When only certain characteristics of a process can be restored from a single realization, the process is called ergodic with respect to those characteristics. An ergodic process is stationary; the reverse statement is not true in general.

## Stochastic Processes - Spectral Analysis.

Spectral analysis means any representation of a signal as a superposition of some basis functions. The term spectrum refers to a set of those functions (components).

### Fourier Transform and Power Spectrum.

Fourier Transform means decomposition of a signal into harmonic components. A detected signal may represent a superposition of signals from several sources. When each of those sources demonstrates harmonic oscillations with its own frequency, its intensity in the observed signal is reflected by the value of the power spectrum at the corresponding frequency.

The Fourier Transform of a continuous signal is defined as

F(f) = ∫_{-∞}^{∞} x(t) e^{-i2πft} dt

In the case of discrete time, a random sequence {x(tn)}, n = 0, …, N-1, is converted into a frequency-domain sequence via the discrete Fourier transform

DFT(m) = ∑_{n=0}^{N-1} x(n) e^{-i2πnm/N}.

Here DFT(m) is the m-th component of the DFT, N is the number of time points of the input signal, and x(n) is the n-th point of the time series x. The DFT sequence is complex-valued and its absolute value reads

DFTabs(m) = |DFT(m)| = (DFTreal²(m) + DFTimag²(m))^{1/2}

The DFT is computed using the standard Fast Fourier Transform (FFT) algorithm with a smoothing window:

DFTw(m) = ∑_{n=0}^{N-1} w(n) x(n) e^{-i2πnm/N}

where the window w(n) can be of

• triangular type

w(n) = n/(N/2) for n = 0, …, N/2
w(n) = 2 - n/(N/2) for n = N/2+1, …, N-1
• Hanning type

w(n) = 0.5 - 0.5 cos(2πn/N) for n = 0,…, N-1
• Hamming type

w(n) = 0.54 - 0.46 cos(2πn/N) for n = 0,…, N-1
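The windowed DFT above maps directly onto numpy's FFT. A minimal sketch using the Hanning window w(n) = 0.5 - 0.5 cos(2πn/N) exactly as defined in the list; the test signal (a 5 Hz sine sampled at 64 Hz plus noise) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
N, fs = 256, 64.0
n = np.arange(N)
# noisy harmonic signal: 20 full cycles fit exactly into the N samples
x = np.sin(2 * np.pi * 5.0 * n / fs) + 0.3 * rng.standard_normal(N)

w = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)   # Hanning window from the text
dft = np.fft.fft(w * x)                     # DFT_w(m) = sum w(n) x(n) e^{-i2πnm/N}
dft_abs = np.abs(dft)                       # DFT_abs(m) = |DFT_w(m)|

m_peak = int(np.argmax(dft_abs[: N // 2]))
print(m_peak * fs / N)  # recovered peak frequency, ≈ 5 Hz
```

Swapping `w` for the triangular or Hamming formulas from the list changes only the single line where the window is built.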

#### Power Spectrum

The Power Spectrum (PS) of a single deterministic periodic signal is well-defined. However, for a realization of a random process X(t) the problem is more complicated. First of all, random quantities are almost always non-periodic. Moreover, the integrals of the continuous Fourier transform almost always do not exist. Then the spectral properties of a stochastic process are described by the finite-time Fourier transform, i.e. by integrals over an interval [-T/2, T/2]:

FTT(ω) = ∫_{-T/2}^{T/2} x(t) e^{-iωt} dt

To find an estimator of the power spectrum S(ω), the expectation of |FTT(ω)|² is computed; the power spectrum is then given by the expression

S(ω) = lim_{T→∞} 2⟨|FTT(ω)|²⟩ / T

In the program, the set of values {DFTabs(m)}, m = 1, …, M, is obtained numerically with the discrete Fourier transform (DFT) for each of N realizations. Then, to get an estimator of the power spectrum S(m), averaging over the realizations of the considered process is performed.

Note 1. The power spectrum allows one to detect different sources of oscillations and to estimate their relative intensity.
Note 2. Fourier transforms and power spectra of stochastic processes exist only as values averaged over different realizations.
Note 3. The power spectrum of a stationary process equals the doubled Fourier transform of its autocorrelation function (the Wiener-Khinchin theorem).
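The averaging-over-realizations estimator described above can be sketched as follows. White Gaussian noise is used as input (an assumption), so the estimated spectrum should come out approximately flat; the scaling by 2Δt/N mirrors the 2⟨|FT|²⟩/T normalization:

```python
import numpy as np

rng = np.random.default_rng(3)
K, N, dt = 500, 256, 0.01          # realizations, points per realization, time step
x = rng.standard_normal((K, N))    # K realizations of white noise (assumption)

# |DFT(m)|^2 per realization, then the ensemble average over realizations;
# 2*dt/N plays the role of 2/T with T = N*dt
psd = np.mean(np.abs(np.fft.fft(x, axis=1)) ** 2, axis=0) * 2 * dt / N
print(psd.shape)
```

For white noise the expected level is 2Δt times the noise variance, and the realization average suppresses the large bin-to-bin scatter that a single-realization periodogram would show.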

## Stochastic Processes - Expectation

The first-order moment is the expectation of X

E[X(t)] = ∫_{-∞}^{∞} x p(x,t) dx

where p(x,t) is a probability density function.

An unbiased estimator of the expectation from a sample of N independent values is the sample mean:

⟨x⟩ = (1/N) ∑_{i=1}^{N} xi

Thus, the expectation E[X(tm)] for fixed tm is computed as the arithmetic mean over N realizations.

## Stochastic Processes - Variance

The second-order central moment is called the variance

E[(X(t) - E[X(t)])²] = ∫_{-∞}^{∞} (x - E[X(t)])² p(x,t) dx

where p(x,t) is a probability density function.

An unbiased estimator of the variance from a sample of N independent values is the sample variance:

σ² = 1/(N-1) ∑_{i=1}^{N} (xi - ⟨x⟩)²

Hence, the variance E[(X(tm) - E[X(tm)])²] for fixed tm is estimated from the N realizations by the sample variance.
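Both estimators (sample mean and the 1/(N-1) sample variance) can be checked on synthetic data. The N(2, 9) distribution used for the N realizations of X(tm) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10_000
x_tm = rng.normal(2.0, 3.0, N)           # N realizations of X(t_m), assumed N(2, 9)

mean = x_tm.sum() / N                     # sample mean <x>
var = ((x_tm - mean) ** 2).sum() / (N - 1)  # unbiased sample variance
print(mean, var)  # close to the true values 2 and 9
```

The 1/(N-1) factor (rather than 1/N) makes the variance estimator unbiased when the mean itself is estimated from the same sample.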

## Stochastic Processes - Correlation Functions

### Autocorrelation Function

The correlation function of a stochastic process X(t) can be computed in two ways. In both cases, first, N realizations of the signal X(t) are detected or loaded from a file. Then

1. for fixed t1 and t2 the expectations are computed as arithmetic means over N realizations

E[X(t1,2)] = (1/N) ∑_{k=1}^{N} xk(t1,2)

After that, ensemble averaging is done to find the autocovariance function defined as the expectation

K(t1,t2) = E[(X(t1) - E[X(t1)])(X(t2) - E[X(t2)])].

Normalizing it by the root-mean-square deviations gives the autocorrelation function

ρ(t1,t2) = K(t1, t2)/(σ(t1)σ(t2))

where

σ²(t1,2) = E[(X(t1,2) - E[X(t1,2)])²] = 1/(N-1) ∑_{k=1}^{N} (xk(t1,2) - E[X(t1,2)])²
2. the time-averaged value of the x variable for a single realization of an ergodic process X(t) is computed as an arithmetic mean

⟨x⟩t = (1/I) ∑_{i=0}^{I-1} x(ti)

where i indexes the time points. After that, time averaging is done to find the mean product

ρ(τ) = ⟨x(t)x(t+τ)⟩t = 1/(I-m) ∑_{i=0}^{I-m-1} xi xi+m

where τ = mΔt, Δt is the time step, and the computation is repeated for each realization k = 1, …, N. The cut-off L is set to I/2 by default but can be changed by the user; it must be greater than 1. The result is then normalized to the maximum value at τ = 0,

ρN(τ) = ρ(τ)/ρ(0).
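Method 2 (time averaging over a single realization) can be sketched directly. An AR(1) sequence serves as the ergodic test signal (an assumption); its normalized autocorrelation at lag 1 should come out near the AR coefficient:

```python
import numpy as np

rng = np.random.default_rng(5)
I, phi = 20_000, 0.8
# single realization of an AR(1) process x_i = phi*x_{i-1} + noise (assumption)
x = np.empty(I)
x[0] = rng.standard_normal()
for i in range(1, I):
    x[i] = phi * x[i - 1] + rng.standard_normal()
x -= x.mean()                    # subtract the time average <x>_t

L = I // 2                       # cut-off, as in the text
# rho(m) = (1/(I-m)) * sum_i x_i x_{i+m} for each lag m
rho = np.array([(x[: I - m] * x[m:]).sum() / (I - m) for m in range(L)])
rho_n = rho / rho[0]             # normalize by the value at tau = 0
print(rho_n[0], rho_n[1])        # 1.0 and roughly phi
```

For longer signals the lag loop is usually replaced by an FFT-based convolution, but the direct sum above matches the formula in the text term by term.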

### Cross-Correlation Function

The cross-correlation function of two stochastic processes X(t) and Y(t) is computed to reveal the character of the coupling between the sources of the two signals.
To do this, first, N realizations of the signals X(t) and Y(t) are detected or loaded from a file. Next,

1. ensemble averaging is done to find the cross-covariance function defined as the expectation

Kxy(t1,t2) = E[(X(t1) - E[X(t1)])(Y(t2) - E[Y(t2)])].

Normalizing it by the root-mean-square deviations gives the cross-correlation function

ρxy(t1,t2) = Kxy(t1, t2)/(σx(t1)σy(t2))
2. the time-averaged value of x and y variables for a single realization of ergodic processes X(t) and Y(t) are computed

⟨s⟩t = (1/I) ∑_{i=0}^{I-1} s(ti)

where i denotes the temporal index and s ∈ {x, y}. After that, time averaging is done to find the mean product

ρxy(τ) = ⟨x(t)y(t+τ)⟩t ≈ 1/(I-m) ∑_{i=0}^{I-m-1} xi yi+m = ρxy,m

and normalized to the maximum value

ρxyN,m = ρxy,m / ρxy,0.

Note 1. The absolute value of the correlation function between two processes varies between 0 and 1. For a deterministic linear dependence between them the correlation function reaches 1, while for statistically independent processes it equals 0.
Note 2. For stationary processes, the absolute value of the correlation function obeys the Cauchy-Schwarz inequality

|ρxy(τ)|² ≤ ⟨x²⟩⟨y²⟩.

where

ρxy(τ) = ∫∫ dx dy x y pxy(x, t; y, t + τ).
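The time-averaged cross-correlation (item 2 above) can be sketched on two coupled signals. Here y is simply x delayed by 10 steps (an assumption), so the peak of the estimated cross-correlation should sit at that lag:

```python
import numpy as np

rng = np.random.default_rng(9)
I, d = 10_000, 10
s = rng.standard_normal(I + d)
x = s[d:]                        # x_i = s_{i+d}
y = s[:I]                        # y_i = s_i, i.e. x delayed by d steps

# rho_xy(m) ~ (1/(I-m)) * sum_i x_i y_{i+m} for lags m = 0, ..., L-1
L = 50
rho = np.array([(x[: I - m] * y[m:]).sum() / (I - m) for m in range(L)])

m_peak = int(np.argmax(np.abs(rho)))
print(m_peak)  # the built-in delay d = 10
```

The location of the peak identifies the delay between the coupled sources, which is the kind of coupling information the section describes.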

### Correlation Function - Correlation Time

The correlation time τC for a continuous normalized autocorrelation function ρN(τ) is defined as

τC = ∫_{0}^{∞} |ρN(τ)| dτ

whereas for a discrete sequence {ρN,m} one has

τC = ∑_{m=0}^{M} Δt |ρN,m|

where Δt is a time step.
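The discrete sum can be checked against a closed form. A geometrically decaying autocorrelation ρN,m = φ^m is assumed here, for which the sum is approximately Δt/(1 - φ):

```python
import numpy as np

dt, phi, M = 0.01, 0.8, 2_000
rho_n = phi ** np.arange(M + 1)      # assumed model: rho_{N,m} = phi^m

tau_c = np.sum(dt * np.abs(rho_n))   # tau_C = sum_m dt * |rho_{N,m}|
print(tau_c)  # ≈ dt / (1 - phi) = 0.05
```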

### Correlation Functions - Fourier Transform

Fourier transform of

• the autocorrelation function:

Gxx(ω) = ∫_{-∞}^{∞} ρxx(τ) e^{-iωτ} dτ
• and of the cross-correlation function

Gxy(ω) = ∫_{-∞}^{∞} ρxy(τ) e^{-iωτ} dτ

exists if the correlation functions are absolutely integrable.

## Stochastic Processes - Probability Density Function.

### Histogram.

The Probability Density Function (PDF) of a random sample of size K embedded in the range [a, b] can be estimated by the function HK(x) defined as

HK(x) = 1/(Kh) ∑_{k=1}^{K} 1_{(xl, xl+1]}(Xk)

for x ∈ (xl, xl+1], where l = 0, 1, …, L-1, 1_A denotes the indicator function of the set A, and

xl = a + lh, h = (b-a)/L.
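The histogram estimator can be written out literally, bin by bin. A uniform sample on [a, b] is assumed, so the estimated density should be close to 1/(b - a) in every bin:

```python
import numpy as np

rng = np.random.default_rng(6)
K, L, a, b = 100_000, 20, 0.0, 2.0
h = (b - a) / L                       # bin width h = (b - a)/L
sample = rng.uniform(a, b, K)         # K samples, assumed uniform on [a, b]

# count samples in each bin (x_l, x_{l+1}] and scale by 1/(K*h)
counts = np.array([np.sum((sample > a + l * h) & (sample <= a + (l + 1) * h))
                   for l in range(L)])
H = counts / (K * h)
print(H.mean())  # ≈ 1/(b - a) = 0.5
```

In practice `np.histogram(sample, bins=L, range=(a, b), density=True)` computes the same estimate in one call.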

### Rosenblatt-Parzen Method.

The Probability Density Function (PDF) can be estimated by the function fK defined as

fK(ti, xj) = (1/K) ∑_{k=1}^{K} J((xj - Xτ,i,k)/bK,i) / bK,i

where xj ∈ ℝ and ti ∈ [t0, T]. J denotes a Rosenblatt-Parzen kernel of

• rectangular type

J(u) = 1/2 for |u| ≤ 1
J(u) = 0 for |u| > 1
• triangular type

J(u) = 1/√6 - |u|/6 for |u| ≤ √6
J(u) = 0 for |u| > √6
• gaussian type

J(u) = (2π)^{-1/2} e^{-u²/2}
• optimal type

J(u) = 3/(4√5) (1 - u²/5) for |u| ≤ √5
J(u) = 0 for |u| > √5

and

{ {Xτ,i(k)}_{k=1}^{K} }_{i=0}^{I}

is a random sample of size K from an unknown distribution. τ is the time step and the time indices form the sequence i = 0, …, I.
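A minimal sketch of the Rosenblatt-Parzen estimator with the Gaussian kernel J(u) = (2π)^{-1/2} e^{-u²/2} from the list above. The bandwidth choice (Silverman's rule of thumb) and the N(0, 1) sample are assumptions; the text does not fix the bandwidth bK:

```python
import numpy as np

rng = np.random.default_rng(7)
K = 5_000
sample = rng.normal(0.0, 1.0, K)              # X_k, assumed N(0, 1)
b_K = 1.06 * sample.std() * K ** (-1 / 5)     # Silverman bandwidth (assumption)

def f_K(x):
    """Kernel density estimate (1/K) * sum_k J((x - X_k)/b_K) / b_K."""
    u = (x - sample) / b_K
    return np.mean(np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)) / b_K

print(f_K(0.0))  # ≈ 1/sqrt(2*pi) ≈ 0.399 for a standard normal sample
```

Replacing the Gaussian body of `f_K` with the rectangular, triangular or optimal kernel formulas above changes only that one line.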

## Entropy Function.

Entropy function H(t) is defined as

H(t) = - ∫_{-∞}^{∞} f(t,x) log f(t,x) dx

where f(t,x) is a probability density function. H(t) is approximated by the sum

Hi = H(ti) = - ∑_{j=0}^{J} fi,j log fi,j Δx
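The discrete sum can be verified on a density with a known entropy. A standard normal density on a grid is assumed here; its exact differential entropy is 0.5 log(2πe) ≈ 1.4189:

```python
import numpy as np

dx = 0.01
x = np.arange(-10.0, 10.0, dx)                  # grid covering the support
f = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)    # assumed density: N(0, 1)

# H ≈ -sum_j f_j * log(f_j) * dx
H = -np.sum(f * np.log(f)) * dx
print(H)  # ≈ 0.5 * log(2*pi*e) ≈ 1.4189
```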

## Stochastic Exponent.

Stochastic exponent X(t) of a semimartingale process Z(t) is defined as the solution of the equation

X(t) = 1 + ∫_{0}^{t} X(s-) dZ(s)

where Z(s) can be

• gaussian process with the random measure

M([ti-1, ti)) = τ^{1/2} ζi, where ζi ~ N(0,1)

where N(0,1) is the standard normal distribution

• α-stable Lévy motion with the random measure

M([ti-1, ti)) = τ^{1/α} ζi, where ζi ~ Sα,β

where Sα,β is a random variable from the α-stable Lévy distribution

• Poissonian process with the random measure

M([ti-1, ti)) = Nλ,i - Nλ,i-1

where Nλ,i is a compensated Poissonian process with intensity λ

• or their linear combination

Z(s) is often a càdlàg process (an RCLL process, i.e. Right Continuous with Left Limits):

X(t) → X(s) as t → s from the right, and the left limit X(s-) exists for every s.

The process X(t), where t ∈ [0, T], is approximated by a discrete process

{Xτ,i}_{i=0}^{I}

defined as

Xτ,i = Xτ,i-1 + Xτ,i-1 M([ti-1, ti))
ti = iτ, τ = T/I

where Xτ,0 = 1, i = 0, …, I, and M([ti-1, ti)) is the random measure defined above for each stochastic process.
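The recursion above can be sketched for the Gaussian case, where the random measure of each interval is M([ti-1, ti)) = τ^{1/2} ζi with ζi ~ N(0,1); this gives a discrete approximation of the stochastic exponent of Brownian motion:

```python
import numpy as np

rng = np.random.default_rng(8)
T, I = 1.0, 1_000
tau = T / I                               # tau = T/I as in the text

X = np.empty(I + 1)
X[0] = 1.0                                # X_{tau,0} = 1
for i in range(1, I + 1):
    M = np.sqrt(tau) * rng.standard_normal()   # Gaussian measure of [t_{i-1}, t_i)
    X[i] = X[i - 1] + X[i - 1] * M             # X_i = X_{i-1} + X_{i-1} * M

print(X[0], X.shape)
```

For the α-stable or Poissonian choices of Z only the line generating `M` changes, exactly as in the list of random measures above.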

## Cylindric Measure.

Let us define a set of points

til ∈ [t0, T]

and corresponding ranges

[al, bl], where l = 1, …, L;

then the expression

P{X(til) ∈ [al, bl]}, l = 1, …, L

defines the probability that the values of the process X belong to the given ranges.

## Machine Learning - OptFinderML

OptFinderML is a package for machine learning.