$ \newcommand{\Ha}{\mathcal H} \newcommand{\p}[1]{\left(#1\right)} \newcommand{\mI}{\mathbb{I}} \newcommand{\tr}{\mathrm{tr}} \newcommand{\vect}[1]{\underline{#1}} \newcommand{\Res}{\mathrm{Res}} \newcommand{\det}{\mathrm{det}} \newcommand{\mR}{\mathcal{R}} \newcommand{\Lf}{\mathcal{L}} \newcommand{\mE}{\mathcal{E}} \newcommand{\mG}{\mathcal{G}} \newcommand{\mA}{\mathcal{A}} \newcommand{\mP}{\mathcal{P}} \newcommand{\mQ}[1]{\mathcal{Q}\p{#1}} \newcommand{\fLf}[1]{ \frac{\Lf\p{#1} } {\Lf\p{\mE^n}}} \newcommand{\Esp}[1]{\left<#1\right>} \newcommand{\bra}[1]{\left<#1\right|} \newcommand{\ket}[1]{\left|#1\right>} \newcommand{\as}[1]{ \overset{\mathrm{a.s}} {#1} } \newcommand{\conv}[1] {\underset{#1 \rightarrow +\infty} {\rightarrow} } \newcommand{\convAs}{ \as{\conv{n}} } \newcommand{\R}{\mathbb{R}} \newcommand{\Prob}{\mathbb{P}} \newcommand{\Dist}[1]{\overset{\mathcal{D}}{#1}} \newcommand{\convD} {\Dist{ \conv{n} } } \newcommand{\Lc}{L_c} \newcommand{\Lg}{L_{\Gamma}} \newcommand{\lNormal}[2]{\mathcal{N}_{#1,#2}} \newcommand{\iid}{\mathrm{i.i.d.}} \newcommand{\cpath}{\mapsto} \newcommand{\wcon}{\leftrightarrow} \newcommand{\scon}{\leftrightarrows} \newcommand {\card} {\mathrm{card}} $
Large deviations for matrix-correlated random variables
4 November 2014
Florian Angeletti
NITheP
Work in collaboration with Hugo Touchette, Patrice Abry and Eric Bertin.
Equilibrium stationary solution:
Microcanonical ensemble: $$ p(x_1, \dots, x_n ) = \mathrm{cst} $$
Canonical ensemble: $$ p(x_1, \dots, x_n ) \propto e^{-\beta \Ha(x_1,\dots, x_n)} $$
Non equilibrium stationary solution?
A simple and iconic out-of-equilibrium system: the totally asymmetric simple exclusion process (TASEP)
Asymmetric: Particles only move from left to right
Exclusion: at most one particle per site
Creation rate $ \alpha $ at the left boundary
Destruction rate $ \beta $ at the right boundary
Matrix product ansatz (Derrida et al. 1993)
\[ p(x_1,\dots,x_n)= \frac{\bra{W}R(x_1)\dots R(x_n)\ket{V}}{\bra{W}(R(0)+R(1))^n\ket{V}} \]
matrix $ R(x) $
Long-range correlations
Similar solutions for 1D diffusion-reaction systems
Formal similarity with DMRG
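For reference, the representation matrices of this ansatz satisfy the quadratic algebra of Derrida, Evans, Hakim and Pasquier (1993); writing $D = R(1)$ and $E = R(0)$:
\[ D E = D + E, \qquad \bra{W} E = \frac{1}{\alpha} \bra{W}, \qquad D \ket{V} = \frac{1}{\beta} \ket{V} \]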
Mathematical model
\[ p(x_1,\dots,x_n) \approx \mR(x_1) \cdots \mR(x_n) \]
Study the statistical properties of these models
Hidden Markov model representation
Signal processing application
Topology induces correlation
Large deviation functions
Limit distributions for the sums
Limit distributions for the extremes
Then go back to physical models
\[ p(x_1,\dots,x_n) = \fLf{ \mR(x_1) \dots \mR(x_n)} \]
linear form $\Lf$: $ \Lf(M) = \tr(\mA^T M) $
$ \mA $: $ d \times d $ positive matrix
$ \mR(x) $: $ d \times d$ positive matrix function
structure matrix \[ \mE = \int_\R \mR(x) dx \]
probability density function matrix \[ \mP_{i,j} (x) = \mR_{i,j}(x) / \mE_{i,j} \]
$ d> 1 $: non-commutativity $\implies$ correlations
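A minimal sketch of one concrete instance in Python (the $d=2$ values and the unit-variance Gaussian choice for $\mP$ are illustrative assumptions, not the talk's example):

```python
import numpy as np
from scipy.stats import norm

d = 2
A = np.eye(d)                              # linear form: L(M) = tr(A^T M)
E = np.array([[0.9, 0.1],
              [0.2, 0.8]])                 # structure matrix, E = integral of R(x) dx
MU = np.array([[-1.0, 0.0],
               [0.0, 1.0]])                # illustrative means of the pdf matrix P_{ij}

def R(x):
    """Matrix density R_{ij}(x) = E_{ij} P_{ij}(x), with P_{ij} = N(MU_{ij}, 1)."""
    return E * norm.pdf(x, loc=MU)

def L(M):
    return np.trace(A.T @ M)
```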
Product structure: $p(x_1,\dots,x_n) = \fLf{ \mR(x_1) \dots \mR(x_n)}$
Moment matrix: $ \mQ{q} = \int_\R x^q \mR(x) dx $
\[ \Esp{X_k^p} = \frac{\Lf\p{\mE^{k-1} \mQ{p} \mE^{n-k}}}{\Lf\p{\mE^n} } \] \[ \Esp{X_k X_l} = \frac{\Lf\p{\mE^{k-1} \mQ{1} \mE^{l-k-1} \mQ{1} \mE^{n-l}}}{\Lf\p{\mE^n} } \] \[ \Esp{X_k X_l X_m} = \frac{\Lf\p{\mE^{k-1} \mQ{1} \mE^{l-k-1} \mQ{1} \mE^{m-l-1} \mQ{1} \mE^{n-m}}}{\Lf\p{\mE^n} } \] \[ \dots \]
\[ \Esp{X_k X_l} = \fLf{ \mE^{k-1} \mQ{1} \mE^{l-k-1} \mQ{1} \mE^{n-l} } \]
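Continuing the Gaussian sketch above, the moment formulas translate directly (for unit-variance Gaussian entries, $\mQ{1}$ is the entrywise product of $\mE$ and the mean matrix, an assumption of this example):

```python
from numpy.linalg import matrix_power as mpow

n = 50
Q1 = E * MU                                # Q(1) = integral of x R(x) dx, entrywise for Gaussian P

def mean_X(k):
    """<X_k> = L(E^{k-1} Q(1) E^{n-k}) / L(E^n)."""
    return L(mpow(E, k - 1) @ Q1 @ mpow(E, n - k)) / L(mpow(E, n))

def moment_XX(k, l):
    """<X_k X_l> for k < l."""
    M = mpow(E, k - 1) @ Q1 @ mpow(E, l - k - 1) @ Q1 @ mpow(E, n - l)
    return L(M) / L(mpow(E, n))
```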
The dependency structure of $X$ depends on the behavior of $\mE^n$
$\lambda_k$: eigenvalues of $\mE$, ordered by their real parts, $\Re (\lambda_1) > \Re (\lambda_2) > \dots > \Re (\lambda_r)$
$J_{k,l}$: Jordan block associated with eigenvalue $\lambda_k$
\[ \mE = B^{-1} \begin{pmatrix} J_{1,1} & 0 & \cdots & \cdots & 0 \\ 0 & \ddots & \ddots & & \vdots \\ \vdots & \ddots & J_{k,l} & \ddots& \vdots \\ \vdots & & \ddots & \ddots& 0 \\ 0 & \cdots & \cdots & 0 & J_{r,m} \end{pmatrix} B, \quad J_{k,l} = \begin{pmatrix} \lambda_k& 1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & \ddots & 1 \\ 0 & \cdots & \cdots & 0 & \lambda_k \end{pmatrix} \]
Case 1: Short-range correlation
More than one distinct eigenvalue $\lambda_k$: Short-range exponential correlation
\[ \Esp{X_k X_l} - \Esp{X_k}\Esp{X_l} \approx \sum_{r \geq 2} \alpha_r \p{\frac{\lambda_r}{\lambda_1}}^{|k-l|} \]
Case 2: Constant correlation
More than one block $J_{1,k}$: Constant correlation term
Case 3: Long-range correlation
At least one block $J_{1,k}$ with size $p > 1$: Polynomial long-range correlation
$\Esp{X_k X_l} - \Esp{X_k}\Esp{X_l} \approx P(\frac{k}{n}, \frac{k-l}{n}, \frac{l}{n} ), \quad P \in \R[X,Y,Z]$
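The three regimes can be checked numerically by swapping $\mE$ (and $\mA$) in the sketch above; the structure matrices below mirror the three cases, with values again illustrative:

```python
def covariance(E, A, k, l, n):
    """<X_k X_l> - <X_k><X_l> for a given (A, E); Gaussian example with Q1 = E * MU."""
    Q1, Zn = E * MU, np.trace(A.T @ mpow(E, n))
    mean = lambda j: np.trace(A.T @ (mpow(E, j - 1) @ Q1 @ mpow(E, n - j))) / Zn
    cross = np.trace(A.T @ (mpow(E, k - 1) @ Q1 @ mpow(E, l - k - 1) @ Q1 @ mpow(E, n - l))) / Zn
    return cross - mean(k) * mean(l)

E_short = np.array([[0.9, 0.1], [0.2, 0.8]])   # distinct eigenvalues: exponential decay
E_const = np.eye(2)                            # two Jordan blocks for lambda_1: constant covariance
E_long  = np.array([[1.0, 1.0], [0.0, 1.0]])   # one Jordan block of size 2: polynomial terms
A_long  = np.array([[0.0, 1.0], [0.0, 0.0]])   # picks the (1,2) entry, as in the 2x2 example later
for Em, Am in ((E_short, np.eye(2)), (E_const, np.eye(2)), (E_long, A_long)):
    print([round(covariance(Em, Am, 10, 10 + g, 100), 4) for g in (1, 5, 20, 60)])
```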
\[ p(x_1,\dots,x_n) = \fLf{ \mR(x_1) \dots \mR(x_n)} \] How do we generate a random vector $X$ for a given triple $(\mA,\mE,\mP)$?
Expand the matrix product \[ p(x_1,\dots,x_n)= \frac 1 {\Lf\p{\mE^n} } \sum_{\Gamma \in \{1,\dots,d\}^{n+1}} \mA_{\Gamma_1,\Gamma_{n+1}} \mE_{\Gamma_1,\Gamma_2} \mP_{\Gamma_1,\Gamma_2}(x_1) \dots \mE_{\Gamma_n,\Gamma_{n+1}} \mP_{\Gamma_n,\Gamma_{n+1}}(x_n) \] \[ p(x_1,\dots,x_n) = \sum_{\Gamma} \kappa(\Gamma) P(\vect X|\Gamma) \]
$\Gamma$, hidden Markov chain
Markov chain $\Gamma$
Observable $\vect X= X_1, \dots, X_n$
$( X_k | \Gamma_k) $ is distributed according to the pdf $ p( X_k | \Gamma_k) $
Hidden Markov Chain $\Gamma$
\[ p( \Gamma_{k+1}=j | \Gamma_{k}=i, \Gamma_{n+1}=f ) = \mE_{i,j} \frac{\p{\mE^{n-k}}_{j, f}}{ \p{ \mE^{n-k+1}}_{i,f}} \] \[ p(\Gamma_1=i,\Gamma_{n+1}=j) = \frac{\mA_{i,j} \p{\mE^n}_{i,j} }{\Lf\p{\mE^n}} \]
Conditional pdf $(X|\Gamma)$
\[ p( X_k=x | \Gamma_k=i, \Gamma_{k+1}=j) = \mP_{i,j}(x) \]
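Putting the two conditional laws together gives a direct synthesis procedure; a sketch, still assuming the Gaussian $\mP$ of the running example:

```python
def sample_X(n, A, E, MU, rng):
    """Sample (X_1,...,X_n) with density L(R(x_1)...R(x_n)) / L(E^n) via the hidden chain."""
    powers = [mpow(E, k) for k in range(n + 1)]            # E^0, ..., E^n
    # endpoints: p(Gamma_1 = i, Gamma_{n+1} = f) proportional to A_{i,f} (E^n)_{i,f}
    w = A * powers[n]
    i, f = np.unravel_index(rng.choice(w.size, p=(w / w.sum()).ravel()), w.shape)
    gamma = [i]
    for k in range(1, n + 1):
        # bridge step: p(Gamma_{k+1}=j | Gamma_k=i, Gamma_{n+1}=f)
        #              = E_{i,j} (E^{n-k})_{j,f} / (E^{n-k+1})_{i,f}
        p = E[gamma[-1], :] * powers[n - k][:, f]
        gamma.append(rng.choice(len(p), p=p / p.sum()))    # normalisation guards rounding
    # emissions: (X_k | Gamma_k=i, Gamma_{k+1}=j) ~ P_{i,j} = N(MU_{i,j}, 1) here
    return np.array([rng.normal(MU[gamma[k], gamma[k + 1]], 1.0) for k in range(n)])

rng = np.random.default_rng(0)
x = sample_X(50, A, E, MU, rng)
```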
Non-homogeneous Markov chain
Specific non-homogeneous Hidden Markov model:
Hidden Markov Model $\nRightarrow$ Matrix representation
Matrix representation
Algebraic properties
Statistical properties computation
Hidden Markov Model
$2$-layer model:
correlated layer $+$ independent layer
Efficient synthesis
Sum
\[ S_n = \sum_{i=1}^n X_i \]
Law of large numbers: existence of a concentration point
Central limit theorem: fluctuations around this concentration point
Large deviations: fluctuations far from the concentration point
Large deviation principle
\[ P( S_n/n = s ) \approx e^{-n I(s) } \]
Do we have a large deviation principle? Two paths:
Hidden Markov chain representation
Matrix representation
Generating function \[ g_n(w) = \Esp{e^{w S_n}} \]
Scaled cumulant generating function $\Phi(w)$ \[ \Phi(w) = \lim_{n \rightarrow \infty} \frac{\ln g_n(w)}{n} \]
Gärtner-Ellis Theorem
If $\Phi(w)$ exists and is differentiable, $I(s)$ exists and is \[ I(s) = \sup_{w} \{w s - \Phi(w) \} \]
\[ g_n(w) = \fLf{\mG(w)^n} \]
matrix generating function: \[ \mG(w) = \int_\R \mR(x) e^{w x} dx \]
scgf: \[ \Phi(w) = \ln \lambda_1(w) \]
$\lambda_1(w)$ dominant eigenvalue of $\mG(w)$
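With the Gaussian running example, $\mG(w)$ is explicit (the Gaussian moment generating function is $e^{\mu w + w^2/2}$), so the Gärtner-Ellis recipe fits in a few lines; the grid search below is a crude stand-in for a proper Legendre transform:

```python
def G(w):
    """Matrix generating function G(w); Gaussian case: G(w)_ij = E_ij exp(MU_ij w + w^2/2)."""
    return E * np.exp(MU * w + 0.5 * w ** 2)

def phi(w):
    """Scgf Phi(w) = ln lambda_1(w); the Perron eigenvalue of a positive matrix is real."""
    return np.log(np.linalg.eigvals(G(w)).real.max())

def rate(s, ws=np.linspace(-4.0, 4.0, 2001)):
    """Gartner-Ellis: I(s) = sup_w { w s - Phi(w) }, by grid search."""
    return max(w * s - phi(w) for w in ws)
```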
\[ \Phi(w)= \lim_{n→+\infty} \frac{1}{n}\ln \tr\p{ \rho\, \pi(w)^{n-1} }, \]
$\pi(w)$ tilted matrix: \[ \pi(w)_{i,j}=\pi_{i,j}e^{w j} \]
$\pi$ : transition matrix of the Markov chain
$\rho$ initial distribution
Differences
$\pi(0)$ is a stochastic matrix
$\mG(0)=\mE$ is a positive matrix
Behavior of $I(s)$ around the concentration point?
$\lambda_k(w)$ are roots of
\[\det(\mG(w) - X \mI ) = 0 \]
Implicit function theorem
Away from eigenvalue collisions, $\lambda_1(w)$ is as smooth as $\mG(w)$
$\mE$ irreducible $\iff G(\mE) $ is strongly connected $ \iff \Gamma$ ergodic
Graph representation $G(\mE)$ of $\mE$:
Edge between $i$ and $j$ in $G(\mE)$ $\iff$ $\mE_{i,j}>0$
Graphical representation of the Markov chain transitions
Two types of transitions: reversible and irreversible
Irreversible transitions: ergodicity breakdown
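Irreducibility is cheap to test as strong connectivity of $G(\mE)$; a sketch using scipy:

```python
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def is_irreducible(E):
    """E irreducible  <=>  G(E) strongly connected (edge i -> j iff E_{ij} > 0)."""
    n_comp, _ = connected_components(csr_matrix(E > 0), connection='strong')
    return n_comp == 1

# is_irreducible(np.array([[0.9, 0.1], [0.2, 0.8]]))  # True: all transitions reversible
# is_irreducible(np.array([[1.0, 1.0], [0.0, 1.0]]))  # False: the 1 -> 2 transition is irreversible
```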
Perron-Frobenius: no collision
Topological property: $\mG(w)$ irreducible
$\Gamma$ is asymptotically homogeneous
\[ \mE= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \]
Possible with Markov chain
\[ \mE= \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \]
Not possible with Markov chain
Collision of two eigenvalues near $0$ : \[ \ln λ_r(w) = \ln λ(0) + μ_r w +O(w^2), \]
\[ \ln λ_1(w) = \ln λ(0) + \begin{cases} \max \{ μ_r \}\, w +O(w^2) & \text{if }w >0 \\ \min \{μ_r\}\, w + O(w^2) & \text{if } w <0 . \end{cases} \]
$\Phi'(w)$ jumps from $\min \mu_r$ to $\max \mu_r$ at $w=0$
Information on $I([\min \mu_r, \max \mu_r])$ lost
$I(s)$ strictly non-convex?
Or a linear branch?
Law of large numbers?
Theorem
If $\Phi(w)$ is holomorphic near $0$, the central limit theorem holds
Collision: \[ \ln λ_r(w) = \frac{σ_r^2} {2} w^2 + O(w^3) \]
On real axis: \[ \Phi(w) = \frac {\max \{ σ_r^2 \}} {2} w^2 + O(w^3), \]
On imaginary axis: \[ \Phi( z=\imath w) = -\frac {\min \{ σ_r^2 \}} {2} w^2 + O(w^3) \]
$\Phi(w)$ is not holomorphic
The central limit theorem might not hold.
Large deviation principle for short-range correlation:
Short-range correlation: $\lambda_1(w)$ is differentiable near $0$
\[ I(s) = \sup_{w} \{w s - \ln \lambda_1(w) \} \]
Long-range or constant correlation
$\Phi(w)$ is not differentiable at $0$.
How to compute the full rate function in presence of long-range correlation?
$\Phi(w)$ erases too much information
Explicit computation of $g_n(w)$ in dimension $2$
Inversion with inverse Laplace transform: \[ P(S_n/n=s) = \frac 1 {2 \imath π }∫_{r - \imath ∞}^{r+ \imath ∞} g_n(w) e^{ - n w s} dw, \]
$r$ arbitrary parameter
\[ \mA= \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \mE= \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \mG(w) = \begin{pmatrix} e^{\Phi_1(w)} & c(w) \\ 0 & e^{\Phi_2(w)} \end{pmatrix}, \]
Generating function
\[ g_n(w) = c(w) \frac{e^{ n \Phi_1(w) } - e^{ n \Phi_2(w) } }{ e^{ \Phi_1(w) } - e^{ \Phi_2(w) } } \]
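A quick numerical sanity check of this closed form (the values standing in for $e^{\Phi_1(w)}$, $e^{\Phi_2(w)}$ and $c(w)$ at some fixed $w$ are arbitrary):

```python
import numpy as np
from numpy.linalg import matrix_power as mpow

a, b, c = 1.3, 0.7, 0.5            # stand-ins for e^{Phi_1(w)}, e^{Phi_2(w)}, c(w)
Gw = np.array([[a, c], [0.0, b]])
m = 20
# off-diagonal entry of the m-th power of an upper-triangular 2x2 matrix
assert np.isclose(mpow(Gw, m)[0, 1], c * (a**m - b**m) / (a - b))
```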
$\Phi_1(0)=\Phi_2(0)$: removable pole at $0$
Assume $\Phi'_1(0)=μ_1 < \Phi'_2(0) = μ_2$
Split generating function \[ g_{n,k}(w) = c(w) \frac{ e^{ n \Phi_k(w) } } { e^{ \Phi_1(w) } - e^{ \Phi_2(w) } }. \]
Non-removable pole: \[ \Res_{w=0}\, g_{n,1}(w) = \frac{1}{μ_1 - μ_2} = \Res_{w=0}\, g_{n,2}(w).\]
Contour integral for $g_{n,k}$ crosses the pole when $s<\mu_k$
\[ I(s) = \begin{cases} I_1(s) & s < μ_1,\\ 0 & μ_1 < s < μ_2,\\ I_2(s) & μ_2 < s. \end{cases} \]
Irreversible transitions:
The chain $\Gamma$ can only stay in its current state or jump to the next
All chains with a non-zero probability and the same starting and ending points are equiprobable
Polynomial correlation: \[ \Esp{X_k X_l } \approx \sum_{r+s+t=d-1} c_{r,s,t} \p{\frac k n}^{r} \p{\frac {l-k} n}^{s} \p{1- \frac l n}^{t} \]
\[ \mA= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \mE= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \mG(w) = \begin{pmatrix} e^{\Phi_1(w)} & 0 \\ 0 & e^{\Phi_2(w)} \end{pmatrix}, \]
Generating function
\[ g_n(w) = \frac{e^{ n \Phi_1(w) } + e^{ n \Phi_2(w) } }{ 2 } \]
An equal-weight mixture of the two diagonal components, hence
\[ I(s) = \min \{ I_1(s), I_2(s) \} \]
Disconnected components:
The chain $\Gamma$ is trapped inside its starting state
Constant correlation:
$ \Esp{X_k X_l }- \Esp{X_k}\Esp{X_l} = \frac{\Lf\p{\mQ{1}^2}}{\Lf\p{\mE}} - \frac{\Lf\p{\mQ{1}}^2}{\Lf\p{\mE}^2}$
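A Monte-Carlo check of this constant covariance, reusing sample_X from the synthesis sketch (with the illustrative means $\pm 1$ of the running example, the predicted value is $1$):

```python
rng = np.random.default_rng(1)
E2, A2 = np.eye(2), np.eye(2)          # disconnected components
xs = np.array([sample_X(50, A2, E2, MU, rng) for _ in range(5000)])
print(np.cov(xs[:, 5], xs[:, 40])[0, 1])   # approx 1.0, independent of the gap |k - l|
```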
How to compute $g_n(w)$ for a general $\mE$?
Jordan form too fragile
Use topological properties of $G(\mE)$
\[ \mE = \begin{pmatrix} I_1 & * & T_{k,l} \\ & \ddots & * \\ 0 & & I_r \\ \end{pmatrix} \] with irreducible diagonal blocks $I_k$ and coupling blocks $T_{k,l}$
Tedious computations
Using Hidden Markov representation
Short-range correlation
Standard limit laws
Long-range correlation
Discrete mixture of continuous mixtures of standard limit laws.
Three kinds of correlation
Exponential short-range correlation
Constant correlation
Polynomial long-range correlation
Large deviation principle
Short-range correlation: Same behavior as Markov chain
Constant correlation: non-convex rate function
Polynomial correlation: Flat rate function
Limit distributions
Short-range correlation: standard limit distribution
Constant correlation: discrete mixture of standard limit distribution
Polynomial correlation: continuous mixture of standard limit distribution