$ \newcommand{\Ha}{\mathcal H} \newcommand{\p}[1]{\left(#1\right)} \newcommand{\mI}{\mathbb{I}} \newcommand{\tr}{\mathrm{tr}} \newcommand{\vect}[1]{\underline{#1}} \newcommand{\Res}{\mathrm{Res}} \newcommand{\det}{\mathrm{det}} \newcommand{\mR}{\mathcal{R}} \newcommand{\Lf}{\mathcal{L}} \newcommand{\mE}{\mathcal{E}} \newcommand{\mG}{\mathcal{G}} \newcommand{\mA}{\mathcal{A}} \newcommand{\mP}{\mathcal{P}} \newcommand{\mQ}[1]{\mathcal{Q}\p{#1}} \newcommand{\fLf}[1]{ \frac{\Lf\p{#1} } {\Lf\p{\mE^n}}} \newcommand{\Esp}[1]{\left<#1\right>} \newcommand{\bra}[1]{\left<#1\right|} \newcommand{\ket}[1]{\left|#1\right>} \newcommand{\as}[1]{ \overset{\mathrm{a.s}} {#1} } \newcommand{\conv}[1] {\underset{#1 \rightarrow +\infty} {\rightarrow} } \newcommand{\convAs}{ \as{\conv{n}} } \newcommand{\R}{\mathbb{R}} \newcommand{\Prob}{\mathbb{P}} \newcommand{\Dist}[1]{\overset{\mathcal{D}}{#1}} \newcommand{\convD} {\Dist{ \conv{n} } } \newcommand{\Lc}{L_c} \newcommand{\Lg}{L_{\Gamma}} \newcommand{\lNormal}[2]{\mathcal{N}_{#1,#2}} \newcommand{\iid}{\mathrm{i.i.d.}} \newcommand{\cpath}{\mapsto} \newcommand{\wcon}{\leftrightarrow} \newcommand{\scon}{\leftrightarrows} \newcommand {\card} {\mathrm{card}} $

Large deviation for matrix-correlated random variables


4 November 2014
Florian Angeletti
NITheP
Work in collaboration with Hugo Touchette, Patrice Abry and Eric Bertin.

Equilibrium stationary solution:

  • Microcanonical ensemble: $$ p(x_1, \dots, x_n ) = \mathrm{cst} $$
  • Canonical ensemble: $$ p(x_1, \dots, x_n ) = e^{-\beta \Ha(x_1,\dots, x_n)} $$

Non-equilibrium stationary solution?

A simple and iconic out-of-equilibrium system

Mathematical model

\[ p(x_1,\dots,x_n) \approx \mR(x_1) \cdots \mR(x_n) \]

Study the statistical properties of these models

  • Hidden Markov model representation
  • Signal processing application
  • Topology induces correlation
  • Large deviation functions
  • Limit distributions for the sums
  • Limit distributions for the extremes

Then go back to physical models

\[ p(x_1,\dots,x_n) = \fLf{ \mR(x_1) \dots \mR(x_n)} \]
  • $ d> 1 $: Non-commutativity $\implies$ Correlation
  • Product structure: $p(x_1,\dots,x_n) = \fLf{ \mR(x_1) \dots \mR(x_n)}$
  • Moment matrix: $ \mQ{q} = \int x^q \mR(x)\, dx $
\[ \Esp{X_k^p} = \frac{\Lf\p{\mE^{k-1} \mQ{p} \mE^{n-k}}}{\Lf\p{\mE^n} } \] \[ \Esp{X_k X_l} = \frac{\Lf\p{\mE^{k-1} \mQ{1} \mE^{l-k-1} \mQ{1} \mE^{n-l}}}{\Lf\p{\mE^n} } \] \[ \Esp{X_k X_l X_m} = \frac{\Lf\p{\mE^{k-1} \mQ{1} \mE^{l-k-1} \mQ{1} \mE^{m-l-1} \mQ{1} \mE^{n-m}}}{\Lf\p{\mE^n} } \] \[ \dots \]
\[ \Esp{X_k X_l} = \fLf{ \mE^{k-1} \mQ{1} \mE^{l-k-1} \mQ{1} \mE^{n-l} } \]
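These moment formulas are easy to check numerically. A minimal sketch, assuming Gaussian component pdfs with mean matrix $m$ (an illustrative choice, not from the talk) and taking the linear form $\Lf(M) = \sum_{i,j} \mA_{i,j} M_{i,j}$:

```python
# Numerical check of the matrix moment formulas (illustrative parameters).
import numpy as np

d, n = 2, 50
A = np.eye(d)                        # structure matrix
E = np.array([[0.9, 0.1],
              [0.2, 0.8]])           # E = \int R(x) dx

def L(M):
    """Linear form L(M) = sum_{i,j} A_{i,j} M_{i,j}."""
    return np.sum(A * M)

# Component (i, j) carries a Gaussian pdf with mean m[i, j] and unit
# variance, so Q_1 = E * m and Q_2 = E * (m**2 + 1) entrywise.
m = np.array([[0.0, 1.0],
              [2.0, 3.0]])
Q = {1: E * m, 2: E * (m**2 + 1.0)}

P = np.linalg.matrix_power
Z = L(P(E, n))                       # normalisation L(E^n)

def moment(k, q):                    # <X_k^q>
    return L(P(E, k - 1) @ Q[q] @ P(E, n - k)) / Z

def pair(k, l):                      # <X_k X_l>, k < l
    return L(P(E, k - 1) @ Q[1] @ P(E, l - k - 1) @ Q[1] @ P(E, n - l)) / Z

print(moment(10, 1), pair(10, 20) - moment(10, 1) * moment(20, 1))
```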
  • The dependency structure of $X$ depends on the behavior of $\mE^n$
  • $\lambda_k$ eigenvalues of $\mE$ ordered by their real parts, $\Re (\lambda_1) > \Re (\lambda_2) > \dots > \Re (\lambda_r)$
  • $J_{k,l}$ Jordan block associated with eigenvalue $\lambda_k$
\[ \mE = B^{-1} \begin{pmatrix} J_{1,1} & 0 & \cdots & \cdots & 0 \\ 0 & \ddots & \ddots & & \vdots \\ \vdots & \ddots & J_{k,l} & \ddots& \vdots \\ \vdots & & \ddots & \ddots& 0 \\ 0 & \cdots & \cdots & 0 & J_{r,m} \end{pmatrix} B, \quad J_{k,l} = \begin{pmatrix} \lambda_k& 1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & \ddots & 1 \\ 0 & \cdots & \cdots & 0 & \lambda_k \end{pmatrix} \]

Case 1: Short-range correlation

  • More than one distinct eigenvalue $\lambda_k$: Short-range exponential correlation
\[ \Esp{X_k X_l} - \Esp{X_k}\Esp{X_l} \approx \sum_{m \geq 2} \alpha_m \p{\frac{\lambda_m}{\lambda_1}}^{|k-l|} \]

Case 2: Constant correlation

  • More than one block $J_{1,k}$: Constant correlation term

Case 3: Long-range correlation

  • At least one block $J_{1,k}$ with size $p > 1$: Polynomial long-range correlation
    • $\Esp{X_k X_l} - \Esp{X_k}\Esp{X_l} \approx P(\frac{k}{n}, \frac{k-l}{n}, \frac{l}{n} ), \quad P \in \R[X,Y,Z]$
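The three regimes above can be observed directly from the covariance formula. A sketch, reusing the Gaussian-component conventions of the previous example (all matrices are illustrative; the third case uses the Jordan block $\begin{pmatrix}1&1\\0&1\end{pmatrix}$):

```python
# Covariance of (X_k, X_{k+tau}) from the matrix formula, for one matrix E
# in each correlation regime (illustrative parameters).
import numpy as np

def cov(A, E, m, n, k, tau):
    L = lambda M: np.sum(A * M)          # L(M) = sum_ij A_ij M_ij
    P = np.linalg.matrix_power
    Q1 = E * m                           # Gaussian components, mean m[i, j]
    l = k + tau
    Z = L(P(E, n))
    c2 = L(P(E, k - 1) @ Q1 @ P(E, l - k - 1) @ Q1 @ P(E, n - l)) / Z
    m1 = L(P(E, k - 1) @ Q1 @ P(E, n - k)) / Z
    m2 = L(P(E, l - 1) @ Q1 @ P(E, n - l)) / Z
    return c2 - m1 * m2

n, k = 200, 50
m = np.array([[0.0, 1.0], [2.0, 3.0]])
cases = {
    "short-range": (np.eye(2), np.array([[0.9, 0.1], [0.2, 0.8]])),
    "constant":    (np.eye(2), np.eye(2)),
    "long-range":  (np.array([[0.0, 1.0], [0.0, 0.0]]),   # L picks entry (1, 2)
                    np.array([[1.0, 1.0], [0.0, 1.0]])),  # Jordan block
}
for name, (A, E) in cases.items():
    print(name, [round(cov(A, E, m, n, k, t), 4) for t in (1, 5, 25)])
```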
\[ p(x_1,\dots,x_n) = \fLf{ \mR(x_1) \dots \mR(x_n)} \] How do we generate a random vector $X$ for a given triple $(\mA,\mE,\mP)$?
  • Expand the matrix product \[ p(x_1,\dots,x_n)= \frac 1 {\Lf\p{\mE^n} } \sum_{\Gamma \in \{1,\dots,d\}^{n+1}} \mA_{\Gamma_1,\Gamma_{n+1}} \mE_{\Gamma_1,\Gamma_2} \mP_{\Gamma_1,\Gamma_2}(x_1) \dots \mE_{\Gamma_n,\Gamma_{n+1}} \mP_{\Gamma_n,\Gamma_{n+1}}(x_n) \] \[ p(x_1,\dots,x_n) = \sum_{\Gamma} \kappa(\Gamma) P(\vect X|\Gamma) \]
  • $\Gamma$, hidden Markov chain

  • Markov chain $\Gamma$
  • Observable $\vect X= X_1, \dots, X_n$
  • $( X_k | \Gamma_k) $ is distributed according to the pdf $ p( X_k | \Gamma_k) $

Hidden Markov Chain $\Gamma$

\[ p( \Gamma_{k+1}=j \mid \Gamma_k=i, \Gamma_{n+1}=f ) = \mE_{i,j} \frac{\p{\mE^{n-k}}_{j, f}}{ \p{ \mE^{n-k+1}}_{i,f}} \] \[ p(\Gamma_1=i,\Gamma_{n+1}=j) = \frac{\mA_{i,j} \p{\mE^n}_{i,j} }{\Lf\p{\mE^n}} \]

Conditional pdf $(X|\Gamma)$

\[ p( X_k=x | \Gamma_k=i, \Gamma_{k+1}=j) = \mP_{i,j}(x) \]
  • Non-homogeneous Markov chain
  • Specific non-homogeneous Hidden Markov model:
    • Hidden Markov Model $\nRightarrow$ Matrix representation
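Putting the two previous slides together gives a direct synthesis algorithm: draw the endpoint pair, then the bridge chain $\Gamma$, then each $X_k$ from the component pdf $\mP_{\Gamma_k,\Gamma_{k+1}}$. A minimal sketch, again assuming Gaussian components with mean matrix $m$ (0-based state indices):

```python
# Synthesis of X for a given (A, E, P): endpoint pair, bridge chain Gamma,
# then conditionally independent draws X_k ~ P_{Gamma_k, Gamma_{k+1}}
# (Gaussian components with mean m[i, j] are an illustrative choice).
import numpy as np

rng = np.random.default_rng(0)

def sample_X(A, E, m, n):
    d = A.shape[0]
    Pow = [np.linalg.matrix_power(E, k) for k in range(n + 1)]
    # endpoint pair: p(Gamma_1 = i, Gamma_{n+1} = f) ~ A_{i,f} (E^n)_{i,f}
    w = (A * Pow[n]).ravel()
    i, f = divmod(rng.choice(d * d, p=w / w.sum()), d)
    gamma = [i]
    for k in range(1, n + 1):
        # bridge transition, conditioned on the final state f
        p = E[gamma[-1], :] * Pow[n - k][:, f] / Pow[n - k + 1][gamma[-1], f]
        gamma.append(rng.choice(d, p=p / p.sum()))
    return np.array([rng.normal(m[a, b], 1.0)
                     for a, b in zip(gamma, gamma[1:])])

A = np.eye(2)
E = np.array([[0.9, 0.1], [0.2, 0.8]])
m = np.array([[0.0, 1.0], [2.0, 3.0]])
x = sample_X(A, E, m, n=100)
print(x[:5], x.mean())
```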

Matrix representation

  • Algebraic properties
  • Statistical properties computation

Hidden Markov Model

  • $2$-layer model:
    correlated layer $+$ independant layer
  • Efficient synthesis

Sum

\[ S_n = \sum_{i=1}^n X_i \]
  • Law of large numbers: existence of a concentration point
  • Central limit theorem: fluctuations around this concentration point
  • Large deviations: fluctuations far away from the concentration point

Large deviation principle

\[ P( S_n/n = s ) \approx e^{-n I(s) } \]
Do we have a large deviation principle? Two paths:
  • Hidden Markov chain representation
  • Matrix representation
  • Generating function \[ g_n(w) = \Esp{e^{w S_n}} \]
  • Scaled cumulant generating function $\Phi(w)$ \[ \Phi(w) = \lim_{n \rightarrow \infty} \frac{\ln g_n(w)}{n} \]

Gärtner-Ellis Theorem

If $\Phi(w)$ exists and is differentiable, $I(s)$ exists and is \[ I(s) = \sup_{w} \{w s - \Phi(w) \} \]
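In practice the Legendre-Fenchel transform can be evaluated on a grid. A small sketch, tested on the i.i.d. Gaussian scgf $\Phi(w)=w^2/2$, for which $I(s)=s^2/2$:

```python
# Numerical Legendre-Fenchel transform: I(s) = sup_w { w s - Phi(w) },
# approximated by a maximum over a grid of w values.
import numpy as np

w = np.linspace(-10.0, 10.0, 2001)
phi = 0.5 * w**2                      # test case: i.i.d. Gaussian scgf

def rate(s):
    return np.max(w * s - phi)

print([round(rate(s), 4) for s in (0.0, 0.5, 1.0)])   # ~ s^2 / 2
```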
\[ g_n(w) = \fLf{\mG(w)^n} \]
  • matrix generating function: \[ \mG(w) = \int_\R \mR(x) e^{w x} dx \]
  • scgf: \[ \Phi(w) = \ln \lambda_1(w) \]
  • $\lambda_1(w)$ dominant eigenvalue of $\mG(w)$
  • \[ \Phi(w)= \lim_{n \rightarrow +\infty} \frac{1}{n}\ln \tr \rho\, \Pi(w)^{n-1}, \]
    • $\Pi(w)$ tilted matrix: \[ \Pi(w)_{i,j}=\pi_{i,j}e^{w j} \]
    • $\pi$: transition matrix of the Markov chain
    • $\rho$: initial distribution
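Either route can be made concrete numerically. With Gaussian component pdfs (an illustrative assumption), the matrix generating function has the closed form $\mG(w)_{i,j} = \mE_{i,j}\, e^{w m_{i,j} + w^2/2}$, and $\Phi(w)$ is the log of its dominant eigenvalue:

```python
# scgf via the matrix generating function: Phi(w) = ln lambda_1(G(w)),
# with G(w)_{ij} = E_{ij} exp(w m_{ij} + w^2/2) for Gaussian components.
import numpy as np

E = np.array([[0.9, 0.1], [0.2, 0.8]])
m = np.array([[0.0, 1.0], [2.0, 3.0]])

def Phi(w):
    G = E * np.exp(w * m + 0.5 * w**2)
    # G(w) is positive, so the Perron eigenvalue is real and dominant
    return np.log(np.max(np.linalg.eigvals(G).real))

print([round(Phi(w), 4) for w in (-0.5, 0.0, 0.5)])  # Phi(0) = ln lambda_1(E) = 0
```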

Differences

  • $\Pi(0)=\pi$ is a stochastic matrix
  • $\mG(0)=\mE$ is a positive matrix
  • Behavior of $I(s)$ around the concentration point?

The $\lambda_k(w)$ are roots of \[\det(\mG(w) - X \mI ) = 0 \]
  • Implicit function theorem
  • Outside of eigenvalue collisions, $\lambda_1(w)$ is as smooth as $\mG(w)$
Graph representation $G(\mE)$ of $\mE$:

  • Edge from $i$ to $j$ in $G(\mE)$ $\iff$ $\mE_{i,j}>0$
  • Graphical representation of the Markov chain transitions
  • $\mE$ irreducible $\iff$ $G(\mE)$ is strongly connected $\iff$ $\Gamma$ ergodic
  • Two types of transitions: reversible and irreversible
  • Irreversible transitions: ergodicity breakdown
  • Perron-Frobenius: no collision
  • Topological property: $\mG(w)$ irreducible
  • $\Gamma$ is asymptotically homogeneous
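Irreducibility can be tested directly on $G(\mE)$. A sketch using scipy's strongly-connected-components routine:

```python
# E irreducible <=> G(E) strongly connected: edge i -> j iff E_{ij} > 0.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def is_irreducible(E):
    graph = csr_matrix((E > 0).astype(int))
    n_comp, _ = connected_components(graph, directed=True, connection='strong')
    return n_comp == 1

print(is_irreducible(np.array([[0.9, 0.1], [0.2, 0.8]])))  # True
print(is_irreducible(np.array([[1.0, 1.0], [0.0, 1.0]])))  # False (irreversible 1 -> 2)
```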
\[ \mE= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]
Possible with a Markov chain
\[ \mE= \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \]
Not possible with a Markov chain
  • Collision of two eigenvalues near $0$: \[ \ln \lambda_r(w) = \ln \lambda(0) + \mu_r w +O(w^2), \]
  • \[ \ln \lambda(w) = \ln \lambda(0) + \begin{cases} \max \{ \mu_r \}\, w +O(w^2) & \text{if }w >0 \\ \min \{\mu_r\}\, w + O(w^2) & \text{if } w <0 . \end{cases} \]
  • $\Phi'(w)$ jumps from $\min \mu_r$ to $\max \mu_r$ at $w=0$
  • Information on $I([\min \mu_r, \max \mu_r])$ lost
    • $I(s)$ strictly non-convex?
    • Or a linear branch?
  • Law of large numbers?

Theorem

If $\Phi(w)$ is holomorphic near $0$, the central limit theorem holds.
Collision: \[ \ln \lambda_r(w) = \frac{\sigma_r^2} {2} w^2 + O(w^3) \]
  • On the real axis: \[ \Phi(w) = \frac {\max \{ \sigma_r^2 \}} {2} w^2 + O(w^3), \]
  • On the imaginary axis: \[ \Phi( z=\imath w) = -\frac {\min \{\sigma_r^2\}} {2} w^2 + O(w^3) \]

$\Phi(w)$ is not holomorphic

The central limit theorem might not hold.

Large deviation principle for short-range correlation:

  • Short-range correlation: $\lambda_1(w)$ is differentiable near $0$
\[ I(s) = \sup_{w} \{w s - \ln \lambda_1(w) \} \]
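Combining the two previous sketches gives the rate function of the short-range example numerically (same illustrative Gaussian-component matrices as before):

```python
# Rate function for the short-range example: Phi(w) = ln lambda_1(G(w))
# followed by a numerical Legendre transform (illustrative parameters).
import numpy as np

E = np.array([[0.9, 0.1], [0.2, 0.8]])
m = np.array([[0.0, 1.0], [2.0, 3.0]])

def Phi(w):
    G = E * np.exp(w * m + 0.5 * w**2)     # Gaussian components, as before
    return np.log(np.max(np.linalg.eigvals(G).real))

w = np.linspace(-3.0, 3.0, 601)
phi = np.array([Phi(x) for x in w])

def rate(s):
    return np.max(w * s - phi)

print([round(rate(s), 4) for s in (0.0, 1.0, 2.0)])
```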

Long-range or constant correlation

$\Phi(w)$ is not differentiable at $0$.
How to compute the full rate function in the presence of long-range correlation?
  • $\Phi(w)$ erases too much information
  • Explicit computation of $g_n(w)$ in dimension $2$
  • Inversion with the inverse Laplace transform: \[ P(S_n/n=s) = \frac 1 {2 \imath \pi }\int_{r - \imath \infty}^{r+ \imath \infty} g_n(w) e^{ - n w s} dw, \]
  • $r$ an arbitrary parameter
\[ \mA= \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \mE= \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \mG(w) = \begin{pmatrix} e^{\Phi_1(w)} & c(w) \\ 0 & e^{\Phi_2(w)} \end{pmatrix}, \]

Generating function

\[ g_n(w) = c(w) \frac{e^{ n \Phi_1(w) } - e^{ n \Phi_2(w) } }{ e^{ \Phi_1(w) } - e^{ \Phi_2(w) } } \]
  • $\Phi_1(0)=\Phi_2(0)$: removable singularity at $0$
  • Ansatz: $\Phi'_1(0)=\mu_1 < \Phi'_2(0) = \mu_2$

Split generating function: \[ g_{n,k}(w) = c(w) \frac{ e^{ n \Phi_k(w) } } { e^{ \Phi_1(w) } - e^{ \Phi_2(w) } }. \]
  • Non-removable pole: \[ \Res_{g_{n,1}(w) } (0) = \frac{1}{\mu_1 - \mu_2} = \Res_{g_{n,2}(w)} (0).\]
  • The contour integral for $g_{n,k}$ crosses the pole when $s<\mu_k$
\[ I(s) = \begin{cases} I_1(s) & s < \mu_1,\\ 0 & \mu_1 < s < \mu_2,\\ I_2(s) & \mu_2 < s. \end{cases} \]
  • Irreversible transitions:
    • The chain $\Gamma$ can only stay in its current state or jump to the next
    • All chains with non-zero probability and the same starting and ending points are equiprobable
  • Polynomial correlation: \[ \Esp{X_k X_l } \approx \sum_{r+s+t=d-1} c_{r,s,t} \p{\frac k n}^{r} \p{\frac {l-k} n}^{s} \p{1- \frac l n}^{t} \]
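The flat branch can also be seen by Monte Carlo: the jump time of $\Gamma$ is uniform, so $S_n/n$ spreads uniformly over $[\mu_1, \mu_2]$. A sketch with Gaussian components of means $\mu_1=0$, $\mu_2=3$ (illustrative values; for simplicity the single off-diagonal draw is lumped with $\mP_{2,2}$):

```python
# Monte Carlo for the Jordan-block example: uniform jump time => the
# histogram of S_n / n is flat on [mu1, mu2], i.e. I(s) = 0 there.
import numpy as np

rng = np.random.default_rng(1)
n, trials = 500, 20000
mu1, mu2 = 0.0, 3.0

jump = rng.integers(1, n + 1, size=trials)   # uniform jump time of Gamma
# k - 1 draws of mean mu1, then n - k + 1 of mean mu2, all unit variance:
# S_n is exactly N((k-1) mu1 + (n-k+1) mu2, n)
s = (jump - 1) * mu1 + (n - jump + 1) * mu2 + rng.normal(0.0, np.sqrt(n), trials)
hist, _ = np.histogram(s / n, bins=15, range=(mu1, mu2), density=True)
print(np.round(hist, 2))   # roughly constant ~ 1 / (mu2 - mu1)
```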
\[ \mA= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \mE= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \mG(w) = \begin{pmatrix} e^{\Phi_1(w)} & 0 \\ 0 & e^{\Phi_2(w)} \end{pmatrix}, \]

Generating function

\[ g_n(w) = \frac{e^{ n \Phi_1(w) } + e^{ n \Phi_2(w) } }{ 2 } \]
\[ I(s) = \min \{ I_1(s), I_2(s) \} \]
  • Disconnected components:
    • The chain $\Gamma$ is trapped inside its starting state
  • Constant correlation:
    • $ \Esp{X_k X_l }- \Esp{X_k}\Esp{X_l} = \frac{\Lf\p{\mQ{1}^2} - \Lf\p{\mQ{1}}^2}{\Lf\p{\mE}}$
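For this disconnected example, $\Gamma$ is frozen in its initial state, so $S_n/n$ is a fifty-fifty discrete mixture of two concentration points. A sketch with illustrative means $\mu_1 = 0$, $\mu_2 = 3$:

```python
# Monte Carlo for the disconnected example: S_n / n concentrates on mu1 or
# mu2 with probability 1/2 each, giving I(s) = min(I_1(s), I_2(s)).
import numpy as np

rng = np.random.default_rng(2)
n, trials = 500, 10000
mu = np.array([0.0, 3.0])

state = rng.integers(0, 2, size=trials)          # frozen initial state
s = rng.normal(mu[state], 1.0 / np.sqrt(n))      # S_n / n ~ N(mu_i, 1/n)
print([round(float(np.mean(np.abs(s - c) < 0.2)), 3) for c in mu])  # ~ [0.5, 0.5]
```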
  • How to compute $\mG(w)$?
  • Jordan form too fragile
  • Use the topological properties of $G(\mE)$
\[ \mE = \begin{pmatrix} I_1 & * & T_{k,l} \\ & \ddots & * \\ 0 & & I_r \\ \end{pmatrix} \]
Tedious computations
  • Using the hidden Markov representation
Short-range correlation

Standard limit laws

Long-range correlation

Discrete mixture of continuous mixtures of standard limit laws

Three kinds of correlation

  • Exponential short-range correlation
  • Constant correlation
  • Polynomial long-range correlation

Large deviation principle

  • Short-range correlation: same behavior as a Markov chain
  • Constant correlation: non-convex rate function
  • Polynomial correlation: flat rate function

Law of large numbers

  • Short-range correlation: standard limit distribution
  • Constant correlation: discrete mixture of standard limit distributions
  • Polynomial correlation: continuous mixture of standard limit distributions