$\newcommand{\Ha}{\mathcal H} \newcommand{\p}[1]{\left(#1\right)} \newcommand{\tr}{\mathrm{tr}} \newcommand{\vect}[1]{\underline{#1}} \newcommand{\mR}{\mathcal{R}} \newcommand{\Lf}{\mathcal{L}} \newcommand{\mE}{\mathcal{E}} \newcommand{\mG}{\mathcal{G}} \newcommand{\mA}{\mathcal{A}} \newcommand{\mP}{\mathcal{P}} \newcommand{\mQ}[1]{\mathcal{Q}\p{#1}} \newcommand{\fLf}[1]{ \frac{\Lf\p{#1} } {\Lf\p{\mE^n}}} \newcommand{\Esp}[1]{\left<#1\right>} \newcommand{\bra}[1]{\left<#1\right|} \newcommand{\ket}[1]{\left|#1\right>} \newcommand{\as}[1]{ \overset{\mathrm{a.s}} {#1} } \newcommand{\conv}[1] {\underset{#1 \rightarrow +\infty} {\rightarrow} } \newcommand{\convAs}{ \as{\conv{n}} } \newcommand{\R}{\mathbb{R}} \newcommand{\Prob}{\mathbb{P}} \newcommand{\Dist}[1]{\overset{\mathcal{D}}{#1}} \newcommand{\convD} {\Dist{ \conv{n} } } \newcommand{\Lc}{L_c} \newcommand{\Lg}{L_{\Gamma}} \newcommand{\lNormal}[2]{\mathcal{N}_{#1,#2}} \newcommand{\iid}{\mathrm{i.i.d.}} \newcommand{\cpath}{\mapsto} \newcommand{\wcon}{\leftrightarrow} \newcommand{\scon}{\leftrightarrows} \newcommand {\card} {\mathrm{card}}$

# Matrix-correlated random variables: A statistical physics and signal processing duet

6 January 2015
Florian Angeletti
NITheP
Work in collaboration with Hugo Touchette, Patrice Abry and Eric Bertin.
• ## Thesis:

"Sums and extremes in statistical physics and signal processing"
Advisors: Eric Bertin and Patrice Abry, Physics laboratory of ENS Lyon.
• ## Postdoc

NITheP, Stellenbosch, South Africa, working with Hugo Touchette on large deviation theory
• ## Themes:

• Application of statistical physics to signal processing
• Extreme statistics
• Random vectors with matrix representation
• Large deviation functions

# At equilibrium:

• Microcanonical ensemble: $p(x_1, \dots, x_n ) = \mathrm{cst}$
• Canonical ensemble: $p(x_1, \dots, x_n ) \propto e^{-\beta \Ha(x_1,\dots, x_n)}$

# Out-of-equilibrium:

• Constant flow of heat or particles
• Dynamic description
• Correlation
• Stationary distribution?

# A simple and iconic out-of-equilibrium system

• Asymmetric: Particles only move from left to right
• Exclusion: at most one particle per site
• Creation rate $\alpha$
• Destruction rate $\beta$
How do we describe the stationary solution?
• Matrix product ansatz (Derrida et al., 1993)
• $p(x_1,\dots,x_n)= \frac{\bra{W}R(x_1)\dots R(x_n)\ket{V}}{\bra{W}(R(0)+R(1))^n\ket{V}}$
• matrix $R(x)$
• Long range correlation
• Similar solution for 1D diffusion-reaction system
• Formal similarity with DMRG

# Mathematical model

$p(x_1,\dots,x_n) \approx \mR(x_1) \cdots \mR(x_n)$

## Study the statistical properties of these models

• Hidden Markov model representation
• Signal processing application
• Topology induces correlation
• Large deviation functions
• Limit distributions for the sums
• Limit distributions for the extremes

## Then go back to physical models

$p(x_1,\dots,x_n) = \fLf{ \mR(x_1) \dots \mR(x_n)}$
• linear form $\Lf$: $\Lf(M) = \tr(\mA^T M)$
• $\mA$: $d \times d$ positive matrix
• $\mR(x)$: $d \times d$ positive matrix function
• structure matrix $\mE = \int_\R \mR(x) dx$
• probability density function matrix $\mR_{i,j}(x) = \mE_{i,j} \mP_{i,j} (x)$
• $d> 1$: Non-commutativity $\implies$ Correlation
• Product structure: $p(x_1,\dots,x_n) = \fLf{ \mR(x_1) \dots \mR(x_n)}$
• Moment matrix: $\mQ{q} = \int x^q R(x) dx$
$\Esp{X_k^p} = \frac{\Lf\p{\mE^{k-1} \mQ{p} \mE^{n-k}}}{\Lf\p{\mE^n} }$ $\Esp{X_k X_l} = \frac{\Lf\p{\mE^{k-1} \mQ{1} \mE^{l-k-1} \mQ{1} \mE^{n-l}}}{\Lf\p{\mE^n} }$ $\Esp{X_k X_l X_m} = \frac{\Lf\p{\mE^{k-1} \mQ{1} \mE^{l-k-1} \mQ{1} \mE^{m-l-1} \mQ{1} \mE^{n-m}}}{\Lf\p{\mE^n} }$ $\dots$
Translation invariance: $p(X_{k_1}=x_1,\dots, X_{k_l}= x_l ) = p(X_{c+k_1}=x_1,\dots, X_{c+k_l}= x_l )$
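The moment formulas above can be evaluated numerically. The following sketch (all numerical values are illustrative assumptions, not from the talk) takes $d = 2$, $\mA = I$ so that $\Lf(M) = \tr(M)$ and $[\mA^T, \mE] = 0$, and Gaussian $\mP_{i,j}$ with means $\mu_{i,j}$, for which $\mQ{1}_{i,j} = \mE_{i,j}\mu_{i,j}$:

```python
import numpy as np

# Illustrative sketch (numbers are assumptions, not from the talk): d = 2,
# A = I so that L(M) = tr(M) and [A^T, E] = 0, and Gaussian P_{ij} with
# means mu_{ij}, for which Q(1)_{ij} = E_{ij} * mu_{ij}.
d, n = 2, 50
E = np.array([[0.7, 0.3],
              [0.2, 0.8]])              # structure matrix
mu = np.array([[1.0, -1.0],
               [2.0,  1.0]])            # means of the P_{ij}

L = np.trace                            # linear form L(M) = tr(A^T M), A = I
Q1 = E * mu                             # moment matrix Q(1) = \int x R(x) dx
mpow = np.linalg.matrix_power
Z = L(mpow(E, n))                       # normalisation L(E^n)

def mean_Xk(k):
    # <X_k> = L(E^{k-1} Q(1) E^{n-k}) / L(E^n)
    return L(mpow(E, k - 1) @ Q1 @ mpow(E, n - k)) / Z

def moment_XkXl(k, l):
    # <X_k X_l> = L(E^{k-1} Q(1) E^{l-k-1} Q(1) E^{n-l}) / L(E^n), k < l
    return L(mpow(E, k - 1) @ Q1 @ mpow(E, l - k - 1) @ Q1 @ mpow(E, n - l)) / Z

# [A^T, E] = 0 gives translation invariance: the moments depend only on gaps.
print(mean_Xk(1), mean_Xk(25))
print(moment_XkXl(3, 10), moment_XkXl(5, 12))
```

Since $\mA = I$ and the trace is cyclic, the printed pairs coincide, which is exactly the translation invariance stated above.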

# Sufficient condition for translation invariance

$[\mA^T,\mE] = \mA^T \mE - \mE \mA^T = 0$ $\forall M,\quad \Lf(M\mE) = \Lf(\mE M)$
• $p(X_k=x) = \fLf{\mR(x)\mE^{n-1}}$
• $p(X_k=x, X_l = y) = \fLf{\mR(x)\mE^{l-k-1}\mR(y)\mE^{n-|l-k|-1}}$
• $p(X_k=x,X_l=y, X_m = z ) = \fLf{\mR(x) \mE^{l-k-1}\mR(y)\mE^{m-l-1} \mR(z) \mE^{n-|m-k|-1}}$
$p(x_1,\dots,x_n) = \fLf{ \mR(x_1) \dots \mR(x_n)}$ How do we generate a random vector $X$ for a given triple $(\mA,\mE,\mP)$?
• Expand the matrix product $p(x_1,\dots,x_n)= \frac 1 {\Lf\p{\mE^n} } \sum_{\Gamma \in \{1,\dots,d\}^{n+1}} \mA_{\Gamma_1,\Gamma_{n+1}} \mE_{\Gamma_1,\Gamma_2} \mP_{\Gamma_1,\Gamma_2}(x_1) \dots \mE_{\Gamma_n,\Gamma_{n+1}} \mP_{\Gamma_n,\Gamma_{n+1}}(x_n)$ $p(x_1,\dots,x_n) = \sum_{\Gamma} P(\Gamma) P(\vect X|\Gamma)$
• $\Gamma$, hidden Markov chain

• Markov chain $\Gamma$
• Observable $\vect X= (X_1, \dots, X_n)$
• $( X_k | \Gamma)$ is distributed according to the pdf $p( X_k | \Gamma_k, \Gamma_{k+1})$

# Hidden Markov Chain

$p( \Gamma ) = \frac{\mA_{\Gamma_1,\Gamma_{n+1}}}{\Lf\p{\mE^n}} \prod_k \mE_{\Gamma_k, \Gamma_{k+1} }$

# Conditional pdf $(X|\Gamma)$

$p( X_k=x | \Gamma) = \mP_{\Gamma_k,\Gamma_{k+1} }(x)$
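The two-step generation (sample the hidden chain, then sample each $X_k$ conditionally) can be sketched as follows for a convenient special case. This is an assumption for illustration, not the general algorithm: with $\mE$ row-stochastic and $\mA$ the all-ones matrix, the hidden chain $\Gamma$ reduces to a homogeneous Markov chain with transition matrix $\mE$ and uniform initial state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation sketch for a convenient special case (an assumption, not the
# general algorithm): E row-stochastic and A = all-ones, so the hidden chain
# Gamma is a homogeneous Markov chain with transition matrix E and uniform
# initial state.
n = 10_000
E = np.array([[0.9, 0.1],
              [0.2, 0.8]])
mu = np.array([[1.0, -1.0],
               [2.0,  1.0]])            # P_{ij}: Gaussian, mean mu_{ij}, unit variance

# 1. Sample the hidden chain Gamma_1, ..., Gamma_{n+1}.
gamma = np.empty(n + 1, dtype=int)
gamma[0] = rng.integers(2)
for k in range(n):
    gamma[k + 1] = rng.choice(2, p=E[gamma[k]])

# 2. Sample X_k from P_{Gamma_k, Gamma_{k+1}} conditionally on the chain.
X = rng.normal(loc=mu[gamma[:-1], gamma[1:]], scale=1.0)

# The empirical mean approaches sum_ij pi_i E_ij mu_ij = 14/15, where
# pi = (2/3, 1/3) is the stationary distribution of E.
print(X.mean())
```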
• $\mE$ non-stochastic $\implies$ non-homogeneous Markov chain
• Specific non-homogeneous Hidden Markov model:
• Hidden Markov Model $\nRightarrow$ Matrix representation
• Generation of random vector with prescribed correlation and marginal distribution:
• Matrix representation: Choice of $(\mA,\mE,\mP)$
• Hidden Markov Model: Numerical generation
• Higher-order dependency structure: correlation of squares
(Figure: sample realization, marginal distribution, correlation, and correlation of squares, compared with the prescribed targets)

# Matrix representation

• Algebraic properties
• Statistical properties computation

# Hidden Markov Model

• $2$-layer model: correlated layer $+$ independent layer
• Efficient synthesis
$\Esp{X_k X_l} = \fLf{ \mE^{k-1} \mQ{1} \mE^{l-k-1} \mQ{1} \mE^{n-l} }$
• The dependency structure of $X$ depends on the behavior of $\mE^n$
• $\lambda_m$: eigenvalues of $\mE$ ordered by their real parts, $\Re (\lambda_1) > \Re (\lambda_2) > \dots > \Re (\lambda_r)$
• $J_{m,s}$: Jordan block associated with the eigenvalue $\lambda_m$
$\mE = B^{-1} \begin{pmatrix} J_{1,1} & 0 & \cdots & \cdots & 0 \\ 0 & \ddots & \ddots & & \vdots \\ \vdots & \ddots & J_{m,s} & \ddots& \vdots \\ \vdots & & \ddots & \ddots& 0 \\ 0 & \cdots & \cdots & 0 & J_{r,p} \end{pmatrix} B, \quad J_{m,s} = \begin{pmatrix} \lambda_m& 1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & \ddots & 1 \\ 0 & \cdots & \cdots & 0 & \lambda_m \end{pmatrix}$

# Case 1: Short-range correlation

• More than one distinct eigenvalue $\lambda_m$: Short-range exponential correlation
$\Esp{X_k X_l} - \Esp{X_k}\Esp{X_l} \approx \sum_m \alpha_m \p{\frac{\lambda_m}{\lambda_1}}^{|k-l|}$
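This exponential decay can be checked exactly from the matrix moment formulas. The sketch below (illustrative $2\times 2$ values; $\mA = I$, $\Lf = \tr$, $\mE$ row-stochastic so $\lambda_1 = 1$) computes the covariance at increasing gaps and recovers the ratio $\lambda_2/\lambda_1$:

```python
import numpy as np

# Exact covariance from the matrix formulas for an irreducible E
# (illustrative values; A = I, L = trace, E row-stochastic so lambda_1 = 1).
n = 200
E = np.array([[0.9, 0.1],
              [0.2, 0.8]])              # eigenvalues 1 and 0.7
mu = np.array([[1.0, -1.0],
               [2.0,  1.0]])
Q1 = E * mu
L = np.trace
mpow = np.linalg.matrix_power
Z = L(mpow(E, n))

def cov(k, l):
    m1 = L(mpow(E, k - 1) @ Q1 @ mpow(E, n - k)) / Z
    m2 = L(mpow(E, l - 1) @ Q1 @ mpow(E, n - l)) / Z
    c = L(mpow(E, k - 1) @ Q1 @ mpow(E, l - k - 1) @ Q1 @ mpow(E, n - l)) / Z
    return c - m1 * m2

# Ratios of covariances at gaps m+1 and m recover lambda_2 / lambda_1 = 0.7.
ratios = [cov(10, 11 + m) / cov(10, 10 + m) for m in range(1, 6)]
print(ratios)
```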

# Case 2: Constant correlation

• More than one block $J_{1,k}$: Constant correlation term

# Case 3: Long-range correlation

• At least one block $J_{1,k}$ with size $p > 1$: Polynomial long range correlation
• $\Esp{X_k X_l} - \Esp{X_k}\Esp{X_l} \approx P(\frac{k}{n}, \frac{k-l}{n}, \frac{l}{n} ), \quad P \in \R[X,Y,Z]$
• $\mE$ irreducible $\iff$ $\Gamma$ ergodic
• $\mE$ irreducible $\iff$ $G(\mE)$ is strongly connected
• Short-range correlation
• Disconnected components:
$\mE = \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0& & 1 \end{pmatrix}$
• The chain $\Gamma$ is trapped inside its starting state
• Constant correlation:
• $\Esp{X_k X_l }- \Esp{X_k}\Esp{X_l} = \frac{\Lf\p{\mQ{1}^2}}{\Lf\p{\mE}} - \p{\frac{\Lf\p{\mQ{1}}}{\Lf\p{\mE}}}^2$
• Irreversible transitions:
$\mE = \begin{pmatrix} 1 & \epsilon & 0 \\ & \ddots & \ddots \\ 0 & & 1 \end{pmatrix}$
• The chain $\Gamma$ can only stay in its current state or jump to the next
• All chains with a non-zero probability and the same starting and ending points are equiprobable
• Polynomial correlation: $\Esp{X_k X_l } \approx \sum_{r+s+t=d-1} c_{r,s,t} \p{\frac k n}^{r} \p{\frac {l-k} n}^{s} \p{1- \frac l n}^{t}$
$\mE = \begin{pmatrix} I_1 & * & T_{k,l} \\ & \ddots & * \\ 0 & & I_r \\ \end{pmatrix}$
• Irreducible blocks $I_k$
• Irreversible transitions $T_{k,l}$
• Correlation: mixture of short-range, constant and long-range correlations
• Strongly connected component of size $s > 1$ $\implies$ short-range correlation
• More than one weakly connected component $\implies$ constant correlation
• More than one strongly connected component $\implies$ polynomial correlation
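The long-range case can be checked numerically from the moment formulas. In the sketch below (an illustrative $2\times 2$ example, not from the talk), $\mE$ is linear irreversible with a single transition $1 \to 2$ and $\Lf(M)$ is the sum of the entries ($\mA$ all-ones); the covariance between $X_k$ and $X_l$ then depends on $k/n$ and $l/n$ only, and remains $O(1)$ at separations of order $n$:

```python
import numpy as np

# Long-range case (illustrative 2x2 example): linear irreversible E with
# L(M) = sum of entries (A = all-ones). The covariance depends only on k/n
# and l/n, and stays O(1) at separations of order n.
n = 1000
E = np.array([[1.0, 1.0],
              [0.0, 1.0]])              # one irreversible transition 1 -> 2
mu = np.array([[1.0, 0.0],
               [0.0, -1.0]])            # means of the P_{ij}
Q1 = E * mu
L = lambda M: M.sum()
mpow = np.linalg.matrix_power
Z = L(mpow(E, n))

def cov(k, l):
    m1 = L(mpow(E, k - 1) @ Q1 @ mpow(E, n - k)) / Z
    m2 = L(mpow(E, l - 1) @ Q1 @ mpow(E, n - l)) / Z
    c = L(mpow(E, k - 1) @ Q1 @ mpow(E, l - k - 1) @ Q1 @ mpow(E, n - l)) / Z
    return c - m1 * m2

# Separation n/2, far beyond any exponential correlation length:
print(cov(250, 750))
```

Contrast with the irreducible case, where the covariance at a fixed fraction of $n$ would be exponentially small.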

# Sum

$S(\vect X) = \frac{1} {n} \sum_{i=1}^n X_i$
Correlated random variables
• Law of large numbers?
• Central limit theorem?
• Large deviations?
Two paths:
• Hidden Markov chain representation
• Matrix representation
• $\nu_{i,j}$: fraction of $(i\rightarrow j)$-transitions: $\vect{\nu} = \p{ \frac{ \card\{ k \mid \Gamma_k= i, \Gamma_{k+1}= j\}}{n} }_{i,j}$
• $S(\vect X|\Gamma)$: sum of sums of $\iid$ random variables: $S(\vect X|\Gamma) = \sum_{i,j} \sum_{k=1}^{n \nu_{i,j}} (X_k | i, j ) \equiv S(\vect X | \vect \nu)$
• Standard convergence theorems (law of large numbers, central limit theorem) apply conditionally on $\vect \nu$
$p( S(\vect X) = s ) = \sum_{\vect \nu} p(\vect \nu) p( S(\vect X | \vect \nu ) = s )$

# Limit distribution for $S(\vect X|\vect \nu)$

$p( S(\vect X | \vect \nu ) = s )$
+

# Limit distribution for $\vect \nu$

$p( \vect \nu )$
$\Downarrow$

# Limit distribution for $S(\vect X)$

$p( S(\vect X) = s )$
• How is $\vect \nu$ distributed?
• Difficulty: $\Gamma$ non-homogeneous Markov chain.
Three important subclasses:
• Irreducible $\mE$ (short-range correlation)
• $\vect \nu$ converges towards a Dirac distribution
• Identity $\mE$ (constant correlation)
• $\vect \nu$ converges towards a discrete mixture of Dirac distributions
• Linear irreversible $\mE$ (long-range polynomial correlation)
• $\vect \nu$ converges towards a uniform distribution on a $d$-simplex
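The concentration of $\vect\nu$ in the irreducible case can be seen in simulation. The sketch below (illustrative special case, assumed for simplicity: $\mE$ row-stochastic so that $\Gamma$ is a plain homogeneous Markov chain) estimates the empirical transition fractions and compares them with their Dirac limit $\nu_{i,j} \to \pi_i \mE_{i,j}$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Empirical transition fractions nu_{ij} for an irreducible, row-stochastic E
# (an illustrative special case in which Gamma is a plain Markov chain):
# nu concentrates on the single point nu_{ij} = pi_i E_{ij}.
E = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2 / 3, 1 / 3])            # stationary distribution of E

def empirical_nu(n):
    g = np.empty(n + 1, dtype=int)
    g[0] = rng.integers(2)
    for k in range(n):
        g[k + 1] = rng.choice(2, p=E[g[k]])
    nu = np.zeros((2, 2))
    np.add.at(nu, (g[:-1], g[1:]), 1.0)  # count each (i -> j) transition
    return nu / n

print(empirical_nu(20_000))              # close to pi[:, None] * E
```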
• Irreducible $\mE$ (short-range correlations):
• Standard limit laws
• Identity $\mE$ (constant correlation):
• Discrete mixture of standard limit laws
• Linear irreversible $\mE$ (long-range correlation):
• Continuous mixture of limit laws
• Combination of three core behaviors:
• Irreducible blocks: fast convergence to the stationary state: Dirac distribution
• Separated components: discrete mixture
• Irreversible transitions: continuous mixture
• Limit laws:
• Discrete mixture of continuous mixtures of standard limit distributions

# Sum

$S(\vect X) = \frac{1} {n} \sum_{i=1}^n X_i$
• Law of large numbers: existence of a concentration point
• Central limit theorem: fluctuations around this concentration point
• Large deviations: fluctuations far from the concentration point

# Large deviation principle

$P( S(\vect X) = s ) \approx e^{-n I(s) }$
Do we have a large deviation principle?
• Generating function $g_n(w) = \Esp{e^{n w S(\vect X)}}$
• Scaled cumulant generating function $\Phi(w) = \lim_{n \rightarrow \infty} \frac{\ln g_n(w)}{n}$

# Gärtner-Ellis Theorem

If $\Phi(w)$ exists and is differentiable, $I(s)$ exists and is $I(s) = \sup_{w} \{w s - \Phi(w) \}$
• For $\iid$ variables, $\Phi(w) = g(w)$
• $g(w)$: cumulant generating function of a single variable

# Large deviation principle:

$I(s) = \sup_{w} \{w s - g(w) \}$
• matrix generating function: $\mG(w) = \int_\R \mR(x) e^{w x} dx$
• $\Phi(w) = \ln \lambda_1(w)$
• $\lambda_1(w)$ dominant eigenvalue of $\mG(w)$

# Large deviation principle for short-range correlation:

• Short-range correlation: $\lambda_1(w)$ is differentiable near $0$
$I(s) = \sup_{w} \{w s - \ln \lambda_1(w) \}$
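The short-range rate function can be computed numerically from $\mG(w)$. In the sketch below (an illustrative $2\times 2$ example; the convention $\mG(w) = \int \mR(x) e^{wx}dx$ is used), the $\mP_{i,j}$ are Gaussian with means $\mu_{i,j}$ and unit variance, so $\mG(w)_{i,j} = \mE_{i,j} e^{w\mu_{i,j} + w^2/2}$, and the Legendre-Fenchel transform is taken on a grid:

```python
import numpy as np

# Numerical Gartner-Ellis sketch for the short-range case (illustrative 2x2
# example). With Gaussian P_{ij} of mean mu_{ij} and unit variance, the
# matrix generating function is G(w)_{ij} = E_{ij} exp(w mu_{ij} + w^2 / 2).
E = np.array([[0.9, 0.1],
              [0.2, 0.8]])
mu = np.array([[1.0, -1.0],
               [2.0,  1.0]])

def Phi(w):
    # Scaled cumulant generating function: log of the dominant eigenvalue
    G = E * np.exp(w * mu + 0.5 * w ** 2)
    return np.log(np.max(np.linalg.eigvals(G).real))

def I(s, ws=np.linspace(-3.0, 3.0, 2001)):
    # Legendre-Fenchel transform I(s) = sup_w { w s - Phi(w) } on a grid
    return max(w * s - Phi(w) for w in ws)

s_star = (Phi(1e-4) - Phi(-1e-4)) / 2e-4   # Phi'(0): law-of-large-numbers value
print(s_star, I(s_star))                    # I vanishes at the concentration point
```

Since $\Phi$ is smooth here ($\lambda_1(w)$ is differentiable), the grid supremum is a reasonable stand-in for the exact Legendre transform.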

# Long-range or constant correlation

$\Phi(w)$ is not differentiable at $0$.
• Constant correlation: Non-convex rate function
• Polynomial correlation: Rate function with a flat branch
• Three kinds of correlation:
• Exponential short-range correlation
• Constant correlation
• Polynomial long-range correlation
• Extension of the law of large numbers and the central limit theorem:
• Long-range correlation: continuous and discrete mixtures of standard limit laws
• Large deviation principle:
• Long-range correlation: non-strictly convex rate function

# Perspective

• Extreme statistics
• Physical model
• Infinite dimension, higher tensor order