$ \newcommand{\Ha}{\mathcal H} \newcommand{\p}[1]{\left(#1\right)} \newcommand{\tr}{\mathrm{tr}} \newcommand{\vect}[1]{\underline{#1}} \newcommand{\mR}{\mathcal{R}} \newcommand{\Lf}{\mathcal{L}} \newcommand{\mE}{\mathcal{E}} \newcommand{\mG}{\mathcal{G}} \newcommand{\mA}{\mathcal{A}} \newcommand{\mP}{\mathcal{P}} \newcommand{\mQ}[1]{\mathcal{Q}\p{#1}} \newcommand{\fLf}[1]{ \frac{\Lf\p{#1} } {\Lf\p{\mE^n}}} \newcommand{\Esp}[1]{\left<#1\right>} \newcommand{\bra}[1]{\left<#1\right|} \newcommand{\ket}[1]{\left|#1\right>} \newcommand{\as}[1]{ \overset{\mathrm{a.s}} {#1} } \newcommand{\conv}[1] {\underset{#1 \rightarrow +\infty} {\rightarrow} } \newcommand{\convAs}{ \as{\conv{n}} } \newcommand{\R}{\mathbb{R}} \newcommand{\Prob}{\mathbb{P}} \newcommand{\Dist}[1]{\overset{\mathcal{D}}{#1}} \newcommand{\convD} {\Dist{ \conv{n} } } \newcommand{\Lc}{L_c} \newcommand{\Lg}{L_{\Gamma}} \newcommand{\lNormal}[2]{\mathcal{N}_{#1,#2}} \newcommand{\iid}{\mathrm{i.i.d.}} \newcommand{\cpath}{\mapsto} \newcommand{\wcon}{\leftrightarrow} \newcommand{\scon}{\leftrightarrows} \newcommand {\card} {\mathrm{card}} $

Matrix-correlated random variables:
A statistical physics and signal processing duet


6 January 2015
Florian Angeletti
NITheP
Work in collaboration with Hugo Touchette, Patrice Abry and Eric Bertin.
  • Thesis:

    "Sums and extremes in statistical physics and signal processing"
    Advisors: Eric Bertin and Patrice Abry, Physics laboratory of ENS Lyon.
  • Postdoc

    NITheP, Stellenbosch, South Africa, working with Hugo Touchette on large deviation theory
  • Themes:

    • Application of statistical physics to signal processing
    • Extreme statistics
    • Random vectors with matrix representation
    • Large deviation functions

At equilibrium:

  • Microcanonical ensemble: $ p(x_1, \dots, x_n ) = \mathrm{cst} $
  • Canonical ensemble: $ p(x_1, \dots, x_n ) \propto e^{-\beta \Ha(x_1,\dots, x_n)} $

Out-of-equilibrium:

  • Constant flow of heat or particles
  • Dynamic description
  • Correlation
  • Stationary distribution?

A simple and iconic out-of-equilibrium system: the asymmetric simple exclusion process

  • Asymmetric: Particles only move from left to right
  • Exclusion: One particle per site
  • Creation rate $ \alpha $
  • Destruction rate $ \beta $
How do we describe the stationary solution?
  • Matrix product ansatz (Derrida et al. 1993)
  • \[ p(x_1,\dots,x_n)= \frac{\bra{W}R(x_1)\dots R(x_n)\ket{V}}{\bra{W}(R(0)+R(1))^n\ket{V}} \]
  • matrix $ R(x) $
  • Long range correlation
  • Similar solution for 1D diffusion-reaction system
  • Formal similarity with DMRG

Mathematical model

\[ p(x_1,\dots,x_n) \approx \mR(x_1) \cdots \mR(x_n) \]

Study the statistical properties of these models

  • Hidden Markov model representation
  • Signal processing application
  • Topology induces correlation
  • Large deviation functions
  • Limit distributions for the sums
  • Limit distributions for the extremes

Then go back to physical models

\[ p(x_1,\dots,x_n) = \fLf{ \mR(x_1) \dots \mR(x_n)} \]
  • linear form $\Lf$: $ \Lf(M) = \tr(\mA^T M) $
    • $ \mA $: $ d \times d $ positive matrix
  • $ \mR(x) $: $ d \times d$ positive matrix function
    • structure matrix \[ \mE = \int_\R \mR(x) dx \]
    • probability density function matrix \[ \mR_{i,j}(x) = \mE_{i,j} \mP_{i,j} (x) \]
  • $ d> 1 $: Non-commutativity $\implies$ Correlation
  • Product structure: $p(x_1,\dots,x_n) = \fLf{ \mR(x_1) \dots \mR(x_n)}$
  • Moment matrix: $ \mQ{q} = \int x^q \mR(x) dx $
\[ \Esp{X_k^p} = \frac{\Lf\p{\mE^{k-1} \mQ{p} \mE^{n-k}}}{\Lf\p{\mE^n} } \] \[ \Esp{X_k X_l} = \frac{\Lf\p{\mE^{k-1} \mQ{1} \mE^{l-k-1} \mQ{1} \mE^{n-l}}}{\Lf\p{\mE^n} } \] \[ \Esp{X_k X_l X_m} = \frac{\Lf\p{\mE^{k-1} \mQ{1} \mE^{l-k-1} \mQ{1} \mE^{m-l-1} \mQ{1} \mE^{n-m}}}{\Lf\p{\mE^n} } \] \[ \dots \]
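These moment formulas translate directly into code. A minimal numerical sketch (all parameters below, $d = 2$, $\mA = I$, the structure matrix $\mE$ and the means $\mu_{ij}$ of the $\mP_{ij}$, are illustrative choices, not values from the talk):

```python
import numpy as np

# Toy parameters (illustrative, not from the talk)
d, n = 2, 50
E = np.array([[1.0, 0.5],
              [0.5, 1.0]])          # structure matrix (positive)
mu = np.array([[1.0, 2.0],
               [2.0, 3.0]])         # assumed first moment of each P_ij
A = np.eye(d)                       # A = I commutes with E

L = lambda M: np.trace(A.T @ M)     # linear form L(M) = tr(A^T M)
Q1 = E * mu                         # moment matrix Q(1)_ij = E_ij * <x>_{P_ij}

def mpow(M, k):
    return np.linalg.matrix_power(M, k)

Z = L(mpow(E, n))                   # normalisation L(E^n)

def mean(k):
    """<X_k> = L(E^{k-1} Q(1) E^{n-k}) / L(E^n)"""
    return L(mpow(E, k - 1) @ Q1 @ mpow(E, n - k)) / Z

def corr(k, l):
    """<X_k X_l> = L(E^{k-1} Q(1) E^{l-k-1} Q(1) E^{n-l}) / L(E^n), k < l"""
    return L(mpow(E, k - 1) @ Q1 @ mpow(E, l - k - 1) @ Q1 @ mpow(E, n - l)) / Z

print(mean(1), mean(25))              # equal: translation invariance holds
print(corr(1, 2) - mean(1) * mean(2)) # covariance, decays with the gap
```

With $\mA = I$, $\Lf$ is the trace, so $[\mA^T,\mE]=0$ holds and the two printed means coincide; the covariance decays as the gap $|l-k|$ grows.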
Translation invariance: \[ p(X_{k_1}=x_1,\dots, X_{k_l}= x_l ) = p(X_{c+k_1}=x_1,\dots, X_{c+k_l}= x_l ) \]

Sufficient condition

\[ [\mA^T,\mE] = \mA^T \mE - \mE \mA^T = 0 \] \[ \forall M,\quad \Lf(M\mE) = \Lf(\mE M) \]
  • $ p(X_k=x) = \fLf{\mR(x)\mE^{n-1}}$
  • $ p(X_k=x, X_l = y) = \fLf{\mR(x)\mE^{l-k-1}\mR(y)\mE^{n-|l-k|-1}} $
  • $ p(X_k=x,X_l=y, X_m = z ) = \fLf{\mR(x) \mE^{l-k-1}\mR(y)\mE^{m-l-1} \mR(z) \mE^{n-|m-k|-1}}$
\[ p(x_1,\dots,x_n) = \fLf{ \mR(x_1) \dots \mR(x_n)} \] How do we generate a random vector $X$ for a given triple $(\mA,\mE,\mP)$?
  • Expand the matrix product \[ p(x_1,\dots,x_n)= \frac 1 {\Lf\p{\mE^n} } \sum_{\Gamma \in \{1,\dots,d\}^{n+1}} \mA_{\Gamma_1,\Gamma_{n+1}} \mE_{\Gamma_1,\Gamma_2} \mP_{\Gamma_1,\Gamma_2}(x_1) \dots \mE_{\Gamma_n,\Gamma_{n+1}} \mP_{\Gamma_n,\Gamma_{n+1}}(x_n) \] \[ p(x_1,\dots,x_n) = \sum_{\Gamma} P(\Gamma) P(\vect X|\Gamma) \]
  • $\Gamma$, hidden Markov chain

  • Markov chain $\Gamma$
  • Observables $\vect X = (X_1, \dots, X_n)$
  • $( X_k | \Gamma) $ is distributed according to the pdf $ p( X_k | \Gamma_k, \Gamma_{k+1}) $

Hidden Markov Chain

\[ p( \Gamma ) = \frac{\mA_{\Gamma_1,\Gamma_{n+1}}}{\Lf\p{\mE^n}} \prod_k \mE_{\Gamma_k, \Gamma_{k+1} } \]

Conditional pdf $(X|\Gamma)$

\[ p( X_k=x | \Gamma) = \mP_{\Gamma_k,\Gamma_{k+1} }(x) \]
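One way to sample this chain is sequentially: marginalising over the future reduces $p(\Gamma)$ to explicit one-step transition probabilities, which depend on $n-k$ and so make the chain non-homogeneous. A minimal sketch with toy parameters (exponential $\mP_{ij}$ with hypothetical means $\mu_{ij}$, $\mA = I$; none of these values come from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: d = 2, exponential P_ij with mean mu[i, j]
d, n = 2, 200
E = np.array([[1.0, 0.5],
              [0.5, 1.0]])
A = np.eye(d)
mu = np.array([[1.0, 2.0],
               [2.0, 3.0]])

En = [np.linalg.matrix_power(E, k) for k in range(n + 1)]  # E^0 .. E^n

def sample_gamma():
    """Sample Gamma_1..Gamma_{n+1} sequentially:
    p(Gamma_1 = i)           ~ sum_m A[i, m] (E^n)[i, m]
    p(Gamma_{k+1} = j | past) ~ E[g_k, j] * sum_m (E^{n-k})[j, m] A[g_1, m]
    (the dependence on n - k makes the chain non-homogeneous)."""
    w = (A * En[n]).sum(axis=1)
    g = [rng.choice(d, p=w / w.sum())]
    for k in range(1, n + 1):
        w = E[g[-1]] * (En[n - k] @ A[g[0]])
        g.append(rng.choice(d, p=w / w.sum()))
    return g

def sample_x():
    g = sample_gamma()
    # X_k | Gamma ~ P_{Gamma_k, Gamma_{k+1}}, here exponential
    return np.array([rng.exponential(mu[g[k], g[k + 1]]) for k in range(n)])

x = sample_x()
print(x.mean())   # close to the mean computed from the matrix formulas
```

The per-step weights come from summing the remaining matrix product, so the sampler is exact, not approximate.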
  • $\mE$ non-stochastic $\implies$ Non-homogeneous Markov chain
  • Specific non-homogeneous Hidden Markov model:
    • Hidden Markov Model $\nRightarrow$ Matrix representation
  • Generation of random vector with prescribed correlation and marginal distribution:
    • Matrix representation: Choice of $(\mA,\mE,\mP)$
    • Hidden Markov Model: Numerical generation
  • Higher-order dependency structure: correlation of squares
[Figure: realizations compared with the prescribed marginal, correlation, and correlation of squares]

Matrix representation

  • Algebraic properties
  • Statistical properties computation

Hidden Markov Model

  • $2$-layer model: correlated layer $+$ independent layer
  • Efficient synthesis
\[ \Esp{X_k X_l} = \fLf{ \mE^{k-1} \mQ{1} \mE^{l-k-1} \mQ{1} \mE^{n-l} } \]
  • The dependency structure of $X$ depends on the behavior of $\mE^n$
  • $\lambda_m$: eigenvalues of $\mE$ ordered by their real parts, $\Re (\lambda_1) > \Re (\lambda_2) > \dots > \Re (\lambda_r)$
  • $J_{m,s}$: Jordan block associated with the eigenvalue $\lambda_m$
\[ \mE = B^{-1} \begin{pmatrix} J_{1,1} & 0 & \cdots & \cdots & 0 \\ 0 & \ddots & \ddots & & \vdots \\ \vdots & \ddots & J_{m,s} & \ddots& \vdots \\ \vdots & & \ddots & \ddots& 0 \\ 0 & \cdots & \cdots & 0 & J_{r,p} \end{pmatrix} B, \quad J_{m,s} = \begin{pmatrix} \lambda_m& 1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & \ddots & 1 \\ 0 & \cdots & \cdots & 0 & \lambda_m \end{pmatrix} \]

Case 1: Short-range correlation

  • More than one distinct eigenvalue $\lambda_m$: Short-range exponential correlation
\[ \Esp{X_k X_l} - \Esp{X_k}\Esp{X_l} \approx \sum_{m \ge 2} \alpha_m \p{\frac{\lambda_m}{\lambda_1}}^{|k-l|} \]
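This geometric decay can be verified from the exact moment formulas. A sketch with an illustrative irreducible $2 \times 2$ matrix $\mE$ (eigenvalues $3$ and $1$, so the predicted ratio between successive covariances is $\lambda_2/\lambda_1 = 1/3$; all parameters are toy choices):

```python
import numpy as np

# Toy irreducible E with eigenvalues 3 and 1; A = I, illustrative mu
d, n = 2, 60
E = np.array([[2.0, 1.0],
              [1.0, 2.0]])
A = np.eye(d)
mu = np.array([[1.0, 2.0],
               [2.0, 3.0]])
Q1 = E * mu                         # first-moment matrix

L = lambda M: np.trace(A.T @ M)
P = np.linalg.matrix_power
Z = L(P(E, n))

def cov(k, l):
    m = lambda j: L(P(E, j - 1) @ Q1 @ P(E, n - j)) / Z
    c = L(P(E, k - 1) @ Q1 @ P(E, l - k - 1) @ Q1 @ P(E, n - l)) / Z
    return c - m(k) * m(l)

# Ratio of covariances at successive gaps: should equal lambda_2 / lambda_1
ratios = [cov(10, 10 + g + 1) / cov(10, 10 + g) for g in range(1, 6)]
print(ratios)   # each close to 1/3
```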

Case 2: Constant correlation

  • More than one Jordan block $J_{1,s}$ associated with $\lambda_1$: Constant correlation term

Case 3: Long-range correlation

  • At least one block $J_{1,s}$ of size $p > 1$: Polynomial long-range correlation
    • $\Esp{X_k X_l} - \Esp{X_k}\Esp{X_l} \approx P(\frac{k}{n}, \frac{l-k}{n}, \frac{l}{n} ), \quad P \in \R[X,Y,Z]$
  • $\mE$ irreducible $\iff \Gamma$ ergodic
  • Irreducible matrix $\mE \iff G(\mE) $ is strongly connected
  • Short-range correlation
  • Disconnected components:
    \[ \mE = \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0& & 1 \end{pmatrix} \]
  • The chain $\Gamma$ is trapped inside its starting state
  • Constant correlation:
    • $ \Esp{X_k X_l }- \Esp{X_k}\Esp{X_l} = \frac{\Lf\p{\mQ{1}^2} - \Lf\p{\mQ{1}}^2}{\Lf\p{\mE}}$
  • Irreversible transitions:
    \[ \mE = \begin{pmatrix} 1 & \epsilon & & 0 \\ & \ddots & \ddots & \\ & & \ddots & \epsilon \\ 0 & & & 1 \end{pmatrix} \]
  • The chain $\Gamma$ can only stay in its current state or jump to the next
  • All chains with a non-zero probability and the same starting and ending points are equiprobable
  • Polynomial correlation: \[ \Esp{X_k X_l } \approx \sum_{r+s+t=d-1} c_{r,s,t} \p{\frac k n}^{r} \p{\frac {l-k} n}^{s} \p{1- \frac l n}^{t} \]
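The smallest irreversible example, $d = 2$, makes the position dependence visible: with the illustrative choice $\mu_{11} = 0$ below, the covariance is approximately $(k/n)(1-l/n)\,\mu_{22}^2$, so it depends on $k/n$ and $l/n$ separately rather than on the gap $l-k$ alone. (All parameters are hypothetical, chosen for the demonstration; $\Lf(M) = M_{12}$ forces the chain to start in state $1$ and end in state $2$.)

```python
import numpy as np

# Linear irreversible E in d = 2 (toy parameters)
eps, n = 0.1, 400
E = np.array([[1.0, eps],
              [0.0, 1.0]])
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])          # L(M) = M_{12}
mu = np.array([[0.0, 1.0],
               [0.0, 2.0]])         # assumed means of P_ij (P_21 never used)
Q1 = E * mu

L = lambda M: np.trace(A.T @ M)     # = M[0, 1]
P = np.linalg.matrix_power
Z = L(P(E, n))

def mean(k):
    return L(P(E, k - 1) @ Q1 @ P(E, n - k)) / Z

def cov(k, l):
    c = L(P(E, k - 1) @ Q1 @ P(E, l - k - 1) @ Q1 @ P(E, n - l)) / Z
    return c - mean(k) * mean(l)

# Same gap l - k = 40 at different positions: the covariance differs
# (long range) and is close to (k/n)(1 - l/n) mu_22^2 for this choice.
for k in (40, 160, 280):
    l = k + 40
    print(k / n, cov(k, l), (k / n) * (1 - l / n) * mu[1, 1] ** 2)
```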
\[ \mE = \begin{pmatrix} I_1 & * & T_{k,l} \\ & \ddots & * \\ 0 & & I_r \\ \end{pmatrix} \]
  • Irreducible blocks $I_k$
  • Irreversible transitions $T_{k,l}$
  • Correlation: Mixture of short-range, constant and long-range correlations
  • Short-range correlation $\implies$ Strongly connected component of size $s > 1$
  • Constant correlation $\implies$ More than one weakly connected component
  • Polynomial correlation $\implies$ More than one strongly connected component

Necessary but not sufficient conditions

Sum

\[ S(\vect X) = \frac{1} {n} \sum_{i=1}^n X_i \]
Correlated random variables
  • Law of large numbers?
  • Central limit theorem?
  • Large deviations?
Two paths:
  • Hidden Markov chain representation
  • Matrix representation
  • $\nu_{i,j}$: fraction of $(i\rightarrow j)$-transitions: \[\vect{\nu} = \p{ \frac{ \card\{ k \mid \Gamma_k= i, \Gamma_{k+1}= j\}}{n} }_{i,j} \]
  • $S(\vect X|\Gamma)$: sum of sums of $\iid$ random variables: \[ S(\vect X|\Gamma) = \sum_{i,j} \sum_{k=1}^{n \nu_{i,j}} (X_k | i, j ) \equiv S(\vect X | \vect \nu) \]
  • Standard convergence theorems (law of large numbers, central limit theorem) apply conditionally on $\vect \nu$
\[ p( S(\vect X) = s ) = \sum_{\vect \nu} p(\vect \nu) p( S(\vect X | \vect \nu ) = s ) \]

Limit distribution for $S(\vect X|\vect \nu)$

\[ p( S(\vect X | \vect \nu ) = s ) \]
+

Limit distribution for $\vect \nu$

\[ p( \vect \nu ) \]
\[ \Downarrow \]

Limit distribution for $S(\vect X)$

\[ p( S(\vect X) = s ) \]
  • How is $\vect \nu$ distributed?
  • Difficulty: $\Gamma$ is a non-homogeneous Markov chain.
Three important subclasses:
  • Irreducible $\mE$ (short-range correlation)
    • $\vect \nu$ converges towards a Dirac distribution
  • Identity $\mE$ (constant correlation)
    • $\vect \nu$ converges towards a discrete mixture of Dirac distributions
  • Linear irreversible $\mE$ (long-range polynomial correlation)
    • $\vect \nu$ converges towards a uniform distribution on a $d$-simplex
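The uniform limit in the linear irreversible case can be checked by simulation: for $d = 2$ the only degree of freedom of $\vect \nu$ is the fraction of time spent in the first state, which should become uniform on $[0, 1]$. A sketch with toy parameters (the sequential sampler below uses the same marginalisation over the future as the hidden-chain construction):

```python
import numpy as np

# Toy d = 2 linear irreversible case: occupation fraction of state 1
# should be (asymptotically) uniform on [0, 1].
rng = np.random.default_rng(1)
eps, n = 0.1, 100
E = np.array([[1.0, eps],
              [0.0, 1.0]])
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])          # L(M) = M_{12}: start in 1, end in 2

En = [np.linalg.matrix_power(E, k) for k in range(n + 1)]

def occupation_fraction():
    """Sample Gamma sequentially (non-homogeneous chain) and return the
    fraction of the n steps spent in the first state."""
    w = (A * En[n]).sum(axis=1)
    g = [rng.choice(2, p=w / w.sum())]
    for k in range(1, n + 1):
        w = E[g[-1]] * (En[n - k] @ A[g[0]])
        g.append(rng.choice(2, p=w / w.sum()))
    return np.mean(np.array(g[:n]) == 0)

f = np.array([occupation_fraction() for _ in range(2000)])
print(f.mean(), f.var())   # close to 1/2 and 1/12 (uniform on [0, 1])
```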
  • Irreducible $\mE$ (short-range correlations):
    • Standard limit laws
  • Identity $\mE$ (constant correlation):
    • Discrete mixture of standard limit laws:
  • Linear irreversible $\mE$ (long-range correlation):
    • Continuous mixture of limit laws
  • Combinations of three core behaviors
    • Irreducible blocks: Fast convergence to the stationary state: Dirac distribution
    • Separated components: discrete mixture
    • Irreversible transitions: continuous mixture
  • Limit laws:
    • Discrete mixture of continuous mixtures of standard limit distributions
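A minimal Monte Carlo illustration of the discrete-mixture behaviour, using the identity-$\mE$ case: the chain is frozen in its initial state, so each realisation of $S(\vect X)$ is an ordinary i.i.d. average, and the distribution over realisations is a mixture of point masses. (Parameters below, exponential $\mP_{ii}$ with means $1$ and $3$, are illustrative.)

```python
import numpy as np

# Identity E, A = I (toy): p(Gamma_1 = i) = 1/d and the chain never moves.
rng = np.random.default_rng(2)
n, trials = 2000, 500
mu = np.array([1.0, 3.0])       # assumed means of P_11, P_22 (exponential)
p0 = np.array([0.5, 0.5])       # initial-state distribution for A = I

sums = []
for _ in range(trials):
    i = rng.choice(2, p=p0)                      # initial (and only) state
    sums.append(rng.exponential(mu[i], n).mean())  # i.i.d. average given i
sums = np.array(sums)

# Two narrow peaks near 1 and 3 instead of a single concentration point
print(np.mean(sums < 2), np.mean(sums > 2))  # each close to 1/2
```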

Sum

\[ S(\vect X) = \frac{1} {n} \sum_{i=1}^n X_i \]
  • Law of large numbers: existence of a concentration point
  • Central limit theorem: fluctuations around this concentration point
  • Large deviations: fluctuations far away from the concentration point

Large deviation principle

\[ P( S(\vect X) = s ) \approx e^{-n I(s) } \]
Do we have a large deviation principle?
  • Generating function \[ g_n(w) = \Esp{e^{n w S(\vect X)}} \]
  • Scaled cumulant generating function $\Phi(w)$ \[ \Phi(w) = \lim_{n \rightarrow \infty} \frac{\ln g_n(w)}{n} \]

Gärtner-Ellis Theorem

If $\Phi(w)$ exists and is differentiable, $I(s) $ exists and is \[ I(s) = \sup_{w} \{w s - \Phi(w) \} \]
  • For $\iid$ variables: $\Phi(w) = g(w)$
  • $g(w)$: cumulant generating function of a single variable

Large deviation principle:

\[ I(s) = \sup_{w} \{w s - g(w) \} \]
  • matrix generating function: \[ \mG(w) = \int_\R \mR(x) e^{w x} dx \]
  • $\Phi(w) = \ln \lambda_1(w)$
  • $\lambda_1(w)$ dominant eigenvalue of $\mG(w)$

Large deviation principle for short-range correlation:

  • Short-range correlation: $\lambda_1(w)$ is differentiable near $0$
\[ I(s) = \sup_{w} \{w s - \ln \lambda_1(w) \} \]
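This variational formula can be evaluated numerically. A sketch for an illustrative short-range example with exponential $\mP_{ij}$ (so each entry of $\mG(w)$ is an explicit moment generating function), with $\mE$ scaled so that $\lambda_1(0) = 1$ and hence $\Phi(0) = 0$; all parameter values are toy choices:

```python
import numpy as np

# Toy short-range example: exponential P_ij with means mu_ij; E scaled so
# that lambda_1(G(0)) = lambda_1(E) = 1, which makes Phi(0) = 0.
E = np.array([[2.0, 1.0],
              [1.0, 2.0]]) / 3.0    # eigenvalues 1 and 1/3
mu = np.array([[1.0, 2.0],
               [2.0, 3.0]])

def Phi(w):
    """Scaled CGF: ln of the dominant eigenvalue of
    G(w)_ij = E_ij / (1 - w mu_ij) (MGF of an exponential, w < 1/max mu)."""
    G = E / (1.0 - w * mu)
    return np.log(np.max(np.linalg.eigvals(G).real))

ws = np.linspace(-5.0, 0.30, 2000)  # stay below 1 / max(mu) = 1/3
phis = np.array([Phi(w) for w in ws])

def I(s):
    """Legendre-Fenchel transform evaluated on the w grid."""
    return np.max(ws * s - phis)

s_star = (Phi(1e-5) - Phi(-1e-5)) / 2e-5   # Phi'(0): typical value of S
print(s_star, I(s_star), I(s_star + 1.0))  # I vanishes at s_star, grows away
```

The rate function is zero at the concentration point $s^* = \Phi'(0)$ and strictly positive away from it, as expected in the short-range (differentiable) case.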

Long-range or constant correlation

$\Phi(w)$ is not differentiable at $0$.
  • Constant correlation: Non-convex rate function
  • Polynomial correlation: Rate function with a flat branch
  • Three kinds of correlation:
    • Exponential short-range correlation
    • Constant correlation
    • Polynomial long-range correlation
  • Extensions of the law of large numbers and the central limit theorem:
    • Long-range correlation: Continuous and discrete mixtures of standard limit laws
  • Large deviation principle:
    • Long-range correlation: Non-strictly convex rate function

Perspectives

  • Extreme statistics
  • Physical model
  • Infinite dimensions, higher-order tensors