$$\newcommand{\Esp}[1]{\mathbb{E}\left[#1\right]} \newcommand{\as}[1]{ \overset{\mathrm{a.s}} {#1} } \newcommand{\convArrow}[2]{ { \underset{#1\rightarrow +\infty}{#2} } } \newcommand{\convAs}[1]{ \as{\convArrow{#1}{\rightarrow}} } \newcommand{\simInf}[1]{\convArrow{#1}{\sim}} \newcommand{\iid}{\mathrm{i.i.d.}} \newcommand{\qc}{q^*} \newcommand{\cut}[1]{{#1}^\dagger} \newcommand{\dom}[1]{{#1}_{\mathrm{m}}} \newcommand{\ydom}{\dom{y}} \newcommand{\ycut}{\cut{y}} \newcommand{\hdom}{\dom{h}} \newcommand{\hcut}{\cut{h}} \newcommand{\R}{\mathbb{R}} \newcommand{\Prob}{\mathbb{P}} \newcommand{\neff}{n_{\mathrm{eff}}} \newcommand{\fun}[1]{\mathrm{#1}} \newcommand{\p}[1]{\left(#1\right)}$$
Critical order in moment estimation:
insights from statistical physics
27 June 2013
• ## Thesis:

"Sums and extremes in statistical physics and signal processing"
Advisors: Eric Bertin and Patrice Abry.
• ## Themes:

Application of statistical physics to signal processing
• Phase transition in moment estimation
• Extreme statistics
• Random vectors with matrix representation

# Moment estimator

$S(n,q) = \frac{1} {n} \sum_{i=1}^n X_i^q, \quad X_i \ge 0$

# Law of large numbers

$$X_i$$ independent and identically distributed ($$\iid$$)
 $S(n,q) \convAs{n} \Esp{X^q}$

## Finite sample size n?

• Fixed $$N$$
• Asymptotic behavior: $$\ln S(N,q) \simInf{q} q \ln \max_{k=1,\dots,N} \{ X_k \}$$
• Linearization effect in $$q$$
Transition line: $$S(n,\qc(n))$$
(inspired by [Ben Arous et al., Probab. Theory Related Fields, 2005])
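A minimal numerical sketch of this linearization effect (log-normal samples are used purely for illustration; the helper `ln_S` is not from the talk). For a fixed sample of size $$N$$, $$\ln S(N,q)/q$$ approaches $$\ln \max_k X_k$$ as $$q$$ grows:

```python
import math
import random

random.seed(0)

# Fixed sample of N i.i.d. log-normal variables X_k = exp(Y_k), Y_k ~ N(0, 1).
N = 1000
xs = [math.exp(random.gauss(0.0, 1.0)) for _ in range(N)]
ln_max = math.log(max(xs))

def ln_S(xs, q):
    """ln of the empirical moment S(N, q) = (1/N) * sum_k X_k^q,
    computed in log-space (log-sum-exp) to avoid overflow at large q."""
    lx = [math.log(x) for x in xs]
    m = q * max(lx)  # the largest term dominates for large q
    return m + math.log(sum(math.exp(q * l - m) for l in lx) / len(lx))

# For growing q at fixed N, ln S(N, q) / q approaches ln max_k X_k:
for q in (1, 5, 20, 80):
    print(q, ln_S(xs, q) / q, ln_max)
```

Since the largest term alone already gives $$X_{\max}^q / N$$, the ratio $$\ln S(N,q)/q$$ is sandwiched between $$\ln \max_k X_k - (\ln N)/q$$ and $$\ln \max_k X_k$$.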

# Power laws

• There exists a finite order $$q_l$$ such that $$\Esp{X^{q_l}} = +\infty$$

# Regular laws

• Characteristic function $$\Esp{e^{\imath w X}}$$ analytic at $$0$$.
• $$\forall q>0,\quad \Esp{X^{q}} \in \R$$
• $$\qc(n) \asymp \frac{\ln n}{\ln \ln n}$$ (A. Kagan et al., Statistics & Probability Letters, 2001)

# Irregular laws

• $$\forall q>0,\quad \Esp{X^{q}} \in \R$$
• $$\Esp{e^{\imath w X}}$$ non-analytic at $$0$$.
• $$\qc(n)$$?

# Exponential irregular laws

• $$X_i = e^{Y_i}$$
• $$\Prob(Y_i>y) = e^{-y^\rho L(y)},$$ with $$\rho>1$$
• $$L$$ slowly varying function: $$L(tx) \simInf{x} L(x)$$ for all $$t>0$$

## $$\rho$$: tail heaviness parameter

Example:
• Log-normal law: $$\rho = 2$$
Limiting cases:
• $$\rho \rightarrow +\infty$$ : regular laws
• $$\rho \rightarrow 1$$ : power laws

# Log-Weibull distribution family

• $$F_Y(y) \equiv \Prob(Y>y) = e^{-y^\rho}$$

# Log-gamma distribution family

• $$p_Y(y) = \frac{\rho}{2 \Gamma(1/\rho)} e^{-|y|^\rho}$$
[Figure: densities $$p_Y$$ for $$\rho=1.5$$, $$\rho=2$$, $$\rho=3$$]
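A sampling sketch for the log-Weibull family above by inverse transform (the name `sample_log_weibull` is mine, not from the slides): since $$\Prob(Y>y) = e^{-y^\rho}$$, drawing $$U$$ uniform on $$(0,1)$$ gives $$Y = (-\ln U)^{1/\rho}$$ and $$X = e^Y$$.

```python
import math
import random

random.seed(1)

def sample_log_weibull(rho, n):
    """Draw n samples of X = exp(Y) with tail P(Y > y) = exp(-y**rho),
    by inverse transform: Y = (-ln U)**(1/rho) with U uniform on (0, 1]."""
    return [math.exp((-math.log(1.0 - random.random())) ** (1.0 / rho))
            for _ in range(n)]

# rho = 2 gives a log-normal-like tail; rho -> 1 approaches a power-law tail.
xs = sample_log_weibull(2.0, 10000)

# Empirical tail check: P(Y > 1) should be close to exp(-1).
frac = sum(1 for x in xs if math.log(x) > 1.0) / len(xs)
print(frac, math.exp(-1.0))
```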

# Compound Poisson motion $$Z(t)$$

• $$Q_r(t) = B_r(t) \prod_{p_i \in C_r(t)} W_i$$
• $$Z(t) = \lim_{r\rightarrow 0} \int_{0}^{t} Q_r(s) ds$$
• Increments $$X(a,t)=Z(t+a)-Z(t)$$

# Correlation

$$\Esp{X(a,t)\, X(a,t+s)} = \frac{1}{\lambda(2)\, (\lambda(2)-1)} \left( |s+a|^{\lambda(2)} + |s-a|^{\lambda(2)} - 2 |s|^{\lambda(2)} \right)$$
• No characteristic scale

# Linearisation effect also present

• Moment estimator $$S(n,q(n) )= \frac{1}{n}\sum_{i=1}^{n} X_{i}^{q(n)}$$
• $$Y_i = - \ln X_i$$
• $S(n,q)= \frac{1}{n} \sum_{i=1}^{n} e^{ -q Y_{i}}$
• REM partition function: $Z( n=2^m ,\beta) = \sum_{i=1}^{2^m} e^{-\beta \sqrt{m} E_i}$
A simplified spin glass model [Derrida, Phys. Rev. B, 1981]
• Simple disordered system
• $$\iid$$ random configuration energies
• Glassy transition at finite inverse temperature $$\beta_c$$
How to translate arguments from the REM to moment estimation?

# Competition between two effects

• Concentration effect
• Finite size effect
$\Esp{X^q}=\int e^{qy-\phi(y)}\,\mathrm{d}y \quad \text{ with } \phi = -\ln p_Y.$ Saddle point method when $$q \rightarrow +\infty$$: $\ln \Esp{X^q} \approx q \ydom - \phi(\ydom)$
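The quality of this saddle point approximation can be checked numerically. A sketch assuming a log-gamma-type potential $$\phi(y) = y^\rho$$ with $$\rho = 2$$, restricted to $$y>0$$ and ignoring the normalization constant (choices made here for illustration only):

```python
import math

rho = 2.0                      # illustrative tail parameter
def phi(y):                    # potential phi(y) = y**rho (normalization ignored)
    return y ** rho

def ln_moment_numeric(q, ymax=40.0, steps=40000):
    """ln of the integral of exp(q*y - phi(y)) over (0, ymax],
    by a Riemann sum computed in log-space to avoid overflow."""
    dy = ymax / steps
    vals = [q * (i * dy) - phi(i * dy) for i in range(1, steps + 1)]
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals) * dy)

def ln_moment_saddle(q):
    """Saddle point value q*y_m - phi(y_m), with y_m solving
    phi'(y) = rho * y**(rho-1) = q."""
    ym = (q / rho) ** (1.0 / (rho - 1.0))
    return q * ym - phi(ym)

# The gap between the two stays O(1) while both grow with q:
for q in (2.0, 10.0, 30.0):
    print(q, ln_moment_numeric(q), ln_moment_saddle(q))
```

For $$\rho=2$$ the exact Gaussian correction is $$\tfrac12\ln\pi \approx 0.57$$, negligible next to the leading term $$q\ydom - \phi(\ydom)$$ once $$q$$ is large.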

# Concentration Point $$\ydom(q)$$

$\ydom(q): \phi'(\ydom) = q$
• Unreachable region: $$]\ycut(n),+\infty [$$

# Cutoff point $$\ycut$$

$\ycut(n):\quad P(Y> \ycut(n))=\frac{1}{n} .$
Truncated moments: excluding the contribution of the points beyond $$\ycut$$: $M_t(n,q) = \int_{-\infty}^{\ycut} e^{qy-\phi(y)} \,\mathrm{d}y$
• $$\ydom \lt \ycut,\quad \ln M_t \approx \ln \Esp{X^q}$$
• $$\ydom \gt \ycut, \quad \ln M_t \approx q \ycut - \phi(\ycut)$$

# Theorem:

$\lim_{n\rightarrow + \infty}\frac{\ln S(n,q(n) )}{\ln n} \as{=} \lim_{n\rightarrow + \infty} \frac{ \ln M_t(n,q(n)) } {\ln n}$

# Critical order $$\qc(n)$$

$\qc(n): \ydom(\qc) = \ycut(n)$ $\qc(n) \sim_{n\rightarrow +\infty} \rho \frac{\ln n} {\ycut(n)}$
Prediction of the linearisation point. [Figure: behavior of $$\ln \frac{S(n,q)}{\Esp{X^q}}$$ as a function of $$\frac q {\qc}$$ for $$n=10^2$$, $$n=10^3$$, $$n=10^6$$]
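For the log-Weibull family the transition line has a closed form that can be evaluated directly (a sketch; `y_cut` and `q_star` are illustrative names): solving $$e^{-y^\rho} = 1/n$$ gives $$\ycut(n) = (\ln n)^{1/\rho}$$, hence $$\qc(n) \sim \rho\, (\ln n)^{1-1/\rho}$$.

```python
import math

def y_cut(n, rho):
    """Cutoff point for the log-Weibull family P(Y > y) = exp(-y**rho):
    solve exp(-y**rho) = 1/n, i.e. y = (ln n)**(1/rho)."""
    return math.log(n) ** (1.0 / rho)

def q_star(n, rho):
    """Critical order q*(n) ~ rho * ln(n) / y_cut(n) = rho * (ln n)**(1 - 1/rho)."""
    return rho * math.log(n) / y_cut(n, rho)

# Critical order grows very slowly with the sample size:
for n in (10**2, 10**3, 10**6):
    print(n, q_star(n, 2.0))
```

For $$\rho=2$$ this reduces to $$\qc(n) = 2\sqrt{\ln n}$$: even $$n=10^6$$ only allows moment orders up to about $$7.4$$.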

# $$n\rightarrow +\infty$$

$\qc(n) \varpropto (\ln n)^{1 - \frac{1}{\rho}}.$
• Regular laws : $$\rho \rightarrow +\infty, \qc(n) \varpropto \ln n$$
• Log-normal law : $$\rho =2,\quad \qc(n) \varpropto \sqrt{\ln n}$$
• Power laws: $$\rho \rightarrow 1,\quad \qc(n) \varpropto (\ln n)^{\rho-1}$$
More details in arxiv:1204.3047
$\qc(n) = \color{red}{\rho_l(\ycut(n))} \color{blue}{ \frac{\ln n} {\ycut(n)} } \equiv \color{red}{\rho_E} \color{blue}{\theta}$

# Estimation of $$\qc$$

• $$\qc$$ only depends on information available from the empirical cumulative distribution function $$\Rightarrow$$ $$\qc$$ is estimable
• $$(1-1/n)$$-quantile estimation ($$\color{blue}{\ycut(n)}$$)
• Local power exponent at the last quantile ($$\equiv \color{red}{\rho_l(\ycut(n))}$$)
• Theoretical and numerical analysis of the performance
$\theta = \frac{\ln n}{F^{-1}_Y(\frac{1}{n})}$

## Estimation of $$F^{-1}_Y(\frac{1}{n})$$

• $$\max\{ Y_i \}$$: potentially asymptotically biased
• order statistics $$Y_{1, n} \ge Y_{2,n} \ge \dots \ge Y_{n,n}$$
• $$\Omega_{k_\theta} = \sum_{j=1}^{k_\theta} \alpha_j Y_{j,n}$$ with weights: $\begin{cases} \alpha_i= \frac{\sum_{l=1}^{k_\theta-1} \frac{1}{l}-\gamma}{k_\theta-1}, & i\neq k_\theta \\ \alpha_{k_\theta}=1- (k_\theta-1)\alpha_1 & \\ \end{cases}$
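The weighted order-statistics estimator above can be sketched directly in code (assuming the weight formula as stated on the slide, with $$\gamma$$ the Euler-Mascheroni constant; `theta_hat` is an illustrative name):

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def theta_hat(y_sample, k):
    """theta-hat = ln n / Omega_k, where Omega_k combines the k largest
    order statistics Y_{1,n} >= ... >= Y_{k,n} with weights
    alpha_i = (H_{k-1} - gamma)/(k-1) for i != k and
    alpha_k = 1 - (k-1)*alpha_1 (H_{k-1}: harmonic number)."""
    n = len(y_sample)
    ys = sorted(y_sample, reverse=True)[:k]
    a = (sum(1.0 / l for l in range(1, k)) - EULER_GAMMA) / (k - 1)
    omega = a * sum(ys[:-1]) + (1.0 - (k - 1) * a) * ys[-1]
    return math.log(n) / omega

# Usage on a synthetic log-Weibull sample with rho = 2, where the target is
# theta = ln n / F_Y^{-1}(1/n) = ln n / sqrt(ln n):
random.seed(3)
n = 10000
ys = [(-math.log(1.0 - random.random())) ** 0.5 for _ in range(n)]
print(theta_hat(ys, 10), math.sqrt(math.log(n)))
```

The weights sum to one, so the estimator is exact on a degenerate (constant) sample; the harmonic-number correction compensates the asymptotic bias of the raw maximum.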

## $$\hat{\theta}_{k_{\theta}} = \frac{\ln n}{\Omega_{k_{\theta}} }$$

[Figure: relative bias and relative mean square error of $$\hat{\theta}_{k_\theta}$$, log-Weibull and log-gamma laws, $$\rho=1.1$$, $$\rho=2$$, $$\rho=3$$]
$\rho= \rho_l\left(F^{-1}_n \left(\frac 1 n\right) \right)$
• $$\partial_{\ln y} \ln\left(-\ln F_Y (y)\right) = \rho_l(y)$$
• $$\ln \left(-\ln \frac{i}{n} \right) \approx \rho_E \ln Y_{i,n} + \beta.$$
• linear regression:
• $$C(x,y)= \overline {xy} - \overline{x}\,\overline{y}$$
• $$\overline x= \frac{1}{k_\rho} \sum_{i=1}^{k_\rho} x_i$$
• $$\hat{\rho}_E = \frac { C\left(\ln y,\, \ln(\ln n - \ln i) \right)} {C(\ln y,\ln y)}$$
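The covariance-ratio regression can be sketched as follows (`rho_hat` is an illustrative name; the sanity check feeds exact log-Weibull quantiles, for which the regression is exactly linear):

```python
import math

def rho_hat(y_sample, k):
    """Local power exponent estimate: slope of v_i = ln(ln n - ln i)
    against u_i = ln Y_{i,n} over the k largest order statistics,
    computed as the empirical covariance ratio C(u, v) / C(u, u)."""
    n = len(y_sample)
    ys = sorted(y_sample, reverse=True)[:k]
    u = [math.log(y) for y in ys]                                       # ln Y_{i,n}
    v = [math.log(math.log(n) - math.log(i)) for i in range(1, k + 1)]  # ln(-ln(i/n))
    mu, mv = sum(u) / k, sum(v) / k
    cuv = sum(x * y for x, y in zip(u, v)) / k - mu * mv
    cuu = sum(x * x for x in u) / k - mu * mu
    return cuv / cuu

# Sanity check on exact log-Weibull quantiles Y_i = (ln(n/i))**(1/rho), rho = 2:
# then ln Y_i = 0.5 * ln ln(n/i), so the slope is exactly 2.
n = 10000
ys = [math.log(n / i) ** 0.5 for i in range(1, n + 1)]
print(rho_hat(ys, 100))  # close to 2
```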
[Figure: relative bias and relative mean square error of $$\hat{\rho}_E$$, log-Weibull and log-gamma laws, $$\rho=1.1$$, $$\rho=2$$, $$\rho=3$$]
$\hat{q}^* = \color{red}{ \hat{\rho}_ {E,k_{\rho_E}} } \color{blue}{ \hat{ \theta}_{k_\theta}}$
[Figure: relative bias and relative mean square error of $$\hat{q}^*$$, log-Weibull ($$n=1000$$, varying $$\rho$$) and log-gamma ($$\rho=2$$, varying $$n$$)]
• Exponentially correlated time series: $$\mathrm{Corr}(X_k, X_{k+t}) \varpropto e^{-t/\tau}$$
• $$\hat{q}^*$$ becomes biased
• Empirical workaround
• Replace $$n$$ with an effective number of independent samples $\neff= \frac{n}{1+\alpha \tau}$
• Compute sieved order statistics by eliminating maxima too close to each other
• Compute $$\hat{ \theta }$$ and $$\hat{\rho_E}$$ with these sieved order statistics
[Figure: results for exponentially correlated series, $$\mathrm{Corr}(t) = \exp(-t/\tau)$$, with $$\tau=10$$, $$\tau=50$$, $$\tau=100$$]
More details in arxiv:1204.3047
• Increments $$X(a,t) = Z(t+a) -Z(t)$$
• $$\lambda(q)$$: $$\Esp{X(a,t)^q}= C_q\, a^{\lambda(q)}$$
• $$S(a,q) = \sum_{k=1}^{L/a} X(a,ka)^q$$
• $$n$$ is replaced by $$|\ln a|$$
• Linearisation effect at $$\qc$$ independent of $$a$$

# Scaled variable $$h$$

$H_a(t) = \frac{ \ln X(a,t) } { \ln a } = \frac{ Y(a,t) } {| \ln a |} .$

# Probability density function of $$H_a$$

Large deviation theory and the Gärtner-Ellis theorem $$\Rightarrow$$ $p_{H_a} (h) \asymp a^{\psi(h)}$ $$\psi(h)$$: Fenchel-Legendre transform of $$\lambda(q)$$
$\Esp {X(a,t)^q} \approx \int_{-\infty}^{+\infty} e^{-|\ln a| [ q h + \psi(h) ]}\, dh$ Saddle point method when $$a \rightarrow 0$$: $\ln \Esp {X(a,t)^q} \approx -|\ln a| [ q \hdom + \psi(\hdom) ]$

# Concentration Point $$\hdom(q)$$

$\hdom(q):\quad \psi'(\hdom(q) ) = -q$
• Implicit scale $$a$$ dependency: $\ydom(a,q) = |\ln a| \, \hdom(q)$
• Cutoff point $$\hcut$$: $\hcut(a):\quad P\left(\forall k\in\{0,\dots, \tfrac L a \},\; H_a(ka) \le \hcut(a)\right) = \frac{1}{e}$
• Effective number of independent samples $$\neff$$ : $\neff:\quad P(H_a > \hcut(a)) = \frac{1}{\neff(a)}$
• Hypothesis : $\neff(a) \varpropto \frac{L}{a}$
$\hcut(a) = \hdom(\qc)$

# Critical order $$\qc(a)$$

$1 + \qc \lambda'(\qc) - \lambda(\qc) = 0.$
• $$\qc$$ is independent of $$a$$!
• $$\qc$$ depends only on $$\lambda$$: an intrinsic characteristic of the cascade
• Interference between the dependency in $$a$$ of $$p_Y$$ and the correlation length
More details in arxiv:1012.3688
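The critical-order equation $$1 + \qc \lambda'(\qc) - \lambda(\qc) = 0$$ can be solved numerically for a given $$\lambda$$. A sketch assuming an illustrative quadratic exponent $$\lambda(q) = mq - \tfrac{c}{2}q^2$$ (log-normal-type cascade; the parameters `m` and `c` are hypothetical, not from the slides), for which the root is $$\qc = \sqrt{2/c}$$:

```python
import math

def q_critical(lam, dlam, q_hi=50.0, tol=1e-10):
    """Solve 1 + q*lam'(q) - lam(q) = 0 by bisection on (0, q_hi].
    g(q) = 1 + q*lam'(q) - lam(q) equals 1 at q = 0 and decreases
    for concave lam, so a sign change brackets the root."""
    g = lambda q: 1.0 + q * dlam(q) - lam(q)
    lo, hi = 1e-9, q_hi
    assert g(lo) > 0 > g(hi), "root not bracketed"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

# Illustrative quadratic exponents lam(q) = m*q - (c/2)*q**2 (hypothetical
# m, c; for this lam, g(q) = 1 - c*q**2/2, so q_c = sqrt(2/c) analytically).
m, c = 0.6, 0.08
qc = q_critical(lambda q: m * q - 0.5 * c * q * q, lambda q: m - c * q)
print(qc, math.sqrt(2.0 / c))
```

Note that for this quadratic $$\lambda$$ the linear drift $$m$$ cancels out: $$\qc$$ depends only on the curvature $$c$$, consistent with $$\qc$$ being intrinsic to the cascade.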
Under a reasonable conjecture: $\frac{ \ln S(a,q)} {\ln a} \convAs{|\ln a|} \zeta(q)$

# Linearisation effect

$\zeta(q) =\begin{cases} \lambda(q), & -1 \lt q \leq \qc, \\ 1 + q \lambda'(\qc), & q \gt \qc.\\ \end{cases}$
$$Z \Leftrightarrow S(n,q)$$
• $$Z$$: partition function of the REM $$\begin{cases} \text{inverse temperature } \beta \leftrightarrow q \\ \text{energy } E_i \leftrightarrow -\ln X_i \\ \end{cases}$$
$$\lambda_e(q) = \lim_{a\rightarrow 0} \ln S(a,q) / \ln a$$

# Analogy REM$$\Leftrightarrow$$ Moments

• Entropy $$s(\beta)$$ $$\Leftrightarrow$$ $$1+q \lambda_e'(q)-\lambda_e(q)$$
• Glassy transition at $$\beta_c$$ $$\Leftrightarrow$$ Moment linearisation at $$\qc$$

# Summary

• Formal analogy between linearization effect and glassy phase transition
• A notion of critical order for moment estimation:
• Computable
• Estimable
• An alternative interpretation of the linearization effect in multifractal analysis

# Connected questions

• Similar question: behavior of $$\max_{k=1,\dots,n} \{ X_k^{q(n)} \}$$

# Outlook

• A more formal analysis of critical order for correlated random variables?
• Random vector with a matrix representation
• Estimation of large deviation functions