dyson_equalizer.algorithm module#

The dyson_equalizer.algorithm module provides functions implementing the algorithms needed to compute the Dyson Equalizer and related auxiliary functions.

The functions may be used to build specialized implementation of the Dyson Equalizer.

dyson_equalizer.algorithm.compute_low_rank_approximation_mp(svd) tuple[ndarray, int][source]#

Computes the low rank approximation by keeping all eigeinvalues above the maximum of the Marchenko-Pastur distribution. Details, derivation and convergence analysis are provided in [1], in particular Algorithms 2 and 3.

Parameters:
svd:

The svd of the data matrix, computed e.g. using numpy.linalg.svd(Y, full_matrices=False)

Returns:
Y_tr: (m, n) numpy.array

The low-rank approximation of the data matrix truncated to r_hat

r_hat: int

The rank of the truncated matrix

Notes

The threshold for significance is based on the Marchenko-Pastur distribution and is estimated as \(\sqrt{m} + \sqrt{m}\), where \(m \le n\) are the matrix’s dimensions.

References

[1]

Landa B., Kluger Y., “The Dyson Equalizer: Adaptive Noise Stabilization for Low-Rank Signal Detection and Recovery,” arXiv, https://arxiv.org/abs/2306.11263

dyson_equalizer.algorithm.compute_scaling_factors(svd) tuple[ndarray, ndarray][source]#

Compute the scaling factors for the Dyson equalizer

Parameters:
svd:

The svd of the data matrix, computed e.g. using numpy.linalg.svd(Y, full_matrices=False)

Returns:
x_hat: (m) numpy.array

Normalizing factors for the rows

y_hat: (n) numpy.array

Normalizing factors for the columns

See also

numpy.linalg.svd

Notes

This function computes the normalizing factors for the Dyson equalizer. Details, derivation and convergence analysis are provided in [1], in particular Algorithm 1.

First, it computes the solutions to the Dyson equation as

\[\hat{g}_{i}^{(1)} = \sum_{k=1}^{m} \frac{\eta^2}{\sigma^2_k + \eta^2} U^2_{ik}\]
\[\hat{g}_{j}^{(2)} = \frac{1}{\eta} \sum_{k=1}^{m} \left( \frac{\eta^2}{\sigma^2_k + \eta^2} - \frac{1}{\eta} \right) V^2_{jk}\]

Then, assuming \(m \le n\), the normalizing factors are computed as:

\[\hat{x}_i = \frac{1}{\sqrt{m - \eta \Vert \hat{g}_{i}^{(1)} \Vert_{1}}} \left( \frac{1}{\hat{g}_{i}^{(1)}} - \eta \right)\]
\[\hat{y}_i = \frac{1}{\sqrt{n - \eta \Vert \hat{g}_{i}^{(2)} \Vert_{1}}} \left( \frac{1}{\hat{g}_{i}^{(2)}} - \eta \right)\]
where:
  • \(m\) is the number of rows

  • \(m\) is the number of columns

  • \(\eta\) is the median principal value

  • \(\sigma_k\) is the \(k\)-th principal value

  • \(\hat{g}_{i}^{(1)}\) is the solution to the Dyson equation for the smallest dimension

  • \(\hat{g}_{j}^{(2)}\) is the solution to the Dyson equation for the largest dimension

References

[1]

Landa B., Kluger Y., “The Dyson Equalizer: Adaptive Noise Stabilization for Low-Rank Signal Detection and Recovery,” arXiv, https://arxiv.org/abs/2306.11263

dyson_equalizer.algorithm.marchenko_pastur(x: ndarray, gamma: float, sigma: float = 1) ndarray[source]#

Computes the density of the Marchenko-Pastur distribution for the given values

Parameters:
x: (n) numpy.array or float

a vector or a value for the xs to be computed

gamma: float

the ratio between the number of rows and the number of columns (between 0 and 1)

sigma: float, optional

the variance of the entries of the random matrix (defaults to 1)

Returns:
y: (n) numpy.ndarray

The values of the Marchenko-Pastur distribution

Notes

The density of the Marchenko-Pastur distribution can be defined as

\[dF_{\gamma, \sigma}(x) = \frac{\sqrt{(\beta_+ - x)(x - \beta_-)}}{2 \pi \sigma^2 \gamma x} \mathbb{1}(\beta_- \le x \le \beta_+)\]
where:
  • \(m\) is the number of rows

  • \(n\) is the number of columns

  • \(\gamma\) is the ratio \(\frac{m}{n}\) (assuming \(m \le n\))

  • \(\beta_\pm = \sigma^2(1\pm\sqrt{\gamma})^2\)

dyson_equalizer.algorithm.marchenko_pastur_cdf(x, gamma: float, sigma: float = 1)[source]#

Computes the cumulative density function of the Marchenko-Pastur distribution for the given values

Parameters:
x: (n) numpy.array or float

a vector or a value for the xs to be computed

gamma: float

the ratio between the number of rows and the number of columns (between 0 and 1)

sigma: float, optional

the variance of the entries of the random matrix (defaults to 1)

Returns:
y: (n) numpy.ndarray

The values of the cdf of the Marchenko-Pastur distribution

dyson_equalizer.algorithm.scale_matrix(Y: ndarray, x: ndarray, y: ndarray) ndarray[source]#

Scales a matrix by the given normalization factors for rows and columns

\[\hat{Y} = D\{x\} Y D\{y\}\]
Parameters:
Y: (m, n) numpy.ndarray

a matrix

x: (m) numpy.ndarray

the row weights

x: (n) numpy.ndarray

the column weights

Returns:
Y_hat: (m, n) numpy.ndarray

The normalized matrix