dyson_equalizer.algorithm module#
The dyson_equalizer.algorithm
module provides functions implementing the algorithms needed
to compute the Dyson Equalizer and related auxiliary functions.
The functions may be used to build specialized implementation of the Dyson Equalizer.
- dyson_equalizer.algorithm.compute_low_rank_approximation_mp(svd) tuple[ndarray, int] [source]#
Computes the low rank approximation by keeping all eigeinvalues above the maximum of the Marchenko-Pastur distribution. Details, derivation and convergence analysis are provided in [1], in particular Algorithms 2 and 3.
- Parameters:
- svd:
The svd of the data matrix, computed e.g. using
numpy.linalg.svd(Y, full_matrices=False)
- Returns:
- Y_tr: (m, n) numpy.array
The low-rank approximation of the data matrix truncated to r_hat
- r_hat: int
The rank of the truncated matrix
See also
Notes
The threshold for significance is based on the Marchenko-Pastur distribution and is estimated as \(\sqrt{m} + \sqrt{m}\), where \(m \le n\) are the matrix’s dimensions.
References
[1]Landa B., Kluger Y., “The Dyson Equalizer: Adaptive Noise Stabilization for Low-Rank Signal Detection and Recovery,” arXiv, https://arxiv.org/abs/2306.11263
- dyson_equalizer.algorithm.compute_scaling_factors(svd) tuple[ndarray, ndarray] [source]#
Compute the scaling factors for the Dyson equalizer
- Parameters:
- svd:
The svd of the data matrix, computed e.g. using
numpy.linalg.svd(Y, full_matrices=False)
- Returns:
- x_hat: (m) numpy.array
Normalizing factors for the rows
- y_hat: (n) numpy.array
Normalizing factors for the columns
See also
Notes
This function computes the normalizing factors for the Dyson equalizer. Details, derivation and convergence analysis are provided in [1], in particular Algorithm 1.
First, it computes the solutions to the Dyson equation as
\[\hat{g}_{i}^{(1)} = \sum_{k=1}^{m} \frac{\eta^2}{\sigma^2_k + \eta^2} U^2_{ik}\]\[\hat{g}_{j}^{(2)} = \frac{1}{\eta} \sum_{k=1}^{m} \left( \frac{\eta^2}{\sigma^2_k + \eta^2} - \frac{1}{\eta} \right) V^2_{jk}\]Then, assuming \(m \le n\), the normalizing factors are computed as:
\[\hat{x}_i = \frac{1}{\sqrt{m - \eta \Vert \hat{g}_{i}^{(1)} \Vert_{1}}} \left( \frac{1}{\hat{g}_{i}^{(1)}} - \eta \right)\]\[\hat{y}_i = \frac{1}{\sqrt{n - \eta \Vert \hat{g}_{i}^{(2)} \Vert_{1}}} \left( \frac{1}{\hat{g}_{i}^{(2)}} - \eta \right)\]- where:
\(m\) is the number of rows
\(m\) is the number of columns
\(\eta\) is the median principal value
\(\sigma_k\) is the \(k\)-th principal value
\(\hat{g}_{i}^{(1)}\) is the solution to the Dyson equation for the smallest dimension
\(\hat{g}_{j}^{(2)}\) is the solution to the Dyson equation for the largest dimension
References
[1]Landa B., Kluger Y., “The Dyson Equalizer: Adaptive Noise Stabilization for Low-Rank Signal Detection and Recovery,” arXiv, https://arxiv.org/abs/2306.11263
- dyson_equalizer.algorithm.marchenko_pastur(x: ndarray, gamma: float, sigma: float = 1) ndarray [source]#
Computes the density of the Marchenko-Pastur distribution for the given values
- Parameters:
- x: (n) numpy.array or float
a vector or a value for the xs to be computed
- gamma: float
the ratio between the number of rows and the number of columns (between 0 and 1)
- sigma: float, optional
the variance of the entries of the random matrix (defaults to 1)
- Returns:
- y: (n) numpy.ndarray
The values of the Marchenko-Pastur distribution
Notes
The density of the Marchenko-Pastur distribution can be defined as
\[dF_{\gamma, \sigma}(x) = \frac{\sqrt{(\beta_+ - x)(x - \beta_-)}}{2 \pi \sigma^2 \gamma x} \mathbb{1}(\beta_- \le x \le \beta_+)\]- where:
\(m\) is the number of rows
\(n\) is the number of columns
\(\gamma\) is the ratio \(\frac{m}{n}\) (assuming \(m \le n\))
\(\beta_\pm = \sigma^2(1\pm\sqrt{\gamma})^2\)
- dyson_equalizer.algorithm.marchenko_pastur_cdf(x, gamma: float, sigma: float = 1)[source]#
Computes the cumulative density function of the Marchenko-Pastur distribution for the given values
- Parameters:
- x: (n) numpy.array or float
a vector or a value for the xs to be computed
- gamma: float
the ratio between the number of rows and the number of columns (between 0 and 1)
- sigma: float, optional
the variance of the entries of the random matrix (defaults to 1)
- Returns:
- y: (n) numpy.ndarray
The values of the cdf of the Marchenko-Pastur distribution
- dyson_equalizer.algorithm.scale_matrix(Y: ndarray, x: ndarray, y: ndarray) ndarray [source]#
Scales a matrix by the given normalization factors for rows and columns
\[\hat{Y} = D\{x\} Y D\{y\}\]- Parameters:
- Y: (m, n) numpy.ndarray
a matrix
- x: (m) numpy.ndarray
the row weights
- x: (n) numpy.ndarray
the column weights
- Returns:
- Y_hat: (m, n) numpy.ndarray
The normalized matrix