locat.locat module
- class locat.locat.LOCAT(adata: AnnData, cell_embedding: ndarray, k: int, n_bootstrap_inits: int = 50, show_progress: bool = False, wgmm_dtype: str = 'same', knn=None, knn_k: int | None = None, knn_mode: str = 'binary')
Bases: object
The main LOCAT class
- Attributes:
- W_t
Implementation-oriented expression matrix (cells x genes).
- cell_dist
- min_dist
Methods
background_n_components_init([...])
bic_score(gmm1, gene_prior)
calc_lratio(f1, ix, log_bkg_pdf, sample_size): Per-cell LRT contribution on expressing cells.
depletion_pval_scan(gmm1, gene_prior, *[, ...])
estimate_null_parameters([fractions, n_reps]): Estimate LTST null mean/std as a function of expression fraction p using random pseudo-genes: pick random expressing cells at frequency p; fit signal GMM with the same pipeline as real genes; compute LTST exactly as in gmm_scan_new.
get_genes_indices(genes)
gmm_scan([genes, weights_transform, ...]): Runs Locat and identifies localized genes.
init_rng([seed]): Initialize the random number generator.
knn(): Return a KNN adjacency/connectivity matrix.
localization_pval_dep_scan(*args, **kwargs)
set_knn(knn): Store a precomputed KNN graph.
auto_bkg_components
auto_n_components
background_pdf
fit_wgmm
get_gene_prior
gmm_local_pvalue
gmm_local_scan
gmm_loglikelihoodtest
random_pdf
reg_covar
show_progress
signal_gmm
signal_pdf
- property W_t
Implementation-oriented expression matrix (cells x genes).
This is the transpose of the manuscript’s W object, which is described in gene x cell orientation.
- __init__(adata: AnnData, cell_embedding: ndarray, k: int, n_bootstrap_inits: int = 50, show_progress: bool = False, wgmm_dtype: str = 'same', knn=None, knn_k: int | None = None, knn_mode: str = 'binary')
Create a Locat object
- Parameters:
- adata: AnnData
The data in AnnData format, typically generated from Scanpy
- cell_embedding: np.ndarray
The embedding to use in the analysis
- k: int
The number of components to use in the GMM
- n_bootstrap_inits: int, optional
The number of initializations used in bootstrapping (default:50)
- show_progress: bool, optional
If True, shows a progress bar (default: False)
- wgmm_dtype: str, optional
The data type to use in the weighted GMM (default: same). Allowed values: “same”, “float32” or “float64”.
- knn: np.ndarray, optional
K-nearest neighbor connectivities. These can be computed with scanpy.pp.neighbors and then read from adata.obsp["connectivities"].
- knn_k: int, optional
The k parameter for computing k-nearest neighbors
- knn_mode: KnnMode, optional
The mode used to compute the K-nearest neighbors (default: “binary”). Allowed values: “binary” or “connectivity”.
See also
scanpy.pp.neighbors
- static calc_lratio(f1, ix, log_bkg_pdf, sample_size, eps=1e-300)
- Per-cell LRT contribution on expressing cells:
-2 * sum_{i in ix} ( log f0(i) - log f1(i) ) / sample_size
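The formula above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: it assumes `f1` holds per-cell signal densities, `log_bkg_pdf` holds per-cell log background densities (log f0), `ix` lists the indices of expressing cells, and `eps` is used only to avoid taking log(0). The name `lratio_sketch` is hypothetical.

```python
import math

def lratio_sketch(f1, ix, log_bkg_pdf, sample_size, eps=1e-300):
    """Compute -2 * sum_{i in ix} (log f0(i) - log f1(i)) / sample_size."""
    total = 0.0
    for i in ix:
        # Assumption: eps clamps the signal density so the log is defined.
        log_f1 = math.log(max(f1[i], eps))
        total += log_bkg_pdf[i] - log_f1
    return -2.0 * total / sample_size

# Toy data: three cells, of which cells 0 and 1 express the gene.
f1 = [0.5, 0.2, 0.1]
log_f0 = [math.log(0.25), math.log(0.2), math.log(0.4)]
stat = lratio_sketch(f1, ix=[0, 1], log_bkg_pdf=log_f0, sample_size=2)
```

In this toy case the signal density is twice the background density in cell 0 and equal to it in cell 1, so the statistic is positive; larger values indicate that the signal model fits the expressing cells better than the background model.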
- depletion_pval_scan(gmm1, gene_prior, *, lambda_values=None, soft_bound=None, min_p0_abs=0.1, min_expected=30, min_abs_deficit=0.02, n_trials_cap=500, weight_mode='binary', p_floor=1e-12, n_eff_scale=0.6, rho_bb=0.02, eps_rel=0.01, debug=False, debug_store_masks=False, debug_max_cells=5000)
- estimate_null_parameters(fractions=None, n_reps=50)
Estimate LTST null mean/std as a function of expression fraction p using random pseudo-genes:
pick random expressing cells at frequency p
fit signal GMM with the same pipeline as real genes
compute LTST exactly as in gmm_scan_new
- gmm_local_pvalue(genes=None, n_comp=None, weights_transform=None, alpha=0.05, n_inits=100, normalize_knn=True, eps=1e-12)
- gmm_scan(genes: list[str] | None = None, weights_transform: Callable | None = None, zscore_thresh: float = None, max_freq: float = 0.9, verbose: bool = False, n_bootstrap_inits: int = None, rc_lambda_values: list | None = None, rc_min_p0_abs: float = 0.1, rc_min_expected: int = 3, rc_min_abs_deficit: float = 0.04, rc_n_trials_cap: float = None, rc_soft_bound: float = 1.0, rc_n_eff_scale: float = 0.6, rc_p_floor: float = 1e-12, rc_rho_bb: float = 0.02, rc_weight_mode: str = 'binary', rc_eps_rel: float = 0.01, include_depletion_scan: bool = False) -> dict[str, LocatResult]
Runs Locat and identifies localized genes
- Parameters:
- genes: list[str] | None, optional
If specified, only analyze the given list of genes
- weights_transform: Callable, optional
If specified, call this function to normalize the data
- zscore_thresh: float, optional
The z_score threshold to use when keeping localized genes
- max_freq: float, optional
The maximum fraction of cells allowed to express the gene
- verbose: bool, optional
If True, prints to the standard output
- n_bootstrap_inits: int, optional
The number of initializations used in bootstrapping (default:50)
- rc_lambda_values: list[float], optional
If not specified, a default is used
- rc_min_p0_abs: float, optional
The minimum proportion of f0 density in depleted region required for the region pval to be estimated
- rc_min_expected: int, optional
The minimum expected cells in depleted region required for the region pval to be estimated
- rc_min_abs_deficit: float, optional
The minimum absolute difference in f1(x) - f0(x) for all x in depleted region
- rc_n_trials_cap: float, optional
If None, defaults to sqrt(n_cells)
- rc_soft_bound: float, optional
The minimum value allowed for pvals
- rc_n_eff_scale: float, optional
The scaling factor for effective sample sizes – can be tweaked to stabilize pvals across various gene sample sizes
- rc_p_floor: float, optional
The minimum p-value to use (default: 1e-12)
- rc_rho_bb: float, optional
The strength of the beta binomial (0.0 is standard binomial, set at 0.02-0.05 for wider tails, default: 0.02)
- rc_weight_mode: str, optional
The mode to compute the K-nearest neighbors (default: “binary”) “binary” or “connectivity”
- rc_eps_rel: float, optional
The rc_eps_rel
- include_depletion_scan: bool, optional
If True, adds the depletion scan to the output for debugging purposes
- Returns:
- dict[str, LocatResult]
A dictionary containing the LocatResult for each gene
- init_rng(seed: int = 0)
Initialize the random number generator.
- Parameters:
- seed: int, optional
The seed to use
- knn()
Return a KNN adjacency/connectivity matrix.
If a KNN was provided at init (or via set_knn), returns it. Otherwise computes one from the embedding.
- class locat.locat.LOCATNullDistribution(mean_func, std_func)
Bases: object
- Attributes:
- mean
- std
Methods
from_estimates
to_zscore
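The class is undocumented beyond its member names, so the following is only a guess at how such an object might be used. It assumes that `mean_func` and `std_func` map an expression fraction p to the null mean and standard deviation (as estimate_null_parameters describes), and that a z-score is the usual standardized difference. The class name `NullSketch` and the `to_zscore(value, p)` signature are hypothetical.

```python
class NullSketch:
    """Hypothetical stand-in for a null distribution indexed by expression fraction p."""

    def __init__(self, mean_func, std_func):
        self.mean_func = mean_func  # p -> null mean (assumed)
        self.std_func = std_func    # p -> null standard deviation (assumed)

    def to_zscore(self, value, p):
        # Standardize an observed statistic against the null at fraction p.
        return (value - self.mean_func(p)) / self.std_func(p)

# Toy null: the mean grows linearly with p, the standard deviation is constant.
null = NullSketch(mean_func=lambda p: 2.0 * p, std_func=lambda p: 0.5)
z = null.to_zscore(3.0, p=0.5)
```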
- locat.locat.cauchy_combine(pvals, weights=None)
Robust p-value combiner for dependent tests (Cauchy combination). Liu & Xie (2020).
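For orientation, the standard Cauchy combination test can be sketched as follows. This is a minimal version of the published method, not necessarily what `cauchy_combine` does internally: it assumes the weights are normalized to sum to 1 (uniform when omitted) and does not clip p-values that are exactly 0 or 1. The name `cauchy_combine_sketch` is hypothetical.

```python
import math

def cauchy_combine_sketch(pvals, weights=None):
    """Combine p-values with T = sum_i w_i * tan((0.5 - p_i) * pi)."""
    n = len(pvals)
    if weights is None:
        weights = [1.0 / n] * n
    else:
        total = sum(weights)
        weights = [w / total for w in weights]  # assumption: normalize to sum 1
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvals))
    # Under the null, T is standard Cauchy, so the combined p-value is its upper tail.
    return 0.5 - math.atan(t) / math.pi

p_same = cauchy_combine_sketch([0.05, 0.05, 0.05])
p_mixed = cauchy_combine_sketch([0.001, 0.5, 0.8])
```

Combining identical p-values returns that same p-value, and a single very small p-value dominates the combined result, which is the behaviour that makes this combiner useful when the individual tests are correlated.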
- locat.locat.localization_pvalue_nn_func(x1, f1, f0, nn)
Sparse-safe rewrite of the original localization_pvalue_nn_func.
- Preserves:
i1/obs1/obs2 definitions
n and o as neighborhood (weighted) counts / signed balance
f2 weighting for global mu_hat, p1, p0
effective_n, sd_hat
per-cell p via normal_sf(o[i], mu_hat, sd_hat[i])
- Works for:
nn sparse CSR/CSC/COO
nn dense numpy array
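The per-cell p-value step listed above uses a normal survival function. The source names `normal_sf` but does not show it, so the version below is an assumed definition (upper-tail probability of a normal distribution) applied to toy values of `o`, `mu_hat` and `sd_hat`.

```python
import math

def normal_sf(x, mu, sd):
    """Upper-tail probability P(X > x) for X ~ Normal(mu, sd**2)."""
    z = (x - mu) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Toy inputs: one signed balance per cell, a global mean, per-cell standard deviations.
o = [3.0, 5.0, 9.0]
mu_hat = 5.0
sd_hat = [2.0, 2.0, 2.0]
pvals = [normal_sf(o_i, mu_hat, sd_i) for o_i, sd_i in zip(o, sd_hat)]
```

Cells whose balance lies far above `mu_hat` relative to their own `sd_hat` receive small p-values, while a cell sitting exactly at the mean receives 0.5.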
- locat.locat.logsidak_from_logp(logp_min: float, m_eff: int) -> float
Šidák combine in log-space:
log p_sidak = log(1 - (1 - p_min)^m_eff)
            = log(1 - exp(m_eff * log(1 - p_min)))
with log(1 - p_min) = log1p(-exp(logp_min)).
This preserves the original locat_condensed depletion-scan behavior, where the Šidák correction is applied for ordinary negative log-p values and only skipped when there is effectively a single test.
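The identity above translates directly into a few lines of Python. This sketch is not the library's code: it uses `math.expm1` for the outer `1 - exp(...)` as a stability choice of its own, it does not reproduce the single-test shortcut mentioned above, and it does not handle `logp_min = 0` (p_min = 1). The name `logsidak_sketch` is hypothetical.

```python
import math

def logsidak_sketch(logp_min, m_eff):
    """Return log(1 - (1 - p_min)**m_eff) given logp_min = log(p_min)."""
    log1m_p = math.log1p(-math.exp(logp_min))      # log(1 - p_min)
    return math.log(-math.expm1(m_eff * log1m_p))  # log(1 - exp(m_eff * log(1 - p_min)))

# Moderate p-value: matches the direct formula.
logp = logsidak_sketch(math.log(0.01), 5)

# Very small p-value: a naive 1 - (1 - p)**m would round to 0 here.
tiny = logsidak_sketch(-115.0, 5)
```

For very small p_min the result approaches log(m_eff) + logp_min, which is the Bonferroni bound in log-space.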