locat.locat module#

class locat.locat.LOCAT(adata: AnnData, cell_embedding: ndarray, k: int = 20, n_bootstrap_inits: int = 50, show_progress: bool = False, wgmm_dtype: str = 'same', knn=None, knn_k: int | None = None, knn_mode: str = 'binary', reg_covar: float | None = None)[source]#

Bases: object

The main LOCAT class

Attributes:

W_t: Implementation-oriented expression matrix (cells x genes).
cell_dist
min_dist

Methods

`background_n_components_init`([...])
`calc_lratio`(f1, ix, log_bkg_pdf, sample_size)	Per-cell LRT contribution on expressing cells.
`depletion_pval_scan`(gmm1, gene_prior, *[, ...])
`estimate_null_parameters`([fractions, n_reps])	Estimate LTST null mean/std as a function of expression fraction p using random pseudo-genes.
`get_genes_indices`(genes)
`gmm_scan`([genes, weights_transform, ...])	Run Locat and identify localized genes.
`init_rng`([seed, global_seed])	Initialize the random number generator.
`knn`()	Return a KNN adjacency/connectivity matrix.
`localization_pval_dep_scan`(*args, **kwargs)
`set_knn`(knn)	Store a precomputed KNN graph.

auto_bkg_components
auto_n_components
background_pdf
bic_score
fit_wgmm
get_gene_prior
gmm_local_pvalue
gmm_local_scan
gmm_loglikelihoodtest
random_pdf
reg_covar
show_progress
signal_gmm
signal_pdf

property W_t[source]#

Implementation-oriented expression matrix (cells x genes).

This is the transpose of the manuscript’s W object, which is described in gene x cell orientation.

__init__(adata: AnnData, cell_embedding: ndarray, k: int = 20, n_bootstrap_inits: int = 50, show_progress: bool = False, wgmm_dtype: str = 'same', knn=None, knn_k: int | None = None, knn_mode: str = 'binary', reg_covar: float | None = None)[source]#

Create a Locat object

Parameters:

adata: AnnData: The data in AnnData format, typically generated from Scanpy
cell_embedding: np.ndarray: The embedding to use in the analysis
k: int, optional: The number of nearest neighbors used to compute cell weights (default: 20)
n_bootstrap_inits: int, optional: The number of initializations used in bootstrapping (default: 50)
show_progress: bool, optional: If True, shows progress bar (default: False)
wgmm_dtype: str, optional: The data type to use in the weighted GMM. Allowed values: “same”, “float32” or “float64” (default: “same”).
knn: np.ndarray, optional: Precomputed K-nearest-neighbor connectivities. These can be computed with scanpy.pp.neighbors and then read from adata.obsp[“connectivities”].
knn_k: int, optional: The k parameter for computing k-nearest neighbors
knn_mode: KnnMode, optional: The mode used to compute the K-nearest neighbors. Allowed values: “binary” or “connectivity” (default: “binary”).
reg_covar: float, optional: Regularization added to the covariance matrix diagonal for numerical stability. If None, an automatic value is derived from the data geometry (default: None).

See also

scanpy.pp.neighbors

auto_bkg_components(n_points, weights_transform=None)[source]#

auto_n_components(coords, weights=None, indices=None, min_points_fraction=0.95)[source]#

background_n_components_init(weights_transform=None, min_points=10, n_reps=30)[source]#

background_pdf(n_comp=None, reps=10, weights_transform=None, force_refresh=False)[source]#

bic_score(gmm1, gene_prior)[source]#

static calc_lratio(f1, ix, log_bkg_pdf, sample_size, eps=1e-300)[source]#

Per-cell LRT contribution on expressing cells:

-2 * sum_{i in ix} ( log f0(i) - log f1(i) ) / sample_size
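The formula above can be illustrated with a small NumPy sketch. This is not the library's implementation: `lratio_sketch` is a hypothetical name, and it assumes that `f1` holds per-cell signal densities, that `log_bkg_pdf` holds per-cell log background densities (log f0), and that `eps` only guards against taking the log of zero.

```python
import numpy as np

def lratio_sketch(f1, ix, log_bkg_pdf, sample_size, eps=1e-300):
    """Hypothetical sketch: -2 * sum_{i in ix} (log f0(i) - log f1(i)) / sample_size."""
    f1 = np.asarray(f1, dtype=float)
    log_f0 = np.asarray(log_bkg_pdf, dtype=float)
    # Assumption: eps floors the signal density so that log() never sees zero.
    log_f1 = np.log(np.maximum(f1[ix], eps))
    return -2.0 * np.sum(log_f0[ix] - log_f1) / sample_size

# Toy densities for four cells; cells 0 and 3 express the gene.
f1 = np.array([0.5, 0.2, 0.1, 0.4])
log_f0 = np.log(np.full(4, 0.25))
ix = np.array([0, 3])
score = lratio_sketch(f1, ix, log_f0, sample_size=len(ix))
```

In this toy case the signal density exceeds the background density on both expressing cells, so the score is positive.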

property cell_dist: ndarray[source]#

depletion_pval_scan(gmm1, gene_prior, *, lambda_values=None, soft_bound=None, min_p0_abs=0.1, min_expected=30, min_abs_deficit=0.02, n_trials_cap=500, weight_mode='binary', p_floor=1e-12, n_eff_scale=0.6, rho_bb=0.02, eps_rel=0.01, debug=False, debug_store_masks=False, debug_max_cells=5000)[source]#

estimate_null_parameters(fractions=None, n_reps=50)[source]#

Estimate LTST null mean/std as a function of expression fraction p using random pseudo-genes:

pick random expressing cells at frequency p

fit signal GMM with the same pipeline as real genes

compute LTST exactly as in gmm_scan_new
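The three steps above can be sketched in plain NumPy. Everything here is illustrative: `estimate_null_sketch`, `to_zscore_sketch` and `toy_score` are hypothetical names, the toy statistic stands in for the GMM fit and LTST computation, and the use of linear interpolation between sampled fractions is an assumption rather than a description of `LOCATNullDistribution`.

```python
import numpy as np

def estimate_null_sketch(score_fn, n_cells, fractions, n_reps=50, seed=0):
    """Tabulate null mean/std of score_fn over random pseudo-genes."""
    rng = np.random.default_rng(seed)
    means, stds = [], []
    for p in fractions:
        n_expr = max(1, int(round(p * n_cells)))
        scores = []
        for _ in range(n_reps):
            # Pick random "expressing" cells at frequency p.
            mask = np.zeros(n_cells, dtype=bool)
            mask[rng.choice(n_cells, size=n_expr, replace=False)] = True
            scores.append(score_fn(mask))
        means.append(np.mean(scores))
        stds.append(np.std(scores))
    return np.asarray(means), np.asarray(stds)

def to_zscore_sketch(raw_score, p, fractions, means, stds):
    """Assumption: null mean/std are linearly interpolated in p."""
    mu = np.interp(p, fractions, means)
    sd = np.interp(p, fractions, stds)
    return (raw_score - mu) / sd

# Toy statistic (NOT LTST): distance of the expressing cells' centroid from the origin.
coords = np.random.default_rng(1).normal(size=(500, 2))

def toy_score(mask):
    return float(np.linalg.norm(coords[mask].mean(axis=0)))

fractions = np.array([0.05, 0.1, 0.2, 0.4])
means, stds = estimate_null_sketch(toy_score, len(coords), fractions, n_reps=30)
z = to_zscore_sketch(1.0, 0.15, fractions, means, stds)
```

Tabulating the null by expression fraction lets a raw score be compared against pseudo-genes of similar prevalence instead of a single global null.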

fit_wgmm(n_comp, weights=None) → WGMM[source]#

get_gene_prior(i_gene, weights_transform)[source]#

get_genes_indices(genes)[source]#

gmm_local_pvalue(genes=None, n_comp=None, weights_transform=None, alpha=0.05, n_inits=100, normalize_knn=True, eps=1e-12)[source]#

gmm_local_scan(genes=None, weights_transform=None, zscore_thresh=None, max_freq=0.5)[source]#

gmm_loglikelihoodtest(genes=None, weights_transform=None, max_freq=0.5)[source]#

gmm_scan(genes: list[str] | None = None, weights_transform: ~typing.Callable = <function clip_weights>, zscore_thresh: float = None, max_freq: float = 0.9, verbose: bool = False, n_bootstrap_inits: int = None, rc_lambda_values: list | None = None, rc_min_p0_abs: float = 0.1, rc_min_expected: int = 3, rc_min_abs_deficit: float = 0.04, rc_n_trials_cap: float = None, rc_soft_bound: float = 1.0, rc_n_eff_scale: float = 0.6, rc_p_floor: float = 1e-12, rc_rho_bb: float = 0.02, rc_weight_mode: str = 'binary', rc_eps_rel: float = 0.01, include_depletion_scan: bool = False) → dict[str, LocatResult][source]#

Run Locat and identify localized genes.

Parameters:

genes: list[str] | None, optional: If specified, only analyze the given list of genes
weights_transform: Callable, optional: If specified, call this function to normalize the data
zscore_thresh: float, optional: The z_score threshold to use when keeping localized genes
max_freq: float, optional: The maximum fraction of cells allowed to express the gene
verbose: bool, optional: If True, prints to the standard output
n_bootstrap_inits: int, optional: The number of initializations used in bootstrapping (default: 50)
rc_lambda_values: list[float], optional: If not specified, a default is used
rc_min_p0_abs: float, optional: The minimum proportion of f0 density in depleted region required for the region pval to be estimated
rc_min_expected: int, optional: The minimum expected cells in depleted region required for the region pval to be estimated
rc_min_abs_deficit: float, optional: The minimum absolute difference in f1(x) - f0(x) for all x in depleted region
rc_n_trials_cap: float, optional: If None, defaults to sqrt(n_cells)
rc_soft_bound: float, optional: The minimum value allowed for pvals
rc_n_eff_scale: float, optional: The scaling factor for effective sample sizes – can be tweaked to stabilize pvals across various gene sample sizes
rc_p_floor: float, optional: The minimum p-value to use (default: 1e-12)
rc_rho_bb: float, optional: The strength of the beta binomial (0.0 is standard binomial, set at 0.02-0.05 for wider tails, default: 0.02)
rc_weight_mode: str, optional: The mode to compute the K-nearest neighbors. Allowed values: “binary” or “connectivity” (default: “binary”).
rc_eps_rel: float, optional: The rc_eps_rel
include_depletion_scan: bool, optional: If True, adds the depletion scan to the output for debugging purposes

Returns:

dict[str, LocatResult]: A dictionary containing the LocatResult for each gene

init_rng(seed: int = 0, global_seed: int = 13)[source]#

Initialize the random number generator.

Parameters:

seed: int, optional: The seed to use
global_seed: int, optional: The seed to use for the global np.random

knn()[source]#

Return a KNN adjacency/connectivity matrix.

If a KNN was provided at init (or via set_knn), returns it. Otherwise computes one from the embedding.

localization_pval_dep_scan(*args, **kwargs)[source]#

property min_dist: float[source]#

random_pdf(weights, n_comp=None, n_inits=300, buckets=None)[source]#

reg_covar(sample_size=None)[source]#

set_knn(knn)[source]#

Store a precomputed KNN graph.

Accepts:

scipy sparse (csr/csc/coo) adjacency or connectivities
dense numpy array adjacency/connectivities

Expected shape: (n_cells, n_cells)
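As a rough illustration of an input with the expected shape, the sketch below builds a dense binary KNN adjacency from an embedding using NumPy only. `binary_knn_sketch` is a hypothetical helper, not part of locat, and the choice to exclude self-neighbors is an assumption.

```python
import numpy as np

def binary_knn_sketch(embedding, k):
    """Dense (n_cells, n_cells) 0/1 adjacency: row i marks the k nearest cells to i."""
    x = np.asarray(embedding, dtype=float)
    n = x.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(d2, np.inf)  # assumption: a cell is not its own neighbor
    # Indices of the k smallest distances in each row (unordered).
    nbrs = np.argpartition(d2, k, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=np.float32)
    adj[np.arange(n)[:, None], nbrs] = 1.0
    return adj

rng = np.random.default_rng(0)
embedding = rng.normal(size=(100, 2))
adj = binary_knn_sketch(embedding, k=10)
# A matrix like this could then be passed on, e.g. some_locat_object.set_knn(adj)
```

A dense matrix needs memory quadratic in the number of cells, so for larger datasets a sparse graph such as adata.obsp[“connectivities”] is likely the more practical input.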

show_progress(show_progress=True)[source]#

signal_gmm(weights, n_comp=None)[source]#

signal_pdf(weights, n_comp=None)[source]#

class locat.locat.LOCATNullDistribution(mean_func, std_func)[source]#

Bases: object

Attributes:

mean
std

Methods

from_estimates
to_zscore

__init__(mean_func, std_func)[source]#

classmethod from_estimates(p, means, stds)[source]#

mean = None[source]#

std = None[source]#

to_zscore(raw_score, p)[source]#

locat.locat.cauchy_combine(pvals, weights=None)[source]#: Robust p-value combiner for dependent tests (Cauchy combination). Liu & Xie (2020).
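For orientation, the Cauchy combination statistic can be written in a few lines of NumPy. This is a sketch of the standard formula, not the library's code: `cauchy_combine_sketch` is a hypothetical name, and the clipping bounds and the default of equal weights are assumptions.

```python
import numpy as np

def cauchy_combine_sketch(pvals, weights=None):
    """Combine p-values via T = sum_i w_i * tan((0.5 - p_i) * pi)."""
    # Assumption: clip away exact 0 and 1 so tan() stays finite.
    p = np.clip(np.asarray(pvals, dtype=float), 1e-300, 1.0 - 1e-16)
    if weights is None:
        w = np.full(p.shape, 1.0 / p.size)  # assumption: equal weights by default
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()  # weights are normalized to sum to one
    t = np.sum(w * np.tan((0.5 - p) * np.pi))
    # Survival function of the standard Cauchy distribution at t.
    return 0.5 - np.arctan(t) / np.pi

p_same = cauchy_combine_sketch([0.01, 0.01, 0.01])
p_mixed = cauchy_combine_sketch([0.001, 0.4, 0.8])
```

When all inputs are equal the combined p-value equals that common value; with mixed inputs the result is dominated by the smallest p-values. The `0.5 - arctan(t) / pi` form loses precision for extremely small p-values, so a production version would likely need a tail approximation.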

locat.locat.clip_weights(x)[source]#: Default weights transform: clip negative values to zero.
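A transform with the behavior described above could look like the following sketch. `clip_weights_sketch` is a hypothetical stand-in; the actual function body is not shown in this reference.

```python
import numpy as np

def clip_weights_sketch(x):
    """Clip negative values to zero, leaving non-negative values unchanged."""
    return np.clip(np.asarray(x, dtype=float), 0.0, None)

w = clip_weights_sketch([-1.5, 0.0, 2.5])
```

Any callable with this one-argument shape could presumably be supplied as `weights_transform` in `gmm_scan`, though that is inferred from the signature rather than stated in the docstring.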

locat.locat.localization_pvalue_nn_func(x1, f1, f0, nn)[source]#

Sparse-safe rewrite of the original localization_pvalue_nn_func.

Preserves:

i1/obs1/obs2 definitions
n and o as neighborhood (weighted) counts / signed balance
f2 weighting for global mu_hat, p1, p0
effective_n, sd_hat
per-cell p via normal_sf(o[i], mu_hat, sd_hat[i])

Works for:

nn sparse CSR/CSC/COO
nn dense numpy array

locat.locat.logsidak_from_logp(logp_min: float, m_eff: int) → float[source]#

Sidák combine in log-space:

log p_sidák = log(1 - (1 - p_min)^m_eff)
            = log(1 - exp(m_eff * log(1 - p_min)))

with log(1 - p_min) = log1p(-exp(logp_min)).

This preserves the original locat_condensed depletion-scan behavior, where the Sidák correction is applied for ordinary negative log-p values and only skipped when there is effectively a single test.
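The identities above translate into a short, numerically stable sketch using the standard library. `logsidak_sketch` is a hypothetical name; treating `m_eff <= 1` as "a single test" and returning 0.0 when `p_min` is 1 are assumptions.

```python
import math

def logsidak_sketch(logp_min, m_eff):
    """log(1 - (1 - p_min)^m_eff), computed from log(p_min)."""
    if m_eff <= 1:
        # Assumption: with a single effective test no correction is applied.
        return logp_min
    if logp_min >= 0.0:
        # p_min == 1, so the corrected p-value is also 1.
        return 0.0
    log1m_p = math.log1p(-math.exp(logp_min))      # log(1 - p_min)
    return math.log(-math.expm1(m_eff * log1m_p))  # log(1 - exp(m_eff * log(1 - p_min)))

corrected = logsidak_sketch(math.log(0.01), 5)
tiny = logsidak_sketch(-50.0, 5)
```

Using `log1p` and `expm1` keeps precision when `p_min` is very small, where the corrected value approaches `logp_min + log(m_eff)`.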

locat.locat.ltst_score_func(f0, f1, p)[source]#

locat.locat.normal_sf(x, mu, sigma)[source]#: Normal survival function SF = 1 - CDF, numba-jitted.

locat.locat.sens_score_func(f0, f1, i)[source]#

locat.locat.smooth_qvals(x)[source]#

locat.locat.summarize_rc_debug(cs_res, top=8)[source]#: Convenience helper to inspect per-threshold diagnostics from depletion_pval_scan(…, debug=True).