locat.locat module#

class locat.locat.LOCAT(adata: AnnData, cell_embedding: ndarray, k: int, n_bootstrap_inits: int = 50, show_progress: bool = False, wgmm_dtype: str = 'same', knn=None, knn_k: int | None = None, knn_mode: str = 'binary')[source]#

Bases: object

The main LOCAT class

Attributes:
W_t

Implementation-oriented expression matrix (cells x genes).

cell_dist
min_dist

Methods

background_n_components_init([...])

bic_score(gmm1, gene_prior)

calc_lratio(f1, ix, log_bkg_pdf, sample_size)

Per-cell LRT contribution on expressing cells:

depletion_pval_scan(gmm1, gene_prior, *[, ...])

estimate_null_parameters([fractions, n_reps])

Estimate LTST null mean/std as a function of expression fraction p using random pseudo-genes: pick random expressing cells at frequency p, fit the signal GMM with the same pipeline as real genes, and compute LTST exactly as in gmm_scan_new.

get_genes_indices(genes)

gmm_scan([genes, weights_transform, ...])

Runs Locat and identifies localized genes.

init_rng([seed])

Initialize the random number generator.

knn()

Return a KNN adjacency/connectivity matrix.

localization_pval_dep_scan(*args, **kwargs)

set_knn(knn)

Store a precomputed KNN graph.

auto_bkg_components

auto_n_components

background_pdf

fit_wgmm

get_gene_prior

gmm_local_pvalue

gmm_local_scan

gmm_loglikelihoodtest

random_pdf

reg_covar

show_progress

signal_gmm

signal_pdf

property W_t[source]#

Implementation-oriented expression matrix (cells x genes).

This is the transpose of the manuscript’s W object, which is described in gene x cell orientation.

__init__(adata: AnnData, cell_embedding: ndarray, k: int, n_bootstrap_inits: int = 50, show_progress: bool = False, wgmm_dtype: str = 'same', knn=None, knn_k: int | None = None, knn_mode: str = 'binary')[source]#

Create a Locat object

Parameters:
adata: AnnData

The data in AnnData format, typically generated from Scanpy

cell_embedding: np.ndarray

The embedding to use in the analysis

k: int

The number of components to use in the GMM

n_bootstrap_inits: int, optional

The number of initializations used in bootstrapping (default: 50)

show_progress: bool, optional

If True, shows a progress bar (default: False)

wgmm_dtype: str, optional

The data type to use in the weighted GMM (default: same). Allowed values: “same”, “float32” or “float64”.

knn: np.ndarray, optional

K-nearest neighbor connectivities. These can be computed with scanpy.pp.neighbors and then read from adata.obsp["connectivities"].

knn_k: int, optional

The k parameter for computing k-nearest neighbors

knn_mode: KnnMode, optional

The mode used to compute the K-nearest neighbors: “binary” or “connectivity” (default: “binary”)

See also

scanpy.pp.neighbors
auto_bkg_components(n_points, weights_transform=None)[source]#
auto_n_components(coords, weights=None, indices=None, min_points_fraction=0.95)[source]#
background_n_components_init(weights_transform=None, min_points=10, n_reps=30)[source]#
background_pdf(n_comp=None, reps=10, weights_transform=None, force_refresh=False)[source]#
bic_score(gmm1, gene_prior)[source]#
static calc_lratio(f1, ix, log_bkg_pdf, sample_size, eps=1e-300)[source]#
Per-cell LRT contribution on expressing cells:

-2 * sum_{i in ix} ( log f0(i) - log f1(i) ) / sample_size
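
The formula above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: it assumes that f1 holds signal densities per cell, that log_bkg_pdf holds log f0 per cell, that ix lists the indices of expressing cells, and that eps only guards against log(0). The name lratio_sketch is hypothetical.

```python
import math

def lratio_sketch(f1, ix, log_bkg_pdf, sample_size, eps=1e-300):
    """Compute -2 * sum_{i in ix} (log f0(i) - log f1(i)) / sample_size."""
    total = 0.0
    for i in ix:
        # Assumption: eps is a floor that keeps log() finite when f1[i] == 0.
        log_f1 = math.log(max(f1[i], eps))
        total += log_bkg_pdf[i] - log_f1
    return -2.0 * total / sample_size

# Toy inputs: three cells, of which cells 0 and 1 express the gene.
f1 = [0.5, 0.2, 0.1]
log_f0 = [math.log(0.25), math.log(0.2), math.log(0.4)]
score = lratio_sketch(f1, ix=[0, 1], log_bkg_pdf=log_f0, sample_size=2)
```

A positive score means the signal density exceeds the background density on the expressing cells, on average.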

property cell_dist: ndarray[source]#
depletion_pval_scan(gmm1, gene_prior, *, lambda_values=None, soft_bound=None, min_p0_abs=0.1, min_expected=30, min_abs_deficit=0.02, n_trials_cap=500, weight_mode='binary', p_floor=1e-12, n_eff_scale=0.6, rho_bb=0.02, eps_rel=0.01, debug=False, debug_store_masks=False, debug_max_cells=5000)[source]#
estimate_null_parameters(fractions=None, n_reps=50)[source]#

Estimate LTST null mean/std as a function of expression fraction p using random pseudo-genes:

  • pick random expressing cells at frequency p

  • fit signal GMM with the same pipeline as real genes

  • compute LTST exactly as in gmm_scan_new
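
One plausible way to use such null estimates is to interpolate the mean and standard deviation at a gene's expression fraction p and standardize the raw score. The sketch below assumes piecewise-linear interpolation clamped at the ends; how LOCATNullDistribution.from_estimates and to_zscore work internally is not documented here, so interp and to_zscore_sketch are hypothetical names.

```python
from bisect import bisect_left

def interp(p, xs, ys):
    """Piecewise-linear interpolation of ys over sorted xs, clamped at both ends."""
    if p <= xs[0]:
        return ys[0]
    if p >= xs[-1]:
        return ys[-1]
    j = bisect_left(xs, p)  # first index with xs[j] >= p
    t = (p - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + t * (ys[j] - ys[j - 1])

def to_zscore_sketch(raw_score, p, fractions, means, stds):
    """Standardize a raw score against the null mean/std interpolated at fraction p."""
    mu = interp(p, fractions, means)
    sd = interp(p, fractions, stds)
    return (raw_score - mu) / sd

# Toy null estimates at two expression fractions (made-up numbers).
fractions = [0.1, 0.3]
means = [1.0, 2.0]
stds = [0.5, 1.0]
z = to_zscore_sketch(3.0, 0.2, fractions, means, stds)
```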

fit_wgmm(n_comp, weights=None) WGMM[source]#
get_gene_prior(i_gene, weights_transform)[source]#
get_genes_indices(genes)[source]#
gmm_local_pvalue(genes=None, n_comp=None, weights_transform=None, alpha=0.05, n_inits=100, normalize_knn=True, eps=1e-12)[source]#
gmm_local_scan(genes=None, weights_transform=None, zscore_thresh=None, max_freq=0.5)[source]#
gmm_loglikelihoodtest(genes=None, weights_transform=None, max_freq=0.5)[source]#
gmm_scan(genes: list[str] | None = None, weights_transform: Callable | None = None, zscore_thresh: float = None, max_freq: float = 0.9, verbose: bool = False, n_bootstrap_inits: int = None, rc_lambda_values: list | None = None, rc_min_p0_abs: float = 0.1, rc_min_expected: int = 3, rc_min_abs_deficit: float = 0.04, rc_n_trials_cap: float = None, rc_soft_bound: float = 1.0, rc_n_eff_scale: float = 0.6, rc_p_floor: float = 1e-12, rc_rho_bb: float = 0.02, rc_weight_mode: str = 'binary', rc_eps_rel: float = 0.01, include_depletion_scan: bool = False) dict[str, LocatResult][source]#

Runs Locat and identifies localized genes.

Parameters:
genes: list[str] | None, optional

If specified, only analyze the given list of genes

weights_transform: Callable, optional

If specified, call this function to normalize the data

zscore_thresh: float, optional

The z_score threshold to use when keeping localized genes

max_freq: float, optional

The maximum fraction of cells allowed to express the gene

verbose: bool, optional

If True, prints to the standard output

n_bootstrap_inits: int, optional

The number of initializations used in bootstrapping (default: 50)

rc_lambda_values: list[float], optional

If not specified, a default is used

rc_min_p0_abs: float, optional

The minimum proportion of f0 density in the depleted region required for the region p-value to be estimated

rc_min_expected: int, optional

The minimum expected number of cells in the depleted region required for the region p-value to be estimated

rc_min_abs_deficit: float, optional

The minimum absolute difference in f1(x) - f0(x) for all x in depleted region

rc_n_trials_cap: float, optional

If None, defaults to sqrt(n_cells)

rc_soft_bound: float, optional

The minimum value allowed for pvals

rc_n_eff_scale: float, optional

The scaling factor for effective sample sizes – can be tweaked to stabilize pvals across various gene sample sizes

rc_p_floor: float, optional

The minimum p-value to use (default: 1e-12)

rc_rho_bb: float, optional

The strength of the beta-binomial (0.0 is the standard binomial; set to 0.02-0.05 for wider tails; default: 0.02)

rc_weight_mode: str, optional

The mode to compute the K-nearest neighbors: “binary” or “connectivity” (default: “binary”)

rc_eps_rel: float, optional

The rc_eps_rel

include_depletion_scan: bool, optional

If True, adds the depletion scan to the output for debugging purposes

Returns:
dict[str, LocatResult]

A dictionary containing the LocatResult for each gene

init_rng(seed: int = 0)[source]#

Initialize the random number generator.

Parameters:
seed: int, optional

The seed to use

knn()[source]#

Return a KNN adjacency/connectivity matrix.

If a KNN was provided at init (or via set_knn), returns it. Otherwise computes one from the embedding.

localization_pval_dep_scan(*args, **kwargs)[source]#
property min_dist: float[source]#
random_pdf(weights, n_comp=None, n_inits=300, buckets=None)[source]#
reg_covar(sample_size=None)[source]#
set_knn(knn)[source]#

Store a precomputed KNN graph.

Accepts:
  • scipy sparse (csr/csc/coo) adjacency or connectivities

  • dense numpy array adjacency/connectivities

Expected shape: (n_cells, n_cells)

show_progress(show_progress=True)[source]#
signal_gmm(weights, n_comp=None)[source]#
signal_pdf(weights, n_comp=None)[source]#
class locat.locat.LOCATNullDistribution(mean_func, std_func)[source]#

Bases: object

Attributes:
mean
std

Methods

from_estimates

to_zscore

__init__(mean_func, std_func)[source]#
classmethod from_estimates(p, means, stds)[source]#
mean = None[source]#
std = None[source]#
to_zscore(raw_score, p)[source]#
locat.locat.cauchy_combine(pvals, weights=None)[source]#

Robust p-value combiner for dependent tests (Cauchy combination). Liu & Xie (2020).
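
For orientation, the textbook Cauchy combination maps each p-value to a Cauchy variate, takes a weighted sum, and maps the sum back to a p-value. The sketch below shows that standard form only. It assumes p-values strictly between 0 and 1 and weights normalized to sum to 1. It is not taken from the locat source, and cauchy_combine_sketch is a hypothetical name.

```python
import math

def cauchy_combine_sketch(pvals, weights=None):
    """Standard Cauchy combination: T = sum_i w_i * tan((0.5 - p_i) * pi)."""
    n = len(pvals)
    if weights is None:
        weights = [1.0 / n] * n  # equal weights by default
    else:
        total = sum(weights)
        weights = [w / total for w in weights]  # normalize to sum to 1
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvals))
    # Survival function of the standard Cauchy distribution evaluated at t.
    return 0.5 - math.atan(t) / math.pi

combined = cauchy_combine_sketch([0.01, 0.5])
```

A single p-value, or several identical ones, is returned unchanged, which makes for a quick sanity check.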

locat.locat.localization_pvalue_nn_func(x1, f1, f0, nn)[source]#

Sparse-safe rewrite of the original localization_pvalue_nn_func.

Preserves:
  • i1/obs1/obs2 definitions

  • n and o as neighborhood (weighted) counts / signed balance

  • f2 weighting for global mu_hat, p1, p0

  • effective_n, sd_hat

  • per-cell p via normal_sf(o[i], mu_hat, sd_hat[i])

Works for:
  • nn sparse CSR/CSC/COO

  • nn dense numpy array

locat.locat.logsidak_from_logp(logp_min: float, m_eff: int) float[source]#
Sidák combine in log-space:

log p_sidák = log(1 - (1 - p_min)^m_eff)
            = log(1 - exp(m_eff * log(1 - p_min)))

with log(1 - p_min) = log1p(-exp(logp_min)).

This preserves the original locat_condensed depletion-scan behavior, where the Sidák correction is applied for ordinary negative log-p values and only skipped when there is effectively a single test.
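
The log-space identity above can be evaluated stably with log1p and expm1. This is a minimal sketch under stated assumptions, not the library's code: the single-test shortcut, the handling of p_min = 1, and the underflow fallback are illustrative choices, and logsidak_sketch is a hypothetical name.

```python
import math

def logsidak_sketch(logp_min, m_eff):
    """Return log(1 - (1 - p_min)^m_eff) given logp_min = log(p_min)."""
    if m_eff <= 1:
        return logp_min  # effectively a single test: no correction
    if logp_min >= 0.0:
        return 0.0  # p_min == 1, so the corrected p-value is also 1
    log_one_minus_p = math.log1p(-math.exp(logp_min))  # log(1 - p_min)
    one_minus_q = -math.expm1(m_eff * log_one_minus_p)  # 1 - (1 - p_min)^m_eff
    if one_minus_q <= 0.0:
        # Assumption: on underflow, fall back to the first-order value m_eff * p_min.
        return logp_min + math.log(m_eff)
    return math.log(one_minus_q)

log_p = logsidak_sketch(math.log(0.01), 5)
```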

locat.locat.ltst_score_func(f0, f1, p)[source]#
locat.locat.normal_sf(x, mu, sigma)[source]#

Normal survival function SF = 1 - CDF, numba-jitted.
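
As a reference point, the normal survival function can be written with the complementary error function, which stays accurate in the upper tail. The pure-Python sketch below omits the numba decorator and assumes sigma > 0. The name normal_sf_sketch is hypothetical and the actual jitted implementation may differ.

```python
import math

def normal_sf_sketch(x, mu, sigma):
    """SF(x) = 1 - CDF(x) for a Normal(mu, sigma), computed via erfc."""
    z = (x - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)

upper_tail = normal_sf_sketch(1.96, 0.0, 1.0)
```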

locat.locat.sens_score_func(f0, f1, i)[source]#
locat.locat.smooth_qvals(x)[source]#
locat.locat.summarize_rc_debug(cs_res, top=8)[source]#

Convenience helper to inspect per-threshold diagnostics from depletion_pval_scan(…, debug=True).