locat.preprocessing module
- locat.preprocessing.DEFAULT_PSEUDO_PATTERNS = ['^Gm\\d+', 'Rik$', '^AC\\d+', '^AA\\d+', '^A[0-9]{6,}', '^Mir\\d+', '^Rpl\\d*-\\d+', '^Rps\\d*-\\d+', '^Linc']
Default regex patterns for pseudogene-like gene symbols.
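To illustrate what these patterns catch, the following sketch applies them to a few mouse-style gene symbols with Python's re module. It assumes the patterns are applied case-sensitively with re.search; how locat applies them internally is not stated here. The helper is_pseudo_like and the sample symbols are hypothetical.

```python
import re

# Patterns copied from DEFAULT_PSEUDO_PATTERNS, written as raw strings.
PATTERNS = [r'^Gm\d+', r'Rik$', r'^AC\d+', r'^AA\d+', r'^A[0-9]{6,}',
            r'^Mir\d+', r'^Rpl\d*-\d+', r'^Rps\d*-\d+', r'^Linc']

def is_pseudo_like(symbol, patterns=PATTERNS):
    """Return True if any pattern matches (assumption: re.search semantics)."""
    return any(re.search(p, symbol) for p in patterns)

# Hypothetical sample symbols for illustration only.
symbols = ["Gm12345", "1700001C02Rik", "Mir21", "Rps6", "Actb", "Gapdh"]
flagged = [s for s in symbols if is_pseudo_like(s)]
print(flagged)
```

Under these assumptions, a plain ribosomal gene such as Rps6 is not flagged, because the Rps pattern requires a hyphen followed by digits.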
- locat.preprocessing.filter_genes(adata, pseudo_patterns=None, min_cell_frac=0.01)
Filter genes from an AnnData object.
Removes pseudogene-like symbols matched by pseudo_patterns and genes expressed in fewer than min_cell_frac of cells.
- Parameters:
- adata:
AnnData object whose .var_names are gene symbols and .X is an expression matrix (dense or sparse).
- pseudo_patterns:
List of regex patterns identifying pseudogene-like symbols. Defaults to DEFAULT_PSEUDO_PATTERNS.
- min_cell_frac:
Minimum fraction of cells that must express a gene (X > 0) for it to be retained. Default is 0.01 (1%).
- Returns:
- AnnData
A filtered copy of adata.
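The two criteria described above can be sketched on a plain NumPy matrix. This is not the locat implementation, only an illustration of the documented behavior. The helper keep_mask, the toy matrix, and the use of re.search are assumptions. Whether the library treats the threshold as inclusive is not specified; since genes expressed in fewer than min_cell_frac of cells are removed, the sketch keeps genes whose fraction is at least min_cell_frac.

```python
import re
import numpy as np

def keep_mask(X, gene_names, pseudo_patterns, min_cell_frac=0.01):
    """Boolean mask of genes to retain (hypothetical helper, dense input only)."""
    # Fraction of cells (rows) with nonzero expression, per gene (column).
    frac = (X > 0).mean(axis=0)
    expressed = frac >= min_cell_frac
    # Flag symbols matched by any of the patterns.
    pseudo = np.array(
        [any(re.search(p, g) for p in pseudo_patterns) for g in gene_names],
        dtype=bool,
    )
    return expressed & ~pseudo

# Toy data: 4 cells x 4 genes (values are made up).
X = np.array([[1, 0, 3, 0],
              [0, 0, 2, 0],
              [2, 0, 0, 5],
              [1, 0, 1, 0]])
genes = ["Actb", "Gapdh", "Gm1234", "Xist"]
mask = keep_mask(X, genes, [r"^Gm\d+", r"Rik$"], min_cell_frac=0.5)
kept = [g for g, m in zip(genes, mask) if m]
print(kept)
```

With an AnnData object, a mask of this kind could be applied as adata[:, mask].copy() to obtain a filtered copy.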
- locat.preprocessing.get_embedding(adata, key='X_pca', n_dims=None)
Extract a cell embedding from an AnnData object as a float64 array.
- Parameters:
- adata:
AnnData object with the embedding stored in obsm.
- key:
Key in adata.obsm to extract (default: "X_pca").
- n_dims:
Number of dimensions to keep. If None, all dimensions are returned.
- Returns:
- np.ndarray
2-D float64 array of shape (n_cells, n_dims).
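A minimal sketch of the described behavior, using a plain dictionary in place of adata.obsm so that it runs without AnnData installed. The helper name extract_embedding is hypothetical, and the sketch assumes that n_dims keeps the leading columns, which the description does not state explicitly.

```python
import numpy as np

def extract_embedding(obsm, key="X_pca", n_dims=None):
    """Return obsm[key] as a float64 array, optionally truncated (hypothetical helper)."""
    emb = np.asarray(obsm[key], dtype=np.float64)
    if n_dims is not None:
        # Assumption: keep the first n_dims columns.
        emb = emb[:, :n_dims]
    return emb

# Stand-in for adata.obsm: 4 cells, 3 components, stored as float32.
obsm = {"X_pca": np.arange(12, dtype=np.float32).reshape(4, 3)}
full = extract_embedding(obsm)
first_two = extract_embedding(obsm, n_dims=2)
print(full.shape, full.dtype, first_two.shape)
```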