The core function for MOSAIC's differential connectivity (DC) analysis.
For each feature, this function collects the feature's embedding vector
across all samples, builds a sample-by-sample distance matrix from these
vectors, and then runs PERMANOVA (vegan::adonis2) to test whether
the embedding structure differs significantly between conditions. It also
computes silhouette scores to quantify the separation between condition
groups in the feature's embedding space.
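Here is a minimal base-R sketch of the per-feature test: a one-factor PERMANOVA computed directly from a distance matrix using Anderson's sums of squares. For a single grouping factor, these are the quantities vegan::adonis2 reports as F, R2 and Pr(>F). The function name permanova_one_factor and its arguments are hypothetical, and this is an illustration, not MOSAIC's code.

```r
# One-factor PERMANOVA from a distance matrix (hypothetical helper).
# Returns the pseudo-F, R^2 and a permutation p-value.
permanova_one_factor <- function(d, groups, n_perm = 999, seed = 1) {
  D2 <- as.matrix(d)^2                       # squared distances
  groups <- droplevels(as.factor(groups))
  N <- nrow(D2)
  a <- nlevels(groups)
  stopifnot(a >= 2, N > a)
  ss_total <- sum(D2[upper.tri(D2)]) / N     # total sum of squares

  pseudo_f <- function(g) {
    ss_within <- 0
    for (k in levels(g)) {
      idx <- which(g == k)
      sub <- D2[idx, idx, drop = FALSE]
      ss_within <- ss_within + sum(sub[upper.tri(sub)]) / length(idx)
    }
    ss_between <- ss_total - ss_within
    c(F = (ss_between / (a - 1)) / (ss_within / (N - a)),
      R2 = ss_between / ss_total)
  }

  obs <- pseudo_f(groups)
  set.seed(seed)                             # reproducible permutations
  f_perm <- replicate(n_perm, pseudo_f(sample(groups))[["F"]])
  list(F = obs[["F"]], R2 = obs[["R2"]],
       p = (1 + sum(f_perm >= obs[["F"]])) / (n_perm + 1))
}

# Usage: 10 samples, 3 latent dims, two conditions
set.seed(42)
X <- rbind(matrix(rnorm(15, mean = 0), 5), matrix(rnorm(15, mean = 5), 5))
res <- permanova_one_factor(dist(X), rep(c("ctrl", "treat"), each = 5))
```

With Euclidean distances, the pseudo-F reduces to the ratio of between-group to within-group scatter around the group centroids. This is why the choice of dist_method directly changes what "different embedding structure" means.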
Arguments
- mosaic_embed_list
Named list of per-sample projected feature matrices (features x latent dims), as returned by run_MOSAIC()$mosaic_embed_list. Each element is a matrix in which row i is feature i's embedding in that sample.
- n_sample
Integer. Number of samples (the length of mosaic_embed_list).
- groups
Character or factor vector of condition labels, one per sample, in the same order as the names of mosaic_embed_list. Typically obtained from result$annotation$Condition.
- n_cores
Integer. Number of parallel cores to use. If NULL (default), uses parallel::detectCores() - 1.
- dist_method
Character string specifying the distance metric for the per-feature sample distance matrix:
"euclidean" (default): Euclidean distance.
"cosine": cosine dissimilarity (1 - cosine similarity).
Value
A list with the following elements:
- pvalue_list
Numeric vector of PERMANOVA p-values, one per feature.
- r2_list
Numeric vector of PERMANOVA R-squared values (effect sizes), one per feature.
- F_stats_list
Numeric vector of PERMANOVA F-statistics, one per feature. Used as the test statistic for empirical p-value calibration.
- silhouette_score_list
List of average silhouette scores, one per feature.
- similarity_matrix_list
List of per-feature sample-by-sample similarity/distance matrices. Each matrix has dimension n_valid_samples x n_valid_samples (after zero-row removal).
- group_list
List of condition label vectors, one per feature (after zero-row removal).
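Each silhouette_score_list entry summarises how well the condition groups separate in that feature's distance matrix. Below is a base-R sketch of the average silhouette width, assuming the standard Rousseeuw definition. The name mean_silhouette is hypothetical, and this documentation does not say which silhouette implementation MOSAIC uses.

```r
# Average silhouette width of condition groups (hypothetical helper).
# For sample i: a = mean distance to its own group, b = smallest mean
# distance to any other group, s(i) = (b - a) / max(a, b).
mean_silhouette <- function(d, groups) {
  D <- as.matrix(d)
  groups <- droplevels(as.factor(groups))
  stopifnot(nlevels(groups) >= 2, length(groups) == nrow(D))
  s <- vapply(seq_len(nrow(D)), function(i) {
    own <- groups == groups[i]
    own[i] <- FALSE
    if (!any(own)) return(0)            # singleton group: s(i) = 0 by convention
    a <- mean(D[i, own])                # mean distance within own group
    others <- setdiff(levels(groups), as.character(groups[i]))
    b <- min(vapply(others, function(k) mean(D[i, groups == k]), numeric(1)))
    if (max(a, b) == 0) return(0)       # all distances zero: no separation
    (b - a) / max(a, b)
  }, numeric(1))
  mean(s)
}

# Usage: two tight, well-separated groups on a line give a score close to 1.
mean_silhouette(dist(c(0, 1, 10, 11)), c("a", "a", "b", "b"))
```

For a cross-check, summary(cluster::silhouette(as.integer(factor(groups)), d))$avg.width computes the same quantity with the cluster package.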
Details
Samples where a feature has an all-zero embedding (e.g. due to dropout) are
automatically excluded on a per-feature basis. The function runs in parallel
across features using the foreach / doParallel framework.
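The per-feature preprocessing described above, together with the two dist_method options listed under Arguments, can be sketched in base R. The helpers collect_feature and feature_distance are hypothetical and written only for illustration; they are not functions exported by MOSAIC. For simplicity, the sketch loops with lapply rather than the package's foreach/doParallel backend.

```r
# Collect one feature's embedding across samples (one row per sample) and
# drop samples where that embedding is all zeros, keeping labels aligned.
collect_feature <- function(embed_list, feature, groups) {
  emb <- do.call(rbind, lapply(embed_list, function(m) m[feature, ]))
  keep <- rowSums(emb != 0) > 0
  list(emb = emb[keep, , drop = FALSE], groups = groups[keep])
}

# Sample-by-sample distance matrix for one feature.
feature_distance <- function(emb, dist_method = c("euclidean", "cosine")) {
  dist_method <- match.arg(dist_method)
  if (dist_method == "euclidean") {
    return(dist(emb, method = "euclidean"))
  }
  # Cosine dissimilarity 1 - <x, y> / (||x|| ||y||); undefined for a zero
  # row, which is one reason all-zero embeddings must be removed first.
  unit <- emb / sqrt(rowSums(emb^2))    # scale each row to unit length
  as.dist(1 - tcrossprod(unit))
}

# Usage: three samples, two features (f1, f2), two latent dims;
# f2 has an all-zero embedding in sample s2.
embed_list <- list(
  s1 = matrix(c(1, 2, 3, 4), 2, dimnames = list(c("f1", "f2"), NULL)),
  s2 = matrix(c(5, 0, 6, 0), 2, dimnames = list(c("f1", "f2"), NULL)),
  s3 = matrix(c(7, 8, 9, 1), 2, dimnames = list(c("f1", "f2"), NULL))
)
groups <- c("ctrl", "ctrl", "treat")
per_feature <- lapply(c("f1", "f2"), collect_feature,
                      embed_list = embed_list, groups = groups)
d_f2 <- feature_distance(per_feature[[2]]$emb, "cosine")
```

Cosine dissimilarity ignores the length of the embedding vector: two samples whose embeddings point in the same direction but differ in scale are at distance 0. Euclidean distance separates them.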
To obtain calibrated p-values, run this function on both the real data and a
label-shuffled null, then compare the F-statistics using
calculate_empirical_pvalue.
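The example below pools the F-statistics of every feature in the shuffled run into a single null sample. This page does not document the exact formula inside calculate_empirical_pvalue. The sketch instead uses the common add-one estimator, (1 + #{null >= observed}) / (1 + n_null), as an assumption. The name empirical_p is hypothetical.

```r
# Empirical p-values of observed statistics against a pooled null sample,
# using the add-one convention (an assumption, not necessarily MOSAIC's).
empirical_p <- function(f_obs, f_null) {
  f_null <- f_null[is.finite(f_null)]   # drop NA/NaN/Inf null values
  vapply(f_obs, function(x) (1 + sum(f_null >= x)) / (1 + length(f_null)),
         numeric(1))
}

# Usage: two observed F values against a small null sample
empirical_p(c(5, 10), f_null = c(1:9, NA))
```

The add-one form never returns exactly 0, so the smallest attainable p-value is 1 / (1 + n_null). A larger shuffled null therefore gives finer resolution.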
See also
run_MOSAIC to generate the input embedding,
calculate_empirical_pvalue to compute calibrated p-values
by comparing observed F-statistics against a shuffled null distribution.
Examples
if (FALSE) { # \dontrun{
# Step 1: Run MOSAIC embedding
result <- run_MOSAIC(
  list(RNA = seurat_rna, ADT = seurat_adt),
  assays = c("SCT", "ADT"),
  sample_meta = "sample_id",
  condition_meta = "time"
)

# Step 2: Run DC analysis on real data
n_sample <- length(result$mosaic_embed_list)
dc_result <- run_DC_test(
  result$mosaic_embed_list,
  n_sample = n_sample,
  groups = result$annotation$Condition
)

# Step 3: Run on shuffled labels for the null distribution.
# shuffle_mosaic is assumed to be a run_MOSAIC() result computed on data
# with shuffled condition labels; it is not created in this example.
shuffle_dc <- run_DC_test(
  shuffle_mosaic$mosaic_embed_list,
  n_sample = length(shuffle_mosaic$mosaic_embed_list),
  groups = shuffle_mosaic$annotation$Condition
)

# Step 4: Compute empirical p-values
F_obs <- unlist(dc_result$F_stats_list)
F_null <- unlist(shuffle_dc$F_stats_list)
pvalues <- sapply(F_obs, function(x)
  calculate_empirical_pvalue(x, F_null))

# Features with significant DC
dc_features <- which(pvalues < 0.05)
} # }