
The core function for MOSAIC's differential connectivity (DC) analysis. For each feature, this function collects the feature's embedding vector across all samples, builds a sample-by-sample distance matrix from these vectors, and then runs PERMANOVA (vegan::adonis2) to test whether the embedding structure differs significantly between conditions. It also computes silhouette scores to quantify the separation between condition groups in the feature's embedding space.
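To show what the PERMANOVA step measures, here is a minimal base-R one-way PERMANOVA computed directly from a distance matrix. It is a textbook sketch, not MOSAIC's or vegan's code. The helper name permanova_1way() and the toy data are hypothetical. For a single grouping factor, its F and R2 should agree with the corresponding columns of vegan::adonis2(). The permutation p-value varies slightly from run to run.

```r
# Hypothetical helper: one-way PERMANOVA (pseudo-F, R2, permutation p-value)
# from a sample-by-sample distance matrix. Illustrative sketch only.
permanova_1way <- function(d, groups, n_perm = 999L) {
  d2 <- as.matrix(d)^2
  groups <- as.factor(groups)
  n <- nrow(d2)
  a <- nlevels(groups)
  # Total sum of squares: (1/N) * sum over pairs i < j of d_ij^2
  ss_total <- sum(d2[upper.tri(d2)]) / n
  stats_for <- function(g) {
    # Within-group sum of squares: sum over groups of (1/n_g) * sum of d_ij^2
    ss_within <- sum(vapply(levels(g), function(lev) {
      sub <- d2[g == lev, g == lev, drop = FALSE]
      sum(sub[upper.tri(sub)]) / nrow(sub)
    }, numeric(1)))
    ss_between <- ss_total - ss_within
    c(F = (ss_between / (a - 1)) / (ss_within / (n - a)),
      R2 = ss_between / ss_total)
  }
  obs <- stats_for(groups)
  # Null distribution: recompute F after shuffling the condition labels
  perm_F <- replicate(n_perm, stats_for(sample(groups))[["F"]])
  list(F = obs[["F"]], R2 = obs[["R2"]],
       p = (sum(perm_F >= obs[["F"]]) + 1) / (n_perm + 1))
}

# Toy data: one feature's 2-D embedding in 10 samples, 5 per condition
set.seed(1)
x <- rbind(matrix(rnorm(10, mean = 0), ncol = 2),
           matrix(rnorm(10, mean = 5), ncol = 2))
g <- rep(c("ctrl", "treat"), each = 5)
res <- permanova_1way(dist(x), g)
```

With Euclidean distances, R2 equals the familiar between-group share of the total variance, so a large R2 means the conditions occupy clearly different regions of this feature's embedding.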

Usage

run_DC_test(
  mosaic_embed_list,
  n_sample,
  groups,
  n_cores = NULL,
  dist_method = "euclidean"
)

Arguments

mosaic_embed_list

Named list of per-sample projected feature matrices (features x latent dims), as returned by run_MOSAIC()$mosaic_embed_list. Each element is a matrix where row i is feature i's embedding in that sample.
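The toy object below stands in for mosaic_embed_list (random values, hypothetical feature and sample names). It shows how one feature's row is gathered from every sample into an n_sample x latent-dims matrix, which is the input to the per-feature distance matrix. The helper feature_by_sample() is hypothetical and not part of MOSAIC. It accepts either a row index or a row name.

```r
# Toy stand-in: 4 samples, each a 3-feature x 2-dim matrix (hypothetical names)
set.seed(42)
features <- c("feat1", "feat2", "feat3")
embed_list <- lapply(setNames(nm = paste0("sample", 1:4)), function(s) {
  matrix(rnorm(6), nrow = 3, dimnames = list(features, c("dim1", "dim2")))
})

# Row i of every element is feature i, so stacking that row across samples
# gives one row per sample for the chosen feature.
feature_by_sample <- function(embed_list, feature) {
  n_dim <- ncol(embed_list[[1]])
  t(vapply(embed_list, function(m) m[feature, ], numeric(n_dim)))
}

fx <- feature_by_sample(embed_list, "feat2")
dim(fx)  # 4 2
```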

n_sample

Integer. Number of samples (length of the input list).

groups

Character or factor vector of condition labels, one per sample, in the same order as the names of mosaic_embed_list. Typically obtained from result$annotation$Condition.
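If the labels come from a per-sample table whose row order may differ from the list, they can be realigned by name before the call. This is a sketch: the annotation table and its sample and Condition columns are hypothetical.

```r
# Hypothetical per-sample annotation in a different order from the list
annotation <- data.frame(sample = c("s3", "s1", "s2", "s4"),
                         Condition = c("day7", "day0", "day0", "day7"))
embed_names <- c("s1", "s2", "s3", "s4")  # stands in for names(mosaic_embed_list)

# Reorder so that groups[k] labels the k-th element of the list
groups <- annotation$Condition[match(embed_names, annotation$sample)]
stopifnot(!anyNA(groups), length(groups) == length(embed_names))
groups  # "day0" "day0" "day7" "day7"
```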

n_cores

Integer. Number of parallel cores to use. If NULL (default), uses parallel::detectCores() - 1.
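The documented default can be reproduced as follows. The two guards are additions in this sketch and are not necessarily present in run_DC_test: parallel::detectCores() returns NA when the core count is unknown, and on a single-core machine detectCores() - 1 would be 0. The helper default_cores() is hypothetical.

```r
# Hypothetical helper mirroring the documented default, with extra guards
default_cores <- function() {
  n <- parallel::detectCores()
  if (is.na(n)) 1L else max(1L, as.integer(n) - 1L)
}
n_cores <- default_cores()
```

On shared machines, passing an explicit n_cores value is usually safer than relying on the default.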

dist_method

Character string specifying the distance metric for the per-feature sample distance matrix:

  • "euclidean" (default): Euclidean distance.

  • "cosine": Cosine dissimilarity (1 - cosine similarity).
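The two metrics can be computed in base R as follows. The sketch uses a toy feature-by-sample matrix (rows are samples). cosine_dissim() is a hypothetical helper, not MOSAIC's implementation.

```r
# Rows = samples, columns = latent dims of one feature (toy values)
fx <- rbind(s1 = c(1, 0), s2 = c(2, 0), s3 = c(0, 3))

# "euclidean": base R dist()
d_euc <- as.matrix(dist(fx, method = "euclidean"))

# "cosine": 1 - cosine similarity between sample vectors
cosine_dissim <- function(x) {
  xn <- x / sqrt(rowSums(x^2))  # L2-normalise each row
  1 - tcrossprod(xn)            # 1 - <u, v> / (|u| |v|)
}
d_cos <- cosine_dissim(fx)
```

s1 and s2 point in the same direction, so their cosine dissimilarity is 0 even though their Euclidean distance is 1. The cosine metric therefore ignores differences in embedding magnitude. An all-zero row has no direction, so its cosine dissimilarity is undefined (NaN in this sketch).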

Value

A list with the following elements:

pvalue_list

Numeric vector of PERMANOVA p-values, one per feature.

r2_list

Numeric vector of PERMANOVA R-squared values (effect sizes), one per feature.

F_stats_list

Numeric vector of PERMANOVA F-statistics, one per feature. Used as the test statistic for empirical p-value calibration.

silhouette_score_list

List of average silhouette scores per feature.
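Which silhouette implementation and distance run_DC_test uses is not stated here. The sketch below assumes the standard definition computed on a sample-by-sample distance matrix, with condition labels treated as clusters. For each sample, s(i) = (b - a) / max(a, b), where a is the mean distance to the sample's own group and b is the smallest mean distance to another group. Scores range from -1 to 1, and values near 1 indicate well-separated conditions. avg_silhouette() is hypothetical. cluster::silhouette() computes the same per-sample widths from integer cluster codes and a dist object.

```r
# Hypothetical helper: average silhouette width from a distance matrix.
# Requires at least two distinct groups.
avg_silhouette <- function(d, groups) {
  d <- as.matrix(d)
  groups <- as.character(groups)
  s <- vapply(seq_len(nrow(d)), function(i) {
    own <- groups == groups[i]
    if (sum(own) == 1L) return(0)           # singleton group: s(i) = 0 by convention
    a <- sum(d[i, own]) / (sum(own) - 1)    # mean distance to own group (d[i, i] = 0)
    b <- min(vapply(setdiff(unique(groups), groups[i]),
                    function(grp) mean(d[i, groups == grp]), numeric(1)))
    if (max(a, b) == 0) return(0)           # all distances zero: no separation
    (b - a) / max(a, b)
  }, numeric(1))
  mean(s)
}

# Two tight groups far apart give a score close to 1
x <- rbind(c(0, 0), c(0, 1), c(10, 0), c(10, 1))
sil <- avg_silhouette(dist(x), c("A", "A", "B", "B"))
```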

similarity_matrix_list

List of per-feature sample-by-sample similarity/distance matrices. Each matrix is n_valid_samples x n_valid_samples, where n_valid_samples is the number of samples remaining after those with an all-zero embedding for that feature are removed.

group_list

List of condition label vectors per feature (after zero-row removal).
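The per-feature outputs can be combined into one table for ranking. The object below only mimics the documented element names. Its numbers are placeholders, not output from run_DC_test(). Taking feature names from rownames(mosaic_embed_list[[1]]) assumes the outputs follow that row order, which is not stated here. Because unlist() silently drops NULL elements, the length check guards against misaligned rows.

```r
# Placeholder object with the documented element names (illustrative values)
dc_result <- list(
  pvalue_list = c(0.002, 0.41, 0.03),
  r2_list = c(0.31, 0.05, 0.18),
  F_stats_list = c(9.8, 0.7, 4.2),
  silhouette_score_list = list(0.38, -0.04, 0.12)
)
# Hypothetical feature names, e.g. rownames(mosaic_embed_list[[1]])
feature_names <- c("feat1", "feat2", "feat3")

cols <- lapply(list(F = dc_result$F_stats_list, R2 = dc_result$r2_list,
                    p = dc_result$pvalue_list,
                    silhouette = dc_result$silhouette_score_list), unlist)
# Confirm exactly one value per feature before building the table
stopifnot(all(lengths(cols) == length(feature_names)))

dc_table <- do.call(data.frame, c(list(feature = feature_names), cols))
dc_table <- dc_table[order(dc_table$F, decreasing = TRUE), ]
```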

Details

Samples where a feature has an all-zero embedding (e.g. due to dropout) are automatically excluded on a per-feature basis. The function runs in parallel across features using the foreach / doParallel framework.
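Per-feature exclusion can be sketched as follows. The sketch assumes that "all-zero" means every latent dimension is exactly 0. It uses a toy feature-by-sample matrix in which s2 has dropped out. Removal can unbalance the design: here ctrl is left with one sample. How run_DC_test handles a feature left with fewer than two conditions is not described on this page.

```r
# Toy feature-by-sample matrix (rows = samples); s2 has an all-zero embedding
fx <- rbind(s1 = c(0.5, 1.2), s2 = c(0, 0), s3 = c(-0.3, 0.8), s4 = c(0.9, -0.1))
groups <- c("ctrl", "ctrl", "treat", "treat")

keep <- rowSums(fx != 0) > 0        # FALSE only if every latent dim is exactly 0
fx_valid <- fx[keep, , drop = FALSE]
groups_valid <- groups[keep]        # labels stay aligned with the kept rows
rownames(fx_valid)  # "s1" "s3" "s4"
```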

To obtain calibrated p-values, run this function on both the real data and a label-shuffled null, then use calculate_empirical_pvalue to compare each observed F-statistic against the F-statistics from the shuffled run.
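The exact formula used by calculate_empirical_pvalue is not given on this page. A common one-sided definition with a +1 correction is sketched below. It is a hypothetical helper, not the package's function. The observed F is compared against a null pool of F-statistics, which here is pooled across features as in the example below.

```r
# Hypothetical helper: p = (1 + #{null F >= observed F}) / (1 + #null)
empirical_p <- function(f_obs, f_null) {
  f_null <- f_null[!is.na(f_null)]
  (1 + sum(f_null >= f_obs)) / (1 + length(f_null))
}

f_null <- c(0.8, 1.1, 0.5, 2.4, 1.9, 0.7, 1.3, 0.9, 3.1, 1.0)  # toy null F values
empirical_p(2.0, f_null)  # (1 + 2) / (1 + 10) = 3/11

# Vectorised over several observed F-statistics
p_all <- vapply(c(0.6, 2.0, 5.0), empirical_p, numeric(1), f_null = f_null)
```

The +1 terms keep the p-value strictly above zero, which is appropriate when the null pool is finite.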

See also

run_MOSAIC to generate the input embedding; calculate_empirical_pvalue to compute calibrated p-values by comparing observed F-statistics against a shuffled null distribution.

Examples

if (FALSE) { # \dontrun{
# Step 1: Run MOSAIC embedding
result <- run_MOSAIC(
  list(RNA = seurat_rna, ADT = seurat_adt),
  assays = c("SCT", "ADT"),
  sample_meta = "sample_id",
  condition_meta = "time"
)

# Step 2: Run DC analysis on real data
n_sample <- length(result$mosaic_embed_list)
dc_result <- run_DC_test(
  result$mosaic_embed_list,
  n_sample = n_sample,
  groups = result$annotation$Condition
)

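# Step 3: Run on a label-shuffled MOSAIC result to build the null distribution
# (shuffle_mosaic is assumed to be the output of run_MOSAIC() on data whose
#  condition labels were permuted; it is not created in this example)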
shuffle_dc <- run_DC_test(
  shuffle_mosaic$mosaic_embed_list,
  n_sample = n_sample,
  groups = shuffle_mosaic$annotation$Condition
)

# Step 4: Compute empirical p-values
F_obs <- unlist(dc_result$F_stats_list)
F_null <- unlist(shuffle_dc$F_stats_list)
pvalues <- sapply(F_obs, function(x)
  calculate_empirical_pvalue(x, F_null))

# Features with significant DC
dc_features <- which(pvalues < 0.05)
} # }