Given a set of feature indices defining a module (e.g. a cluster of co-regulated features), computes a sample-by-sample cosine similarity matrix. For each sample, the embeddings of the module's features are concatenated into a single vector, and cosine similarity is computed between every pair of these sample vectors. The result captures how similar two samples are in terms of the connectivity structure of the module's features.

Usage

compute_module_similarity(projected_list, feature_idx)

Arguments

projected_list

Named list of per-sample projected feature matrices (features x latent dims), as returned by run_MOSAIC()$mosaic_embed_list. Each element is a matrix where row i is feature i's embedding in that sample.

feature_idx

Integer vector of feature indices (row indices) belonging to the module. For example, which(feature_clusters == module_id).
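
As a small illustration of how such an index vector can be built, the sketch below uses made-up module assignments. The vector feature_clusters and the value of module_id are hypothetical and are not produced by MOSAIC here; in practice the assignments must follow the same feature order as the rows of the matrices in projected_list.

```r
# Hypothetical module assignments for 8 features (illustrative values only).
feature_clusters <- c(1, 5, 2, 5, 5, 3, 1, 2)
module_id <- 5

# Row indices of the features assigned to the chosen module.
feature_idx <- which(feature_clusters == module_id)
feature_idx
```

which() returns an integer vector, which is the type this argument expects.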

Value

A symmetric numeric matrix of dimension n_samples x n_samples, with cosine similarity values (range -1 to 1). Row and column names correspond to sample names from projected_list.

Details

This is the first step in MOSAIC's unsupervised subgroup detection pipeline: first compute the module similarity matrix with this function, then test it for a balanced partition with find_partition_hclust.
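
The computation described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: sketch_module_similarity is a hypothetical helper name, the toy input is simulated, and the real function may handle normalization, missing values, or input checks differently.

```r
# Hypothetical stand-in for compute_module_similarity(); the package's
# actual implementation may differ.
sketch_module_similarity <- function(projected_list, feature_idx) {
  # One column per sample: the module's feature embeddings flattened
  # into a single vector (n_module_features * n_latent_dims values).
  flat <- do.call(
    cbind,
    lapply(projected_list, function(m) {
      as.vector(m[feature_idx, , drop = FALSE])
    })
  )
  # Cosine similarity: dot products divided by the product of vector norms.
  norms <- sqrt(colSums(flat^2))
  sim <- crossprod(flat) / tcrossprod(norms)
  dimnames(sim) <- list(names(projected_list), names(projected_list))
  sim
}

# Toy input: 3 samples, 6 features, 2 latent dimensions (simulated values).
set.seed(1)
toy <- list(
  s1 = matrix(rnorm(12), nrow = 6),
  s2 = matrix(rnorm(12), nrow = 6),
  s3 = matrix(rnorm(12), nrow = 6)
)
sim_toy <- sketch_module_similarity(toy, feature_idx = c(2, 4, 5))
dim(sim_toy)
```

The result in this sketch is symmetric with ones on the diagonal, and its row and column names are taken from the names of the input list, matching the properties listed under Value.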

See also

find_partition_hclust for testing whether the similarity matrix reveals distinct subgroups; plot_mds_cluster for visualizing the sample similarity matrix.

Examples

if (FALSE) { # \dontrun{
# Run MOSAIC
result <- run_MOSAIC(
  list(RNA = seurat_rna, ATAC = seurat_atac),
  assays = c("RNA", "GeneACT"),
  sample_meta = "sample_name",
  condition_meta = "condition"
)

# Suppose feature_clusters is a vector of module assignments
# Compute similarity for module 5
module_idx <- which(feature_clusters == 5)
sim_mat <- compute_module_similarity(
  result$mosaic_embed_list,
  feature_idx = module_idx
)

# Visualize
plot_mds_cluster(sim_mat, "Module 5",
                 cluster = result$annotation$Condition)
} # }