The main entry point of the MOSAIC framework. For each sample, MOSAIC constructs a sample-specific coupling matrix that captures intra- and cross-modality feature interactions using cosine similarity. It then performs spectral decomposition on each per-sample coupling matrix, aggregates the resulting projection matrices across all samples, and applies a second-level spectral decomposition to obtain a shared latent space. Finally, every sample's coupling matrix is projected into this shared space, yielding a per-sample feature embedding that can be used for downstream differential connectivity analysis, subgroup detection, or clinical prediction.
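The steps above can be illustrated with a small base-R sketch on random data. This is not the package's implementation: the helper names (cosine_coupling, top_eigvecs), the fixed rank k, and the choice to aggregate projection matrices by averaging are all assumptions made for illustration.

```r
# Toy sketch of a two-level spectral procedure; names are illustrative only.
set.seed(1)
n_features <- 6
n_cells <- 40
samples <- c("s1", "s2", "s3")
k <- 2  # fixed rank for the sketch; MOSAIC selects the rank automatically

# Cosine similarity between rows (features) of a features x cells matrix.
cosine_coupling <- function(x) {
  norms <- sqrt(rowSums(x^2))
  (x %*% t(x)) / (norms %o% norms)
}

# Top-k eigenvectors of a symmetric matrix.
top_eigvecs <- function(m, k) {
  eigen(m, symmetric = TRUE)$vectors[, seq_len(k), drop = FALSE]
}

# Step 1: one coupling matrix per sample (random data stands in for scaled data).
coupling <- lapply(setNames(samples, samples), function(s) {
  cosine_coupling(matrix(rnorm(n_features * n_cells), n_features, n_cells))
})

# Step 2: per-sample projection matrices V V^T, averaged (assumed aggregation).
projections <- lapply(coupling, function(cm) {
  v <- top_eigvecs(cm, k)
  v %*% t(v)
})
aggregated <- Reduce(`+`, projections) / length(projections)

# Step 3: second-level decomposition yields a shared basis.
shared_basis <- top_eigvecs(aggregated, k)

# Step 4: project each coupling matrix into the shared space (n_features x k).
embeddings <- lapply(coupling, function(cm) cm %*% shared_basis)
```

Each element of embeddings is an n_features x k matrix, mirroring the shape of the entries in mosaic_embed_list.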
Usage
run_MOSAIC(
  seurat_list,
  assays = NULL,
  sample_meta = "sample_id",
  condition_meta = "condition",
  n_eigen = 50,
  verbose = TRUE
)
Arguments
- seurat_list
A named list of 1 to 3 Seurat objects, one per modality. Each object must contain cells from all samples, with sample membership stored in the metadata column specified by sample_meta. The data should be normalized (e.g. via Seurat::NormalizeData) but does not need to be scaled; MOSAIC will call Seurat::ScaleData internally. Examples:
One modality: list(RNA = seurat_rna)
Two modalities: list(RNA = seurat_rna, ATAC = seurat_atac)
Three modalities: list(RNA = seurat_rna, ATAC = seurat_atac, ADT = seurat_adt)
- assays
Character vector of assay names to use for each Seurat object, in the same order as seurat_list. For example, c("RNA", "ATAC") for a two-modality analysis. If NULL (default), the default assay of each Seurat object is used.
- sample_meta
Character string specifying the column name in the Seurat metadata that contains sample (individual) identifiers. All Seurat objects in seurat_list must share the same sample IDs in this column. Default: "sample_id".
- condition_meta
Character string specifying the column name in the Seurat metadata that contains condition or group labels (e.g. "control" vs "disease"). Used to build the annotation table returned in the output. Default: "condition".
- n_eigen
Integer specifying the number of eigenvalues/eigenvectors to compute in each spectral decomposition step. A larger value retains more spectral information but increases computation time. The kneedle algorithm (find_elbow_kneedle) is applied to automatically select the effective dimensionality from the top n_eigen eigenvalues. Default: 50.
- verbose
Logical. If TRUE (default), print progress messages including the number of modalities and samples detected, per-sample processing status, and the selected ranks. Set to FALSE to suppress all messages.
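Because all objects in seurat_list must share the same sample IDs, it can help to verify this before calling run_MOSAIC. The sketch below uses plain data frames as stand-ins for the metadata tables (with real Seurat objects these would be obj@meta.data). The helper check_shared_samples is hypothetical and not part of MOSAIC, and it assumes that "share the same sample IDs" means the sets of IDs are identical.

```r
# Hypothetical helper (not part of MOSAIC): confirm that every metadata table
# has the sample column and that the set of sample IDs is identical across them.
check_shared_samples <- function(meta_list, sample_meta = "sample_id") {
  ids <- lapply(meta_list, function(md) {
    if (!sample_meta %in% colnames(md)) {
      stop("Column '", sample_meta, "' not found in metadata")
    }
    sort(unique(as.character(md[[sample_meta]])))
  })
  all(vapply(ids, function(x) identical(x, ids[[1]]), logical(1)))
}

# Toy metadata tables standing in for seurat_obj@meta.data.
meta_rna  <- data.frame(sample_id = c("s1", "s1", "s2", "s3"))
meta_atac <- data.frame(sample_id = c("s3", "s2", "s1"))
meta_adt  <- data.frame(sample_id = c("s1", "s2"))

# RNA and ATAC cover the same samples; ADT is missing "s3".
ok_pair <- check_shared_samples(list(RNA = meta_rna, ATAC = meta_atac))
ok_all  <- check_shared_samples(
  list(RNA = meta_rna, ATAC = meta_atac, ADT = meta_adt)
)
```

Here ok_pair is TRUE and ok_all is FALSE, which flags the ADT table as missing a sample.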
Value
A list with the following elements:
- mosaic_embed_list
A named list (one entry per sample) of feature embedding matrices. Each matrix has dimensions n_features x r, where n_features is the total number of features across all modalities (stacked) and r is the automatically selected latent dimensionality. Row i of each matrix is the embedding of feature i in that sample. Feature order is: all features from modality 1, then modality 2, then modality 3.
- sample_eigen_list
A named list (one entry per sample) containing the per-sample eigendecomposition output ($eigen: an eigs_sym result with $values and $vectors; $r: the automatically selected rank for that sample).
- annotation
A data frame with one row per sample. Row names are sample IDs and column Condition contains the condition labels from condition_meta.
- coupling_matrix_list
A named list (one entry per sample) of the raw coupling matrices (features x features). Useful for inspecting per-sample feature interaction structure or for downstream neighborhood analysis.
- eigenvalues
Numeric vector of length n_eigen containing the eigenvalues from the aggregated (second-level) spectral decomposition. Can be visualized with plot_eigen.
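To give a feel for how an elbow can be picked from an eigenvalue spectrum such as eigenvalues, here is a simplified knee-detection sketch. It is not the implementation of find_elbow_kneedle: the function simple_elbow is hypothetical, it omits the smoothing and sensitivity parameter of the full Kneedle algorithm, and it assumes a decreasing spectrum with at least two distinct values.

```r
# Simplified knee detection (assumption: decreasing spectrum, no smoothing).
# Both axes are rescaled to [0, 1]; the elbow is the point lying furthest
# below the straight line joining the first and last points.
simple_elbow <- function(values) {
  n <- length(values)
  x <- (seq_len(n) - 1) / (n - 1)
  y <- (values - min(values)) / (max(values) - min(values))
  gap <- (1 - x) - y   # vertical distance below the chord y = 1 - x
  which.max(gap)
}

# Toy spectrum with a sharp drop after the third eigenvalue.
toy_eigenvalues <- c(10, 8, 6, 1, 0.8, 0.6, 0.5, 0.4)
elbow <- simple_elbow(toy_eigenvalues)
```

For this toy spectrum the elbow falls at index 4, the first value after the sharp drop. How an elbow index is turned into a retained rank is a design choice that this sketch does not settle.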
Details
The function automatically handles 1, 2, or 3 modalities based on the length
of seurat_list. Each Seurat object should contain cells from multiple
samples (individuals), identified by a shared metadata column
(sample_meta). Condition labels (condition_meta) are extracted
and returned in the annotation table for downstream analyses.
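As a sketch of how the annotation table can drive downstream grouping, the example below builds a mock result with the same two elements (mosaic_embed_list and annotation) and splits the embeddings by condition. The mock object, its dimensions, and its values are invented for illustration; the per-condition mean is computed only to show the grouping mechanics and is not a MOSAIC analysis step.

```r
# Mock output mimicking the structure of a run_MOSAIC result (random values).
set.seed(42)
samples <- c("s1", "s2", "s3", "s4")
mock <- list(
  mosaic_embed_list = lapply(
    setNames(samples, samples),
    function(s) matrix(rnorm(5 * 2), nrow = 5, ncol = 2)
  ),
  annotation = data.frame(
    Condition = c("control", "control", "disease", "disease"),
    row.names = samples
  )
)

# Align the embedding list to the annotation row order, then split by condition.
embeds <- mock$mosaic_embed_list[rownames(mock$annotation)]
by_condition <- split(embeds, mock$annotation$Condition)

# Element-wise mean embedding per condition (illustrative only).
mean_embed <- lapply(by_condition, function(lst) Reduce(`+`, lst) / length(lst))
```

Indexing by rownames(annotation) before splitting guards against the embedding list and the annotation table being in different sample orders.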
See also
find_elbow_kneedle for the dimensionality selection algorithm,
run_DC_test for differential
connectivity testing on the output,
compute_module_similarity for subgroup detection,
plot_eigen for visualizing the eigenvalue spectrum.
Examples
if (FALSE) { # \dontrun{
  # ---- One modality (RNA only) ----
  result <- run_MOSAIC(
    list(RNA = seurat_rna),
    assays = c("RNA"),
    sample_meta = "sample_id",
    condition_meta = "condition"
  )

  # Inspect eigenvalue spectrum
  plot_eigen(result$eigenvalues)

  # Per-sample feature embedding for first sample
  head(result$mosaic_embed_list[[1]])

  # ---- Two modalities (RNA + ATAC) ----
  result <- run_MOSAIC(
    list(RNA = seurat_rna, ATAC = seurat_atac),
    assays = c("RNA", "ATAC"),
    sample_meta = "sample_id",
    condition_meta = "condition"
  )

  # ---- Three modalities (RNA + ATAC + ADT) ----
  result <- run_MOSAIC(
    list(RNA = seurat_rna, ATAC = seurat_atac, ADT = seurat_adt),
    assays = c("RNA", "ATAC", "ADT"),
    sample_meta = "sample_id",
    condition_meta = "condition"
  )

  # ---- Downstream: Differential Connectivity ----
  dc <- run_DC_test(
    result$mosaic_embed_list,
    n_sample = length(result$mosaic_embed_list),
    groups = result$annotation$Condition
  )
} # }