Run MOSAIC Multi-Omics Co-Embedding

run_MOSAIC is the main entry point of the MOSAIC framework. For each sample, it builds a coupling matrix that captures intra- and cross-modality feature interactions using cosine similarity. It then performs a spectral decomposition of each per-sample coupling matrix, aggregates the resulting projection matrices across all samples, and applies a second-level spectral decomposition to obtain a shared latent space. Finally, every sample's coupling matrix is projected into this shared space, yielding a per-sample feature embedding that can be used for downstream differential connectivity analysis, subgroup detection, or clinical prediction.
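
The steps above can be illustrated on toy data with base R. This is a minimal sketch under stated assumptions, not the package's implementation: the projection matrices are assumed to be aggregated by a simple average, the rank r is fixed by hand rather than selected automatically, and base eigen() stands in for whatever solver MOSAIC uses internally. All object names are hypothetical.

```r
set.seed(1)

# Toy input: 3 samples, each a cells x features matrix with 6 stacked features.
n_features <- 6
samples <- lapply(1:3, function(i) matrix(rnorm(40 * n_features), nrow = 40))
names(samples) <- paste0("S", 1:3)

# Cosine similarity between feature columns gives a features x features matrix.
cosine_coupling <- function(x) {
  norms <- sqrt(colSums(x^2))
  crossprod(x) / outer(norms, norms)
}

r <- 2  # assumption: fixed rank; MOSAIC selects this automatically

# Step 1: one coupling matrix per sample.
coupling <- lapply(samples, cosine_coupling)

# Step 2: first-level decomposition, kept as a projection onto the top-r eigenvectors.
projections <- lapply(coupling, function(C) {
  U <- eigen(C, symmetric = TRUE)$vectors[, 1:r, drop = FALSE]
  tcrossprod(U)
})

# Step 3: aggregate across samples (assumption: plain average), then decompose again.
P_bar <- Reduce(`+`, projections) / length(projections)
V <- eigen(P_bar, symmetric = TRUE)$vectors[, 1:r, drop = FALSE]

# Step 4: project every coupling matrix into the shared space.
embeddings <- lapply(coupling, function(C) C %*% V)

dim(embeddings$S1)  # n_features x r
```

Each entry of embeddings plays the role of one matrix in mosaic_embed_list: rows are features, columns are shared latent dimensions.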

Usage

run_MOSAIC(
  seurat_list,
  assays = NULL,
  sample_meta = "sample_id",
  condition_meta = "condition",
  n_eigen = 50,
  verbose = TRUE
)

Arguments

seurat_list

A named list of 1 to 3 Seurat objects, one per modality. Each object must contain cells from all samples, with sample membership stored in the metadata column specified by sample_meta. The data should be normalized (e.g. via Seurat::NormalizeData) but does not need to be scaled; MOSAIC will call Seurat::ScaleData internally. Examples:

One modality: list(RNA = seurat_rna)
Two modalities: list(RNA = seurat_rna, ATAC = seurat_atac)
Three modalities: list(RNA = seurat_rna, ATAC = seurat_atac, ADT = seurat_adt)

assays

Character vector of assay names to use for each Seurat object, in the same order as seurat_list. For example, c("RNA", "ATAC") for a two-modality analysis. If NULL (default), the default assay of each Seurat object is used.

sample_meta

Character string specifying the column name in the Seurat metadata that contains sample (individual) identifiers. All Seurat objects in seurat_list must share the same sample IDs in this column. Default: "sample_id".
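
Before calling run_MOSAIC, it can be useful to confirm that the modalities really share the same sample IDs. The sketch below uses plain data frames as stand-ins for each object's metadata table (for a Seurat object this would be its meta.data slot); check_shared_samples is a hypothetical helper, not part of MOSAIC.

```r
# Toy metadata tables standing in for the per-modality Seurat metadata.
meta_list <- list(
  RNA  = data.frame(sample_id = c("S1", "S1", "S2", "S3")),
  ATAC = data.frame(sample_id = c("S3", "S2", "S1", "S2"))
)

# Hypothetical helper: stop if any modality has a different set of sample IDs.
check_shared_samples <- function(meta_list, sample_meta = "sample_id") {
  ids <- lapply(meta_list, function(m) sort(unique(as.character(m[[sample_meta]]))))
  same <- all(vapply(ids, identical, logical(1), ids[[1]]))
  if (!same) {
    stop("Sample IDs differ between modalities in column '", sample_meta, "'.")
  }
  ids[[1]]
}

shared_ids <- check_shared_samples(meta_list)
shared_ids
```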

condition_meta

Character string specifying the column name in the Seurat metadata that contains condition or group labels (e.g. "control" vs "disease"). Used to build the annotation table returned in the output. Default: "condition".
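
As an illustration of the annotation table this argument feeds, the following sketch collapses cell-level metadata to one row per sample. It uses a toy data frame in place of Seurat metadata and assumes each sample carries exactly one condition label; how MOSAIC builds the table internally may differ.

```r
# Toy cell-level metadata (one row per cell).
meta <- data.frame(
  sample_id = c("S1", "S1", "S2", "S2", "S3"),
  condition = c("control", "control", "disease", "disease", "control"),
  stringsAsFactors = FALSE
)

# Collapse to unique sample/condition pairs.
per_sample <- unique(meta[, c("sample_id", "condition")])

# Assumption: a sample must not map to more than one condition.
stopifnot(!anyDuplicated(per_sample$sample_id))

# Row names are sample IDs; the single column is named Condition.
annotation <- data.frame(
  Condition = per_sample$condition,
  row.names = per_sample$sample_id
)
annotation
```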

n_eigen

Integer specifying the number of eigenvalues/eigenvectors to compute in each spectral decomposition step. A larger value retains more spectral information but increases computation time. The kneedle algorithm (find_elbow_kneedle) is applied to automatically select the effective dimensionality from the top n_eigen eigenvalues. Default: 50.
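
To give a feel for elbow-based rank selection, here is a simplified kneedle-style heuristic. It is a sketch only and is not the package's find_elbow_kneedle: it rescales the sorted eigenvalues and their indices to [0, 1] and picks the index that lies furthest below the straight line joining the first and last points. The function name elbow_sketch is hypothetical, and the input is assumed to have at least two distinct values.

```r
# Simplified kneedle-style elbow detection on a decreasing spectrum.
elbow_sketch <- function(values) {
  values <- sort(values, decreasing = TRUE)
  n <- length(values)
  x <- (seq_len(n) - 1) / (n - 1)                            # index rescaled to [0, 1]
  y <- (values - min(values)) / (max(values) - min(values))  # value rescaled to [0, 1]
  which.max((1 - x) - y)                                     # largest gap below the diagonal
}

# Toy spectrum: three large eigenvalues followed by a flat tail.
eig <- c(10, 8, 6, 1, 0.8, 0.6, 0.4, 0.2)
r <- elbow_sketch(eig)
r
```

On this toy spectrum the heuristic returns 4, the index at which the curve flattens out.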

verbose

Logical. If TRUE (default), print progress messages including the number of modalities and samples detected, per-sample processing status, and the selected ranks. Set to FALSE to suppress all messages.

Value

A list with the following elements:

mosaic_embed_list: A named list (one entry per sample) of feature embedding matrices. Each matrix has dimensions n_features x r, where n_features is the total number of features across all modalities (stacked) and r is the automatically selected latent dimensionality. Row i of each matrix is the embedding of feature i in that sample. Feature order is: all features from modality 1, then modality 2, then modality 3.
sample_eigen_list: A named list (one entry per sample) containing the per-sample eigendecomposition output. Each entry is a list with two elements: $eigen, an eigs_sym result with $values and $vectors, and $r, the automatically selected rank for that sample.
annotation: A data frame with one row per sample. Row names are sample IDs and column Condition contains the condition labels from condition_meta.
coupling_matrix_list: A named list (one entry per sample) of the raw coupling matrices (features x features). Useful for inspecting per-sample feature interaction structure or for downstream neighborhood analysis.
eigenvalues: Numeric vector of length n_eigen containing the eigenvalues from the aggregated (second-level) spectral decomposition. Can be visualized with plot_eigen.
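
Because the rows of each embedding matrix are stacked by modality, per-modality blocks can be recovered from the feature counts. The sketch below uses a toy matrix in place of one entry of mosaic_embed_list; the feature counts in n_per_modality are hypothetical and would need to come from the input objects.

```r
# Hypothetical number of features per modality, in seurat_list order.
n_per_modality <- c(RNA = 4, ATAC = 3, ADT = 2)

# Toy embedding: n_features x r, standing in for result$mosaic_embed_list[[1]].
embed <- matrix(seq_len(sum(n_per_modality) * 2), ncol = 2)

# Label each row with its modality, keeping the stacking order.
modality <- factor(
  rep(names(n_per_modality), times = n_per_modality),
  levels = names(n_per_modality)
)

# Split row indices by modality and extract the matching blocks.
by_modality <- lapply(
  split(seq_len(nrow(embed)), modality),
  function(idx) embed[idx, , drop = FALSE]
)

vapply(by_modality, nrow, integer(1))
```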

Details

The function automatically handles 1, 2, or 3 modalities based on the length of seurat_list. Each Seurat object should contain cells from multiple samples (individuals), identified by a shared metadata column (sample_meta). Condition labels (condition_meta) are extracted and returned in the annotation table for downstream analyses.

Examples

if (FALSE) { # \dontrun{
# ---- One modality (RNA only) ----
result <- run_MOSAIC(
  list(RNA = seurat_rna),
  assays = c("RNA"),
  sample_meta = "sample_id",
  condition_meta = "condition"
)

# Inspect eigenvalue spectrum
plot_eigen(result$eigenvalues)

# Per-sample feature embedding for first sample
head(result$mosaic_embed_list[[1]])

# ---- Two modalities (RNA + ATAC) ----
result <- run_MOSAIC(
  list(RNA = seurat_rna, ATAC = seurat_atac),
  assays = c("RNA", "ATAC"),
  sample_meta = "sample_id",
  condition_meta = "condition"
)

# ---- Three modalities (RNA + ATAC + ADT) ----
result <- run_MOSAIC(
  list(RNA = seurat_rna, ATAC = seurat_atac, ADT = seurat_adt),
  assays = c("RNA", "ATAC", "ADT"),
  sample_meta = "sample_id",
  condition_meta = "condition"
)

# ---- Downstream: Differential Connectivity ----
dc <- run_DC_test(
  result$mosaic_embed_list,
  n_sample = length(result$mosaic_embed_list),
  groups = result$annotation$Condition
)
} # }
