Creates synthetic multi-modal single-cell data following the generative model
X = U * S * t(V), where U is a sample-specific cell embedding matrix, S is a
diagonal singular value matrix, and V is a sample-specific feature loading
matrix. The number of modalities (1 to 3) is determined by the length of
n_features. The simulation generates 20 samples in 2 conditions
(10 per condition), with 10 latent cell types.
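The generative step for a single sample and modality can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the cell count, feature count, and the use of standard normal draws for U and V are assumptions chosen for brevity. Note that the products in X = U * S * t(V) are matrix products, which R writes as `%*%`.

```r
# Illustrative sketch only: dimensions and distributions are assumptions,
# not the values used inside simulate_multimodal_data().
set.seed(1)
n_cells    <- 200  # cells in one sample (assumed)
n_features <- 50   # features in one modality (assumed)
r          <- 10   # latent rank

# U: cell embedding matrix (cells x r)
U <- matrix(rnorm(n_cells * r), nrow = n_cells, ncol = r)

# V: feature loading matrix (features x r)
V <- matrix(rnorm(n_features * r), nrow = n_features, ncol = r)

# S: diagonal singular value matrix (r x r), using the documented default
S <- diag(seq(5, 1, length.out = r))

# Expression matrix for this sample (cells x features)
X <- U %*% S %*% t(V)
dim(X)
```

Because X is a product of rank-r factors, its rank is at most r regardless of how many cells or features are simulated.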
Usage
simulate_multimodal_data(
  simulation_type = "DC",
  n_features = c(1000, 800, 600),
  signal_prop = 0.2,
  fold_change = 2,
  seed = 1,
  rescale = TRUE,
  sample_noise_level = 0.5,
  feature_noise_level = 0.1,
  cell_noise_level = 0.1,
  nonlinear = FALSE,
  singular_values = NULL,
  add_batch = TRUE
)
Arguments
- simulation_type
Character. "DC" for differential connectivity or "DE" for differential expression.
- n_features
Integer vector specifying the number of features per modality. Its length determines the number of modalities (1 to 3). Default: c(1000, 800, 600) (3 modalities).
- signal_prop
Numeric between 0 and 1. Proportion of features affected by DC or DE signal in each modality. Default: 0.2.
- fold_change
Numeric. Fold change applied to affected features in condition B. Only used when simulation_type = "DE". Default: 2.
- seed
Integer. Random seed for reproducibility.
- rescale
Logical. For DC simulation only: if TRUE (default), rescale mean expression of DC features in condition B to match condition A, ensuring a pure connectivity signal without abundance change.
- sample_noise_level
Numeric. Standard deviation of Gaussian noise added to each sample's feature loading matrix. Controls between-sample variability. Default: 0.5.
- feature_noise_level
Numeric. Standard deviation of Gaussian noise around feature cluster centers in the loading space. Controls how tight feature clusters are. Default: 0.1.
- cell_noise_level
Numeric. Standard deviation of Gaussian noise around cell type cluster centers in the cell embedding space. Controls cell type separation. Default: 0.1.
- nonlinear
Logical. If TRUE, apply a sigmoid transformation (1 / (1 + exp(-x))) to the expression matrices after the linear generative step. Simulates nonlinear gene regulation. Default: FALSE.
- singular_values
Numeric vector of length r (latent rank, default 10) specifying the diagonal entries of the singular value matrix. If NULL (default), uses seq(5, 1, length.out = 10).
- add_batch
Logical. If TRUE (default), add per-feature batch-specific scale and shift effects across 4 batches.
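To make the nonlinear and add_batch options concrete, the sketch below applies a sigmoid and then per-feature, per-batch scale and shift effects to a toy matrix. The ranges of the scale and shift effects, the order of the two steps, and the helper name apply_batch are all assumptions made for illustration; the function's actual internals may differ.

```r
# Sketch under assumptions: effect distributions, step order, and the helper
# apply_batch() are hypothetical and only illustrate the two options.
set.seed(1)
n_cells    <- 6
n_features <- 4
n_batches  <- 4
X <- matrix(rnorm(n_cells * n_features), nrow = n_cells)

# nonlinear = TRUE: element-wise sigmoid maps every value into (0, 1)
sigmoid <- function(x) 1 / (1 + exp(-x))
X_nl <- sigmoid(X)

# add_batch = TRUE: one scale and one shift per feature in each batch
batch_scale <- matrix(runif(n_batches * n_features, 0.8, 1.2), nrow = n_batches)
batch_shift <- matrix(rnorm(n_batches * n_features, sd = 0.1), nrow = n_batches)

apply_batch <- function(X, b) {
  # sweep() with MARGIN = 2 applies one value per column (feature)
  X <- sweep(X, 2, batch_scale[b, ], `*`)
  sweep(X, 2, batch_shift[b, ], `+`)
}

# Matrix for a sample assigned to batch 1
X_b1 <- apply_batch(X_nl, 1)
```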
Value
A list with:
- seurat_list
A named list of Seurat objects, one per modality (named "modality_1", "modality_2", etc.). Each uses assay "originalexp". Feature metadata includes feature_cluster (integer 1-10) and is_de (logical). Cell metadata includes sample_id, condition ("A" or "B"), batch, and true_cell_type.
- raw_matrix_list
A named list (one per modality) of lists, each containing 20 raw expression matrices (cells x features), one per sample.
- sample_metadata
Data frame with columns sample_id, condition, and batch.
Details
Two simulation modes are supported:
DC (Differential Connectivity): Feature loadings in condition B are permuted across clusters for a fraction of features, rewiring their inter-feature relationships while optionally preserving mean expression.
DE (Differential Expression): A multiplicative fold change is applied to a fraction of features in condition B, altering abundance without changing connectivity.
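The difference between the two modes can be illustrated on a toy example. The sketch below is a simplification under stated assumptions: it uses a single shared cell embedding for both conditions, non-negative uniform draws so that feature means are positive, and a plain shuffle of loading rows among the affected features in place of the cluster-aware permutation described above.

```r
# Toy sketch; the shuffle, the shared U, and the runif() draws are assumptions.
set.seed(1)
n_cells <- 100; n_features <- 20; r <- 10
signal_prop <- 0.2; fold_change <- 2

U   <- matrix(runif(n_cells * r), nrow = n_cells)        # cell embeddings
V_A <- matrix(runif(n_features * r), nrow = n_features)  # condition A loadings
S   <- diag(seq(5, 1, length.out = r))
X_A <- U %*% S %*% t(V_A)

# Features that carry the signal
n_signal <- round(signal_prop * n_features)
affected <- sample.int(n_features, n_signal)

# DC: shuffle loading rows among the affected features in condition B
V_B <- V_A
V_B[affected, ] <- V_A[affected[sample.int(n_signal)], , drop = FALSE]
X_B_dc <- U %*% S %*% t(V_B)

# rescale = TRUE: match each affected feature's mean to condition A
scale_factor <- colMeans(X_A)[affected] / colMeans(X_B_dc)[affected]
X_B_dc[, affected] <- sweep(X_B_dc[, affected, drop = FALSE], 2, scale_factor, `*`)

# DE: multiply affected features by the fold change, loadings unchanged
X_B_de <- X_A
X_B_de[, affected] <- X_A[, affected] * fold_change
```

In this toy setting, the rescaled DC matrix has the same per-feature means as condition A, while the DE matrix has the same feature-feature correlations as condition A, since multiplying a column by a positive constant does not change its correlations.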
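The returned Seurat objects include per-feature metadata (feature_cluster
and is_de), accessible via
seurat_obj[["originalexp"]][[]]$feature_cluster and
seurat_obj[["originalexp"]][[]]$is_de. These columns provide the ground
truth for benchmarking. As the DC example in the Examples section suggests,
is_de also flags the affected features when simulation_type = "DC".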
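As a rough illustration of how such ground truth might be used, the sketch below scores a hypothetical method against a logical truth vector. Both vectors are simulated here: in practice truth would be taken from the is_de column, and score would come from whichever method is being benchmarked. The threshold of 1 is arbitrary.

```r
# Hypothetical inputs: `truth` stands in for the is_de column and `score`
# for the per-feature output of a method under evaluation.
set.seed(1)
truth <- rep(c(TRUE, FALSE), times = c(20, 80))
score <- ifelse(truth, rnorm(100, mean = 2), rnorm(100, mean = 0))

# Threshold-based calls (the cutoff is arbitrary)
predicted <- score > 1
tp <- sum(predicted & truth)
fp <- sum(predicted & !truth)
fn <- sum(!predicted & truth)

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

# Threshold-free AUROC via the rank-sum (Mann-Whitney) identity
n_pos <- sum(truth)
n_neg <- sum(!truth)
auc <- (sum(rank(score)[truth]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

c(precision = precision, recall = recall, auc = auc)
```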
See also
run_MOSAIC to run the MOSAIC pipeline on the
simulated data.
Examples
if (FALSE) { # \dontrun{
# --- Simulate 2-modality DC data with nonlinearity ---
sim <- simulate_multimodal_data(
  simulation_type = "DC",
  n_features = c(1000, 800),
  signal_prop = 0.2,
  seed = 42,
  nonlinear = TRUE
)

# Ground truth for modality 1 (the metadata column is named is_de in both modes)
is_dc <- sim$seurat_list[[1]][["originalexp"]][[]]$is_de

# Run MOSAIC
result <- run_MOSAIC(
  sim$seurat_list,
  assays = rep("originalexp", 2),
  sample_meta = "sample_id",
  condition_meta = "condition"
)

# --- Simulate 3-modality DE data ---
sim_de <- simulate_multimodal_data(
  simulation_type = "DE",
  n_features = c(1000, 800, 600),
  signal_prop = 0.3,
  fold_change = 3,
  seed = 123
)

# --- Simulate 1-modality with custom singular values ---
sim_1mod <- simulate_multimodal_data(
  n_features = c(500),
  singular_values = seq(10, 1, length.out = 10),
  seed = 1
)
} # }