Find a Balanced Two-Way Partition via Hierarchical Clustering
Source: R/subgroup.R
find_partition_hclust.Rd

Performs hierarchical clustering on a distance matrix and cuts the
dendrogram into two groups. A partition is considered "balanced" if both
groups contain at least max(3, ceiling(n * min_group_frac)) samples.
For balanced partitions, the average silhouette width is computed to
quantify separation quality.
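The balance rule above depends only on the sample count and `min_group_frac`; a minimal sketch (the helper name `min_group_size` is illustrative, not part of the package API):

```r
# Minimum group size required for a two-way partition to count as "balanced"
min_group_size <- function(n_samples, min_group_frac = 0.25) {
  max(3, ceiling(n_samples * min_group_frac))
}

min_group_size(40)  # ceiling(40 * 0.25) = 10
min_group_size(8)   # ceiling(8 * 0.25) = 2, floored at 3
```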
Arguments
- dist_mat
  A dist object representing pairwise distances between samples. Typically
  obtained from a module similarity matrix via as.dist(1 - sim_mat).
- min_group_frac
  Numeric value between 0 and 0.5 specifying the minimum fraction of total
  samples required in each group. The actual minimum group size is
  max(3, ceiling(n_samples * min_group_frac)). Default: 0.25.
Value
A list with:
- groups: Named integer vector of group assignments (1 or 2), one per sample.
  NULL if no balanced partition was found.
- silhouette: Average silhouette width (higher = better separation).
  NA if the partition is imbalanced.
- hclust: The hclust object, which can be passed to pheatmap for consistent
  dendrogram ordering.
- balanced: Logical indicating whether a balanced partition was found.
Details
This function is used in MOSAIC's subgroup detection pipeline to test
whether a feature module defines meaningful patient subtypes: compute a
module similarity matrix with compute_module_similarity,
convert to distance, and test for a balanced partition here.
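The cut-and-score step can be sketched with base R plus the cluster package. This is an illustrative reconstruction, assuming the default complete-linkage clustering and the cluster package's silhouette; the linkage method used internally by find_partition_hclust may differ:

```r
library(cluster)  # for silhouette(); a recommended package shipped with R

# Toy distance matrix: two well-separated groups of 1-D points
x <- c(1.0, 1.1, 1.2, 5.0, 5.1, 5.2)
d <- dist(x)

hc <- hclust(d)               # hierarchical clustering (complete linkage)
groups <- cutree(hc, k = 2)   # cut the dendrogram into two groups

# Average silhouette width quantifies how well the two groups separate
sil <- silhouette(groups, d)
avg_sil <- mean(sil[, "sil_width"])
table(groups)                 # 3 samples per group
avg_sil                       # near 1 for well-separated groups
```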
See also
compute_module_similarity to create the input
similarity matrix, plot_mds_cluster to visualize subgroups.
Examples
if (FALSE) { # \dontrun{
# Compute module similarity
sim_mat <- compute_module_similarity(
  result$mosaic_embed_list,
  feature_idx = which(feature_clusters == module_id)
)

# Test for subgroups
partition <- find_partition_hclust(as.dist(1 - sim_mat))
if (partition$balanced) {
  cat("Silhouette:", partition$silhouette, "\n")
  cat("Group sizes:", table(partition$groups), "\n")

  # Visualize
  plot_mds_cluster(sim_mat, "Subgroups",
                   cluster = partition$groups)
}
# Permutation test for significance: compare the observed silhouette to
# silhouettes of randomly drawn feature sets of the same size
# (total_features, module_idx, and projected_list come from earlier
# steps of the pipeline)
n_perm <- 1000
null_sils <- numeric(n_perm)
for (p in seq_len(n_perm)) {
  rand_idx <- sample(total_features, length(module_idx))
  rand_sim <- compute_module_similarity(projected_list, rand_idx)
  rand_part <- find_partition_hclust(as.dist(1 - rand_sim))
  null_sils[p] <- if (rand_part$balanced) rand_part$silhouette else 0
}
# Add-one p-value estimate (never exactly zero)
pval <- (sum(null_sils >= partition$silhouette) + 1) / (n_perm + 1)
} # }
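The add-one p-value estimator at the end of the example guards against reporting an impossible p = 0 from a finite permutation sample: even when no permutation reaches the observed statistic, the estimate is bounded below by 1/(n_perm + 1). A self-contained illustration with a synthetic null distribution:

```r
observed <- 0.6
null_sils <- runif(1000, min = 0, max = 0.5)  # synthetic null, all below observed

# A naive estimate sum(null >= obs) / n would be 0 here;
# the +1 correction yields the smallest reportable value, 1/(n + 1)
pval <- (sum(null_sils >= observed) + 1) / (length(null_sils) + 1)
pval  # 1/1001, about 0.000999
```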