Publication / Coherent Cross-modal Generation of Synthetic Biomedical Data to Advance Multimodal Precision Medicine

Qualitative comparison of real and generated data manifolds. UMAP projections of multimodal embeddings from the test set. The distribution of ground-truth data (Left) is compared with data reconstructed by the multi-condition model (Middle) and the Coherent Denoising ensemble (Right). Each point is a patient sample, colored by cancer type. Both generative approaches successfully capture the global topology and distinct cancer-type clustering of the original data.

Integration of multimodal, multi-omics data is critical for advancing precision medicine, yet its application is frequently limited by incomplete datasets where one or more modalities are missing. To address this challenge, we developed a generative framework capable of synthesizing any missing modality from an arbitrary subset of available modalities. We introduce Coherent Denoising, a novel ensemble-based generative diffusion method that aggregates predictions from multiple specialized, single-condition models and enforces consensus during the sampling process. We compare this approach against a multi-condition, generative model that uses a flexible masking strategy to handle arbitrary subsets of inputs. The results show that our architectures successfully generate high-fidelity data that preserve the complex biological signals required for downstream tasks. We demonstrate that the generated synthetic data can be used to maintain the performance of predictive models on incomplete patient profiles and can leverage counterfactual analysis to guide the prioritization of diagnostic tests. We validated the framework’s efficacy on a large-scale multimodal, multi-omics cohort from The Cancer Genome Atlas (TCGA) of over 10,000 samples spanning across 20 tumor types, using data modalities such as copy-number alterations (CNA), transcriptomics (RNA-Seq), proteomics (RPPA), and histopathology (WSI). This work establishes a robust and flexible generative framework to address sparsity in multimodal datasets, providing a key step toward improving precision oncology.

Read the full article here.

Next
Next

Publication / Precision-Engineered Plasmonic Nanostar Arrays for High-Performance SERS Sensing