
Applying multimodal biological foundation models across therapeutics and patient care

Healthcare and life sciences decision-making increasingly relies on multimodal data to diagnose diseases, prescribe medicines, predict treatment outcomes, and develop and optimize innovative therapies. Traditional approaches analyze fragmented data, such as omics for drug discovery, medical images for diagnostics, clinical trial reports for validation, and electronic health records (EHRs) for patient treatment. As a result, decision makers (CxOs, VPs, and directors) often miss critical insights hidden in the relationships between data types. Recent advancements in AI make it possible to integrate and analyze these fragmented data streams efficiently, supporting a more complete understanding of therapeutics and patient care.

AWS provides a unified environment for multimodal biological foundation models (BioFMs), enabling you to make more confident, timely decisions in personalized medicine. This environment combines biological data, model development, scalable compute, and partner tools to support the drug development life cycle. In this post, we’ll explore how multimodal BioFMs work, showcase real-world applications in drug discovery and clinical development, and explain how AWS enables organizations to build and deploy multimodal BioFMs.

Multimodal biological foundation models

Biological foundation models (BioFMs) are AI models pre-trained on large biological datasets. BioFMs demonstrate advanced capabilities on specific healthcare and life sciences tasks. The commonly used BioFMs span drug discovery and clinical development domains, particularly protein structure and molecule design (~20%), omics data analysis including DNA, epigenetics, and RNA (~30%), medical imaging (~15%), and clinical documentation (~35%) (Delile et al. 2025).

Unimodal BioFMs are trained exclusively on a single data modality (for example, amino acid sequences) for relevant downstream applications like predicting protein structures; this breakthrough was recognized with the 2024 Nobel Prize in Chemistry. Multimodal BioFMs are trained across multiple data types (text, audio, image, and video, hereafter “modalities”) and can infer across different streams simultaneously in a single model (for example, using text prompts to generate new images or matching images to captions).
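To make the cross-modal idea concrete, the sketch below shows the core mechanic in miniature: each modality is projected into one shared embedding space, and cross-modal similarity (here, cosine similarity) is used to match items across streams. The “encoders” are stand-in random projections, not real BioFMs; a trained multimodal model learns these projections jointly from paired data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "encoders": random linear projections into a shared
# 8-dimensional embedding space. A real multimodal BioFM learns
# these jointly from paired data (e.g., images with captions).
W_text = rng.normal(size=(8, 16))   # text features  -> shared space
W_image = rng.normal(size=(8, 32))  # image features -> shared space

def embed(x, W):
    """Project a feature vector into the shared space, unit-normalized."""
    z = W @ x
    return z / np.linalg.norm(z)

# Toy feature vectors for 3 captions and 3 images
# (in practice: tokenized text and pixel data).
text_feats = rng.normal(size=(3, 16))
image_feats = rng.normal(size=(3, 32))

text_emb = np.stack([embed(x, W_text) for x in text_feats])
image_emb = np.stack([embed(x, W_image) for x in image_feats])

# Cross-modal similarity matrix: entry [i, j] is the cosine similarity
# between caption i and image j; the row-wise argmax is the best match.
sim = text_emb @ image_emb.T
best_match = sim.argmax(axis=1)
print(sim.shape, best_match)
```

With trained encoders, the same similarity machinery supports retrieval (match an image to a caption) and conditioning (use one modality’s embedding to guide generation in another).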

Notable multimodal BioFM examples include:

  1. Latent Labs’ Latent-X1 and Latent-X2 not only predict 3D structures of proteins, but also generate novel binders like antibodies, macrocyclic peptides, and miniproteins and predict how they interact with targets.
  2. Arc Institute’s Evo 2 maps the central dogma of biology to interpret and predict the structure and function of DNA, RNA, and proteins.
  3. Insilico Medicine’s nach0 integrates natural language, chemical intelligence, and 3D molecular structure data to accelerate drug discovery.
  4. Bioptimus’ M-Optimus decodes histology and clinical data for rich biological insights, supporting multiple stages from research to patient care.
  5. Harvard and AstraZeneca’s MADRIGAL integrates structural, pathway, cell viability, and transcriptomic data to predict drug combination clinical outcome, identify adverse interactions, and optimize polypharmacy management.
  6. John Snow Labs’ vision language model Medical VLM-24B processes clinical notes, lab reports, and imaging (X‑ray, MRI, CT) for unified, context‑aware diagnostics.
  7. GE HealthCare’s 3D magnetic resonance imaging (MRI) foundation model enables developers to build applications for tasks such as image retrieval, classification, segmentation, and report generation.

The multimodal advantage

The current frontier of models pushes the boundary of multimodal understanding and generation capabilities. General-purpose models like Amazon Nova 2 Omni can process text, images, video, and speech inputs while generating both text and images. This multimodality trend extends to BioFMs, where combining multiple data types like medical images and clinical documentation achieves higher predictive accuracy and broader applicability across diverse clinical outcomes (Siam et al. 2025).
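As a sketch of what mixed-modality input looks like in practice, the snippet below assembles a single user message combining a text block and an image block in the style of the Amazon Bedrock Converse API, then shows a guarded call. The model ID and placeholder image bytes are illustrative assumptions; the actual call requires AWS credentials and model access.

```python
def build_multimodal_message(prompt: str, image_bytes: bytes) -> dict:
    """One user message mixing text and image content blocks,
    following the Amazon Bedrock Converse API message shape."""
    return {
        "role": "user",
        "content": [
            {"text": prompt},
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
        ],
    }

def describe_scan(message: dict, model_id: str = "us.amazon.nova-pro-v1:0") -> str:
    """Send the message to a multimodal model on Amazon Bedrock.
    The model ID is a placeholder; swap in any multimodal model you have access to."""
    import boto3  # AWS SDK for Python; needs credentials at call time

    client = boto3.client("bedrock-runtime")
    response = client.converse(modelId=model_id, messages=[message])
    return response["output"]["message"]["content"][0]["text"]

msg = build_multimodal_message(
    "Summarize the key findings in this scan.",
    b"\x89PNG...",  # placeholder bytes, not a real image
)
# describe_scan(msg)  # requires AWS credentials and Bedrock model access
```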

Integrating diverse biological data types yields measurable performance gains.

BioFMs in action at AWS customers

These performance gains explain why leading biopharma organizations are increasingly adopting multimodal BioFMs, investing in them to analyze biologic (Merck and Novo Nordisk), genomic (AstraZeneca), pathology (Bayer), and clinical (Roche) data. You can realize up to 50% cost and time savings in drug development and up to 90% time savings in medical image diagnosis when using these specialized AI models (State of the Art-ificial Intelligence 2025; Jeong et al. 2025). Multimodal BioFMs show promise across multiple stages of the healthcare and life sciences value chain (Figure 1).

Figure 1. Multimodal BioFMs integrate various biological data types (for example, protein, small molecule, omics, imaging, sensors, clinical documentation) to power applications across the drug development lifecycle (research, clinical development, manufacturing, commercial).

For a deeper dive, we’ve selected two use cases: drug discovery and clinical development.

Figure 2. Multimodal BioFMs integrate 3D protein structure, computational metrics, and biophysical measurements through iterative design-validation loops to accelerate therapeutic protein discovery for undruggable multidomain disease targets.

Figure 3. Multimodal BioFM approach combines sequencing, spatial transcriptomics, pathology, and patient records to simulate tumor microenvironments and prioritize patient subpopulations, potentially reducing early-phase trial failures.

Solution: AWS environment for multimodal BioFMs

AWS provides a unified environment for building, training, and deploying multimodal BioFMs that help you convert healthcare and life science data into actionable insights. This environment comprises four layers: an AI solution for model development, a unified data foundation for biological data management, scalable infrastructure for compute and storage, and partner integrations that extend capabilities across the drug development lifecycle.

AWS Partner solutions and implementation support

You can deploy pre-built multimodal BioFMs from partners like NVIDIA directly through AWS. Combine these production-ready NVIDIA NIM microservices with AWS HIPAA-eligible imaging services, multimodal reasoning capabilities, and parallel genomics pipelines to build end-to-end discovery-to-clinic applications.
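Once a partner model is deployed behind a SageMaker real-time endpoint, invoking it reduces to packaging the modalities into one request body. The sketch below builds a JSON payload pairing a clinical note with a base64-encoded scan and shows the guarded invocation; the endpoint name and field schema are illustrative assumptions, not a real partner API, so consult the model’s documentation for its actual input format.

```python
import base64
import json

# Hypothetical endpoint name -- replace with your deployed model's endpoint.
ENDPOINT_NAME = "my-multimodal-biofm-endpoint"

def build_payload(report_text: str, image_bytes: bytes) -> str:
    """Package a clinical note and an imaging study into one JSON body.
    Field names are illustrative; real partner models define their own schema."""
    return json.dumps({
        "inputs": {
            "text": report_text,
            "image": base64.b64encode(image_bytes).decode("ascii"),
        },
        "parameters": {"max_new_tokens": 256},
    })

def invoke(payload: str) -> dict:
    """Send the request to a SageMaker real-time endpoint."""
    import boto3  # AWS SDK for Python; needs credentials at call time

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=payload,
    )
    return json.loads(response["Body"].read())

payload = build_payload(
    "Chest X-ray, PA view. Evaluate for effusion.",
    b"\x89PNG...",  # placeholder bytes, not a real image
)
# invoke(payload)  # requires AWS credentials and a deployed endpoint
```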

You can consult with implementation partners like Loka, Deloitte, and Accenture on transitioning from proof-of-concept to production deployment for multimodal BioFMs use cases. These partners bring specialized expertise in bioinformatics, cloud architecture, and regulatory compliance to accelerate time-to-value. Visit the AWS Partner Network to explore additional qualified partners with healthcare and life sciences competencies.

Conclusion

Multimodal BioFMs are reimagining what we can discover about disease, treatment, and human health. By integrating omics data, medical imaging, and clinical information, these models reveal hidden insights that were previously difficult to detect through traditional methods. Decision makers can now make more accurate, confident decisions across disease diagnosis, treatment prediction, and therapeutic optimization.

AWS provides a unified environment to overcome the technical barriers of building and deploying multimodal BioFMs at scale. Rather than investing in fragmented, single-use AI solutions for each therapeutic area or clinical application, you can use reusable foundation models that adapt across therapeutics and patient care. This approach reduces time-to-value while preserving the flexibility to adapt as new data sources and use cases emerge.

To learn more about using AWS for BioFM training or inference in a therapeutic or medical context, please contact an AWS Life Sciences representative.

Further reading


About the authors

Kristin Ambrosini

Kristin Ambrosini is a Generative AI Specialist in Healthcare and Life Sciences at Amazon Web Services. She leads go-to-market for BioFMs to accelerate drug discovery and improve patient care. She combines scientific expertise, technical fluency, and strategic insight to drive innovation across healthcare and life sciences. Kristin holds a Ph.D. in Biological Sciences and brings hands-on experience in DNA sequencing, cancer therapeutics, and viral diagnostics – giving her a unique lens into the challenges and opportunities multimodal BioFMs are built to solve.

Brian Loyal

Brian Loyal is a Principal AI/ML Solutions Architect in the Global Healthcare and Life Sciences team at Amazon Web Services. He has more than 20 years’ experience in biotechnology and machine learning and is passionate about using AI to improve human health and well-being.

Mike Tarselli

Mike Tarselli is a Specialist Leader in Healthcare and Life Sciences Data and AI at Amazon Web Services. He has spent more than 25 years in the biopharma industry. As a leader in AI and data strategy, he works with scientific and technical teams to help them realize their vision, while embracing the fast pace and enormity of AI.

Zheng Yang

Zheng Yang is the global Head of AI/ML Strategy for Healthcare and Life Sciences at AWS. He brings more than 25 years’ experience in AI/ML solution development across the life sciences value chain. Before AWS, Zheng architected holistic data solutions to accelerate new medicine launches and championed technology adoption in pharmaceutical research. He is passionate about using technology to transform patient care.
