Advancing Biomedical Understanding with Multimodal Gemini


Abstract

Recent advances in generative AI are revolutionizing how AI can be leveraged for positive global change. Google launched the Gemini family of natively multimodal models in December 2023. While medicine is a rapidly growing use case for generative AI, medical data is highly bespoke; because medicine differs from other fields in many ways, general-purpose models do not automatically work well in the medical domain. In this work, we adapt the Gemini family of models for the clinical medical domain, allowing the models to interpret medical data across 2D and 3D radiology, histopathology, ophthalmology, dermatology and genomics. We introduce a new medically optimized suite of Gemini-based models, Mosaic, developed by fine-tuning on a set of 2.2 million data samples representing a diverse set of biomedical tasks.

Using Mosaic, we demonstrate novel results in chest X-ray report generation, where 65% of examined AI reports were evaluated by expert radiologists to be of equivalent or better quality than the original radiologist reports, while 82% of reports were considered clinically acceptable. For 3D computed tomography imaging of the head, 48% of the AI-generated reports were evaluated as being of equivalent or better quality than expert-written reports, while 68% were considered clinically acceptable. This is the first work to date that demonstrates expert-level performance on 3D image report generation.

Beyond human evaluation, the Mosaic models demonstrate strong performance across a diverse set of automated metrics. Mosaic achieves a RadGraph F1-score of 24.1% on chest X-ray report generation, an improvement of more than 3.5% over the previous best-in-class score. In chest X-ray classification (5 classes) on the MIMIC-CXR dataset, Mosaic achieves an accuracy of 82.6%, outperforming the previous best score by an absolute margin of 1.3%. In histopathology patch classification, Mosaic approaches the performance of models trained using orders of magnitude more training examples. Mosaic also achieves competitive performance on several visual question answering tasks in pathology and radiology.

Beyond medical images, Mosaic is capable of replicating the predictive abilities of traditional polygenic risk scoring to predict the risk of various genetically associated diseases, both in and out of the training data distribution, while using significantly less training data.

Authors

Lin Yang, Shawn Xu, Andrew Sellergren, Timo Kohlberger, Yuchen Zhou, Ira Ktena, Atilla Kiraly, Faruk Ahmed, Farhad Hormozdiari, Tiam Jaroensri, Eric Wang, Ellery Wulczyn, Fayaz Jamil, Theo Guidroz, Chuck Lau, Siyuan Qiao, Yun Liu, Akshay Goel, Kendall Park, Arnav Agharwal, Nick George, Yang Wang, Ryutaro Tanno, David G. T. Barrett, Wei-Hung Weng, S. Sara Mahdavi, Khaled Saab, Tao Tu, Sreenivasa Raju Kalidindi, Mozziyar Etemadi, Jorge Cuadros, Gregory Sorensen, Yossi Matias, Katherine Chou, Greg Corrado, Joelle Barral, Shravya Shetty, David Fleet, S. M. Ali Eslami, Daniel Tse, Shruthi Prabhakara, Cory McLean, Dave Steiner, Rory Pilgrim, Christopher Kelly, Shekoofeh Azizi, Daniel Golden

Venue

arXiv