Research
I'm interested in vision language models, computer vision, and statistical machine learning.
|
|
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
Alejandro Lozano*,
Min Woo Sun*,
James Burgess*,
Liangyu Chen,
Jeffrey J. Nirschl,
Jeffrey Gu,
Ivan Lopez,
Josiah Aklilu,
Anita Rau,
Austin Wolfgang Katzer,
Yuhui Zhang,
Collin Chiu,
Xiaohan Wang,
Alfred Seunghoon Song,
Robert Tibshirani,
Serena Yeung-Levy
CVPR 2025
arXiv
/
github
/
data
/
project page
*co-first authorship
We introduce BIOMEDICA, an open-source framework that transforms the PubMed Central Open Access subset into a comprehensive dataset of over 24 million image-text pairs with expert-guided annotations, enabling state-of-the-art performance in biomedical vision-language models across diverse tasks and domains.
|
|
regionalpcs: improved discovery of DNA methylation associations with complex traits
Tiffany Eulalio,
Min Woo Sun,
Olivier Gevaert,
Michael D. Greicius,
Thomas J. Montine,
Daniel Nachun,
Stephen B. Montgomery
Nature Communications, 2025 (featured in Nature Communications Editors' Highlights)
Nature Communications
/
github
/
Bioconductor
regionalpcs summarizes DNA methylation data using regional principal components (rPCs). Within each genomic region, principal component analysis is applied to the CpGs in that region, so that a few component scores capture the variability in methylation levels across its CpGs.
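A minimal sketch of the regional PCA idea, assuming a samples-by-CpGs methylation matrix and one region label per CpG. The function name `regional_pcs` and the fixed `n_pcs` parameter are hypothetical; the package's own rule for choosing how many components to keep per region is not reproduced here.

```python
import numpy as np


def regional_pcs(meth, cpg_regions, n_pcs=1):
    """Hypothetical sketch: PCA scores computed separately within each region.

    meth: (n_samples, n_cpgs) methylation values.
    cpg_regions: length-n_cpgs sequence giving the region of each CpG.
    Returns {region: (n_samples, k) array of PC scores}, k <= n_pcs.
    """
    meth = np.asarray(meth, dtype=float)
    cpg_regions = np.asarray(cpg_regions)
    scores = {}
    for region in np.unique(cpg_regions):
        block = meth[:, cpg_regions == region]   # samples x CpGs in this region
        centered = block - block.mean(axis=0)    # center each CpG
        # SVD of the centered block: columns of U * S are the PC scores
        U, S, _ = np.linalg.svd(centered, full_matrices=False)
        k = min(n_pcs, S.size)
        scores[region] = U[:, :k] * S[:k]
    return scores
```

Each region is reduced independently, so a region with many correlated CpGs collapses to a few scores that can then be tested for association with a trait. The sign of each PC is arbitrary, as in any SVD-based PCA.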
|
|
Artificial Intelligence Identifies Factors Associated with Blood Loss and Surgical Experience in Cholecystectomy
Josiah G. Aklilu,
Min Woo Sun,
Shelly Goel,
Sebastiano Bartoletti,
Anita Rau,
Griffin Olsen,
Kay S. Hung,
Sophie L. Mintz,
Vicki Luong,
Arnold Milstein,
Mark J. Ott,
Robert Tibshirani,
Jeffrey K. Jopling,
Eric C. Sorenson,
Dan E. Azagury,
Serena Yeung-Levy
NEJM AI, 2024
NEJM AI
/
github
We developed a computer vision model to analyze laparoscopic surgery videos, identifying fine-grained surgical actions linked to operative blood loss and surgeon experience.
|
|
CoCA: Cooperative Component Analysis
Daisy D. Ying*,
Alden Green*,
Min Woo Sun,
Robert Tibshirani
arXiv, 2024
arXiv
Cooperative Component Analysis (CoCA) is an unsupervised multi-view analysis method that balances within-view variance preservation against cross-view correlation by trading off an approximation error against an agreement penalty. It generalizes PCA and CCA, includes a sparse variant for feature selection, and proves effective at integrating multiomics data to predict disease progression.
|
|
Intraoperative Evaluation of Breast Tissues During Breast Cancer Operations Using the MasSpec Pen
Kyana Y. Garza,
Mary E. King,
Chandandeep Nagi,
Rachel J. DeHoog,
Jialing Zhang,
Marta Sans,
Anna Krieger,
Clara L. Feider,
Alena V. Bensussan,
Michael F. Keating,
John Q. Lin,
Min Woo Sun,
Robert Tibshirani,
Christopher Pirko,
Kirtan A. Brahmbhatt,
Ahmed R. Al-Fartosi,
Alastair M. Thompson,
Elizabeth Bonefas,
James Suliburk,
Stacey A. Carter,
Livia S. Eberlin
JAMA Network, 2024
JAMA Network
Molecular data acquired with the MasSpec Pen, a handheld mass spectrometry device, during breast cancer operations were used to build classifiers that achieved high diagnostic accuracy relative to pathology results, highlighting the device's potential for real-time surgical guidance.
|
|
Confidence intervals for the Cox model test error from cross-validation
Min Woo Sun,
Robert Tibshirani
Statistics in Medicine, 2023
Statistics in Medicine
/
arXiv
/
github
Naive cross-validation (CV) confidence intervals can underestimate the variance of the test error, because each sample is used for both training and testing and the resulting error estimates are correlated. Nested CV mitigates this by estimating the error variance more accurately, which yields intervals with better coverage; this work extends nested CV to the Cox proportional hazards model.
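To make the problem concrete, here is a minimal sketch of the naive interval that nested CV is designed to improve on. It assumes squared-error linear regression as a stand-in for the Cox model, and the function names are hypothetical; nested CV itself is not implemented here.

```python
import numpy as np
from statistics import NormalDist


def kfold_squared_losses(X, y, k=5, seed=0):
    """Per-sample held-out squared errors from k-fold CV (least squares fit)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), k)
    losses = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        coef, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        losses[test_idx] = (y[test_idx] - X[test_idx] @ coef) ** 2
    return losses


def naive_cv_interval(losses, alpha=0.1):
    """Naive (1 - alpha) interval treating the n CV losses as i.i.d.

    This ignores the correlation induced by reusing samples across folds,
    which is why the interval can be too narrow and undercover.
    """
    losses = np.asarray(losses, dtype=float)
    err = losses.mean()
    se = losses.std(ddof=1) / np.sqrt(losses.size)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return err - z * se, err + z * se
```

The standard error here is the usual i.i.d. formula; nested CV replaces it with a variance estimate that accounts for the dependence between folds.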
|
|
Public health factors help explain cross country heterogeneity in excess death during the COVID19 pandemic
Min Woo Sun*,
David Troxell*,
Robert Tibshirani
Nature Scientific Reports, 2023
Nature Scientific Reports
/
github
The COVID-19 pandemic caused an estimated 14.9 million excess deaths globally, and prior studies have linked cross-country differences in COVID-19 deaths to demographic and health factors. This analysis extends that work by incorporating government policies, showing the critical role of public health efforts in reducing excess deaths.
|
|
Improved Digital Therapy for Developmental Pediatrics Using Domain-Specific Artificial Intelligence: Machine Learning Study
Peter Washington,
Haik Kalantarian,
John Kent,
Arman Husic,
Aaron Kline,
Emilie Leblanc,
Cathy Hou,
Onur Cezmi Mutlu,
Kaitlyn Dunlap,
Yordan Penev,
Maya Varma,
Nate Tyler Stockham,
Brianna Chrisman,
Kelley Paskov,
Min Woo Sun,
Jae-Yoon Jung,
Catalin Voss,
Nick Haber,
Dennis Paul Wall
JMIR Pediatrics, 2022
JMIR Pediatrics
A therapeutic smartphone game, GuessWhat, was used to collect and label a large dataset of child emotion expressions, enabling the training of a CNN classifier with significantly improved accuracy (up to 79.1% balanced accuracy) compared to prior models. This approach shows the potential of gamified data collection for enhancing emotion recognition in pediatric digital health care.
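Balanced accuracy, the metric quoted above, is the mean of per-class recall, so a classifier cannot score well just by favoring the most common emotion. A minimal sketch (scikit-learn provides the same metric as `sklearn.metrics.balanced_accuracy_score`); the toy labels are hypothetical.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


# Hypothetical imbalanced example: a model that always predicts "happy"
y_true = ["happy"] * 8 + ["sad"] * 2
y_pred = ["happy"] * 10
print(balanced_accuracy(y_true, y_pred))  # 0.5, although plain accuracy is 0.8
```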
|
|
Game theoretic centrality: a novel approach to prioritize disease candidate genes by combining biological networks with the Shapley value
Min Woo Sun,
Stefano Moretti,
Kelley M. Paskov,
Nate T. Stockham,
Maya Varma,
Brianna S. Chrisman,
Peter Y. Washington,
Jae-Yoon Jung,
Dennis P. Wall
BMC Bioinformatics, 2020
BMC
We introduce game theoretic centrality, which integrates biological network knowledge with the Shapley value from coalitional game theory to prioritize disease-associated genes. Applied to autism spectrum disorder (ASD), the approach identifies biologically relevant genes, reveals potential regulatory interactions, and offers insights into the genetic basis of complex disorders.
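For readers unfamiliar with the Shapley value, the sketch below computes it exactly by enumerating coalitions, using the standard weight |S|!(n - |S| - 1)!/n! on each marginal contribution. The characteristic function `edge_game` (the number of network edges inside a coalition) is a hypothetical toy, not the one used in the paper; it is convenient because its Shapley value is known to equal half of each node's degree.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions (exponential in n)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Probability that exactly this coalition precedes i in a random order
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def edge_game(edges):
    """Hypothetical toy game: v(S) = number of network edges with both ends in S."""
    edge_sets = [frozenset(e) for e in edges]
    return lambda coalition: sum(e <= coalition for e in edge_sets)


genes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("C", "D")]
print(shapley_values(genes, edge_game(edges)))
```

Here A (degree 3) receives 1.5, B receives 0.5, and C and D receive 1.0 each, and the values sum to the four edges of the whole network. Exact enumeration visits 2^(n-1) coalitions per gene, so it only works on small networks.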
|
|