
Awarded in 2024
Home Department: Computer Science
Faculty Advisors: Russ Altman (Bioengineering, Genetics, Medicine – Biomedical Informatics Research, and Biomedical Data Science), Serena Yeung (Biomedical Data Science), and Sanmi Koyejo (Computer Science)
Research Title: Integrating generative models to address selection bias and patient privacy in EHR data
Machine learning (ML) models trained on sensitive personal information have the potential to drive innovation in many domains, enabling fairer prediction methods and novel clinical insights. However, privacy regulations heavily restrict the sharing of valuable healthcare data and, in turn, the research that depends on it. To address this, Kara proposes a novel generative model for synthetic electronic health records (EHRs) that ensures patient privacy while accounting for selection biases that emerge during the data collection process. Her work will leverage an unbiased dataset to learn which features, such as access to health insurance, lead to selection biases in real-world medical datasets. By conditioning a denoising diffusion probabilistic model (DDPM) on these selection features, her project will generate realistic EHR data that corrects for selection bias and better represents the general population. Finally, Kara will train the generative model to enforce patient privacy. By integrating advanced techniques from both medicine and computer science, her research holds promise for democratizing access to EHR data, fostering widespread medical research, and enabling the development of improved healthcare solutions.