Machine Learning in Healthcare | Computational Biology PhD Candidate | Published in Science & Nature Methods
Research Data Scientist with 10+ years of experience leading machine learning and predictive modeling initiatives in healthcare and biomedical domains.
Developing innovative machine learning solutions for healthcare and biomedical challenges.
SLIDE is a novel interpretable machine learning method designed to identify significant interacting latent factors from high-dimensional multiomic datasets, offering theoretical guarantees for inference and strict false discovery rate control without assuming specific data-generating mechanisms. Applied to single-cell and spatial omics, SLIDE outperforms existing methods in both predictive performance and biological interpretability, enabling deeper insights into molecular, cellular, and organismal phenotypes.
Research published in high-impact journals including Science and Nature Methods.
Modern multiomic technologies can generate deep multiscale profiles. However, differences in data modalities, multicollinearity of the data, and large numbers of irrelevant features make analysis and integration of high-dimensional omic datasets challenging. Here we present Significant Latent Factor Interaction Discovery and Exploration (SLIDE), a first-in-class interpretable machine learning technique for identifying significant interacting latent factors underlying outcomes of interest from high-dimensional omic datasets. SLIDE makes no assumptions regarding data-generating mechanisms, comes with theoretical guarantees regarding identifiability of the latent factors and the corresponding inference, and has rigorous false discovery rate control. Using SLIDE on single-cell and spatial omic datasets, we uncovered significant interacting latent factors underlying a range of molecular, cellular and organismal phenotypes. SLIDE outperforms, or performs at least as well as, a wide range of state-of-the-art approaches, including other latent factor approaches. More importantly, it provides biological inference beyond prediction that other methods do not afford. Thus, SLIDE is a versatile engine for biological discovery from modern multiomic datasets.
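The abstract states that SLIDE has rigorous false discovery rate (FDR) control but does not describe the mechanism. As a generic illustration only, here is a minimal Python sketch of the knockoff+ selection rule of Barber and Candès, one standard way to control the FDR when selecting variables; it is not SLIDE's code, and whether SLIDE uses this exact rule is not stated here. It assumes a hypothetical vector `W` of per-factor statistics has already been computed (positive means a factor beat its knockoff copy) and a target FDR level `q`; the helper names are illustrative.

```python
import numpy as np


def knockoff_plus_threshold(W, q=0.1):
    """Smallest t among the nonzero |W_j| whose estimated FDP is <= q (knockoff+ rule).

    W[j] > 0 means feature j beat its knockoff copy; W[j] < 0 means the knockoff won.
    Returns np.inf when no threshold qualifies, so nothing is selected.
    """
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):  # candidate thresholds, ascending
        # The "+1" in the numerator is what distinguishes knockoff+ (exact FDR control).
        est_false = 1 + np.sum(W <= -t)
        n_selected = max(1, int(np.sum(W >= t)))
        if est_false / n_selected <= q:
            return float(t)
    return np.inf


def knockoff_select(W, q=0.1):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_plus_threshold(W, q))


# Hypothetical statistics for 10 latent factors (illustrative values only).
W = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -1.5, 1.0, -0.5, 0.2]
print(knockoff_plus_threshold(W, q=0.2))  # 2.0
print(knockoff_select(W, q=0.2))          # [0 1 2 3 4 5]
```

The rule scans thresholds from smallest to largest and stops at the first one where the estimated fraction of false selections, (1 + number of statistics at or below -t) divided by the number at or above t, is at most `q`.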
Schistosomiasis, a highly prevalent parasitic disease, affects more than 200 million people worldwide. Current diagnostics based on parasite egg detection in stool detect infection only at a late stage, and current antibody-based tests cannot distinguish past from current infection. Here, we developed and used a multiplexed antibody profiling platform to obtain a comprehensive repertoire of antihelminth humoral profiles, including isotype, subclass, Fc receptor (FcR) binding, and glycosylation profiles of antigen-specific antibodies. Using Essential Regression (ER) and SLIDE, two interpretable machine learning methods, we identified latent factors (context-specific groups) that move beyond biomarkers and provide insights into the pathophysiology of different stages of schistosome infection. By comparing profiles of infected and healthy individuals, we identified modules with unique humoral signatures of active disease, including hallmark signatures of parasitic infection such as elevated immunoglobulin G4 (IgG4). However, we also captured previously uncharacterized humoral responses, including elevated FcR binding and specific antibody glycoforms in patients with active infection, which help distinguish them from individuals without active infection but with equivalent antibody titers. This signature was validated in an independent cohort. Our approach also uncovered two distinct endotypes, nonpatent infection and prior infection, among individuals who were not actively infected. Higher levels of IgG1 and FcR1/FcR3A binding were also found to be likely protective against the transition from nonpatent to active infection. Overall, we unveiled markers for antibody-based diagnostics and latent factors underlying the pathogenesis of schistosome infection. Our results suggest that selective antigen targeting could be useful in early detection, thereby helping to control infection severity.
Review of methodologies for integrating diverse biomedical data types. This paper provides a comprehensive overview of current approaches and future directions in multi-omics integration.
RNA-seq technology provides unprecedented power in assessing transcript abundance and can be used for a variety of downstream tasks, such as inference of gene correlation networks and eQTL discovery. However, raw gene expression values have to be normalized for nuisance biological variation and technical covariates, and different normalization strategies can lead to dramatically different results in downstream analyses.
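To make the normalization point concrete, the minimal Python sketch below uses hypothetical toy counts in which sequencing depth (library size), a technical covariate, differs across samples. The raw counts of two genes look positively correlated only because deeper samples yield more reads for every gene; after counts-per-million (CPM) scaling, a simple library-size normalization chosen here purely for illustration, their true relative-abundance pattern emerges as a perfect anticorrelation. The helpers `cpm` and `gene_correlation` are illustrative, not part of any specific package.

```python
import numpy as np


def cpm(counts: np.ndarray) -> np.ndarray:
    """Counts per million: divide each sample (column) by its library size."""
    counts = np.asarray(counts, dtype=float)
    library_sizes = counts.sum(axis=0)  # total reads per sample
    return counts / library_sizes * 1e6


def gene_correlation(expr: np.ndarray, i: int, j: int) -> float:
    """Pearson correlation between genes i and j (rows) across samples (columns)."""
    return float(np.corrcoef(expr)[i, j])


# Hypothetical toy counts (genes x samples). Sample k is sequenced at depth k+1,
# so every gene's raw count grows with depth, while genes A and B alternate in
# relative abundance. Gene C is filler that brings each library to 1000*(k+1) reads.
counts = np.array([
    [10, 24, 30, 48],         # gene A
    [12, 20, 36, 40],         # gene B
    [978, 1956, 2934, 3912],  # gene C (filler)
])

r_raw = gene_correlation(counts, 0, 1)       # about 0.92, driven by depth
r_cpm = gene_correlation(cpm(counts), 0, 1)  # about -1.0, the relative pattern
print(f"raw r = {r_raw:.2f}, CPM r = {r_cpm:.2f}")  # raw r = 0.92, CPM r = -1.00
```

The same pair of genes flips from strongly positive to perfectly negative correlation depending only on whether depth is accounted for, which is the kind of discrepancy that propagates into correlation networks built from the data.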
Identification of key factors in disease progression using advanced statistical methods. This work was conducted in collaboration with the Midwest Pediatric Nephrology Consortium.
Technical and domain expertise in data science, machine learning, and healthcare analytics.
Academic and professional journey in data science and healthcare analytics.
2018 – Expected July 2025
Dissertation: "Machine Learning Approaches for Interpretable Healthcare Analytics and Clinical Decision Support"
Relevant Coursework: Advanced Machine Learning, Statistical Learning Theory, Computational Genomics, Healthcare Data Science
September 2018 – Present
Led development of SLIDE, an interpretable machine learning framework for high-dimensional multiomic data, resulting in a publication in Nature Methods and adoption by three research institutions.
Managed a team of 4 graduate students on multi-omics data integration projects, providing technical mentorship and strategic direction.
2015 – 2017
Thesis: "Statistical Methods for Biomedical Data Analysis"
Relevant Coursework: Regression Analysis, Categorical Data Analysis, Multivariate Statistics, Experimental Design
August 2015 – August 2017
Designed and delivered a comprehensive statistics curriculum for undergraduate students, maintaining a 4.8/5.0 instructor rating.
Mentored 30+ students in statistical methods and data analysis techniques, 5 of whom went on to pursue advanced degrees in data science.
2005 – 2009
Focus: Operations Research and Systems Engineering
Relevant Coursework: Operations Research, Systems Engineering, Probability & Statistics, Optimization
June 2009 – December 2010
Implemented optimization models for healthcare delivery systems, reducing operational costs by 18%.
Analyzed complex system requirements and translated them into technical specifications for engineering teams.
Interested in collaboration or have questions about my research? Get in touch.
javad@pitt.edu
Pittsburgh, PA
Available after July 15, 2025