Machine Learning in Healthcare | Computational Biology PhD Candidate | Published in Science & Nature Methods
Research Data Scientist with 10+ years of experience leading machine learning initiatives and predictive modeling in healthcare and biomedical domains.
My research focuses on developing and applying advanced computational methods to solve complex problems in healthcare and biology.
Graphical models, generative models, deep learning, interpretable machine learning models, causality
Multi-omics and multimodal integration, gene perturbation analysis, identifiable matrix factorization
Linear and non-linear programming, convex optimization
University of Pittsburgh
2017-2018
West Virginia University
2016
Developing innovative machine learning solutions for healthcare and biomedical challenges.
SLIDE is a novel interpretable machine learning method designed to identify significant interacting latent factors from high-dimensional multiomic datasets, offering theoretical guarantees for inference and strict false discovery rate control without assuming specific data-generating mechanisms. Applied to single-cell and spatial omics, SLIDE outperforms existing methods in both predictive performance and biological interpretability, enabling deeper insights into molecular, cellular, and organismal phenotypes.
A novel multi-scale framework that couples network-based genome-wide association studies (GWAS) to functional genomic data to uncover network modules distinguishing CCP+ and CCP- rheumatoid arthritis. Using the RACER cohort of 555 CCP+/RF+ and 384 CCP-/RF+ RA patients, we uncovered significant differences in heritability and identified 14 putative gene modules explaining genetic differences between these disease groups. Our findings demonstrate the utility of network-based approaches in elucidating the complex genetic landscape of RA, offering new insights into differential genetic risk factors and paving the way for more personalized therapeutic strategies.
Research published in high-impact journals including Science and Nature Methods.
Modern multiomic technologies can generate deep multiscale profiles. However, differences in data modalities, multicollinearity of the data, and large numbers of irrelevant features make analyses and integration of high-dimensional omic datasets challenging. Here we present Significant Latent Factor Interaction Discovery and Exploration (SLIDE), a first-in-class interpretable machine learning technique for identifying significant interacting latent factors underlying outcomes of interest from high-dimensional omic datasets. SLIDE makes no assumptions regarding data-generating mechanisms, comes with theoretical guarantees regarding identifiability of the latent factors/corresponding inference, and has rigorous false discovery rate control. Using SLIDE on single-cell and spatial omic datasets, we uncovered significant interacting latent factors underlying a range of molecular, cellular and organismal phenotypes. SLIDE outperforms/performs at least as well as a wide range of state-of-the-art approaches, including other latent factor approaches. More importantly, it provides biological inference beyond prediction that other methods do not afford. Thus, SLIDE is a versatile engine for biological discovery from modern multiomic datasets.
Schistosomiasis, a highly prevalent parasitic disease, affects more than 200 million people worldwide. Current diagnostics based on parasite egg detection in stool detect infection only at a late stage, and current antibody-based tests cannot distinguish past from current infection. Here, we developed and used a multiplexed antibody profiling platform to obtain a comprehensive repertoire of antihelminth humoral profiles including isotype, subclass, Fc receptor (FcR) binding, and glycosylation profiles of antigen-specific antibodies. Using Essential Regression (ER) and SLIDE, interpretable machine learning methods, we identified latent factors (context-specific groups) that move beyond biomarkers and provide insights into the pathophysiology of different stages of schistosome infection. By comparing profiles of infected and healthy individuals, we identified modules with unique humoral signatures of active disease, including hallmark signatures of parasitic infection such as elevated immunoglobulin G4 (IgG4). However, we also captured previously uncharacterized humoral responses including elevated FcR binding and specific antibody glycoforms in patients with active infection, helping distinguish them from those without active infection but with equivalent antibody titers. This signature was validated in an independent cohort. Our approach also uncovered two distinct endotypes, nonpatent infection and prior infection, in those who were not actively infected. Higher amounts of IgG1 and FcR1/FcR3A binding were also found to be likely protective of the transition from nonpatent to active infection. Overall, we unveiled markers for antibody-based diagnostics and latent factors underlying the pathogenesis of schistosome infection. Our results suggest that selective antigen targeting could be useful in early detection, thus controlling infection severity.
Review of methodologies for integrating diverse biomedical data types. This paper provides a comprehensive overview of current approaches and future directions in multi-omics integration.
Preprint posted August 5, 2025; manuscript in revision at Arthritis and Rheumatology.
RNA-seq technology provides unprecedented power for assessing transcript abundance and supports a variety of downstream tasks, such as inferring gene-correlation networks and discovering eQTLs.
Identification of key factors in disease progression using advanced statistical methods. This work was conducted in collaboration with the Midwest Pediatric Nephrology Consortium.
Midwest Pediatric Nephrology Consortium study on identifying predictive factors for arteriovenous fistula cannulation timing.
Novel approach to enhance energy efficiency in the US manufacturing industry using fuzzy logic and TOPSIS methodology.
Assessment of climate change impacts on cholera prevalence in the Ganges-Brahmaputra basin using hydroclimatic modeling.
Comparative analysis of aggregation methods in fuzzy group decision-making using TOPSIS methodology.
Technical and domain expertise in data science, machine learning, and healthcare analytics.
Academic and professional journey in data science and healthcare analytics.
2015 - 2017
Interested in collaboration or have questions about my research? Get in touch.
javad@pitt.edu
Pittsburgh, PA
Available after Nov 2025