Javad Rahimikollu

Research Data Scientist

Machine Learning in Healthcare | Computational Biology PhD Candidate | Published in Science Translational Medicine & Nature Methods

Research Data Scientist with 10+ years of experience leading machine learning initiatives and predictive modeling in healthcare and biomedical domains.

Research Projects

Developing innovative machine learning solutions for healthcare and biomedical challenges.

SLIDE: Significant Latent Factor Interaction Discovery and Exploration

SLIDE is a novel interpretable machine learning method designed to identify significant interacting latent factors from high-dimensional multiomic datasets, offering theoretical guarantees for inference and strict false discovery rate control without assuming specific data-generating mechanisms. Applied to single-cell and spatial omics, SLIDE outperforms existing methods in both predictive performance and biological interpretability, enabling deeper insights into molecular, cellular, and organismal phenotypes.

Machine Learning | Python | Healthcare Analytics

Publications

Research published in high-impact journals including Science Translational Medicine and Nature Methods.

Nature Methods

SLIDE: Significant Latent Factor Interaction Discovery and Exploration

Rahimikollu J et al. (2024)

Modern multiomic technologies can generate deep multiscale profiles. However, differences in data modalities, multicollinearity of the data, and large numbers of irrelevant features make analyses and integration of high-dimensional omic datasets challenging. Here we present Significant Latent Factor Interaction Discovery and Exploration (SLIDE), a first-in-class interpretable machine learning technique for identifying significant interacting latent factors underlying outcomes of interest from high-dimensional omic datasets. SLIDE makes no assumptions regarding data-generating mechanisms, comes with theoretical guarantees regarding identifiability of the latent factors/corresponding inference, and has rigorous false discovery rate control. Using SLIDE on single-cell and spatial omic datasets, we uncovered significant interacting latent factors underlying a range of molecular, cellular and organismal phenotypes. SLIDE outperforms/performs at least as well as a wide range of state-of-the-art approaches, including other latent factor approaches. More importantly, it provides biological inference beyond prediction that other methods do not afford. Thus, SLIDE is a versatile engine for biological discovery from modern multiomic datasets.

Science Translational Medicine

Deep humoral profiling coupled to interpretable machine learning unveils diagnostic markers and pathophysiology of schistosomiasis

Saha A, Chakraborty T, Rahimikollu J et al. (2024)

Schistosomiasis, a highly prevalent parasitic disease, affects more than 200 million people worldwide. Current diagnostics based on parasite egg detection in stool detect infection only at a late stage, and current antibody-based tests cannot distinguish past from current infection. Here, we developed and used a multiplexed antibody profiling platform to obtain a comprehensive repertoire of antihelminth humoral profiles including isotype, subclass, Fc receptor (FcR) binding, and glycosylation profiles of antigen-specific antibodies. Using Essential Regression (ER) and SLIDE, interpretable machine learning methods, we identified latent factors (context-specific groups) that move beyond biomarkers and provide insights into the pathophysiology of different stages of schistosome infection. By comparing profiles of infected and healthy individuals, we identified modules with unique humoral signatures of active disease, including hallmark signatures of parasitic infection such as elevated immunoglobulin G4 (IgG4). However, we also captured previously uncharacterized humoral responses including elevated FcR binding and specific antibody glycoforms in patients with active infection, helping distinguish them from those without active infection but with equivalent antibody titers. This signature was validated in an independent cohort. Our approach also uncovered two distinct endotypes, nonpatent infection and prior infection, in those who were not actively infected. Higher amounts of IgG1 and FcR1/FcR3A binding were also found to be likely protective of the transition from nonpatent to active infection. Overall, we unveiled markers for antibody-based diagnostics and latent factors underlying the pathogenesis of schistosome infection. Our results suggest that selective antigen targeting could be useful in early detection, thus controlling infection severity.

Patterns (Cell Press)

Rahimikollu J, Das J. (2022)

Review of methodologies for integrating diverse biomedical data types. This paper provides a comprehensive overview of current approaches and future directions in multi-omics integration.

Bioinformatics

DataRemix: a universal data transformation for optimal inference from gene expression datasets

Mao W, Rahimikollu J et al. (2021)

RNA-seq technology provides unprecedented power in the assessment of the transcription abundance and can be used to perform a variety of downstream tasks such as inference of gene-correlation network and eQTL discovery. However, raw gene expression values have to be normalized for nuisance biological variation and technical covariates, and different normalization strategies can lead to dramatically different results in the downstream study.

Pediatric Nephrology

Predictors of patency for arteriovenous fistulae and grafts in pediatric hemodialysis patients

Onder A, ..., Rahimikollu J et al. (2020)

Identification of key predictors of vascular access patency in pediatric hemodialysis patients using statistical methods. This work was conducted in collaboration with the Midwest Pediatric Nephrology Consortium.

Skills & Expertise

Technical and domain expertise in data science, machine learning, and healthcare analytics.

Machine Learning & AI

Predictive Modeling 95%
Statistical Learning 90%
Natural Language Processing 85%
Deep Learning 80%
Feature Engineering 90%

Programming & Development

Python 95%
R 90%
SQL 85%
MATLAB 80%
C++ 75%

Data Engineering & Analytics

Data Visualization 90%
ETL Pipeline Development 85%
Large-scale Data Processing 80%
Data Integration 90%
Database Management 80%

Healthcare Domain Knowledge

Electronic Medical Records 90%
Biomedical Data Analysis 95%
Clinical Outcome Prediction 90%
Multi-omics Data Analysis 85%
Healthcare Systems 80%

Leadership & Communication

Team Leadership 85%
Technical Mentorship 90%
Stakeholder Communication 85%
Project Management 80%
Research Publication 95%

Education & Experience

Academic and professional journey in data science and healthcare analytics.

Ph.D., Computational Biology

Carnegie Mellon University & University of Pittsburgh

2018 – Expected July 2025

Dissertation: "Machine Learning Approaches for Interpretable Healthcare Analytics and Clinical Decision Support"

Relevant Coursework: Advanced Machine Learning, Statistical Learning Theory, Computational Genomics, Healthcare Data Science

Research Assistant, Computational Biology

Carnegie Mellon University & University of Pittsburgh

September 2018 – Present

Led development of SLIDE, a novel machine learning framework for healthcare data analysis, resulting in publication in Nature Methods and adoption by three research institutions.

Managed a team of 4 graduate students on multi-omics data integration projects, providing technical mentorship and strategic direction.

M.S., Statistics

West Virginia University

2015 – 2017

Thesis: "Statistical Methods for Biomedical Data Analysis"

Relevant Coursework: Regression Analysis, Categorical Data Analysis, Multivariate Statistics, Experimental Design

Graduate Teaching Assistant and Statistics Instructor

West Virginia University

August 2015 – August 2017

Designed and delivered comprehensive statistics curriculum to undergraduate students, maintaining a 4.8/5.0 instructor rating.

Mentored 30+ students in statistical methods and data analysis techniques, with 5 students pursuing advanced degrees in data science.

B.S., Industrial Engineering

Khajeh Nasir Toosi University of Technology

2005 – 2009

Focus: Operations Research and Systems Engineering

Relevant Coursework: Operations Research, Systems Engineering, Probability & Statistics, Optimization

Project Engineer

Amir AyandeNegar Co.

June 2009 – December 2010

Implemented optimization models for healthcare delivery systems, reducing operational costs by 18%.

Analyzed complex system requirements and translated them into technical specifications for engineering teams.

Contact

Interested in collaboration or have questions about my research? Get in touch.

javad@pitt.edu

Pittsburgh, PA

Available after July 15, 2025

© 2025 Javad Rahimikollu. All rights reserved.