Cancer Data Research via Bipartite Graph Modeling for Subtype Discovery, Therapeutic Linking, and Outcome Prediction

PI: Faisal N. Abu-Khzam

Project description

We use a graph-based framework that models cancer cohorts as bipartite networks connecting patients to molecular and clinical entities (e.g., genes, variants, pathways, drugs/targets). This representation preserves the inherent two-mode structure of oncology data and supports subtype discovery, link prediction for therapy mapping, and risk/outcome modeling. Building on established methods, we will develop, validate, and release a toolkit and a secure analytics workflow suitable for integrating multi-omics and clinical data.

Research partners

We welcome partners who want to apply novel yet proven methods and technology to high-dimensional data problems in cancer research and to uncover new insights into diagnosis, risk, and therapy. We are particularly interested in organizations that hold clinical data and are open to underwriting a project, partnering on research grants, or both.

Our methodology

1. Data-to-graph integration

Deliver a harmonized schema with robust quality control (QC), explicit handling of missing values, and edge weighting (e.g., by frequency, effect size, or confidence).
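As a rough illustration of the integration step, the sketch below turns a flat list of observations into a weighted bipartite adjacency structure. The record layout, the function name build_bipartite_graph, the confidence threshold, and the weighting rule (absolute effect size times confidence) are all assumptions made for this example, and the records are toy values rather than real cohort data.

```python
from collections import defaultdict


def build_bipartite_graph(records, min_confidence=0.5):
    """Build patient->feature and feature->patient weighted adjacency maps.

    records: iterable of (patient_id, feature_id, effect_size, confidence).
    The weighting rule (|effect_size| * confidence) is an illustrative
    assumption, not a prescribed scheme.
    """
    patient_adj = defaultdict(dict)
    feature_adj = defaultdict(dict)
    skipped = 0
    for patient_id, feature_id, effect_size, confidence in records:
        # Missingness handling: drop observations with absent values.
        if effect_size is None or confidence is None:
            skipped += 1
            continue
        # Basic QC: drop low-confidence calls.
        if confidence < min_confidence:
            skipped += 1
            continue
        weight = abs(effect_size) * confidence
        # Store the edge in both directions so either node set can be queried.
        patient_adj[patient_id][feature_id] = weight
        feature_adj[feature_id][patient_id] = weight
    return dict(patient_adj), dict(feature_adj), skipped


# Toy records (hypothetical values, for illustration only).
records = [
    ("P1", "TP53", 2.0, 0.9),
    ("P1", "PIK3CA", 1.0, 0.4),   # below the confidence threshold
    ("P2", "TP53", None, 0.8),    # missing effect size
    ("P2", "BRCA1", -1.5, 1.0),
]
patient_adj, feature_adj, skipped = build_bipartite_graph(records)
```

Keeping both adjacency maps makes it cheap to ask either "which features does this patient carry?" or "which patients share this feature?", which is what the later clustering and feature-engineering steps need. A real pipeline would likely impute or flag missing values rather than simply drop them.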

2. Bi-clustering & community detection for subtype discovery

Apply bipartite-aware methods (e.g., bi-Louvain, spectral bi-clustering, stochastic block models, NMF) to identify patient–feature bi-clusters that correspond to molecular subtypes and co-alteration modules. Evaluate biological coherence (pathway enrichment), stability, and agreement with known labels (e.g., PAM50 in breast cancer).
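To make the idea of a patient-feature bi-cluster concrete, here is a minimal sketch of one of the listed families, spectral bi-clustering, restricted to a two-way split. It follows the usual spectral co-clustering recipe (degree-normalize the patient-by-feature matrix, take the second singular vectors, split by sign). The function name spectral_bipartition and the toy matrix are assumptions for illustration; a real analysis would typically use more clusters and a tested library implementation.

```python
import numpy as np


def spectral_bipartition(A):
    """Split rows (patients) and columns (features) into two co-clusters.

    A: non-negative patient-by-feature weight matrix with no empty
    rows or columns.
    """
    A = np.asarray(A, dtype=float)
    row_deg = A.sum(axis=1)
    col_deg = A.sum(axis=0)
    if np.any(row_deg == 0) or np.any(col_deg == 0):
        raise ValueError("remove isolated patients/features before clustering")

    # Degree normalization: An = D1^{-1/2} A D2^{-1/2}.
    d1 = 1.0 / np.sqrt(row_deg)
    d2 = 1.0 / np.sqrt(col_deg)
    An = d1[:, None] * A * d2[None, :]

    # The leading singular pair is trivial for a connected graph;
    # the second pair carries the two-way partition.
    U, _, Vt = np.linalg.svd(An, full_matrices=False)
    row_score = d1 * U[:, 1]
    col_score = d2 * Vt[1, :]

    # Sign split: patients and features with the same sign share a co-cluster.
    row_labels = (row_score > 0).astype(int)
    col_labels = (col_score > 0).astype(int)
    return row_labels, col_labels


# Toy matrix: two blocks joined by one weak cross-edge (hypothetical data).
A = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.1, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
rows, cols = spectral_bipartition(A)
```

The label values themselves (0 or 1) are arbitrary because singular vectors are defined only up to sign, so downstream checks should compare groupings rather than specific label numbers.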

3. Link prediction for therapy mapping and novel associations

4. Prognostic modeling using graph-derived features

Engineer graph features (bipartite degrees, authority/hub scores, community memberships, graph embeddings) and test their value in survival/time-to-event models (Cox proportional hazards, random survival forests (RSF), DeepSurv). Report the concordance index (C-index), time-dependent AUC, and calibration; compare against baselines without graph features.
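One of the metrics named above, the C-index, can be sketched in a few lines. The function below is a plain implementation of Harrell's concordance index for right-censored data, written for clarity rather than speed. The function name and the toy follow-up times, event flags, and risk scores are assumptions for illustration; they do not come from any cohort.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times:  observed follow-up times
    events: 1 if the event was observed, 0 if censored
    risks:  predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # A pair is comparable only if the earlier time is an observed event.
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example (hypothetical values).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
good_risks = [0.9, 0.7, 0.5, 0.1]   # ordering matches the outcomes
bad_risks = [0.1, 0.5, 0.7, 0.9]    # ordering is reversed

c_good = concordance_index(times, events, good_risks)  # 1.0
c_bad = concordance_index(times, events, bad_risks)    # 0.0
```

Comparing a baseline model with a graph-augmented model then amounts to computing this score for each model's risk predictions on the same held-out patients; a value of 0.5 corresponds to random ordering.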

Data and cohorts

If institutional data are added, any protected health information will be handled under IRB approval, de-identification, and secure compute controls.

Evaluation and success criteria

Deliverables

Ethics, privacy, and compliance

All work with human data will be conducted under IRB approval and data-use agreements, with GDPR/HIPAA-compliant storage. Only de-identified data will be used; results are for research only and not for clinical decision-making.