Future Work

The Breast Cancer Identifier project provides a solid foundation for predictive diagnostics. The areas below outline concrete paths for further development, optimization, and real-world application.

Model Enhancements

Although logistic regression performed well, future work could explore:

  • Ensembling with random forests or gradient boosting
  • Dimensionality reduction via PCA or LDA
  • Bayesian modeling for uncertainty quantification

Even small performance gains, particularly in sensitivity, may matter for borderline cases where a missed malignancy carries a high clinical cost.
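The first two options above could be compared against the current baseline with a cross-validated sketch like the one below. It assumes scikit-learn is available and uses its bundled copy of WDBC (`load_breast_cancer`). The labels are flipped so that 1 = malignant, which makes recall equal to sensitivity. The settings (10 PCA components, 100 trees, soft voting) are illustrative and untuned, and the project's actual preprocessing may differ. LDA and Bayesian modeling are not covered here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# scikit-learn bundles WDBC as load_breast_cancer, where target 0 = malignant.
X, y = load_breast_cancer(return_X_y=True)
y = (y == 0).astype(int)  # relabel so 1 = malignant and recall = sensitivity


def scaled_logistic():
    return make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))


def build_candidates(seed=0):
    """Baseline, PCA-reduced, and ensembled models (illustrative, untuned settings)."""
    return {
        "logistic": scaled_logistic(),
        "pca+logistic": make_pipeline(
            StandardScaler(), PCA(n_components=10), LogisticRegression(max_iter=1000)
        ),
        "soft-vote ensemble": VotingClassifier(
            estimators=[
                ("lr", scaled_logistic()),
                ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
                ("gb", GradientBoostingClassifier(random_state=seed)),
            ],
            voting="soft",  # average the models' predicted probabilities
        ),
    }


def compare(seed=0):
    """Mean 5-fold cross-validated sensitivity for each candidate."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="recall").mean()
        for name, model in build_candidates(seed).items()
    }


if __name__ == "__main__":
    for name, sensitivity in compare().items():
        print(f"{name:>20}: mean sensitivity = {sensitivity:.3f}")
```

All candidates are scored on the same folds, so their results can be compared directly. On a dataset this small, though, differences of a point or two may fall within fold-to-fold noise.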

Broader Dataset Integration

The current model was trained on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. It is clean and well-structured, but it is small (569 cases) and limited in demographic and imaging diversity. Future iterations could:

  • Incorporate additional clinical markers (e.g., genetic profiles, imaging)
  • Add multi-center datasets to improve generalizability
  • Apply domain adaptation techniques to reduce the impact of distribution shift between sites
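One concrete instance of the domain adaptation idea is a CORAL-style (correlation alignment) transform. It re-centers and re-colors features from a source center so that their mean and covariance match those of a target center. The NumPy sketch below rests on several assumptions. Both centers must measure the same features with the same definitions. `coral_align` and the `eps` ridge term are hypothetical names chosen here. Matching first and second moments does not correct shifts in the relationship between features and labels.

```python
import numpy as np


def _sym_power(C, p):
    """Raise a symmetric positive-definite matrix to the power p via eigendecomposition."""
    vals, vecs = np.linalg.eigh(C)
    return (vecs * vals**p) @ vecs.T  # V diag(vals**p) V^T


def coral_align(X_source, X_target, eps=1e-6):
    """Map source-center features so their mean and covariance match the target center.

    Hypothetical helper: whitens the centered source data with its own covariance,
    re-colors it with the target covariance, then shifts it to the target mean.
    `eps` adds a small ridge so both covariance matrices are safely invertible.
    """
    d = X_source.shape[1]
    C_s = np.cov(X_source, rowvar=False) + eps * np.eye(d)
    C_t = np.cov(X_target, rowvar=False) + eps * np.eye(d)
    A = _sym_power(C_s, -0.5) @ _sym_power(C_t, 0.5)
    return (X_source - X_source.mean(axis=0)) @ A + X_target.mean(axis=0)
```

A model trained on the target center could then be applied to `coral_align(X_new_site, X_train)`. The reverse direction also works: align the training data to a new site before refitting.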

Model Explainability

To increase clinical trust and accountability, the system will need to show clinicians why the model made each decision. Potential additions:

SHAP Values

Breaks down each individual prediction into per-feature contributions, showing which inputs pushed it toward a malignant or benign result.
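For a linear model such as the project's logistic regression, SHAP values in log-odds space have a closed form if features are treated as independent. Each feature's contribution is its coefficient times its deviation from the background mean, φ_i = w_i (x_i − E[x_i]). The NumPy sketch below assumes access to the fitted coefficients, the intercept, and a background sample such as the training set. `linear_shap` and `explain` are hypothetical helper names. Nonlinear models, such as the random forests or gradient boosting mentioned under Model Enhancements, would instead need the model-specific explainers in the `shap` package.

```python
import numpy as np


def linear_shap(coef, intercept, X_background, x):
    """SHAP values for a linear model's log-odds output, assuming independent features.

    Returns (phi, base_value): base_value is the expected log-odds over the
    background data, and base_value + phi.sum() equals the log-odds for x.
    """
    coef = np.asarray(coef, dtype=float)
    mean_x = np.asarray(X_background, dtype=float).mean(axis=0)
    phi = coef * (np.asarray(x, dtype=float) - mean_x)
    base_value = intercept + coef @ mean_x
    return phi, base_value


def explain(feature_names, phi, top_k=3):
    """Rank features by the magnitude of their contribution to this one prediction."""
    order = np.argsort(-np.abs(phi))[:top_k]
    return [(feature_names[i], float(phi[i])) for i in order]
```

The additivity property makes the output easy to check: the base value plus all contributions reproduces the model's raw score for that patient. A positive φ_i pushes the prediction toward whichever class the model codes as 1.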

Feature Importance Charts

Highlights which inputs most consistently affect predictions across the dataset, giving a global view that complements per-prediction SHAP values.
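Permutation importance is one simple, model-agnostic way to produce the numbers behind such a chart. It shuffles one feature at a time and measures how much accuracy drops. The plain NumPy sketch below assumes the model exposes a `predict` function that returns class labels. `permutation_importance` and `text_chart` are hypothetical helpers here, and scikit-learn offers a comparable tool as `sklearn.inspection.permutation_importance`. Scores should be computed on held-out data. Strongly correlated WDBC features, such as radius, perimeter, and area, can split importance between them, so a low score does not prove that a feature is irrelevant.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean and std of the accuracy drop when each column of X is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    drops = np.zeros((X.shape[1], n_repeats))
    for j in range(X.shape[1]):
        for r in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break feature j's link to y
            drops[j, r] = baseline - np.mean(predict(X_perm) == y)
    return drops.mean(axis=1), drops.std(axis=1)


def text_chart(names, scores, width=30):
    """Render importances as a sorted text bar chart (largest first)."""
    top = max(max(scores), 1e-12)
    lines = []
    for name, s in sorted(zip(names, scores), key=lambda t: -t[1]):
        bar = "#" * int(round(width * max(s, 0.0) / top))
        lines.append(f"{name:<16}{bar} {s:.3f}")
    return "\n".join(lines)
```

The standard deviation across repeats is worth reporting next to each bar. It shows whether an importance score is stable or mostly shuffle noise.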

Deployment Expansion

The current prototype runs locally. Future work could target:

  • Dockerized deployment on secure servers
  • Serverless architecture that scales with demand
  • Progressive web app (PWA) for mobile access

A production-ready version would also need role-based access control and audit logging.
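The access-control and audit requirements can be prototyped independently of the serving stack. The sketch below assumes a hypothetical role table with `clinician` and `admin` roles and keeps the audit log in memory. A real deployment would authenticate users upstream and write records to tamper-evident, append-only storage. Every attempt is logged, whether it is allowed or denied. Inputs are recorded as a SHA-256 digest rather than stored verbatim.

```python
import hashlib
import json
import time
from functools import wraps

# Hypothetical role table: which roles may perform which actions.
PERMISSIONS = {
    "predict": {"clinician", "admin"},
    "read_audit_log": {"admin"},
}

AUDIT_LOG = []  # in-memory stand-in for append-only audit storage


class AccessDenied(Exception):
    pass


def requires(action):
    """Decorator: check the caller's role and audit every attempt."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            allowed = user["role"] in PERMISSIONS.get(action, set())
            payload = json.dumps([args, kwargs], sort_keys=True, default=str)
            AUDIT_LOG.append({
                "time": time.time(),
                "user": user["id"],
                "role": user["role"],
                "action": action,
                "allowed": allowed,
                # Store a digest rather than the raw inputs.
                "input_sha256": hashlib.sha256(payload.encode()).hexdigest(),
            })
            if not allowed:
                raise AccessDenied(f"{user['id']} ({user['role']}) may not {action}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@requires("predict")
def predict(user, features):
    # Placeholder for the real model call (e.g., a fitted classifier's predict_proba).
    return {"n_features": len(features)}


@requires("read_audit_log")
def read_audit_log(user):
    return list(AUDIT_LOG)
```

Keeping the permission check and audit write in one decorator makes it difficult to add an endpoint that skips either one.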

Key Takeaways

  • More complex models could yield marginal gains in accuracy and sensitivity.
  • Additional data sources are needed for the model to generalize beyond WDBC.
  • Transparency tools such as SHAP will be key to clinical adoption.
  • With these upgrades, this prototype could serve as the foundation for a full diagnostic assistant.