Future Work
The Breast Cancer Identifier project provides a solid foundation for predictive diagnostics. The areas below outline paths for further development, optimization, and real-world application.
Model Enhancements
Although logistic regression performed well, future work could explore:
- Ensembling with random forests or gradient boosting
- Dimensionality reduction via PCA or LDA
- Bayesian modeling for uncertainty quantification
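One way to start on the first two items is a side-by-side cross-validated comparison. The sketch below assumes a Python and scikit-learn stack, which this document does not confirm, and uses scikit-learn's bundled copy of the WDBC data. The function name `compare_models`, the choice of ROC AUC as the metric, the 10 PCA components, and all hyperparameters are illustrative assumptions, not the project's actual configuration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(random_state=0):
    """Return the mean 5-fold ROC AUC of each candidate model on WDBC."""
    X, y = load_breast_cancer(return_X_y=True)

    # Illustrative candidates; hyperparameters are untuned defaults.
    candidates = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "pca_logistic_regression": make_pipeline(
            StandardScaler(), PCA(n_components=10), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(random_state=random_state),
        "gradient_boosting": GradientBoostingClassifier(random_state=random_state),
    }

    # Stratified folds keep the benign/malignant ratio similar in every split.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        for name, model in candidates.items()
    }
```

Calling `compare_models()` returns a dictionary of mean scores keyed by model name. Because the scaler and PCA sit inside each pipeline, they are refit on the training folds only, which avoids leaking test-fold statistics into the comparison.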
Broader Dataset Integration
The current model was trained on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, which is well-structured but limited in demographic and imaging diversity. Future iterations could:
- Incorporate additional clinical data (e.g., genetic profiles, imaging)
- Augment with multi-center datasets for generalizability
- Apply domain adaptation techniques to minimize distribution shift
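Before choosing an adaptation technique, it may help to measure how far a new cohort's features sit from the training data. The sketch below is a simple diagnostic, not a domain adaptation method: it computes a standardized mean difference per feature using only the Python standard library. The function names, the row-of-dicts layout, and the 0.25 threshold are hypothetical choices for illustration.

```python
from statistics import mean, pstdev


def standardized_mean_difference(source, target):
    """Absolute difference in means, scaled by the pooled standard deviation."""
    pooled_sd = ((pstdev(source) ** 2 + pstdev(target) ** 2) / 2) ** 0.5
    if pooled_sd == 0:
        # Both samples are constant: either identical or infinitely far apart.
        return 0.0 if mean(source) == mean(target) else float("inf")
    return abs(mean(source) - mean(target)) / pooled_sd


def flag_shifted_features(source_rows, target_rows, threshold=0.25):
    """Return {feature: smd} for features whose shift exceeds the threshold.

    Each row is a dict mapping a feature name to a numeric value.
    The threshold is an arbitrary illustrative cutoff.
    """
    flagged = {}
    for feature in source_rows[0]:
        smd = standardized_mean_difference(
            [row[feature] for row in source_rows],
            [row[feature] for row in target_rows],
        )
        if smd > threshold:
            flagged[feature] = smd
    return flagged
```

Features flagged this way are candidates for closer inspection, for example differences in measurement protocol between centers, before any reweighting or retraining is attempted.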
Model Explainability
To increase clinical trust and accountability, it will be essential to show why the model made a given prediction. Potential additions:
- SHAP values: break down each feature's contribution to an individual prediction.
- Feature importance charts: highlight which inputs consistently affect predictions.
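For a linear model such as logistic regression, per-prediction contributions have a closed form if the features are treated as independent: each feature contributes its weight times its deviation from the background mean, measured in log-odds. The sketch below illustrates that idea with the standard library only. It is not the project's implementation, the function names are hypothetical, and in practice the `shap` package would likely be used instead.

```python
from statistics import mean


def log_odds(weights, bias, x):
    """Linear score of a logistic regression model, before the sigmoid."""
    return bias + sum(w * xj for w, xj in zip(weights, x))


def linear_shap_values(weights, background, x):
    """Per-feature contributions to the log-odds of one prediction.

    Assumes independent features, so the contribution of feature j is
    weights[j] * (x[j] - mean of feature j over the background rows).
    """
    feature_means = [mean(column) for column in zip(*background)]
    return [w * (xj - mu) for w, xj, mu in zip(weights, x, feature_means)]
```

The contributions sum to the difference between the sample's log-odds and the average log-odds over the background rows, which is the additivity property that makes such breakdowns easy to present alongside a prediction.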
Deployment Expansion
The current prototype runs locally. Future work could target:
- Dockerized deployment on secure servers
- Serverless architecture for scalable use
- Progressive web app (PWA) for mobile diagnosis
A production-ready version would also need role-based access control and audit logging.
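As a rough illustration of how access checks and an audit trail could wrap the prediction call, the sketch below uses a decorator that logs every attempt and rejects roles without the required permission. The role names, the permission map, the `user` dictionary layout, and the placeholder `predict` function are all hypothetical; a real deployment would rely on an authentication provider and tamper-resistant log storage.

```python
import functools
import logging
from datetime import datetime, timezone

audit_log = logging.getLogger("audit")

# Hypothetical role-to-permission mapping; a real system would load this from config.
ROLE_PERMISSIONS = {
    "clinician": {"predict", "view_explanation"},
    "auditor": {"view_audit_log"},
}


def requires_permission(permission):
    """Allow the call only if the user's role grants `permission`; log every attempt."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(user, *args, **kwargs):
            allowed = permission in ROLE_PERMISSIONS.get(user["role"], set())
            # Record both granted and denied attempts.
            audit_log.info(
                "time=%s user=%s action=%s allowed=%s",
                datetime.now(timezone.utc).isoformat(),
                user["id"],
                permission,
                allowed,
            )
            if not allowed:
                raise PermissionError(f"role {user['role']!r} may not {permission}")
            return func(user, *args, **kwargs)

        return wrapper

    return decorator


@requires_permission("predict")
def predict(user, features):
    """Placeholder standing in for the real model call."""
    return {"requested_by": user["id"], "n_features": len(features)}
```

Keeping the check and the log entry in one wrapper means no prediction path can skip the audit record, including denied requests.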
Key Takeaways
- More complex models could yield marginal gains in accuracy and sensitivity.
- Additional data sources are needed to generalize the model beyond WDBC.
- Transparency tools like SHAP will be key for clinical adoption.
- With the upgrades outlined above, this prototype could serve as a foundation for a full diagnostic assistant.