SVM Comparison
After establishing a strong baseline with logistic regression, this section evaluates Support Vector Machines (SVM) using the same input data. The goal was to test how a kernel-based model performs on structured medical features.
Why Try SVM?
SVMs are powerful classifiers capable of separating data with non-linear boundaries. While slower to train, they can excel on datasets where classes form compact clusters and the width of the separating margin matters.
Because the WDBC features describe nuanced properties of cell nuclei, an SVM offered a chance to capture relationships that logistic regression's linear decision boundary might miss.
Kernel Selection and Scaling
A radial basis function (RBF) kernel was selected for its ability to model smooth, non-linear decision boundaries. Features were standardized with the same strategy used for logistic regression to keep the comparison fair.
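As a minimal sketch of that scaling step (variable names like X_train_raw are illustrative; fitting the scaler on the training split only is the standard way to avoid leakage):

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training split only so test-set statistics never leak in
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)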
from sklearn.svm import SVC

# RBF kernel: K(x, x') = exp(-gamma * ||x - x'||^2); gamma="scale" uses 1 / (n_features * X.var())
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X_train, y_train)
The model was evaluated on the same stratified train/test split used throughout the project, with cross-validation performed on the training portion.
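For reference, a split like this might be produced as follows; test_size=0.2 matches the 114 test samples out of WDBC's 569, while the variable names and random_state are assumptions:

from sklearn.model_selection import train_test_split

# Stratify on the label so the benign/malignant ratio is preserved in both splits
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)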
Performance and Confusion Matrix
On the held-out test set, the SVM achieved comparable results to logistic regression, with minor differences in sensitivity to borderline cases.
              precision    recall  f1-score   support

      Benign       0.97      0.99      0.98        72
   Malignant       0.97      0.93      0.95        42

    accuracy                           0.97       114
   macro avg       0.97      0.96      0.96       114
weighted avg       0.97      0.97      0.97       114
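A minimal sketch of how this report and the accompanying confusion matrix would be generated, assuming benign is encoded as the first class label:

from sklearn.metrics import classification_report, confusion_matrix

# Predict on the held-out test set and summarize per-class metrics
y_pred = svm.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=["Benign", "Malignant"]))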
These results suggest the SVM slightly reduced false positives among its malignant predictions, which could be valuable in a diagnostic setting where a false alarm means unnecessary follow-up.
Model Comparison
Both classifiers showed high accuracy, but logistic regression had the advantage of transparency and faster training, while the SVM's non-linear boundary gave it a slight edge in recall; a cross-validated comparison is sketched after the lists below.
Logistic Regression
- Accuracy: 96%
- Faster to train
- Easy to interpret
SVM (RBF)
- Accuracy: 97%
- Handles nonlinear boundaries
- Slower training, harder to explain
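As a sketch of how such a head-to-head comparison might be run, assuming the training variables from the snippets above and default hyperparameters:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Pipelines keep standardization inside each fold, so no statistics leak across folds
models = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "svm (rbf)": SVC(kernel="rbf", C=1.0, gamma="scale"),
}
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)
    scores = cross_val_score(pipe, X_train_raw, y_train, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")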
Key Takeaways
- SVM delivered strong results on the same input features as logistic regression.
- The radial kernel captured subtle non-linear patterns without manual feature engineering (beyond standardization).
- Interpretability was reduced compared to the linear model.
- Both models demonstrated the diagnostic value of structured FNA data.