SVM Comparison

After establishing a strong baseline with logistic regression, this section evaluates Support Vector Machines (SVM) using the same input data. The goal was to test how a kernel-based model performs on structured medical features.

Why Try SVM?

SVMs are maximum-margin classifiers that, with a suitable kernel, can separate data along non-linear boundaries. While slower to train on large datasets, they often excel when classes form compact clusters separated by narrow margins.

Because the WDBC features reflect nuanced cell structure, SVM offered a chance to capture relationships that logistic regression might miss.

Kernel Selection and Scaling

A radial basis function (RBF) kernel was selected for its ability to model complex curvature. Features were scaled using the same standardization strategy as logistic regression to ensure fair comparison.

from sklearn.svm import SVC

# RBF kernel; gamma="scale" adapts to the number and variance of the features
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X_train, y_train)

The model was evaluated with cross-validation on the same stratified train/test split used throughout the project.
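The scaling and fitting steps described above can be combined into a single pipeline so that each cross-validation fold is standardized using only its own training data. A minimal sketch, using the WDBC data bundled with scikit-learn (the 5-fold setup is an illustrative assumption, not necessarily the project's exact protocol):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# WDBC features and labels as shipped with scikit-learn
X, y = load_breast_cancer(return_X_y=True)

# Scaling inside the pipeline avoids leaking test statistics into training
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))

# Stratified folds preserve the benign/malignant ratio in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X, y, cv=cv)
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Wrapping the scaler and classifier together also means a single `fit`/`predict` call handles both steps consistently at evaluation time.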

Performance and Confusion Matrix

On the held-out test set, the SVM achieved comparable results to logistic regression, with minor differences in sensitivity to borderline cases.

              precision    recall  f1-score   support

       Benign       0.97      0.99      0.98       72
    Malignant       0.97      0.93      0.95       42

    accuracy                           0.97      114
   macro avg       0.97      0.96      0.96      114
weighted avg       0.97      0.97      0.97      114
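A per-class report like the one above can be generated with scikit-learn's `classification_report`. A sketch under the same assumptions as before (stratified 80/20 split on the bundled WDBC data; in that dataset label 0 is malignant and 1 is benign):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# stratify=y keeps the benign/malignant ratio identical in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

# target_names maps label 0 -> Malignant, 1 -> Benign (sklearn's encoding)
print(classification_report(y_test, model.predict(X_test),
                            target_names=["Malignant", "Benign"]))
```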

These results suggest the SVM kept false positives for the malignant class low (benign recall of 0.99), which is valuable in sensitive diagnoses, though its 0.93 malignant recall means a few malignant cases were still missed.

Model Comparison

Both classifiers showed high accuracy, but logistic regression had the advantage of transparency and faster training. SVM's nonlinear boundary gave it a slight edge in recall.

Logistic Regression

  • Accuracy: 96%
  • Faster to train
  • Easy to interpret

SVM (RBF)

  • Accuracy: 97%
  • Handles nonlinear boundaries
  • Slower training, harder to explain
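The side-by-side comparison above can be reproduced by scoring both models under identical folds. A minimal sketch, again assuming the bundled WDBC data and a 5-fold protocol rather than the project's exact splits:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Both models get the same scaling and the same folds, so the
# accuracy difference reflects the classifier, not the preprocessing
models = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

results = {name: cross_val_score(m, X, y, cv=5).mean()
           for name, m in models.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Passing the same integer `cv=5` to both calls yields the same deterministic fold assignment, which keeps the comparison fair.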

Key Takeaways

  • SVM delivered strong results on the same input features as logistic regression.
  • Radial kernels captured subtle patterns without needing feature transformation.
  • Interpretability was reduced compared to the linear model.
  • Both models demonstrated the diagnostic value of structured FNA data.