6 In simple terms

6.1 ML model interpretation

  • ROC Curve: The curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at various threshold settings. The true positive rate is on the y-axis, and the false positive rate is on the x-axis.

  • Performance: A perfect classifier would sit in the upper left corner of the plot, where the true positive rate is 1 (100%) and the false positive rate is 0. The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the classifier.

  • Diagonal Line: The dotted diagonal line represents a no-skill classifier (e.g., random guessing). A good classifier stays as far away from this line as possible (toward the upper left corner).

  • Area Under the Curve (AUC): The area under each ROC curve (AUC) summarizes the classifier’s ability to discriminate between the classes. An AUC of 0.5 indicates no discrimination (no better than random chance), while an AUC of 1.0 indicates perfect discrimination. A minimal sketch computing the ROC curve and AUC follows this list.
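
The sketch below shows one way to compute and plot a ROC curve and its AUC with scikit-learn. The synthetic dataset, the LogisticRegression model, and all variable names are illustrative assumptions, not the pipeline used in this work.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Synthetic binary classification data (placeholder for the real dataset)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_score = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class

# ROC curve: true positive rate vs. false positive rate across all thresholds
fpr, tpr, thresholds = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)

plt.plot(fpr, tpr, label=f"model (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="no-skill (AUC = 0.5)")  # diagonal reference line
plt.xlabel("False positive rate (1 - specificity)")
plt.ylabel("True positive rate (sensitivity)")
plt.legend()
plt.show()
```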

6.2 ML model explanation

  • SHAP values: quantify how much each feature contributed to a particular prediction. A positive SHAP value for a feature pushes the prediction above the model’s baseline (average) output, toward the positive class, while a negative SHAP value pushes it below, as illustrated in the sketch below.
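
The following is a minimal sketch of computing per-feature SHAP values with the open-source shap library. It assumes a tree-based model; the GradientBoostingClassifier, the synthetic data, and the variable names are illustrative assumptions, not the model actually used here.

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic binary classification data (placeholder for the real dataset)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # one row of per-feature contributions per sample

# For a single prediction: positive values push the output above the baseline,
# negative values push it below
print("Baseline (expected value):", explainer.expected_value)
print("Per-feature contributions for sample 0:", shap_values[0])

# Global view: distribution of each feature's contribution across all samples
shap.summary_plot(shap_values, X)
```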