Confusion matrices

The confusion matrix is a fundamental tool for evaluating the performance of a classification algorithm: it gives a clearer picture of how the algorithm is classifying, based on a count of the hits and errors for each class. This way you can check whether the algorithm is confusing classes, and to what extent.
Functioning
The performance of a system is usually evaluated using the data in this matrix. The following table shows the confusion matrix for a two-class classifier:
|                      | Classifier: Negative | Classifier: Positive |
|----------------------|----------------------|----------------------|
| **Actual: Negative** | a                    | b                    |
| **Actual: Positive** | c                    | d                    |
In this table:
- *a* is the number of correct predictions that a case is negative.
- *b* is the number of incorrect predictions that a case is positive, that is, the prediction is positive when the value is actually negative. These cases are also called type I errors (false positives).
- *c* is the number of incorrect predictions that a case is negative, that is, the prediction is negative when the value is actually positive. These cases are also called type II errors (false negatives).
- *d* is the number of correct predictions that a case is positive.
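The four counts above can be obtained directly from paired lists of actual and predicted labels. A minimal Python sketch (the label values and function name are illustrative, not from the original text):

```python
# Build the four cells of a 2x2 confusion matrix by counting
# hits and errors over paired (actual, predicted) labels.
def confusion_counts(actual, predicted):
    a = b = c = d = 0
    for y, p in zip(actual, predicted):
        if y == "negative" and p == "negative":
            a += 1  # correct negative prediction
        elif y == "negative" and p == "positive":
            b += 1  # type I error (false positive)
        elif y == "positive" and p == "negative":
            c += 1  # type II error (false negative)
        else:
            d += 1  # correct positive prediction
    return a, b, c, d

actual    = ["negative", "negative", "positive", "positive", "positive"]
predicted = ["negative", "positive", "negative", "positive", "positive"]
print(confusion_counts(actual, predicted))  # -> (1, 1, 1, 2)
```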
Several standard terms have been defined to measure the performance of a classifier, and they are used across the many fields where classification systems are applied:
- Accuracy (Ac) is the proportion of all predictions that were correct:

  Ac = (a + d) / (a + b + c + d)

- True positive rate (TPrate), sometimes also called recall, is the proportion of positive cases that were correctly identified:

  TPrate = d / (c + d)

- False positive rate (FPrate) is the proportion of negative cases that were incorrectly classified as positive:

  FPrate = b / (a + b)

- True negative rate (TNrate) is the proportion of negative cases that were correctly classified:

  TNrate = a / (a + b)

- False negative rate (FNrate) is the proportion of positive cases that were incorrectly classified as negative:

  FNrate = c / (c + d)

- Precision (P) is the proportion of predicted positive cases that were correct:

  P = d / (b + d)
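Given the cell counts a, b, c, and d from the table, all six measures can be computed in a few lines. A short Python sketch (the function name and example counts are illustrative):

```python
# Standard performance measures derived from the 2x2 confusion
# matrix cells: a (true neg.), b (false pos.), c (false neg.), d (true pos.).
def metrics(a, b, c, d):
    return {
        "accuracy":  (a + d) / (a + b + c + d),
        "TPrate":    d / (c + d),   # recall / sensitivity
        "FPrate":    b / (a + b),
        "TNrate":    a / (a + b),   # specificity
        "FNrate":    c / (c + d),
        "precision": d / (b + d),
    }

m = metrics(a=50, b=10, c=5, d=35)
print(m["accuracy"])  # -> 0.85
```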
The following terms are also frequently used:
- Sensitivity (Se), as a synonym for TPrate, since it measures the classifier's ability to be "sensitive" to positive cases. Note that 1 − Se = FNrate.
- Specificity (Sp), as a synonym for TNrate, since it measures how specific the classifier is when marking cases as positive. Note that 1 − Sp = FPrate.
If a classifier allows certain parameters to be varied, it is possible to increase the TPrate at the cost of also increasing the FPrate, or vice versa. In general, high sensitivity together with high specificity (or, equivalently, a low FPrate) is desired.
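This trade-off can be seen by sweeping the decision threshold of a scoring classifier: lowering the threshold marks more cases positive, so TPrate rises but FPrate rises with it. A toy Python sketch (the scores, labels, and threshold values are made up for illustration):

```python
# Compute (TPrate, FPrate) for a given decision threshold: a case is
# predicted positive when its score is at or above the threshold.
def rates(scores, labels, threshold):
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   1,   0,   0]
for t in (0.75, 0.5, 0.25):
    print(t, rates(scores, labels, t))
# As t drops from 0.75 to 0.25, TPrate grows from 0.5 to 1.0,
# but FPrate grows from 0.0 to 0.5 alongside it.
```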
Calculation in R
In R (R Core Team, 2017), confusion matrices can be obtained with the randomForest() function of the randomForest library, which includes a confusion matrix among its outputs. In addition, the caret library implements the confusionMatrix() function.
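The R functions mentioned above return the matrix ready-made. For readers working outside R, the same cross-tabulation of actual versus predicted labels takes only a few lines; a sketch in Python using just the standard library (the function name and example data are illustrative):

```python
from collections import Counter

# Cross-tabulate (actual, predicted) pairs into a 2x2 matrix,
# similar in spirit to what the R functions above report.
def confusion_matrix(actual, predicted, classes=("negative", "positive")):
    counts = Counter(zip(actual, predicted))
    # Rows are actual classes, columns are predicted classes.
    return [[counts[(y, p)] for p in classes] for y in classes]

actual    = ["negative", "positive", "positive", "negative"]
predicted = ["negative", "positive", "negative", "negative"]
print(confusion_matrix(actual, predicted))  # -> [[2, 0], [1, 1]]
```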