Infer.NET user guide : Learners : Bayes Point Machine classifiers : Command-line runners
Evaluation
The Evaluate module computes a number of performance metrics for given predictions and ground truth labels. It can hence be used to evaluate the predictions of any classifier, not just the Bayes Point Machine!
The Evaluate module has the following command-line arguments:
Required arguments
- --ground-truth: The file containing the ground truth labels (in the format described earlier; the features may be absent, however).
- --predictions: The file from which the predictions will be loaded.
Optional arguments
- --report: The text file to which an evaluation report will be written, containing most of the classification metrics of interest.
- --calibration-curve: The CSV file to which the empirical calibration curve will be written.
- --roc-curve: The CSV file to which the receiver operating characteristic (ROC) curve will be written.
- --precision-recall-curve: The CSV file to which the precision-recall curve will be written.
- --positive-class: The label indicating the positive class in the computation of calibration, ROC, and precision-recall curves. If left unspecified, the first class label encountered in the file with ground truth labels will be used.
A more detailed explanation of classifier evaluation and performance metrics is available here.
Example
Learner Classifier Evaluate --ground-truth iris-test-set.dat
--predictions iris-predictions.dat --report evaluation.txt
--calibration-curve calibration.csv --roc-curve roc.csv
--precision-recall-curve pr.csv --positive-class Iris-virginica
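
The --calibration-curve, --roc-curve, and --precision-recall-curve options above write plain CSV files, so they can be inspected with any external tooling. Below is a minimal sketch (Python, not part of the Infer.NET command-line runners) that loads roc.csv and calibration.csv and recomputes an approximate AUC as a sanity check against the report. The assumption that each file contains two numeric columns with the x value first is mine; check the actual headers written by the Evaluate module before relying on it.

```python
# Minimal sketch for inspecting the curve CSVs written by the Evaluate module.
# Assumption (not confirmed here): each file contains two numeric columns,
# x value first -- roc.csv as (false positive rate, true positive rate) and
# calibration.csv as (predicted probability, empirical frequency).
import csv

def read_curve(path):
    """Read a two-column CSV of (x, y) points, skipping header or malformed rows."""
    xs, ys = [], []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            try:
                x, y = float(row[0]), float(row[1])
            except (ValueError, IndexError):
                continue  # header line or empty row
            xs.append(x)
            ys.append(y)
    return xs, ys

fpr, tpr = read_curve("roc.csv")
prob, freq = read_curve("calibration.csv")
print("First calibration points:", list(zip(prob, freq))[:5])

# Trapezoidal area under the ROC curve, assuming the points are ordered by
# increasing false positive rate; this should roughly match the report's AUC.
auc = sum((fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2.0
          for i in range(len(fpr) - 1))
print("Approximate AUC from roc.csv:", auc)
```

A well-calibrated classifier produces calibration points close to the diagonal, i.e. its predicted probabilities match the empirically observed frequencies.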
Sample output
Here is an example of an evaluation report:
Classifier evaluation report
******************************
Date: 14/10/2014 18:50:37
Ground truth: test-set.dat
Predictions: predictions.dat
Instance-averaged performance (micro-averages)
================================================
Precision = 0.9429
Recall = 0.9427
F1 = 0.9427
#Correct = 1118
#Total = 1186
Accuracy = 0.9427
Error = 0.0573
AUC = 0.9915
Log-loss = 0.2487
Class-averaged performance (macro-averages)
=============================================
Precision = 0.9352
Recall = 0.9383
F1 = 0.9366
Accuracy = 0.9383
Error = 0.0617
AUC = 0.9917
M (pairwise AUC) = 0.9952
Performance on individual classes
===================================
Index  Label  #Truth  #Predicted  #Correct  Precision  Recall  F1      AUC
---------------------------------------------------------------------------
1      3      603     596         575       0.9648     0.9536  0.9591  0.9908
2      2      280     277         255       0.9206     0.9107  0.9156  0.9910
3      1      303     313         288       0.9201     0.9505  0.9351  0.9935
Confusion matrix
==================
Truth \ Prediction ->
          3      2      1
    3   575     15     13
    2    13    255     12
    1     8      7    288
Pairwise AUC matrix
=====================
Truth \ Prediction ->
             3         2         1
    3        .    0.9942    0.9963
    2   0.9942         .    0.9950
    1   0.9963    0.9950         .
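
To make the report's headline numbers easier to interpret, here is a short sketch (again Python rather than Infer.NET code) of how micro- and macro-averaged precision and recall, log-loss, and a pairwise-AUC summary in the spirit of the M measure of Hand and Till (2001) are conventionally computed. The toy labels and probabilities are invented, and the Evaluate module's exact conventions (e.g. tie handling) may differ.

```python
# Conventional computation of the metrics reported above, on made-up toy data.
import numpy as np
from itertools import combinations
from sklearn.metrics import precision_score, recall_score, roc_auc_score, log_loss

y_true = np.array([2, 0, 1, 2, 0, 1, 2, 0])   # ground truth class indices
y_pred = np.array([2, 0, 1, 1, 0, 1, 2, 2])   # predicted class indices
y_prob = np.array([                            # predicted class probabilities
    [0.1, 0.2, 0.7], [0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.2, 0.5, 0.3],
    [0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.3, 0.3, 0.4],
])

# Micro-averages pool all instances; macro-averages give each class equal weight.
print("micro precision:", precision_score(y_true, y_pred, average="micro"))
print("macro precision:", precision_score(y_true, y_pred, average="macro"))
print("micro recall:   ", recall_score(y_true, y_pred, average="micro"))
print("macro recall:   ", recall_score(y_true, y_pred, average="macro"))
print("log-loss:       ", log_loss(y_true, y_prob))

# Pairwise AUCs: for each pair of classes, keep only the instances of those two
# classes and measure how well they are separated; averaging over all pairs
# gives the M measure of Hand & Till (2001).
classes = np.unique(y_true)
pairwise = {}
for a, b in combinations(classes, 2):
    mask = np.isin(y_true, [a, b])
    auc_ab = roc_auc_score(y_true[mask] == a, y_prob[mask, a])
    auc_ba = roc_auc_score(y_true[mask] == b, y_prob[mask, b])
    pairwise[(a, b)] = 0.5 * (auc_ab + auc_ba)  # symmetric pairwise AUC
print("pairwise AUCs:", pairwise)
print("M (pairwise AUC):", np.mean(list(pairwise.values())))
```

In the single-label case, micro-averaged recall equals accuracy (both are the fraction of correctly classified instances), whereas the macro-averaged figures weight rare and frequent classes alike; this is why the instance-averaged and class-averaged blocks of the report can differ noticeably on imbalanced test sets.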