Infer.NET user guide : Learners : Bayes Point Machine classifiers

Command-line runners

Learner.exe, which can be found in the bin directory of the Infer.NET release, provides access to the Infer.NET learners via modules that can be run from the command line. The command-line modules make some assumptions about the data format (see below) and do not allow you to provide data via custom mappings tailored to your data. In return, they are easy to use and do not require any implementation.

The command-line modules are organized hierarchically. There are two top-level modules: Recommender and Classifier. In this section we are only interested in the modules registered under Classifier. These are BinaryBayesPointMachine, MulticlassBayesPointMachine and Evaluate.

Both Bayes Point Machine classifier modules, BinaryBayesPointMachine and MulticlassBayesPointMachine, provide a number of additional modules that define operations, such as the Train and Predict modules used in the example below.

The Evaluate module itself has no sub-modules and can be used to evaluate the performance of any classifier.

Example

In this section and its subsections we describe the aforementioned modules together with their operation-specific options, all of which are prefixed by -- (dash, dash). Before we explain all these options in more detail, let us begin with a simple example sequence of commands to train, test and evaluate a multi-class Bayes Point Machine classifier.

Learner Classifier MulticlassBayesPointMachine Train   
    --training-set dna.train --model dna.mdl   

Learner Classifier MulticlassBayesPointMachine Predict   
    --test-set dna.test --model dna.mdl --predictions dna.predictions  

Learner Classifier Evaluate --ground-truth dna.test   
    --predictions dna.predictions --report dna.evaluation.txt
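The three commands above can also be driven from a script. The following is a minimal Python sketch, not part of Infer.NET: the helper names `learner_commands` and `run_pipeline` are hypothetical, and it assumes that the Learner executable is on the PATH and that the data files follow the `dna.train` / `dna.test` naming used in the example. Only the modules and options shown above are used.

```python
import subprocess


def learner_commands(prefix, learner="Learner"):
    """Build the train, predict and evaluate command lines for one data set.

    `prefix` is the common file-name stem, e.g. "dna" for dna.train and
    dna.test. `learner` is the executable name (assumed to be on the PATH).
    """
    model = f"{prefix}.mdl"
    predictions = f"{prefix}.predictions"
    bpm = [learner, "Classifier", "MulticlassBayesPointMachine"]
    return [
        bpm + ["Train", "--training-set", f"{prefix}.train", "--model", model],
        bpm + ["Predict", "--test-set", f"{prefix}.test", "--model", model,
               "--predictions", predictions],
        [learner, "Classifier", "Evaluate", "--ground-truth", f"{prefix}.test",
         "--predictions", predictions, "--report", f"{prefix}.evaluation.txt"],
    ]


def run_pipeline(prefix, learner="Learner"):
    """Run the three steps in order, stopping at the first failing command."""
    for argv in learner_commands(prefix, learner):
        subprocess.run(argv, check=True)
```

Calling `run_pipeline("dna")` would issue the same three commands as the example, provided the executable and the data files exist.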

Data format

The data format for the command-line modules is fixed, because the command-line runners do not let you define your own data mappings. Classification data must therefore be converted into the required format before it can be used.

Internally, the Bayes Point Machine command-line modules implement a standard data format mapping based on a sparse feature representation. No bias is added by default, so it may be necessary to add a feature with a constant value to every instance.
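One way to add such a constant feature is to append it to every instance line before training. The sketch below is an illustration under stated assumptions: the feature name `bias` is hypothetical and must not clash with an existing feature identifier, and blank lines and lines starting with `//` or `%` are treated as non-instance lines, which is inferred from the example file further down rather than from a stated rule.

```python
def add_bias_feature(line, name="bias"):
    """Append a constant feature `name:1` to one instance line.

    The line is expected without its trailing newline. Blank lines and
    comment lines (assumed to start with "//" or "%") are returned unchanged.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith(("//", "%")):
        return line
    # Drop trailing whitespace so the new feature is separated by one space.
    return f"{line.rstrip()} {name}:1"
```

The same feature has to be added to the training set and the test set, otherwise the trained model and the predictions would not use the same features.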

The command-line classifier modules expect classification data in a single text file. In short, the format has one instance per line, with the label at the beginning of the line followed by the features of that instance. Features use a zero-sparse representation, so a feature that is not listed has the value zero, and a colon separates each feature identifier from its value.
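To make the summary concrete, here is a minimal Python sketch of a writer that turns one labelled instance into a line of this form. The function `format_instance` is a hypothetical helper, not part of Infer.NET. It assumes that labels and feature identifiers contain no whitespace and that feature identifiers contain no colon.

```python
def format_instance(label, features):
    """Serialize one instance as 'label name:value name:value ...'.

    `features` maps feature identifiers to numeric values. Zero-valued
    features are omitted, following the zero-sparse representation.
    """
    parts = [str(label)]
    for name, value in features.items():
        if value == 0:
            continue  # zero-sparse: zeros are simply not written
        parts.append(f"{name}:{value}")
    return " ".join(parts)
```

An instance whose features are all zero is written as a line that contains only the label.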

Here is an example of how such a file might look (illustrating some corner cases):

// Six instances:  

A/I first1:2 second-2 third_3:1.3e-10  
J/O      
R/Z second-2:3.1234511  
J/O first1:0.12 second-2:4   
A/I first1:2           third_3:1.45e-10  
J/O first1:0.22  

    %Four more instances:  
A/I second-2 third_3:2.762e-10 first1:1.97  
J/O      first1:2.32  
R/Z second-2:3.1234511  
R/Z second-2:2.519 third_3

The resulting labels and feature values are (in a dense representation for the purpose of clarity):

Label   Feature 1 (first1)   Feature 2 (second-2)   Feature 3 (third_3)
A/I     2                    1                      1.3e-10
J/O     0                    0                      0
R/Z     0                    3.1234511              0
J/O     0.12                 4                      0
A/I     2                    0                      1.45e-10
J/O     0.22                 0                      0
A/I     1.97                 1                      2.762e-10
J/O     2.32                 0                      0
R/Z     0                    3.1234511              0
R/Z     0                    2.519                  1
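The mapping from the example file to this dense table can be sketched in Python as follows. This is an illustration of the rules that the example suggests, not the parser used by Learner.exe: blank lines and lines starting with `//` or `%` are skipped, a feature without a value counts as 1, and a feature that is not listed counts as 0. The function names `parse_line` and `to_dense` are hypothetical.

```python
def parse_line(line):
    """Parse one line into (label, {feature: value}), or None if skipped."""
    stripped = line.strip()
    if not stripped or stripped.startswith(("//", "%")):
        return None  # blank line or comment (assumed comment markers)
    label, *tokens = stripped.split()
    features = {}
    for token in tokens:
        name, colon, value = token.partition(":")
        # A feature given without a value is taken to be 1.
        features[name] = float(value) if colon else 1.0
    return label, features


def to_dense(lines, feature_names):
    """Return (label, dense value list) rows; unlisted features become 0."""
    rows = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            label, features = parsed
            rows.append((label, [features.get(n, 0.0) for n in feature_names]))
    return rows


# The last four instances of the example file, including the comment line.
SAMPLE = [
    "    %Four more instances:  ",
    "A/I second-2 third_3:2.762e-10 first1:1.97  ",
    "J/O      first1:2.32  ",
    "R/Z second-2:3.1234511  ",
    "R/Z second-2:2.519 third_3",
]

for label, values in to_dense(SAMPLE, ["first1", "second-2", "third_3"]):
    print(label, values)
```

Applied to the sample lines, the sketch yields the last four rows of the table, with the feature order fixed by the `feature_names` argument rather than by the order in which features appear on a line.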