Skip to main content

Infer.NET development

String inference API design

Introduction

The purpose of this document is to describe the way inference over sequences such as strings or lists is implemented in Infer.NET.

Four types of entities are involved in performing inference over sequences, namely automaton functions, automata, transducers and sequence manipulators. For each of these entity types there exists a base class providing an interface and core functionality for an arbitrary class of sequences, as well as derived classes customized for a particular sequence type, like strings or lists. The purpose of each type and the nature of interactions between them is explained in the following sections of this document.

The definitions of a weighted state automaton and weighted finite state transducer given in this doc may be slightly different from those you can find in your favorite textbook on the topic, but a) they should be equivalent b) the definitions given in the document match our implementation much better.

Sequence Manipulators

Sequence manipulators are used to abstract basic operations on a sequence type. We didn’t use a common base type for sequences instead because we wanted inference to be able to operate on existing .NET sequence types, like System.String or System.Collections.Generic.List<T>. Each sequence manipulator implements the ISequenceManipulator<TSequence, TElement> interface defined as follows:

public interface ISequenceManipulator<TSequence, TElement>
    where TSequence : class, IEnumerable<TElement>
{
    /// <summary>
    /// Converts a given collection of elements to a sequence.
    /// </summary>
    /// <param name="elements">The collection of elements to convert to a sequence.</param>
    /// <returns>The sequence containing the elements.</returns>
    TSequence ToSequence(IEnumerable<TElement> elements);

    /// <summary>
    /// Gets the length of a given sequence.
    /// </summary>
    /// <param name="sequence">The sequence.</param>
    /// <returns>The length of the sequence.</returns>
    int GetLength(TSequence sequence);

    /// <summary>
    /// Gets the element at a given position in a given sequence.
    /// </summary>
    /// <param name="sequence">The sequence.</param>
    /// <param name="index">The position.</param>
    /// <returns>The element at the given position in the sequence.</returns>
    TElement GetElement(TSequence sequence, int index);

    /// <summary>
    /// Checks if given sequences are equal.
    /// Sequences are considered equal if they contain the same elements in the same order.
    /// </summary>
    /// <param name="sequence1">The first sequence.</param>
    /// <param name="sequence2">The second sequence.</param>
    /// <returns><see langword="true"/> if the sequences are equal, <see langword="false"/> otherwise.</returns>
    bool SequencesAreEqual(TSequence sequence1, TSequence sequence2);

    /// <summary>
    /// Creates a sequence by copying the first sequence and then appending the second sequence to it.
    /// </summary>
    /// <param name="sequence1">The first sequence.</param>
    /// <param name="sequence2">The second sequence.</param>
    /// <returns>The created sequence.</returns>
    TSequence Concat(TSequence sequence1, TSequence sequence2);
}

 

Automaton Functions & Automata

Distributions over sequences are defined by properly normalized weighted finite state automata (WFSA). In the code we represented WFSA using two concepts: automata and automaton functions. In the following sections of this document we will explain what a WFSA is, why this representation was chosen and how we use it to represent probability distributions.

What Is a Weighted Finite State Automaton?

A weighted finite state automaton is a directed graph with additional information associated with every node, which we will call a state, and edge, which we will call a transition. In particular, each state has a so-called ending weight, and each transition has a weight and, optionally, a distribution over sequence elements. If a transition has no element distribution, it is called an epsilon transition. Every automaton also has a designated start state.

weighted_automata.jpg

An example of a weighted finite state automaton is given in the figure above. Here we have 3 states: 0, 1 and 2. State 0 is the start state. States 1 and 2 have the ending weight of 1, while the ending weight of state 0 is zero. There exists an epsilon transition between states 0 and 2 with the weight of 3 and two regular transitions labelled with point mass element distributions between states 0 and 1 and 2 and 1. There is also a self-loop on state 2 labelled with the uniform distribution over ‘a’ and ‘b’.

We call a series of transitions in an automaton a path if the source state of the first transition in the series is the start state. The weight of a path for sequence s is defined as follows:

We call a path valid for sequence s if it’s weight for that sequence is not zero. The following table shows the weight and validity for all paths for the sequence “a”:

Path for sequence “a” Weight Validity of path
0 → 1 1 * 0.5 * 1 = 0.5 Valid
0 → 2 → 2 3 * 0.5 * 1 * 1 = 1.5 Valid
0 → 2 → 1 3 * 0 * 1 * 1 = 0 (there’s no mass on “a” between states 2 and 1) Not Valid

Essentially, each transition of a valid path corresponds to a particular position in the sequence. Traversing a non-epsilon edge advances the position by 1, while traversing an epsilon transition preserves the position. And in order to compute the weight of the path we weight each transition according to the corresponding symbol of the sequence.

The value of a weighted finite state automaton on a sequence is defined as the sum of the weights of all valid paths for that sequence. For example, here are the values of our example automaton on several sequences:

Sequence Weight
”” 3.0
“a” 2.0
“b” 1.5
“bb” 0.75
“c” 3
“abac” 0.375

A table like that with all possible sequences in the first column is called a weighted regular language induced by a weighted finite state automaton.

If the values of an automaton are non-negative, and the sum of the values over all possible sequences is 1, it is essentially defining a probability distribution over sequences. The reason this formalism was chosen to represent such distributions in Infer.NET is that weighted finite state automata are closed under multiplication, which is a necessary requirement to fulfil if one wants to use them as messages in EP or VMP framework. WFSA also provide simple ways to perform other operations useful for doing inference, such as summation (extensively used in inference with gates). Finally, a lot of message operators for WFSA have simple implementation in terms of weighted finite state transducers (see the corresponding section).

Automaton Function

An automaton function is a mapping from sequences to non-negative real values as defined by a weighted finite state automaton, with some additional operations on it. In our current implementation all transition weights must be non-negative, but this limitation is likely to be relaxed in the future because negative weights play an important role in implementing some operations.

The set of supported operations on an automaton function includes the following:

All the core functionality is provided by the class with the following signature:

[Serializable]
public abstract partial class AutomatonFunction<TSequence, TElement, TElementDistribution, TSequenceManipulator, TThis>
    where TSequence : class, IEnumerable<TElement>
    where TElementDistribution : class, IDistribution<TElement>, SettableToProduct<TElementDistribution>, SettableToWeightedSumExact<TElementDistribution>, CanGetLogAverageOf<TElementDistribution>, SettableToPartialUniform<TElementDistribution>, new()
    where TSequenceManipulator : ISequenceManipulator<TSequence, TElement>, new()
    where TThis : AutomatonFunction<TSequence, TElement, TElementDistribution, TSequenceManipulator, TThis>, new()
{
...
}

The generic parameters of this class are bound by the derived classes and, thus, are not directly exposed to a user. The first two parameters, TSequence and TElement, define the type of sequences the automaton function is operating on. The third parameter, TElementDistribution, defines the type of a distribution over sequence elements that can be associated with an edge. The TSequenceManipulator parameter specifies the manipulator for instances of TSequence. The last parameter, TThis, specifies the type of a concrete class that is supposed to be used by an API user. It is needed for two reasons: to be able to define factory methods for the automaton function in the base class, as well as to simplify the signatures of member functions taking other automaton functions as an argument.

Here is an example of the definition of a derived class:

public class StringAutomatonFunction :
        AutomatonFunction<string, char, DiscreteChar, StringManipulator, StringAutomatonFunction>
{
...
}

Automaton

An automaton is an implementation of IDistribution over a sequence type that uses AutomatonFunction to store the mapping from sequences to probabilities. The connection between an automaton and an automaton function is similar to the one between Discrete and Vector classes in Infer.NET: Discrete uses Vector to store probabilities, keeps its values normalized and uses arithmetic operations defined on it to perform various operations on the distribution.

More precisely, the normalization contract defined by an automaton is as follows:

When the distribution is a point mass, Automaton doesn’t represent it using AutomatonFunction, but instead stores the point mass explicitly. It helps performance since some operations are able to handle this special case more efficiently.

As is the case with automaton functions, all core functionality is provided by the base class, and the derived classes simply bind the generic arguments and provide additional functionality like type-specific factory methods. Here is the signature of the Automaton class:

[Serializable]
public abstract class Automaton<TSequence, TElement, TElementDistribution, TSequenceManipulator, TWeightFunction, TThis> :
    IDistribution<TSequence>,
    SettableTo<TThis>,
    SettableToProduct<TThis>,
    SettableToRatio<TThis>,
    SettableToPower<TThis>,
    CanGetLogAverageOf<TThis>,
    CanGetLogAverageOfPower<TThis>,
    CanGetAverageLog<TThis>,
    SettableToWeightedSumExact<TThis>,
    SettableToPartialUniform<TThis>,
    CanGetLogNormalizer,
    Sampleable<TSequence>
    where TSequence : class, IEnumerable<TElement>
    where TSequenceManipulator : ISequenceManipulator<TSequence, TElement>, new()
    where TElementDistribution : class, IDistribution<TElement>, SettableToProduct<TElementDistribution>, SettableToWeightedSumExact<TElementDistribution>, CanGetLogAverageOf<TElementDistribution>, SettableToPartialUniform<TElementDistribution>, Sampleable<TElement>, new()
    where TWeightFunction : AutomatonFunction<TSequence, TElement, TElementDistribution, TSequenceManipulator, TWeightFunction>, new()
    where TThis : Automaton<TSequence, TElement, TElementDistribution, TSequenceManipulator, TWeightFunction, TThis>, new()
{
...
}

The generic parameters of the class have the same meaning as the parameters of AutomatonFunction. The only new parameter is TWeightFunction, which specifies the type of an automaton function that will be used to store probabilities, i.e. StringAutomatonFunction for StringAutomaton.

Automata API usage examples

Here is a set of small examples to give you a taste of what the API looks like.

Explicit construction of an automaton function representing the weighted automaton we’ve seen in the section on WFSA

StringAutomatonFunction func = StringAutomatonFunction.Zero();
StringAutomatonFunction.Node node0 = func.AddNode();
StringAutomatonFunction.Node node1 = func.AddNode();
StringAutomatonFunction.Node node2 = func.AddNode();
func.Start = node0;
node1.EndLogWeight = node2.EndLogWeight = Math.Log(1.0);
node0.AddEpsilonEdge(Math.Log(3.0), node2);
node0.AddEdge(DiscreteChar.PointMass('a'), Math.Log(0.5), node1);
node2.AddEdge(DiscreteChar.PointMass('c'), Math.Log(1.0), node1);
node2.AddEdge(DiscreteChar.OneOf('a', 'b'), Math.Log(1.0), node2);

// Prints the table from the section on WFSA
foreach (var str in new[] { string.Empty, "a", "b", "bb", "c", "abac" })
{
    Console.WriteLine("\"{0}\"\t{1}", str, func.GetValue(str));
}

Construction of an automaton function using factory methods and operations only

Recall that the Sum and Product operations refer to the point-wise sum and product of the functions. So

(f+g)(sequence) = f(sequence) + g(sequence)
(f*g)(sequence) = f(sequence) * g(sequence)
// func1("a") = func1("abfrjhf") = func1(any lowercase string) = 1.0
var func1 = StringAutomatonFunction.Constant(1.0, DiscreteChar.Lower());

// func2("abc") = func2("def") = 2.0, func2(anything else) = 0.0
var func2 = StringAutomatonFunction.ConstantOn(2.0, "abc", "def");

// func3("abc") = func3("def") = 3.5, func3(any other lowercase string) = 1.5
var func3 = StringAutomatonFunction.Sum(func1.Scale(1.5), func2);

// func4("abc") = func4("def") = 7.0, func4(any other string) = 0.0
var func4 = func3.Product(func2);

Construction of a uniform distribution over all strings that are case-invariant matches of a template

StringAutomaton result = StringAutomaton.Empty();
foreach (var ch in template)
{
    char upper = char.ToUpperInvariant(ch);
    char lower = char.ToLowerInvariant(ch);
    if (upper == lower)
    {
        result.AppendInPlace(ch);
    }
    else
    {
        result.AppendInPlace(DiscreteChar.OneOf(lower, upper));
    }
}

Transducers

In the automata theory weighted finite state transducers (WFST) can be viewed either as a generalization of an automaton that can take more than one argument, or as a class of operations transforming one weighted finite state automaton into another. It turns out that many message operators for operations on sequences can be easily implemented in terms of WFST. In this section we will explain what a weighted finite state transducer is, show examples of doing BP message computations using a WFST and explain how is this mapped to our implementation.

Weighted finite state transducer

Definition

The definition of a weighted finite state transducer is similar to the one of a weighted finite state automaton. The difference is that transducer edges are labelled with a distribution not over sequence elements, but pairs of input and output elements. Also, the distribution over input or output can be missing, which gives raise to additional types of epsilon edges. So, in total a transducer can have 4 types of edges:

Has input Has output Name Distribution
Yes Yes Normal edge P(input, output)
No Yes Input-epsilon edge P(output)
Yes No Output-epsilon edge P(input)
No No Epsilon edge None

An example of a weighted finite state transducer is given in the figure below.

weighted_state_transducer.jpg

This transducer has 2 normal self-loops on nodes 0 and 3, an output-epsilon edge between nodes 0 and 1, an input-epsilon edge between nodes 2 and 3, and an epsilon edge between nodes 1 and 2.

As it was the case with automata, you can think of each edge of a path in a transducer graph as having the corresponding positions in two sequences. Traversing normal edge advances both positions, traversing input-epsilon edge advances position in the second one, while traversing output-epsilon edge advances position in the first. Traversing epsilon edge keeps the positions intact. This view makes it straightforward to extend the concepts of a valid path weight and a value to a weighted finite state transducer, so we are not going to define them in this document.

Here are the values of the example transducer above on a number of sequence pairs:

“a” “b” 1
”” ”” 0
“bba” “bbb” 1
“aba” “bba” 1
“aba” “abb” 1
“aba” “abba” 0

Such table, filled for all possible sequence pairs, is called a weighted regular relation induced by a weighted finite state transducer. The relation induced by our example transducer gives the value of 1 to the pairs of sequences such that

Projection

A weighted regular language can be projected onto a weighted regular relation to obtain another weighted regular language. Given a language L={(si, wi)} and a relation R={(fj, tj, uj)}, the projection is defined as

projection_formula.jpg

That is, to determine the weight of sequence s′k in the resulting language we take all entries in the relation where the second sequence is s′k, multiply relation entry weight by the weight of the first sequence in the input language and sum over all such entries.

There exists a fast polynomial algorithm that builds an automaton corresponding to the projection result given an automaton for the input language and a transducer for the regular relation.

Let’s consider an example. The projection of the regular language assigning the weight of 1/3 to sequences “”, “aba” and “aab”, and zero to all other sequences onto the relation induced by the example transducer is the regular language assigning the weight of 1/3 to “bba” and “bab”, and 2/3 to “abb” (because “abb” can be obtained by replacing one ‘a’ with ‘b’ in both “aba” and “aab”).

This example has an interesting interpretation of the projection results. If the input language is a (possibly unnormalized) probability distribution, than the output language is an unnormalized distribution defined by the following sampling procedure:

So, the projection is essentially a forward message that the factor “randomly replace one occurrence of ‘a’ with ‘b’, given that the input is a non-empty sequence of ‘a’ and ‘b’ characters” sends in the belief propagation framework.

It turns out that both forward and backward message computations can be performed by a projection on a certain weighted regular relation for many operations on sequences. Examples of such operations include concatenation, substring, substring replacement and trimming. Moreover, the corresponding transducers can usually be built by combining together 3 primitive building blocks, namely Copy, Consume and Produce. These building blocks can be illustrated on our example transducer:

Primitives can be combined together with epsilon edges as illustrated by the edge between the nodes 1 and 2.

The theory behind implementing BP message operators via transducers is explained in greater depth in this document.

The Transducer class

Weighted finite state transducers are implemented by means of the TransducerBase class. All concrete transducer implementations must derive from one of its two children, which are both called Transducer but have different sets of generic parameters. One serves as the base class for transducers that operate on pairs of sequences of different types, while another restricts the sequence types to be the same. Internally, TransducerBase is implemented using an instance of AutomatonFunction defined on sequences of pairs of input and output elements.

The set of operations implemented by the TransducerBase class includes the following:

The version of Transducer that restricts sequence types to be the same additionally provides functionality like transposing a transducer in-place, or creating a copying transducer, that is, operations that don’t make sense if the sequence types differ.

The definition of TransducerBase looks as follows:

public abstract class TransducerBase<TSrcSequence, TSrcElement, TSrcElementDistribution, TSrcSequenceManipulator, TSrcFunction, TDestSequence, TDestElement, TDestElementDistribution, TDestSequenceManipulator, TDestFunction, TPairDistribution, TThis>
    where TSrcElementDistribution : class, IDistribution<TSrcElement>, CanGetLogAverageOf<TSrcElementDistribution>, SettableToProduct<TSrcElementDistribution>, SettableToWeightedSumExact<TSrcElementDistribution>, SettableToPartialUniform<TSrcElementDistribution>, Sampleable<TSrcElement>, new()
    where TDestElementDistribution : class, IDistribution<TDestElement>, CanGetLogAverageOf<TDestElementDistribution>, SettableToProduct<TDestElementDistribution>, SettableToWeightedSumExact<TDestElementDistribution>, SettableToPartialUniform<TDestElementDistribution>, Sampleable<TDestElement>, new()
    where TSrcSequence : class, IEnumerable<TSrcElement>
    where TDestSequence : class, IEnumerable<TDestElement>
    where TSrcSequenceManipulator : ISequenceManipulator<TSrcSequence, TSrcElement>, new()
    where TDestSequenceManipulator : ISequenceManipulator<TDestSequence, TDestElement>, new()
    where TSrcFunction : AutomatonFunction<TSrcSequence, TSrcElement, TSrcElementDistribution, TSrcSequenceManipulator, TSrcFunction>, new()
    where TDestFunction : AutomatonFunction<TDestSequence, TDestElement, TDestElementDistribution, TDestSequenceManipulator, TDestFunction>, new()
    where TPairDistribution : PairDistributionBase<TSrcElement, TSrcElementDistribution, TDestElement, TDestElementDistribution, TPairDistribution>, new()
    where TThis : TransducerBase<TSrcSequence, TSrcElement, TSrcElementDistribution, TSrcSequenceManipulator, TSrcFunction, TDestSequence, TDestElement, TDestElementDistribution, TDestSequenceManipulator, TDestFunction, TPairDistribution, TThis>, new()
{
...
}

The meaning of its generic parameters is the same as it was for the AutomatonFunction and Automaton classes, except that transducers have two sets of generic parameters: for input and for output sequences. The only previously unseen entity is TPairDistribution. This is the type of a distribution over pairs of elements that is being used by the AutomatonFunction hidden inside a transducer. The two versions of Transducer bind this parameter to their own type of element pair distribution, which must derive from PairDistributionBase, a base class that provides the majority of functionality.

This is the signature of Transducer for sequences of different types:

public abstract class Transducer<TSrcSequence, TSrcElement, TSrcElementDistribution, TSrcSequenceManipulator, TSrcFunction, TDestSequence, TDestElement, TDestElementDistribution, TDestSequenceManipulator, TDestFunction, TThis> :
TransducerBase<TSrcSequence, TSrcElement, TSrcElementDistribution, TSrcSequenceManipulator, TSrcFunction, TDestSequence, TDestElement, TDestElementDistribution, TDestSequenceManipulator, TDestFunction, PairDistribution<TSrcElement, TSrcElementDistribution, TDestElement, TDestElementDistribution>, TThis>
    where TSrcElementDistribution : class, IDistribution<TSrcElement>, CanGetLogAverageOf<TSrcElementDistribution>, SettableToProduct<TSrcElementDistribution>, SettableToWeightedSumExact<TSrcElementDistribution>, SettableToPartialUniform<TSrcElementDistribution>, Sampleable<TSrcElement>, new()
    where TDestElementDistribution : class, IDistribution<TDestElement>, CanGetLogAverageOf<TDestElementDistribution>, SettableToProduct<TDestElementDistribution>, SettableToWeightedSumExact<TDestElementDistribution>, SettableToPartialUniform<TDestElementDistribution>, Sampleable<TDestElement>, new()
    where TSrcSequence : class, IEnumerable<TSrcElement>
    where TDestSequence : class, IEnumerable<TDestElement>
    where TSrcSequenceManipulator : ISequenceManipulator<TSrcSequence, TSrcElement>, new()
    where TDestSequenceManipulator : ISequenceManipulator<TDestSequence, TDestElement>, new()
    where TSrcFunction : AutomatonFunction<TSrcSequence, TSrcElement, TSrcElementDistribution, TSrcSequenceManipulator, TSrcFunction>, new()
    where TDestFunction : AutomatonFunction<TDestSequence, TDestElement, TDestElementDistribution, TDestSequenceManipulator, TDestFunction>, new()
    where TThis : Transducer<TSrcSequence, TSrcElement, TSrcElementDistribution, TSrcSequenceManipulator, TSrcFunction, TDestSequence, TDestElement, TDestElementDistribution, TDestSequenceManipulator, TDestFunction, TThis>, new()
{
...
}

And this is the version of Transducer that restricts input and output sequence types to be the same:

public abstract class Transducer<TSequence, TElement, TElementDistribution, TSequenceManipulator, TFunction, TThis> :
TransducerBase<TSequence, TElement, TElementDistribution, TSequenceManipulator, TFunction, TSequence, TElement, TElementDistribution, TSequenceManipulator, TFunction, PairDistribution<TElement, TElementDistribution>, TThis>
    where TElementDistribution : class, IDistribution<TElement>, CanGetLogAverageOf<TElementDistribution>, SettableToProduct<TElementDistribution>, SettableToWeightedSumExact<TElementDistribution>, SettableToPartialUniform<TElementDistribution>, Sampleable<TElement>, new()
    where TSequence : class, IEnumerable<TElement>
    where TSequenceManipulator : ISequenceManipulator<TSequence, TElement>, new()
    where TFunction : AutomatonFunction<TSequence, TElement, TElementDistribution, TSequenceManipulator, TFunction>, new()
    where TThis : Transducer<TSequence, TElement, TElementDistribution, TSequenceManipulator, TFunction, TThis>, new()
{
...
}

Finally, here is an example of a derived class binding the generic parameters:

public class StringTransducer :
    Transducer<string, char, DiscreteChar, StringManipulator, StringAutomatonFunction, StringTransducer>
{
...
}

Transducer API usage examples

Concatenation factor message operator

public static StringAutomaton Str1AverageConditional(StringAutomaton concat, StringAutomaton str2)
{
    StringTransducer transducer = StringTransducer.Copy();
    transducer.AppendInPlace(StringTransducer.Consume(str2.GetProbabilityFunction()));
    return StringAutomaton.FromWorkspace(transducer.ProjectSource(concat));
}

Substring factor message operator

public static StringAutomaton SubAverageConditional(StringAutomaton str, int start, int length)
{
    if (str.IsPointMass)
    {
        return SubAverageConditional(str.Point, start, length);
    }

    var anyChar = StringAutomatonFunction.ConstantOnElement(1.0, DiscreteChar.Any());
    var transducer = StringTransducer.Consume(StringAutomatonFunction.Repeat(anyChar, minTimes: start, maxTimes: start));
    transducer.AppendInPlace(StringTransducer.Copy(StringAutomatonFunction.Repeat(anyChar, minTimes: length, maxTimes: length)));
    transducer.AppendInPlace(StringTransducer.Consume(StringAutomatonFunction.Constant(1.0)));

    return StringAutomaton.FromWorkspace(transducer.ProjectSource(str));
}